Tool-Using Agents
This lesson is part of the AI track. Use it to build practical Python skill for AI engineering, LLM workflows, embeddings, agents, guardrails, and production AI systems.
Why This Matters
Python is useful because you can move across domains without changing languages: local scripts, server runbooks, web APIs, data processing, and AI applications can all share the same testing, packaging, logging, and configuration habits.
Tool-using agents matter when you need code that is readable enough for a teammate, predictable enough for automation, safe enough for servers, and structured enough for AI or data workflows.
Use this lesson with three operating questions:
- What input does the program trust, validate, or reject?
- What state does the program inspect, transform, or change?
- What output proves the work succeeded or failed?
Core Concepts
| Concept | Practical Meaning |
|---|---|
| Explicit inputs | Arguments, config, environment variables, files, API payloads, prompts, or datasets are declared clearly |
| Bounded work | Network calls, subprocesses, model calls, and loops have limits or timeouts |
| Observable behavior | Logs, metrics, reports, and exit codes make outcomes visible |
| Safe defaults | Dry runs, read-only checks, validation, and small changes reduce blast radius |
| Reusable structure | Functions, modules, tests, and project layout make the code maintainable |
Practical Pattern
Start with a small program shape that can grow without becoming fragile:
from __future__ import annotations

import argparse
import logging
from dataclasses import dataclass

LOG = logging.getLogger(__name__)

@dataclass(frozen=True)
class Settings:
    dry_run: bool = False
    verbose: bool = False

def parse_args() -> Settings:
    parser = argparse.ArgumentParser(description="Design Python agents that call tools, inspect results, and stop safely.")
    parser.add_argument("--dry-run", action="store_true", help="show intended work without changing state")
    parser.add_argument("--verbose", action="store_true", help="enable debug logging")
    args = parser.parse_args()
    return Settings(dry_run=args.dry_run, verbose=args.verbose)

def configure_logging(settings: Settings) -> None:
    level = logging.DEBUG if settings.verbose else logging.INFO
    logging.basicConfig(level=level, format="%(asctime)s %(levelname)s %(message)s")
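
One way to finish the shape is a small entry point that wires these pieces together. This is a sketch, not part of the lesson template; the lesson-specific work is still a placeholder here:

def main() -> int:
    settings = parse_args()
    configure_logging(settings)
    LOG.info("starting (dry_run=%s)", settings.dry_run)
    # the lesson-specific operation goes here; return non-zero when it fails
    return 0

if __name__ == "__main__":
    raise SystemExit(main())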
Domain Applications
| Domain | How This Lesson Applies |
|---|---|
| AI | Prepare structured inputs, call models safely, validate outputs, and record evaluations |
| Automation | Turn repeatable manual work into scripts with arguments, dry runs, and reports |
| Server | Inspect services, files, logs, resources, and deployments with careful error handling |
| Data | Clean records, transform formats, summarize results, and catch invalid values |
| APIs | Send requests with timeouts, handle failures, parse responses, and respect rate limits |
Example Implementation
The example below is intentionally generic. Replace the placeholder logic with the lesson-specific operation, but keep the same safety structure.
from __future__ import annotations

import logging
from pathlib import Path

LOG = logging.getLogger(__name__)

def run_01_tool_using_agents_workflow(target: Path, dry_run: bool = False) -> dict[str, object]:
    result: dict[str, object] = {
        "target": str(target),
        "exists": target.exists(),
        "dry_run": dry_run,
        "changed": False,
    }
    LOG.info("planning workflow for %s", target)
    if dry_run:
        LOG.info("dry run enabled; reporting plan only")
        return result
    if target.exists():
        result["changed"] = False
    else:
        LOG.warning("target does not exist: %s", target)
    return result
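
Because this lesson is about tool-using agents, the placeholder operation could become a small, bounded tool loop: pick a tool, call it, inspect the result, and stop when something unexpected happens. The sketch below is illustrative only; the tool registry, step cap, and plan format are assumptions, and a real agent would usually receive its plan from a model rather than a hard-coded list.

import logging
from pathlib import Path
from typing import Callable

LOG = logging.getLogger(__name__)

MAX_STEPS = 5  # bounded work: never run more than this many tool calls

def read_file_tool(argument: str) -> str:
    """Read-only example tool: inspects state without changing it."""
    return Path(argument).read_text(encoding="utf-8")

TOOLS: dict[str, Callable[[str], str]] = {"read_file": read_file_tool}

def run_tool_plan(plan: list[tuple[str, str]]) -> list[str]:
    """Run (tool_name, argument) steps in order, stopping safely on unknown tools or errors."""
    observations: list[str] = []
    for step, (tool_name, argument) in enumerate(plan[:MAX_STEPS], start=1):
        tool = TOOLS.get(tool_name)
        if tool is None:
            LOG.warning("step %d requested unknown tool %r; stopping", step, tool_name)
            break
        try:
            observation = tool(argument)
        except OSError as error:
            LOG.error("step %d tool %r failed: %s", step, tool_name, error)
            break
        LOG.info("step %d: %s returned %d characters", step, tool_name, len(observation))
        observations.append(observation)
    return observations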
AI Considerations
When applying this lesson to AI workflows, pay attention to:
- Prompt and input versioning so results are reproducible.
- Token, latency, and cost limits so automation cannot run uncontrolled.
- Structured output validation before another script or server trusts a model response.
- Redaction of secrets, personal data, logs, and customer content before sending data to a model.
- Human review for high-impact actions such as deleting data, changing infrastructure, or contacting users.
Minimal structured-output validation pattern:
def require_keys(payload: dict[str, object], required: set[str]) -> None:
    missing = required - payload.keys()
    if missing:
        raise ValueError(f"missing required keys: {sorted(missing)}")
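
The redaction bullet above can be handled before a payload ever reaches a model or a log. A minimal sketch, assuming a simple key-based denylist (the key names are examples, not a complete list):

SENSITIVE_KEYS = {"api_key", "password", "email"}  # example denylist, not exhaustive

def redact(payload: dict[str, object]) -> dict[str, object]:
    """Return a copy with sensitive values masked before logging or sending to a model."""
    return {
        key: "[REDACTED]" if key.lower() in SENSITIVE_KEYS else value
        for key, value in payload.items()
    }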
Automation Considerations
When applying this lesson to automation workflows, design for repeatability:
- Support `--dry-run` for state-changing scripts.
- Produce a summary at the end of each run.
- Separate planning from execution.
- Make expected failures clear, such as missing files or empty result sets.
- Use non-zero exit codes when another scheduler, CI job, or runbook should stop.
Useful result-summary pattern:
summary = {"checked": 0, "changed": 0, "failed": 0}
LOG.info("summary checked=%s changed=%s failed=%s", summary["checked"], summary["changed"], summary["failed"])
Server Considerations
When applying this lesson to servers, assume the environment is different from your terminal:
- Cron and systemd may have a smaller `PATH`.
- Relative paths may point somewhere unexpected.
- Permissions may differ between users.
- Services may be unhealthy before your script starts.
- Logs and exit codes are often the only evidence available later.
Pre-flight snippet:
import os
import socket
import sys
from pathlib import Path
print(f"host={socket.gethostname()}")
print(f"python={sys.executable}")
print(f"cwd={Path.cwd()}")
print(f"path={os.environ.get('PATH', '')}")
Validation Commands
Run these from the shell after changing or testing a script:
python3 --version
python3 -m py_compile script.py
python3 script.py --dry-run
printf 'exit_status=%s\n' "$?"
For server-facing scripts, add service and log checks:
systemctl --failed 2>/dev/null || true
journalctl -p warning -n 50 --no-pager 2>/dev/null || true
For AI-facing scripts, add evaluation or schema checks:
python3 -m pytest tests/ 2>/dev/null || true
python3 scripts/evaluate.py --sample 20 2>/dev/null || true
Common Failure Modes
| Failure | Typical Cause | Prevention |
|---|---|---|
| Works locally, fails in cron | Different environment or relative paths | Use absolute paths and log runtime context |
| API or model call hangs | Missing timeout | Set explicit timeouts and retries |
| Bad model output breaks automation | Unvalidated LLM response | Validate JSON/schema before use |
| Script changes too much | No plan/dry-run stage | Separate planning from execution |
| Secrets appear in logs | Logging full config or environment | Redact sensitive values before logging |
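
For the "call hangs" row, the fix is an explicit timeout plus a bounded retry. A standard-library sketch (the URL, attempt count, and backoff values are placeholders):

import time
import urllib.request

def fetch_with_retries(url: str, attempts: int = 3, timeout: float = 5.0) -> bytes:
    """Fail fast instead of hanging, and retry a bounded number of times."""
    last_error: Exception | None = None
    for attempt in range(1, attempts + 1):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                return response.read()
        except OSError as error:  # covers timeouts and connection failures
            last_error = error
            time.sleep(min(2 ** attempt, 10))  # simple capped backoff
    raise RuntimeError(f"request failed after {attempts} attempts") from last_error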
Troubleshooting Flow
- Re-run with `--dry-run` and `--verbose` if available.
- Print the Python executable, current directory, and resolved config path.
- Reduce the input to the smallest failing example.
- Check whether the failure is input validation, dependency, network, permission, or service state.
- Add a regression test or evaluation case before fixing the bug.
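
A regression-test sketch for the require_keys helper from the AI section, assuming pytest is installed and the helper lives in a module named validation (both names are placeholders):

import pytest

from validation import require_keys  # placeholder module name

def test_require_keys_reports_missing_keys() -> None:
    with pytest.raises(ValueError) as excinfo:
        require_keys({"answer": "yes"}, {"answer", "confidence"})
    assert "confidence" in str(excinfo.value)

def test_require_keys_accepts_complete_payload() -> None:
    require_keys({"answer": "yes", "confidence": 0.9}, {"answer", "confidence"})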
Debug helper:
def debug_context() -> dict[str, str]:
    import os
    import sys
    from pathlib import Path

    return {
        "executable": sys.executable,
        "cwd": str(Path.cwd()),
        "path": os.environ.get("PATH", ""),
    }
Practice Lab
Use a disposable project directory:
- Create a script that implements the core pattern from this lesson.
- Add `--dry-run` and prove it does not change state.
- Add one validation check for input, config, or model/API response.
- Add one log line that would help during an incident.
- Add one test or sample evaluation that catches a realistic failure.
Review Questions
- How do tool-using agents apply differently to AI, automation, and server work?
- What should be validated before another system trusts the output?
- What is the safest default behavior for a production script?
- What logs, metrics, or reports would help a teammate debug this later?
- What would you test before scheduling or deploying this code?
Field Notes
Broad Python skill comes from applying the same engineering habits across domains. AI code still needs timeouts and validation. Automation still needs tests and logs. Server code still benefits from clean functions and data structures.
Prefer boring, explicit, reviewable Python when the code can affect users, infrastructure, cost, data, or security.
What's Next
- Return to the Python course overview.