Tool-Using Agents
This lesson is part of the AI track. Use it to build practical Python skill for AI engineering, LLM workflows, embeddings, agents, guardrails, and production AI systems.
Why This Matters
Python is useful because you can move across domains without changing languages: local scripts, server runbooks, web APIs, data processing, and AI applications can all share the same testing, packaging, logging, and configuration habits.
Tool-using agents matter when you need code that is readable enough for a teammate, predictable enough for automation, safe enough for servers, and structured enough for AI or data workflows.
Use this lesson with three operating questions:
- What input does the program trust, validate, or reject?
- What state does the program inspect, transform, or change?
- What output proves the work succeeded or failed?
Core Concepts
| Concept | Practical Meaning |
|---|---|
| Explicit inputs | Arguments, config, environment variables, files, API payloads, prompts, or datasets are declared clearly |
| Bounded work | Network calls, subprocesses, model calls, and loops have limits or timeouts |
| Observable behavior | Logs, metrics, reports, and exit codes make outcomes visible |
| Safe defaults | Dry runs, read-only checks, validation, and small changes reduce blast radius |
| Reusable structure | Functions, modules, tests, and project layout make the code maintainable |
Practical Pattern
Start with a small program shape that can grow without becoming fragile:
from __future__ import annotations

import argparse
import logging
from dataclasses import dataclass

LOG = logging.getLogger(__name__)

@dataclass(frozen=True)
class Settings:
    dry_run: bool = False
    verbose: bool = False

def parse_args() -> Settings:
    parser = argparse.ArgumentParser(description="Design Python agents that call tools, inspect results, and stop safely.")
    parser.add_argument("--dry-run", action="store_true", help="show intended work without changing state")
    parser.add_argument("--verbose", action="store_true", help="enable debug logging")
    args = parser.parse_args()
    return Settings(dry_run=args.dry_run, verbose=args.verbose)

def configure_logging(settings: Settings) -> None:
    level = logging.DEBUG if settings.verbose else logging.INFO
    logging.basicConfig(level=level, format="%(asctime)s %(levelname)s %(message)s")
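
One way to finish the shape is a small entry point that wires these pieces together. This is a sketch, not part of the lesson template; the lesson-specific work is still a placeholder here:

def main() -> int:
    settings = parse_args()
    configure_logging(settings)
    LOG.info("starting (dry_run=%s)", settings.dry_run)
    # the lesson-specific operation goes here; return non-zero when it fails
    return 0

if __name__ == "__main__":
    raise SystemExit(main())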
Domain Applications
| Domain | How This Lesson Applies |
|---|---|
| AI | Prepare structured inputs, call models safely, validate outputs, and record evaluations |
| Automation | Turn repeatable manual work into scripts with arguments, dry runs, and reports |
| Server | Inspect services, files, logs, resources, and deployments with careful error handling |
| Data | Clean records, transform formats, summarize results, and catch invalid values |
| APIs | Send requests with timeouts, handle failures, parse responses, and respect rate limits |
Example Implementation
The example below is intentionally generic. Replace the placeholder logic with the lesson-specific operation, but keep the same safety structure.
from __future__ import annotations

import logging
from pathlib import Path

LOG = logging.getLogger(__name__)

def run_01_tool_using_agents_workflow(target: Path, dry_run: bool = False) -> dict[str, object]:
    result: dict[str, object] = {
        "target": str(target),
        "exists": target.exists(),
        "dry_run": dry_run,
        "changed": False,
    }
    LOG.info("planning workflow for %s", target)
    if dry_run:
        LOG.info("dry run enabled; reporting plan only")
        return result
    if target.exists():
        result["changed"] = False
    else:
        LOG.warning("target does not exist: %s", target)
    return result
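
Because this lesson is about tool-using agents, the placeholder operation could become a small, bounded tool loop: pick a tool, call it, inspect the result, and stop when something unexpected happens. The sketch below is illustrative only; the tool registry, step cap, and plan format are assumptions, and a real agent would usually receive its plan from a model rather than a hard-coded list.

import logging
from pathlib import Path
from typing import Callable

LOG = logging.getLogger(__name__)

MAX_STEPS = 5  # bounded work: never run more than this many tool calls

def read_file_tool(argument: str) -> str:
    """Read-only example tool: inspects state without changing it."""
    return Path(argument).read_text(encoding="utf-8")

TOOLS: dict[str, Callable[[str], str]] = {"read_file": read_file_tool}

def run_tool_plan(plan: list[tuple[str, str]]) -> list[str]:
    """Run (tool_name, argument) steps in order, stopping safely on unknown tools or errors."""
    observations: list[str] = []
    for step, (tool_name, argument) in enumerate(plan[:MAX_STEPS], start=1):
        tool = TOOLS.get(tool_name)
        if tool is None:
            LOG.warning("step %d requested unknown tool %r; stopping", step, tool_name)
            break
        try:
            observation = tool(argument)
        except OSError as error:
            LOG.error("step %d tool %r failed: %s", step, tool_name, error)
            break
        LOG.info("step %d: %s returned %d characters", step, tool_name, len(observation))
        observations.append(observation)
    return observations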
AI Considerations
When applying this lesson to AI workflows, pay attention to:
- Prompt and input versioning so results are reproducible.
- Token, latency, and cost limits so automation cannot run uncontrolled.
- Structured output validation before another script or server trusts a model response.
- Redaction of secrets, personal data, logs, and customer content before sending data to a model.
- Human review for high-impact actions such as deleting data, changing infrastructure, or contacting users.
Minimal structured-output validation pattern:
def require_keys(payload: dict[str, object], required: set[str]) -> None:
    missing = required - payload.keys()
    if missing:
        raise ValueError(f"missing required keys: {sorted(missing)}")
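
The redaction bullet above can be handled before a payload ever reaches a model or a log. A minimal sketch, assuming a simple key-based denylist (the key names are examples, not a complete list):

SENSITIVE_KEYS = {"api_key", "password", "email"}  # example denylist, not exhaustive

def redact(payload: dict[str, object]) -> dict[str, object]:
    """Return a copy with sensitive values masked before logging or sending to a model."""
    return {
        key: "[REDACTED]" if key.lower() in SENSITIVE_KEYS else value
        for key, value in payload.items()
    }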
Automation Considerations
When applying this lesson to automation workflows, design for repeatability:
- Support `--dry-run` for state-changing scripts.
- Produce a summary at the end of each run.
- Separate planning from execution.
- Make expected failures clear, such as missing files or empty result sets.
- Use non-zero exit codes when another scheduler, CI job, or runbook should stop.
Useful result-summary pattern:
summary = {"checked": 0, "changed": 0, "failed": 0}
LOG.info("summary checked=%s changed=%s failed=%s", summary["checked"], summary["changed"], summary["failed"])
Server Considerations
When applying this lesson to servers, assume the environment is different from your terminal:
- Cron and systemd may have a smaller `PATH`.
- Relative paths may point somewhere unexpected.
- Permissions may differ between users.
- Services may be unhealthy before your script starts.
- Logs and exit codes are often the only evidence available later.
Pre-flight snippet:
import os
import socket
import sys
from pathlib import Path
print(f"host={socket.gethostname()}")
print(f"python={sys.executable}")
print(f"cwd={Path.cwd()}")
print(f"path={os.environ.get('PATH', '')}")
Validation Commands
Run these from the shell after changing or testing a script:
python3 --version
python3 -m py_compile script.py
python3 script.py --dry-run
printf 'exit_status=%s\n' "$?"
For server-facing scripts, add service and log checks:
systemctl --failed 2>/dev/null || true
journalctl -p warning -n 50 --no-pager 2>/dev/null || true
For AI-facing scripts, add evaluation or schema checks:
python3 -m pytest tests/ 2>/dev/null || true
python3 scripts/evaluate.py --sample 20 2>/dev/null || true
Common Failure Modes
| Failure | Typical Cause | Prevention |
|---|---|---|
| Works locally, fails in cron | Different environment or relative paths | Use absolute paths and log runtime context |
| API or model call hangs | Missing timeout | Set explicit timeouts and retries |
| Bad model output breaks automation | Unvalidated LLM response | Validate JSON/schema before use |
| Script changes too much | No plan/dry-run stage | Separate planning from execution |
| Secrets appear in logs | Logging full config or environment | Redact sensitive values before logging |
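
For the "call hangs" row, the fix is an explicit timeout plus a bounded retry. A standard-library sketch (the URL, attempt count, and backoff values are placeholders):

import time
import urllib.request

def fetch_with_retries(url: str, attempts: int = 3, timeout: float = 5.0) -> bytes:
    """Fail fast instead of hanging, and retry a bounded number of times."""
    last_error: Exception | None = None
    for attempt in range(1, attempts + 1):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                return response.read()
        except OSError as error:  # covers timeouts and connection failures
            last_error = error
            time.sleep(min(2 ** attempt, 10))  # simple capped backoff
    raise RuntimeError(f"request failed after {attempts} attempts") from last_error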
Troubleshooting Flow
- Re-run with `--dry-run` and `--verbose` if available.
- Print the Python executable, current directory, and resolved config path.
- Reduce the input to the smallest failing example.
- Check whether the failure is input validation, dependency, network, permission, or service state.
- Add a regression test or evaluation case before fixing the bug.
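
A regression-test sketch for the require_keys helper from the AI section, assuming pytest is installed and the helper lives in a module named validation (both names are placeholders):

import pytest

from validation import require_keys  # placeholder module name

def test_require_keys_reports_missing_keys() -> None:
    with pytest.raises(ValueError) as excinfo:
        require_keys({"answer": "yes"}, {"answer", "confidence"})
    assert "confidence" in str(excinfo.value)

def test_require_keys_accepts_complete_payload() -> None:
    require_keys({"answer": "yes", "confidence": 0.9}, {"answer", "confidence"})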
Debug helper:
def debug_context() -> dict[str, str]:
    import os
    import sys
    from pathlib import Path

    return {
        "executable": sys.executable,
        "cwd": str(Path.cwd()),
        "path": os.environ.get("PATH", ""),
    }
Practice Lab
Use a disposable project directory:
- Create a script that implements the core pattern from this lesson.
- Add `--dry-run` and prove it does not change state.
- Add one validation check for input, config, or model/API response.
- Add one log line that would help during an incident.
- Add one test or sample evaluation that catches a realistic failure.
Review Questions
- How do tool-using agents apply differently to AI, automation, and server work?
- What should be validated before another system trusts the output?
- What is the safest default behavior for a production script?
- What logs, metrics, or reports would help a teammate debug this later?
- What would you test before scheduling or deploying this code?
Field Notes
Broad Python skill comes from applying the same engineering habits across domains. AI code still needs timeouts and validation. Automation still needs tests and logs. Server code still benefits from clean functions and data structures.
Prefer boring, explicit, reviewable Python when the code can affect users, infrastructure, cost, data, or security.
What's Next
- Return to the Python course overview.