Project Architecture Patterns
What You'll Learn
How to structure a Python project as it grows from a single script to a multi-module application, and which patterns to use at each stage.
Stage 1: The Single Script
Perfect for: personal automation, one-off tasks, fewer than 100 lines
task.py
#!/usr/bin/env python3
"""Process and report on daily sales data."""
import csv
import sys
from pathlib import Path

def load_sales(path: Path) -> list[dict]:
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

def summarize(records: list[dict]) -> dict:
    total = sum(float(r["amount"]) for r in records)
    return {"count": len(records), "total": total}

def main() -> int:
    if len(sys.argv) != 2:
        print("Usage: task.py <sales.csv>", file=sys.stderr)
        return 1
    records = load_sales(Path(sys.argv[1]))
    summary = summarize(records)
    print(f"Records: {summary['count']}, Total: ${summary['total']:.2f}")
    return 0

if __name__ == "__main__":
    sys.exit(main())
Stage 2: Script + Helpers
Perfect for: scripts with reusable logic, 100–500 lines
project/
├── main.py
├── utils.py
├── config.py
└── requirements.txt
Move reusable functions into utils.py and configuration into config.py:
# config.py
from dataclasses import dataclass
from pathlib import Path

@dataclass(frozen=True)
class Config:
    input_dir: Path = Path("data/input")
    output_dir: Path = Path("data/output")
    timeout: int = 30

# utils.py
import logging

log = logging.getLogger(__name__)

def setup_logging(verbose: bool = False) -> None:
    level = logging.DEBUG if verbose else logging.INFO
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )

# main.py
import sys
from config import Config
from utils import setup_logging

def main() -> int:
    cfg = Config()
    setup_logging(verbose="--verbose" in sys.argv)
    ...
    return 0

if __name__ == "__main__":
    sys.exit(main())
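Because Config is declared with frozen=True, its fields cannot be reassigned after creation. A minimal sketch of how a single setting might still be overridden, using dataclasses.replace from the standard library; the timeout value of 120 and the variable names are illustrative assumptions:

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from pathlib import Path


@dataclass(frozen=True)
class Config:
    input_dir: Path = Path("data/input")
    output_dir: Path = Path("data/output")
    timeout: int = 30


defaults = Config()

# replace() builds a new Config; the original instance is left untouched.
slow = replace(defaults, timeout=120)  # 120 is an illustrative value

try:
    defaults.timeout = 120  # frozen dataclasses reject attribute assignment
except FrozenInstanceError:
    mutation_blocked = True
else:
    mutation_blocked = False
```

Passing the resulting object down to functions keeps every override explicit and visible at the call site.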
Stage 3: Package Structure
Perfect for: shared library, CLI tool, web service backend, 500+ lines
myapp/
├── pyproject.toml        ← project metadata and deps
├── README.md
├── .gitignore
├── src/
│   └── myapp/
│       ├── __init__.py
│       ├── __main__.py   ← python -m myapp entry
│       ├── cli.py        ← argument parsing
│       ├── core.py       ← business logic
│       ├── models.py     ← data classes / types
│       ├── storage.py    ← file / database I/O
│       └── utils.py      ← shared helpers
└── tests/
    ├── conftest.py
    ├── test_core.py
    └── test_storage.py
__main__.py enables python -m myapp:
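# src/myapp/__main__.py
import sys

from .cli import main

# The guard keeps an accidental import of this module from running the CLI.
if __name__ == "__main__":
    sys.exit(main())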
Separation of Concerns
Keep these responsibilities in separate files:
| Layer | What it does | Example file |
|---|---|---|
| CLI / Entry | Parse arguments, call logic | cli.py |
| Core / Logic | Business rules, transformations | core.py |
| Models | Data classes, types | models.py |
| Storage / IO | Files, DB, APIs | storage.py |
| Utils | Shared helpers | utils.py |
| Config | Settings, env vars | config.py |
# ❌ Everything in one function
def run(args):
    config = load_config()
    records = open_database(config.db_url)
    processed = [transform(r) for r in records]
    save_to_file(processed)
    send_email(processed)

# ✅ Each step is a separate, testable function
def load_config() -> Config: ...
def fetch_records(config: Config) -> list[Record]: ...
def transform_records(records: list[Record]) -> list[Result]: ...
def save_results(results: list[Result], path: Path) -> None: ...
def notify(results: list[Result], config: Config) -> None: ...
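To make the split concrete, here is a small runnable sketch of the same idea. The Record and Result fields, the cents conversion, and the run() signature are hypothetical; the point is only that the logic function is pure and the I/O steps are passed in as callables.

```python
from collections.abc import Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:  # hypothetical input type
    name: str
    amount: float


@dataclass(frozen=True)
class Result:  # hypothetical output type
    name: str
    amount_cents: int


def transform_records(records: list[Record]) -> list[Result]:
    """Pure logic: no files or network, so it can be tested directly."""
    return [Result(r.name, round(r.amount * 100)) for r in records]


def run(
    fetch: Callable[[], list[Record]],
    save: Callable[[list[Result]], None],
) -> int:
    """Orchestration only: wire the I/O steps around the pure logic."""
    results = transform_records(fetch())
    save(results)
    return len(results)


# In a test, the I/O callables can be replaced with in-memory stand-ins.
saved: list[Result] = []
count = run(fetch=lambda: [Record("widget", 19.99)], save=saved.extend)
```

In production code, fetch and save would wrap the real database and file functions; the transformation itself never changes.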
Configuration Patterns
From environment variables (for servers/containers)
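import os
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Config:
    # default_factory defers each lookup until Config() is created. Plain
    # defaults would be read once, when the class is defined, and would miss
    # variables added to os.environ later (for example by load_dotenv()).
    db_url: str = field(default_factory=lambda: os.environ.get("DATABASE_URL", "sqlite:///local.db"))
    api_key: str = field(default_factory=lambda: os.environ.get("API_KEY", ""))
    debug: bool = field(default_factory=lambda: os.environ.get("DEBUG", "false").lower() == "true")
    workers: int = field(default_factory=lambda: int(os.environ.get("WORKERS", "4")))

    def validate(self) -> None:
        if not self.api_key:
            raise ValueError("API_KEY environment variable required")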
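The validate() method only helps if something calls it. One possible variation, sketched here under the assumption that a from_env classmethod is acceptable in your codebase (the name is hypothetical), reads the settings from any mapping and validates before returning, which also makes the configuration easy to test without touching the real environment:

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    db_url: str
    api_key: str
    debug: bool
    workers: int

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "Config":
        """Build a Config from a mapping (os.environ by default) and validate it."""
        cfg = cls(
            db_url=env.get("DATABASE_URL", "sqlite:///local.db"),
            api_key=env.get("API_KEY", ""),
            debug=env.get("DEBUG", "false").lower() == "true",
            workers=int(env.get("WORKERS", "4")),
        )
        cfg.validate()
        return cfg

    def validate(self) -> None:
        if not self.api_key:
            raise ValueError("API_KEY environment variable required")


# A plain dict stands in for the environment; the values are made up.
cfg = Config.from_env({"API_KEY": "test-key", "WORKERS": "8"})
```

Calling Config.from_env() with no argument at startup fails fast if API_KEY is missing, instead of surfacing the problem at the first API call.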
From a .env file (with python-dotenv)
pip install python-dotenv
from dotenv import load_dotenv
load_dotenv() # loads .env into os.environ
config = Config()
.env:
DATABASE_URL=postgresql://localhost/myapp
API_KEY=secret123
DEBUG=false
The Repository Pattern (for storage)
Decouple your logic from storage details:
# storage.py
from pathlib import Path
import json

class UserRepository:
    def __init__(self, path: Path):
        self.path = path

    def load_all(self) -> list[dict]:
        if not self.path.exists():
            return []
        return json.loads(self.path.read_text(encoding="utf-8"))

    def save(self, users: list[dict]) -> None:
        self.path.write_text(json.dumps(users, indent=2), encoding="utf-8")

    def find(self, user_id: int) -> dict | None:
        return next((u for u in self.load_all() if u["id"] == user_id), None)
Your logic only calls repo.find() and repo.save(); it doesn't need to know whether the data lives in a file, a database, or an API.
Common Anti-Patterns
| Anti-Pattern | Problem | Fix |
|---|---|---|
| Giant main() function | Hard to test | Split into small functions |
| Hardcoded paths/secrets | Not portable | Use env vars and pathlib |
| Global state (globals) | Unpredictable | Pass data as arguments |
| Mixing IO and logic | Hard to test logic | Separate load(), process(), save() |
| No tests | Breaks silently | Write tests from the start |
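As an illustration of the last two rows, logic that takes plain data and returns plain data can be tested without any files. The sketch below reuses summarize() from the Stage 1 script and assumes pytest-style test functions (plain assert statements in functions whose names start with test_); the file name tests/test_core.py is only a suggestion.

```python
# tests/test_core.py (illustrative; summarize is copied here to keep the sketch self-contained)
def summarize(records: list[dict]) -> dict:
    total = sum(float(r["amount"]) for r in records)
    return {"count": len(records), "total": total}


def test_summarize_counts_and_totals():
    records = [{"amount": "10.50"}, {"amount": "4.50"}]
    assert summarize(records) == {"count": 2, "total": 15.0}


def test_summarize_empty_input():
    assert summarize([]) == {"count": 0, "total": 0}
```

In a real project the test module would import summarize from the package rather than redefine it.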
Quick Reference
Stage 1: script.py
Stage 2: main.py + utils.py + config.py
Stage 3: src/myapp/ package with separated layers
Layers:
  cli.py     → argparse, entry point
  core.py    → business logic
  models.py  → @dataclass types
  storage.py → IO operations
  config.py  → settings from env vars
Config:
  @dataclass(frozen=True) class Config
  read from os.environ.get(...)
  load_dotenv() for .env files
Entry:
  if __name__ == "__main__": sys.exit(main())
  __main__.py for python -m myapp