Project Architecture Patterns

What You'll Learn

How to structure a Python project as it grows from a single script to a multi-module application, and which patterns to use at each stage.

Stage 1: The Single Script

Perfect for: personal automation, one-off tasks, fewer than 100 lines

task.py

#!/usr/bin/env python3
"""Process and report on daily sales data."""

import csv
import sys
from pathlib import Path


def load_sales(path: Path) -> list[dict]:
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def summarize(records: list[dict]) -> dict:
    total = sum(float(r["amount"]) for r in records)
    return {"count": len(records), "total": total}


def main() -> int:
    if len(sys.argv) != 2:
        print("Usage: task.py <sales.csv>", file=sys.stderr)
        return 1

    records = load_sales(Path(sys.argv[1]))
    summary = summarize(records)
    print(f"Records: {summary['count']}, Total: ${summary['total']:.2f}")
    return 0


if __name__ == "__main__":
    sys.exit(main())
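
Because loading and summarizing are separate functions, the summary logic can be exercised without any CSV file on disk. The sketch below repeats summarize() so it runs on its own; the sample rows are made up for illustration.

```python
# Copy of summarize() from task.py so this snippet is self-contained.
def summarize(records: list[dict]) -> dict:
    total = sum(float(r["amount"]) for r in records)
    return {"count": len(records), "total": total}


# Made-up rows shaped like csv.DictReader output (every value is a string).
sample = [{"amount": "10.50"}, {"amount": "4.50"}]

summary = summarize(sample)
print(summary)  # {'count': 2, 'total': 15.0}
```

The same check can later move into a test file unchanged, which is one reason to keep even a small script split into functions.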

Stage 2: Script + Helpers

Perfect for: scripts with reusable logic, 100–500 lines

project/
├── main.py
├── utils.py
├── config.py
└── requirements.txt

Move reusable functions into utils.py and configuration into config.py:

# config.py
from dataclasses import dataclass
from pathlib import Path

@dataclass(frozen=True)
class Config:
    input_dir: Path = Path("data/input")
    output_dir: Path = Path("data/output")
    timeout: int = 30

# utils.py
import logging

log = logging.getLogger(__name__)

def setup_logging(verbose: bool = False) -> None:
    level = logging.DEBUG if verbose else logging.INFO
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s"
    )

# main.py
import sys
from config import Config
from utils import setup_logging

def main() -> int:
    cfg = Config()
    setup_logging(verbose="--verbose" in sys.argv)
    ...
    return 0

if __name__ == "__main__":
    sys.exit(main())
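
One consequence of frozen=True is that a Config cannot be modified after creation. A minimal sketch of how a variant might be built anyway, using dataclasses.replace from the standard library; the override values are illustrative only.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from pathlib import Path


# Same shape as the Config in config.py, repeated so the snippet runs on its own.
@dataclass(frozen=True)
class Config:
    input_dir: Path = Path("data/input")
    output_dir: Path = Path("data/output")
    timeout: int = 30


cfg = Config()

# Direct assignment is rejected on a frozen dataclass.
try:
    cfg.timeout = 60
except FrozenInstanceError:
    print("Config is read-only")

# replace() returns a new Config with the selected fields changed.
test_cfg = replace(cfg, timeout=5, output_dir=Path("tmp/output"))
print(test_cfg.timeout, cfg.timeout)  # 5 30
```

The original object stays untouched, so a test or a one-off run can use its own settings without affecting the defaults.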

Stage 3: Package Structure

Perfect for: shared library, CLI tool, web service backend, 500+ lines

myapp/
├── pyproject.toml          ← project metadata and deps
├── README.md
├── .gitignore
├── src/
│   └── myapp/
│       ├── __init__.py
│       ├── __main__.py     ← python -m myapp entry
│       ├── cli.py          ← argument parsing
│       ├── core.py         ← business logic
│       ├── models.py       ← data classes / types
│       ├── storage.py      ← file / database I/O
│       └── utils.py        ← shared helpers
└── tests/
    ├── conftest.py
    ├── test_core.py
    └── test_storage.py

__main__.py enables python -m myapp:

# src/myapp/__main__.py
import sys

from .cli import main

sys.exit(main())

Separation of Concerns

Keep these responsibilities in separate files:

Layer	What it does	Example file
CLI / Entry	Parse arguments, call logic	`cli.py`
Core / Logic	Business rules, transformations	`core.py`
Models	Data classes, types	`models.py`
Storage / IO	Files, DB, APIs	`storage.py`
Utils	Shared helpers	`utils.py`
Config	Settings, env vars	`config.py`

# ❌ Everything in one function
def run(args):
    config = load_config()
    records = open_database(config.db_url)
    processed = [transform(r) for r in records]
    save_to_file(processed)
    send_email(processed)

# ✅ Each step is a separate, testable function
def load_config() -> Config: ...
def fetch_records(config: Config) -> list[Record]: ...
def transform_records(records: list[Record]) -> list[Result]: ...
def save_results(results: list[Result], path: Path) -> None: ...
def notify(results: list[Result], config: Config) -> None: ...
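
To make the split concrete, here is a small runnable sketch of the logic layer on its own. Record, Result, the tax_rate parameter, and format_report() are hypothetical, invented for this example.

```python
from dataclasses import dataclass


# Hypothetical types standing in for whatever models.py would define.
@dataclass(frozen=True)
class Record:
    name: str
    amount: float


@dataclass(frozen=True)
class Result:
    name: str
    amount_with_tax: float


def transform_records(records: list[Record], tax_rate: float = 0.25) -> list[Result]:
    # Pure logic: no files or network, so plain objects are enough to test it.
    return [Result(r.name, r.amount * (1 + tax_rate)) for r in records]


def format_report(results: list[Result]) -> str:
    # Formatting is kept apart from writing, so the output target can vary.
    return "\n".join(f"{r.name}: {r.amount_with_tax:.2f}" for r in results)


results = transform_records([Record("widget", 8.0), Record("gadget", 20.0)])
print(format_report(results))
# widget: 10.00
# gadget: 25.00
```

Neither function touches a database or a file, so the storage and notification steps can change without these two being rewritten.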

Configuration Patterns

From environment variables (for servers/containers)

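import os
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Config:
    # default_factory reads the environment each time Config() is created.
    # A plain default would be evaluated once, when this module is imported.
    db_url: str = field(default_factory=lambda: os.environ.get("DATABASE_URL", "sqlite:///local.db"))
    api_key: str = field(default_factory=lambda: os.environ.get("API_KEY", ""))
    debug: bool = field(default_factory=lambda: os.environ.get("DEBUG", "false").lower() == "true")
    workers: int = field(default_factory=lambda: int(os.environ.get("WORKERS", "4")))

    def validate(self) -> None:
        if not self.api_key:
            raise ValueError("API_KEY environment variable required")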
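
A variation on this pattern is to build the config in a classmethod that accepts any mapping, which makes it possible to test without touching the real environment. This is a sketch, not the only way to do it; AppConfig and from_env() are hypothetical names.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class AppConfig:  # hypothetical name, chosen to avoid clashing with Config
    db_url: str
    api_key: str
    debug: bool
    workers: int

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "AppConfig":
        # Values are looked up when from_env() is called, not at import time.
        return cls(
            db_url=env.get("DATABASE_URL", "sqlite:///local.db"),
            api_key=env.get("API_KEY", ""),
            debug=env.get("DEBUG", "false").lower() == "true",
            workers=int(env.get("WORKERS", "4")),
        )


# A plain dict stands in for the environment; the values are made up.
cfg = AppConfig.from_env({"API_KEY": "test-key", "WORKERS": "2", "DEBUG": "True"})
print(cfg.workers, cfg.debug)  # 2 True
```

In production code, AppConfig.from_env() with no argument reads os.environ as usual.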

From a .env file (with python-dotenv)

pip install python-dotenv

from dotenv import load_dotenv
load_dotenv()  # loads .env into os.environ
config = Config()

.env:

DATABASE_URL=postgresql://localhost/myapp
API_KEY=secret123
DEBUG=false

The Repository Pattern (for storage)

Decouple your logic from storage details:

# storage.py
from pathlib import Path
import json

class UserRepository:
    def __init__(self, path: Path):
        self.path = path

    def load_all(self) -> list[dict]:
        if not self.path.exists():
            return []
        return json.loads(self.path.read_text(encoding="utf-8"))

    def save(self, users: list[dict]) -> None:
        self.path.write_text(json.dumps(users, indent=2), encoding="utf-8")

    def find(self, user_id: int) -> dict | None:
        return next((u for u in self.load_all() if u["id"] == user_id), None)

Your logic calls only methods such as repo.find() and repo.save(); it doesn't know whether the data lives in a file, a database, or an API.

Common Anti-Patterns

Anti-Pattern	Problem	Fix
Giant `main()` function	Hard to test	Split into small functions
Hardcoded paths/secrets	Not portable	Use env vars and `pathlib`
Global state (globals)	Unpredictable	Pass data as arguments
Mixing IO and logic	Hard to test logic	Separate `load()`, `process()`, `save()`
No tests	Breaks silently	Write tests from the start
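
The global-state row is easy to overlook, so here is a small sketch of the difference. Both functions and the running_total variable are hypothetical.

```python
# Anti-pattern: a module-level variable that functions silently modify.
running_total = 0.0


def add_sale_global(amount: float) -> None:
    # The result depends on every earlier call, which makes tests order-sensitive.
    global running_total
    running_total += amount


# Fix: take the current value as an argument and return the new one.
def add_sale(total: float, amount: float) -> float:
    return total + amount


total = add_sale(add_sale(0.0, 10.0), 5.0)
print(total)  # 15.0
```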

Quick Reference

Stage 1: script.py
Stage 2: main.py + utils.py + config.py
Stage 3: src/myapp/ package with separated layers

Layers:
  cli.py    → argparse, entry point
  core.py   → business logic
  models.py → @dataclass types
  storage.py → IO operations
  config.py → settings from env vars

Config:
  @dataclass(frozen=True) class Config
  read from os.environ.get(...)
  load_dotenv() for .env files

Entry:
  if __name__ == "__main__": sys.exit(main())
  __main__.py for python -m myapp

What's Next

→ Module 5: Files, Config, and CLI
