Project Architecture Patterns

What You'll Learn

How to structure a Python project as it grows from a single script to a multi-module application, and which patterns to use at each stage.

Stage 1: The Single Script

Perfect for: personal automation, one-off tasks, fewer than 100 lines

task.py
#!/usr/bin/env python3
"""Process and report on daily sales data."""

import csv
import sys
from pathlib import Path


def load_sales(path: Path) -> list[dict]:
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def summarize(records: list[dict]) -> dict:
    total = sum(float(r["amount"]) for r in records)
    return {"count": len(records), "total": total}


def main() -> int:
    if len(sys.argv) != 2:
        print("Usage: task.py <sales.csv>", file=sys.stderr)
        return 1

    records = load_sales(Path(sys.argv[1]))
    summary = summarize(records)
    print(f"Records: {summary['count']}, Total: ${summary['total']:.2f}")
    return 0


if __name__ == "__main__":
    sys.exit(main())

Stage 2: Script + Helpers

Perfect for: scripts with reusable logic, 100–500 lines

project/
├── main.py
├── utils.py
├── config.py
└── requirements.txt

Move reusable functions into utils.py and configuration into config.py:

# config.py
from dataclasses import dataclass
from pathlib import Path

@dataclass(frozen=True)
class Config:
    input_dir: Path = Path("data/input")
    output_dir: Path = Path("data/output")
    timeout: int = 30

# utils.py
import logging

log = logging.getLogger(__name__)

def setup_logging(verbose: bool = False) -> None:
    level = logging.DEBUG if verbose else logging.INFO
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s"
    )

# main.py
import sys
from config import Config
from utils import setup_logging

def main() -> int:
    cfg = Config()
    setup_logging(verbose="--verbose" in sys.argv)
    ...
    return 0

if __name__ == "__main__":
    sys.exit(main())
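
Because Config is frozen, a setting cannot be reassigned after the object is created. One way to apply a command-line override is dataclasses.replace, which returns a modified copy. The sketch below assumes a hypothetical --timeout flag and a config_from_args helper, neither of which appears in the script above:

```python
import argparse
from dataclasses import dataclass, replace
from pathlib import Path


@dataclass(frozen=True)
class Config:
    input_dir: Path = Path("data/input")
    output_dir: Path = Path("data/output")
    timeout: int = 30


def config_from_args(argv: list[str]) -> Config:
    """Build a Config, applying an optional --timeout override (hypothetical flag)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--timeout", type=int, default=None)
    args = parser.parse_args(argv)

    cfg = Config()
    if args.timeout is not None:
        # Frozen dataclasses reject attribute assignment, so build a modified copy.
        cfg = replace(cfg, timeout=args.timeout)
    return cfg
```

The defaults stay in one place, and every override produces a new immutable object rather than mutating shared state.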

Stage 3: Package Structure

Perfect for: shared library, CLI tool, web service backend, 500+ lines

myapp/
├── pyproject.toml        ← project metadata and deps
├── README.md
├── .gitignore
├── src/
│   └── myapp/
│       ├── __init__.py
│       ├── __main__.py   ← python -m myapp entry
│       ├── cli.py        ← argument parsing
│       ├── core.py       ← business logic
│       ├── models.py     ← data classes / types
│       ├── storage.py    ← file / database I/O
│       └── utils.py      ← shared helpers
└── tests/
    ├── conftest.py
    ├── test_core.py
    └── test_storage.py

__main__.py enables python -m myapp:

# src/myapp/__main__.py
from .cli import main
import sys

sys.exit(main())

Separation of Concerns

Keep these responsibilities in separate files:

Layer           What it does                     Example file
CLI / Entry     Parse arguments, call logic      cli.py
Core / Logic    Business rules, transformations  core.py
Models          Data classes, types              models.py
Storage / IO    Files, DB, APIs                  storage.py
Utils           Shared helpers                   utils.py
Config          Settings, env vars               config.py

# ❌ Everything in one function
def run(args):
    config = load_config()
    records = open_database(config.db_url)
    processed = [transform(r) for r in records]
    save_to_file(processed)
    send_email(processed)

# ✅ Each step is a separate, testable function
def load_config() -> Config: ...
def fetch_records(config: Config) -> list[Record]: ...
def transform_records(records: list[Record]) -> list[Result]: ...
def save_results(results: list[Result], path: Path) -> None: ...
def notify(results: list[Result], config: Config) -> None: ...
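
The payoff of this split is that the transformation step can be tested with plain in-memory data, with no database or mail server involved. The sketch below assumes hypothetical Record and Result shapes and a made-up tax rule, since the signatures above leave those types undefined:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:  # hypothetical input type
    name: str
    amount: float


@dataclass(frozen=True)
class Result:  # hypothetical output type
    name: str
    amount_with_tax: float


def transform_records(records: list[Record], tax_rate: float = 0.2) -> list[Result]:
    """Pure logic: no files, network, or globals, so it is easy to test."""
    return [Result(r.name, round(r.amount * (1 + tax_rate), 2)) for r in records]


def test_transform_records() -> None:
    # The test builds its own input and checks the output directly.
    results = transform_records([Record("widget", 10.0)], tax_rate=0.5)
    assert results == [Result("widget", 15.0)]
```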

Configuration Patterns

From environment variables (for servers/containers)

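import os
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Config:
    # default_factory defers each lookup until Config() is created, so values
    # set after import (for example by load_dotenv()) are picked up.
    db_url: str = field(default_factory=lambda: os.environ.get("DATABASE_URL", "sqlite:///local.db"))
    api_key: str = field(default_factory=lambda: os.environ.get("API_KEY", ""))
    debug: bool = field(default_factory=lambda: os.environ.get("DEBUG", "false").lower() == "true")
    workers: int = field(default_factory=lambda: int(os.environ.get("WORKERS", "4")))

    def validate(self) -> None:
        if not self.api_key:
            raise ValueError("API_KEY environment variable required")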
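
One way to keep environment-based config easy to test is to read from a mapping that defaults to os.environ and to validate once while building the object. This is a sketch of an alternative shape, not the page's own method; the load_config helper and its env parameter are assumptions for illustration:

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    db_url: str
    api_key: str
    debug: bool
    workers: int


def load_config(env: Mapping[str, str] = os.environ) -> Config:
    """Build a Config from env (hypothetical helper); raise if API_KEY is missing."""
    api_key = env.get("API_KEY", "")
    if not api_key:
        raise ValueError("API_KEY environment variable required")
    return Config(
        db_url=env.get("DATABASE_URL", "sqlite:///local.db"),
        api_key=api_key,
        debug=env.get("DEBUG", "false").lower() == "true",
        workers=int(env.get("WORKERS", "4")),
    )


# Tests can pass a plain dict instead of touching the real environment.
cfg = load_config({"API_KEY": "test-key", "DEBUG": "TRUE"})
```

Calling load_config() with no arguments at startup reads the real environment and fails fast if a required variable is absent.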

From a .env file (with python-dotenv)

pip install python-dotenv

from dotenv import load_dotenv

load_dotenv()  # loads .env into os.environ
config = Config()

.env:

DATABASE_URL=postgresql://localhost/myapp
API_KEY=secret123
DEBUG=false

The Repository Pattern (for storage)

Decouple your logic from storage details:

# storage.py
from pathlib import Path
import json

class UserRepository:
    def __init__(self, path: Path):
        self.path = path

    def load_all(self) -> list[dict]:
        if not self.path.exists():
            return []
        return json.loads(self.path.read_text(encoding="utf-8"))

    def save(self, users: list[dict]) -> None:
        self.path.write_text(json.dumps(users, indent=2), encoding="utf-8")

    def find(self, user_id: int) -> dict | None:
        return next((u for u in self.load_all() if u["id"] == user_id), None)

Your logic only calls repo.find() and repo.save(); it doesn't know whether the data lives in a file, a database, or an API.

Common Anti-Patterns

Anti-Pattern              Problem             Fix
Giant main() function     Hard to test        Split into small functions
Hardcoded paths/secrets   Not portable        Use env vars and pathlib
Global state (globals)    Unpredictable       Pass data as arguments
Mixing IO and logic       Hard to test logic  Separate load(), process(), save()
No tests                  Breaks silently     Write tests from the start

Quick Reference

Stage 1: script.py
Stage 2: main.py + utils.py + config.py
Stage 3: src/myapp/ package with separated layers

Layers:
cli.py → argparse, entry point
core.py → business logic
models.py → @dataclass types
storage.py → IO operations
config.py → settings from env vars

Config:
@dataclass(frozen=True) class Config
read from os.environ.get(...)
load_dotenv() for .env files

Entry:
if __name__ == "__main__": sys.exit(main())
__main__.py for python -m myapp

What's Next

Module 5: Files, Config, and CLI