
# Testing


## Running tests

```bash
# All tests
python -m pytest tests/ -v

# With coverage (enforces 80% minimum)
python -m pytest tests/ --cov=src --cov-report=term-missing --cov-fail-under=80

# Single test file
python -m pytest tests/test_transform_students.py -v

# Single test by name
python -m pytest tests/ -k "test_active_only_filter" -v
```

## Test layout

| File | What it covers |
| --- | --- |
| `test_transform_students.py` | `StudentTransformer` — active filter, grade mapping, email generation |
| `test_transform_staff.py` | `StaffTransformer` — roster join, role mapping, deduplication |
| `test_transform_family.py` | `FamilyTransformer` — contact extraction, missing email handling |
| `test_transform_classes.py` | `ClassTransformer` — homeroom generation, subject class joins |
| `test_transform_enrollments.py` | `EnrollmentTransformer` — student + teacher enrollment rows |
| `test_blended_classes.py` | `BlendedClassService` — same teacher/time/grade detection |
| `test_grade_mapping.py` | `grade_to_ceds()` — all grade code variants |
| `test_email_generation.py` | `generate_student_email()` — format string substitution |
| `test_class_generation.py` | `generate_class_name()`, `generate_class_id()` |
| `test_enrollment_status.py` | Active/inactive/pre-reg status filtering |
| `test_role_mapping.py` | `map_role()` — Y/N → teacher/administrator |
| `test_school_year.py` | `determine_school_year()` — data-derived and date-fallback |
| `test_source_config.py` | `normalize_source_config()` — dict/list/list-of-dict formats |
| `test_extractor.py` | `DataExtractor` — encoding fallback, delimiter detection |
| `test_loader.py` | `DataLoader` — transactional write, atomic commit, rollback on failure |
| `test_config.py` | YAML loading, Pydantic validation, inheritance, cycle detection |
| `test_registry.py` | Registry — correct transformer returned, `DefaultTransformer` fallback |
| `test_quality_report.py` | `DataQualityReport` — missing fields, duplicates, orphan detection |
| `test_pipeline_e2e.py` | Full pipeline — all 5 entities, dry-run, diff, quality flags |
| `test_pipeline_e2e_districts.py` | SD48 + SD74 full pipeline with district-specific file naming |
| `test_cli.py` | CLI flags — `--dry-run`, `--diff`, `--quality`, transactional write |
| `test_benchmarks.py` | Performance benchmarks — excluded from CI, run manually |

## Conventions

### No file I/O in unit tests

Unit tests create DataFrames directly; they do not read or write `.txt` / `.csv` files. Use `tmp_path` (the pytest built-in fixture) only in integration / E2E tests.

```python
# Good
df = pd.DataFrame({"student number": ["1001"], "grade": ["10"]})
result = StudentTransformer().transform(df, mapping, context)

# Avoid in unit tests
df = pd.read_csv("tests/fixtures/students.txt")
```

### Fixtures in conftest.py

Shared fixtures (base mappings, standard DataFrames, a default `TransformContext`) live in `tests/conftest.py`. Request them by name as test function parameters — no explicit imports are needed; pytest discovers them automatically.
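A minimal sketch of how such a shared fixture can be defined and consumed — the fixture name `students_df` and its columns are illustrative, not the project's actual fixtures:

```python
# tests/conftest.py — illustrative sketch; the fixture name is hypothetical.
import pandas as pd
import pytest

def make_students_df():
    # Minimal frame using the source column names seen elsewhere in these docs
    return pd.DataFrame({"student number": ["1001"], "grade": ["10"]})

@pytest.fixture
def students_df():
    return make_students_df()

# In any test file, request the fixture as a parameter — no import needed:
def test_has_expected_columns(students_df):
    assert list(students_df.columns) == ["student number", "grade"]
```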

### School year mocking

`BaseTransformer.determine_school_year()` falls back to `datetime.now()` when no school-year data is present in the source files. Mock it in tests that need a deterministic year (the `BaseTransformer` import path below is inferred from the patch target):

```python
from unittest.mock import patch

from src.etl.transformers.base import BaseTransformer

def test_school_year_fallback():
    with patch("src.etl.transformers.base.datetime") as mock_dt:
        mock_dt.now.return_value.year = 2025
        mock_dt.now.return_value.month = 9   # September → school year 2025
        year = BaseTransformer().determine_school_year({}, {})
    assert year == 2025
```

### TransformContext construction

```python
from src.etl.transformers.context import TransformContext

context = TransformContext(
    raw_data={"StudentSchedule.txt": schedule_df, "CourseInformation.txt": course_df},
    school_year=2025,
    academic_start="2025-08-25",
    academic_end="2026-07-25",
    students_output=None,   # set to a DataFrame if the entity under test needs it
)
```

### Testing config inheritance

```python
from pathlib import Path

from src.config.loader import load_config

def test_sd48_inherits_base(tmp_path):
    # Write minimal base config
    (tmp_path / "myedbc_mapping.yaml").write_text("...")
    (tmp_path / "sd48myedbc_mapping.yaml").write_text("_base: myedbc\n...")
    cfg = load_config("sd48myedbc", config_dir=tmp_path)
    assert cfg.sis == "MyEducationBC"
```

Always pass `config_dir=tmp_path` in config tests so they don't read from `config/mappings/`.


## Coverage configuration

`pyproject.toml` omits certain modules from coverage:

```toml
[tool.coverage.run]
omit = [
    "src/utils/logger.py",   # logging configuration only
    "src/ui/*",              # Streamlit UI — not unit-testable
    "src/ui/launcher.py",
]
```

## Benchmarks

`tests/test_benchmarks.py` uses pytest-benchmark with a 5,000-row synthetic dataset. These tests are excluded from CI via the `-m 'not benchmark'` marker expression:

```toml
# pyproject.toml
[tool.pytest.ini_options]
addopts = "-m 'not benchmark'"
```

Run manually when profiling:

```bash
python -m pytest tests/test_benchmarks.py -v --benchmark-only
```

## CI matrix

CI runs on Python 3.9, 3.11, and 3.13 on `ubuntu-latest` (see `.github/workflows/ci.yml`). All tests must pass on all three versions before a PR can be merged.

CI also runs the following quality gates on each push:

| Step | Command |
| --- | --- |
| Format check | `ruff format --check src/ tests/` |
| Type check | `mypy src/ --exclude 'src/ui'` (UI pages excluded) |
| Security scan | `bandit -r src/ -q` |
| Config validation | `make validate-config` (all 5 district YAML configs) |
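The Python version matrix corresponds to a workflow stanza roughly like the following. This is a sketch, not the repository's actual `ci.yml` — the job name and install step are assumptions:

```yaml
# .github/workflows/ci.yml — illustrative sketch only
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.9", "3.11", "3.13"]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - run: pip install -e ".[dev]"   # assumed install command
      - run: python -m pytest tests/ --cov=src --cov-fail-under=80
```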

## Testing district configs with non-standard filenames

When writing E2E tests for district configs that use non-standard filenames (e.g., SD40's CSV files with an `SD-40_` prefix), create fixture files in `tmp_path` using the exact filenames the district config expects. See `tests/test_pipeline_e2e_districts.py` for examples.