# Reproducibility Infrastructure
This document describes the reproducibility infrastructure for TNFR benchmarks.
## Overview
The TNFR project now includes deterministic pipeline execution and artifact traceability through:

- Global seed management for benchmarks (see the sketch below)
- SHA256 checksum generation for all outputs
- Manifest-based verification
- CI integration for reproducibility testing
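As a rough illustration of the seed-management idea, a global seed is typically propagated to every source of randomness before a benchmark runs. The helper below is a minimal sketch, not the actual implementation in `scripts/run_reproducible_benchmarks.py`:

```python
import os
import random

def set_global_seed(seed: int = 42) -> None:
    """Seed the common sources of randomness (illustrative helper)."""
    random.seed(seed)
    # Only affects hash randomization in subprocesses spawned after this point.
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy is an optional dependency
```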
## Quick Start

### Run all benchmarks with deterministic seeds

```bash
make reproduce
```
This will:

1. Run all configured benchmarks with `seed=42` (the default)
2. Generate output artifacts in the `artifacts/` directory
3. Create `artifacts/manifest.json` with checksums
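The checksums recorded in the manifest are standard SHA256 digests of each output file. A minimal sketch of how such a digest can be computed (the script's actual helper may differ):

```python
import hashlib
from pathlib import Path

def sha256_checksum(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```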
### Verify checksums

```bash
make reproduce-verify
```
This verifies that all artifacts match the checksums in the manifest.
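Conceptually, verification recomputes each artifact's digest and compares it against the recorded value. The following is a hypothetical sketch, assuming the manifest format shown below and that checksummed files live alongside the manifest:

```python
import hashlib
import json
from pathlib import Path

def verify_manifest(manifest_path: Path) -> bool:
    """Recompute SHA256 digests and compare them to the manifest (illustrative)."""
    manifest = json.loads(manifest_path.read_text())
    artifacts_dir = manifest_path.parent  # assumption: artifacts sit next to the manifest
    ok = True
    for name, entry in manifest["benchmarks"].items():
        for filename, expected in entry.get("checksums", {}).items():
            actual = hashlib.sha256((artifacts_dir / filename).read_bytes()).hexdigest()
            if actual != expected:
                print(f"MISMATCH {name}/{filename}")
                ok = False
    return ok
```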
## Advanced Usage

### Run specific benchmarks

```bash
python scripts/run_reproducible_benchmarks.py \
  --benchmarks comprehensive_cache_profiler full_pipeline_profile \
  --seed 123 \
  --output-dir my_artifacts
```
### Custom verification

```bash
python scripts/run_reproducible_benchmarks.py \
  --verify my_artifacts/manifest.json \
  --verbose
```
## Configured Benchmarks
The following benchmarks are configured for reproducible execution:
- `comprehensive_cache_profiler` - Tracks buffer allocation effectiveness across TNFR hot paths
- `full_pipeline_profile` - Full telemetry + ΔNFR pipeline profiling
- `cache_hot_path_profiler` - Cache metrics for hot execution paths
- `compute_si_profile` - Sense Index profiling (vectorized vs. fallback)
## Manifest Format

The manifest file (`manifest.json`) contains:
```json
{
  "seed": 42,
  "benchmarks": {
    "benchmark_name": {
      "status": "success",
      "output_files": ["path/to/output.json"],
      "checksums": {
        "output.json": "sha256_checksum_here"
      }
    }
  }
}
```
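Because the manifest is plain JSON, it is easy to inspect programmatically. For example, to print the seed and per-benchmark status:

```python
import json
from pathlib import Path

manifest = json.loads(Path("artifacts/manifest.json").read_text())
print(f"seed: {manifest['seed']}")
for name, entry in manifest["benchmarks"].items():
    print(f"{name}: {entry['status']} ({len(entry['output_files'])} artifact(s))")
```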
## CI Integration

The reproducibility CI workflow (`.github/workflows/reproducibility.yml`) runs on:

- Pushes to `main`/`master`
- Pull requests
- Manual triggers
It verifies that benchmarks:

1. Complete successfully
2. Generate valid output artifacts
3. Produce a consistent manifest structure (see the sketch below)
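The third check amounts to asserting that the manifest has the documented shape. A hypothetical sketch of such a structural check (the workflow's actual assertions may differ):

```python
def validate_manifest_structure(manifest: dict) -> None:
    """Assert the manifest matches the documented format (illustrative check)."""
    assert isinstance(manifest.get("seed"), int), "manifest must record an integer seed"
    benchmarks = manifest.get("benchmarks")
    assert isinstance(benchmarks, dict) and benchmarks, "expected at least one benchmark entry"
    for name, entry in benchmarks.items():
        assert "status" in entry, f"{name}: missing status"
        assert isinstance(entry.get("output_files"), list), f"{name}: output_files must be a list"
        assert isinstance(entry.get("checksums"), dict), f"{name}: checksums must be a mapping"
```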
## TNFR Compliance
This infrastructure follows TNFR principles:
- **Controlled Determinism** (Invariant #8): Seeds ensure reproducible execution
- **Structural Traceability**: Checksums provide artifact verification
- **Operational Fractality**: No changes to core TNFR operators
- **Trans-scale Neutrality**: Infrastructure works across all benchmark scales
## Troubleshooting

### Benchmark fails to run

Check that all dependencies are installed:

```bash
pip install .[test,numpy,yaml,orjson]
```
### Checksum mismatch

Some benchmarks may include timing information that varies between runs. This is expected behavior. The important part is:

1. Benchmarks run successfully with consistent seeds
2. Structural outputs are deterministic (see the sketch below)
3. Manifest structure is valid
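If you need to compare two runs while tolerating volatile timing data, one option is to hash only the structural content after dropping timing fields. The key names below are placeholders, not a documented schema:

```python
import hashlib
import json

VOLATILE_KEYS = {"elapsed_seconds", "timestamp", "wall_time"}  # placeholder names

def structural_digest(payload: dict) -> str:
    """Hash a benchmark output after dropping volatile top-level fields."""
    stripped = {k: v for k, v in payload.items() if k not in VOLATILE_KEYS}
    canonical = json.dumps(stripped, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()
```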
### Missing benchmark script

Ensure you're running from the repository root and that the benchmark script exists in `benchmarks/`.
## Adding New Benchmarks
To add a new benchmark to the reproducibility suite:
1. Ensure the benchmark script accepts a `--seed` parameter
2. Add a configuration entry to `BENCHMARK_CONFIGS` in `scripts/run_reproducible_benchmarks.py` (sketched after this list)
3. Test with `python scripts/run_reproducible_benchmarks.py --benchmarks your_benchmark`
4. Update this documentation
## References

- `scripts/README.md` - Script documentation
- `benchmarks/README.md` - Benchmark usage guide
- `AGENTS.md` - TNFR paradigm compliance