Concepts

Background material on architecture, design decisions, and Meridian integration boundaries.

Pages

  • Architecture — meridian-tools is a companion package designed for agency teams that use Google Meridian as their client-facing MMM (Marketing Mix Modelling) engine. It provides a stricter, more reproducible workflow around Meridian without forking the upstream library.
  • Design decisions — This document records the key design decisions in meridian-tools and the reasoning behind them. It is intended for maintainers and contributors who need to understand why things are built the way they are.
  • Meridian integration — This document describes how meridian-tools integrates with Google Meridian, the boundaries of that integration, and the risks associated with different coupling levels.

Subsections of Concepts

Architecture

meridian-tools is a companion package designed for agency teams that use Google Meridian as their client-facing MMM (Marketing Mix Modelling) engine. It provides a stricter, more reproducible workflow around Meridian without forking the upstream library.

Core philosophy

  1. No forking — meridian-tools strictly wraps Meridian. It does not modify Meridian’s internal code or model implementations.
  2. Reproducibility — All runs are driven by typed YAML configurations, ensuring that models can be perfectly reproduced.
  3. Structured workflow — The package enforces a staged execution pipeline (validation, model fit, assessment, decomposition, response curves, optimisation).
  4. Lifecycle management — Runs are treated as immutable artefacts with rich metadata, allowing for easy comparison, refreshing, and storage.

Module map

meridian_tools/
├── __init__.py          Lazy-loading package exports
├── artifacts.py         Manifest and JSON helpers
├── cli.py               CLI entry point (argparse)
├── config.py            Pydantic YAML models
├── cv.py                Validation split logic
├── demo.py              Bundled demo discovery
├── diagnostics.py       Diagnostics export
├── exports.py           Meridian analysis surface wrappers
├── launcher.py          Run execution wrapper
├── lifecycle.py         Post-run record management
├── log_likelihood.py    Log-likelihood reconstruction adapter
├── model_selection.py   ArviZ LOO/WAIC wrappers
├── terminal.py          CLI presentation and warning grouping
└── version.py           Static version

Layered import design

Meridian and TensorFlow are never imported at module level in the configuration, validation, or CLI layers. This means lightweight operations respond instantly:

| Operation | Imports loaded |
| --- | --- |
| meridian-tools --help | pydantic, yaml |
| load_yaml_config(path) | pydantic, yaml |
| build_validation_plan(...) | numpy |
| run_pipeline(...) | Everything (Meridian, TF, ArviZ, etc.) |

The __init__.py uses __getattr__-based lazy loading so that import meridian_tools does not trigger heavy dependency imports.
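As a rough illustration of the __getattr__-based pattern (module-level __getattr__, PEP 562), the sketch below builds a throwaway module whose exports are imported only on first access. The export table and the make_lazy_module helper are hypothetical, and standard-library modules stand in for the heavy dependencies; the real __init__.py may be organised differently.

```python
import importlib
import types

# Hypothetical export table: attribute name -> module that provides it.
LAZY_EXPORTS = {"dumps": "json", "sqrt": "math"}


def make_lazy_module(name: str, exports: dict[str, str]) -> types.ModuleType:
    """Build a module that imports each export on first attribute access."""
    module = types.ModuleType(name)

    def __getattr__(attr: str):
        # Python calls this hook only when normal module lookup fails.
        if attr not in exports:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        value = getattr(importlib.import_module(exports[attr]), attr)
        setattr(module, attr, value)  # cache so later lookups skip the hook
        return value

    module.__getattr__ = __getattr__
    return module


lazy = make_lazy_module("lazy_demo", LAZY_EXPORTS)
print(lazy.sqrt(9.0))  # math is resolved here, not when the module was built
```

In a real package the same hook would be written as a plain function named __getattr__ at the top level of __init__.py.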

Pipeline execution model

The runner executes stages sequentially. Each stage:

  1. Creates a StageRecord and appends it to the in-memory manifest.
  2. Calls the stage function, which returns a dict[str, Path] of artefacts.
  3. Normalises artefact paths to be relative to the run directory.
  4. Writes the updated manifest to disk.

This design means a crash mid-pipeline leaves a readable partial manifest on disk. Because the manifest is only written after a stage returns, the last entry in the on-disk stages array is the last successfully completed stage.
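The four steps can be sketched as a small loop. This is a minimal illustration rather than the package's runner: run_stages, the shape of the manifest dict, the "running" status value, and the metadata.json artefact are assumptions; only StageRecord, run_manifest.json, and the stage names come from this page.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Callable

StageFn = Callable[[Path], dict[str, Path]]


@dataclass
class StageRecord:
    name: str
    artifacts: dict[str, str] = field(default_factory=dict)


def run_stages(run_dir: Path, stages: list[tuple[str, StageFn]]) -> dict:
    manifest = {"status": "running", "stages": []}
    manifest_path = run_dir / "run_manifest.json"
    for name, stage_fn in stages:
        record = StageRecord(name)
        manifest["stages"].append(record)      # 1. record the stage in memory
        artifacts = stage_fn(run_dir)          # 2. run the stage
        record.artifacts = {                   # 3. store run-relative paths
            key: Path(path).relative_to(run_dir).as_posix()
            for key, path in artifacts.items()
        }
        # 4. persist the manifest after every completed stage
        manifest_path.write_text(json.dumps(manifest, default=asdict, indent=2))
    manifest["status"] = "completed"
    manifest_path.write_text(json.dumps(manifest, default=asdict, indent=2))
    return manifest


def write_metadata(run_dir: Path) -> dict[str, Path]:
    path = run_dir / "00_run_metadata" / "metadata.json"  # placeholder artefact
    path.parent.mkdir(parents=True)
    path.write_text("{}")
    return {"metadata": path}


def failing_fit(run_dir: Path) -> dict[str, Path]:
    raise RuntimeError("sampler crashed")  # simulate a crash during model fit


with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp)
    try:
        run_stages(run_dir, [("00_run_metadata", write_metadata),
                             ("20_model_fit", failing_fit)])
    except RuntimeError:
        pass
    on_disk = json.loads((run_dir / "run_manifest.json").read_text())

print([stage["name"] for stage in on_disk["stages"]])
```

After the simulated crash, the manifest on disk still lists only the stage that finished, which is the behaviour described above.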

┌─────────────────────┐
│  00_run_metadata    │  Archive source + resolved configs
├─────────────────────┤
│  10_validation      │  Write validation spec (if applicable)
├─────────────────────┤
│  20_model_fit       │  Build data → build model → sample posterior
├─────────────────────┤
│  30_model_assessment│  Diagnostics + model selection + summary
├─────────────────────┤
│  40_decomposition   │  Summary metrics (NetCDF + CSV)
├─────────────────────┤
│  60_response_curves │  Response curves (if configured)
├─────────────────────┤
│  70_optimisation    │  Budget optimisation (if configured)
└─────────────────────┘

The numbering gap at 50 reserves space for future stages without renumbering.

Configuration architecture

The separation between authored YAML and runtime-only config is strict:

  • MeridianToolsConfig — Pydantic model for the YAML file. Owns project metadata, data paths, model spec, fit settings, validation strategy, and export switches.
  • PipelineRunConfig — Frozen dataclass for runtime options. Owns output directory, run name, and concrete validation spec.

The runner writes two config copies to each run directory:

  • config.source.yaml — Verbatim copy of the input YAML.
  • config.resolved.yaml — After relative path resolution. Never includes runtime-only fields.
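A minimal sketch of this separation, using only the standard library: a plain dict stands in for the Pydantic model, RuntimeOptions stands in for PipelineRunConfig, and the data_path key and resolve_config helper are hypothetical names chosen for illustration.

```python
from dataclasses import FrozenInstanceError, dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class RuntimeOptions:
    """Stand-in for PipelineRunConfig: runtime-only and immutable."""
    output_dir: str
    run_name: str


def resolve_config(authored: dict, config_dir: PurePosixPath) -> dict:
    """Return a copy of the authored config with its data path made absolute."""
    resolved = dict(authored)
    resolved["data_path"] = str(config_dir / authored["data_path"])
    return resolved


authored = {"project": "demo", "data_path": "data/sales.csv"}  # hypothetical keys
resolved = resolve_config(authored, PurePosixPath("/projects/demo"))
runtime = RuntimeOptions(output_dir="/runs", run_name="baseline")

print(resolved["data_path"])
print("output_dir" in resolved)  # runtime-only fields never enter the resolved config
```

Keeping the runtime options in a separate frozen object means they cannot leak into either archived YAML file by accident.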

Artefact path normalisation

All artefact paths in manifests are stored relative to the run directory through normalize_artifact_paths. This makes run directories portable across machines. The lifecycle layer resolves them back to absolute paths at load time.
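The round trip can be illustrated with two small helpers. These are hypothetical stand-ins (to_relative, to_absolute), not the signature of normalize_artifact_paths, and the directory names are made up; PurePosixPath is used so the example behaves the same on any platform.

```python
from pathlib import PurePosixPath


def to_relative(artifacts: dict, run_dir: PurePosixPath) -> dict:
    """Store each artefact path relative to the run directory."""
    return {
        key: PurePosixPath(path).relative_to(run_dir).as_posix()
        for key, path in artifacts.items()
    }


def to_absolute(artifacts: dict, run_dir: PurePosixPath) -> dict:
    """Resolve stored relative paths against wherever the run now lives."""
    return {key: run_dir / rel for key, rel in artifacts.items()}


original = PurePosixPath("/home/alice/runs/run_01")
stored = to_relative({"summary": original / "40_decomposition" / "summary.csv"}, original)

moved = PurePosixPath("/mnt/archive/run_01")  # same run, copied to another machine
loaded = to_absolute(stored, moved)
print(stored["summary"], loaded["summary"])
```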

Meridian coupling boundaries

| Coupling level | Modules | Surface used |
| --- | --- | --- |
| Public API | runner.py, exports.py | Meridian, ModelSpec, CsvDataLoader, Analyzer, Summarizer, BudgetOptimizer |
| Semi-public | log_likelihood.py, exports.py | model_context, inference_data, input_data |
| Private | log_likelihood.py | _get_joint_dist_unpinned, _prepare_latents_for_reconstruction, _reconstruct_posteriors |

The private-API coupling is confined to log_likelihood.py and wrapped in comprehensive error handling. See Meridian integration for details.

Data flow

  1. Input — A typed YAML file defines the entire run scope.
  2. Initialisation — The runner resolves the config and creates a timestamped run directory.
  3. Execution — The pipeline steps through stages, maintaining a central state dictionary with the fitted model and intermediate results.
  4. Export — Each stage writes specific artefacts to disk within the run directory.
  5. Finalisation — The manifest is completed with status: "completed" and finished_at, locking the run state.
  6. Lifecycle — Downstream processes or analysts consume artefacts or use lifecycle tools to compare, refresh, or audit runs.

Design decisions

This document records the key design decisions in meridian-tools and the reasoning behind them. It is intended for maintainers and contributors who need to understand why things are built the way they are.

No IID cross-validation

Decision: meridian-tools does not implement random-shuffle or naive k-fold cross-validation.

Reasoning: MMM data is time series. Random IID splits break temporal structure, leading to data leakage where future observations inform training and past observations appear in the test set. This produces optimistic accuracy estimates that do not reflect real-world forecasting performance.

The package provides two time-respecting alternatives:

  • Blocked tail — reserves the most recent observations as a single test block.
  • Rolling origin — expanding-window forward-chaining that respects temporal ordering at every split.
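For intuition, a blocked-tail split over integer time indices might look like the sketch below. The function name and signature are illustrative and are not taken from cv.py.

```python
def blocked_tail_split(n_obs: int, test_size: int) -> tuple[list[int], list[int]]:
    """Reserve the most recent test_size observations as one test block."""
    if not 0 < test_size < n_obs:
        raise ValueError("test_size must be between 1 and n_obs - 1")
    cutoff = n_obs - test_size
    # Everything before the cutoff trains; everything from it onwards tests.
    return list(range(cutoff)), list(range(cutoff, n_obs))


train_idx, test_idx = blocked_tail_split(n_obs=10, test_size=3)
print(train_idx, test_idx)
```

Every training index precedes every test index, so no future observation can inform the fit.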

Non-overlapping rolling-origin test windows

Decision: step_size must equal test_size for rolling-origin splits.

Reasoning: Overlapping test windows would put the same observation in multiple test sets. This violates the independence assumption needed for comparing validation scores across splits and complicates the interpretation of aggregate metrics. With non-overlapping windows, no observation is scored more than once across the split plan.

Minimum two splits for rolling origin

Decision: build_rolling_origin_splits requires at least two splits.

Reasoning: A single rolling-origin split is functionally identical to a blocked-tail holdout and provides no comparative signal. If your data only supports one split, use blocked_tail instead — it communicates the intent more clearly.
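The two rolling-origin rules (step_size equal to test_size, and at least two splits) can be seen in a small sketch. It is not the implementation of build_rolling_origin_splits: the name, the signature, and the choice to end the last test window at the final observation are assumptions.

```python
def rolling_origin_splits(n_obs: int, test_size: int, n_splits: int):
    """Expanding-window splits whose test blocks tile the tail of the series."""
    if n_splits < 2:
        raise ValueError("rolling origin needs at least two splits; use a blocked tail")
    if test_size < 1:
        raise ValueError("test_size must be positive")
    first_test_start = n_obs - n_splits * test_size
    if first_test_start < 1:
        raise ValueError("not enough observations for the requested splits")
    splits = []
    for i in range(n_splits):
        # Advancing by test_size (step_size == test_size) keeps test windows disjoint.
        start = first_test_start + i * test_size
        splits.append((list(range(start)), list(range(start, start + test_size))))
    return splits


splits = rolling_origin_splits(n_obs=10, test_size=2, n_splits=3)
for train_idx, test_idx in splits:
    print(len(train_idx), test_idx)
```

Each training window grows by exactly one test block, and the test blocks never share an index.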

Holdout restriction for model selection

Decision: LOO and WAIC are only available for models where holdout_id is None.

Reasoning: LOO and WAIC estimate expected log predictive density (ELPD) using the full observed likelihood surface. A model fitted with a holdout mask has a modified likelihood that excludes held-out observations. Computing LOO on this truncated likelihood would produce ELPD estimates that are not comparable to those from full-sample fits.

The correct workflow is:

  1. Use validation splits for candidate evaluation.
  2. Select the best specification based on holdout performance.
  3. Refit the chosen specification on the full dataset.
  4. Compute LOO/WAIC on the full-sample fit for model quality reporting.

Separation of validation fits and final fits

Decision: Validation runs and final production fits are separate pipeline executions that produce separate run directories.

Reasoning: A validation fit is trained on a subset of the data. Its posterior reflects that subset and should not be used as the production artefact. Keeping them as separate runs prevents accidental use of a validation fit for downstream analysis or reporting.

Lazy imports for CLI responsiveness

Decision: Heavy dependencies (TensorFlow, NumPy, Meridian, ArviZ) are not imported at module level in the config, CLI, or validation layers.

Reasoning: TensorFlow alone takes several seconds to import. The CLI must respond instantly for --help and --list operations. The __init__.py uses __getattr__-based lazy loading, and the test suite verifies that build_parser() only loads pydantic and yaml.
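One way to check this kind of claim is to run a statement in a fresh interpreter and inspect sys.modules afterwards. The helper below is a hypothetical sketch, not the project's test; json and sqlite3 from the standard library stand in for a light and a heavy dependency.

```python
import subprocess
import sys


def modules_loaded_by(statement: str, candidates: tuple[str, ...]) -> set[str]:
    """Run statement in a fresh interpreter and report which candidates were imported."""
    probe = (
        "import sys\n"
        f"{statement}\n"
        f"print(','.join(m for m in {candidates!r} if m in sys.modules))"
    )
    result = subprocess.run(
        [sys.executable, "-c", probe], capture_output=True, text=True, check=True
    )
    return {name for name in result.stdout.strip().split(",") if name}


# sqlite3 plays the role of a heavy dependency that must stay unloaded.
loaded = modules_loaded_by("import json", ("json", "sqlite3"))
print(loaded)
```

A subprocess is used because modules already imported by the test session would otherwise hide a regression.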

Pydantic extra="forbid" everywhere

Decision: All configuration models reject unexpected keys.

Reasoning: Silent acceptance of unknown keys is a common source of misconfiguration in YAML-driven tools. A typo like export_pridictive_accuracy would be silently ignored without extra="forbid", leading to unexpected default behaviour. Strict rejection catches these errors at config load time with clear error messages.
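A minimal sketch of the behaviour, assuming Pydantic v2 (ConfigDict). The ExportConfig model is hypothetical; the real configuration models and their fields may differ.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class ExportConfig(BaseModel):
    """Hypothetical export switches; not the package's actual model."""
    model_config = ConfigDict(extra="forbid")

    export_predictive_accuracy: bool = False


rejected = False
try:
    ExportConfig(export_pridictive_accuracy=True)  # misspelled key
except ValidationError as exc:
    rejected = True
    bad_key = exc.errors()[0]["loc"]  # location of the offending key
    print(bad_key)
```

Without extra="forbid", the same call would succeed and export_predictive_accuracy would silently keep its default.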

Relative artefact paths in manifests

Decision: All artefact paths in run_manifest.json are stored relative to the run directory.

Reasoning: Absolute paths would tie run directories to a specific machine or filesystem layout. Relative paths make run directories portable — they can be copied, archived, or moved between machines without breaking the manifest contract.

Non-destructive lifecycle operations

Decision: refresh_run creates a new sibling directory rather than overwriting the source.

Reasoning: Overwriting a validated production run would destroy the audit trail. Creating a sibling preserves the original for comparison and rollback. The lifecycle layer explicitly validates that source directories are not mutated by refresh operations.
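The sibling-directory idea can be sketched as follows. This is not refresh_run itself: create_refresh_dir, the "_refresh" suffix, and the choice to copy only config.source.yaml are assumptions made for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def create_refresh_dir(source_dir: Path, suffix: str = "refresh") -> Path:
    """Create a new sibling of source_dir and copy the archived config into it."""
    sibling = source_dir.with_name(f"{source_dir.name}_{suffix}")
    sibling.mkdir()  # raises FileExistsError instead of overwriting anything
    shutil.copy2(source_dir / "config.source.yaml", sibling / "config.source.yaml")
    return sibling


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "run_01"
    source.mkdir()
    (source / "config.source.yaml").write_text("project: demo\n")
    before = sorted(p.name for p in source.iterdir())

    sibling = create_refresh_dir(source)

    same_parent = sibling.parent == source.parent
    source_untouched = sorted(p.name for p in source.iterdir()) == before
    copied = (sibling / "config.source.yaml").read_text()

print(sibling.name, same_parent, source_untouched)
```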

Manifest-per-stage persistence

Decision: The manifest is written to disk after each stage completes, not only at the end of the pipeline.

Reasoning: MCMC sampling can run for minutes to hours. If the process crashes or is killed during a later stage, the partial manifest on disk reflects what completed successfully. This aids debugging and allows partial runs to be inspected without special tooling.

Stage numbering with gaps

Decision: Pipeline stages use numbers 00, 10, 20, 30, 40, 60, 70 with a gap at 50.

Reasoning: The gaps allow future stages to be inserted at natural positions (e.g. a stage 50 for custom analysis) without renumbering existing stages. Renumbering would break backward compatibility with stored manifests and any downstream tooling that references stage names.

Config source vs. resolved archival

Decision: Both the verbatim source YAML and the resolved YAML are archived in every run directory.

Reasoning: The source YAML shows what the analyst authored (including relative paths). The resolved YAML shows the runtime interpretation (absolute paths, defaults applied). Both are needed for reproducibility:

  • The source is needed to understand intent.
  • The resolved config is needed to reproduce the exact execution.

Runtime-only fields (output_dir, run_name, validation_spec) are deliberately excluded from the resolved config because they are not part of the reproducible model specification.

Structured model selection errors

Decision: Model selection failures produce ModelSelectionError with a machine-readable reason_code rather than generic exceptions.

Reasoning: The pipeline needs to distinguish between “model selection is not possible for this run type” (expected) and “something is broken” (unexpected). Structured reason codes allow:

  • The runner to write model_selection_status.json without failing the run.
  • The lifecycle layer to compare model selection availability across runs.
  • Downstream consumers to programmatically handle different failure modes.
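A sketch of how a structured error can be turned into a status file without failing the run. The ModelSelectionError constructor, the run_model_selection helper, and the keys in the status dict are assumptions; only the class name, reason_code, model_selection_status.json, and the meridian_internal_seam_incompatible code appear in these docs.

```python
import json
import tempfile
from pathlib import Path


class ModelSelectionError(Exception):
    """Expected failure of model selection, tagged with a machine-readable code."""

    def __init__(self, message: str, reason_code: str):
        super().__init__(message)
        self.reason_code = reason_code


def run_model_selection(compute, out_dir: Path) -> dict:
    """Run compute; on ModelSelectionError, record the reason instead of raising."""
    try:
        compute()
        status = {"available": True}
    except ModelSelectionError as exc:
        status = {"available": False, "reason_code": exc.reason_code, "message": str(exc)}
    (out_dir / "model_selection_status.json").write_text(json.dumps(status, indent=2))
    return status


def incompatible():
    raise ModelSelectionError(
        "private sampler methods are missing",
        reason_code="meridian_internal_seam_incompatible",
    )


with tempfile.TemporaryDirectory() as tmp:
    status = run_model_selection(incompatible, Path(tmp))
    saved = json.loads((Path(tmp) / "model_selection_status.json").read_text())

print(status["reason_code"])
```

Any other exception type still propagates, so genuinely unexpected failures are not masked.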

Meridian integration

This document describes how meridian-tools integrates with Google Meridian, the boundaries of that integration, and the risks associated with different coupling levels.

Integration philosophy

meridian-tools wraps Meridian without forking it. Meridian remains the modelling engine; meridian-tools adds workflow orchestration, validation, diagnostics bundling, model selection, and lifecycle management on top.

This approach means:

  • Meridian upgrades can be adopted without merging fork changes.
  • The upstream project’s API stability directly affects meridian-tools.
  • Any use of Meridian-internal APIs must be explicitly managed.

Coupling levels

Public API (low risk)

These are documented, versioned Meridian surfaces:

| Surface | Used by |
| --- | --- |
| Meridian (model class) | runner.py |
| ModelSpec | runner.py |
| CsvDataLoader, CoordToColumns | runner.py |
| Analyzer | exports.py, diagnostics.py |
| Summarizer | exports.py |
| BudgetOptimizer | exports.py |
| ModelReviewer | diagnostics.py |
| MediaEffects, MediaSummary, ModelDiagnostics, ModelFit | exports.py |
| save_meridian (schema serde) | exports.py |

These are unlikely to break without a Meridian major version bump. The exact google-meridian==1.5.3 pin keeps these assumptions aligned with the validated release baseline.

Semi-public API (medium risk)

These are accessible attributes on Meridian model objects that are used but not formally documented as stable:

| Surface | Used by | Purpose |
| --- | --- | --- |
| model.inference_data | log_likelihood.py, model_selection.py | Access ArviZ InferenceData |
| model.model_context | log_likelihood.py, exports.py | Access model structure |
| model.input_data | exports.py | Access input data for spend computation |
| model.posterior_sampler_callable | log_likelihood.py | Access posterior sampler |

These are stable in practice (they are used by Meridian’s own analysis surfaces) but are not guaranteed to be stable across versions.

Private API (high risk)

These are _-prefixed methods on Meridian’s posterior_sampler_callable, used exclusively in log_likelihood.py for log-likelihood reconstruction:

_get_joint_dist_unpinned
_prepare_latents_for_reconstruction
_reconstruct_posteriors

These methods are Meridian-internal and may change or be removed in any Meridian release, including patch versions. They are necessary because Meridian does not provide a public API for pointwise log-likelihood computation.

Risk mitigation

Compatibility guard

log_likelihood.py checks for the presence of all three private methods before attempting reconstruction:

# Private Meridian sampler methods that log-likelihood reconstruction depends on.
required_sampler_methods = (
    "_get_joint_dist_unpinned",
    "_prepare_latents_for_reconstruction",
    "_reconstruct_posteriors",
)
# Fail with a structured reason code if any of them is missing.
if any(not hasattr(posterior_sampler, method) for method in required_sampler_methods):
    raise ModelSelectionError(
        "...",
        reason_code="meridian_internal_seam_incompatible",
    )

If any method is missing, the error is caught and recorded as a model_selection_status.json artefact with reason_code: meridian_internal_seam_incompatible. The rest of the pipeline continues normally.

Graceful degradation

Model selection incompatibility is non-fatal at every level:

  1. log_likelihood.py raises ModelSelectionError with a structured code.
  2. model_selection.py propagates the error.
  3. runner.py catches it, writes model_selection_status.json, and continues.
  4. The manifest records the assessment stage as completed.
  5. The lifecycle layer can inspect model_selection_status to understand why model selection was unavailable.

Version pinning

The pyproject.toml pins Meridian to google-meridian[schema]==1.5.3. Any Meridian upgrade must refresh the private log-likelihood reconstruction baseline before the version guard is relaxed.

Integration testing

The test suite includes a gated live Meridian verification command:

MERIDIAN_TOOLS_ENABLE_REAL_FIT=1 pytest tests/test_demo_integration.py::test_real_pipeline_refresh_smoke tests/test_log_likelihood.py::test_compute_log_likelihood_dataset_real_posterior_smoke -m real_fit -v

This command proves two different real seams:

  • one reduced real pipeline run over bundled demo data, including stored-run refresh after the original YAML is removed
  • the lower-level live log-likelihood reconstruction path

It is excluded from the default test suite because it requires real MCMC sampling, but it should be run after every Meridian version upgrade.

Constants dependency

log_likelihood.py uses Meridian constants for posterior parameter names:

from meridian import constants
# constants.BETA_GM, constants.TAU_G, constants.ETA_M, etc.

These are stable string constants but are not versioned. A Meridian release that renames these constants would cause import-time failures.

Unsaved posterior parameter recovery

Meridian does not persist all posterior parameters to InferenceData. The _recover_unsaved_state function in log_likelihood.py reconstructs:

  • tau_g_excl_baseline — Recovered from the posterior’s tau_g variable by slicing out the baseline geo index (concatenating the elements before and after baseline_geo_idx).
  • Geo deviations — Recovered from the posterior by solving deviation = (target - base) / scale for normal effects, or deviation = (log(target) - base) / scale for log-normal effects, with a scale == 0 guard that maps to zero.

This recovery is mathematically correct for the supported model families (log-normal and normal media effects). It is tested against both geo-panel and national models in test_log_likelihood.py.
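The two recovery rules can be written out on plain Python values. This is a simplified scalar sketch, not _recover_unsaved_state: the helper names are hypothetical and the real code operates on posterior arrays.

```python
import math


def drop_baseline(tau_g: list[float], baseline_geo_idx: int) -> list[float]:
    """Concatenate the elements before and after the baseline geo."""
    return tau_g[:baseline_geo_idx] + tau_g[baseline_geo_idx + 1:]


def recover_deviation(target: float, base: float, scale: float,
                      log_normal: bool = False) -> float:
    """Invert target = base + scale * deviation (on the log scale if log-normal)."""
    if scale == 0:
        return 0.0  # guard: a zero scale maps to a zero deviation
    value = math.log(target) if log_normal else target
    return (value - base) / scale


print(drop_baseline([0.1, 0.2, 0.3], baseline_geo_idx=1))
print(recover_deviation(5.0, base=3.0, scale=2.0))
print(recover_deviation(1.0, base=-1.0, scale=2.0, log_normal=True))
```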

What breaks on a Meridian upgrade

| Change type | Impact | Detection |
| --- | --- | --- |
| Public API signature change | runner.py, exports.py break | Default test suite |
| Semi-public attribute rename | log_likelihood.py, exports.py break | Default test suite |
| Private method removal/rename | Model selection disabled | Live smoke test or model_selection_status.json |
| Constant rename | Import-time failure | Default test suite |
| New posterior parameter | Log-likelihood may be incorrect | Manual review + live smoke test |
| Changed likelihood formula | Log-likelihood may be incorrect | Live smoke test |

When upgrading Meridian, work through these steps:

  1. Pin the new Meridian version in a branch.
  2. Run the full default test suite: pytest tests/ -v.
  3. Run the live Meridian verification command: MERIDIAN_TOOLS_ENABLE_REAL_FIT=1 pytest tests/test_demo_integration.py::test_real_pipeline_refresh_smoke tests/test_log_likelihood.py::test_compute_log_likelihood_dataset_real_posterior_smoke -m real_fit -v.
  4. If model selection breaks, check model_selection_status.json for the reason code.
  5. If private methods changed, update log_likelihood.py to match the new Meridian internals or accept graceful degradation.
  6. Update docs/project/release-baseline.md with the new verified state.