meridian_tools.cv

Cross-validation and holdout orchestration utilities.

Module: meridian_tools.cv

Functions

build_last_window_holdout_mask

def build_last_window_holdout_mask(
    time_index: Sequence[Any],
    holdout_size: int,
    geo_index: Sequence[Any] | None = None,
) -> np.ndarray

Build a blocked-tail holdout mask for Meridian’s holdout_id.

Returns a 1-D boolean mask of shape (n_times,) for national data, or a 2-D mask of shape (n_geos, n_times) when geo_index is provided. The last holdout_size time periods are marked True (held out); all earlier periods are False.

Parameters:

  • time_index — Strictly increasing sequence of time period identifiers.
  • holdout_size — Number of tail periods to hold out. Must be positive and less than the length of time_index.
  • geo_index — Optional sequence of geo identifiers. If provided, the mask is broadcast across geos.

Returns: Boolean NumPy array.

Raises: ValueError if an index is not strictly increasing, if an index is too short to split, or if holdout_size is not positive or is not smaller than the length of time_index.
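
The masking rule described above can be sketched with plain NumPy. This is an illustrative sketch only, not the library's implementation: the helper name sketch_last_window_mask is hypothetical, and its checks mirror only what the parameter descriptions state.

```python
from typing import Any, Optional, Sequence

import numpy as np


def sketch_last_window_mask(
    time_index: Sequence[Any],
    holdout_size: int,
    geo_index: Optional[Sequence[Any]] = None,
) -> np.ndarray:
    """Hypothetical re-creation of the documented behaviour (not the real function)."""
    n_times = len(time_index)
    # holdout_size must be positive and strictly smaller than the time axis.
    if not 0 < holdout_size < n_times:
        raise ValueError("holdout_size must be between 1 and len(time_index) - 1")
    # Require a strictly increasing time axis.
    if any(a >= b for a, b in zip(time_index, time_index[1:])):
        raise ValueError("time_index must be strictly increasing")

    mask = np.zeros(n_times, dtype=bool)
    mask[-holdout_size:] = True  # True marks the held-out tail periods
    if geo_index is None:
        return mask
    # Repeat the same tail mask for every geo: shape (n_geos, n_times).
    return np.tile(mask, (len(geo_index), 1))


national = sketch_last_window_mask(
    ["2024-01-01", "2024-01-08", "2024-01-15", "2024-01-22"], 1
)
panel = sketch_last_window_mask([1, 2, 3, 4], 2, geo_index=["US-CA", "US-NY"])
print(national)     # [False False False  True]
print(panel.shape)  # (2, 4)
```

In the geo case every row carries the same tail mask, so each geo holds out the same periods.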


build_rolling_origin_splits

def build_rolling_origin_splits(
    time_index: Sequence[Any],
    *,
    initial_train_size: int,
    test_size: int,
    step_size: int | None = None,
    max_splits: int | None = None,
) -> list[BlockedTimeSplit]

Create expanding-window blocked time splits for rolling-origin validation.

Parameters:

  • time_index — Strictly increasing sequence of time period identifiers.
  • initial_train_size — Size of the first training window.
  • test_size — Size of each test window.
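  • step_size — Number of periods by which the training window grows between consecutive splits. Must equal test_size, so successive test windows are adjacent and do not overlap. Defaults to test_size when None.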
  • max_splits — Maximum number of splits to generate. Must be >= 2 if set.

Returns: List of BlockedTimeSplit instances (at least 2).

Raises: ValueError for invalid parameters or if fewer than 2 splits can be generated.
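
The expanding-window arithmetic can be illustrated as follows. This is a minimal sketch under stated assumptions, not the library's code: the function name sketch_rolling_origin_indices is hypothetical, it works on a plain length rather than a time_index, and how the real builder treats leftover tail periods is not documented here.

```python
from typing import List, Optional, Tuple


def sketch_rolling_origin_indices(
    n_times: int,
    *,
    initial_train_size: int,
    test_size: int,
    max_splits: Optional[int] = None,
) -> List[Tuple[range, range]]:
    """Hypothetical expanding-window index arithmetic (not the real function)."""
    if initial_train_size < 1 or test_size < 1:
        raise ValueError("initial_train_size and test_size must be positive")
    splits: List[Tuple[range, range]] = []
    train_end = initial_train_size
    # step_size equals test_size, so each test block starts where the last one ended.
    while train_end + test_size <= n_times:
        if max_splits is not None and len(splits) >= max_splits:
            break
        splits.append((range(0, train_end), range(train_end, train_end + test_size)))
        train_end += test_size  # the training window expands; it never slides
    if len(splits) < 2:
        raise ValueError("fewer than 2 splits can be generated")
    return splits


for train, test in sketch_rolling_origin_indices(12, initial_train_size=6, test_size=2):
    print(len(train), list(test))
# 6 [6, 7]
# 8 [8, 9]
# 10 [10, 11]
```

Each training window starts at index 0 and ends just before its test window, which is what makes the scheme expanding rather than sliding.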


build_validation_splits

def build_validation_splits(
    validation_config: ValidationConfig,
    time_index: Sequence[Any],
) -> list[BlockedTimeSplit]

Build deterministic split definitions from the typed validation config.

Dispatches to the appropriate split builder based on validation_config.strategy. Returns an empty list for strategy: none.

Parameters:

  • validation_config — A validated ValidationConfig instance.
  • time_index — Strictly increasing sequence of time period identifiers.

Returns: List of BlockedTimeSplit instances (empty for none).
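
The dispatch-on-strategy pattern can be sketched as below. Everything except the strategy names is an assumption made for illustration: SketchConfig stands in for ValidationConfig (whose other fields are not documented here), and BUILDERS and _tail are hypothetical helpers that return plain index tuples rather than BlockedTimeSplit instances.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Sequence


@dataclass(frozen=True)
class SketchConfig:
    """Hypothetical stand-in for ValidationConfig; only `strategy` comes from the docs."""
    strategy: str


def _tail(time_index: Sequence[Any]) -> List[tuple]:
    # Hypothetical builder: hold out only the final period.
    n = len(time_index)
    return [(tuple(range(n - 1)), (n - 1,))]


# Hypothetical registry mapping strategy names to split builders.
BUILDERS: Dict[str, Callable[[Sequence[Any]], List[tuple]]] = {"blocked_tail": _tail}


def sketch_build_splits(config: SketchConfig, time_index: Sequence[Any]) -> List[tuple]:
    if config.strategy == "none":
        return []  # no validation requested
    if config.strategy not in BUILDERS:
        raise ValueError(f"unknown strategy: {config.strategy!r}")
    return BUILDERS[config.strategy](time_index)


weeks = ["2024-01-01", "2024-01-08", "2024-01-15"]
print(sketch_build_splits(SketchConfig("none"), weeks))          # []
print(sketch_build_splits(SketchConfig("blocked_tail"), weeks))  # [((0, 1), (2,))]
```

Because strategy: none yields an empty list, callers can loop over the result without a special case.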


build_validation_plan

def build_validation_plan(
    validation_config: ValidationConfig,
    time_index: Sequence[Any],
    geo_index: Sequence[Any] | None = None,
) -> ValidationPlan

Materialise concrete validation and final-fit run specs from one config.

For strategy: none, returns a plan with no validation runs and no final-fit run. For blocked_tail or rolling_origin, returns one ValidationRunSpec per split plus a final_fit_run spec that trains on the full time axis with no holdout.

Parameters:

  • validation_config — A validated ValidationConfig instance.
  • time_index — Strictly increasing sequence of time period identifiers.
  • geo_index — Optional sequence of geo identifiers for geo-panel models.

Returns: A ValidationPlan instance.

Example:

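from meridian_tools.config import load_yaml_config
from meridian_tools.cv import build_validation_plan

# Load the project config; its validation section selects the split strategy.
config = load_yaml_config("project.yml")

# "..." is a placeholder: pass the full, strictly increasing list of periods.
plan = build_validation_plan(
    config.validation,
    time_index=["2024-01-01", "2024-01-08", "..."],
    geo_index=["US-CA", "US-NY"],
)

# One ValidationRunSpec per validation split.
for run_spec in plan.validation_runs:
    print(run_spec.split_label, len(run_spec.train_indices), len(run_spec.test_indices))

# final_fit_run is None for strategy: none.
if plan.final_fit_run is not None:
    print("Final fit:", plan.final_fit_run.split_label)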

Classes

BlockedTimeSplit

@dataclass(frozen=True)
class BlockedTimeSplit

One blocked time split for validation.

Attribute      Type             Description
label          str              Human-readable split label (e.g. "blocked_tail", "split_01").
train_indices  tuple[int, ...]  Integer indices into the time axis for training.
test_indices   tuple[int, ...]  Integer indices into the time axis for testing.
train_dates    tuple[str, ...]  Date values for training periods.
test_dates     tuple[str, ...]  Date values for test periods.
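
To show how the index and date attributes might relate, here is a small sketch built on a stand-in class. SketchSplit is hypothetical and only mirrors the documented attribute names; the assumption that the date tuples are the time_index values at the matching indices is not confirmed by this reference.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Tuple


@dataclass(frozen=True)
class SketchSplit:
    """Hypothetical mirror of the documented attributes (not the real class)."""
    label: str
    train_indices: Tuple[int, ...]
    test_indices: Tuple[int, ...]
    train_dates: Tuple[str, ...]
    test_dates: Tuple[str, ...]


time_index = ["2024-01-01", "2024-01-08", "2024-01-15", "2024-01-22"]
train_idx, test_idx = (0, 1, 2), (3,)
split = SketchSplit(
    label="blocked_tail",
    train_indices=train_idx,
    test_indices=test_idx,
    # Assumption: date tuples are the time_index values at the matching indices.
    train_dates=tuple(time_index[i] for i in train_idx),
    test_dates=tuple(time_index[i] for i in test_idx),
)

try:
    split.label = "renamed"  # frozen dataclasses reject attribute assignment
except FrozenInstanceError:
    print("split is immutable")
```

Tuples are used for every sequence field so that the frozen instance has no mutable contents.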

ValidationRunSpec

@dataclass(frozen=True)
class ValidationRunSpec

One concrete validation or final-fit run derived from a split plan. Passed to PipelineRunConfig.validation_spec to control a single pipeline execution.

Attribute          Type                        Description
mode               "validation" | "final_fit"  Run mode.
strategy           str                         Validation strategy.
split_label        str                         Human-readable split identifier.
holdout_source     str                         How the holdout mask was produced.
generated_holdout  bool                        Whether the holdout was auto-generated.
holdout_id         np.ndarray | None           Concrete holdout mask (immutable).
train_indices      tuple[int, ...]             Training time indices.
test_indices       tuple[int, ...]             Test time indices.
train_dates        tuple[str, ...]             Training date values.
test_dates         tuple[str, ...]             Test date values.
run_name_suffix    str                         Suffix for the run directory name.

Methods:

  • to_artifact_payload() — Returns the JSON-serialisable dictionary written to validation_spec.json.
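
Two properties mentioned above, an immutable holdout mask and a JSON-serialisable payload, can be illustrated with standard NumPy and json calls. This is a sketch under assumptions: the payload keys below are chosen for illustration and are not the documented schema of validation_spec.json, and the library may enforce immutability differently.

```python
import json

import numpy as np

mask = np.array([[False, False, True], [False, False, True]])
mask.setflags(write=False)  # one way to make a NumPy array read-only

try:
    mask[0, 0] = True
except ValueError:
    print("holdout mask is read-only")

# NumPy arrays are not JSON-serialisable directly; nested lists are.
# Assumption: these keys are illustrative, not the real payload schema.
payload = {
    "mode": "validation",
    "split_label": "blocked_tail",
    "holdout_id": mask.tolist(),
    "train_indices": [0, 1],
    "test_indices": [2],
}
text = json.dumps(payload, indent=2)
print(json.loads(text)["holdout_id"])
```

Converting with tolist() before dumping keeps the artifact readable by any JSON consumer, at the cost of losing the array dtype.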

ValidationPlan

@dataclass(frozen=True)
class ValidationPlan

Concrete validation runs and the separate final-fit run for one config.

Attribute        Type                           Description
validation_runs  tuple[ValidationRunSpec, ...]  One spec per validation split.
final_fit_run    ValidationRunSpec | None       Full-sample final-fit spec. None for strategy: none.