flexmeasures.data.models.forecasting.pipelines.base

Classes

class flexmeasures.data.models.forecasting.pipelines.base.BasePipeline(target_sensor: Sensor, future_regressors: list[Sensor], past_regressors: list[Sensor], n_steps_to_predict: int, max_forecast_horizon: int, forecast_frequency: int, event_starts_after: datetime | None = None, event_ends_before: datetime | None = None, save_belief_time: datetime | None = None, predict_start: datetime | None = None, predict_end: datetime | None = None, missing_threshold: float = 1.0)

Base class for Train and Predict pipelines.

This class handles loading and preprocessing time series data for training or prediction, including missing value handling and splitting into regressors (X) and target (y).

## Covariate semantics

Data for past_covariates and future_covariates is loaded broadly enough to cover:

- from the beginning of the training/predict period,
- through the end of the predict period,
- plus max_forecast_horizon (needed for the last forecast step).
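As a rough illustration of the loading window described above, the sketch below computes its bounds. The helper name `covariate_load_window` and the `resolution` argument are hypothetical and not part of BasePipeline; the sketch only assumes that max_forecast_horizon is counted in steps of the target resolution, as stated in the parameter list.

```python
from datetime import datetime, timedelta


def covariate_load_window(
    period_start: datetime,
    predict_end: datetime,
    max_forecast_horizon: int,
    resolution: timedelta,
) -> tuple[datetime, datetime]:
    """Return (start, end) of the window for which covariates are loaded.

    Hypothetical helper: the window runs from the start of the
    training/predict period through the end of the predict period,
    extended by max_forecast_horizon steps of the target resolution.
    """
    return period_start, predict_end + max_forecast_horizon * resolution


start, end = covariate_load_window(
    period_start=datetime(2025, 1, 1),
    predict_end=datetime(2025, 1, 8),
    max_forecast_horizon=48,  # 48 hourly steps
    resolution=timedelta(hours=1),
)
```

With an hourly resolution and a horizon of 48 steps, the window ends two days after the predict period does.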

Later, split_data_all_beliefs and _generate_splits slice this superset into per-horizon inputs:

Past covariates: realized (historical) data aligned up to each split’s target_end (just before the predicted step), selecting the most recent belief per event_start.
Future covariates: realized data up to target_end plus forecasts up to target_end + max_forecast_horizon, selecting the most recent belief per event_start.
Target series: realized target values from target_start through target_end (the conditioning context for forecasting).
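The "most recent belief per event_start" selection can be sketched in plain pandas. This is an illustration under assumptions, not the code in _generate_splits: the `value` column, the helper name `latest_belief_per_event`, and the explicit `belief_cutoff` argument are all hypothetical, while the event_start and belief_time column names follow the DataFrame layout described for split_data_all_beliefs.

```python
import pandas as pd


def latest_belief_per_event(df: pd.DataFrame, belief_cutoff: pd.Timestamp) -> pd.DataFrame:
    """Keep, for each event_start, the newest belief known at belief_cutoff."""
    # Discard beliefs that were formed after the simulated "now".
    known = df[df["belief_time"] <= belief_cutoff]
    # After sorting by belief_time, the last row per event_start is the most recent belief.
    latest = known.sort_values("belief_time").drop_duplicates("event_start", keep="last")
    return latest.sort_values("event_start").reset_index(drop=True)


beliefs = pd.DataFrame(
    {
        "event_start": pd.to_datetime(
            ["2025-01-01 00:00", "2025-01-01 00:00", "2025-01-01 01:00", "2025-01-01 01:00"]
        ),
        "belief_time": pd.to_datetime(
            ["2024-12-31 12:00", "2025-01-01 00:15", "2024-12-31 12:00", "2025-01-01 01:15"]
        ),
        "value": [10.0, 11.0, 20.0, 21.0],
    }
)
result = latest_belief_per_event(beliefs, pd.Timestamp("2025-01-01 00:30"))
```

In this example the 00:00 event resolves to its realized value (11.0), while the 01:00 event still resolves to its forecast (20.0), because its later belief was not yet available at the cutoff.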

Parameters

past_regressors (list[str] | None): Sensor names used only as historical (past) covariates.
future_regressors (list[str]): Sensor names used as future covariates (with forecast data).
target (str): Name of the target sensor (key in sensors).
n_steps_to_predict (int): Number of forecast iterations (steps at target resolution).
max_forecast_horizon (int): Maximum look-ahead horizon, in steps of the target resolution.
event_starts_after / event_ends_before (datetime | None): Time boundaries for loading sensor events.

__init__(target_sensor: Sensor, future_regressors: list[Sensor], past_regressors: list[Sensor], n_steps_to_predict: int, max_forecast_horizon: int, forecast_frequency: int, event_starts_after: datetime | None = None, event_ends_before: datetime | None = None, save_belief_time: datetime | None = None, predict_start: datetime | None = None, predict_end: datetime | None = None, missing_threshold: float = 1.0) → None

detect_and_fill_missing_values(df: DataFrame, sensors: list[Sensor], sensor_names: list[str], start: datetime, end: datetime, interpolate_kwargs: dict | None = None, fill: float = 0.0) → TimeSeries

Detects and fills missing values in a time series using the Darts MissingValuesFiller transformer.

This method interpolates missing values in the time series using the pd.DataFrame.interpolate() method.

Parameters:

- df (pd.DataFrame): The input dataframe containing time series data with a “time” column.
- sensors (list[Sensor]): The list of sensors (used for logging).
- start (datetime): The desired start time of the time series.
- end (datetime): The desired end time of the time series.
- interpolate_kwargs (dict, optional): Additional keyword arguments passed to MissingValuesFiller, which internally calls pd.DataFrame.interpolate(). For more details, see the Darts documentation.
- fill (float): Value used to fill gaps in case there is no data at all.

Returns:

- TimeSeries: The time series with missing values filled.

Raises:

- ValueError: If the input dataframe is empty.

A warning is logged (via logging.warning) if missing values are detected and filled using pd.DataFrame.interpolate().

load_data_all_beliefs() → DataFrame

This function fetches data for each sensor. If a sensor is listed as a future regressor, it fetches all available beliefs (including forecasts).

Returns:

- pd.DataFrame: A DataFrame containing all the data from each sensor.

split_data_all_beliefs(df: pd.DataFrame, is_predict_pipeline: bool = False) → tuple[list[TimeSeries] | None, list[TimeSeries] | None, list[TimeSeries], list[pd.Timestamp]]

Split the loaded sensor DataFrame into past covariates, future covariates, and target series across one or more simulated forecast times (“belief times”).

This is the main entry point for preparing model inputs. It:

- Handles the autoregressive case (no regressors).
- Or delegates to _generate_splits, which applies the sliding-window logic to produce per-belief covariate/target slices.

Parameters

df (pd.DataFrame): The full sensor data (from load_data_all_beliefs), with columns [event_start, belief_time, …].
is_predict_pipeline (bool, default False): If True, generate splits for all prediction steps in n_steps_to_predict. If False, only generate one split (used in training).

Returns

past_covariates_list (list[TimeSeries] | None): Past regressors up to each belief_time, or None if not used.
future_covariates_list (list[TimeSeries] | None): Future regressors (realized + forecasted) up to each forecast_end, or None if not used.
target_list (list[TimeSeries]): Target series truncated at each belief_time.
belief_timestamps_list (list[pd.Timestamp]): The simulated “now” timestamps when forecasts are issued, used as the forecasts' belief_time when saving to the database.

Notes

The detailed semantics of how past/future covariates and targets are constructed for each split are documented in _generate_splits.
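To make the sliding-window idea concrete, here is a minimal sketch on plain pandas Series instead of Darts TimeSeries. It is not the logic of _generate_splits: the helper name `sliding_splits`, the `step` and `resolution` arguments, and the exact slice boundaries are assumptions chosen for illustration. Each split truncates the target at its belief time and extends the future covariate by max_forecast_horizon steps beyond it.

```python
import pandas as pd


def sliding_splits(
    target: pd.Series,
    future_covariate: pd.Series,
    first_belief_time: pd.Timestamp,
    n_steps: int,
    step: pd.Timedelta,
    max_forecast_horizon: int,
    resolution: pd.Timedelta,
) -> tuple[list[pd.Series], list[pd.Series], list[pd.Timestamp]]:
    """Build one (future covariate, target) slice pair per simulated belief time."""
    future_list, target_list, belief_times = [], [], []
    for i in range(n_steps):
        belief_time = first_belief_time + i * step
        forecast_end = belief_time + max_forecast_horizon * resolution
        # Target: only events that start before the simulated "now".
        target_list.append(target[target.index < belief_time])
        # Future covariate: also covers the horizon that is about to be forecast.
        future_list.append(future_covariate[future_covariate.index < forecast_end])
        belief_times.append(belief_time)
    return future_list, target_list, belief_times


hourly = pd.Timedelta(hours=1)
index = pd.date_range("2025-01-01", periods=24, freq=hourly)
target = pd.Series(range(24), index=index, dtype=float)
covariate = pd.Series(range(24), index=index, dtype=float)

future_list, target_list, belief_times = sliding_splits(
    target,
    covariate,
    first_belief_time=pd.Timestamp("2025-01-01 12:00"),
    n_steps=3,
    step=hourly,
    max_forecast_horizon=4,
    resolution=hourly,
)
```

Under these assumptions, setting n_steps to 1 mirrors the single split used in training, and larger values mirror the per-step splits of the predict pipeline.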