Generators

Base Class

Generator

Generator(*, name: Optional[str] = None, debug: bool = False)

Bases: ABC

Abstract base class for all synthetic generation methods.

All generator implementations should inherit from this class. It follows the scikit-learn pattern: __init__ configures the algorithm, fit(Q_obs) learns from data, and generate() produces synthetic flows.

Class Attributes

Name	Type	Description
`supports_multisite`	`bool`	Whether this generator supports multiple sites. Default False.
`supported_frequencies`	`tuple of str`	Pandas frequency strings this generator accepts (e.g., ('MS',)).

Initialize the generator with algorithm configuration.

Subclasses add algorithm-specific keyword-only parameters before name and debug. Data is not passed to the constructor; use fit(Q_obs) or preprocessing(Q_obs) instead.

Parameters:

Name	Type	Description	Default
`name`	`str`	Name identifier for this generator instance.	`None`
`debug`	`bool`	Enable debug logging.	`False`

is_fitted `property`

is_fitted: bool

Check if generator is fitted.

is_preprocessed `property`

is_preprocessed: bool

Check if preprocessing is complete.

n_sites `property`

n_sites: int

Number of sites in the generator.

Returns:

Type	Description
`int`	Number of sites.

Raises:

Type	Description
`ValueError`	If preprocessing not yet run.

sites `property`

sites: List[str]

List of site names.

Returns:

Type	Description
`List[str]`	Site identifiers.

Raises:

Type	Description
`ValueError`	If preprocessing not yet run.

output_frequency `abstractmethod` `property`

output_frequency: str

Temporal frequency of generated output.

Returns:

Type	Description
`str`	Pandas frequency string (e.g., 'MS' for monthly, 'D' for daily).

validate_input_data

validate_input_data(data: Union[Series, DataFrame]) -> pd.DataFrame

Validate and standardize input data format.

Checks type, DatetimeIndex, NaN content, negative values, data frequency, and minimum record length.

Parameters:

Name	Type	Description	Default
`data`	`Series or DataFrame`	Input time series data	required

Returns:

Type	Description
`DataFrame`	Validated and standardized data

Raises:

Type	Description
`ValueError`	If data format is invalid
`TypeError`	If data type is unsupported
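The checks listed above can be pictured with a small pandas sketch. The function name `validate_input_data_sketch`, the `min_length` threshold, and the use of `pd.infer_freq` for the frequency check are assumptions for illustration; the real method's rules and error messages may differ.

```python
import pandas as pd


def validate_input_data_sketch(data, min_length: int = 24) -> pd.DataFrame:
    """Hypothetical re-creation of the documented input checks."""
    # Type check: accept a Series (converted to one column) or a DataFrame.
    if isinstance(data, pd.Series):
        data = data.to_frame()
    elif not isinstance(data, pd.DataFrame):
        raise TypeError(f"Expected Series or DataFrame, got {type(data).__name__}")

    if not isinstance(data.index, pd.DatetimeIndex):
        raise ValueError("Index must be a DatetimeIndex")
    if data.isna().any().any():
        raise ValueError("Data contains NaN values")
    if (data < 0).any().any():
        raise ValueError("Data contains negative values")
    if len(data) < min_length:
        raise ValueError(f"Need at least {min_length} records, got {len(data)}")
    # pd.infer_freq needs at least 3 timestamps; min_length is assumed to be >= 3.
    if pd.infer_freq(data.index) is None:
        raise ValueError("Could not infer a regular data frequency")
    return data


idx = pd.date_range("2000-01-01", periods=36, freq="MS")
df = validate_input_data_sketch(pd.Series(range(36), index=idx, dtype=float))
print(df.shape)  # (36, 1)
```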

validate_preprocessing

validate_preprocessing() -> None

Check if preprocessing has been completed.

Raises:

Type	Description
`ValueError`	If preprocessing() has not been run.

validate_fit

validate_fit() -> None

Check if generator has been fitted.

Raises:

Type	Description
`ValueError`	If fit() has not been run.

update_state

update_state(preprocessed: Optional[bool] = None, fitted: Optional[bool] = None) -> None

Update generator state flags.

Parameters:

Name	Type	Description	Default
`preprocessed`	`bool`	Set preprocessing state.	`None`
`fitted`	`bool`	Set fitted state.	`None`

get_params

get_params(deep: bool = True) -> Dict[str, Any]

Get initialization parameters (scikit-learn style).

Returns only constructor (configuration) parameters, not fitted values, following the scikit-learn convention for compatibility.

Parameters:

Name	Type	Description	Default
`deep`	`bool`	If True, return deep copy of parameters.	`True`

Returns:

Type	Description
`Dict[str, Any]`	Dictionary of initialization parameters.

get_fitted_params

get_fitted_params() -> Dict[str, Any]

Get parameters learned from data during fit().

Returns:

Type	Description
`Dict[str, Any]`	Dictionary of fitted parameters (all keys end with underscore).

Raises:

Type	Description
`ValueError`	If generator has not been fitted yet.
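The two parameter conventions described above (constructor arguments from `get_params()`, learned values with a trailing underscore from `get_fitted_params()`) can be sketched with the standard library alone. `ParamsMixinSketch` and `ToyGenerator` are hypothetical names. Reading parameters back through `inspect.signature` mirrors how scikit-learn does it, not a confirmed SynHydro implementation; here `deep=True` returns a deep copy, as the table above describes (scikit-learn itself uses `deep` to recurse into nested estimators).

```python
import copy
import inspect
from typing import Any, Dict


class ParamsMixinSketch:
    """Hypothetical mixin illustrating the two parameter conventions."""

    def get_params(self, deep: bool = True) -> Dict[str, Any]:
        # Constructor arguments are read back from attributes of the same name.
        sig = inspect.signature(type(self).__init__)
        skip = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
        names = [p.name for p in sig.parameters.values()
                 if p.name != "self" and p.kind not in skip]
        params = {name: getattr(self, name) for name in names}
        return copy.deepcopy(params) if deep else params

    def get_fitted_params(self) -> Dict[str, Any]:
        # Learned quantities follow the trailing-underscore convention.
        fitted = {k: v for k, v in vars(self).items()
                  if k.endswith("_") and not k.startswith("_")}
        if not fitted:
            raise ValueError("Generator has not been fitted yet.")
        return fitted


class ToyGenerator(ParamsMixinSketch):
    def __init__(self, *, scale: float = 1.0, name=None, debug=False):
        self.scale = scale
        self.name = name
        self.debug = debug

    def fit(self, values):
        self.mean_ = sum(values) / len(values)  # fitted value, ends with "_"
        return self


g = ToyGenerator(scale=2.0, name="toy")
print(g.get_params())                        # {'scale': 2.0, 'name': 'toy', 'debug': False}
print(g.fit([1, 2, 3]).get_fitted_params())  # {'mean_': 2.0}
```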

summary

summary(show_fitted: bool = True) -> str

Generate a comprehensive summary of the generator's configuration and fit.

Parameters:

Name	Type	Description	Default
`show_fitted`	`bool`	Whether to include fitted parameters in summary.	`True`

Returns:

Type	Description
`str`	Formatted summary string.

get_state_info

get_state_info() -> Dict[str, Any]

Get complete state information including params and metadata.

Returns:

Type	Description
`Dict[str, Any]`	Dictionary containing all generator state, parameters, and metadata.

save

save(filepath: str) -> None

Save fitted generator to file using pickle.

Parameters:

Name	Type	Description	Default
`filepath`	`str`	Path to save the generator.	required

Raises:

Type	Description
`ValueError`	If generator is not fitted.

load `classmethod`

load(filepath: str) -> Generator

Load fitted generator from file.

Parameters:

Name	Type	Description	Default
`filepath`	`str`	Path to saved generator file.	required

Returns:

Type	Description
`Generator`	Loaded generator instance.
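A `save`/`load` pair built on `pickle` can look like the sketch below. `PicklableSketch` is a hypothetical stand-in, not the library class, and pickling the instance's `__dict__` (rather than the object itself) is a choice made for this sketch. Because `pickle.load` can execute arbitrary code, only load generator files from trusted sources.

```python
import os
import pickle
import tempfile


class PicklableSketch:
    """Hypothetical stand-in showing a pickle-based save/load pair."""

    def __init__(self):
        self.is_fitted = False

    def fit(self):
        self.coef_ = [0.5, 0.25]
        self.is_fitted = True
        return self

    def save(self, filepath: str) -> None:
        if not self.is_fitted:
            raise ValueError("Generator must be fitted before saving.")
        payload = {"class": type(self).__name__, "state": self.__dict__}
        with open(filepath, "wb") as fh:
            pickle.dump(payload, fh)

    @classmethod
    def load(cls, filepath: str) -> "PicklableSketch":
        with open(filepath, "rb") as fh:
            payload = pickle.load(fh)
        if payload.get("class") != cls.__name__:
            raise TypeError(f"File does not contain a {cls.__name__}")
        obj = cls.__new__(cls)             # bypass __init__, then restore state
        obj.__dict__.update(payload["state"])
        return obj


path = os.path.join(tempfile.mkdtemp(), "gen.pkl")
PicklableSketch().fit().save(path)
restored = PicklableSketch.load(path)
```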

preprocessing `abstractmethod`

preprocessing(Q_obs: Union[Series, DataFrame], *, sites: Optional[List[str]] = None, **kwargs: Any) -> None

Preprocess and validate observed flow data.

Implementations should:
1. Call _store_obs_data(Q_obs, sites) to validate and store the data.
2. Perform generator-specific data preparation.
3. Call update_state(preprocessed=True) at the end.

Parameters:

Name	Type	Description	Default
`Q_obs`	`Series or DataFrame`	Observed historical flow data.	required
`sites`	`list of str`	Sites to use. If None, uses all columns.	`None`
`**kwargs`	`Any`	Additional preprocessing parameters.	`{}`

fit `abstractmethod`

fit(Q_obs: Optional[Union[Series, DataFrame]] = None, *, sites: Optional[List[str]] = None, **kwargs: Any) -> None

Fit the generator to observed flow data.

If Q_obs is provided, preprocessing() is called automatically. If omitted, a prior call to preprocessing() is required.

Parameters:

Name	Type	Description	Default
`Q_obs`	`Series or DataFrame`	Observed data. If provided, runs preprocessing automatically.	`None`
`sites`	`list of str`	Sites to use (only when Q_obs is provided).	`None`
`**kwargs`	`Any`	Additional fitting parameters.	`{}`

generate `abstractmethod`

generate(n_realizations: int = 1, n_years: Optional[int] = None, n_timesteps: Optional[int] = None, seed: Optional[int] = None, **kwargs: Any) -> Ensemble

Generate synthetic streamflow realizations.

Implementations should:
1. Call validate_fit() at the start.
2. Set the random seed if provided.
3. Generate synthetic flows.
4. Return an Ensemble object containing all realizations.

Parameters:

Name	Type	Description	Default
`n_realizations`	`int`	Number of synthetic realizations to generate.	`1`
`n_years`	`int`	Number of years to generate (alternative to n_timesteps).	`None`
`n_timesteps`	`int`	Number of timesteps to generate explicitly.	`None`
`seed`	`int`	Random seed for reproducibility.	`None`
`**kwargs`	`Any`	Additional generation parameters.	`{}`

Returns:

Type	Description
`Ensemble`	Generated synthetic flows as an Ensemble object.
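The contract above (preprocess, fit, then generate, with `fit(Q_obs)` running preprocessing automatically) can be illustrated with a toy subclass of a stripped-down base class. `GeneratorSketch` and `ShuffleGenerator` are hypothetical; the sketch returns a plain list of lists where a real generator returns an `Ensemble`, and seeding through Python's `random` module is an assumption.

```python
import random
from abc import ABC, abstractmethod
from typing import List, Optional


class GeneratorSketch(ABC):
    """Hypothetical, stripped-down stand-in for the Generator lifecycle."""

    def __init__(self, *, name: Optional[str] = None, debug: bool = False):
        self.name = name
        self.debug = debug
        self._preprocessed = False
        self._fitted = False

    def update_state(self, preprocessed=None, fitted=None) -> None:
        if preprocessed is not None:
            self._preprocessed = preprocessed
        if fitted is not None:
            self._fitted = fitted

    def validate_fit(self) -> None:
        if not self._fitted:
            raise ValueError("fit() has not been run.")

    @abstractmethod
    def preprocessing(self, Q_obs): ...

    @abstractmethod
    def fit(self, Q_obs=None): ...

    @abstractmethod
    def generate(self, n_realizations=1, seed=None): ...


class ShuffleGenerator(GeneratorSketch):
    """Toy subclass: resamples observed values with replacement."""

    def preprocessing(self, Q_obs: List[float]) -> None:
        self._obs = list(Q_obs)               # validate/store the data
        self.update_state(preprocessed=True)  # flag completion at the end

    def fit(self, Q_obs=None) -> None:
        if Q_obs is not None:                 # auto-preprocess when data is given
            self.preprocessing(Q_obs)
        if not self._preprocessed:
            raise ValueError("Call preprocessing() first or pass Q_obs.")
        self.values_ = sorted(self._obs)
        self.update_state(fitted=True)

    def generate(self, n_realizations=1, seed=None):
        self.validate_fit()                   # step 1: check fitted state
        rng = random.Random(seed)             # step 2: seed the generator
        n = len(self.values_)
        # Steps 3 and 4: build realizations and return them together.
        return [rng.choices(self.values_, k=n) for _ in range(n_realizations)]


gen = ShuffleGenerator(name="demo")
gen.fit([3.0, 1.0, 2.0])
ens = gen.generate(n_realizations=4, seed=1)
```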

KirschGenerator

KirschGenerator

KirschGenerator(*, generate_using_log_flow=True, matrix_repair_method='spectral', name=None, debug=False, **kwargs)

Bases: Generator

Kirsch nonparametric bootstrap generator for monthly streamflow synthesis.

Generates monthly synthetic flows using bootstrap resampling with correlation preservation via Cholesky decomposition.
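A minimal single-site sketch of the bootstrap-plus-Cholesky idea, assuming NumPy and a `(n_years, 12)` array of positive monthly flows. It standardizes log flows per month, bootstraps each month's value from an independently drawn historical year, then multiplies by the upper Cholesky factor of the historical month-to-month correlation matrix. The function name is hypothetical, and the full method's handling of the December-to-January correlation (and of multisite data) is deliberately left out.

```python
import numpy as np


def kirsch_sketch(Q: np.ndarray, n_years: int, rng: np.random.Generator) -> np.ndarray:
    """Simplified Kirsch-style bootstrap for one site.

    Q has shape (n_hist_years, 12): one row per year, one column per month.
    """
    # Standardize log flows month by month.
    Y = np.log(Q)
    mu = Y.mean(axis=0)
    sd = Y.std(axis=0, ddof=1)
    Z = (Y - mu) / sd

    # Bootstrap each month independently: M[i, j] is the historical year
    # whose month-j value is used for synthetic year i.
    M = rng.integers(0, Z.shape[0], size=(n_years, 12))
    X = Z[M, np.arange(12)]

    # Re-impose month-to-month correlation with the upper Cholesky factor U
    # of the historical correlation matrix (corr = U.T @ U).
    corr = np.corrcoef(Z, rowvar=False)
    U = np.linalg.cholesky(corr).T
    Zs = X @ U

    # Back-transform to real space.
    return np.exp(Zs * sd + mu)


rng = np.random.default_rng(0)
Q_hist = rng.lognormal(mean=3.0, sigma=0.5, size=(40, 12))
Q_syn = kirsch_sketch(Q_hist, n_years=100, rng=rng)
```

Because the bootstrapped columns of `X` are roughly uncorrelated with unit variance, `X @ U` has a correlation close to `U.T @ U`, which is the historical matrix.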

References

Kirsch, B.R., Characklis, G.W., and Zeff, H.B. (2013). Evaluating the impact of alternative hydro-climate scenarios on transfer agreements. Journal of Water Resources Planning and Management, 139(4), 396-406.

Initialize Kirsch generator.

Parameters:

Name	Type	Description	Default
`generate_using_log_flow`	`bool`	If True, generates in log-space for better handling of skewed distributions.	`True`
`matrix_repair_method`	`str`	Method for repairing non-positive-definite correlation matrices.	`'spectral'`
`name`	`str`	Name for this generator instance.	`None`
`debug`	`bool`	Enable debug logging.	`False`
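When a sample correlation matrix is not positive definite, the Cholesky step fails. A common spectral (eigenvalue-clipping) repair is sketched below with NumPy; that `matrix_repair_method='spectral'` works exactly this way is an assumption, and the clipping floor `eps` is a hypothetical parameter.

```python
import numpy as np


def spectral_repair(C: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Clip eigenvalues to a small positive floor and rescale to a unit diagonal."""
    C = (C + C.T) / 2                      # enforce symmetry
    vals, vecs = np.linalg.eigh(C)
    vals = np.clip(vals, eps, None)        # remove negative or zero eigenvalues
    C_pd = vecs @ np.diag(vals) @ vecs.T
    d = np.sqrt(np.diag(C_pd))
    return C_pd / np.outer(d, d)           # restore 1s on the diagonal


# An internally inconsistent "correlation" matrix (negative determinant).
C_bad = np.array([[1.0, 0.9, -0.9],
                  [0.9, 1.0, 0.9],
                  [-0.9, 0.9, 1.0]])
R = spectral_repair(C_bad)
L = np.linalg.cholesky(R)                  # succeeds after the repair
```

Rescaling by the diagonal is a congruence transform, so the repaired matrix stays positive definite while becoming a valid correlation matrix again.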

output_frequency `property`

output_frequency: str

Kirsch generator produces monthly output.

Q_obs_monthly `property`

Q_obs_monthly

Get observed monthly data (alias for Qm for consistency with other generators).

preprocessing

preprocessing(Q_obs, *, sites=None, timestep='monthly', **kwargs)

Preprocess observed data for Kirsch generation.

Parameters:

Name	Type	Description	Default
`Q_obs`	`DataFrame`	Observed historical flow data with DatetimeIndex.	required
`sites`	`list`	Sites to use. If None, uses all sites.	`None`
`timestep`	`str`	Currently only 'monthly' is supported.	`'monthly'`
`**kwargs`		Additional preprocessing parameters.	`{}`

fit

fit(Q_obs=None, *, sites=None, **kwargs)

Fit Kirsch generator to preprocessed data.

Parameters:

Name	Type	Description	Default
`Q_obs`	`DataFrame`	If provided, calls preprocessing automatically.	`None`
`sites`	`list`	Sites to use (passed to preprocessing if Q_obs provided).	`None`
`**kwargs`		Additional fitting parameters.	`{}`

generate_single_series

generate_single_series(n_years, M=None, as_array=True, synthetic_index=None, rng=None)

Generate a single synthetic time series.

Parameters:

Name	Type	Description	Default
`n_years`	`int`	Number of years for the synthetic time series.	required
`M`	`ndarray`	Bootstrap indices for the synthetic time series. If None, random indices will be generated.	`None`
`as_array`	`bool`	If True, returns a numpy array; if False, returns a pandas DataFrame.	`True`
`synthetic_index`	`DatetimeIndex`	Custom index for the synthetic time series. If None, a default index will be generated.	`None`

Returns:

Type	Description
`ndarray or DataFrame`	Synthetic time series data.

generate

generate(n_realizations=1, n_years=None, n_timesteps=None, seed=None, **kwargs)

Generate an ensemble of synthetic monthly flows.

Parameters:

Name	Type	Description	Default
`n_realizations`	`int`	Number of synthetic time series to generate.	`1`
`n_years`	`int`	Number of years for each synthetic time series. If None, uses the number of historic years.	`None`
`n_timesteps`	`int`	Not used (Kirsch generates by years, not timesteps).	`None`
`seed`	`int`	Random seed for reproducibility.	`None`
`**kwargs`		Additional generation parameters.	`{}`

Returns:

Type	Description
`Ensemble`	Ensemble object containing all generated realizations.

KNNBootstrapGenerator

KNNBootstrapGenerator

KNNBootstrapGenerator(*, n_neighbors: Optional[int] = None, feature_cols: Optional[List[str]] = None, index_site: Optional[str] = None, block_size: int = 1, name: Optional[str] = None, debug: bool = False, **kwargs: Any)

Bases: Generator

K-Nearest Neighbor bootstrap generator for synthetic streamflow.

Conditionally resamples from the historical record: it finds the K nearest neighbors of the current state and selects the successor of one of them using Lall-Sharma kernel weights.

References

Lall, U., and Sharma, A. (1996). A nearest neighbor bootstrap for resampling hydrologic time series. Water Resources Research, 32(3), 679-693.

Initialize KNN Bootstrap generator.

Parameters:

Name	Type	Description	Default
`n_neighbors`	`int`	Number of neighbors K. If None, uses ceil(sqrt(n)) where n is the number of historical timesteps.	`None`
`feature_cols`	`list`	Column names to use as features for KNN search. If None, uses all columns.	`None`
`index_site`	`str`	Site name to use for distance computation in multisite mode. If None, uses multivariate distance across all feature columns.	`None`
`block_size`	`int`	Number of consecutive timesteps to resample as a block (1 = standard KNN).	`1`
`name`	`str`	Name for this generator instance.	`None`
`debug`	`bool`	Enable debug logging.	`False`
`**kwargs`	`Any`	Additional parameters (stored but not used).	`{}`
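A univariate sketch of one KNN bootstrap step, using only the standard library. It applies the default K = ceil(sqrt(n)) from the table, ranks historical states by absolute distance to the current value, and draws the j-th nearest neighbor with the Lall-Sharma weight w_j = (1/j) / Σ_{i=1..K} (1/i). The function names are hypothetical, and the real generator searches over feature vectors rather than a single value.

```python
import math
import random
from typing import List, Optional


def default_k(n: int) -> int:
    # Heuristic from the parameter table: K = ceil(sqrt(n)).
    return math.ceil(math.sqrt(n))


def lall_sharma_weights(k: int) -> List[float]:
    # w_j = (1/j) / sum_{i=1..k} (1/i); j = 1 is the nearest neighbor.
    denom = sum(1.0 / i for i in range(1, k + 1))
    return [(1.0 / j) / denom for j in range(1, k + 1)]


def knn_step(current: float, history: List[float], k: Optional[int],
             rng: random.Random) -> float:
    """Return the successor of one of the k nearest historical states."""
    candidates = range(len(history) - 1)   # every t that has a successor t+1
    k = min(k or default_k(len(history)), len(candidates))
    nearest = sorted(candidates, key=lambda t: abs(history[t] - current))[:k]
    j = rng.choices(range(k), weights=lall_sharma_weights(k))[0]
    return history[nearest[j] + 1]


rng = random.Random(42)
hist = [10.0, 12.5, 9.0, 15.0, 11.0, 8.5, 13.0, 14.0, 9.5, 10.5]
x, synthetic = hist[0], []
for _ in range(20):
    x = knn_step(x, hist, None, rng)
    synthetic.append(x)
```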

output_frequency `property`

output_frequency: str

Return temporal frequency of generated output.

Detected from input data frequency (monthly or annual).

preprocessing

preprocessing(Q_obs, *, sites: Optional[List[str]] = None, **kwargs: Any) -> None

Preprocess and validate observed flow data.

Constructs the feature vectors used in the KNN search and the corresponding successor pairs. Also detects the temporal frequency of the data.

Parameters:

Name	Type	Description	Default
`Q_obs`	`Series or DataFrame`	Observed historical flow data with DatetimeIndex.	required
`sites`	`list`	Sites to use. If None, uses all columns.	`None`
`**kwargs`	`Any`	Additional preprocessing parameters.	`{}`

fit

fit(Q_obs=None, *, sites=None, **kwargs: Any) -> None

Fit KNN model(s) to preprocessed data.

For monthly data, fits 12 separate KNN models (one per calendar month) so that the neighbor search is conditioned on month (Rajagopalan & Lall 1999). For annual or daily data, fits a single global model.

Also computes Lall-Sharma kernel weights for neighbor selection.

Parameters:

Name	Type	Description	Default
`Q_obs`	`Series or DataFrame`	Observed historical flow data. If provided, preprocessing is called automatically.	`None`
`sites`	`list of str`	Sites to use (only when Q_obs is provided).	`None`
`**kwargs`	`Any`	Additional fitting parameters.	`{}`
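Conditioning the neighbor search on calendar month amounts to searching only among historical timesteps from the same month. The hypothetical helper below groups the indices of timesteps that have a successor by month; a search for a state in month m would then be restricted to `pools[m]`.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List


def monthly_candidate_pools(dates: List[date]) -> Dict[int, List[int]]:
    """Group timestep indices t (that have a successor) by calendar month."""
    pools: Dict[int, List[int]] = defaultdict(list)
    for t, d in enumerate(dates[:-1]):     # the last timestep has no successor
        pools[d.month].append(t)
    return dict(pools)


# Three years of month-start dates, January 2000 to December 2002.
dates = [date(2000 + i // 12, i % 12 + 1, 1) for i in range(36)]
pools = monthly_candidate_pools(dates)
print(pools[1])   # [0, 12, 24]
print(pools[12])  # [11, 23]
```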

generate

generate(n_realizations: int = 1, n_years: Optional[int] = None, n_timesteps: Optional[int] = None, seed: Optional[int] = None, **kwargs: Any) -> Ensemble

Generate synthetic streamflow realizations.

Uses KNN bootstrap with Lall-Sharma kernel weighting to conditionally resample from the historical record.

Parameters:

Name	Type	Description	Default
`n_realizations`	`int`	Number of synthetic realizations to generate.	`1`
`n_years`	`int`	Number of years to generate. If None, uses number of observed years.	`None`
`n_timesteps`	`int`	Number of timesteps to generate explicitly. Overrides n_years if provided.	`None`
`seed`	`int`	Random seed for reproducibility.	`None`
`**kwargs`	`Any`	Additional generation parameters.	`{}`

Returns:

Type	Description
`Ensemble`	Generated synthetic flows with metadata.

PhaseRandomizationGenerator

PhaseRandomizationGenerator

PhaseRandomizationGenerator(*, marginal: str = 'kappa', win_h_length: int = 15, name: Optional[str] = None, debug: bool = False, **kwargs)

Bases: Generator

Phase randomization generator for synthetic streamflow, following Brunner et al. (2019).

Generates synthetic daily streamflow time series using Fourier transform phase randomization combined with the four-parameter kappa distribution. The method preserves both short- and long-range temporal dependence by conserving the power spectrum while randomizing phases.
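The core phase-randomization step can be sketched with NumPy's real FFT: keep the modulus, replace the phases with uniform random values, and invert. In the method described here this is applied to the normal-score-transformed, deseasonalized series before the marginal back-transformation; the sketch shows only the spectral step, and the function name is hypothetical.

```python
import numpy as np


def phase_randomize(x: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Return a surrogate of x with the same amplitude spectrum but random phases."""
    n = len(x)
    X = np.fft.rfft(x)
    modulus = np.abs(X)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=X.shape)
    # Keep the original phase of the zero-frequency term (and of the Nyquist
    # term for even n) so the inverse transform stays real and the mean is kept.
    phases[0] = np.angle(X[0])
    if n % 2 == 0:
        phases[-1] = np.angle(X[-1])
    return np.fft.irfft(modulus * np.exp(1j * phases), n=n)


rng = np.random.default_rng(0)
x = rng.normal(size=730)
y = phase_randomize(x, rng)
```

Keeping the zero-frequency phase preserves the series mean, and keeping the Nyquist phase keeps that term real, so the surrogate has the same amplitude spectrum (and hence the same autocorrelation) as the input.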

Attributes:

Name	Type	Description
`par_day_`	`dict`	Fitted kappa distribution parameters for each day of year (1-365). Each entry contains {'xi', 'alfa', 'k', 'h'}.
`modulus_`	`ndarray`	Amplitude spectrum (modulus of FFT) from fitted data.
`phases_`	`ndarray`	Phase spectrum from fitted data.
`norm_`	`ndarray`	Normalized/deseasonalized data after normal score transform.

Examples:

>>> import pandas as pd
>>> from synhydro.methods.generation.nonparametric import PhaseRandomizationGenerator
>>> Q_daily = pd.read_csv('daily_flows.csv', index_col=0, parse_dates=True)
>>> gen = PhaseRandomizationGenerator(marginal='kappa')
>>> gen.preprocessing(Q_daily)
>>> gen.fit()
>>> ensemble = gen.generate(n_realizations=100, seed=42)

Notes

- Requires at least 2 years (730 days) of daily data.
- February 29 observations are removed to ensure consistent 365-day years.
- The method generates series of the same length as the observed data.

Initialize the PhaseRandomizationGenerator.

Parameters:

Name	Type	Description	Default
`marginal`	`str`	Marginal distribution type for back-transformation: 'kappa' uses the four-parameter kappa distribution (default; allows extrapolation), while 'empirical' uses the empirical distribution (no extrapolation beyond the observed range).	`'kappa'`
`win_h_length`	`int`	Half-window length, in days, for daily distribution fitting. Values within ±win_h_length days of each day are used, giving a total window of 2*win_h_length+1 days.	`15`
`name`	`str`	Name identifier for this generator instance.	`None`
`debug`	`bool`	Enable debug logging.	`False`
`**kwargs`	`dict`	Additional parameters (currently unused).	`{}`
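With `win_h_length=15`, each day-of-year distribution is fitted on a 31-day window. The hypothetical helper below lists the days in that window; wrapping around the 365-day year boundary (so the window for 1 January includes late December) is an assumption of this sketch.

```python
from typing import List


def doy_window(day: int, win_h_length: int = 15, n_days: int = 365) -> List[int]:
    """Days of year within +/- win_h_length of `day`, wrapping at the year boundary."""
    return [((day - 1 + off) % n_days) + 1
            for off in range(-win_h_length, win_h_length + 1)]


w = doy_window(1)
print(len(w), w[0], w[-1])  # 31 351 16
```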

output_frequency `property`

output_frequency: str

Phase randomization generates daily output.

preprocessing

preprocessing(Q_obs, *, sites=None, **kwargs) -> None

Preprocess observed data for phase randomization generation.

Validates input data, removes leap days, and creates a day-of-year index.

Parameters:

Name	Type	Description	Default
`Q_obs`	`Series or DataFrame`	Observed daily streamflow data with DatetimeIndex.	required
`sites`	`list`	Sites to keep. If None, uses all columns.	`None`
`**kwargs`	`dict`	Additional preprocessing parameters (currently unused).	`{}`

Raises:

Type	Description
`ValueError`	If data has fewer than 730 days or has missing days.
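Dropping 29 February leaves day-of-year numbers shifted by one after February in leap years, so a 1 to 365 index needs a correction. A pandas sketch, in which the helper name `drop_leap_days` is hypothetical:

```python
import numpy as np
import pandas as pd


def drop_leap_days(q: pd.Series):
    """Remove Feb 29 and return the series with a 1-365 day-of-year array."""
    idx = q.index
    q = q[~((idx.month == 2) & (idx.day == 29))]
    idx = q.index
    # In leap years every day after February must shift back by one.
    shift = (idx.is_leap_year & (idx.month > 2)).astype(int)
    doy = np.asarray(idx.dayofyear) - shift
    return q, doy


days = pd.date_range("2019-01-01", "2020-12-31", freq="D")
q = pd.Series(np.arange(len(days), dtype=float), index=days)
q_365, doy = drop_leap_days(q)
```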

fit

fit(Q_obs=None, *, sites=None, **kwargs) -> None

Fit the phase randomization model to observed data.

This method:
1. Fits kappa distribution parameters for each day of the year (if marginal='kappa').
2. Applies a normal score transform per day of the year.
3. Computes the FFT and extracts the modulus and phases.

Parameters:

Name	Type	Description	Default
`Q_obs`	`Series or DataFrame`	If provided, calls preprocessing automatically.	`None`
`sites`	`list`	Sites to keep. Passed to preprocessing if Q_obs is provided.	`None`
`**kwargs`	`dict`	Additional fitting parameters (currently unused).	`{}`
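The normal score transform maps each value to the standard-normal quantile of its rank. A standard-library sketch using `statistics.NormalDist`; the Weibull plotting position r/(n + 1) and the tie handling (ties get distinct ranks in sorted order) are assumptions, and in the fit described above this would run separately for each day-of-year window.

```python
from statistics import NormalDist
from typing import List


def normal_scores(values: List[float]) -> List[float]:
    """Map values to standard-normal quantiles via their ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    std_normal = NormalDist()              # mean 0, standard deviation 1
    return [std_normal.inv_cdf(r / (n + 1)) for r in ranks]


z = normal_scores([5.0, 1.0, 3.0])
```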

generate

generate(n_realizations: int = 1, n_years: Optional[int] = None, n_timesteps: Optional[int] = None, seed: Optional[int] = None, **kwargs) -> Ensemble

Generate synthetic streamflow realizations using phase randomization.

Parameters:

Name	Type	Description	Default
`n_realizations`	`int`	Number of synthetic realizations to generate.	`1`
`n_years`	`int`	Not used (generates same length as observed data).	`None`
`n_timesteps`	`int`	Not used (generates same length as observed data).	`None`
`seed`	`int`	Random seed for reproducibility.	`None`
`**kwargs`	`dict`	Additional generation parameters (currently unused).	`{}`

Returns:

Type	Description
`Ensemble`	Generated synthetic flows as an Ensemble object.

ThomasFieringGenerator

ThomasFieringGenerator

ThomasFieringGenerator(*, name: Optional[str] = None, debug: bool = False, **kwargs)

Bases: Generator

Thomas-Fiering autoregressive model for monthly streamflow generation.

Generates synthetic monthly streamflows using a lag-1 autoregressive model with Stedinger-Taylor normalization. Preserves monthly means, standard deviations, and lag-1 serial correlations.
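The Thomas-Fiering recursion for the transition from month j to month j+1 is x_{j+1} = μ_{j+1} + ρ_j (σ_{j+1}/σ_j)(x_j − μ_j) + ε σ_{j+1} √(1 − ρ_j²), with ε ~ N(0, 1). A standard-library sketch, assuming the monthly statistics are already computed in the normalized space (the back-transformation to flows is omitted) and using a hypothetical function name:

```python
import math
import random
from typing import List, Sequence


def thomas_fiering(mu: Sequence[float], sigma: Sequence[float], rho: Sequence[float],
                   n_years: int, seed: int = 0) -> List[float]:
    """Lag-1 periodic AR model; rho[j] links month j to month (j + 1) % 12."""
    rng = random.Random(seed)
    x = mu[0]                              # start at the first month's mean
    out = []
    for t in range(12 * n_years):
        j = t % 12
        out.append(x)
        k = (j + 1) % 12                   # next month
        x = (mu[k]
             + rho[j] * sigma[k] / sigma[j] * (x - mu[j])
             + rng.gauss(0.0, 1.0) * sigma[k] * math.sqrt(1.0 - rho[j] ** 2))
    return out


series = thomas_fiering([10.0] * 12, [2.0] * 12, [0.5] * 12, n_years=3, seed=1)
```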

Note: Thomas-Fiering is a univariate method (single site only).

Examples:

>>> import pandas as pd
>>> from synhydro.methods.generation.parametric.thomas_fiering import ThomasFieringGenerator
>>> Q_monthly = pd.read_csv('monthly_flows.csv', index_col=0, parse_dates=True)
>>> tf = ThomasFieringGenerator()
>>> tf.fit(Q_monthly.iloc[:, 0])
>>> ensemble = tf.generate(n_years=10, n_realizations=5)

References

Thomas, H.A., and Fiering, M.B. (1962). Mathematical synthesis of streamflow sequences for the analysis of river basins by simulation.

Stedinger, J.R., and Taylor, M.R. (1982). Synthetic streamflow generation: 1. Model verification and validation. Water Resources Research, 18(4), 909-918.

Initialize the ThomasFieringGenerator.

Parameters:

Name	Type	Description	Default
`name`	`str`	Name for this generator instance.	`None`
`debug`	`bool`	Enable debug logging.	`False`
`**kwargs`	`dict`	Additional parameters (currently unused).	`{}`

output_frequency `property`

output_frequency: str

Thomas-Fiering generator produces monthly output.

preprocessing

preprocessing(Q_obs, *, sites: Optional[list] = None, **kwargs) -> None

Preprocess observed data for Thomas-Fiering generation.

Validates input, resamples to monthly if needed, and applies Stedinger-Taylor normalization.

Parameters:

Name	Type	Description	Default
`Q_obs`	`Series or DataFrame`	Streamflow data with DatetimeIndex. Must be single site.	required
`sites`	`list`	Not used (Thomas-Fiering is univariate).	`None`
`**kwargs`	`dict`	Additional parameters (currently unused).	`{}`
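Stedinger-Taylor normalization works with y = ln(Q − τ), where τ is a lower bound. One common quantile-based estimator is τ = (q_min·q_max − q_med²) / (q_min + q_max − 2·q_med), used only when the denominator is positive. The sketch below applies it to one month's flows; whether the generator uses exactly this estimator and fallback is an assumption, and the function names are hypothetical.

```python
import math
import statistics
from typing import List


def lower_bound_estimate(q: List[float]) -> float:
    """Quantile-based lower bound for a three-parameter lognormal fit."""
    q_min, q_max, q_med = min(q), max(q), statistics.median(q)
    denom = q_min + q_max - 2.0 * q_med
    if denom <= 0:
        return 0.0                         # fall back to a plain log transform
    tau = (q_min * q_max - q_med ** 2) / denom
    return tau if tau < q_min else 0.0     # tau must stay below every flow


def normalize(q: List[float]) -> List[float]:
    tau = lower_bound_estimate(q)
    return [math.log(x - tau) for x in q]


flows = [5.0, 6.0, 7.0, 9.0, 14.0, 25.0]   # one month's flows across years
y = normalize(flows)
```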

fit

fit(Q_obs=None, *, sites=None, **kwargs) -> None

Estimate Thomas-Fiering model parameters from normalized flows.

Calculates monthly means, standard deviations, and lag-1 serial correlations from normalized flows.

Parameters:

Name	Type	Description	Default
`Q_obs`	`Series or DataFrame`	If provided, calls preprocessing automatically.	`None`
`sites`	`list`	Sites to use (passed to preprocessing if Q_obs provided).	`None`
`**kwargs`	`dict`	Additional parameters (currently unused).	`{}`

generate

generate(n_years: Optional[int] = None, n_realizations: int = 1, n_timesteps: Optional[int] = None, seed: Optional[int] = None, **kwargs) -> Ensemble

Generate synthetic monthly streamflows.

Parameters:

Name	Type	Description	Default
`n_years`	`int`	Number of years to generate per realization. If None, uses the length of historic data.	`None`
`n_realizations`	`int`	Number of synthetic realizations to generate.	`1`
`n_timesteps`	`int`	Number of monthly timesteps to generate. If provided, overrides n_years.	`None`
`seed`	`int`	Random seed for reproducibility.	`None`
`**kwargs`	`dict`	Additional parameters (currently unused).	`{}`

Returns:

Type	Description
`Ensemble`	Ensemble object containing all realizations.

Raises:

Type	Description
`ValueError`	If neither n_years nor n_timesteps is provided.

MATALASGenerator

MATALASGenerator

MATALASGenerator(*, log_transform: bool = True, name: Optional[str] = None, debug: bool = False, **kwargs)

Bases: Generator

Matalas (1967) multi-site monthly lag-1 autoregressive (MAR(1)) model.

This is the standard classical baseline for parametric multi-site stochastic generation. It extends the Thomas-Fiering univariate model to n sites using matrix autoregression, preserving contemporaneous cross-site correlations and the lag-1 temporal structure at each site.

For each monthly transition m → m+1, generates:

Z(t+1) = A(m) · Z(t) + B(m) · ε(t+1)

where Z are standardized flows across all sites, ε ~ N(0, I), and A, B are coefficient matrices fitted from observed cross-correlations.

Parameters:

Name	Type	Description	Default
`log_transform`	`bool`	Apply a log(Q + 1) transformation before standardization to reduce skewness and better satisfy the normality assumption.	`True`
`name`	`str`	Name for this generator instance.	`None`
`debug`	`bool`	Enable debug logging.	`False`

Notes

The coefficient matrices are derived from the lag-0 and lag-1 cross-correlation matrices of the standardized flows:

A(m) = S₁(m) · S₀(m)⁻¹
B(m) · B(m)ᵀ = S₀(m+1) - A(m) · S₀(m) · A(m)ᵀ

where S₀(m) is the contemporaneous correlation matrix at month m and S₁(m) is the lag-1 cross-correlation between months m+1 and m. B(m) is the lower Cholesky factor of the residual covariance.
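The coefficient formulas above translate directly to NumPy. This sketch estimates A(m) and B(m) for one monthly transition from standardized flows of shape `(n_years, n_sites)` and takes one generation step; the function names are hypothetical, and computing the moment matrices with a 1/n divisor is an assumption.

```python
import numpy as np


def mar1_coefficients(Z_m: np.ndarray, Z_next: np.ndarray):
    """Estimate A(m), B(m) from standardized flows of shape (n_years, n_sites)."""
    n = Z_m.shape[0]
    S0_m = Z_m.T @ Z_m / n              # lag-0 matrix, month m
    S0_next = Z_next.T @ Z_next / n     # lag-0 matrix, month m+1
    S1 = Z_next.T @ Z_m / n             # lag-1 matrix, month m+1 vs month m
    # A = S1 @ inv(S0_m), computed with solve() since S0_m is symmetric.
    A = np.linalg.solve(S0_m, S1.T).T
    resid = S0_next - A @ S0_m @ A.T
    B = np.linalg.cholesky(resid)       # lower Cholesky factor
    return A, B


def mar1_step(z: np.ndarray, A: np.ndarray, B: np.ndarray,
              rng: np.random.Generator) -> np.ndarray:
    """One transition: Z(t+1) = A Z(t) + B eps, eps ~ N(0, I)."""
    return A @ z + B @ rng.standard_normal(len(z))


def standardize(a: np.ndarray) -> np.ndarray:
    return (a - a.mean(axis=0)) / a.std(axis=0)


rng = np.random.default_rng(1)
raw_m = rng.standard_normal((60, 3))
raw_next = 0.6 * raw_m + 0.8 * rng.standard_normal((60, 3))
Z_m, Z_next = standardize(raw_m), standardize(raw_next)
A, B = mar1_coefficients(Z_m, Z_next)
z_next = mar1_step(Z_m[-1], A, B, rng)
```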

Examples:

>>> gen = MATALASGenerator(log_transform=True)
>>> gen.fit(Q_monthly)
>>> ensemble = gen.generate(n_years=100, n_realizations=50, seed=42)

References

Matalas, N. C. (1967). Mathematical assessment of synthetic hydrology. Water Resources Research, 3(4), 937–945.

Salas, J. D., Delleur, J. W., Yevjevich, V., & Lane, W. L. (1980). Applied Modeling of Hydrologic Time Series. Water Resources Publications.

preprocessing

preprocessing(Q_obs, *, sites: Optional[list] = None, **kwargs) -> None

Validate input and resample to monthly frequency.

Parameters:

Name	Type	Description	Default
`Q_obs`	`DataFrame or Series`	Monthly streamflow with DatetimeIndex. Columns are sites.	required
`sites`	`list`	Subset of site columns to use. Uses all columns if None.	`None`
`**kwargs`	`dict`	Unused.	`{}`

fit

fit(Q_obs=None, *, sites=None, **kwargs) -> None

Estimate MAR(1) coefficient matrices from observed monthly flows.

For each of the 12 monthly transitions, computes the lag-0 (S0) and lag-1 (S1) cross-correlation matrices, then solves for A and B.

Parameters:

Name	Type	Description	Default
`Q_obs`	`DataFrame or Series`	If provided, calls preprocessing automatically.	`None`
`sites`	`list`	Sites to use (passed to preprocessing if Q_obs provided).	`None`
`**kwargs`	`dict`	Unused.	`{}`

generate

generate(n_years: Optional[int] = None, n_realizations: int = 1, n_timesteps: Optional[int] = None, seed: Optional[int] = None, **kwargs) -> Ensemble

Generate synthetic monthly streamflows at all sites.

Parameters:

Name	Type	Description	Default
`n_years`	`int`	Years per realization. Defaults to length of historic record.	`None`
`n_realizations`	`int`	Number of independent synthetic sequences.	`1`
`n_timesteps`	`int`	Total monthly timesteps; overrides n_years when provided.	`None`
`seed`	`int`	Random seed for reproducibility.	`None`
`**kwargs`	`dict`	Unused.	`{}`

Returns:

Type	Description
`Ensemble`	Collection of synthetic realizations.

MultiSiteHMMGenerator

MultiSiteHMMGenerator

MultiSiteHMMGenerator(*, n_states: int = 2, offset: float = 1.0, max_iterations: int = 1000, covariance_type: str = 'full', name: Optional[str] = None, debug: bool = False, **kwargs)

Bases: Generator

Multi-site Hidden Markov Model generator for synthetic streamflow.

Generates synthetic streamflow using a Gaussian Mixture Model HMM that models temporal dependencies through hidden states and spatial correlations through multivariate Gaussian emissions with state-specific covariance matrices.

The method is particularly suited for capturing drought dynamics across multiple sites/basins simultaneously.

Parameters:

Name	Type	Description	Default
`n_states`	`int`	Number of hidden states. Default is 2 (dry/wet states).	`2`
`offset`	`float`	Small value added before log transformation to handle zeros. Recommended: 1.0 for flows in standard units.	`1.0`
`max_iterations`	`int`	Maximum iterations for HMM fitting convergence.	`1000`
`covariance_type`	`str`	Type of covariance matrix: 'full' (full covariance; captures all cross-site correlations), 'diag' (diagonal covariance; treats sites as independent), or 'spherical' (a single variance for all dimensions).	`'full'`
`name`	`str`	Name identifier for this generator instance.	`None`
`debug`	`bool`	Enable debug logging.	`False`

Attributes:

Name	Type	Description
`means_`	`ndarray`	State means for each site. Shape: (n_states, n_sites).
`covariances_`	`ndarray`	Covariance matrices for each state. Shape: (n_states, n_sites, n_sites).
`transition_matrix_`	`ndarray`	State transition probability matrix. Shape: (n_states, n_states).
`stationary_distribution_`	`ndarray`	Stationary distribution of states. Shape: (n_states,).
`Q_log_`	`ndarray`	Log-transformed observed flows used for fitting.

Examples:

>>> import pandas as pd
>>> from synhydro.methods.generation.parametric import MultiSiteHMMGenerator
>>>
>>> # Load multi-site annual flows
>>> Q_annual = pd.read_csv('annual_flows.csv', index_col=0, parse_dates=True)
>>>
>>> # Initialize generator
>>> gen = MultiSiteHMMGenerator(n_states=2)
>>> gen.preprocessing(Q_annual)
>>> gen.fit()
>>>
>>> # Generate 100 realizations of 50 years each
>>> ensemble = gen.generate(n_realizations=100, n_years=50, seed=42)

Notes

- Designed for annual timestep data (can handle other frequencies).
- Log transformation ensures positive emissions.
- Full covariance preserves spatial correlations between sites.
- State ordering: states are sorted by mean (low mean = dry state).
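Generation from a fitted model can be sketched as: sample a hidden-state path from the transition matrix, draw each step's log flows at all sites from that state's multivariate normal, and undo log(Q + offset). The helpers and toy parameters are hypothetical; starting the chain from the stationary distribution is an assumption, and because subtracting the offset can push small values below zero, this sketch clips at zero, which is its own choice rather than documented behavior.

```python
import numpy as np


def stationary_distribution(P: np.ndarray) -> np.ndarray:
    """Left eigenvector of P for eigenvalue 1, normalized to sum to 1."""
    vals, vecs = np.linalg.eig(P.T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return v / v.sum()


def simulate_hmm(P, means, covs, n_steps, offset, rng):
    """Sample a state path, then multivariate-normal log flows for each step."""
    n_states = P.shape[0]
    states = np.empty(n_steps, dtype=int)
    states[0] = rng.choice(n_states, p=stationary_distribution(P))
    for t in range(1, n_steps):
        states[t] = rng.choice(n_states, p=P[states[t - 1]])
    log_q = np.array([rng.multivariate_normal(means[s], covs[s]) for s in states])
    # Undo log(Q + offset); clipping at zero is a choice made for this sketch.
    return states, np.maximum(np.exp(log_q) - offset, 0.0)


P = np.array([[0.9, 0.1], [0.2, 0.8]])
means = np.array([[3.0, 2.5], [4.0, 3.5]])       # state 0 (dry) has lower means
covs = np.array([[[0.10, 0.05], [0.05, 0.10]],
                 [[0.20, 0.12], [0.12, 0.20]]])
states, Q = simulate_hmm(P, means, covs, n_steps=50, offset=1.0,
                         rng=np.random.default_rng(3))
```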

Initialize the MultiSiteHMMGenerator.

output_frequency `property`

output_frequency: str

Output frequency matches input frequency.

Typically used for annual data ('YS', or its older pandas alias 'AS'), but any input frequency is carried through to the output.

preprocessing

preprocessing(Q_obs, *, sites: Optional[List[str]] = None, **kwargs) -> None

Preprocess observed data for HMM fitting.

Applies offset and log transformation to handle zeros and ensure positive values for fitting.

Parameters:

Name	Type	Description	Default
`Q_obs`	`Series or DataFrame`	Observed streamflow data with DatetimeIndex.	required
`sites`	`List[str]`	Subset of sites to use. If None, uses all columns.	`None`
`**kwargs`	`dict`	Additional preprocessing parameters (currently unused).	`{}`

Raises:

Type	Description
`ValueError`	If data has fewer than 2 sites for multi-site modeling.

fit

fit(Q_obs=None, *, sites=None, **kwargs) -> None

Fit the multi-site HMM to observed data.

Estimates hidden states, transition probabilities, state-specific means, and covariance matrices using the GMMHMM algorithm.

Parameters:

Name	Type	Description	Default
`Q_obs`	`Series or DataFrame`	Observed streamflow data. If provided, preprocessing is called automatically.	`None`
`sites`	`list of str`	Sites to use (only when Q_obs is provided).	`None`
`**kwargs`	`dict`	Additional fitting parameters. May include `random_state` for reproducible fitting.	`{}`

Notes

States are automatically ordered by mean (ascending), so state 0 represents the dry state and higher-numbered states represent progressively wetter states.

generate

generate(n_realizations: int = 1, n_years: Optional[int] = None, n_timesteps: Optional[int] = None, seed: Optional[int] = None, **kwargs) -> Ensemble

Generate synthetic streamflow realizations.

Parameters:

Name	Type	Description	Default
`n_realizations`	`int`	Number of synthetic realizations to generate.	`1`
`n_years`	`int`	Number of years to generate. If provided with annual data, this equals n_timesteps.	`None`
`n_timesteps`	`int`	Number of timesteps to generate explicitly. Takes precedence over n_years if both provided.	`None`
`seed`	`int`	Random seed for reproducibility.	`None`
`**kwargs`	`dict`	Additional generation parameters (currently unused).	`{}`

Returns:

Type	Description
`Ensemble`	Generated synthetic flows as an Ensemble object.

Raises:

Type	Description
`ValueError`	If neither n_years nor n_timesteps is provided.

WARMGenerator

WARMGenerator

WARMGenerator(*, wavelet: str = 'morl', scales: int = 64, ar_order: int = 1, name: Optional[str] = None, debug: bool = False, **kwargs)

Bases: Generator

Wavelet Auto-Regressive Method (WARM) for non-stationary streamflow generation.

Implements the 4-step WARM methodology:
1. Wavelet transform decomposition into periodic components.
2. Scale Averaged Wavelet Power (SAWP) calculation for time-varying normalization.
3. AR model fitting to scaled wavelet coefficients.
4. Stochastic generation with inverse wavelet transform.

The SAWP approach enables preservation of non-stationary spectral characteristics and time-varying variability in synthetic sequences.
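The SAWP normalization and per-scale AR fitting can be sketched with NumPy on a matrix of wavelet coefficients `W` (scales by time). In practice `W` would come from a continuous wavelet transform such as PyWavelets' `pywt.cwt`; here random numbers stand in for it, and the inverse transform is omitted. The simplified SAWP drops constant normalization factors, and dividing by its square root to rescale amplitudes is an assumption of this sketch, as are the function names.

```python
import numpy as np


def sawp(W: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Simplified scale-averaged wavelet power: sum_j |W_j(t)|^2 / s_j."""
    return (np.abs(W) ** 2 / scales[:, None]).sum(axis=0)


def fit_ar1(x: np.ndarray):
    """Lag-1 Yule-Walker coefficient and innovation std of a series."""
    x = x - x.mean()
    phi = float(np.dot(x[1:], x[:-1]) / np.dot(x, x))
    sigma = float(np.std(x[1:] - phi * x[:-1]))
    return phi, sigma


def simulate_ar1(phi: float, sigma: float, n: int,
                 rng: np.random.Generator) -> np.ndarray:
    out = np.empty(n)
    out[0] = rng.normal(0.0, sigma / np.sqrt(1.0 - phi ** 2))  # stationary start
    for t in range(1, n):
        out[t] = phi * out[t - 1] + rng.normal(0.0, sigma)
    return out


rng = np.random.default_rng(0)
W = rng.normal(size=(8, 120))               # stand-in: 8 scales x 120 years
scales = np.arange(1, 9, dtype=float)
power = sawp(W, scales)                     # time-varying power
W_scaled = W / np.sqrt(power)               # assumed amplitude normalization
fits = [fit_ar1(row) for row in W_scaled]   # one AR(1) per scale
sims = np.array([simulate_ar1(phi, sig, 120, rng) for phi, sig in fits])
W_syn = sims * np.sqrt(power)               # restore the time-varying power
```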

Note: WARM is designed for annual streamflow generation (univariate).

Examples:

>>> import pandas as pd
>>> from synhydro.methods.generation.parametric.warm import WARMGenerator
>>> Q_annual = pd.read_csv('annual_flows.csv', index_col=0, parse_dates=True)
>>> warm = WARMGenerator(wavelet='morl', scales=64)
>>> warm.preprocessing(Q_annual.iloc[:, 0])
>>> warm.fit()
>>> ensemble = warm.generate(n_years=100, n_realizations=50, seed=42)

References

Nowak, K., Rajagopalan, B., & Zagona, E. (2011). A Wavelet Auto-Regressive Method (WARM) for multi-site streamflow simulation of data with non-stationary trends. Journal of Hydrology, 410(1-2), 1-12.

Kwon, H.-H., Lall, U., & Khalil, A. F. (2007). Stochastic simulation model for nonstationary time series using an autoregressive wavelet decomposition: Applications to rainfall and temperature. Water Resources Research, 43(5).

Initialize the WARM Generator.

Parameters:

Name	Type	Description	Default
`wavelet`	`str`	Wavelet type for continuous wavelet transform. Options: 'morl' (Morlet), 'mexh' (Mexican Hat), 'gaus1'-'gaus8'. Morlet wavelet recommended for hydrologic applications.	`'morl'`
`scales`	`int`	Number of scales for wavelet decomposition. Higher values capture more frequency components but increase computational cost.	`64`
`ar_order`	`int`	Order of autoregressive model for each wavelet scale. Default AR(1) preserves temporal persistence.	`1`
`name`	`str`	Name for this generator instance.	`None`
`debug`	`bool`	Enable debug logging.	`False`
`**kwargs`	`dict`	Additional parameters (currently unused).	`{}`

Raises:

Type	Description
`ValueError`	If scales < 2 or ar_order < 1.

output_frequency `property`

output_frequency: str

WARM generator produces annual output.

preprocessing

preprocessing(Q_obs, *, sites=None, **kwargs) -> None

Preprocess observed data for WARM generation.

Validates input data and ensures annual frequency. WARM is designed for annual streamflow generation.

Parameters:

Name	Type	Description	Default
`Q_obs`	`Series or DataFrame`	Annual streamflow data with DatetimeIndex.	required
`sites`	`list`	Sites to keep. If None, uses all columns.	`None`
`**kwargs`	`dict`	Additional parameters (currently unused).	`{}`

fit ¶

fit(Q_obs=None, *, sites=None, **kwargs) -> None

Fit WARM model to observed annual flows.

Implements the four-step WARM methodology:

1. Continuous wavelet transform
2. Scale Averaged Wavelet Power (SAWP) calculation
3. Normalization by SAWP
4. AR model fitting to scaled coefficients

Parameters:

Name	Type	Description	Default
`Q_obs`	`Series or DataFrame`	If provided, calls preprocessing automatically.	`None`
`sites`	`list`	Sites to keep. Passed to preprocessing if Q_obs is provided.	`None`
`**kwargs`	`dict`	Additional parameters (currently unused).	`{}`
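Steps 2 to 4 of the fit can be illustrated with a minimal NumPy sketch. It is not synhydro's implementation. Step 1, the continuous wavelet transform itself, is assumed to have been done already, for example with PyWavelets' `pywt.cwt`. SAWP is taken as the sum over scales of |W(s, t)|^2 / s, up to a constant, which is a common scale-averaging definition. The helper names `scale_averaged_power`, `normalize_by_sawp`, and `fit_ar1` are hypothetical, and only the default `ar_order=1` case is shown.

```python
import numpy as np


def scale_averaged_power(coefs, scales, const=1.0):
    """Step 2: scale-averaged wavelet power (SAWP) at each time step.

    coefs: CWT coefficients, shape (n_scales, n_time), e.g. from pywt.cwt.
    scales: scales used in the transform, shape (n_scales,).
    const: placeholder for the usual dj*dt/C_delta factor; it cancels when the
        same value is used to normalize and later re-scale a component.
    """
    coefs = np.asarray(coefs)
    scales = np.asarray(scales, dtype=float)
    return const * np.sum(np.abs(coefs) ** 2 / scales[:, None], axis=0)


def normalize_by_sawp(component, sawp):
    """Step 3: divide a band component by sqrt(SAWP) to damp time-varying variance."""
    return np.asarray(component, dtype=float) / np.sqrt(np.asarray(sawp, dtype=float))


def fit_ar1(x):
    """Step 4 (AR(1) case): lag-1 Yule-Walker estimate of phi and innovation std."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    phi = np.dot(x[1:], x[:-1]) / np.dot(x, x)
    sigma = np.sqrt(np.var(x) * (1.0 - phi ** 2))
    return phi, sigma


# Stand-in coefficients (random numbers, not a real wavelet transform).
rng = np.random.default_rng(0)
coefs = rng.normal(size=(8, 120))
scales = np.arange(1, 9)
sawp = scale_averaged_power(coefs, scales)
z = normalize_by_sawp(coefs[3], sawp)
phi, sigma = fit_ar1(z)
```

Normalizing by SAWP before fitting the AR model is what lets a stationary AR(1) describe a component whose variance drifts over time.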

generate ¶

generate(n_years: Optional[int] = None, n_realizations: int = 1, n_timesteps: Optional[int] = None, seed: Optional[int] = None, **kwargs) -> Ensemble

Generate synthetic annual streamflows using WARM.

Parameters:

Name	Type	Description	Default
`n_years`	`int`	Number of years to generate per realization. If None, uses the length of historic data.	`None`
`n_realizations`	`int`	Number of synthetic realizations to generate.	`1`
`n_timesteps`	`int`	Number of annual timesteps to generate. If provided, overrides n_years. For WARM, n_timesteps = n_years (annual data).	`None`
`seed`	`int`	Random seed for reproducibility.	`None`
`**kwargs`	`dict`	Additional parameters (currently unused).	`{}`

Returns:

Type	Description
`Ensemble`	Ensemble object containing all realizations.

Raises:

Type	Description
`ValueError`	If neither n_years nor n_timesteps is provided.
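Generation reverses the fit, and the recombination can be sketched under stated assumptions. Each band's normalized component is simulated with an AR(1) model, re-scaled by the square root of the historic SAWP, and the bands are summed with the mean added back. The function names are hypothetical and any residual noise component is ignored. Because the historic SAWP is reused, the synthetic length equals the SAWP length in this sketch. How synhydro handles `n_years` longer than the record is not shown here.

```python
import numpy as np


def simulate_ar1(phi, sigma, n, rng, burn_in=100):
    """Zero-mean AR(1) series of length n after discarding burn_in steps."""
    e = rng.normal(0.0, sigma, size=n + burn_in)
    x = np.zeros(n + burn_in)
    for t in range(1, n + burn_in):
        x[t] = phi * x[t - 1] + e[t]
    return x[burn_in:]


def recombine_bands(band_params, band_sawps, mean_flow, rng):
    """Hypothetical WARM-style recombination for one realization.

    band_params: list of (phi, sigma) AR(1) parameters, one per band, fitted to
        the SAWP-normalized components.
    band_sawps: list of historic SAWP arrays (same length n), one per band.
    Each simulated normalized band is re-scaled by sqrt(SAWP); bands are summed
    and the mean flow is added back. Any residual noise term is ignored.
    """
    n = len(band_sawps[0])
    total = np.full(n, float(mean_flow))
    for (phi, sigma), sawp in zip(band_params, band_sawps):
        total += simulate_ar1(phi, sigma, n, rng) * np.sqrt(np.asarray(sawp, dtype=float))
    return total


rng = np.random.default_rng(42)
flows = recombine_bands([(0.5, 1.0), (0.8, 0.6)], [np.ones(60), np.full(60, 4.0)], 500.0, rng)
```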

ARFIMAGenerator¶

ARFIMAGenerator ¶

ARFIMAGenerator(*, p: int = 1, q: int = 0, d_method: str = 'whittle', truncation_lag: int = 100, deseasonalize: bool = True, auto_order: bool = False, name: Optional[str] = None, debug: bool = False, **kwargs)

Bases: Generator

Autoregressive Fractionally Integrated Moving Average (ARFIMA) generator for synthetic monthly or annual streamflow.

Generates synthetic streamflows using an ARFIMA model that captures long-range dependence through the fractional differencing parameter d in (0, 0.5). The model preserves the Hurst exponent, seasonal patterns (for monthly data), and the autocorrelation structure.

The Hurst exponent H relates to the fractional differencing parameter via H = d + 0.5, providing direct parameterization of long-memory behavior.

Examples:

>>> import pandas as pd
>>> from synhydro.methods.generation.parametric.arfima import ARFIMAGenerator
>>> Q_monthly = pd.read_csv('monthly_flows.csv', index_col=0, parse_dates=True)
>>> arfima = ARFIMAGenerator()
>>> arfima.preprocessing(Q_monthly.iloc[:, 0])
>>> arfima.fit()
>>> ensemble = arfima.generate(n_years=50, n_realizations=100)

References

Hosking, J.R.M. (1984). Modeling persistence in hydrological time series using fractional differencing. Water Resources Research, 20(12), 1898-1908. https://doi.org/10.1029/WR020i012p01898

Initialize the ARFIMAGenerator.

Parameters:

Name	Type	Description	Default
`p`	`int`	AR order for the short-memory ARMA(p,q) component.	`1`
`q`	`int`	MA order for the short-memory ARMA(p,q) component.	`0`
`d_method`	`str`	Method for estimating d: 'whittle' (frequency domain MLE), 'gph' (Geweke-Porter-Hudak), or 'rs' (R/S analysis).	`'whittle'`
`truncation_lag`	`int`	Truncation lag K for fractional differencing coefficients.	`100`
`deseasonalize`	`bool`	Remove seasonal component (monthly means/stds) before fitting. Set False for annual data.	`True`
`auto_order`	`bool`	If True, select (p, q) by BIC grid search over p in {0, 1, 2} and q in {0, 1, 2}, overriding any user-supplied p and q. Uses BIC, which is proven consistent for ARFIMA order selection (Huang et al. 2022, Annals of Statistics).	`False`
`name`	`str`	Name identifier for this generator instance.	`None`
`debug`	`bool`	Enable debug logging.	`False`
`**kwargs`	`dict`	Additional parameters (stored in init_params).	`{}`
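`truncation_lag` caps the infinite binomial expansion of the fractional difference operator (1 - B)^d = sum_k pi_k B^k. The sketch below assumes the standard recursion pi_0 = 1, pi_k = pi_{k-1} (k - 1 - d) / k and keeps only the first K + 1 coefficients. `frac_diff_weights` and `frac_diff` are hypothetical helpers, not synhydro functions.

```python
import numpy as np


def frac_diff_weights(d, K):
    """Coefficients pi_0..pi_K of the binomial expansion of (1 - B)^d."""
    w = np.empty(K + 1)
    w[0] = 1.0
    for k in range(1, K + 1):
        w[k] = w[k - 1] * (k - 1 - d) / k
    return w


def frac_diff(x, d, K=100):
    """Truncated fractional difference W_t = sum_{k=0}^{min(t, K)} pi_k X_{t-k}."""
    x = np.asarray(x, dtype=float)
    w = frac_diff_weights(d, K)
    # Full convolution cut to len(x): early time steps simply use fewer lags.
    return np.convolve(x, w)[: len(x)]
```

With d = 1 the weights reduce to (1, -1, 0, ...), which is the ordinary first difference. Larger K keeps more of the slowly decaying tail that carries the long memory, at higher cost.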

output_frequency `property` ¶

output_frequency: str

Return output frequency based on input data.

preprocessing ¶

preprocessing(Q_obs, *, sites=None, **kwargs) -> None

Preprocess observed data for ARFIMA generation.

Validates input, ensures univariate data, optionally deseasonalizes for monthly data, and checks stationarity.

Parameters:

Name	Type	Description	Default
`Q_obs`	`Series or DataFrame`	Observed historical flow data.	required
`sites`	`list`	Sites to keep. If None, uses all columns.	`None`
`**kwargs`	`dict`	Additional preprocessing parameters.	`{}`

Raises:

Type	Description
`ValueError`	If data has insufficient length or multiple sites.
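For monthly data, deseasonalizing with monthly means and standard deviations amounts to standardizing each value by its calendar-month statistics and keeping those statistics to reverse the step after generation. A minimal pandas sketch follows. `standardize_by_month` and `restore_by_month` are hypothetical names, and synhydro's internal handling may differ.

```python
import numpy as np
import pandas as pd


def standardize_by_month(q):
    """Return (z, stats) where z = (q - mean_month) / std_month."""
    month = q.index.month
    stats = q.groupby(month).agg(["mean", "std"])  # one row per calendar month
    mu = stats["mean"].reindex(month).to_numpy()
    sd = stats["std"].reindex(month).to_numpy()
    z = pd.Series((q.to_numpy() - mu) / sd, index=q.index)
    return z, stats


def restore_by_month(z, stats):
    """Invert standardize_by_month: q = z * std_month + mean_month."""
    month = z.index.month
    mu = stats["mean"].reindex(month).to_numpy()
    sd = stats["std"].reindex(month).to_numpy()
    return pd.Series(z.to_numpy() * sd + mu, index=z.index)
```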

fit ¶

fit(Q_obs=None, *, sites=None, **kwargs) -> None

Estimate ARFIMA model parameters from preprocessed data.

Sequence:

1. Estimate the fractional differencing parameter d using the specified method.
2. Apply fractional differencing to obtain the differenced series.
3. Fit ARMA(p,q) to the differenced series using Yule-Walker equations.
4. Store all fitted parameters.

Parameters:

Name	Type	Description	Default
`Q_obs`	`Series or DataFrame`	If provided, calls preprocessing automatically.	`None`
`sites`	`list`	Sites to keep. Passed to preprocessing if Q_obs is provided.	`None`
`**kwargs`	`dict`	Additional fitting parameters.	`{}`

Raises:

Type	Description
`ValueError`	If fitting fails (e.g., ARMA estimation error).
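Step 1 with `d_method='gph'` can be illustrated by the Geweke-Porter-Hudak log-periodogram regression. The method regresses log I(lambda_j) on log(4 sin^2(lambda_j / 2)) over the first m Fourier frequencies and takes d = -slope. The sketch assumes a bandwidth m = floor(n^0.5), a common choice that may not match synhydro's. `gph_estimate` and `hurst_from_d` are hypothetical names. The Hurst exponent then follows from H = d + 0.5.

```python
import numpy as np


def gph_estimate(x, bandwidth_exponent=0.5):
    """Geweke-Porter-Hudak estimate of the fractional differencing parameter d."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    m = int(n ** bandwidth_exponent)          # number of low frequencies used
    j = np.arange(1, m + 1)
    freqs = 2.0 * np.pi * j / n               # Fourier frequencies lambda_j
    periodogram = np.abs(np.fft.fft(x)[1 : m + 1]) ** 2 / (2.0 * np.pi * n)
    regressor = np.log(4.0 * np.sin(freqs / 2.0) ** 2)
    slope, _intercept = np.polyfit(regressor, np.log(periodogram), 1)
    return -slope                             # log I ~ c - d * log(4 sin^2(lambda/2))


def hurst_from_d(d):
    """Hurst exponent implied by the relation H = d + 0.5."""
    return d + 0.5
```

GPH uses only the lowest frequencies, so it is insensitive to the short-memory ARMA part but noisy for short records. That trade-off is one reason a frequency-domain MLE such as Whittle is the default here.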

generate ¶

generate(n_realizations: int = 1, n_years: Optional[int] = None, n_timesteps: Optional[int] = None, seed: Optional[int] = None, **kwargs) -> Ensemble

Generate synthetic streamflow realizations.

Sequence:

1. Generate white noise innovations.
2. Apply AR recursion to obtain the ARMA differenced series W_t.
3. Invert fractional differencing via MA convolution (FIR filter) to recover X_t.
4. Re-seasonalize if monthly.
5. Return as Ensemble.

Parameters:

Name	Type	Description	Default
`n_realizations`	`int`	Number of synthetic realizations to generate.	`1`
`n_years`	`int`	Number of years to generate. If None, uses length of training data.	`None`
`n_timesteps`	`int`	Number of timesteps to generate. Overrides n_years if provided.	`None`
`seed`	`int`	Random seed for reproducibility.	`None`
`**kwargs`	`dict`	Additional parameters (unused).	`{}`

Returns:

Type	Description
`Ensemble`	Generated synthetic flows as an Ensemble object.

Raises:

Type	Description
`ValueError`	If neither n_years nor n_timesteps is provided.
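Steps 1 to 3 of generation can be sketched for an ARFIMA(1, d, 0) model. The inverse operator (1 - B)^(-d) is applied as an FIR filter whose weights follow psi_0 = 1, psi_k = psi_{k-1} (k - 1 + d) / k, truncated at K. A burn-in period is discarded so that both filters are warmed up. This is a hypothetical sketch, not synhydro's code. Re-seasonalizing (step 4) would multiply by the calendar-month standard deviation and add the calendar-month mean, and is omitted.

```python
import numpy as np


def inverse_frac_weights(d, K):
    """Coefficients psi_0..psi_K of (1 - B)^(-d)."""
    psi = np.empty(K + 1)
    psi[0] = 1.0
    for k in range(1, K + 1):
        psi[k] = psi[k - 1] * (k - 1 + d) / k
    return psi


def simulate_arfima(n, d, phi=0.0, sigma=1.0, K=100, seed=None):
    """Simulate a zero-mean ARFIMA(1, d, 0) series of length n."""
    rng = np.random.default_rng(seed)
    burn = K + 100                            # warm-up for the AR and FIR filters
    e = rng.normal(0.0, sigma, size=n + burn)  # step 1: white-noise innovations
    w = np.zeros(n + burn)
    for t in range(1, n + burn):              # step 2: AR(1) recursion gives W_t
        w[t] = phi * w[t - 1] + e[t]
    psi = inverse_frac_weights(d, K)
    x = np.convolve(w, psi)[: n + burn]       # step 3: X_t = (1 - B)^(-d) W_t
    return x[burn:]
```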

GaussianCopulaGenerator¶

GaussianCopulaGenerator ¶

GaussianCopulaGenerator(*, copula_type: str = 'gaussian', marginal_method: str = 'parametric', log_transform: bool = False, offset: float = 1.0, matrix_repair_method: str = 'spectral', name: Optional[str] = None, debug: bool = False)

Bases: Generator

Multi-site monthly generator based on elliptical copulas.

Fits per-(month, site) marginal distributions and an n_sites x n_sites copula correlation matrix per calendar month. Supports Gaussian copula (zero tail dependence) and Student-t copula (symmetric tail dependence).

Parameters:

Name	Type	Description	Default
`copula_type`	`str`	`"gaussian"` or `"t"`. The t-copula adds a degrees-of-freedom parameter that controls symmetric tail dependence.	`"gaussian"`
`marginal_method`	`str`	`"parametric"` fits gamma and log-normal per (month, site) and selects the winner by BIC. `"empirical"` uses the Hazen plotting-position CDF via :class:`NormalScoreTransform`.	`"parametric"`
`log_transform`	`bool`	Apply `log(Q + offset)` before fitting. Usually unnecessary with parametric marginals but may help empirical marginals.	`False`
`offset`	`float`	Additive offset for the log transform.	`1.0`
`matrix_repair_method`	`str`	Method passed to :func:`repair_correlation_matrix`.	`"spectral"`
`name`	`str`	Instance name.	`None`
`debug`	`bool`	Enable debug logging.	`False`
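A correlation matrix estimated month by month need not be positive semi-definite, which would make it unusable for sampling. The `"spectral"` option suggests eigenvalue clipping, a common repair. The sketch below assumes that approach: symmetrize, clip negative eigenvalues, rebuild the matrix, and rescale to a unit diagonal. It is not the source of `repair_correlation_matrix`, and the function name is hypothetical.

```python
import numpy as np


def repair_correlation_spectral(R, eps=1e-8):
    """Return a valid correlation matrix close to R via eigenvalue clipping."""
    R = np.asarray(R, dtype=float)
    R = 0.5 * (R + R.T)                      # enforce symmetry first
    vals, vecs = np.linalg.eigh(R)
    vals = np.clip(vals, eps, None)          # remove negative eigenvalues
    A = vecs @ np.diag(vals) @ vecs.T
    scale = 1.0 / np.sqrt(np.diag(A))
    return A * np.outer(scale, scale)        # D^-1/2 A D^-1/2 keeps PSD, unit diagonal
```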

References

Genest & Favre (2007), Chen et al. (2015), Tootoonchi et al. (2022).

preprocessing ¶

preprocessing(Q_obs, *, sites: Optional[List[str]] = None, **kwargs) -> None

Validate input and prepare monthly flow data.

Parameters:

Name	Type	Description	Default
`Q_obs`	`DataFrame or Series`	Monthly streamflow with DatetimeIndex.	required
`sites`	`list of str`	Subset of sites to use.	`None`

fit ¶

fit(Q_obs=None, *, sites=None, **kwargs) -> None

Fit marginal distributions and copula correlation structure.

Parameters:

Name	Type	Description	Default
`Q_obs`	`DataFrame or Series`	If provided, calls preprocessing automatically.	`None`
`sites`	`list of str`	Sites to use.	`None`

generate ¶

generate(n_realizations: int = 1, n_years: Optional[int] = None, n_timesteps: Optional[int] = None, seed: Optional[int] = None, **kwargs) -> Ensemble

Generate synthetic monthly streamflow realizations.

Parameters:

Name	Type	Description	Default
`n_realizations`	`int`	Number of independent realizations.	`1`
`n_years`	`int`	Years per realization. Defaults to length of the historic record.	`None`
`n_timesteps`	`int`	Total monthly timesteps; overrides n_years if provided.	`None`
`seed`	`int`	Random seed for reproducibility.	`None`

Returns:

Type	Description
`Ensemble`	Synthetic flow realizations.
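For one calendar month, sampling from a Gaussian copula can be sketched in three steps. Draw correlated standard normals with the Cholesky factor of the copula matrix, map them to uniforms with Phi, and push each column through its site's inverse marginal CDF. As a simplifying assumption, the marginals here are empirical quantiles of observed flows rather than the fitted gamma or log-normal distributions. The t-copula branch and any month-to-month persistence are not shown, and `sample_gaussian_copula` is a hypothetical name.

```python
import numpy as np
from statistics import NormalDist


def sample_gaussian_copula(R, samples_by_site, n, seed=None):
    """Draw n multi-site values from a Gaussian copula with empirical marginals.

    R: (n_sites, n_sites) copula correlation matrix (must be positive definite).
    samples_by_site: list of 1-D arrays of observed flows, one per site.
    """
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(np.asarray(R, dtype=float))
    z = rng.standard_normal((n, L.shape[0])) @ L.T   # rows ~ N(0, R)
    u = np.vectorize(NormalDist().cdf)(z)            # uniform margins
    out = np.empty_like(u)
    for j, obs in enumerate(samples_by_site):
        # Inverse empirical CDF of site j evaluated at the copula uniforms.
        out[:, j] = np.quantile(np.asarray(obs, dtype=float), u[:, j])
    return out
```

The Gaussian copula has zero tail dependence, so joint extremes across sites are rarer than under the t-copula with the same correlation matrix.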

VineCopulaGenerator¶

VineCopulaGenerator ¶

VineCopulaGenerator(*, vine_type: str = 'rvine', family_set: Union[str, List[str]] = 'all', selection_criterion: str = 'aic', marginal_method: str = 'parametric', log_transform: bool = False, offset: float = 1.0, trunc_level: Optional[int] = None, name: Optional[str] = None, debug: bool = False)

Bases: Generator

Multi-site monthly generator based on vine copulas.

Fits per-(month, site) marginal distributions and a vine copula per calendar month on the PAR(1) residuals. Supports R-vine, C-vine, and D-vine structures with automatic family and structure selection via pyvinecopulib.

Parameters:

Name	Type	Description	Default
`vine_type`	`str`	`"rvine"`, `"cvine"`, or `"dvine"`. Controls the vine tree structure constraint.	`"rvine"`
`family_set`	`str or list of str`	Bivariate copula families to consider. `"all"` uses all available parametric families. A list of family names can also be provided (e.g., `["gaussian", "clayton", "gumbel", "frank"]`).	`"all"`
`selection_criterion`	`str`	`"aic"` or `"bic"` for bivariate copula family selection at each edge.	`"aic"`
`marginal_method`	`str`	`"parametric"` fits gamma and log-normal per (month, site) and selects the winner by BIC. `"empirical"` uses the Hazen plotting-position CDF via :class:`NormalScoreTransform`.	`"parametric"`
`log_transform`	`bool`	Apply `log(Q + offset)` before fitting.	`False`
`offset`	`float`	Additive offset for the log transform.	`1.0`
`trunc_level`	`int or None`	Truncation level for the vine tree. `None` means no truncation (all trees are fitted). Setting `trunc_level=1` truncates after the first tree, replacing higher trees with independence copulas.	`None`
`name`	`str`	Instance name.	`None`
`debug`	`bool`	Enable debug logging.	`False`

References

Yu et al. (2025), Wang & Shen (2023), Wang et al. (2024), Pereira et al. (2017).

preprocessing ¶

preprocessing(Q_obs, *, sites: Optional[List[str]] = None, **kwargs) -> None

Validate input and prepare monthly flow data.

Parameters:

Name	Type	Description	Default
`Q_obs`	`DataFrame or Series`	Monthly streamflow with DatetimeIndex.	required
`sites`	`list of str`	Subset of sites to use.	`None`

fit ¶

fit(Q_obs=None, *, sites=None, **kwargs) -> None

Fit marginal distributions and vine copula dependence structure.

Parameters:

Name	Type	Description	Default
`Q_obs`	`DataFrame or Series`	If provided, calls preprocessing automatically.	`None`
`sites`	`list of str`	Sites to use.	`None`
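The vine is fitted to PAR(1) residuals. The sketch below assumes that those residuals come from standardized flows z with a month-specific lag-1 coefficient, eps_t = z_t - phi_m z_{t-1}, with phi_m estimated per site as a no-intercept regression slope. Copula-fitting libraries such as pyvinecopulib work on data in (0, 1), so the sketch also converts residuals to rank-based pseudo-observations u = rank / (n + 1). Both function names are hypothetical, and this is not synhydro's code.

```python
import numpy as np


def par1_residuals(z, months):
    """Residuals of a periodic AR(1) and the (12, n_sites) coefficient array.

    z: standardized series, shape (n,) or (n, n_sites).
    months: calendar month (1-12) of each time step, shape (n,).
    """
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]
    months = np.asarray(months)
    cur, prev, m = z[1:], z[:-1], months[1:]
    phi = np.zeros((12, z.shape[1]))
    resid = np.empty_like(cur)
    for month in range(1, 13):
        sel = m == month
        if not sel.any():
            continue
        # No-intercept slope of z_t on z_{t-1} within this month, per site.
        phi[month - 1] = (cur[sel] * prev[sel]).sum(axis=0) / (prev[sel] ** 2).sum(axis=0)
        resid[sel] = cur[sel] - phi[month - 1] * prev[sel]
    return resid, phi


def pseudo_observations(x):
    """Column-wise ranks scaled into (0, 1): u = rank / (n + 1)."""
    x = np.asarray(x, dtype=float)
    ranks = np.argsort(np.argsort(x, axis=0), axis=0) + 1
    return ranks / (x.shape[0] + 1.0)
```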

generate ¶

generate(n_realizations: int = 1, n_years: Optional[int] = None, n_timesteps: Optional[int] = None, seed: Optional[int] = None, **kwargs) -> Ensemble

Generate synthetic monthly streamflow realizations.

Parameters:

Name	Type	Description	Default
`n_realizations`	`int`	Number of independent realizations.	`1`
`n_years`	`int`	Years per realization. Defaults to length of the historic record.	`None`
`n_timesteps`	`int`	Total monthly timesteps; overrides n_years if provided.	`None`
`seed`	`int`	Random seed for reproducibility.	`None`

Returns:

Type	Description
`Ensemble`	Synthetic flow realizations.
