
Generators

Base Class

Generator

Generator(*, name: Optional[str] = None, debug: bool = False)

Bases: ABC

Abstract base class for all synthetic generation methods.

All generator implementations should inherit from this class. Follows the scikit-learn pattern: __init__ configures the algorithm, fit(Q_obs) learns from data, generate() produces synthetic flows.

Class Attributes

supports_multisite : bool
    Whether this generator supports multiple sites. Default False.
supported_frequencies : tuple of str
    Pandas frequency strings this generator accepts (e.g., ('MS',)).

Initialize the generator with algorithm configuration.

Subclasses add algorithm-specific keyword-only parameters before name and debug. Data is not passed here — use fit(Q_obs) or preprocessing(Q_obs) instead.

Parameters:

Name Type Description Default
name str

Name identifier for this generator instance.

None
debug bool

Enable debug logging.

False

is_fitted property

is_fitted: bool

Check if generator is fitted.

is_preprocessed property

is_preprocessed: bool

Check if preprocessing is complete.

n_sites property

n_sites: int

Number of sites in the generator.

Returns:

Type Description
int

Number of sites.

Raises:

Type Description
ValueError

If preprocessing not yet run.

sites property

sites: List[str]

List of site names.

Returns:

Type Description
List[str]

Site identifiers.

Raises:

Type Description
ValueError

If preprocessing not yet run.

output_frequency abstractmethod property

output_frequency: str

Temporal frequency of generated output.

Returns:

Type Description
str

Pandas frequency string (e.g., 'MS' for monthly, 'D' for daily).

validate_input_data

validate_input_data(data: Union[Series, DataFrame]) -> pd.DataFrame

Validate and standardize input data format.

Checks type, DatetimeIndex, NaN content, negative values, data frequency, and minimum record length.

Parameters:

Name Type Description Default
data Series or DataFrame

Input time series data

required

Returns:

Type Description
DataFrame

Validated and standardized data

Raises:

Type Description
ValueError

If data format is invalid

TypeError

If data type is unsupported

validate_preprocessing

validate_preprocessing() -> None

Check if preprocessing has been completed.

Raises:

Type Description
ValueError

If preprocessing() has not been run.

validate_fit

validate_fit() -> None

Check if generator has been fitted.

Raises:

Type Description
ValueError

If fit() has not been run.

update_state

update_state(preprocessed: Optional[bool] = None, fitted: Optional[bool] = None) -> None

Update generator state flags.

Parameters:

Name Type Description Default
preprocessed bool

Set preprocessing state.

None
fitted bool

Set fitted state.

None

get_params

get_params(deep: bool = True) -> Dict[str, Any]

Get initialization parameters (scikit-learn style).

Returns only constructor/configuration parameters, not fitted values. Following scikit-learn convention for compatibility.

Parameters:

Name Type Description Default
deep bool

If True, return deep copy of parameters.

True

Returns:

Type Description
Dict[str, Any]

Dictionary of initialization parameters.

get_fitted_params

get_fitted_params() -> Dict[str, Any]

Get parameters learned from data during fit().

Returns:

Type Description
Dict[str, Any]

Dictionary of fitted parameters (all keys end with underscore).

Raises:

Type Description
ValueError

If generator has not been fitted yet.

summary

summary(show_fitted: bool = True) -> str

Generate comprehensive summary of generator configuration and fit.

Parameters:

Name Type Description Default
show_fitted bool

Whether to include fitted parameters in summary.

True

Returns:

Type Description
str

Formatted summary string.

get_state_info

get_state_info() -> Dict[str, Any]

Get complete state information including params and metadata.

Returns:

Type Description
Dict[str, Any]

Dictionary containing all generator state, parameters, and metadata.

save

save(filepath: str) -> None

Save fitted generator to file using pickle.

Parameters:

Name Type Description Default
filepath str

Path to save the generator.

required

Raises:

Type Description
ValueError

If generator is not fitted.

load classmethod

load(filepath: str) -> Generator

Load fitted generator from file.

Parameters:

Name Type Description Default
filepath str

Path to saved generator file.

required

Returns:

Type Description
Generator

Loaded generator instance.

preprocessing abstractmethod

preprocessing(Q_obs: Union[Series, DataFrame], *, sites: Optional[List[str]] = None, **kwargs: Any) -> None

Preprocess and validate observed flow data.

Implementations should:

1. Call _store_obs_data(Q_obs, sites) to validate and store data
2. Perform generator-specific data preparation
3. Call update_state(preprocessed=True) at end

Parameters:

Name Type Description Default
Q_obs Series or DataFrame

Observed historical flow data.

required
sites list of str

Sites to use. If None, uses all columns.

None
**kwargs Any

Additional preprocessing parameters.

{}

fit abstractmethod

fit(Q_obs: Optional[Union[Series, DataFrame]] = None, *, sites: Optional[List[str]] = None, **kwargs: Any) -> None

Fit the generator to observed flow data.

If Q_obs is provided, preprocessing() is called automatically. If omitted, a prior call to preprocessing() is required.

Parameters:

Name Type Description Default
Q_obs Series or DataFrame

Observed data. If provided, runs preprocessing automatically.

None
sites list of str

Sites to use (only when Q_obs is provided).

None
**kwargs Any

Additional fitting parameters.

{}

generate abstractmethod

generate(n_realizations: int = 1, n_years: Optional[int] = None, n_timesteps: Optional[int] = None, seed: Optional[int] = None, **kwargs: Any) -> Ensemble

Generate synthetic streamflow realizations.

Implementations should:

1. Call validate_fit() at start
2. Set random seed if provided
3. Generate synthetic flows
4. Return Ensemble object containing all realizations

Parameters:

Name Type Description Default
n_realizations int

Number of synthetic realizations to generate.

1
n_years int

Number of years to generate (alternative to n_timesteps).

None
n_timesteps int

Number of timesteps to generate explicitly.

None
seed int

Random seed for reproducibility.

None
**kwargs Any

Additional generation parameters.

{}

Returns:

Type Description
Ensemble

Generated synthetic flows as an Ensemble object.
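
The preprocessing, fit, and generate contracts described above can be sketched with a small stand-in class. This is a minimal illustration under stated assumptions: `ToyGenerator` is hypothetical, does not inherit from the SynHydro `Generator`, tracks state with plain flags instead of `update_state()`, and returns a NumPy array rather than an `Ensemble`.

```python
from typing import Optional

import numpy as np
import pandas as pd


class ToyGenerator:
    """Hypothetical stand-in mimicking the Generator contract; not part of SynHydro."""

    def __init__(self, *, name: Optional[str] = None, debug: bool = False):
        # __init__ only stores configuration; no data is passed here.
        self.name = name
        self.debug = debug
        self._preprocessed = False
        self._fitted = False

    def preprocessing(self, Q_obs, *, sites=None):
        # Validate and store data (a real subclass would call _store_obs_data).
        df = Q_obs.to_frame() if isinstance(Q_obs, pd.Series) else Q_obs
        self._Q_obs = df if sites is None else df[sites]
        # Flag preprocessing as complete (stands in for update_state).
        self._preprocessed = True

    def fit(self, Q_obs=None, *, sites=None):
        # If data is supplied, preprocessing runs automatically.
        if Q_obs is not None:
            self.preprocessing(Q_obs, sites=sites)
        if not self._preprocessed:
            raise ValueError("Run preprocessing() or pass Q_obs to fit().")
        # Fitted values carry a trailing underscore.
        self.mean_ = self._Q_obs.mean().to_numpy()
        self.std_ = self._Q_obs.std().to_numpy()
        self._fitted = True

    def generate(self, n_realizations=1, n_timesteps=12, seed=None):
        # Check fitted state first, then seed the random generator.
        if not self._fitted:
            raise ValueError("Run fit() first.")
        rng = np.random.default_rng(seed)
        shape = (n_realizations, n_timesteps, self.mean_.size)
        return self.mean_ + self.std_ * rng.standard_normal(shape)


idx = pd.date_range("2000-01-01", periods=24, freq="MS")
Q = pd.DataFrame({"site_a": np.arange(1.0, 25.0)}, index=idx)

gen = ToyGenerator()
gen.fit(Q)  # preprocessing is triggered because Q_obs is given
flows = gen.generate(n_realizations=3, n_timesteps=12, seed=0)
```

Keeping configuration in `__init__` and data in `fit()` is what lets the same configured object be refit to a different record without being rebuilt.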


KirschGenerator

KirschGenerator

KirschGenerator(*, generate_using_log_flow=True, matrix_repair_method='spectral', name=None, debug=False, **kwargs)

Bases: Generator

Kirsch nonparametric bootstrap generator for monthly streamflow synthesis.

Generates monthly synthetic flows using bootstrap resampling with correlation preservation via Cholesky decomposition.
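
As a generic illustration of how a Cholesky factor imposes a target correlation on independent samples, consider the sketch below. It is not the Kirsch pipeline itself: the target matrix and sample size are made up, and the factor convention (NumPy returns the lower-triangular factor) may differ from the one used inside the generator.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical target correlation between two series.
target = np.array([[1.0, 0.7],
                   [0.7, 1.0]])

# Lower-triangular factor L with L @ L.T == target.
L = np.linalg.cholesky(target)

# Independent standard normal samples, one row per draw.
z = rng.standard_normal((50_000, 2))

# Mixing the columns with L.T gives rows whose covariance is L @ L.T.
x = z @ L.T

sample_corr = np.corrcoef(x, rowvar=False)
```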

References

Kirsch, B.R., Characklis, G.W., and Zeff, H.B. (2013). Evaluating the impact of alternative hydro-climate scenarios on transfer agreements. Journal of Water Resources Planning and Management, 139(4), 396-406.

Initialize Kirsch generator.

Parameters:

Name Type Description Default
generate_using_log_flow bool

If True, generates in log-space for better handling of skewed distributions.

True
matrix_repair_method str

Method for repairing non-positive-definite correlation matrices.

'spectral'
name str

Name for this generator instance.

None
debug bool

Enable debug logging.

False

output_frequency property

output_frequency: str

Kirsch generator produces monthly output.

Q_obs_monthly property

Q_obs_monthly

Get observed monthly data (alias for Qm for consistency with other generators).

preprocessing

preprocessing(Q_obs, *, sites=None, timestep='monthly', **kwargs)

Preprocess observed data for Kirsch generation.

Parameters:

Name Type Description Default
Q_obs DataFrame

Observed historical flow data with DatetimeIndex.

required
sites list

Sites to use. If None, uses all sites.

None
timestep str

Currently only 'monthly' is supported.

'monthly'
**kwargs

Additional preprocessing parameters.

{}

fit

fit(Q_obs=None, *, sites=None, **kwargs)

Fit Kirsch generator to preprocessed data.

Parameters:

Name Type Description Default
Q_obs DataFrame

If provided, calls preprocessing automatically.

None
sites list

Sites to use (passed to preprocessing if Q_obs provided).

None
**kwargs

Additional fitting parameters.

{}

generate_from_indices

generate_from_indices(indices, n_years=None, as_array=True, synthetic_index=None)

Generate synthetic flows by directly specifying historical year indices.

This method allows external code (e.g., MOEA-FIND) to inject decision variables (year indices) instead of random sampling. Runs the full post-bootstrap pipeline: Cholesky, normal-score inversion, re-seasonalization.

Parameters:

Name Type Description Default
indices ndarray

Array of historical year indices to resample. Shape (n_years+1, 12) where each entry is in [0, n_historic_years). The extra year allows Dec-Jan cross-year correlation handling. Can be floats (will be cast to int).

required
n_years int

Number of years for the synthetic output. If None, inferred from indices.shape[0] - 1.

None
as_array bool

If True, returns numpy array; if False, returns pandas DataFrame.

True
synthetic_index DatetimeIndex

Custom DatetimeIndex for the output. If None, a default index is generated.

None

Returns:

Type Description
ndarray or DataFrame

Synthetic monthly flows with shape (n_years * 12, n_sites) if as_array=True, otherwise a pandas DataFrame.

Notes

This method assumes the generator has been fitted. Indices are treated as indices into the historic years array (self.historic_years or [0, 1, ..., n-1]).
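
A minimal sketch of building an index array with the documented shape follows. The record length and horizon are hypothetical, and the final call is left as a comment because it needs a fitted KirschGenerator, assumed here to be bound to the name `gen`.

```python
import numpy as np

n_historic_years = 30  # hypothetical length of the fitted record
n_years = 10           # desired synthetic horizon

rng = np.random.default_rng(42)

# One historic-year index per (synthetic year, month); the extra row
# provides the additional year used for Dec-Jan handling.
indices = rng.integers(0, n_historic_years, size=(n_years + 1, 12))

# With a fitted generator `gen` (assumed), the call would be:
# flows = gen.generate_from_indices(indices)
```

An optimizer would replace the random draw with its own decision variables, keeping the same shape and value range.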

generate_from_residuals

generate_from_residuals(residuals, as_array=True, synthetic_index=None)

Generate synthetic flows from pre-computed standardized residuals.

This method allows external code (e.g., MOEA-FIND) to inject decision variables (standardized residuals) directly, bypassing the bootstrap resampling step. Runs steps 4-8 of the Kirsch pipeline: normal-score transform, Cholesky, inverse normal-score, Dec-Jan combination, and re-seasonalization.

Parameters:

Name Type Description Default
residuals ndarray

Array of standardized residuals with shape (n_years, 12, n_sites). Each residual should be approximately N(0,1) or representable as such within month-specific empirical distributions.

required
as_array bool

If True, returns numpy array; if False, returns pandas DataFrame.

True
synthetic_index DatetimeIndex

Custom DatetimeIndex for the output. If None, a default index is generated.

None

Returns:

Type Description
ndarray or DataFrame

Synthetic monthly flows with shape (n_years * 12, n_sites) if as_array=True, otherwise a pandas DataFrame.

Notes

This method assumes the generator has been fitted. Residuals are assumed to be standardized residuals; they will be normal-score transformed, processed through Cholesky factors, and combined to preserve Dec-Jan correlations.

generate_single_series

generate_single_series(n_years, M=None, as_array=True, synthetic_index=None, rng=None)

Generate a single synthetic time series.

Parameters:

Name Type Description Default
n_years int

Number of years for the synthetic time series.

required
M ndarray

Bootstrap indices for the synthetic time series. If None, random indices will be generated.

None
as_array bool

If True, returns a numpy array; if False, returns a pandas DataFrame.

True
synthetic_index DatetimeIndex

Custom index for the synthetic time series. If None, a default index will be generated.

None

Returns:

Type Description
ndarray or DataFrame

Synthetic time series data.

generate

generate(n_realizations=1, n_years=None, n_timesteps=None, seed=None, **kwargs)

Generate an ensemble of synthetic monthly flows.

Parameters:

Name Type Description Default
n_realizations int

Number of synthetic time series to generate.

1
n_years int

Number of years for each synthetic time series. If None, uses the number of historic years.

None
n_timesteps int

Not used (Kirsch generates by years, not timesteps).

None
seed int

Random seed for reproducibility.

None
**kwargs

Additional generation parameters.

{}

Returns:

Type Description
Ensemble

Ensemble object containing all generated realizations.


KNNBootstrapGenerator

KNNBootstrapGenerator

KNNBootstrapGenerator(*, n_neighbors: Optional[int] = None, feature_cols: Optional[List[str]] = None, index_site: Optional[str] = None, block_size: int = 1, name: Optional[str] = None, debug: bool = False, **kwargs: Any)

Bases: Generator

K-Nearest Neighbor bootstrap generator for synthetic streamflow.

Conditionally resamples from historical record by finding K nearest neighbors to the current state and selecting successor values with Lall-Sharma kernel weights.
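
The kernel of Lall and Sharma (1996) assigns the j-th nearest neighbor a weight proportional to 1/j. The sketch below computes those weights together with the ceil(sqrt(n)) default for K documented for n_neighbors; the function name and the record length are illustrative and not taken from the library.

```python
import math

import numpy as np


def lall_sharma_weights(k: int) -> np.ndarray:
    """Weights proportional to 1/j for neighbor ranks j = 1..k, normalized to sum to 1."""
    ranks = np.arange(1, k + 1)
    w = 1.0 / ranks
    return w / w.sum()


n = 600                      # hypothetical number of historical timesteps
k = math.ceil(math.sqrt(n))  # default heuristic for K
weights = lall_sharma_weights(k)

# Draw the rank (0-based) of the neighbor whose successor is resampled.
rng = np.random.default_rng(0)
picked_rank = rng.choice(k, p=weights)
```

Because the weights decay with rank, the closest neighbors are picked most often while more distant ones still contribute some variability.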

References

Lall, U., and Sharma, A. (1996). A nearest neighbor bootstrap for resampling hydrologic time series. Water Resources Research, 32(3), 679-693.

Initialize KNN Bootstrap generator.

Parameters:

Name Type Description Default
n_neighbors int

Number of neighbors K. If None, uses ceil(sqrt(n)) where n is the number of historical timesteps.

None
feature_cols list

Column names to use as features for KNN search. If None, uses all columns.

None
index_site str

Site name to use for distance computation in multisite mode. If None, uses multivariate distance across all feature columns.

None
block_size int

Number of consecutive timesteps to resample as a block (1 = standard KNN).

1
name str

Name for this generator instance.

None
debug bool

Enable debug logging.

False
**kwargs Any

Additional parameters (stored but not used).

{}

output_frequency property

output_frequency: str

Return temporal frequency of generated output.

Detected from input data frequency (monthly or annual).

preprocessing

preprocessing(Q_obs, *, sites: Optional[List[str]] = None, **kwargs: Any) -> None

Preprocess and validate observed flow data.

Constructs feature vectors for KNN search and successor pairs. Also detects the temporal frequency of the data.

Parameters:

Name Type Description Default
Q_obs Series or DataFrame

Observed historical flow data with DatetimeIndex.

required
sites list

Sites to use. If None, uses all columns.

None
**kwargs Any

Additional preprocessing parameters.

{}

fit

fit(Q_obs=None, *, sites=None, **kwargs: Any) -> None

Fit KNN model(s) to preprocessed data.

For monthly data, fits 12 separate KNN models — one per calendar month — so that the neighbor search is conditioned on month (Rajagopalan & Lall 1999). For annual or daily data, fits a single global model.

Also computes Lall-Sharma kernel weights for neighbor selection.

Parameters:

Name Type Description Default
Q_obs Series or DataFrame

Observed historical flow data. If provided, preprocessing is called automatically.

None
sites list of str

Sites to use (only when Q_obs is provided).

None
**kwargs Any

Additional fitting parameters.

{}

generate

generate(n_realizations: int = 1, n_years: Optional[int] = None, n_timesteps: Optional[int] = None, seed: Optional[int] = None, **kwargs: Any) -> Ensemble

Generate synthetic streamflow realizations.

Uses KNN bootstrap with Lall-Sharma kernel weighting to conditionally resample from historical record.

Parameters:

Name Type Description Default
n_realizations int

Number of synthetic realizations to generate.

1
n_years int

Number of years to generate. If None, uses number of observed years.

None
n_timesteps int

Number of timesteps to generate explicitly. Overrides n_years if provided.

None
seed int

Random seed for reproducibility.

None
**kwargs Any

Additional generation parameters.

{}

Returns:

Type Description
Ensemble

Generated synthetic flows with metadata.


PhaseRandomizationGenerator

PhaseRandomizationGenerator

PhaseRandomizationGenerator(*, marginal: str = 'kappa', win_h_length: int = 15, name: Optional[str] = None, debug: bool = False, **kwargs)

Bases: Generator

Phase randomization generator for synthetic streamflow using Brunner et al. (2019).

Generates synthetic daily streamflow time series using Fourier transform phase randomization combined with the four-parameter kappa distribution. The method preserves both short- and long-range temporal dependence by conserving the power spectrum while randomizing phases.
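
The core idea of conserving the power spectrum while randomizing phases can be sketched with NumPy's real FFT. This is a generic surrogate-series illustration and not the library's implementation: it omits the per-day normal score transform and the kappa back-transformation, and the function name is hypothetical.

```python
import numpy as np


def phase_randomize(x: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Return a surrogate with the same amplitude spectrum as x and random phases."""
    n = x.size
    spec = np.fft.rfft(x)
    modulus = np.abs(spec)

    phases = rng.uniform(0.0, 2.0 * np.pi, size=spec.shape)
    # The zero-frequency term must stay real, so keep its original phase.
    phases[0] = np.angle(spec[0])
    # For even n the Nyquist term must stay real as well.
    if n % 2 == 0:
        phases[-1] = np.angle(spec[-1])

    return np.fft.irfft(modulus * np.exp(1j * phases), n=n)


rng = np.random.default_rng(0)
x = rng.standard_normal(730)  # stand-in for two 365-day years of normalized flow
surrogate = phase_randomize(x, rng)
```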

Attributes:

Name Type Description
par_day_ dict

Fitted kappa distribution parameters for each day of year (1-365). Each entry contains {'xi', 'alfa', 'k', 'h'}.

modulus_ ndarray

Amplitude spectrum (modulus of FFT) from fitted data.

phases_ ndarray

Phase spectrum from fitted data.

norm_ ndarray

Normalized/deseasonalized data after normal score transform.

Examples:

>>> import pandas as pd
>>> from synhydro.methods.generation.nonparametric import PhaseRandomizationGenerator
>>> Q_daily = pd.read_csv('daily_flows.csv', index_col=0, parse_dates=True)
>>> gen = PhaseRandomizationGenerator(marginal='kappa')
>>> gen.preprocessing(Q_daily)
>>> gen.fit()
>>> ensemble = gen.generate(n_realizations=100, seed=42)

Notes
  • Requires at least 2 years (730 days) of daily data
  • February 29 observations are removed to ensure consistent 365-day years
  • The method generates series of the same length as the observed data

Initialize the PhaseRandomizationGenerator.

Parameters:

Name Type Description Default
marginal str

Marginal distribution type for back-transformation:

  • 'kappa': Four-parameter kappa distribution (default, allows extrapolation)
  • 'empirical': Empirical distribution (no extrapolation beyond observed)

'kappa'
win_h_length int

Half-window length for daily distribution fitting. Values within ±win_h_length days are used, giving a total window of 2*win_h_length+1 days.

15
name str

Name identifier for this generator instance.

None
debug bool

Enable debug logging.

False
**kwargs dict

Additional parameters (currently unused).

{}
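
To make the win_h_length window concrete, the sketch below lists the days of year that would fall within ±15 days of a given day, for 31 days in total. The wrap-around across the year boundary is an assumption about how such a window would be built, and `doy_window` is a hypothetical helper, not a library function.

```python
import numpy as np


def doy_window(day: int, half_width: int = 15, n_days: int = 365) -> np.ndarray:
    """Days of year (1-based) within +/- half_width of `day`, wrapping around the year."""
    offsets = np.arange(-half_width, half_width + 1)
    return (day - 1 + offsets) % n_days + 1


window_jan1 = doy_window(1)    # spans late December through mid January
window_mid = doy_window(180)   # no wrap-around needed
```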

output_frequency property

output_frequency: str

Phase randomization generates daily output.

preprocessing

preprocessing(Q_obs, *, sites=None, **kwargs) -> None

Preprocess observed data for phase randomization generation.

Validates input data, removes leap days, and creates day-of-year index.

Parameters:

Name Type Description Default
Q_obs Series or DataFrame

Observed daily streamflow data with DatetimeIndex.

required
sites list

Sites to keep. If None, uses all columns.

None
**kwargs dict

Additional preprocessing parameters (currently unused).

{}

Raises:

Type Description
ValueError

If data has fewer than 730 days or has missing days.

fit

fit(Q_obs=None, *, sites=None, **kwargs) -> None

Fit the phase randomization model to observed data.

This method:

1. Fits kappa distribution parameters for each day of year (if marginal='kappa')
2. Applies normal score transform per day of year
3. Computes FFT and extracts modulus/phases

Parameters:

Name Type Description Default
Q_obs Series or DataFrame

If provided, calls preprocessing automatically.

None
sites list

Sites to keep. Passed to preprocessing if Q_obs is provided.

None
**kwargs dict

Additional fitting parameters (currently unused).

{}

generate

generate(n_realizations: int = 1, n_years: Optional[int] = None, n_timesteps: Optional[int] = None, seed: Optional[int] = None, **kwargs) -> Ensemble

Generate synthetic streamflow realizations using phase randomization.

Parameters:

Name Type Description Default
n_realizations int

Number of synthetic realizations to generate.

1
n_years int

Target length of each realization in years (365-day years, no leap days). When provided, independent phase-randomized chunks are concatenated until the target length is reached, then trimmed. When None the output length equals the observed record length.

None
n_timesteps int

Not used. Length is controlled via n_years.

None
seed int

Random seed for reproducibility.

None
**kwargs dict

Additional generation parameters (currently unused).

{}

Returns:

Type Description
Ensemble

Generated synthetic flows as an Ensemble object.


ThomasFieringGenerator

ThomasFieringGenerator

ThomasFieringGenerator(*, name: Optional[str] = None, debug: bool = False, **kwargs)

Bases: Generator

Thomas-Fiering autoregressive model for monthly streamflow generation.

Generates synthetic monthly streamflows using a lag-1 autoregressive model with Stedinger-Taylor normalization. Preserves monthly means, standard deviations, and lag-1 serial correlations.

Note: Thomas-Fiering is a univariate method (single site only).
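
The classical Thomas-Fiering lag-1 recursion can be sketched as follows. This is a textbook form under stated assumptions: the monthly statistics are made up, the function name is hypothetical, and the Stedinger-Taylor normalization and its back-transformation applied by the generator are omitted.

```python
import numpy as np


def thomas_fiering(mu, sigma, rho, n_years, seed=None):
    """Simulate monthly values with a lag-1 seasonal recursion.

    mu, sigma : length-12 monthly means and standard deviations.
    rho       : rho[m] is the correlation between month m and the next
                month (rho[11] links December to the following January).
    """
    rng = np.random.default_rng(seed)
    x = np.empty(n_years * 12)
    x[0] = mu[0] + sigma[0] * rng.standard_normal()
    for t in range(1, x.size):
        prev, cur = (t - 1) % 12, t % 12
        r = rho[prev]
        x[t] = (
            mu[cur]
            + r * sigma[cur] / sigma[prev] * (x[t - 1] - mu[prev])  # carry-over from last month
            + sigma[cur] * np.sqrt(1.0 - r**2) * rng.standard_normal()  # fresh noise
        )
    return x


mu = np.linspace(10.0, 21.0, 12)  # hypothetical monthly means
sigma = np.full(12, 2.0)          # hypothetical monthly standard deviations
rho = np.full(12, 0.5)            # hypothetical month-to-month correlations

series = thomas_fiering(mu, sigma, rho, n_years=10, seed=1)
```

The sqrt(1 - rho^2) factor scales the noise so that each month keeps its target standard deviation despite the carry-over term.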

Examples:

>>> import pandas as pd
>>> from synhydro.methods.generation.parametric.thomas_fiering import ThomasFieringGenerator
>>> Q_monthly = pd.read_csv('monthly_flows.csv', index_col=0, parse_dates=True)
>>> tf = ThomasFieringGenerator()
>>> tf.fit(Q_monthly.iloc[:, 0])
>>> ensemble = tf.generate(n_years=10, n_realizations=5)

References

Thomas, H.A., and Fiering, M.B. (1962). Mathematical synthesis of streamflow sequences for the analysis of river basins by simulation.

Stedinger, J.R., and Taylor, M.R. (1982). Synthetic streamflow generation: 1. Model verification and validation. Water Resources Research, 18(4), 909-918.

Initialize the ThomasFieringGenerator.

Parameters:

Name Type Description Default
name str

Name for this generator instance.

None
debug bool

Enable debug logging.

False
**kwargs dict

Additional parameters (currently unused).

{}

output_frequency property

output_frequency: str

Thomas-Fiering generator produces monthly output.

preprocessing

preprocessing(Q_obs, *, sites: Optional[list] = None, **kwargs) -> None

Preprocess observed data for Thomas-Fiering generation.

Validates input, resamples to monthly if needed, and applies Stedinger-Taylor normalization.

Parameters:

Name Type Description Default
Q_obs Series or DataFrame

Streamflow data with DatetimeIndex. Must be single site.

required
sites list

Not used (Thomas-Fiering is univariate).

None
**kwargs dict

Additional parameters (currently unused).

{}

fit

fit(Q_obs=None, *, sites=None, **kwargs) -> None

Estimate Thomas-Fiering model parameters from normalized flows.

Calculates monthly means, standard deviations, and lag-1 serial correlations from normalized flows.

Parameters:

Name Type Description Default
Q_obs Series or DataFrame

If provided, calls preprocessing automatically.

None
sites list

Sites to use (passed to preprocessing if Q_obs provided).

None
**kwargs dict

Additional parameters (currently unused).

{}

generate

generate(n_years: Optional[int] = None, n_realizations: int = 1, n_timesteps: Optional[int] = None, seed: Optional[int] = None, **kwargs) -> Ensemble

Generate synthetic monthly streamflows.

Parameters:

Name Type Description Default
n_years int

Number of years to generate per realization. If None, uses the length of historic data.

None
n_realizations int

Number of synthetic realizations to generate.

1
n_timesteps int

Number of monthly timesteps to generate. If provided, overrides n_years.

None
seed int

Random seed for reproducibility.

None
**kwargs dict

Additional parameters (currently unused).

{}

Returns:

Type Description
Ensemble

Ensemble object containing all realizations.

Raises:

Type Description
ValueError

If neither n_years nor n_timesteps is provided.


MatalasGenerator

MatalasGenerator

MatalasGenerator(*, log_transform: bool = True, name: Optional[str] = None, debug: bool = False, **kwargs)

Bases: Generator

Matalas (1967) multi-site monthly lag-1 autoregressive (MAR(1)) model.

The standard classical baseline for parametric multi-site stochastic generation. Extends the Thomas-Fiering univariate model to n sites using matrix autoregression, preserving contemporaneous cross-site correlations and lag-1 temporal structure at each site.

For each monthly transition m → m+1, generates:

Z(t+1) = A(m) · Z(t) + B(m) · ε(t+1)

where Z are standardized flows across all sites, ε ~ N(0, I), and A, B are coefficient matrices fitted from observed cross-correlations.

Parameters:

Name Type Description Default
log_transform bool

Apply a log(Q + 1) transformation before standardization to reduce skewness and better satisfy the normality assumption.

True
name str

Name for this generator instance.

None
debug bool

Enable debug logging.

False

Notes

The coefficient matrices are derived from the lag-0 and lag-1 cross-correlation matrices of the standardized flows:

A(m) = S₁(m) · S₀(m)⁻¹
B(m) · B(m)ᵀ = S₀(m+1) - A(m) · S₀(m) · A(m)ᵀ

where S₀(m) is the contemporaneous correlation matrix at month m and S₁(m) is the lag-1 cross-correlation between months m+1 and m. B(m) is the lower Cholesky factor of the residual covariance.
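
The two matrix equations above can be checked numerically for a single monthly transition. The sketch below uses made-up standardized flows for months m and m+1; the variable names are illustrative and the library's own estimation may include safeguards (such as matrix repair) that are not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
n_years, n_sites = 200, 3


def standardize(z):
    """Zero mean, unit variance per site (population std)."""
    return (z - z.mean(axis=0)) / z.std(axis=0)


# Hypothetical flows: a shared term creates cross-site correlation,
# and month m+1 depends partly on month m.
z_m = rng.standard_normal((n_years, n_sites)) + 0.5 * rng.standard_normal((n_years, 1))
z_next = 0.6 * z_m + rng.standard_normal((n_years, n_sites))
z_m, z_next = standardize(z_m), standardize(z_next)

S0_m = z_m.T @ z_m / n_years           # lag-0 correlation at month m
S0_next = z_next.T @ z_next / n_years  # lag-0 correlation at month m+1
S1 = z_next.T @ z_m / n_years          # lag-1 cross-correlation (m+1 versus m)

A = S1 @ np.linalg.inv(S0_m)
residual = S0_next - A @ S0_m @ A.T
B = np.linalg.cholesky(residual)       # lower-triangular factor

# One simulation step: Z(t+1) = A Z(t) + B eps
z_new = A @ z_m[-1] + B @ rng.standard_normal(n_sites)
```

If the residual matrix is not positive definite, for example with short records, `np.linalg.cholesky` raises `LinAlgError`, which is the situation matrix repair methods are meant to handle.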

Examples:

>>> gen = MatalasGenerator(log_transform=True)
>>> gen.fit(Q_monthly)
>>> ensemble = gen.generate(n_years=100, n_realizations=50, seed=42)

References

Matalas, N. C. (1967). Mathematical assessment of synthetic hydrology. Water Resources Research, 3(4), 937–945.

Salas, J. D., Delleur, J. W., Yevjevich, V., & Lane, W. L. (1980). Applied Modeling of Hydrologic Time Series. Water Resources Publications.

preprocessing

preprocessing(Q_obs, *, sites: Optional[list] = None, **kwargs) -> None

Validate input and resample to monthly frequency.

Parameters:

Name Type Description Default
Q_obs DataFrame or Series

Monthly streamflow with DatetimeIndex. Columns are sites.

required
sites list

Subset of site columns to use. Uses all columns if None.

None
**kwargs dict

Unused.

{}

fit

fit(Q_obs=None, *, sites=None, **kwargs) -> None

Estimate MAR(1) coefficient matrices from observed monthly flows.

For each of the 12 monthly transitions, computes lag-0 (S0) and lag-1 (S1) cross-correlation matrices then solves for A and B.

Parameters:

Name Type Description Default
Q_obs DataFrame or Series

If provided, calls preprocessing automatically.

None
sites list

Sites to use (passed to preprocessing if Q_obs provided).

None
**kwargs dict

Unused.

{}

generate

generate(n_years: Optional[int] = None, n_realizations: int = 1, n_timesteps: Optional[int] = None, seed: Optional[int] = None, **kwargs) -> Ensemble

Generate synthetic monthly streamflows at all sites.

Parameters:

Name Type Description Default
n_years int

Years per realization. Defaults to length of historic record.

None
n_realizations int

Number of independent synthetic sequences.

1
n_timesteps int

Total monthly timesteps; overrides n_years when provided.

None
seed int

Random seed for reproducibility.

None
**kwargs dict

Unused.

{}

Returns:

Type Description
Ensemble

Collection of synthetic realizations.


MultiSiteHMMGenerator

MultiSiteHMMGenerator

MultiSiteHMMGenerator(*, n_states: int = 2, offset: float = 1.0, max_iterations: int = 1000, covariance_type: str = 'full', name: Optional[str] = None, debug: bool = False, **kwargs)

Bases: Generator

Multi-site Hidden Markov Model generator for synthetic streamflow.

Generates synthetic streamflow using a Gaussian Mixture Model HMM that models temporal dependencies through hidden states and spatial correlations through multivariate Gaussian emissions with state-specific covariance matrices.

The method is particularly suited for capturing drought dynamics across multiple sites/basins simultaneously.

Parameters:

Name Type Description Default
n_states int

Number of hidden states. Default is 2 (dry/wet states).

2
offset float

Small value added before log transformation to handle zeros. Recommended: 1.0 for flows in standard units.

1.0
max_iterations int

Maximum iterations for HMM fitting convergence.

1000
covariance_type str

Type of covariance matrix:

  • 'full': Full covariance matrix (captures all correlations)
  • 'diag': Diagonal covariance (independent sites)
  • 'spherical': Single variance for all dimensions

'full'
name str

Name identifier for this generator instance.

None
debug bool

Enable debug logging.

False

Attributes:

Name Type Description
means_ ndarray

State means for each site. Shape: (n_states, n_sites).

covariances_ ndarray

Covariance matrices for each state. Shape: (n_states, n_sites, n_sites).

transition_matrix_ ndarray

State transition probability matrix. Shape: (n_states, n_states).

stationary_distribution_ ndarray

Stationary distribution of states. Shape: (n_states,).

Q_log_ ndarray

Log-transformed observed flows used for fitting.
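
For reference, a stationary distribution such as the one stored in stationary_distribution_ satisfies pi @ P = pi for the transition matrix P. The sketch below computes it as the left eigenvector with eigenvalue 1; the two-state matrix is made up, and this is one standard approach rather than a statement of how the library computes it.

```python
import numpy as np

# Hypothetical transition matrix; rows sum to 1 (row = current state).
P = np.array([[0.8, 0.2],
              [0.4, 0.6]])

# Left eigenvectors of P are the right eigenvectors of P.T.
eigvals, eigvecs = np.linalg.eig(P.T)
i = np.argmin(np.abs(eigvals - 1.0))

# Normalize to a probability vector (this also fixes the sign).
pi = np.real(eigvecs[:, i])
pi = pi / pi.sum()
```

Here the chain spends about two thirds of the time in state 0, since pi is (2/3, 1/3).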

Examples:

>>> import pandas as pd
>>> from synhydro.methods.generation.parametric import MultiSiteHMMGenerator
>>>
>>> # Load multi-site annual flows
>>> Q_annual = pd.read_csv('annual_flows.csv', index_col=0, parse_dates=True)
>>>
>>> # Initialize generator
>>> gen = MultiSiteHMMGenerator(n_states=2)
>>> gen.preprocessing(Q_annual)
>>> gen.fit()
>>>
>>> # Generate 100 realizations of 50 years each
>>> ensemble = gen.generate(n_realizations=100, n_years=50, seed=42)

Notes
  • Designed for annual timestep data (can handle other frequencies)
  • Log transformation ensures positive emissions
  • Full covariance preserves spatial correlations between sites
  • State ordering: states sorted by mean (low mean = dry state)

Initialize the MultiSiteHMMGenerator.

output_frequency property

output_frequency: str

Output frequency matches input frequency.

Typically used for annual data ('YS' or 'AS'), but flexible.

preprocessing

preprocessing(Q_obs, *, sites: Optional[List[str]] = None, **kwargs) -> None

Preprocess observed data for HMM fitting.

Applies offset and log transformation to handle zeros and ensure positive values for fitting.

Parameters:

Name Type Description Default
Q_obs Series or DataFrame

Observed streamflow data with DatetimeIndex.

required
sites List[str]

Subset of sites to use. If None, uses all columns.

None
**kwargs dict

Additional preprocessing parameters (currently unused).

{}

Raises:

Type Description
ValueError

If data has fewer than 2 sites for multi-site modeling.

fit

fit(Q_obs=None, *, sites=None, **kwargs) -> None

Fit the multi-site HMM to observed data.

Estimates hidden states, transition probabilities, state-specific means, and covariance matrices using the GMMHMM algorithm.

Parameters:

Name Type Description Default
Q_obs Series or DataFrame

Observed streamflow data. If provided, preprocessing is called automatically.

None
sites list of str

Sites to use (only when Q_obs is provided).

None
**kwargs dict

Additional fitting parameters. May include random_state for reproducible fitting.

{}

Notes

States are automatically ordered by mean (ascending), so state 0 represents the dry state and higher-numbered states represent progressively wetter states.
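
Reordering states by mean requires permuting the means and both axes of the transition matrix consistently. The sketch below shows one way to do this; ranking states by their mean across sites is an assumption, since the exact ordering criterion for multi-site means is not stated here, and all values are made up.

```python
import numpy as np

# Hypothetical fitted values where state 0 happens to be the wetter one.
means = np.array([[5.0, 6.0],    # state 0
                  [2.0, 3.0]])   # state 1
trans = np.array([[0.7, 0.3],
                  [0.4, 0.6]])

# Assumed criterion: rank states by their mean across sites.
order = np.argsort(means.mean(axis=1))

means_sorted = means[order]
# Rows and columns of the transition matrix are permuted together.
trans_sorted = trans[np.ix_(order, order)]
```

Permuting only the rows of the transition matrix would silently change the model, which is why both axes are reindexed.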

generate

generate(n_realizations: int = 1, n_years: Optional[int] = None, n_timesteps: Optional[int] = None, seed: Optional[int] = None, **kwargs) -> Ensemble

Generate synthetic streamflow realizations.

Parameters:

Name Type Description Default
n_realizations int

Number of synthetic realizations to generate.

1
n_years int

Number of years to generate. If provided with annual data, this equals n_timesteps.

None
n_timesteps int

Number of timesteps to generate explicitly. Takes precedence over n_years if both provided.

None
seed int

Random seed for reproducibility.

None
**kwargs dict

Additional generation parameters (currently unused).

{}

Returns:

Type Description
Ensemble

Generated synthetic flows as an Ensemble object.

Raises:

Type Description
ValueError

If neither n_years nor n_timesteps is provided.


WARMGenerator

WARMGenerator

WARMGenerator(*, wavelet: str = 'morl', scales: Optional[NDArray] = None, n_octaves: Optional[float] = None, n_voices: int = 8, s0: Optional[float] = None, ar_order: int = 1, n_ar_max: int = 5, ar_select: str = 'fixed', bands: Optional[List[Tuple[float, float]]] = None, background_spectrum: str = 'red', significance_level: float = 0.95, min_band_scales: int = 1, name: Optional[str] = None, debug: bool = False, **kwargs)

Bases: Generator

Wavelet Auto-Regressive Method (WARM) for non-stationary streamflow generation.

Implements the enhanced WARM framework of Nowak et al. (2011). The procedure decomposes an observed annual flow record into significant spectral bands via the continuous wavelet transform, removes the time-varying envelope by dividing each band-reconstructed signal by the square root of its Scale-Averaged Wavelet Power (SAWP), and fits AR(p) models to the resulting stationary signals (one per band plus a noise residual). Reversing the process synthesizes new traces with the same non-stationary spectral structure as the historic record.

Significance of spectral peaks is assessed using the chi-squared background spectrum framework of Torrence and Compo (1998), with either a white-noise or AR(1) red-noise background.
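As a rough illustration of the background test, the sketch below evaluates the normalized AR(1) spectrum of Torrence and Compo (1998) and the corresponding chi-squared threshold. It is a simplification under stated assumptions: it uses the two-degree-of-freedom form that applies to the local wavelet spectrum (the time-averaged global spectrum has more degrees of freedom), and the function names are hypothetical.

```python
import math


def lag1_autocorrelation(x):
    """Lag-1 autocorrelation, used as the AR(1) coefficient of the red-noise background."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    c1 = sum(dev[t] * dev[t - 1] for t in range(1, n))
    return c1 / c0


def background_spectrum(period, alpha):
    """Normalized AR(1) spectrum at a Fourier period (in sampling intervals).

    alpha = 0 gives the flat white-noise spectrum.
    """
    freq = 1.0 / period
    return (1.0 - alpha**2) / (
        1.0 + alpha**2 - 2.0 * alpha * math.cos(2.0 * math.pi * freq)
    )


def significance_threshold(period, alpha, variance, level=0.95):
    """Power threshold: variance * background * chi2_2(level) / 2."""
    # The chi-squared quantile with 2 degrees of freedom is -2 ln(1 - level).
    chi2 = -2.0 * math.log(1.0 - level)
    return variance * background_spectrum(period, alpha) * chi2 / 2.0
```

With a red background (alpha > 0) the threshold rises toward long periods, so low-frequency peaks must be stronger to be declared significant.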

Notes

The WARMGenerator is univariate. For multi-site simulation as described in Nowak et al. (2011, Section 2.4), apply this generator to an aggregate gauge time series and then disaggregate spatially using the proportional KNN method of Nowak et al. (2010), available in SynHydro as synhydro.methods.disaggregation.spatial.NowakDisaggregator.

Examples:

>>> import pandas as pd
>>> from synhydro.methods.generation.parametric.warm import WARMGenerator
>>> Q_annual = pd.read_csv('annual_flows.csv', index_col=0, parse_dates=True)
>>> warm = WARMGenerator(wavelet='morl', background_spectrum='red')
>>> warm.fit(Q_annual.iloc[:, [0]])
>>> ensemble = warm.generate(n_years=100, n_realizations=50, seed=42)

References

Nowak, K., Rajagopalan, B., and Zagona, E. (2011). A Wavelet Auto-Regressive Method (WARM) for multi-site streamflow simulation of data with non-stationary spectra. Journal of Hydrology, 410(1-2), 1-12.

Torrence, C., and Compo, G.P. (1998). A practical guide to wavelet analysis. Bulletin of the American Meteorological Society, 79(1), 61-78.

Initialize the WARM Generator.

Parameters:

Name Type Description Default
wavelet str

Wavelet type for the continuous wavelet transform. Supported with tabulated reconstruction constants: 'morl' (Morlet) and 'mexh' (Mexican Hat). Other PyWavelets continuous wavelets are accepted but will fall back to Morlet constants and emit a warning.

'morl'
scales array-like of float

Explicit scales (in units of the sampling period) at which to evaluate the CWT. If None, scales are constructed geometrically following Torrence and Compo (1998) using s0, n_voices, and n_octaves.

None
n_octaves float

Number of powers-of-two of scale to span. If None, defaults to log2(N / (2 * s0)) where N is the record length, capping the largest scale at half the record length.

None
n_voices int

Number of voices per octave. The scale spacing is delta_j = 1 / n_voices, so more voices give finer scale resolution. The default of 8 matches the Torrence and Compo (1998) recommendation for the Morlet wavelet.

8
s0 float

Smallest scale, in units of the sampling period. If None, defaults to 2, corresponding to a Fourier period of approximately 2 * dt.

None
ar_order int

Order of the autoregressive model fitted to each band's stationary component when ar_select='fixed'. Per Nowak et al. (2011), low-order AR models are usually adequate for the smooth band reconstructions.

1
n_ar_max int

Maximum AR order considered when ar_select='aic'.

5
ar_select {'fixed', 'aic'}

Strategy for choosing AR order. 'fixed' uses ar_order for every band. 'aic' selects the order in [1, n_ar_max] minimizing Akaike's information criterion.

'fixed'
bands list of (period_low, period_high) tuples

Explicit Fourier-period bands (in years) to model. Each tuple specifies the inclusive low and high period bounds of a band. If None (default), bands are auto-detected from contiguous significant peaks in the global wavelet spectrum at the chosen significance_level against the chosen background_spectrum.

None
background_spectrum {'red', 'white'}

Background spectrum for the chi-squared significance test of Torrence and Compo (1998). 'red' uses a theoretical AR(1) spectrum with lag-1 coefficient estimated from the record; 'white' uses a flat spectrum.

'red'
significance_level float

Confidence level (0 < level < 1) used to threshold the global wavelet spectrum for band detection.

0.95
min_band_scales int

Minimum number of contiguous scales above the significance threshold required to declare a band. Increase to suppress narrow single-scale spurious peaks.

1
name str

Name for this generator instance.

None
debug bool

Enable debug logging.

False
**kwargs dict

Additional parameters; ignored.

{}

Raises:

Type Description
ValueError

If ar_order < 1, n_ar_max < 1, ar_select not in {'fixed', 'aic'}, background_spectrum not in {'red', 'white'}, significance_level not in (0, 1), or wavelet not a recognized continuous wavelet.
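The geometric scale construction controlled by s0, n_voices, and n_octaves can be sketched as below. This is an illustration under assumptions rather than the library's code: build_scales is a hypothetical name, and the rounding of the number of scale steps (floor) is a guess.

```python
import math


def build_scales(n_obs, s0=2.0, n_voices=8, n_octaves=None):
    """Geometric scales s_j = s0 * 2**(j * delta_j), with delta_j = 1 / n_voices."""
    if n_octaves is None:
        # Default span: the largest scale is capped at half the record length.
        n_octaves = math.log2(n_obs / (2.0 * s0))
    delta_j = 1.0 / n_voices
    n_steps = int(math.floor(n_octaves / delta_j))  # rounding rule is an assumption
    return [s0 * 2.0 ** (j * delta_j) for j in range(n_steps + 1)]
```

For a 64-year record with the defaults this spans 4 octaves, from a scale of 2 up to 32 (half the record length), with 8 scales per octave.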

output_frequency property

output_frequency: str

Pandas frequency string of generated output (annual, year-start).

preprocessing

preprocessing(Q_obs, *, sites=None, **kwargs) -> None

Preprocess observed data for WARM fitting.

Validates input, ensures (or resamples to) annual frequency, and stores the resulting series on self.Q_obs_annual.

Parameters:

Name Type Description Default
Q_obs Series or DataFrame

Observed streamflow with a DatetimeIndex.

required
sites list of str

Sites to keep. If None, all columns are used. Because WARM is univariate, the selection must resolve to a single site.

None
**kwargs dict

Ignored.

{}

fit

fit(Q_obs=None, *, sites=None, **kwargs) -> None

Fit the WARM model to observed annual flows.

Steps follow Nowak et al. (2011) Sections 2.1-2.3:

  1. Compute the continuous wavelet transform on the mean-centered flow series.
  2. Compute the global wavelet spectrum and its chi-squared significance threshold against the chosen background spectrum (Torrence and Compo 1998).
  3. Identify significant spectral bands as contiguous runs of scales exceeding the threshold (or use user-supplied bands).
  4. For each band, compute the band-restricted SAWP (Eq. 5) and the band-reconstructed time-domain signal via the inverse CWT (Eq. 4).
  5. Divide the band-reconstructed signal by the square root of SAWP to obtain a stationary series and fit an AR(p) model.
  6. Form the noise residual as the observed series minus the sum of all band reconstructions, and fit an AR model to it.
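Step 3 above, together with the min_band_scales parameter, amounts to finding runs in a thresholded spectrum. The sketch below is one possible way to do that; find_bands and its arguments are hypothetical names, and the inputs are assumed to be equal-length sequences ordered by increasing period.

```python
def find_bands(periods, power, threshold, min_band_scales=1):
    """Return (period_low, period_high) for each contiguous run of scales
    whose power exceeds the threshold and spans at least min_band_scales scales."""
    bands = []
    start = None
    for i, (pw, th) in enumerate(zip(power, threshold)):
        if pw > th:
            if start is None:
                start = i  # a new run begins here
        elif start is not None:
            if i - start >= min_band_scales:
                bands.append((periods[start], periods[i - 1]))
            start = None
    # Close a run that extends to the last scale.
    if start is not None and len(power) - start >= min_band_scales:
        bands.append((periods[start], periods[len(power) - 1]))
    return bands
```

Raising min_band_scales from 1 to 2 drops runs that consist of a single scale, which is how narrow spurious peaks are suppressed.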

Parameters:

Name Type Description Default
Q_obs Series or DataFrame

If provided, preprocessing is called automatically.

None
sites list of str

Forwarded to preprocessing if Q_obs is given.

None
**kwargs dict

Ignored.

{}

generate

generate(n_years: Optional[int] = None, n_realizations: int = 1, n_timesteps: Optional[int] = None, seed: Optional[int] = None, **kwargs) -> Ensemble

Generate synthetic annual streamflows.

Parameters:

Name Type Description Default
n_years int

Number of years per realization. Defaults to the historical record length.

None
n_realizations int

Number of synthetic realizations to produce.

1
n_timesteps int

Synonym for n_years; if both are given, n_timesteps wins.

None
seed int

Seed for the random number generator (NumPy default_rng).

None
**kwargs dict

Ignored.

{}

Returns:

Type Description
Ensemble

Ensemble object containing all realizations.

Raises:

Type Description
ValueError

If n_years resolves to a non-positive value.


ARFIMAGenerator

ARFIMAGenerator

ARFIMAGenerator(*, p: int = 1, q: int = 0, d_method: str = 'whittle', truncation_lag: int = 100, deseasonalize: bool = True, auto_order: bool = False, name: Optional[str] = None, debug: bool = False, **kwargs)

Bases: Generator

Autoregressive Fractionally Integrated Moving Average (ARFIMA) generator for synthetic monthly/annual streamflow generation.

Generates synthetic streamflows using an ARFIMA model that captures long-range dependence through the fractional differencing parameter d in (0, 0.5). The model preserves the Hurst exponent, seasonal patterns (if monthly), and the autocorrelation structure.

The Hurst exponent H relates to the fractional differencing parameter via H = d + 0.5, providing direct parameterization of long-memory behavior.
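A short sketch of what the fractional differencing parameter does in practice: the operator (1 - B)^d expands into a slowly decaying set of weights, which are truncated at a finite lag. The function names are hypothetical and this is not the library's implementation.

```python
def hurst_from_d(d):
    """Hurst exponent implied by the fractional differencing parameter."""
    return d + 0.5


def frac_diff_weights(d, truncation_lag=100):
    """Coefficients pi_k of (1 - B)^d, truncated at lag K."""
    weights = [1.0]
    for k in range(1, truncation_lag + 1):
        # Recursion: pi_k = pi_{k-1} * (k - 1 - d) / k
        weights.append(weights[-1] * (k - 1 - d) / k)
    return weights


def frac_diff(x, d, truncation_lag=100):
    """Apply truncated fractional differencing to a sequence."""
    w = frac_diff_weights(d, truncation_lag)
    return [
        sum(w[k] * x[t - k] for k in range(min(t, truncation_lag) + 1))
        for t in range(len(x))
    ]
```

Setting d = 1 recovers ordinary first differencing (weights 1, -1, 0, ...), while values of d between 0 and 0.5 give weights that decay slowly, which is the source of the long memory.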

Examples:

>>> import pandas as pd
>>> from synhydro.methods.generation.parametric.arfima import ARFIMAGenerator
>>> Q_monthly = pd.read_csv('monthly_flows.csv', index_col=0, parse_dates=True)
>>> arfima = ARFIMAGenerator()
>>> arfima.preprocessing(Q_monthly.iloc[:, 0])
>>> arfima.fit()
>>> ensemble = arfima.generate(n_years=50, n_realizations=100)

References

Hosking, J.R.M. (1984). Modeling persistence in hydrological time series using fractional differencing. Water Resources Research, 20(12), 1898-1908. https://doi.org/10.1029/WR020i012p01898

Initialize the ARFIMAGenerator.

Parameters:

Name Type Description Default
p int

AR order for the short-memory ARMA(p,q) component.

1
q int

MA order for the short-memory ARMA(p,q) component.

0
d_method str

Method for estimating d: 'whittle' (frequency domain MLE), 'gph' (Geweke-Porter-Hudak), or 'rs' (R/S analysis).

'whittle'
truncation_lag int

Truncation lag K for fractional differencing coefficients.

100
deseasonalize bool

Remove seasonal component (monthly means/stds) before fitting. Set False for annual data.

True
auto_order bool

If True, select (p, q) via a BIC grid search over p in {0, 1, 2} and q in {0, 1, 2}, overriding any user-supplied p and q values. BIC is used because it is proven consistent for ARFIMA (Huang et al. 2022, Annals of Statistics).

False
name str

Name identifier for this generator instance.

None
debug bool

Enable debug logging.

False
**kwargs dict

Additional parameters (stored in init_params).

{}
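The deseasonalize option standardizes each value by the mean and standard deviation of its calendar month. A minimal sketch of that transform and its inverse follows; the helper names are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption.

```python
import math


def monthly_stats(values, months):
    """Mean and standard deviation of the values falling in each calendar month."""
    stats = {}
    for m in sorted(set(months)):
        sample = [v for v, mm in zip(values, months) if mm == m]
        mean = sum(sample) / len(sample)
        var = sum((v - mean) ** 2 for v in sample) / (len(sample) - 1)
        stats[m] = (mean, math.sqrt(var))
    return stats


def deseasonalize(values, months, stats):
    """Standardize each value by its month's mean and standard deviation."""
    return [(v - stats[m][0]) / stats[m][1] for v, m in zip(values, months)]


def reseasonalize(z, months, stats):
    """Invert deseasonalize: scale by the monthly std and add the monthly mean."""
    return [stats[m][0] + stats[m][1] * v for v, m in zip(z, months)]
```

The model is fitted to the standardized series, and reseasonalize is applied to simulated values so the synthetic flows regain the monthly cycle.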

output_frequency property

output_frequency: str

Return output frequency based on input data.

preprocessing

preprocessing(Q_obs, *, sites=None, **kwargs) -> None

Preprocess observed data for ARFIMA generation.

Validates input, ensures univariate data, optionally deseasonalizes for monthly data, and checks stationarity.

Parameters:

Name Type Description Default
Q_obs Series or DataFrame

Observed historical flow data.

required
sites list

Sites to keep. If None, uses all columns.

None
**kwargs dict

Additional preprocessing parameters.

{}

Raises:

Type Description
ValueError

If data has insufficient length or multiple sites.

fit

fit(Q_obs=None, *, sites=None, **kwargs) -> None

Estimate ARFIMA model parameters from preprocessed data.

Sequence:

  1. Estimate fractional differencing parameter d using specified method.
  2. Apply fractional differencing to obtain differenced series.
  3. Fit ARMA(p,q) to differenced series using Yule-Walker equations.
  4. Store all fitted parameters.
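For the pure autoregressive case (q = 0, the default), the Yule-Walker step can be sketched with the Levinson-Durbin recursion. This is an illustrative sketch with hypothetical function names; it does not cover the moving-average part and is not taken from the library.

```python
def autocovariance(x, max_lag):
    """Biased sample autocovariances gamma_0..gamma_max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / n
        for k in range(max_lag + 1)
    ]


def yule_walker(x, p):
    """Solve the Yule-Walker equations for AR(p) by Levinson-Durbin.

    Returns (phi, sigma2): coefficients phi_1..phi_p and the innovation variance.
    """
    gamma = autocovariance(x, p)
    phi = []
    sigma2 = gamma[0]
    for k in range(1, p + 1):
        acc = gamma[k] - sum(phi[j] * gamma[k - 1 - j] for j in range(k - 1))
        reflection = acc / sigma2
        # Update the lower-order coefficients, then append the new one.
        phi = [phi[j] - reflection * phi[k - 2 - j] for j in range(k - 1)]
        phi.append(reflection)
        sigma2 *= 1.0 - reflection**2
    return phi, sigma2
```

For p = 1 this reduces to the lag-1 autocorrelation, and the returned innovation variance is the quantity an information criterion would use to compare candidate orders.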

Parameters:

Name Type Description Default
Q_obs Series or DataFrame

If provided, calls preprocessing automatically.

None
sites list

Sites to keep. Passed to preprocessing if Q_obs is provided.

None
**kwargs dict

Additional fitting parameters.

{}

Raises:

Type Description
ValueError

If fitting fails (e.g., ARMA estimation error).

generate

generate(n_realizations: int = 1, n_years: Optional[int] = None, n_timesteps: Optional[int] = None, seed: Optional[int] = None, **kwargs) -> Ensemble

Generate synthetic streamflow realizations.

Sequence:

  1. Generate white noise innovations.
  2. Apply AR recursion to obtain ARMA differenced series W_t.
  3. Invert fractional differencing via MA convolution (FIR filter) to recover X_t.
  4. Re-seasonalize if monthly.
  5. Return as Ensemble.
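Steps 1 to 3 can be sketched for the simplest case of an AR(1) short-memory component. The sketch assumes Gaussian innovations and a warm-up of truncation_lag values for the filter; simulate_arfima and inverse_frac_diff_weights are hypothetical names, and re-seasonalization and Ensemble construction are left out.

```python
import random


def inverse_frac_diff_weights(d, truncation_lag=100):
    """Coefficients psi_k of (1 - B)^(-d), truncated at lag K."""
    weights = [1.0]
    for k in range(1, truncation_lag + 1):
        # Recursion: psi_k = psi_{k-1} * (k - 1 + d) / k
        weights.append(weights[-1] * (k - 1 + d) / k)
    return weights


def simulate_arfima(n, d, phi, sigma=1.0, truncation_lag=100, seed=None):
    """Simulate n values of an ARFIMA(1, d, 0) process."""
    rng = random.Random(seed)
    psi = inverse_frac_diff_weights(d, truncation_lag)
    # Steps 1-2: white-noise innovations passed through the AR(1) recursion.
    w = []
    prev = 0.0
    for _ in range(n + truncation_lag):  # extra values warm up the filter
        prev = phi * prev + rng.gauss(0.0, sigma)
        w.append(prev)
    # Step 3: FIR filter with the psi weights inverts the fractional differencing.
    return [
        sum(psi[k] * w[t - k] for k in range(truncation_lag + 1))
        for t in range(truncation_lag, n + truncation_lag)
    ]
```

With d = 0 the filter weights collapse to (1, 0, 0, ...) and the output is just the AR(1) series, which is a convenient sanity check.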

Parameters:

Name Type Description Default
n_realizations int

Number of synthetic realizations to generate.

1
n_years int

Number of years to generate. If None, uses length of training data.

None
n_timesteps int

Number of timesteps to generate. Overrides n_years if provided.

None
seed int

Random seed for reproducibility.

None
**kwargs dict

Additional parameters (unused).

{}

Returns:

Type Description
Ensemble

Generated synthetic flows as an Ensemble object.

Raises:

Type Description
ValueError

If neither n_years nor n_timesteps is provided.


SPARTAGenerator

SPARTAGenerator

SPARTAGenerator(*, nataf_method: str = 'GH', nataf_n_eval: int = 9, nataf_poly_deg: int = 6, nataf_gh_nodes: int = 21, marginal_method: str = 'parametric', matrix_repair_method: str = 'spectral', name: Optional[str] = None, debug: bool = False, **kwargs: Any)

Bases: Generator

Stochastic Periodic AutoRegressive To Anything generator.

Generates multisite cyclostationary synthetic timeseries at monthly resolution, using per-month marginal distributions and a PAR(1)-N auxiliary Gaussian model with Nataf ICDF mapping.
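The reason a Nataf step is needed is that correlation is not preserved when Gaussian variates are mapped through an inverse CDF. The sketch below estimates the correlation that results in the target domain for a given Gaussian-domain correlation by Monte Carlo. It is an illustration only: the function names and the lognormal marginal are assumptions, and it does not reproduce the library's Gauss-Hermite or polynomial fitting.

```python
import math
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def lognormal_icdf(u, sigma=0.5):
    """Inverse CDF of a lognormal marginal with log-mean 0 (illustrative choice)."""
    return math.exp(sigma * _STD_NORMAL.inv_cdf(u))


def nataf_correlation_mc(rho_z, icdf_a, icdf_b, n_samples=50_000, seed=0):
    """Monte Carlo estimate of the target-domain correlation produced by a
    Gaussian-domain correlation rho_z."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n_samples):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho_z * z1 + math.sqrt(1.0 - rho_z**2) * rng.gauss(0.0, 1.0)
        # Map each Gaussian variate to its target marginal: x = F^-1(Phi(z)).
        xs.append(icdf_a(_STD_NORMAL.cdf(z1)))
        ys.append(icdf_b(_STD_NORMAL.cdf(z2)))
    mx, my = sum(xs) / n_samples, sum(ys) / n_samples
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

For two lognormal marginals the relation has a closed form, (exp(rho_z * sigma^2) - 1) / (exp(sigma^2) - 1), so the estimate can be checked; in a generator the relation is inverted to find the Gaussian correlation that yields the observed one.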

Parameters:

Name Type Description Default
nataf_method str

Nataf evaluation method: "GH" (default), "MC", or "Int".

'GH'
nataf_n_eval int

Number of support points for Nataf polynomial fitting (default 9).

9
nataf_poly_deg int

Polynomial degree for Nataf approximation (default 6).

6
nataf_gh_nodes int

Gauss-Hermite quadrature nodes (default 21).

21
marginal_method str

Marginal fitting: "parametric" (default, gamma/lognorm BIC).

'parametric'
matrix_repair_method str

Method for repairing non-PD matrices (default "spectral").

'spectral'
name str

Generator name.

None
debug bool

Enable debug logging (default False).

False

output_frequency property

output_frequency: str

Monthly frequency.

preprocessing

preprocessing(Q_obs: Optional[Union[Series, DataFrame]] = None, *, sites: Optional[List[str]] = None, **kwargs: Any) -> None

Validate and prepare monthly data.

Parameters:

Name Type Description Default
Q_obs Series or DataFrame

Observed monthly streamflow.

None
sites list of str

Subset of site names.

None

fit

fit(Q_obs: Optional[Union[Series, DataFrame]] = None, *, sites: Optional[List[str]] = None, **kwargs: Any) -> None

Fit the SPARTA model to observed monthly data.

Parameters:

Name Type Description Default
Q_obs Series or DataFrame

If provided, calls preprocessing first.

None
sites list of str

Subset of site names.

None

generate

generate(n_realizations: int = 1, n_years: Optional[int] = None, n_timesteps: Optional[int] = None, seed: Optional[int] = None, **kwargs: Any) -> Ensemble

Generate synthetic monthly timeseries.

Parameters:

Name Type Description Default
n_realizations int

Number of realizations (default 1).

1
n_years int

Number of years. Defaults to observed length.

None
n_timesteps int

Total months. Overrides n_years.

None
seed int

Random seed.

None

Returns:

Type Description
Ensemble

Generated synthetic data.


SMARTAGenerator

SMARTAGenerator

SMARTAGenerator(*, acf_model: str = 'cas', sma_order: int = 512, nataf_method: str = 'GH', nataf_n_eval: int = 9, nataf_poly_deg: int = 8, nataf_gh_nodes: int = 21, marginal_method: str = 'parametric', matrix_repair_method: str = 'spectral', name: Optional[str] = None, debug: bool = False, **kwargs: Any)

Bases: Generator

Symmetric Moving Average (neaRly) To Anything generator.

Generates multisite stationary synthetic timeseries at annual resolution with arbitrary marginal distributions and any-range autocorrelation structure via the SMA model with Nataf ICDF mapping.
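To give a sense of what a long-range autocorrelation structure looks like, the sketch below evaluates the autocorrelation function of fractional Gaussian noise for a given Hurst exponent. Whether the "hurst" option of acf_model uses exactly this form is an assumption, and hurst_acf is a hypothetical name.

```python
def hurst_acf(lag, hurst):
    """Autocorrelation of fractional Gaussian noise at an integer lag."""
    k = abs(lag)
    h2 = 2.0 * hurst
    return 0.5 * (abs(k + 1) ** h2 - 2.0 * k**h2 + abs(k - 1) ** h2)
```

For H = 0.5 the correlation vanishes at every non-zero lag (white noise), while for H = 0.8 it decays as a power law, so noticeable correlation remains at lags of several decades.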

Parameters:

Name Type Description Default
acf_model str

Autocorrelation model: "cas" (default), "hurst", or "custom".

'cas'
sma_order int

SMA truncation order q (default 512, should be power of 2).

512
nataf_method str

Nataf evaluation method: "GH" (default), "MC", or "Int".

'GH'
nataf_n_eval int

Number of support points for Nataf polynomial fitting (default 9).

9
nataf_poly_deg int

Polynomial degree for Nataf approximation (default 8).

8
nataf_gh_nodes int

Gauss-Hermite quadrature nodes (default 21).

21
marginal_method str

Marginal fitting method: "parametric" (default, gamma/lognorm BIC).

'parametric'
matrix_repair_method str

Method for repairing non-PD matrices: "spectral" (default), "nearest", or "hypersphere".

'spectral'
name str

Generator name.

None
debug bool

Enable debug logging (default False).

False

output_frequency property

output_frequency: str

Annual frequency.

preprocessing

preprocessing(Q_obs: Optional[Union[Series, DataFrame]] = None, *, sites: Optional[List[str]] = None, **kwargs: Any) -> None

Validate and prepare annual data.

Parameters:

Name Type Description Default
Q_obs Series or DataFrame

Observed streamflow. If not provided, uses data from constructor.

None
sites list of str

Subset of site names to use.

None

fit

fit(Q_obs: Optional[Union[Series, DataFrame]] = None, *, sites: Optional[List[str]] = None, **kwargs: Any) -> None

Fit the SMARTA model to observed data.

Parameters:

Name Type Description Default
Q_obs Series or DataFrame

If provided, calls preprocessing first.

None
sites list of str

Subset of site names.

None

generate

generate(n_realizations: int = 1, n_years: Optional[int] = None, n_timesteps: Optional[int] = None, seed: Optional[int] = None, **kwargs: Any) -> Ensemble

Generate synthetic annual timeseries.

Parameters:

Name Type Description Default
n_realizations int

Number of realizations to generate (default 1).

1
n_years int

Number of years per realization. Defaults to observed length.

None
n_timesteps int

Alias for n_years at annual resolution.

None
seed int

Random seed for reproducibility.

None

Returns:

Type Description
Ensemble

Generated synthetic data.


HMMKNNGenerator

HMMKNNGenerator

HMMKNNGenerator(*, n_states: int = 2, delta: float = 1.0, covariance_type: str = 'full', n_init: int = 10, name: Optional[str] = None, debug: bool = False, **kwargs)

Bases: Generator

HMM-KNN generator for synthetic annual multisite streamflow.

Combines a Gaussian Hidden Markov Model for regime sequencing with K-Nearest Neighbor bootstrapping for within-regime resampling. Regime transitions are governed by a learned Markov transition matrix. For each synthetic year the generator identifies the regime-transition category (previous state, current state), searches the historical record for analog years within that category using normalized log-flow distances, and resamples the full multisite flow vector from one of the K nearest analogs.
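The analog search can be sketched as a standard K-nearest-neighbor resampling step. Several details here are assumptions, not facts from this page: K defaults to the square root of the pool size, neighbors are drawn with weights proportional to 1/rank, and knn_resample is a hypothetical name.

```python
import math
import random


def knn_resample(target, pool, log_std, k=None, rng=None):
    """Return the index of one of the k pool members nearest to target.

    target  : log-flow vector (one value per site)
    pool    : list of candidate log-flow vectors from the same transition category
    log_std : per-site standard deviations used to normalize the distance
    """
    rng = rng or random.Random()
    if k is None:
        k = max(1, round(math.sqrt(len(pool))))  # common heuristic (assumption)
    distances = [
        math.sqrt(sum(((t - c) / s) ** 2 for t, c, s in zip(target, cand, log_std)))
        for cand in pool
    ]
    nearest = sorted(range(len(pool)), key=distances.__getitem__)[:k]
    # Closer analogs get larger weight: w_i proportional to 1 / rank.
    weights = [1.0 / rank for rank in range(1, len(nearest) + 1)]
    return rng.choices(nearest, weights=weights, k=1)[0]
```

The returned index identifies a historical year, and the full multisite flow vector of that year is what gets resampled, which keeps the spatial pattern intact.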

Parameters:

Name Type Description Default
n_states int

Number of hidden hydrologic regimes. State 0 is the driest regime.

2
delta float

Additive offset applied before log transformation to handle near-zero flows. Must be positive.

1.0
covariance_type str

Covariance structure for the Gaussian HMM emissions. One of 'full', 'diag', or 'spherical'. 'full' preserves all inter-site correlations within each state.

'full'
n_init int

Number of random initializations for HMM fitting. The fit with the highest log-likelihood is retained.

10
name str

Name identifier for this generator instance.

None
debug bool

Enable debug-level logging.

False

Attributes:

Name Type Description
transition_matrix_ np.ndarray of shape (n_states, n_states)

Learned HMM transition probability matrix.

stationary_distribution_ np.ndarray of shape (n_states,)

Stationary distribution of the Markov chain.

state_sequence_ np.ndarray of shape (N,)

Viterbi state assignment for each year of the historical record.

log_std_ np.ndarray of shape (n_sites,)

Per-site standard deviation of historical log-flows, used to normalize distances in KNN search.

Q_log_ np.ndarray of shape (N, n_sites)

Log-transformed historical flows.
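The relation between transition_matrix_ and stationary_distribution_ can be illustrated with a short power iteration. This is a sketch with a hypothetical function name, assuming a row-stochastic matrix and an ergodic chain; the library may compute it differently (for example from an eigendecomposition).

```python
def stationary_distribution(transition, n_iter=1000, tol=1e-12):
    """Power iteration for pi satisfying pi = pi P, where rows of P sum to one."""
    n = len(transition)
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        new = [sum(pi[i] * transition[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi
```

For a two-state chain with P = [[0.9, 0.1], [0.3, 0.7]] the result is (0.75, 0.25), meaning the chain spends three quarters of the time in state 0 in the long run.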

Examples:

>>> import pandas as pd
>>> from synhydro.methods.generation.parametric import HMMKNNGenerator
>>>
>>> Q_annual = pd.read_csv('annual_flows.csv', index_col=0, parse_dates=True)
>>>
>>> gen = HMMKNNGenerator(n_states=2)
>>> gen.fit(Q_annual)
>>>
>>> ensemble = gen.generate(n_realizations=100, n_years=50, seed=42)

Initialize the HMMKNNGenerator.

output_frequency property

output_frequency: str

Temporal frequency of generated output.

Returns:

Type Description
str

Always 'YS' (annual start).

preprocessing

preprocessing(Q_obs: Union[Series, DataFrame], *, sites: Optional[List[str]] = None, **kwargs) -> None

Preprocess observed annual flow data for HMM-KNN fitting.

Applies the log transform Y = log(Q + delta) and stores the result. The preprocessed data are used by fit().

Parameters:

Name Type Description Default
Q_obs Series or DataFrame

Observed annual streamflow with DatetimeIndex.

required
sites list of str

Subset of sites to use. If None, all columns are used.

None
**kwargs dict

Additional preprocessing parameters (unused).

{}

Raises:

Type Description
ValueError

If log-transformed data contain non-finite values.

fit

fit(Q_obs: Optional[Union[Series, DataFrame]] = None, *, sites: Optional[List[str]] = None, **kwargs) -> None

Fit the HMM-KNN model to observed annual flow data.

Runs n_init random HMM initializations and retains the fit with the highest log-likelihood. Decodes the historical state sequence via the Viterbi algorithm, reorders states by ascending mean log-flow at the first site (state 0 = driest), and builds KNN pool index structures.

Parameters:

Name Type Description Default
Q_obs Series or DataFrame

Observed annual streamflow. If provided, preprocessing() is called automatically. If None, preprocessing() must have been called first.

None
sites list of str

Sites to use (only applied when Q_obs is provided).

None
**kwargs dict

Additional fitting parameters (unused).

{}

generate

generate(n_realizations: int = 1, n_years: Optional[int] = None, n_timesteps: Optional[int] = None, seed: Optional[int] = None, **kwargs) -> Ensemble

Generate synthetic annual streamflow realizations.

Parameters:

Name Type Description Default
n_realizations int

Number of independent synthetic sequences to generate.

1
n_years int

Number of years per realization. If None and n_timesteps is also None, defaults to the length of the historical record.

None
n_timesteps int

Explicit number of timesteps. Takes precedence over n_years.

None
seed int

Random seed for reproducibility.

None
**kwargs dict

Additional generation parameters (unused).

{}

Returns:

Type Description
Ensemble

Generated synthetic flows.

Raises:

Type Description
ValueError

If the generator has not been fitted.


MultisitePhaseRandomizationGenerator

MultisitePhaseRandomizationGenerator

MultisitePhaseRandomizationGenerator(*, wavelet: str = 'cmor1.5-1.0', n_scales: int = 100, win_h_length: int = 15, transform: str = 'mean_center', name: Optional[str] = None, debug: bool = False, **kwargs: Any)

Bases: Generator

Multisite wavelet phase randomization generator (Brunner and Gilleland, 2020).

Generates synthetic daily streamflow at multiple sites using a shared wavelet (CWT) phase structure. Each site's power spectrum (CWT amplitude) is preserved from the observed record, while spatial correlation is maintained by applying identical random phases -- drawn from a single white-noise CWT -- to all sites simultaneously.
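The idea of keeping per-site amplitudes while sharing phases can be shown on plain lists of coefficients. This is a deliberately simplified sketch: it draws uniform random phases directly instead of taking them from the CWT of a white-noise series, it works on one coefficient vector per site rather than a scale-by-time array, and randomize_phases is a hypothetical name.

```python
import cmath
import math
import random


def randomize_phases(amplitudes_by_site, seed=None):
    """Combine each site's observed amplitudes with one shared set of random phases."""
    rng = random.Random(seed)
    n_coef = len(next(iter(amplitudes_by_site.values())))
    # One phase per coefficient, drawn once and reused for every site.
    shared_phases = [rng.uniform(-math.pi, math.pi) for _ in range(n_coef)]
    return {
        site: [a * cmath.exp(1j * ph) for a, ph in zip(amps, shared_phases)]
        for site, amps in amplitudes_by_site.items()
    }
```

Because every site receives the same phase at a given coefficient, the timing of high and low flows stays aligned across sites, while each site's amplitudes remain its own.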

Attributes:

Name Type Description
par_day_ dict of dict

Fitted kappa distribution parameters for each site and day of year. Keyed by site name, then day-of-year integer (1-365). Each leaf entry contains {'xi', 'alfa', 'k', 'h'}.

cwt_amplitudes_ dict of np.ndarray

Per-site CWT amplitude spectra of shape (n_scales, N). Keyed by site name.

norm_ dict of np.ndarray

Per-site pre-CWT series (mean-centered or normal-score) of length N. Keyed by site name.

obs_mean_ dict of float

Per-site global mean subtracted during mean-center transform. Empty when transform='normal_score'.

scales_ ndarray

CWT scales used, shape (n_scales,).

delta_j_ float

Log-scale spacing (constant for geometrically spaced scales).

Examples:

>>> import pandas as pd
>>> from synhydro.methods.generation.nonparametric import (
...     MultisitePhaseRandomizationGenerator,
... )
>>> Q_daily = pd.read_csv('daily_flows.csv', index_col=0, parse_dates=True)
>>> gen = MultisitePhaseRandomizationGenerator()
>>> gen.preprocessing(Q_daily)
>>> gen.fit()
>>> ensemble = gen.generate(n_realizations=100, seed=42)

Notes

  • Requires at least 2 years (730 days) of daily data per site.
  • February 29 observations are removed before fitting.
  • After leap-day removal, the record length must be a multiple of 365.
  • All sites must share the same DatetimeIndex.
  • The generator produces realizations of the same length as the observed record unless n_years is specified.

Initialize the MultisitePhaseRandomizationGenerator.

Parameters:

Name Type Description Default
wavelet str

PyWavelets continuous wavelet identifier. The complex Morlet wavelet 'cmor1.5-1.0' (bandwidth 1.5, center frequency 1.0) is recommended.

'cmor1.5-1.0'
n_scales int

Number of CWT scales, spaced log-uniformly from 2 to N/8 where N is the record length in days.

100
win_h_length int

Half-window length (days) for per-day-of-year kappa fitting. Values within ±win_h_length days of each target day are pooled, giving a total window of 2*win_h_length+1 days.

15
transform str

Transform applied to each site's observed series before computing the CWT. Options:

  • 'mean_center': subtract the global site mean, matching the Brunner and Gilleland (2020) PRSim reference implementation.
  • 'normal_score': apply the per-day-of-year Van der Waerden normal-score transform, producing a more Gaussian CWT input.

The kappa marginal fitting always uses the raw (untransformed) flow values regardless of this setting.

'mean_center'
name str

Name identifier for this generator instance.

None
debug bool

Enable debug-level logging.

False
**kwargs dict

Additional keyword arguments (currently unused).

{}
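The pooling window defined by win_h_length can be sketched as follows. The wrap-around at the year boundary is an assumption, and window_days is a hypothetical helper, not a SynHydro function.

```python
def window_days(target_day, win_h_length=15, days_per_year=365):
    """Days of year (1-based) within +/- win_h_length of target_day,
    wrapping around the end of the year."""
    return [
        (target_day - 1 + offset) % days_per_year + 1
        for offset in range(-win_h_length, win_h_length + 1)
    ]
```

With the default half-window of 15, the window for day 1 runs from day 351 of the previous cycle to day 16, for a total of 31 days whose observations are pooled across all years.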

output_frequency property

output_frequency: str

Wavelet phase randomization generates daily output.

preprocessing

preprocessing(Q_obs: DataFrame, *, sites: Optional[List[str]] = None, **kwargs: Any) -> None

Preprocess observed multisite daily streamflow data.

Validates input, removes leap days, and creates per-site day-of-year indices. After leap-day removal, the record length must be a multiple of 365.

Parameters:

Name Type Description Default
Q_obs DataFrame or Series

Observed daily streamflow with DatetimeIndex. A DataFrame with one column per site is required for multisite generation. A Series is accepted and treated as a single-site case.

required
sites list of str

Subset of columns to use. If None, all columns are used.

None
**kwargs dict

Additional preprocessing parameters (currently unused).

{}

Raises:

Type Description
ValueError

If data has fewer than 730 days after leap-day removal, or if the length after removal is not a multiple of 365.
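The two length checks can be sketched with the standard library alone. The function names and error messages are hypothetical; the sketch only mirrors the rules stated above (drop February 29, require at least 730 days, require a multiple of 365).

```python
from datetime import date, timedelta


def drop_leap_days(dates):
    """Remove every February 29 from a sequence of dates."""
    return [d for d in dates if not (d.month == 2 and d.day == 29)]


def check_record_length(dates, min_days=730):
    """Return the no-leap dates, or raise if the record is too short or ragged."""
    kept = drop_leap_days(dates)
    if len(kept) < min_days:
        raise ValueError(f"Need at least {min_days} days, got {len(kept)}.")
    if len(kept) % 365 != 0:
        raise ValueError("Record length must be a multiple of 365 after leap-day removal.")
    return kept
```

A record covering 2019-01-01 to 2020-12-31 has 731 days, becomes 730 after dropping 2020-02-29, and passes both checks; a record that ends mid-year does not.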

fit

fit(Q_obs: Optional[DataFrame] = None, *, sites: Optional[List[str]] = None, **kwargs: Any) -> None

Fit the multisite wavelet phase randomization model.

This method:

  1. Fits per-site, per-day-of-year kappa distributions using L-moments.
  2. Applies the configured transform (global mean-centering or the per-day-of-year normal-score transform) to each site's series.
  3. Computes the CWT of each transformed series and stores the per-site amplitude spectra.

Parameters:

Name Type Description Default
Q_obs DataFrame

If provided, calls preprocessing() automatically.

None
sites list of str

Passed to preprocessing() when Q_obs is provided.

None
**kwargs dict

Additional fitting parameters (currently unused).

{}

generate

generate(n_realizations: int = 1, n_years: Optional[int] = None, n_timesteps: Optional[int] = None, seed: Optional[int] = None, **kwargs: Any) -> Ensemble

Generate synthetic multisite daily streamflow realizations.

Parameters:

Name Type Description Default
n_realizations int

Number of independent synthetic realizations to generate.

1
n_years int

Target length of each realization in years (365-day years, no leap days). When provided, independent phase-randomized chunks are concatenated until the target length is reached and then trimmed. When None, the output length equals the observed record length.

None
n_timesteps int

Not used. Length is controlled via n_years.

None
seed int

Random seed for reproducibility.

None
**kwargs dict

Additional generation parameters (currently unused).

{}

Returns:

Type Description
Ensemble

Generated synthetic flows as an Ensemble object. Each realization is a DataFrame with shape (n_days, n_sites) and a no-leap DatetimeIndex.
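The length handling for n_years (concatenate chunks, then trim) can be sketched as below. The helper names are hypothetical, and each chunk is assumed to have the length of the observed no-leap record.

```python
import math


def n_chunks_needed(n_years, n_obs_days):
    """Number of record-length chunks needed to cover n_years of 365-day years."""
    target_days = n_years * 365
    return math.ceil(target_days / n_obs_days), target_days


def concatenate_and_trim(chunks, target_days):
    """Join independently generated chunks and cut the result to the target length."""
    series = [v for chunk in chunks for v in chunk]
    return series[:target_days]
```

For example, a 5-year target from a 2-year (730-day) record needs three chunks, and the concatenated 2190 days are trimmed to 1825.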