Generators
Base Class
Generator
Bases: ABC
Abstract base class for all synthetic generation methods.
All generator implementations should inherit from this class.
Follows the scikit-learn pattern: __init__ configures the algorithm,
fit(Q_obs) learns from data, generate() produces synthetic flows.
Class Attributes
supports_multisite : bool
Whether this generator supports multiple sites. Default False.
supported_frequencies : tuple of str
Pandas frequency strings this generator accepts (e.g., ('MS',)).
Initialize the generator with its algorithm configuration.
Subclasses add algorithm-specific keyword-only parameters before name and debug. Data is not passed here; use fit(Q_obs) or preprocessing(Q_obs) instead.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Name identifier for this generator instance. | None |
| debug | bool | Enable debug logging. | False |
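The scikit-learn-style lifecycle described above (configure in __init__, learn in fit, sample in generate, with state flags guarding each step) can be sketched with a toy subclass. Everything here, including ToyGenerator and BootstrapToy, is hypothetical and only mirrors the documented method names. It is not the library's implementation, and generate returns plain lists rather than an Ensemble.

```python
from abc import ABC, abstractmethod
import random


class ToyGenerator(ABC):
    """Hypothetical minimal base class mirroring the documented workflow."""

    def __init__(self, *, name=None, debug=False):
        # Configuration only: observed data is never passed to __init__.
        self.name = name
        self.debug = debug
        self.is_preprocessed = False
        self.is_fitted = False

    def update_state(self, preprocessed=None, fitted=None):
        # None leaves the corresponding flag unchanged.
        if preprocessed is not None:
            self.is_preprocessed = preprocessed
        if fitted is not None:
            self.is_fitted = fitted

    def validate_fit(self):
        if not self.is_fitted:
            raise ValueError("fit() has not been run.")

    @abstractmethod
    def preprocessing(self, Q_obs): ...

    @abstractmethod
    def fit(self, Q_obs=None): ...

    @abstractmethod
    def generate(self, n_realizations=1, seed=None): ...


class BootstrapToy(ToyGenerator):
    """Hypothetical subclass: resamples observed values with replacement."""

    def preprocessing(self, Q_obs):
        self._obs = list(Q_obs)
        self.update_state(preprocessed=True)

    def fit(self, Q_obs=None):
        if Q_obs is not None:
            self.preprocessing(Q_obs)  # fit(Q_obs) runs preprocessing automatically
        elif not self.is_preprocessed:
            raise ValueError("preprocessing() has not been run.")
        self.mean_ = sum(self._obs) / len(self._obs)  # fitted value, trailing underscore
        self.update_state(fitted=True)

    def generate(self, n_realizations=1, seed=None):
        self.validate_fit()            # refuse to generate before fitting
        rng = random.Random(seed)      # seeded for reproducibility
        n = len(self._obs)
        return [[rng.choice(self._obs) for _ in range(n)]
                for _ in range(n_realizations)]


gen = BootstrapToy(name="demo")
gen.fit([1.0, 2.0, 3.0])
realizations = gen.generate(n_realizations=2, seed=0)
```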
n_sites (property)
Number of sites in the generator.
Returns:
| Type | Description |
|---|---|
| int | Number of sites. |
Raises:
| Type | Description |
|---|---|
| ValueError | If preprocessing has not been run yet. |
sites (property)
List of site names.
Returns:
| Type | Description |
|---|---|
| List[str] | Site identifiers. |
Raises:
| Type | Description |
|---|---|
| ValueError | If preprocessing has not been run yet. |
output_frequency (abstract property)
Temporal frequency of generated output.
Returns:
| Type | Description |
|---|---|
| str | Pandas frequency string (e.g., 'MS' for monthly, 'D' for daily). |
validate_input_data
Validate and standardize the input data format.
Checks type, DatetimeIndex, NaN content, negative values, data frequency, and minimum record length.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | Series or DataFrame | Input time series data. | required |
Returns:
| Type | Description |
|---|---|
| DataFrame | Validated and standardized data. |
Raises:
| Type | Description |
|---|---|
| ValueError | If the data format is invalid. |
| TypeError | If the data type is unsupported. |
validate_preprocessing
Check whether preprocessing has been completed.
Raises:
| Type | Description |
|---|---|
| ValueError | If preprocessing() has not been run. |
validate_fit
Check whether the generator has been fitted.
Raises:
| Type | Description |
|---|---|
| ValueError | If fit() has not been run. |
update_state
Update generator state flags.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| preprocessed | bool | Set preprocessing state. | None |
| fitted | bool | Set fitted state. | None |
get_params
Get initialization parameters (scikit-learn style).
Returns only constructor/configuration parameters, not fitted values. This follows the scikit-learn convention for compatibility.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| deep | bool | If True, return a deep copy of the parameters. | True |
Returns:
| Type | Description |
|---|---|
| Dict[str, Any] | Dictionary of initialization parameters. |
get_fitted_params
Get parameters learned from data during fit().
Returns:
| Type | Description |
|---|---|
| Dict[str, Any] | Dictionary of fitted parameters (all keys end with an underscore). |
Raises:
| Type | Description |
|---|---|
| ValueError | If the generator has not been fitted yet. |
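The split between configuration parameters (get_params) and learned parameters (get_fitted_params, keys ending in an underscore) can be illustrated with standard-library introspection. This is a minimal sketch, assuming each constructor argument is stored under an attribute of the same name, as in scikit-learn; ParamsMixin and Demo are hypothetical names, not part of the library.

```python
import copy
import inspect


class ParamsMixin:
    """Hypothetical mixin illustrating the scikit-learn-style parameter split."""

    def get_params(self, deep=True):
        # Constructor parameters are read from __init__'s signature
        # (assumes each one is stored under the same attribute name).
        sig = inspect.signature(type(self).__init__)
        names = [
            p.name for p in sig.parameters.values()
            if p.name != "self" and p.kind not in (p.VAR_POSITIONAL, p.VAR_KEYWORD)
        ]
        params = {n: getattr(self, n) for n in names}
        return copy.deepcopy(params) if deep else params

    def get_fitted_params(self):
        if not getattr(self, "is_fitted", False):
            raise ValueError("Generator has not been fitted yet.")
        # Learned attributes follow the trailing-underscore convention.
        return {k: v for k, v in vars(self).items()
                if k.endswith("_") and not k.startswith("_")}


class Demo(ParamsMixin):
    def __init__(self, *, n_states=2, name=None, debug=False):
        self.n_states = n_states
        self.name = name
        self.debug = debug
        self.is_fitted = False

    def fit(self, data):
        self.mean_ = sum(data) / len(data)
        self.is_fitted = True
        return self
```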
summary
Generate a comprehensive summary of the generator configuration and fit.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| show_fitted | bool | Whether to include fitted parameters in the summary. | True |
Returns:
| Type | Description |
|---|---|
| str | Formatted summary string. |
get_state_info
Get complete state information, including parameters and metadata.
Returns:
| Type | Description |
|---|---|
| Dict[str, Any] | Dictionary containing all generator state, parameters, and metadata. |
save
Save the fitted generator to a file using pickle.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filepath | str | Path to save the generator. | required |
Raises:
| Type | Description |
|---|---|
| ValueError | If the generator is not fitted. |
load (classmethod)
Load a fitted generator from a file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filepath | str | Path to the saved generator file. | required |
Returns:
| Type | Description |
|---|---|
| Generator | Loaded generator instance. |
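Because save and load are documented as pickle-based, a round trip can be sketched with the standard library alone. save_generator and load_generator are hypothetical stand-ins, and a SimpleNamespace plays the role of a fitted generator. Note that unpickling can execute arbitrary code, so only load files from trusted sources.

```python
import os
import pickle
import tempfile
from types import SimpleNamespace


def save_generator(gen, filepath):
    """Hypothetical helper mirroring save(): refuse to pickle an unfitted object."""
    if not getattr(gen, "is_fitted", False):
        raise ValueError("Generator is not fitted.")
    with open(filepath, "wb") as fh:
        pickle.dump(gen, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_generator(filepath):
    """Hypothetical helper mirroring load(). Only unpickle files you trust."""
    with open(filepath, "rb") as fh:
        return pickle.load(fh)


# Round trip with a stand-in object; a real fitted generator would be used instead.
fitted = SimpleNamespace(is_fitted=True, mean_=42.0)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "generator.pkl")
    save_generator(fitted, path)
    restored = load_generator(path)
```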
preprocessing (abstractmethod)
preprocessing(Q_obs: Union[Series, DataFrame], *, sites: Optional[List[str]] = None, **kwargs: Any) -> None
Preprocess and validate observed flow data.
Implementations should:
1. Call _store_obs_data(Q_obs, sites) to validate and store data
2. Perform generator-specific data preparation
3. Call update_state(preprocessed=True) at the end
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| Q_obs | Series or DataFrame | Observed historical flow data. | required |
| sites | list of str | Sites to use. If None, uses all columns. | None |
| **kwargs | Any | Additional preprocessing parameters. | {} |
fit (abstractmethod)
fit(Q_obs: Optional[Union[Series, DataFrame]] = None, *, sites: Optional[List[str]] = None, **kwargs: Any) -> None
Fit the generator to observed flow data.
If Q_obs is provided, preprocessing() is called automatically.
If omitted, a prior call to preprocessing() is required.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| Q_obs | Series or DataFrame | Observed data. If provided, runs preprocessing automatically. | None |
| sites | list of str | Sites to use (only when Q_obs is provided). | None |
| **kwargs | Any | Additional fitting parameters. | {} |
generate (abstractmethod)
generate(n_realizations: int = 1, n_years: Optional[int] = None, n_timesteps: Optional[int] = None, seed: Optional[int] = None, **kwargs: Any) -> Ensemble
Generate synthetic streamflow realizations.
Implementations should:
1. Call validate_fit() at the start
2. Set the random seed if provided
3. Generate synthetic flows
4. Return an Ensemble object containing all realizations
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| n_realizations | int | Number of synthetic realizations to generate. | 1 |
| n_years | int | Number of years to generate (alternative to n_timesteps). | None |
| n_timesteps | int | Number of timesteps to generate explicitly. | None |
| seed | int | Random seed for reproducibility. | None |
| **kwargs | Any | Additional generation parameters. | {} |
Returns:
| Type | Description |
|---|---|
| Ensemble | Generated synthetic flows as an Ensemble object. |
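Since n_years and n_timesteps are alternatives, a generator has to turn them into a single output length. The helper below is a hypothetical sketch, not the library's logic. It assumes n_timesteps wins when both are given (as several subclasses below document), falls back to a default such as the historic record length, and uses 365-day years for daily output, which matches the leap-day removal described for PhaseRandomizationGenerator.

```python
# Assumed steps per year for a few pandas frequency strings (hypothetical table).
STEPS_PER_YEAR = {"MS": 12, "D": 365, "YS": 1, "AS": 1}


def resolve_n_timesteps(freq, n_years=None, n_timesteps=None, default_years=None):
    """Hypothetical helper: convert n_years / n_timesteps into one output length."""
    if n_timesteps is not None:
        return int(n_timesteps)           # explicit timesteps take precedence
    years = n_years if n_years is not None else default_years
    if years is None:
        raise ValueError("Provide n_years or n_timesteps.")
    return int(years) * STEPS_PER_YEAR[freq]
```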
KirschGenerator
KirschGenerator
KirschGenerator(*, generate_using_log_flow=True, matrix_repair_method='spectral', name=None, debug=False, **kwargs)
Bases: Generator
Kirsch nonparametric bootstrap generator for monthly streamflow synthesis.
Generates monthly synthetic flows using bootstrap resampling with correlation preservation via Cholesky decomposition.
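The core bootstrap-plus-Cholesky idea can be sketched for a single site with NumPy. This is a simplified illustration, not the class's implementation. It omits the Dec-Jan cross-year correction and any correlation-matrix repair (the role of matrix_repair_method), and it assumes more historic years than months so that the 12 x 12 correlation matrix is positive definite. Each (year, month) cell draws an independent historic year, which breaks month-to-month correlation; multiplying by the upper Cholesky factor U of the historic correlation matrix restores it, because independent unit-variance columns X give cov(XU) = UᵀU.

```python
import numpy as np


def kirsch_sketch(Q, n_years, seed=None):
    """Simplified single-site Kirsch-style bootstrap (no Dec-Jan handling).

    Q : (n_hist_years, 12) array of positive monthly flows.
    """
    rng = np.random.default_rng(seed)
    Y = np.log(Q)                                   # work in log space
    mu, sd = Y.mean(axis=0), Y.std(axis=0, ddof=1)
    Z_hist = (Y - mu) / sd                          # standardize each calendar month
    # Bootstrap: an independent random historic year for every (year, month) cell
    M = rng.integers(0, Q.shape[0], size=(n_years, 12))
    X = Z_hist[M, np.arange(12)]                    # X[i, j] = Z_hist[M[i, j], j]
    # Re-impose month-to-month correlation with the upper Cholesky factor
    corr = np.corrcoef(Z_hist, rowvar=False)
    U = np.linalg.cholesky(corr).T                  # corr = U.T @ U
    Z = X @ U
    return np.exp(Z * sd + mu)                      # re-seasonalize and back-transform


rng = np.random.default_rng(0)
Q_hist = np.exp(rng.normal(3.0, 0.5, size=(40, 12)))   # illustrative data only
Q_syn = kirsch_sketch(Q_hist, n_years=30, seed=1)
```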
References
Kirsch, B.R., Characklis, G.W., and Zeff, H.B. (2013). Evaluating the impact of alternative hydro-climate scenarios on transfer agreements. Journal of Water Resources Planning and Management, 139(4), 396-406.
Initialize Kirsch generator.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| generate_using_log_flow | bool | If True, generates in log space for better handling of skewed distributions. | True |
| matrix_repair_method | str | Method for repairing non-positive-definite correlation matrices. | 'spectral' |
| name | str | Name for this generator instance. | None |
| debug | bool | Enable debug logging. | False |
Q_obs_monthly (property)
Get observed monthly data (alias for Qm, for consistency with other generators).
preprocessing
Preprocess observed data for Kirsch generation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| Q_obs | DataFrame | Observed historical flow data with DatetimeIndex. | required |
| sites | list | Sites to use. If None, uses all sites. | None |
| timestep | str | Currently only 'monthly' is supported. | 'monthly' |
| **kwargs | | Additional preprocessing parameters. | {} |
fit
Fit Kirsch generator to preprocessed data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| Q_obs | DataFrame | If provided, calls preprocessing automatically. | None |
| sites | list | Sites to use (passed to preprocessing if Q_obs provided). | None |
| **kwargs | | Additional fitting parameters. | {} |
generate_from_indices
Generate synthetic flows by directly specifying historical year indices.
This method allows external code (e.g., MOEA-FIND) to inject decision variables (year indices) instead of random sampling. It runs the full post-bootstrap pipeline: Cholesky, normal-score inversion, and re-seasonalization.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| indices | ndarray | Array of historical year indices to resample, with shape (n_years+1, 12), where each entry is in [0, n_historic_years). The extra year allows Dec-Jan cross-year correlation handling. Can be floats (will be cast to int). | required |
| n_years | int | Number of years for the synthetic output. If None, inferred from indices.shape[0] - 1. | None |
| as_array | bool | If True, returns a numpy array; if False, returns a pandas DataFrame. | True |
| synthetic_index | DatetimeIndex | Custom DatetimeIndex for the output. If None, a default index is generated. | None |
Returns:
| Type | Description |
|---|---|
| ndarray or DataFrame | Synthetic monthly flows with shape (n_years * 12, n_sites) if as_array=True, otherwise a pandas DataFrame. |
Notes
This method assumes the generator has been fitted. Indices are treated as indices into the historic years array (self.historic_years or [0, 1, ..., n-1]).
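When an optimizer supplies the indices as decision variables, they typically arrive as floats and need the shape and range checks implied by the parameter description above. The sketch below is a hypothetical pre-check, not a library function. Casting with astype(int) truncates toward zero, which is one plausible reading of "will be cast to int".

```python
import numpy as np


def prepare_indices(indices, n_historic_years, n_years=None):
    """Hypothetical pre-check for generate_from_indices-style input."""
    idx = np.asarray(indices)
    if idx.ndim != 2 or idx.shape[1] != 12:
        raise ValueError("indices must have shape (n_years + 1, 12)")
    if n_years is None:
        n_years = idx.shape[0] - 1           # extra row supports Dec-Jan handling
    elif idx.shape[0] != n_years + 1:
        raise ValueError("indices must have n_years + 1 rows")
    idx = idx.astype(int)                    # float decision variables are truncated
    if idx.min() < 0 or idx.max() >= n_historic_years:
        raise ValueError("indices must lie in [0, n_historic_years)")
    return idx, n_years
```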
generate_from_residuals
Generate synthetic flows from pre-computed standardized residuals.
This method allows external code (e.g., MOEA-FIND) to inject decision variables (standardized residuals) directly, bypassing the bootstrap resampling step. It runs steps 4-8 of the Kirsch pipeline: normal-score transform, Cholesky, inverse normal-score, Dec-Jan combination, and re-seasonalization.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| residuals | ndarray | Array of standardized residuals with shape (n_years, 12, n_sites). Each residual should be approximately N(0,1) or representable as such within month-specific empirical distributions. | required |
| as_array | bool | If True, returns a numpy array; if False, returns a pandas DataFrame. | True |
| synthetic_index | DatetimeIndex | Custom DatetimeIndex for the output. If None, a default index is generated. | None |
Returns:
| Type | Description |
|---|---|
| ndarray or DataFrame | Synthetic monthly flows with shape (n_years * 12, n_sites) if as_array=True, otherwise a pandas DataFrame. |
Notes
This method assumes the generator has been fitted. Residuals are assumed to be standardized residuals; they will be normal-score transformed, processed through Cholesky factors, and combined to preserve Dec-Jan correlations.
generate_single_series
Generate a single synthetic time series.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| n_years | int | Number of years for the synthetic time series. | required |
| M | ndarray | Bootstrap indices for the synthetic time series. If None, random indices will be generated. | None |
| as_array | bool | If True, returns a numpy array; if False, returns a pandas DataFrame. | True |
| synthetic_index | DatetimeIndex | Custom index for the synthetic time series. If None, a default index will be generated. | None |
Returns:
| Type | Description |
|---|---|
| ndarray or DataFrame | Synthetic time series data. |
generate
Generate an ensemble of synthetic monthly flows.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| n_realizations | int | Number of synthetic time series to generate. | 1 |
| n_years | int | Number of years for each synthetic time series. If None, uses the number of historic years. | None |
| n_timesteps | int | Not used (Kirsch generates by years, not timesteps). | None |
| seed | int | Random seed for reproducibility. | None |
| **kwargs | | Additional generation parameters. | {} |
Returns:
| Type | Description |
|---|---|
| Ensemble | Ensemble object containing all generated realizations. |
KNNBootstrapGenerator
KNNBootstrapGenerator
KNNBootstrapGenerator(*, n_neighbors: Optional[int] = None, feature_cols: Optional[List[str]] = None, index_site: Optional[str] = None, block_size: int = 1, name: Optional[str] = None, debug: bool = False, **kwargs: Any)
Bases: Generator
K-Nearest Neighbor bootstrap generator for synthetic streamflow.
Conditionally resamples from historical record by finding K nearest neighbors to the current state and selecting successor values with Lall-Sharma kernel weights.
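The Lall-Sharma kernel assigns the j-th nearest of K neighbours the weight (1/j) / Σᵢ₌₁ᴷ (1/i), so closer analogues are favoured without excluding farther ones. The univariate lag-1 sketch below uses the documented default K = ceil(sqrt(n)). It deliberately omits the per-month models, multisite features, index_site, and block_size handled by the class, and the function names are hypothetical.

```python
import math
import numpy as np


def lall_sharma_weights(k):
    # w_j = (1/j) / sum_{i=1..k} (1/i); the nearest neighbour gets the largest weight
    inv = 1.0 / np.arange(1, k + 1)
    return inv / inv.sum()


def knn_bootstrap(x, n_steps, k=None, seed=None):
    """Hypothetical univariate lag-1 KNN bootstrap on a 1-D array x."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    features, successors = x[:-1], x[1:]          # (x_t, x_{t+1}) pairs
    if k is None:
        k = math.ceil(math.sqrt(len(x)))          # default K = ceil(sqrt(n))
    w = lall_sharma_weights(k)
    out = np.empty(n_steps)
    current = rng.choice(x)                       # random starting state
    for t in range(n_steps):
        nearest = np.argsort(np.abs(features - current))[:k]  # ordered by distance
        j = rng.choice(k, p=w)                    # pick a neighbour rank by kernel weight
        current = successors[nearest[j]]          # resample that neighbour's successor
        out[t] = current
    return out


x_obs = np.random.default_rng(2).gamma(2.0, 50.0, size=60)   # illustrative data only
x_syn = knn_bootstrap(x_obs, n_steps=50, seed=0)
```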
References
Lall, U., and Sharma, A. (1996). A nearest neighbor bootstrap for resampling hydrologic time series. Water Resources Research, 32(3), 679-693.
Initialize KNN Bootstrap generator.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| n_neighbors | int | Number of neighbors K. If None, uses ceil(sqrt(n)), where n is the number of historical timesteps. | None |
| feature_cols | list | Column names to use as features for KNN search. If None, uses all columns. | None |
| index_site | str | Site name to use for distance computation in multisite mode. If None, uses multivariate distance across all feature columns. | None |
| block_size | int | Number of consecutive timesteps to resample as a block (1 = standard KNN). | 1 |
| name | str | Name for this generator instance. | None |
| debug | bool | Enable debug logging. | False |
| **kwargs | Any | Additional parameters (stored but not used). | {} |
output_frequency (property)
Return the temporal frequency of generated output.
Detected from the input data frequency (monthly or annual).
preprocessing
Preprocess and validate observed flow data.
Constructs feature vectors for KNN search and successor pairs. Also detects the temporal frequency of the data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| Q_obs | Series or DataFrame | Observed historical flow data with DatetimeIndex. | required |
| sites | list | Sites to use. If None, uses all columns. | None |
| **kwargs | Any | Additional preprocessing parameters. | {} |
fit
Fit KNN model(s) to preprocessed data.
For monthly data, fits 12 separate KNN models, one per calendar month, so that the neighbor search is conditioned on month (Rajagopalan & Lall 1999). For annual or daily data, fits a single global model.
Also computes Lall-Sharma kernel weights for neighbor selection.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| Q_obs | Series or DataFrame | Observed historical flow data. If provided, preprocessing is called automatically. | None |
| sites | list of str | Sites to use (only when Q_obs is provided). | None |
| **kwargs | Any | Additional fitting parameters. | {} |
generate
generate(n_realizations: int = 1, n_years: Optional[int] = None, n_timesteps: Optional[int] = None, seed: Optional[int] = None, **kwargs: Any) -> Ensemble
Generate synthetic streamflow realizations.
Uses KNN bootstrap with Lall-Sharma kernel weighting to conditionally resample from the historical record.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| n_realizations | int | Number of synthetic realizations to generate. | 1 |
| n_years | int | Number of years to generate. If None, uses the number of observed years. | None |
| n_timesteps | int | Number of timesteps to generate explicitly. Overrides n_years if provided. | None |
| seed | int | Random seed for reproducibility. | None |
| **kwargs | Any | Additional generation parameters. | {} |
Returns:
| Type | Description |
|---|---|
| Ensemble | Generated synthetic flows with metadata. |
PhaseRandomizationGenerator
PhaseRandomizationGenerator
PhaseRandomizationGenerator(*, marginal: str = 'kappa', win_h_length: int = 15, name: Optional[str] = None, debug: bool = False, **kwargs)
Bases: Generator
Phase randomization generator for synthetic streamflow, following Brunner et al. (2019).
Generates synthetic daily streamflow time series using Fourier transform phase randomization combined with the four-parameter kappa distribution. The method preserves both short- and long-range temporal dependence by conserving the power spectrum while randomizing phases.
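The spectrum-preserving step can be sketched with NumPy's real FFT. In the documented pipeline this step is applied to the normal-score-transformed series and followed by back-transformation with the daily marginals; the sketch below shows only the phase step, and phase_randomize is a hypothetical name. The zero-frequency phase (the mean) and, for even lengths, the Nyquist phase are left unchanged so the output stays real. Adding the same random phase to every column leaves each cross-spectrum X_a·conj(X_b) unchanged, which is one way to keep correlation between sites. Whether the class does exactly this is an assumption.

```python
import numpy as np


def phase_randomize(x, seed=None):
    """Randomize Fourier phases of each column of x (n_time, n_sites).

    Amplitudes |FFT| are kept; one set of random phases is shared by all columns.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n = x.shape[0]
    spec = np.fft.rfft(x, axis=0)
    phi = rng.uniform(0.0, 2.0 * np.pi, size=spec.shape[0])
    phi[0] = 0.0                      # keep the zero-frequency (mean) term
    if n % 2 == 0:
        phi[-1] = 0.0                 # Nyquist term must stay real for a real signal
    shifted = spec * np.exp(1j * phi)[:, None]   # same shift for every site
    return np.fft.irfft(shifted, n=n, axis=0)


z = np.random.default_rng(4).standard_normal((730, 2))   # illustrative normal scores
z_syn = phase_randomize(z, seed=3)
```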
Attributes:
| Name | Type | Description |
|---|---|---|
| par_day_ | dict | Fitted kappa distribution parameters for each day of year (1-365). Each entry contains {'xi', 'alfa', 'k', 'h'}. |
| modulus_ | ndarray | Amplitude spectrum (modulus of the FFT) of the fitted data. |
| phases_ | ndarray | Phase spectrum of the fitted data. |
| norm_ | ndarray | Normalized/deseasonalized data after the normal score transform. |
Examples:
>>> import pandas as pd
>>> from synhydro.methods.generation.nonparametric import PhaseRandomizationGenerator
>>> Q_daily = pd.read_csv('daily_flows.csv', index_col=0, parse_dates=True)
>>> gen = PhaseRandomizationGenerator(marginal='kappa')
>>> gen.preprocessing(Q_daily)
>>> gen.fit()
>>> ensemble = gen.generate(n_realizations=100, seed=42)
Notes
- Requires at least 2 years (730 days) of daily data
- February 29 observations are removed to ensure consistent 365-day years
- By default (n_years=None), generated series have the same length as the observed data
Initialize the PhaseRandomizationGenerator.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| marginal | str | Marginal distribution type for back-transformation: 'kappa' (default), the four-parameter kappa distribution, which allows extrapolation; or 'empirical', the empirical distribution, which does not extrapolate beyond observed values. | 'kappa' |
| win_h_length | int | Half-window length for daily distribution fitting. Values within ±win_h_length days are used, giving a total window of 2*win_h_length+1 days. | 15 |
| name | str | Name identifier for this generator instance. | None |
| debug | bool | Enable debug logging. | False |
| **kwargs | dict | Additional parameters (currently unused). | {} |
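The win_h_length window can be made concrete with a small helper that lists the days of year pooled when fitting the distribution for a given day. This is a hypothetical sketch. Wrapping the window circularly across the year boundary (so day 1 borrows from late December) is an assumption, not something the description above states.

```python
def doy_window(day, win_h_length=15, n_days=365):
    """Days of year (1-based) within +/- win_h_length of `day`.

    Circular wrap-around at the year boundary is an assumption of this sketch.
    """
    return [((day - 1 + off) % n_days) + 1
            for off in range(-win_h_length, win_h_length + 1)]
```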
preprocessing
Preprocess observed data for phase randomization generation.
Validates input data, removes leap days, and creates a day-of-year index.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| Q_obs | Series or DataFrame | Observed daily streamflow data with DatetimeIndex. | required |
| sites | list | Sites to keep. If None, uses all columns. | None |
| **kwargs | dict | Additional preprocessing parameters (currently unused). | {} |
Raises:
| Type | Description |
|---|---|
| ValueError | If data has fewer than 730 days or has missing days. |
fit
Fit the phase randomization model to observed data.
This method:
1. Fits kappa distribution parameters for each day of year (if marginal='kappa')
2. Applies the normal score transform per day of year
3. Computes the FFT and extracts the modulus and phases
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| Q_obs | Series or DataFrame | If provided, calls preprocessing automatically. | None |
| sites | list | Sites to keep. Passed to preprocessing if Q_obs is provided. | None |
| **kwargs | dict | Additional fitting parameters (currently unused). | {} |
generate
generate(n_realizations: int = 1, n_years: Optional[int] = None, n_timesteps: Optional[int] = None, seed: Optional[int] = None, **kwargs) -> Ensemble
Generate synthetic streamflow realizations using phase randomization.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| n_realizations | int | Number of synthetic realizations to generate. | 1 |
| n_years | int | Target length of each realization in years (365-day years, no leap days). When provided, independent phase-randomized chunks are concatenated until the target length is reached, then trimmed. When None, the output length equals the observed record length. | None |
| n_timesteps | int | Not used. Length is controlled via n_years. | None |
| seed | int | Random seed for reproducibility. | None |
| **kwargs | dict | Additional generation parameters (currently unused). | {} |
Returns:
| Type | Description |
|---|---|
| Ensemble | Generated synthetic flows as an Ensemble object. |
ThomasFieringGenerator
ThomasFieringGenerator
Bases: Generator
Thomas-Fiering autoregressive model for monthly streamflow generation.
Generates synthetic monthly streamflows using a lag-1 autoregressive model with Stedinger-Taylor normalization. Preserves monthly means, standard deviations, and lag-1 serial correlations.
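The lag-1 recursion can be written out directly. For a transition from month m to month m+1, x(m+1) = μ(m+1) + r(m)·σ(m+1)/σ(m)·(x(m) − μ(m)) + σ(m+1)·√(1 − r(m)²)·ε. The sketch below assumes the flows have already been normalized (the Stedinger-Taylor step is not reproduced) and omits the back-transform. thomas_fiering is a hypothetical function, not the class API.

```python
import numpy as np


def thomas_fiering(mu, sd, r, n_years, seed=None):
    """Seasonal lag-1 AR recursion on already-normalized monthly flows.

    mu, sd : length-12 monthly means and standard deviations (Jan..Dec)
    r      : length-12 lag-1 correlations; r[m] links month m to month m+1
    Returns an (n_years, 12) array.
    """
    rng = np.random.default_rng(seed)
    mu, sd, r = (np.asarray(a, dtype=float) for a in (mu, sd, r))
    n = n_years * 12
    x = np.empty(n)
    x[0] = mu[0] + sd[0] * rng.standard_normal()      # first January
    for t in range(1, n):
        m, nxt = (t - 1) % 12, t % 12                 # previous and current month
        x[t] = (mu[nxt]
                + r[m] * sd[nxt] / sd[m] * (x[t - 1] - mu[m])
                + sd[nxt] * np.sqrt(1.0 - r[m] ** 2) * rng.standard_normal())
    return x.reshape(n_years, 12)


mu = np.linspace(2.0, 4.0, 12)          # illustrative parameters only
sd = np.linspace(0.3, 0.6, 12)
x_syn = thomas_fiering(mu, sd, np.full(12, 0.6), n_years=5, seed=1)
```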
Note: Thomas-Fiering is a univariate method (single site only).
Examples:
>>> import pandas as pd
>>> from synhydro.methods.generation.parametric.thomas_fiering import ThomasFieringGenerator
>>> Q_monthly = pd.read_csv('monthly_flows.csv', index_col=0, parse_dates=True)
>>> tf = ThomasFieringGenerator()
>>> tf.fit(Q_monthly.iloc[:, 0])
>>> ensemble = tf.generate(n_years=10, n_realizations=5)
References
Thomas, H.A., and Fiering, M.B. (1962). Mathematical synthesis of streamflow sequences for the analysis of river basins by simulation.
Stedinger, J.R., and Taylor, M.R. (1982). Synthetic streamflow generation: 1. Model verification and validation. Water Resources Research, 18(4), 909-918.
Initialize the ThomasFieringGenerator.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Name for this generator instance. | None |
| debug | bool | Enable debug logging. | False |
| **kwargs | dict | Additional parameters (currently unused). | {} |
preprocessing
Preprocess observed data for Thomas-Fiering generation.
Validates input, resamples to monthly if needed, and applies Stedinger-Taylor normalization.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| Q_obs | Series or DataFrame | Streamflow data with DatetimeIndex. Must be single site. | required |
| sites | list | Not used (Thomas-Fiering is univariate). | None |
| **kwargs | dict | Additional parameters (currently unused). | {} |
fit
Estimate Thomas-Fiering model parameters from normalized flows.
Calculates monthly means, standard deviations, and lag-1 serial correlations from normalized flows.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| Q_obs | Series or DataFrame | If provided, calls preprocessing automatically. | None |
| sites | list | Sites to use (passed to preprocessing if Q_obs provided). | None |
| **kwargs | dict | Additional parameters (currently unused). | {} |
generate
generate(n_years: Optional[int] = None, n_realizations: int = 1, n_timesteps: Optional[int] = None, seed: Optional[int] = None, **kwargs) -> Ensemble
Generate synthetic monthly streamflows.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| n_years | int | Number of years to generate per realization. If None, uses the length of historic data. | None |
| n_realizations | int | Number of synthetic realizations to generate. | 1 |
| n_timesteps | int | Number of monthly timesteps to generate. If provided, overrides n_years. | None |
| seed | int | Random seed for reproducibility. | None |
| **kwargs | dict | Additional parameters (currently unused). | {} |
Returns:
| Type | Description |
|---|---|
| Ensemble | Ensemble object containing all realizations. |
Raises:
| Type | Description |
|---|---|
| ValueError | If neither n_years nor n_timesteps is provided. |
MatalasGenerator
MatalasGenerator
MatalasGenerator(*, log_transform: bool = True, name: Optional[str] = None, debug: bool = False, **kwargs)
Bases: Generator
Matalas (1967) multi-site monthly lag-1 autoregressive (MAR(1)) model.
A classical baseline for parametric multi-site stochastic generation. It extends the Thomas-Fiering univariate model to n sites using matrix autoregression, preserving contemporaneous cross-site correlations and lag-1 temporal structure at each site.
For each monthly transition m → m+1, generates:
Z(t+1) = A(m) · Z(t) + B(m) · ε(t+1)
where Z are standardized flows across all sites, ε ~ N(0, I), and A, B are coefficient matrices fitted from observed cross-correlations.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| log_transform | bool | Apply a log(Q + 1) transformation before standardization to reduce skewness and better satisfy the normality assumption. | True |
| name | str | Name for this generator instance. | None |
| debug | bool | Enable debug logging. | False |
Notes
The coefficient matrices are derived from the lag-0 and lag-1 cross-correlation matrices of the standardized flows:
A(m) = S₁(m) · S₀(m)⁻¹
B(m) · B(m)ᵀ = S₀(m+1) - A(m) · S₀(m) · A(m)ᵀ
where S₀(m) is the contemporaneous correlation matrix at month m and S₁(m) is the lag-1 cross-correlation between months m+1 and m. B(m) is the lower Cholesky factor of the residual covariance.
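The coefficient formulas above translate directly into NumPy. The sketch estimates S₀(m), S₀(m+1), and S₁(m) from one joint correlation matrix of the two months' flows, solves for A(m) without forming an explicit inverse, and takes B(m) as the lower Cholesky factor of the residual covariance. matalas_coefficients is a hypothetical helper, and the input data are illustrative only.

```python
import numpy as np


def matalas_coefficients(z_m, z_next):
    """A(m), B(m) from standardized flows of month m and m+1, each (n_years, n_sites)."""
    n_sites = z_m.shape[1]
    # Joint correlation of [Z(m), Z(m+1)] yields S0(m), S0(m+1) and S1(m) at once
    joint = np.corrcoef(np.hstack([z_m, z_next]), rowvar=False)
    s0_m = joint[:n_sites, :n_sites]
    s0_next = joint[n_sites:, n_sites:]
    s1 = joint[n_sites:, :n_sites]                # E[Z(m+1) Z(m)^T]
    a = np.linalg.solve(s0_m.T, s1.T).T           # A = S1 S0^-1 without an explicit inverse
    b = np.linalg.cholesky(s0_next - a @ s0_m @ a.T)  # lower Cholesky factor B
    return a, b, s0_m, s0_next


rng = np.random.default_rng(0)
z_jan = rng.standard_normal((60, 3))
z_feb = 0.5 * z_jan + rng.standard_normal((60, 3))
A, B, S0_jan, S0_feb = matalas_coefficients(z_jan, z_feb)
# One MAR(1) step: Z(t+1) = A Z(t) + B eps, with eps ~ N(0, I)
z_next = A @ z_jan[-1] + B @ rng.standard_normal(3)
```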
Examples:
>>> gen = MatalasGenerator(log_transform=True)
>>> gen.fit(Q_monthly)
>>> ensemble = gen.generate(n_years=100, n_realizations=50, seed=42)
References
Matalas, N. C. (1967). Mathematical assessment of synthetic hydrology. Water Resources Research, 3(4), 937–945.
Salas, J. D., Delleur, J. W., Yevjevich, V., & Lane, W. L. (1980). Applied Modeling of Hydrologic Time Series. Water Resources Publications.
preprocessing
Validate input and resample to monthly frequency.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| Q_obs | DataFrame or Series | Monthly streamflow with DatetimeIndex. Columns are sites. | required |
| sites | list | Subset of site columns to use. Uses all columns if None. | None |
| **kwargs | dict | Unused. | {} |
fit
Estimate MAR(1) coefficient matrices from observed monthly flows.
For each of the 12 monthly transitions, computes the lag-0 (S0) and lag-1 (S1) cross-correlation matrices, then solves for A and B.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| Q_obs | DataFrame or Series | If provided, calls preprocessing automatically. | None |
| sites | list | Sites to use (passed to preprocessing if Q_obs provided). | None |
| **kwargs | dict | Unused. | {} |
generate
generate(n_years: Optional[int] = None, n_realizations: int = 1, n_timesteps: Optional[int] = None, seed: Optional[int] = None, **kwargs) -> Ensemble
Generate synthetic monthly streamflows at all sites.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| n_years | int | Years per realization. Defaults to the length of the historic record. | None |
| n_realizations | int | Number of independent synthetic sequences. | 1 |
| n_timesteps | int | Total monthly timesteps; overrides n_years when provided. | None |
| seed | int | Random seed for reproducibility. | None |
| **kwargs | dict | Unused. | {} |
Returns:
| Type | Description |
|---|---|
| Ensemble | Collection of synthetic realizations. |
MultiSiteHMMGenerator
MultiSiteHMMGenerator
MultiSiteHMMGenerator(*, n_states: int = 2, offset: float = 1.0, max_iterations: int = 1000, covariance_type: str = 'full', name: Optional[str] = None, debug: bool = False, **kwargs)
Bases: Generator
Multi-site Hidden Markov Model generator for synthetic streamflow.
Generates synthetic streamflow using a Gaussian Mixture Model HMM that models temporal dependencies through hidden states and spatial correlations through multivariate Gaussian emissions with state-specific covariance matrices.
The method is particularly suited for capturing drought dynamics across multiple sites/basins simultaneously.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| n_states | int | Number of hidden states. Default is 2 (dry/wet states). | 2 |
| offset | float | Small value added before log transformation to handle zeros. Recommended: 1.0 for flows in standard units. | 1.0 |
| max_iterations | int | Maximum iterations for HMM fitting convergence. | 1000 |
| covariance_type | str | Type of covariance matrix: 'full' (full covariance matrix, captures all correlations), 'diag' (diagonal covariance, independent sites), or 'spherical' (single variance for all dimensions). | 'full' |
| name | str | Name identifier for this generator instance. | None |
| debug | bool | Enable debug logging. | False |
Attributes:
| Name | Type | Description |
|---|---|---|
| means_ | ndarray | State means for each site. Shape: (n_states, n_sites). |
| covariances_ | ndarray | Covariance matrices for each state. Shape: (n_states, n_sites, n_sites). |
| transition_matrix_ | ndarray | State transition probability matrix. Shape: (n_states, n_states). |
| stationary_distribution_ | ndarray | Stationary distribution of states. Shape: (n_states,). |
| Q_log_ | ndarray | Log-transformed observed flows used for fitting. |
Examples:
>>> import pandas as pd
>>> from synhydro.methods.generation.parametric import MultiSiteHMMGenerator
>>>
>>> # Load multi-site annual flows
>>> Q_annual = pd.read_csv('annual_flows.csv', index_col=0, parse_dates=True)
>>>
>>> # Initialize generator
>>> gen = MultiSiteHMMGenerator(n_states=2)
>>> gen.preprocessing(Q_annual)
>>> gen.fit()
>>>
>>> # Generate 100 realizations of 50 years each
>>> ensemble = gen.generate(n_realizations=100, n_years=50, seed=42)
Notes
- Designed for annual timestep data (can handle other frequencies)
- Log transformation ensures positive emissions
- Full covariance preserves spatial correlations between sites
- State ordering: states sorted by mean (low mean = dry state)
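Once means_, covariances_, and transition_matrix_ are available, generation amounts to simulating the hidden-state Markov chain and drawing multivariate normal emissions in log space. The sketch below uses made-up parameters for two states (state 0 is the dry, low-mean state) and hypothetical function names. It starts the chain from the stationary distribution, taken as the left eigenvector of the transition matrix for eigenvalue 1. Clipping back-transformed flows at zero is this sketch's own choice, since exp(·) − offset can fall slightly below zero.

```python
import numpy as np


def stationary_distribution(P):
    # Left eigenvector of P for eigenvalue 1, normalized to sum to one
    vals, vecs = np.linalg.eig(P.T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return v / v.sum()


def sample_hmm(means, covs, P, n_timesteps, offset=1.0, seed=None):
    """Hypothetical sampler for a Gaussian HMM fitted to log(Q + offset)."""
    rng = np.random.default_rng(seed)
    pi = stationary_distribution(P)
    state = rng.choice(len(pi), p=pi)
    states = np.empty(n_timesteps, dtype=int)
    flows = np.empty((n_timesteps, means.shape[1]))
    for t in range(n_timesteps):
        states[t] = state
        log_q = rng.multivariate_normal(means[state], covs[state])
        flows[t] = np.maximum(np.exp(log_q) - offset, 0.0)  # clipping is an assumption
        state = rng.choice(len(pi), p=P[state])             # Markov transition
    return flows, states


means = np.array([[4.0, 3.5], [5.0, 4.6]])            # illustrative values only
covs = np.array([[[0.10, 0.06], [0.06, 0.12]],
                 [[0.08, 0.05], [0.05, 0.09]]])
P = np.array([[0.7, 0.3], [0.2, 0.8]])
flows, states = sample_hmm(means, covs, P, n_timesteps=200, seed=42)
```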
Initialize the MultiSiteHMMGenerator.
output_frequency (property)
Output frequency matches input frequency.
Typically used for annual data ('YS' or 'AS'), but flexible.
preprocessing
Preprocess observed data for HMM fitting.
Applies the offset and log transformation to handle zeros and ensure positive values for fitting.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| Q_obs | Series or DataFrame | Observed streamflow data with DatetimeIndex. | required |
| sites | List[str] | Subset of sites to use. If None, uses all columns. | None |
| **kwargs | dict | Additional preprocessing parameters (currently unused). | {} |
Raises:
| Type | Description |
|---|---|
| ValueError | If data has fewer than 2 sites for multi-site modeling. |
fit
Fit the multi-site HMM to observed data.
Estimates hidden states, transition probabilities, state-specific means, and covariance matrices using the GMMHMM algorithm.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| Q_obs | Series or DataFrame | Observed streamflow data. If provided, preprocessing is called automatically. | None |
| sites | list of str | Sites to use (only when Q_obs is provided). | None |
| **kwargs | dict | Additional fitting parameters. May include | {} |
Notes
States are automatically ordered by mean (ascending), so state 0 represents the dry state and higher-numbered states represent progressively wetter states.
generate
generate(n_realizations: int = 1, n_years: Optional[int] = None, n_timesteps: Optional[int] = None, seed: Optional[int] = None, **kwargs) -> Ensemble
Generate synthetic streamflow realizations.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| n_realizations | int | Number of synthetic realizations to generate. | 1 |
| n_years | int | Number of years to generate. With annual data, this equals n_timesteps. | None |
| n_timesteps | int | Number of timesteps to generate explicitly. Takes precedence over n_years if both are provided. | None |
| seed | int | Random seed for reproducibility. | None |
| **kwargs | dict | Additional generation parameters (currently unused). | {} |
Returns:
| Type | Description |
|---|---|
| Ensemble | Generated synthetic flows as an Ensemble object. |
Raises:
| Type | Description |
|---|---|
| ValueError | If neither n_years nor n_timesteps is provided. |
WARMGenerator
WARMGenerator
WARMGenerator(*, wavelet: str = 'morl', scales: Optional[NDArray] = None, n_octaves: Optional[float] = None, n_voices: int = 8, s0: Optional[float] = None, ar_order: int = 1, n_ar_max: int = 5, ar_select: str = 'fixed', bands: Optional[List[Tuple[float, float]]] = None, background_spectrum: str = 'red', significance_level: float = 0.95, min_band_scales: int = 1, name: Optional[str] = None, debug: bool = False, **kwargs)
Bases: Generator
Wavelet Auto-Regressive Method (WARM) for non-stationary streamflow generation.
Implements the enhanced WARM framework of Nowak et al. (2011). The procedure decomposes an observed annual flow record into significant spectral bands via the continuous wavelet transform. It removes the time-varying envelope by dividing each band-reconstructed signal by the square root of its Scale-Averaged Wavelet Power (SAWP), fits AR(p) models to the resulting stationary signals (one per band plus a noise residual), and reverses the process to synthesize new traces with the same non-stationary spectral structure as the historic record.
Significance of spectral peaks is assessed using the chi-squared background spectrum framework of Torrence and Compo (1998), with either a white-noise or AR(1) red-noise background.
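The significance test can be sketched with NumPy alone. This is a simplified, hypothetical illustration: it uses the normalized AR(1) background spectrum of Torrence and Compo (1998) and the 2-degree-of-freedom chi-squared factor that applies to the local power of a complex wavelet. WARM thresholds the time-averaged global wavelet spectrum, for which Torrence and Compo increase the degrees of freedom, and real wavelets such as 'morl' carry one degree of freedom per point, so SynHydro's actual thresholds will differ. The function names are illustrative only.

```python
import numpy as np


def red_noise_spectrum(alpha, periods):
    """Normalized AR(1) background spectrum at the given Fourier periods
    (in sampling units); alpha = 0 gives a flat, white-noise spectrum."""
    freq = 1.0 / np.asarray(periods, dtype=float)  # cycles per timestep
    return (1.0 - alpha**2) / (1.0 + alpha**2 - 2.0 * alpha * np.cos(2.0 * np.pi * freq))


def significance_threshold(x, periods, level=0.95, background="red"):
    """Power level exceeded with probability 1 - level under the background
    (simplified 2-degree-of-freedom chi-squared test)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    alpha = 0.0
    if background == "red":
        # Lag-1 autocorrelation of the series sets the AR(1) background.
        alpha = np.corrcoef(x[:-1], x[1:])[0, 1]
    # chi2.ppf(level, df=2) / 2 has the closed form -ln(1 - level).
    chi2_factor = -np.log(1.0 - level)
    return x.var() * red_noise_spectrum(alpha, periods) * chi2_factor


rng = np.random.default_rng(0)
flows = rng.gamma(4.0, 25.0, size=80)  # synthetic annual flows
print(significance_threshold(flows, np.array([2.0, 4.0, 8.0, 16.0])))
```

With `alpha = 0` the background is flat, which corresponds to `background_spectrum='white'`.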
Notes
The WARMGenerator is univariate. For multi-site simulation as described in
Nowak et al. (2011, Section 2.4), apply this generator to an aggregate
gauge time series and then disaggregate spatially using the proportional
KNN method of Nowak et al. (2010), available in SynHydro as
synhydro.methods.disaggregation.spatial.NowakDisaggregator.
Examples:
>>> import pandas as pd
>>> from synhydro.methods.generation.parametric.warm import WARMGenerator
>>> Q_annual = pd.read_csv('annual_flows.csv', index_col=0, parse_dates=True)
>>> warm = WARMGenerator(wavelet='morl', background_spectrum='red')
>>> warm.fit(Q_annual.iloc[:, [0]])
>>> ensemble = warm.generate(n_years=100, n_realizations=50, seed=42)
References
Nowak, K., Rajagopalan, B., and Zagona, E. (2011). A Wavelet Auto-Regressive Method (WARM) for multi-site streamflow simulation of data with non-stationary spectra. Journal of Hydrology, 410(1-2), 1-12.
Torrence, C., and Compo, G.P. (1998). A practical guide to wavelet analysis. Bulletin of the American Meteorological Society, 79(1), 61-78.
Initialize the WARM Generator.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| wavelet | str | Wavelet type for the continuous wavelet transform. Supported with tabulated reconstruction constants: 'morl' (Morlet) and 'mexh' (Mexican Hat). Other PyWavelets continuous wavelets are accepted but will fall back to Morlet constants and emit a warning. | 'morl' |
| scales | array-like of float | Explicit scales (in units of the sampling period) at which to evaluate the CWT. If | None |
| n_octaves | float | Number of powers-of-two of scale to span. If | None |
| n_voices | int | Number of voices per octave. Setting | 8 |
| s0 | float | Smallest scale, in units of the sampling period. Defaults to 2, corresponding to a Fourier period of approximately | None |
| ar_order | int | Order of the autoregressive model fitted to each band's stationary component when | 1 |
| n_ar_max | int | Maximum AR order considered when | 5 |
| ar_select | (fixed, aic) | Strategy for choosing AR order. | 'fixed' |
| bands | list of (period_low, period_high) tuples | Explicit Fourier-period bands (in years) to model. Each tuple specifies the inclusive low and high period bounds of a band. If | None |
| background_spectrum | (red, white) | Background spectrum for the chi-squared significance test of Torrence and Compo (1998). | 'red' |
| significance_level | float | Confidence level (0 < level < 1) used to threshold the global wavelet spectrum for band detection. | 0.95 |
| min_band_scales | int | Minimum number of contiguous scales above the significance threshold required to declare a band. Increase to suppress narrow, single-scale spurious peaks. | 1 |
| name | str | Name for this generator instance. | None |
| debug | bool | Enable debug logging. | False |
| **kwargs | dict | Additional parameters; ignored. | {} |
Raises:
| Type | Description |
|---|---|
| ValueError | If |
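Band detection with `min_band_scales` can be pictured as finding runs of consecutive scales whose global wavelet power exceeds the significance threshold. The sketch below is a hypothetical helper, not SynHydro code; it works on scale indices, and converting those indices to Fourier periods (and to `bands` tuples) depends on the wavelet and is left out.

```python
import numpy as np


def contiguous_bands(power, threshold, min_band_scales=1):
    """Return (start, stop) scale-index pairs (stop exclusive) for runs where
    power exceeds threshold for at least min_band_scales consecutive scales."""
    above = np.asarray(power) > np.asarray(threshold)
    bands = []
    start = None
    for i, flag in enumerate(above):
        if flag and start is None:
            start = i                      # a new run begins
        elif not flag and start is not None:
            if i - start >= min_band_scales:
                bands.append((start, i))   # close a long-enough run
            start = None
    if start is not None and len(above) - start >= min_band_scales:
        bands.append((start, len(above)))  # run reaches the last scale
    return bands


gws = np.array([0.5, 2.0, 2.5, 0.4, 3.0, 0.2, 1.8, 1.9, 2.1])
print(contiguous_bands(gws, 1.0))                     # [(1, 3), (4, 5), (6, 9)]
print(contiguous_bands(gws, 1.0, min_band_scales=2))  # [(1, 3), (6, 9)]
```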
output_frequency (property)
Pandas frequency string of generated output (annual, year-start).
preprocessing
Preprocess observed data for WARM fitting.
Validates input, ensures (or resamples to) annual frequency, and stores
the resulting series on self.Q_obs_annual.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| Q_obs | Series or DataFrame | Observed streamflow with a DatetimeIndex. | required |
| sites | list of str | Sites to keep. If | None |
| **kwargs | dict | Ignored. | {} |
fit
Fit the WARM model to observed annual flows.
Steps follow Nowak et al. (2011) Sections 2.1-2.3:
- Compute the continuous wavelet transform of the mean-centered flow series.
- Compute the global wavelet spectrum and its chi-squared significance threshold against the chosen background spectrum (Torrence and Compo 1998).
- Identify significant spectral bands as contiguous runs of scales exceeding the threshold (or use user-supplied bands).
- For each band, compute the band-restricted SAWP (Eq. 5) and the band-reconstructed time-domain signal via the inverse CWT (Eq. 4).
- Divide the band-reconstructed signal by the square root of SAWP to obtain a stationary series and fit an AR(p) model.
- Form the noise residual as the observed series minus the sum of all band reconstructions, and fit an AR model to it.
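The normalization and AR steps can be sketched for a single band, assuming the band-reconstructed signal and its SAWP have already been computed (the CWT itself is omitted). The helpers below are hypothetical: they fix the AR order at 1 (the default `ar_order`), estimate it by lag-1 Yule-Walker, and, as an assumption, re-apply the historical SAWP envelope to the simulated stationary series, which limits the trace length to the record length.

```python
import numpy as np


def fit_ar1(z):
    """Lag-1 Yule-Walker fit of a (demeaned) series: returns (phi, innovation std)."""
    z = np.asarray(z, dtype=float)
    z = z - z.mean()
    phi = np.dot(z[:-1], z[1:]) / np.dot(z, z)
    sigma = np.sqrt(z.var() * (1.0 - phi**2))
    return phi, sigma


def simulate_band(band_signal, sawp, n, rng):
    """Divide a band by sqrt(SAWP), fit AR(1), simulate a new stationary series,
    and multiply it by the historical envelope (assumes n <= len(sawp))."""
    envelope = np.sqrt(np.asarray(sawp, dtype=float))
    stationary = np.asarray(band_signal, dtype=float) / envelope
    phi, sigma = fit_ar1(stationary)
    sim = np.empty(n)
    # Start from the stationary distribution of the AR(1) process.
    sim[0] = rng.normal(0.0, sigma / np.sqrt(1.0 - phi**2))
    for t in range(1, n):
        sim[t] = phi * sim[t - 1] + rng.normal(0.0, sigma)
    return sim * envelope[:n]
```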
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| Q_obs | Series or DataFrame | If provided, | None |
| sites | list of str | Forwarded to | None |
| **kwargs | dict | Ignored. | {} |
generate
generate(n_years: Optional[int] = None, n_realizations: int = 1, n_timesteps: Optional[int] = None, seed: Optional[int] = None, **kwargs) -> Ensemble
Generate synthetic annual streamflows.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| n_years | int | Number of years per realization. Defaults to the historical record length. | None |
| n_realizations | int | Number of synthetic realizations to produce. | 1 |
| n_timesteps | int | Synonym for | None |
| seed | int | Seed for the random number generator (NumPy | None |
| **kwargs | dict | Ignored. | {} |
Returns:
| Type | Description |
|---|---|
| Ensemble | Ensemble object containing all realizations. |
Raises:
| Type | Description |
|---|---|
| ValueError | If |
ARFIMAGenerator
ARFIMAGenerator(*, p: int = 1, q: int = 0, d_method: str = 'whittle', truncation_lag: int = 100, deseasonalize: bool = True, auto_order: bool = False, name: Optional[str] = None, debug: bool = False, **kwargs)
Bases: Generator
Autoregressive Fractionally Integrated Moving Average (ARFIMA) generator for synthetic monthly or annual streamflow.
Generates synthetic streamflows using an ARFIMA model that captures long-range dependence through a fractional differencing parameter d in (0, 0.5). The model preserves the Hurst exponent, the seasonal pattern (for monthly data), and the autocorrelation structure.
The Hurst exponent H relates to the fractional differencing parameter via H = d + 0.5, providing direct parameterization of long-memory behavior.
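The fractional differencing filter (1 - B)^d and its inverse (1 - B)^(-d) can be expanded into truncated coefficient sequences with a simple recursion, where `truncation_lag` plays the role of K. This is a sketch with hypothetical function names, not SynHydro's internal code.

```python
import numpy as np


def frac_diff_weights(d: float, truncation_lag: int) -> np.ndarray:
    """Coefficients pi_k of (1 - B)^d for k = 0..K: pi_k = pi_{k-1} (k - 1 - d) / k."""
    w = np.empty(truncation_lag + 1)
    w[0] = 1.0
    for k in range(1, truncation_lag + 1):
        w[k] = w[k - 1] * (k - 1 - d) / k
    return w


def frac_integration_weights(d: float, truncation_lag: int) -> np.ndarray:
    """Coefficients psi_k of (1 - B)^(-d): psi_k = psi_{k-1} (k - 1 + d) / k."""
    return frac_diff_weights(-d, truncation_lag)


def hurst_from_d(d: float) -> float:
    """H = d + 0.5, so d in (0, 0.5) maps to H in (0.5, 1)."""
    return d + 0.5


print(hurst_from_d(0.3))
print(frac_diff_weights(0.3, 3))
```

Multiplying the two truncated series reproduces the identity filter up to lag K, which is why the same K can be used for differencing in `fit` and for the inverse filter in `generate`.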
Examples:
>>> import pandas as pd
>>> from synhydro.methods.generation.parametric.arfima import ARFIMAGenerator
>>> Q_monthly = pd.read_csv('monthly_flows.csv', index_col=0, parse_dates=True)
>>> arfima = ARFIMAGenerator()
>>> arfima.preprocessing(Q_monthly.iloc[:, 0])
>>> arfima.fit()
>>> ensemble = arfima.generate(n_years=50, n_realizations=100)
References
Hosking, J.R.M. (1984). Modeling persistence in hydrological time series using fractional differencing. Water Resources Research, 20(12), 1898-1908. https://doi.org/10.1029/WR020i012p01898
Initialize the ARFIMAGenerator.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| p | int | AR order for the short-memory ARMA(p, q) component. | 1 |
| q | int | MA order for the short-memory ARMA(p, q) component. | 0 |
| d_method | str | Method for estimating d: 'whittle' (frequency-domain MLE), 'gph' (Geweke-Porter-Hudak), or 'rs' (R/S analysis). | 'whittle' |
| truncation_lag | int | Truncation lag K for the fractional differencing coefficients. | 100 |
| deseasonalize | bool | Remove the seasonal component (monthly means and standard deviations) before fitting. Set False for annual data. | True |
| auto_order | bool | If True, select (p, q) by a BIC grid search over p in {0, 1, 2} and q in {0, 1, 2}, overriding the user-supplied p and q. BIC is used because it has been shown to be a consistent order-selection criterion for ARFIMA models (Huang et al. 2022, Annals of Statistics). | False |
| name | str | Name identifier for this generator instance. | None |
| debug | bool | Enable debug logging. | False |
| **kwargs | dict | Additional parameters (stored in init_params). | {} |
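For orientation, the 'gph' option refers to the Geweke-Porter-Hudak log-periodogram regression. The sketch below is a textbook version with an assumed bandwidth m = floor(N^0.5); the bandwidth and any tapering that SynHydro applies are not documented here, and the function name is hypothetical.

```python
import numpy as np


def gph_estimate(x, bandwidth_exponent=0.5):
    """Geweke-Porter-Hudak estimate of the fractional differencing parameter d."""
    x = np.asarray(x, dtype=float)
    n = x.size
    m = int(np.floor(n**bandwidth_exponent))  # number of low frequencies used
    j = np.arange(1, m + 1)
    lam = 2.0 * np.pi * j / n                   # Fourier frequencies lambda_j
    dft = np.fft.fft(x - x.mean())
    periodogram = np.abs(dft[1 : m + 1]) ** 2 / (2.0 * np.pi * n)
    regressor = np.log(4.0 * np.sin(lam / 2.0) ** 2)
    slope = np.polyfit(regressor, np.log(periodogram), 1)[0]
    # log I(lambda) ~ c - d * log(4 sin^2(lambda / 2)), so d is minus the slope.
    return -slope


white = np.random.default_rng(0).normal(size=4096)
print(round(gph_estimate(white), 3))  # near 0 for white noise
```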
preprocessing
Preprocess observed data for ARFIMA generation.
Validates input, ensures univariate data, optionally deseasonalizes for monthly data, and checks stationarity.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| Q_obs | Series or DataFrame | Observed historical flow data. | required |
| sites | list | Sites to keep. If None, uses all columns. | None |
| **kwargs | dict | Additional preprocessing parameters. | {} |
Raises:
| Type | Description |
|---|---|
| ValueError | If data has insufficient length or multiple sites. |
fit
Estimate ARFIMA model parameters from preprocessed data.
Sequence:
1. Estimate the fractional differencing parameter d using the specified method.
2. Apply fractional differencing to obtain the differenced series.
3. Fit ARMA(p,q) to the differenced series using the Yule-Walker equations.
4. Store all fitted parameters.
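Step 3 can be illustrated for the autoregressive part with a plain Yule-Walker solver. This sketch handles pure AR(p) only (q = 0); how the MA terms are estimated is not specified in this docstring, and the function name is hypothetical.

```python
import numpy as np


def yule_walker(x, p):
    """AR(p) coefficients and innovation variance from the Yule-Walker equations."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = x.size
    # Biased sample autocovariances gamma_0..gamma_p.
    gamma = np.array([np.dot(x[: n - k], x[k:]) / n for k in range(p + 1)])
    if p == 0:
        return np.empty(0), gamma[0]
    # Toeplitz system: sum_j phi_j gamma_|i-j| = gamma_i for i = 1..p.
    R = np.array([[gamma[abs(i - j)] for j in range(p)] for i in range(p)])
    phi = np.linalg.solve(R, gamma[1:])
    sigma2 = gamma[0] - np.dot(phi, gamma[1:])
    return phi, sigma2
```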
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| Q_obs | Series or DataFrame | If provided, calls preprocessing automatically. | None |
| sites | list | Sites to keep. Passed to preprocessing if Q_obs is provided. | None |
| **kwargs | dict | Additional fitting parameters. | {} |
Raises:
| Type | Description |
|---|---|
| ValueError | If fitting fails (e.g., ARMA estimation error). |
generate
generate(n_realizations: int = 1, n_years: Optional[int] = None, n_timesteps: Optional[int] = None, seed: Optional[int] = None, **kwargs) -> Ensemble
Generate synthetic streamflow realizations.
Sequence:
1. Generate white-noise innovations.
2. Apply the AR recursion to obtain the ARMA differenced series W_t.
3. Invert the fractional differencing via MA convolution (FIR filter) to recover X_t.
4. Re-seasonalize if the data are monthly.
5. Return the result as an Ensemble.
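The generation sequence can be sketched end to end for the pure AR case (q = 0), assuming `phi`, `d`, and the innovation standard deviation are already estimated. Deseasonalization is omitted, the function name is hypothetical, and the burn-in length is an arbitrary choice for the sketch rather than a SynHydro default.

```python
import numpy as np


def simulate_arfima(n, phi, d, sigma, truncation_lag=100, burn_in=200, rng=None):
    """White noise -> AR recursion (W_t) -> FIR filter with (1 - B)^(-d) weights (X_t).
    MA terms are omitted (q = 0) for brevity."""
    rng = np.random.default_rng() if rng is None else rng
    phi = np.asarray(phi, dtype=float)
    p = phi.size
    total = n + burn_in + truncation_lag
    eps = rng.normal(0.0, sigma, total)
    w = np.zeros(total)
    for t in range(total):
        for i in range(1, p + 1):
            if t - i >= 0:
                w[t] += phi[i - 1] * w[t - i]
        w[t] += eps[t]
    # Fractional integration weights psi_k = psi_{k-1} (k - 1 + d) / k.
    psi = np.empty(truncation_lag + 1)
    psi[0] = 1.0
    for k in range(1, truncation_lag + 1):
        psi[k] = psi[k - 1] * (k - 1 + d) / k
    # X_t = sum_k psi_k W_{t-k}; 'valid' drops the first truncation_lag values.
    x = np.convolve(w, psi, mode="valid")
    return x[-n:]  # discard the burn-in period
```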
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| n_realizations | int | Number of synthetic realizations to generate. | 1 |
| n_years | int | Number of years to generate. If None, uses the length of the training data. | None |
| n_timesteps | int | Number of timesteps to generate. Overrides n_years if provided. | None |
| seed | int | Random seed for reproducibility. | None |
| **kwargs | dict | Additional parameters (unused). | {} |
Returns:
| Type | Description |
|---|---|
| Ensemble | Generated synthetic flows as an Ensemble object. |
Raises:
| Type | Description |
|---|---|
| ValueError | If neither n_years nor n_timesteps is provided. |
SPARTAGenerator
SPARTAGenerator(*, nataf_method: str = 'GH', nataf_n_eval: int = 9, nataf_poly_deg: int = 6, nataf_gh_nodes: int = 21, marginal_method: str = 'parametric', matrix_repair_method: str = 'spectral', name: Optional[str] = None, debug: bool = False, **kwargs: Any)
Bases: Generator
Stochastic Periodic AutoRegressive To Anything generator.
Generates multisite cyclostationary synthetic timeseries at monthly resolution, using per-month marginal distributions and a PAR(1)-N auxiliary Gaussian model with Nataf ICDF mapping.
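The PAR(1)-N plus Nataf idea can be sketched for a single site: a periodic AR(1) Gaussian process with month-specific lag-1 correlations is mapped through the standard normal CDF and then through each month's inverse marginal CDF. Exponential marginals and the hypothetical `rho_z` values are stand-ins for illustration; SPARTA fits per-month marginals, handles multiple sites, and chooses the Gaussian correlations by inverting the Nataf relation so that target correlations are reproduced.

```python
import math

import numpy as np

_erf = np.vectorize(math.erf)


def std_normal_cdf(z):
    return 0.5 * (1.0 + _erf(np.asarray(z) / math.sqrt(2.0)))


def simulate_par1_exponential(n_years, rho_z, monthly_means, rng):
    """Single-site PAR(1)-N sketch with an exponential marginal per month.

    rho_z[s] is the (hypothetical) Gaussian lag-1 correlation between month s-1
    and month s; monthly_means[s] is the exponential mean for month s.
    """
    rho_z = np.asarray(rho_z, dtype=float)
    means = np.asarray(monthly_means, dtype=float)
    n = 12 * n_years
    z = np.empty(n)
    z[0] = rng.standard_normal()
    for t in range(1, n):
        r = rho_z[t % 12]
        # Unit-variance periodic AR(1) recursion.
        z[t] = r * z[t - 1] + math.sqrt(1.0 - r * r) * rng.standard_normal()
    u = np.clip(std_normal_cdf(z), 0.0, 1.0 - 1e-12)
    month = np.arange(n) % 12
    x = -means[month] * np.log1p(-u)  # exponential inverse CDF, month by month
    return x.reshape(n_years, 12)
```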
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| nataf_method | str | Nataf evaluation method: | 'GH' |
| nataf_n_eval | int | Number of support points for Nataf polynomial fitting (default 9). | 9 |
| nataf_poly_deg | int | Polynomial degree for Nataf approximation (default 6). | 6 |
| nataf_gh_nodes | int | Gauss-Hermite quadrature nodes (default 21). | 21 |
| marginal_method | str | Marginal fitting: | 'parametric' |
| matrix_repair_method | str | Method for repairing non-PD matrices (default | 'spectral' |
| name | str | Generator name. | None |
| debug | bool | Enable debug logging (default False). | False |
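The 'GH' Nataf evaluation can be sketched as follows: the correlation that two transformed variables X_i = F_i^{-1}(Φ(Z_i)) attain for a given Gaussian correlation is computed by two-dimensional Gauss-Hermite quadrature, evaluated at `nataf_n_eval` support points, and a polynomial of degree `nataf_poly_deg` is fitted to map target correlations back to Gaussian ones. The function names, the support-point grid, and the choice to fit the inverse mapping directly are assumptions, not SynHydro's documented procedure. The transforms `g1` and `g2` take standard normal values, so `np.exp` gives a lognormal marginal.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss


def nataf_corr(rho_z, g1, g2, n_nodes=21):
    """Pearson correlation of g1(Z1) and g2(Z2) when corr(Z1, Z2) = rho_z,
    where g_i(z) = F_i^{-1}(Phi(z)) maps standard normal values to the marginal."""
    x, w = hermgauss(n_nodes)          # rule for the weight exp(-x^2)
    z = np.sqrt(2.0) * x               # standard normal nodes
    wn = w / np.sqrt(np.pi)            # normalized weights (sum to 1)
    m1, m2 = np.sum(wn * g1(z)), np.sum(wn * g2(z))
    s1 = np.sqrt(np.sum(wn * (g1(z) - m1) ** 2))
    s2 = np.sqrt(np.sum(wn * (g2(z) - m2) ** 2))
    za, zb = np.meshgrid(z, z, indexing="ij")
    z2 = rho_z * za + np.sqrt(1.0 - rho_z**2) * zb  # correlated partner of za
    cross = np.sum(np.outer(wn, wn) * g1(za) * g2(z2))
    return (cross - m1 * m2) / (s1 * s2)


def invert_nataf(target_rho_x, g1, g2, n_eval=9, poly_deg=6, n_nodes=21):
    """Gaussian correlation that yields target_rho_x, via a fitted polynomial."""
    rho_z_grid = np.linspace(-0.99, 0.99, n_eval)
    rho_x_grid = np.array([nataf_corr(r, g1, g2, n_nodes) for r in rho_z_grid])
    coeffs = np.polyfit(rho_x_grid, rho_z_grid, poly_deg)  # rho_z as a function of rho_x
    return float(np.polyval(coeffs, target_rho_x))
```

For lognormal marginals the exact answer is known, corr = (e^rho_z - 1) / (e - 1), which makes the quadrature easy to check.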
preprocessing
preprocessing(Q_obs: Optional[Union[Series, DataFrame]] = None, *, sites: Optional[List[str]] = None, **kwargs: Any) -> None
Validate and prepare monthly data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| Q_obs | Series or DataFrame | Observed monthly streamflow. | None |
| sites | list of str | Subset of site names. | None |
fit
fit(Q_obs: Optional[Union[Series, DataFrame]] = None, *, sites: Optional[List[str]] = None, **kwargs: Any) -> None
Fit the SPARTA model to observed monthly data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| Q_obs | Series or DataFrame | If provided, calls preprocessing first. | None |
| sites | list of str | Subset of site names. | None |
generate
generate(n_realizations: int = 1, n_years: Optional[int] = None, n_timesteps: Optional[int] = None, seed: Optional[int] = None, **kwargs: Any) -> Ensemble
Generate synthetic monthly timeseries.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| n_realizations | int | Number of realizations (default 1). | 1 |
| n_years | int | Number of years. Defaults to the observed record length. | None |
| n_timesteps | int | Total months. Overrides n_years. | None |
| seed | int | Random seed. | None |
Returns:
| Type | Description |
|---|---|
| Ensemble | Generated synthetic data. |
SMARTAGenerator
SMARTAGenerator(*, acf_model: str = 'cas', sma_order: int = 512, nataf_method: str = 'GH', nataf_n_eval: int = 9, nataf_poly_deg: int = 8, nataf_gh_nodes: int = 21, marginal_method: str = 'parametric', matrix_repair_method: str = 'spectral', name: Optional[str] = None, debug: bool = False, **kwargs: Any)
Bases: Generator
Symmetric Moving Average (neaRly) To Anything generator.
Generates multisite stationary synthetic timeseries at annual resolution with arbitrary marginal distributions and any-range autocorrelation structure via the SMA model with Nataf ICDF mapping.
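The SMA building block can be sketched with NumPy's FFT: symmetric weights a_j are obtained as the inverse transform of the square root of the power spectrum implied by a target autocorrelation, and the series is a two-sided moving average of white noise. For simplicity the sketch uses the fractional Gaussian noise autocorrelation as a stand-in for the CAS model and a circulant approximation of the spectrum; the function names are hypothetical, and the Nataf mapping to non-Gaussian marginals is omitted.

```python
import numpy as np


def fgn_acf(hurst, max_lag):
    """Autocorrelation of fractional Gaussian noise (stand-in for the CAS model)."""
    k = np.arange(max_lag + 1, dtype=float)
    h2 = 2.0 * hurst
    return 0.5 * ((k + 1) ** h2 - 2.0 * k**h2 + np.abs(k - 1) ** h2)


def sma_weights(acf, q):
    """Symmetric weights a_0..a_q whose self-convolution approximates acf[0..q]
    (circulant approximation: square root of the spectrum, then inverse FFT)."""
    circ = np.concatenate([acf[: q + 1], acf[q - 1 : 0 : -1]])  # length 2q
    spectrum = np.clip(np.fft.rfft(circ).real, 0.0, None)
    a = np.fft.irfft(np.sqrt(spectrum), n=2 * q)
    return a[: q + 1]


def simulate_sma(n, weights, rng):
    """x_t = sum_{j=-q}^{q} a_|j| v_{t+j} with i.i.d. standard normal v."""
    kernel = np.concatenate([weights[:0:-1], weights])  # a_q..a_1, a_0, a_1..a_q
    v = rng.standard_normal(n + kernel.size - 1)
    return np.convolve(v, kernel, mode="valid")


acf = fgn_acf(0.8, 512)
weights = sma_weights(acf, 512)  # q = 512 mirrors the default sma_order
x = simulate_sma(1000, weights, np.random.default_rng(0))
```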
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| acf_model | str | Autocorrelation model: | 'cas' |
| sma_order | int | SMA truncation order q (default 512; should be a power of 2). | 512 |
| nataf_method | str | Nataf evaluation method: | 'GH' |
| nataf_n_eval | int | Number of support points for Nataf polynomial fitting (default 9). | 9 |
| nataf_poly_deg | int | Polynomial degree for Nataf approximation (default 8). | 8 |
| nataf_gh_nodes | int | Gauss-Hermite quadrature nodes (default 21). | 21 |
| marginal_method | str | Marginal fitting method: | 'parametric' |
| matrix_repair_method | str | Method for repairing non-PD matrices: | 'spectral' |
| name | str | Generator name. | None |
| debug | bool | Enable debug logging (default False). | False |
preprocessing
preprocessing(Q_obs: Optional[Union[Series, DataFrame]] = None, *, sites: Optional[List[str]] = None, **kwargs: Any) -> None
Validate and prepare annual data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| Q_obs | Series or DataFrame | Observed streamflow. If not provided, uses data from constructor. | None |
| sites | list of str | Subset of site names to use. | None |
fit
fit(Q_obs: Optional[Union[Series, DataFrame]] = None, *, sites: Optional[List[str]] = None, **kwargs: Any) -> None
Fit the SMARTA model to observed data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| Q_obs | Series or DataFrame | If provided, calls preprocessing first. | None |
| sites | list of str | Subset of site names. | None |
generate
generate(n_realizations: int = 1, n_years: Optional[int] = None, n_timesteps: Optional[int] = None, seed: Optional[int] = None, **kwargs: Any) -> Ensemble
Generate synthetic annual timeseries.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| n_realizations | int | Number of realizations to generate (default 1). | 1 |
| n_years | int | Number of years per realization. Defaults to the observed record length. | None |
| n_timesteps | int | Alias for n_years at annual resolution. | None |
| seed | int | Random seed for reproducibility. | None |
Returns:
| Type | Description |
|---|---|
| Ensemble | Generated synthetic data. |
HMMKNNGenerator
HMMKNNGenerator(*, n_states: int = 2, delta: float = 1.0, covariance_type: str = 'full', n_init: int = 10, name: Optional[str] = None, debug: bool = False, **kwargs)
Bases: Generator
HMM-KNN generator for synthetic annual multisite streamflow.
Combines a Gaussian Hidden Markov Model for regime sequencing with K-Nearest Neighbor bootstrapping for within-regime resampling. Regime transitions are governed by a learned Markov transition matrix. For each synthetic year the generator identifies the regime-transition category (previous state, current state), searches the historical record for analog years within that category using normalized log-flow distances, and resamples the full multisite flow vector from one of the K nearest analogs.
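The within-regime KNN step can be sketched as below. Which vector the distances are measured from (here `target_log`, for example the previous synthetic year's log-flows), the default K = round(sqrt(n)), and the 1/j kernel of Lall and Sharma (1996) are assumptions for illustration; this docstring does not specify them, and the function name is hypothetical.

```python
import numpy as np


def knn_resample(target_log, pool_log, log_std, rng, k=None):
    """Return the row index of one analog year sampled from a candidate pool.

    target_log : (n_sites,) log-flows the search is centered on.
    pool_log   : (n_pool, n_sites) log-flows of candidate historical years.
    log_std    : (n_sites,) per-site std of historical log-flows.
    """
    pool_log = np.atleast_2d(pool_log)
    n_pool = pool_log.shape[0]
    if k is None:
        k = max(1, int(round(np.sqrt(n_pool))))  # sqrt(n) heuristic (assumption)
    k = min(k, n_pool)
    # Euclidean distance on standardized log-flows.
    dist = np.sqrt((((pool_log - target_log) / log_std) ** 2).sum(axis=1))
    nearest = np.argsort(dist, kind="stable")[:k]
    # Kernel weights proportional to 1/j for the j-th nearest neighbor.
    weights = 1.0 / np.arange(1, k + 1)
    weights /= weights.sum()
    return nearest[rng.choice(k, p=weights)]
```

The resampled index would then select the full multisite flow vector of that historical year, which is what preserves the spatial correlation.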
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| n_states | int | Number of hidden hydrologic regimes. State 0 is the driest regime. | 2 |
| delta | float | Additive offset applied before the log transformation to handle near-zero flows. Must be positive. | 1.0 |
| covariance_type | str | Covariance structure for the Gaussian HMM emissions: 'full', 'diag', or 'spherical'. 'full' preserves all inter-site correlations within each state. | 'full' |
| n_init | int | Number of random initializations for HMM fitting. The fit with the highest log-likelihood is retained. | 10 |
| name | str | Name identifier for this generator instance. | None |
| debug | bool | Enable debug-level logging. | False |
Attributes:
| Name | Type | Description |
|---|---|---|
| transition_matrix_ | np.ndarray of shape (n_states, n_states) | Learned HMM transition probability matrix. |
| stationary_distribution_ | np.ndarray of shape (n_states,) | Stationary distribution of the Markov chain. |
| state_sequence_ | np.ndarray of shape (N,) | Viterbi state assignment for each year of the historical record. |
| log_std_ | np.ndarray of shape (n_sites,) | Per-site standard deviation of historical log-flows, used to normalize distances in the KNN search. |
| Q_log_ | np.ndarray of shape (N, n_sites) | Log-transformed historical flows. |
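The stationary distribution and a synthetic regime sequence can be obtained from a transition matrix as sketched below. Drawing the first state from the stationary distribution is an assumption, and the function names are hypothetical.

```python
import numpy as np


def stationary_distribution(P):
    """Left eigenvector of P for eigenvalue 1, normalized to sum to one."""
    P = np.asarray(P, dtype=float)
    vals, vecs = np.linalg.eig(P.T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return v / v.sum()


def simulate_states(P, n_years, rng, initial=None):
    """Markov-chain regime sequence; the first state is drawn from the
    stationary distribution unless an initial state is given."""
    P = np.asarray(P, dtype=float)
    states = np.empty(n_years, dtype=int)
    states[0] = rng.choice(len(P), p=stationary_distribution(P)) if initial is None else initial
    for t in range(1, n_years):
        states[t] = rng.choice(len(P), p=P[states[t - 1]])
    return states


P = np.array([[0.9, 0.1], [0.2, 0.8]])
print(stationary_distribution(P))
print(simulate_states(P, 10, np.random.default_rng(0)))
```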
Examples:
>>> import pandas as pd
>>> from synhydro.methods.generation.parametric import HMMKNNGenerator
>>>
>>> Q_annual = pd.read_csv('annual_flows.csv', index_col=0, parse_dates=True)
>>>
>>> gen = HMMKNNGenerator(n_states=2)
>>> gen.fit(Q_annual)
>>>
>>> ensemble = gen.generate(n_realizations=100, n_years=50, seed=42)
Initialize the HMMKNNGenerator.
output_frequency (property)
Temporal frequency of generated output.
Returns:
| Type | Description |
|---|---|
| str | Always 'YS' (annual, year-start). |
preprocessing
preprocessing(Q_obs: Union[Series, DataFrame], *, sites: Optional[List[str]] = None, **kwargs) -> None
Preprocess observed annual flow data for HMM-KNN fitting.
Applies the log transform Y = log(Q + delta) and stores the result. The preprocessed data are used by fit().
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| Q_obs | Series or DataFrame | Observed annual streamflow with DatetimeIndex. | required |
| sites | list of str | Subset of sites to use. If None, all columns are used. | None |
| **kwargs | dict | Additional preprocessing parameters (unused). | {} |
Raises:
| Type | Description |
|---|---|
| ValueError | If log-transformed data contain non-finite values. |
fit
fit(Q_obs: Optional[Union[Series, DataFrame]] = None, *, sites: Optional[List[str]] = None, **kwargs) -> None
Fit the HMM-KNN model to observed annual flow data.
Runs n_init random HMM initializations and retains the fit with the highest log-likelihood. Decodes the historical state sequence via the Viterbi algorithm, reorders states by ascending mean log-flow at the first site (state 0 = driest), and builds KNN pool index structures.
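Relabeling states by ascending mean log-flow requires permuting the transition matrix and the decoded state sequence consistently. A minimal sketch, with hypothetical names:

```python
import numpy as np


def reorder_states(means_first_site, transition_matrix, state_sequence):
    """Relabel HMM states so state 0 has the lowest mean log-flow at the first site."""
    order = np.argsort(means_first_site)      # old labels, driest first
    new_label = np.empty_like(order)
    new_label[order] = np.arange(order.size)  # maps old label -> new label
    P = np.asarray(transition_matrix)[np.ix_(order, order)]
    return P, new_label[np.asarray(state_sequence)]


P = np.array([[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.3, 0.3, 0.4]])
P_new, seq = reorder_states(np.array([3.0, 1.0, 2.0]), P, [0, 1, 2, 1])
print(seq)  # [2 0 1 0]
```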
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| Q_obs | Series or DataFrame | Observed annual streamflow. If provided, preprocessing() is called automatically. If None, preprocessing() must have been called first. | None |
| sites | list of str | Sites to use (only applied when Q_obs is provided). | None |
| **kwargs | dict | Additional fitting parameters (unused). | {} |
generate
generate(n_realizations: int = 1, n_years: Optional[int] = None, n_timesteps: Optional[int] = None, seed: Optional[int] = None, **kwargs) -> Ensemble
Generate synthetic annual streamflow realizations.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| n_realizations | int | Number of independent synthetic sequences to generate. | 1 |
| n_years | int | Number of years per realization. If None and n_timesteps is also None, defaults to the length of the historical record. | None |
| n_timesteps | int | Explicit number of timesteps. Takes precedence over n_years. | None |
| seed | int | Random seed for reproducibility. | None |
| **kwargs | dict | Additional generation parameters (unused). | {} |
Returns:
| Type | Description |
|---|---|
| Ensemble | Generated synthetic flows. |
Raises:
| Type | Description |
|---|---|
| ValueError | If the generator has not been fitted. |
MultisitePhaseRandomizationGenerator
MultisitePhaseRandomizationGenerator(*, wavelet: str = 'cmor1.5-1.0', n_scales: int = 100, win_h_length: int = 15, transform: str = 'mean_center', name: Optional[str] = None, debug: bool = False, **kwargs: Any)
Bases: Generator
Multisite wavelet phase randomization generator (Brunner and Gilleland, 2020).
Generates synthetic daily streamflow at multiple sites using a shared wavelet (CWT) phase structure. Each site's power spectrum (CWT amplitude) is preserved from the observed record, while spatial correlation is maintained by applying identical random phases, drawn from a single white-noise CWT, to all sites simultaneously.
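The shared-phase idea can be illustrated with its Fourier analogue (multivariate Fourier surrogates) rather than the CWT used by this generator: each site keeps its own amplitude spectrum, and the same random phase is added at every frequency for all sites. This swapped-in technique is for intuition only; it omits the kappa back-transformation, so simulated values may be negative, and the function name is hypothetical.

```python
import numpy as np


def shared_phase_randomization(Q, rng):
    """Fourier analogue of multisite phase randomization.

    Q : (n_time, n_sites) array. Every site keeps its own amplitude spectrum;
    adding the same random phase at each frequency for all sites leaves the
    cross-spectra, and hence the lag-0 cross-products, unchanged.
    """
    Q = np.asarray(Q, dtype=float)
    n = Q.shape[0]
    spec = np.fft.rfft(Q, axis=0)
    phases = rng.uniform(0.0, 2.0 * np.pi, spec.shape[0])
    phases[0] = 0.0      # keep the zero-frequency (mean) term real
    if n % 2 == 0:
        phases[-1] = 0.0  # keep the Nyquist term real
    rotated = spec * np.exp(1j * phases)[:, None]
    return np.fft.irfft(rotated, n=n, axis=0)
```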
Attributes:
| Name | Type | Description |
|---|---|---|
| par_day_ | dict of dict | Fitted kappa distribution parameters for each site and day of year. Keyed by site name, then day-of-year integer (1-365). Each leaf entry contains {'xi', 'alfa', 'k', 'h'}. |
| cwt_amplitudes_ | dict of np.ndarray | Per-site CWT amplitude spectra of shape (n_scales, N). Keyed by site name. |
| norm_ | dict of np.ndarray | Per-site pre-CWT series (mean-centered or normal-score) of length N. Keyed by site name. |
| obs_mean_ | dict of float | Per-site global mean subtracted during the mean-center transform. Empty when transform='normal_score'. |
| scales_ | ndarray | CWT scales used, shape (n_scales,). |
| delta_j_ | float | Log-scale spacing (constant for geometrically spaced scales). |
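The parameter names stored in `par_day_` ('xi', 'alfa', 'k', 'h') match those of Hosking's four-parameter kappa distribution, whose quantile function is sketched below. That SynHydro uses exactly this parameterization is an assumption; the function is a standalone illustration with a hypothetical name.

```python
import numpy as np


def kappa_quantile(F, xi, alfa, k, h):
    """Quantile function of the four-parameter kappa distribution:
    x(F) = xi + (alfa / k) * (1 - ((1 - F**h) / h) ** k),
    with the k -> 0 and h -> 0 limits handled explicitly."""
    F = np.asarray(F, dtype=float)
    y = -np.log(F) if h == 0 else (1.0 - F**h) / h  # h -> 0 limit is -log(F)
    if k == 0:
        return xi - alfa * np.log(y)                # k -> 0 limit
    return xi + alfa / k * (1.0 - y**k)


# k = h = 1 reduces to a uniform distribution on [xi, xi + alfa].
print(kappa_quantile(0.5, xi=0.0, alfa=2.0, k=1.0, h=1.0))  # 1.0
```

If the stored keys match the argument names, a fitted entry could be applied as `kappa_quantile(u, **par)` with `par` taken from `par_day_[site][day]`.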
Examples:
>>> import pandas as pd
>>> from synhydro.methods.generation.nonparametric import (
... MultisitePhaseRandomizationGenerator,
... )
>>> Q_daily = pd.read_csv('daily_flows.csv', index_col=0, parse_dates=True)
>>> gen = MultisitePhaseRandomizationGenerator()
>>> gen.preprocessing(Q_daily)
>>> gen.fit()
>>> ensemble = gen.generate(n_realizations=100, seed=42)
Notes
- Requires at least 2 years (730 days) of daily data per site.
- February 29 observations are removed before fitting.
- After leap-day removal, the record length must be a multiple of 365.
- All sites must share the same DatetimeIndex.
- The generator produces realizations of the same length as the observed record unless n_years is specified.
Initialize the MultisitePhaseRandomizationGenerator.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| wavelet | str | PyWavelets continuous wavelet identifier. The complex Morlet wavelet 'cmor1.5-1.0' (bandwidth 1.5, center frequency 1.0) is recommended. | 'cmor1.5-1.0' |
| n_scales | int | Number of CWT scales, spaced log-uniformly from 2 to N/8, where N is the record length in days. | 100 |
| win_h_length | int | Half-window length (days) for per-day-of-year kappa fitting. Values within ±win_h_length days of each target day are pooled, giving a total window of 2*win_h_length+1 days. | 15 |
| transform | str | Transform applied to each site's observed series before computing the CWT. Options: 'mean_center' or 'normal_score'. The kappa marginal fitting always uses the raw (untransformed) flow values regardless of this setting. | 'mean_center' |
| name | str | Name identifier for this generator instance. | None |
| debug | bool | Enable debug-level logging. | False |
| **kwargs | dict | Additional keyword arguments (currently unused). | {} |
output_frequency (property)
Wavelet phase randomization generates daily output.
preprocessing
Preprocess observed multisite daily streamflow data.
Validates input, removes leap days, and creates per-site day-of-year indices. After leap-day removal, the record length must be a multiple of 365.
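Leap-day removal and a no-leap day-of-year index can be sketched in pandas. The helper names are hypothetical, and shifting every date after February 28 in a leap year back by one day is an assumption about how the 1-365 index is built.

```python
import numpy as np
import pandas as pd


def drop_leap_days(Q: pd.DataFrame) -> pd.DataFrame:
    """Remove February 29 and enforce the length rules of this section."""
    is_leap_day = (Q.index.month == 2) & (Q.index.day == 29)
    Q = Q.loc[~is_leap_day]
    if len(Q) < 730:
        raise ValueError("Need at least 730 days after leap-day removal.")
    if len(Q) % 365 != 0:
        raise ValueError("Record length after leap-day removal must be a multiple of 365.")
    return Q


def no_leap_day_of_year(index: pd.DatetimeIndex) -> np.ndarray:
    """Day of year in 1..365 that ignores Feb 29 (March 1 is always day 60)."""
    doy = index.dayofyear.to_numpy()
    shift = index.is_leap_year & (index.month > 2)
    return np.where(shift, doy - 1, doy)
```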
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| Q_obs | DataFrame or Series | Observed daily streamflow with DatetimeIndex. A DataFrame with one column per site is required for multisite generation. A Series is accepted and treated as a single-site case. | required |
| sites | list of str | Subset of columns to use. If None, all columns are used. | None |
| **kwargs | dict | Additional preprocessing parameters (currently unused). | {} |
Raises:
| Type | Description |
|---|---|
| ValueError | If data has fewer than 730 days after leap-day removal, or if the length after removal is not a multiple of 365. |
fit
Fit the multisite wavelet phase randomization model.
This method:
1. Fits per-site, per-day-of-year kappa distributions using L-moments.
2. Applies the normal score transform per site and day of year.
3. Computes the CWT of each normal-score series and stores the per-site amplitude spectra.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| Q_obs | DataFrame | If provided, calls preprocessing() automatically. | None |
| sites | list of str | Passed to preprocessing() when Q_obs is provided. | None |
| **kwargs | dict | Additional fitting parameters (currently unused). | {} |
generate
generate(n_realizations: int = 1, n_years: Optional[int] = None, n_timesteps: Optional[int] = None, seed: Optional[int] = None, **kwargs: Any) -> Ensemble
Generate synthetic multisite daily streamflow realizations.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| n_realizations | int | Number of independent synthetic realizations to generate. | 1 |
| n_years | int | Target length of each realization in years (365-day years, no leap days). When provided, independent phase-randomized chunks are concatenated until the target length is reached and then trimmed. When None, the output length equals the observed record length. | None |
| n_timesteps | int | Not used. Length is controlled via n_years. | None |
| seed | int | Random seed for reproducibility. | None |
| **kwargs | dict | Additional generation parameters (currently unused). | {} |
Returns:
| Type | Description |
|---|---|
| Ensemble | Generated synthetic flows as an Ensemble object. Each realization is a DataFrame with shape (n_days, n_sites) and a no-leap DatetimeIndex. |
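The chunking behavior described for n_years can be sketched generically. `make_chunk` is a hypothetical stand-in for one phase-randomized realization of the observed record length, and building the no-leap DatetimeIndex is omitted.

```python
import numpy as np


def concat_chunks(make_chunk, n_years, days_per_year=365):
    """Concatenate independent chunks until n_years * 365 days, then trim.

    make_chunk : callable returning an (n_days_chunk, n_sites) array.
    """
    target = n_years * days_per_year
    pieces, total = [], 0
    while total < target:
        chunk = make_chunk()  # one independent realization
        pieces.append(chunk)
        total += chunk.shape[0]
    return np.concatenate(pieces, axis=0)[:target]


rng = np.random.default_rng(0)
out = concat_chunks(lambda: rng.normal(size=(730, 2)), n_years=5)
print(out.shape)  # (1825, 2)
```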