Impact Function Calibration Module#
What’s New?#
Since CLIMADA v6.0.1, some functionality of this module has changed. When upgrading to a newer version of CLIMADA, please mind the following changes:
- Input received additional attributes. We now support optional weights that are passed to the cost function; therefore, the cost function must support an additional, optional argument. cost_func now receives numpy arrays. An additional attribute df_to_numpy was added to transform pandas.DataFrame objects to np.ndarray. By default, it returns a flattened array.
- This module now exports cost functions that support optional weights, see climada.util.calibrate.cost_func.
- Ensemble optimizers have been added.
Base Classes#
Generic classes for defining the data structures of this module.
- class climada.util.calibrate.base.Input(hazard: ~climada.hazard.base.Hazard, exposure: ~climada.entity.exposures.base.Exposures, data: ~pandas.core.frame.DataFrame, impact_func_creator: ~typing.Callable[[...], ~climada.entity.impact_funcs.impact_func_set.ImpactFuncSet], impact_to_dataframe: ~typing.Callable[[~climada.engine.impact.Impact], ~pandas.core.frame.DataFrame], cost_func: ~typing.Callable[[~numpy.ndarray, ~numpy.ndarray, ~numpy.ndarray | None], ~numbers.Number], bounds: ~typing.Mapping[str, ~scipy.optimize._constraints.Bounds | ~typing.Tuple[~numbers.Number, ~numbers.Number]] | None = None, constraints: ~scipy.optimize._constraints.LinearConstraint | ~scipy.optimize._constraints.NonlinearConstraint | ~typing.Mapping | list[~scipy.optimize._constraints.LinearConstraint | ~scipy.optimize._constraints.NonlinearConstraint | ~typing.Mapping] | None = None, impact_calc_kwds: ~typing.Mapping[str, ~typing.Any] = <factory>, missing_data_value: float = nan, df_to_numpy: ~typing.Callable[[~pandas.core.frame.DataFrame], ~numpy.ndarray] = <function Input.<lambda>>, *, data_weights: ~pandas.core.frame.DataFrame | None = None, missing_weights_value: float = 0.0, assign_centroids: dataclasses.InitVar[bool] = True)[source]#
Define the static input for a calibration task
- hazard#
Hazard object to compute impacts from
- Type:
climada.Hazard
- exposure#
Exposures object to compute impacts from
- Type:
climada.Exposures
- data#
The data to compare computed impacts to. Index: Event IDs matching the IDs of
hazard. Columns: Arbitrary columns. NaN values in the data frame have special meaning: corresponding impact values computed by the model are ignored in the calibration.
- Type:
pandas.DataFrame
- impact_func_creator#
Function that takes the parameters as keyword arguments and returns an impact function set. This will be called each time the optimization algorithm updates the parameters.
- Type:
Callable
- impact_to_dataframe#
Function that takes an impact object as input and transforms its data into a pandas.DataFrame that is compatible with the format of
data. The return value of this function will be passed to the cost_func as first argument.
- Type:
Callable
- cost_func#
Function that takes the data and predicted impact values as
numpy.ndarray objects, plus optional weights, and returns the scalar “cost” between them. The optimization algorithm will try to minimize this number. The first argument is the true/correct values (data), the second argument is the estimated/predicted values, and the third argument is the data_weights. Data frames are transformed into arrays using df_to_numpy.
- Type:
Callable
- bounds#
The bounds for the parameters. Keys: parameter names. Values:
scipy.minimize.Bounds instance or tuple of minimum and maximum value. Unbounded parameters need not be specified here. See the documentation for the selected optimization algorithm on which data types are supported.
- Type:
Mapping (str, {Bounds, tuple(float, float)}), optional
- constraints#
One or multiple instances of
scipy.minimize.LinearConstraint, scipy.minimize.NonlinearConstraint, or a mapping. See the documentation for the selected optimization algorithm on which data types are supported.
- Type:
Constraint or list of Constraint, optional
- impact_calc_kwds#
Keyword arguments to
climada.engine.impact_calc.ImpactCalc.impact(). Defaults to {"assign_centroids": False} (by default, centroids are assigned here via the assign_centroids parameter, to avoid assigning them each time the impact is calculated).
- Type:
Mapping (str, Any), optional
- missing_data_value#
If the impact model returns impact data for which no values exist in
data, insert this value. Defaults to NaN, in which case the impact from the model is ignored. Set this to zero to explicitly calibrate to zero impacts in these cases.
- Type:
float, optional
- df_to_numpy#
A function that transforms a pandas.DataFrame into a numpy.ndarray to be inserted into the
cost_func. By default, this will flatten the data frame.
- Type:
Callable, optional
- data_weights#
Weights for each entry in
data. Must have the exact same index and columns. If None, the weights will be ignored (equivalent to the same weight for each event).
- Type:
pandas.DataFrame, optional
- missing_weights_value#
Same as
missing_data_value, but for data_weights.
- Type:
float, optional
- assign_centroids#
If
True (default), assign the hazard centroids to the exposure when this object is created.
- Type:
bool, optional
- impact_to_aligned_df(impact: Impact, fillna: float = nan) Tuple[DataFrame, DataFrame][source]#
Create a dataframe from an impact and align it with the data.
When aligning, two general cases might occur, which are not mutually exclusive:
There are data points for which no impact was computed. This will always be treated as an impact of zero.
There are impacts for which no data points exist. For these points, the input data will be filled with the value of
Input.missing_data_value.
This method performs the following steps:
1. Transform the impact into a dataframe using impact_to_dataframe.
2. Align the data with the impact dataframe, using missing_data_value as fill value.
3. Align the impact dataframe with the data, using zeros as fill value.
4. In the aligned impact, set all values to zero where the data is NaN.
5. Fill remaining NaNs in data with fillna.
- Parameters:
impact (Impact) – The impact computed by the model; it is transformed into a dataframe by
Input.impact_to_dataframe.
fillna (float, optional) – Value used to fill remaining NaNs in the data. Defaults to NaN.
- Returns:
data_aligned (pd.DataFrame) – The data aligned to the impact dataframe
impact_df_aligned (pd.DataFrame) – The impact transformed to a dataframe and aligned with the data
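The alignment steps above can be sketched in plain pandas (a simplified illustration of the described behaviour, not the actual implementation, which operates on an Impact object):

```python
import numpy as np
import pandas as pd

def align_impact_with_data(impact_df, data, missing_data_value=float("nan"), fillna=float("nan")):
    """Illustrative alignment of a model impact frame with calibration data."""
    # Work on the union of indices and columns of both frames
    idx = data.index.union(impact_df.index)
    cols = data.columns.union(impact_df.columns)
    # Entries the model produced but the data lacks get missing_data_value
    data_aligned = data.reindex(index=idx, columns=cols, fill_value=missing_data_value)
    # Data points with no computed impact are treated as zero impact
    impact_aligned = impact_df.reindex(index=idx, columns=cols, fill_value=0.0)
    # Ignore modelled impacts where the data is NaN
    impact_aligned = impact_aligned.where(data_aligned.notna(), 0.0)
    # Fill remaining NaNs in the data
    data_aligned = data_aligned.fillna(fillna)
    return data_aligned, impact_aligned
```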
- __init__(hazard: ~climada.hazard.base.Hazard, exposure: ~climada.entity.exposures.base.Exposures, data: ~pandas.core.frame.DataFrame, impact_func_creator: ~typing.Callable[[...], ~climada.entity.impact_funcs.impact_func_set.ImpactFuncSet], impact_to_dataframe: ~typing.Callable[[~climada.engine.impact.Impact], ~pandas.core.frame.DataFrame], cost_func: ~typing.Callable[[~numpy.ndarray, ~numpy.ndarray, ~numpy.ndarray | None], ~numbers.Number], bounds: ~typing.Mapping[str, ~scipy.optimize._constraints.Bounds | ~typing.Tuple[~numbers.Number, ~numbers.Number]] | None = None, constraints: ~scipy.optimize._constraints.LinearConstraint | ~scipy.optimize._constraints.NonlinearConstraint | ~typing.Mapping | list[~scipy.optimize._constraints.LinearConstraint | ~scipy.optimize._constraints.NonlinearConstraint | ~typing.Mapping] | None = None, impact_calc_kwds: ~typing.Mapping[str, ~typing.Any] = <factory>, missing_data_value: float = nan, df_to_numpy: ~typing.Callable[[~pandas.core.frame.DataFrame], ~numpy.ndarray] = <function Input.<lambda>>, *, data_weights: ~pandas.core.frame.DataFrame | None = None, missing_weights_value: float = 0.0, assign_centroids: dataclasses.InitVar[bool] = True) None#
- class climada.util.calibrate.base.Output(params: Mapping[str, Number], target: Number)[source]#
Generic output of a calibration task
- params#
The optimal parameters
- Type:
Mapping (str, Number)
- target#
The target function value for the optimal parameters
- Type:
Number
- to_hdf5(filepath: Path | str, mode: str = 'x')[source]#
Write the output into an H5 file
This stores the data as attributes because we only store single numbers, not arrays
- Parameters:
filepath (Path or str) – The filepath to store the data.
mode (str (optional)) – The mode for opening the file. Defaults to
x (Create file, fail if exists).
- __init__(params: Mapping[str, Number], target: Number) None#
- class climada.util.calibrate.base.OutputEvaluator(input: Input, output: Output)[source]#
Evaluate the output of a calibration task
- Parameters:
input (Input) – The input object for the optimization task.
output (Output) – The output object returned by the optimization task.
- impf_set#
The impact function set built from the optimized parameters
- Type:
climada.entity.ImpactFuncSet
- plot_at_event(data_transf: ~typing.Callable[[~pandas.core.frame.DataFrame], ~pandas.core.frame.DataFrame] = <function OutputEvaluator.<lambda>>, **plot_kwargs)[source]#
Create a bar plot comparing estimated model output and data per event.
Every row of the
Input.data is considered an event. The data to be plotted can be transformed with a generic function data_transf.
- Parameters:
data_transf (Callable (pd.DataFrame -> pd.DataFrame), optional) – A function that transforms the data to plot before plotting. It receives a dataframe whose rows represent events and whose columns represent the modelled impact and the calibration data, respectively. By default, the data is not transformed.
plot_kwargs – Keyword arguments passed to the
DataFrame.plot.bar method.
- Returns:
ax – The plot axis returned by
DataFrame.plot.bar.
- Return type:
matplotlib.axes.Axes
Note
This plot does not include the ignored impact, see
Input.data.
- plot_at_region(data_transf: ~typing.Callable[[~pandas.core.frame.DataFrame], ~pandas.core.frame.DataFrame] = <function OutputEvaluator.<lambda>>, **plot_kwargs)[source]#
Create a bar plot comparing estimated model output and data per region.
Every column of the
Input.data is considered a region. The data to be plotted can be transformed with a generic function data_transf.
- Parameters:
data_transf (Callable (pd.DataFrame -> pd.DataFrame), optional) – A function that transforms the data to plot before plotting. It receives a dataframe whose rows represent regions and whose columns represent the modelled impact and the calibration data, respectively. By default, the data is not transformed.
plot_kwargs – Keyword arguments passed to the
DataFrame.plot.bar method.
- Returns:
ax – The plot axis returned by
DataFrame.plot.bar.
- Return type:
matplotlib.axes.Axes
Note
This plot does not include the ignored impact, see
Input.data.
- plot_event_region_heatmap(data_transf: ~typing.Callable[[~pandas.core.frame.DataFrame], ~pandas.core.frame.DataFrame] = <function OutputEvaluator.<lambda>>, **plot_kwargs)[source]#
Plot a heatmap comparing all events per all regions
Every column of the
Input.data is considered a region, and every row is considered an event. The data to be plotted can be transformed with a generic function data_transf.
- Parameters:
data_transf (Callable (pd.DataFrame -> pd.DataFrame), optional) – A function that transforms the data to plot before plotting. It receives a dataframe whose rows represent events and whose columns represent the regions, respectively. By default, the data is not transformed.
plot_kwargs – Keyword arguments passed to the
DataFrame.plot.bar method.
- Returns:
ax – The plot axis returned by
DataFrame.plot.bar.
- Return type:
matplotlib.axes.Axes
- class climada.util.calibrate.base.Optimizer(input: Input)[source]#
Abstract base class (interface) for an optimization
This defines the interface for optimizers in CLIMADA. New optimizers can be created by deriving from this class and overriding at least the
run() method.
- _target_func(data: ndarray, predicted: ndarray, weights: ndarray | None) Number[source]#
Target function for the optimizer
The default version of this function simply returns the value of the cost function evaluated on the arguments.
- Parameters:
data (np.ndarray) – The reference data used for calibration. By default, this is
Input.data.
predicted (np.ndarray) – The impact predicted by the model after it has been transformed into a dataframe by
Input.impact_to_dataframe.
weights (np.ndarray or None) – The relative weight for each data/predicted pair.
- Return type:
The value of the target function for the optimizer.
- _kwargs_to_impact_func_creator(*_, **kwargs) Dict[str, Any][source]#
Define how the parameters to
_opt_func() must be transformed.
Optimizers may implement different ways of representing the parameters (e.g., key-value pairs, arrays, etc.). Depending on this representation, the parameters must be transformed to match the syntax of the impact function generator used, see
Input.impact_func_creator.
In this default version, the method simply returns its keyword arguments as a mapping. Override this method if the optimizer used does not represent parameters as key-value pairs.
- Parameters:
kwargs – The parameters as key-value pairs.
- Return type:
The parameters as key-value pairs.
- _opt_func(*args, **kwargs) Number[source]#
The optimization function iterated by the optimizer
This function takes arbitrary arguments from the optimizer, generates a new set of impact functions from it, computes the impact, and finally calculates the target function value and returns it.
- Parameters:
args, kwargs – Arbitrary arguments from the optimizer, including parameters
- Return type:
Target function value for the given arguments
- climada.util.calibrate.cost_func.mse(data: ndarray, predicted: ndarray, weights: ndarray | None) float[source]#
Weighted mean squared error
See https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_squared_error.html
- climada.util.calibrate.cost_func.msle(data: ndarray, predicted: ndarray, weights: ndarray | None) float[source]#
Weighted mean squared logarithmic error
See https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_squared_log_error.html
Bayesian Optimizer#
Calibration based on Bayesian optimization.
- climada.util.calibrate.bayesian_optimizer.select_best(p_space_df: DataFrame, cost_limit: float, absolute: bool = True, cost_col=('Calibration', 'Cost Function')) DataFrame[source]#
Select the best parameter space samples defined by a cost function limit
The limit is a factor of the minimum value relative to itself (absolute=True) or to the range of cost function values (absolute=False). A cost_limit of 0.1 will select all rows where the cost function is within
- 110% of the minimum value if absolute=True.
- 10% of the range between minimum and maximum cost function value if absolute=False.
- Parameters:
p_space_df (pd.DataFrame) – The parameter space to select from.
cost_limit (float) – The limit factor used for selection.
absolute (bool, optional) – Whether the limit factor is applied to the minimum value (
True) or the range of values (False). Defaults to True.
cost_col (Column specifier, optional) – The column indicating cost function values. Defaults to
("Calibration", "Cost Function").
- Returns:
A subselection of the input data frame.
- Return type:
pd.DataFrame
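The selection rule described above can be sketched as follows (an illustrative reimplementation, with cost_col simplified to a plain column name instead of the default ("Calibration", "Cost Function") multi-index column):

```python
import pandas as pd

def select_best_sketch(p_space_df, cost_limit, absolute=True, cost_col="cost"):
    """Keep parameter space rows whose cost is within the given limit."""
    cost = p_space_df[cost_col]
    cost_min = cost.min()
    if absolute:
        # e.g. cost_limit=0.1 keeps everything within 110% of the minimum
        limit = cost_min * (1.0 + cost_limit)
    else:
        # e.g. cost_limit=0.1 keeps everything within 10% of the value range
        limit = cost_min + cost_limit * (cost.max() - cost_min)
    return p_space_df[cost <= limit]
```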
- class climada.util.calibrate.bayesian_optimizer.BayesianOptimizerOutput(params: Mapping[str, Number], target: Number, p_space: TargetSpace)[source]#
Bases:
Output
Output of a calibration with
BayesianOptimizer
- p_space#
The parameter space sampled by the optimizer.
- Type:
bayes_opt.target_space.TargetSpace
- p_space_to_dataframe()[source]#
Return the sampled parameter space as pandas.DataFrame
- Returns:
Data frame whose columns are the parameter values and the associated cost function value (
Cost Function) and whose rows are the optimizer iterations.
- Return type:
pandas.DataFrame
- classmethod from_hdf5(filepath: Path | str)[source]#
Read BayesianOptimizerOutput from an H5 file
Warning
This results in an object with a broken
p_space attribute. Do not modify this parameter space further. This function is only intended to load the parameter space again for analysis/plotting.
- plot_p_space(p_space_df: DataFrame | None = None, x: str | None = None, y: str | None = None, min_def: str | Tuple[str, str] | None = 'Cost Function', min_fmt: str = 'x', min_color: str = 'r', **plot_kwargs) Axes | List[Axes][source]#
Plot the parameter space as scatter plot(s)
Produce a scatter plot where each point represents a parameter combination sampled by the optimizer. The coloring represents the cost function value. If there are more than two parameters in the input data frame, this method will produce one plot for each combination of two parameters. Explicit parameter names to plot can be given via the
x and y arguments. If no data frame is provided as argument, the output of p_space_to_dataframe() is used.
- Parameters:
p_space_df (pd.DataFrame, optional) – The parameter space to plot. Defaults to the one returned by
p_space_to_dataframe().
x (str, optional) – The parameter to plot on the x-axis. If y is not given, this will plot x against all other parameters.
y (str, optional) – The parameter to plot on the y-axis. If x is not given, this will plot y against all other parameters.
min_def (str, optional) – The name of the column in p_space_df defining which parameter set represents the minimum, which is plotted separately. Defaults to "Cost Function". Set to None to avoid plotting the minimum.
min_fmt (str, optional) – Plot format string for plotting the minimum. Defaults to "x".
min_color (str, optional) – Color for plotting the minimum. Defaults to "r" (red).
- __init__(params: Mapping[str, Number], target: Number, p_space: TargetSpace) None#
- class climada.util.calibrate.bayesian_optimizer.Improvement(iteration, sample, random, target, improvement)#
Bases:
tuple- __init__()#
- count(value, /)#
Return number of occurrences of value.
- improvement#
Alias for field number 4
- index(value, start=0, stop=9223372036854775807, /)#
Return first index of value.
Raises ValueError if the value is not present.
- iteration#
Alias for field number 0
- random#
Alias for field number 2
- sample#
Alias for field number 1
- target#
Alias for field number 3
- exception climada.util.calibrate.bayesian_optimizer.StopEarly[source]#
Bases:
ExceptionAn exception for stopping an optimization iteration early
- __init__(*args, **kwargs)#
- add_note()#
Exception.add_note(note) – add a note to the exception
- with_traceback()#
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- class climada.util.calibrate.bayesian_optimizer.BayesianOptimizerController(init_points: int, n_iter: int, min_improvement: float = 0.001, min_improvement_count: int = 2, kappa: float = 2.576, kappa_min: float = 0.1, max_iterations: int = 10, utility_func_kwargs: dict[str, int | float | str] = <factory>, _last_it_improved: int = 0, _last_it_end: int = 0)[source]#
Bases:
objectA class for controlling the iterations of a
BayesianOptimizer.Each iteration in the optimizer consists of a random sampling of the parameter space with
init_pointssteps, followed by a Gaussian process sampling withn_itersteps. During the latter, thekappaparameter is reduced to reachkappa_minat the end of the iteration. The iteration is stopped prematurely if improvements of the buest guess are belowmin_improvementformin_improvement_countconsecutive times. At the beginning of the next iteration,kappais reset to its original value.Optimization stops if
max_iterationsis reached or if an entire iteration saw now improvement.- init_points#
Number of randomly sampled points during each iteration.
- Type:
int
- n_iter#
Maximum number of points using Gaussian process sampling during each iteration.
- Type:
int
- min_improvement#
Minimal relative improvement. If improvements are below this value
min_improvement_count times, the iteration is stopped.
- Type:
float
- min_improvement_count#
Number of times the
min_improvement must be undercut to stop the iteration.
- Type:
int
- kappa#
Parameter controlling exploration of the upper-confidence-bound acquisition function of the sampling algorithm. Lower values mean less exploration of the parameter space and more exploitation of local information. This value is reduced throughout one iteration, reaching
kappa_min at the last iteration step.
- Type:
float
- max_iterations#
Maximum number of iterations before optimization is stopped, irrespective of convergence.
- Type:
int
- utility_func_kwargs#
Further keyword arguments to the
bayes_opt.UtilityFunction.
- Type:
dict[str, int | float | str]
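The kappa reduction within one iteration can be pictured as a geometric decay from kappa to kappa_min (a hypothetical schedule for illustration; the actual decay rule used by the controller may differ):

```python
def kappa_schedule(kappa: float, kappa_min: float, n_iter: int) -> list:
    """Geometric decay: start at kappa, reach kappa_min at the last step."""
    if n_iter < 2:
        return [kappa]
    ratio = (kappa_min / kappa) ** (1.0 / (n_iter - 1))
    return [kappa * ratio**step for step in range(n_iter)]
```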
- classmethod from_input(inp: Input, sampling_base: float = 4, **kwargs)[source]#
Create a controller from a calibration input
This uses the number of parameters to determine the appropriate values for
init_points and n_iter. Both values are set to \(b^N\), where \(b\) is the sampling_base parameter and \(N\) is the number of estimated parameters.
- Parameters:
inp (Input) – Input to the calibration
sampling_base (float, optional) – Base for determining the sample size. Increase this for denser sampling. Defaults to 4.
kwargs – Keyword arguments for the default constructor.
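The sizing rule \(b^N\) described above can be written out as a small sketch (the function name here is illustrative, not part of the API):

```python
def sampling_steps(num_params: int, sampling_base: float = 4) -> dict:
    """Compute init_points and n_iter as b**N.

    b is the sampling base and N the number of estimated parameters.
    """
    steps = int(sampling_base ** num_params)
    return {"init_points": steps, "n_iter": steps}
```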
- optimizer_params() dict[str, int | float | str | UtilityFunction][source]#
Return parameters for the optimizer
In the current implementation, these do not change.
- update(event: str, instance: BayesianOptimization)[source]#
Update the step tracker of this instance.
For step events, check if the latest guess is the new maximum. Also check if the iteration will be stopped early.
For end events, check if any improvement occurred. If not, stop the optimization.
- Parameters:
event (bayes_opt.Events) – The event descriptor
instance (bayes_opt.BayesianOptimization) – Optimization instance triggering the event
- Raises:
StopEarly – If the optimization only achieves minimal improvement, stop the iteration early with this exception.
StopIteration – If an entire iteration did not achieve improvement, stop the optimization.
- improvements() DataFrame[source]#
Return improvements as nicely formatted data
- Returns:
improvements
- Return type:
pd.DataFrame
- __init__(init_points: int, n_iter: int, min_improvement: float = 0.001, min_improvement_count: int = 2, kappa: float = 2.576, kappa_min: float = 0.1, max_iterations: int = 10, utility_func_kwargs: dict[str, int | float | str] = <factory>, _last_it_improved: int = 0, _last_it_end: int = 0) None#
- class climada.util.calibrate.bayesian_optimizer.BayesianOptimizer(input: Input, verbose: int = 0, random_state: dataclasses.InitVar[int] = 1, allow_duplicate_points: dataclasses.InitVar[bool] = True, bayes_opt_kwds: dataclasses.InitVar[Optional[Mapping[str, Any]]] = None)[source]#
Bases:
Optimizer
An optimization using
bayes_opt.BayesianOptimization
This optimizer reports the target function value for each parameter set and maximizes that value. Therefore, a higher target function value is better. The cost function, however, is still minimized: the target function is defined as the inverse of the cost function.
For details on the underlying optimizer, see bayesian-optimization/BayesianOptimization.
- Parameters:
input (Input) – The input data for this optimizer. See the Notes below for input requirements.
verbose (int, optional) – Verbosity of the optimizer output. Defaults to 0. The output is not affected by the CLIMADA logging settings.
random_state (int, optional) – Seed for initializing the random number generator. Defaults to 1.
allow_duplicate_points (bool, optional) – Allow the optimizer to sample the same points in parameter space multiple times. This may happen if the parameter space is tightly bound or constrained. Defaults to
True.
bayes_opt_kwds (dict) – Additional keyword arguments passed to the
BayesianOptimization constructor.
Notes
The following requirements apply to the parameters of
Input when using this class:
- bounds
Setting
bounds is required because the optimizer first “explores” the bound parameter space and then narrows its search to regions where the cost function is low.
- constraints
Must be an instance of
scipy.minimize.LinearConstraint or scipy.minimize.NonlinearConstraint. See bayesian-optimization/BayesianOptimization for further information. Supplying constraints is optional.
- optimizer#
The optimizer instance of this class.
- Type:
bayes_opt.BayesianOptimization
- run(**opt_kwargs) BayesianOptimizerOutput[source]#
Execute the optimization
BayesianOptimization maximizes a target function. Therefore, this class inverts the cost function and uses that as the target function. The cost function is still minimized.
- Parameters:
controller (BayesianOptimizerController) – The controller instance used to set the optimization iteration parameters.
kwargs – Further keyword arguments passed to
BayesianOptimization.maximize. Note that some arguments are also provided by BayesianOptimizerController.optimizer_params().
- Returns:
output – Optimization output.
BayesianOptimizerOutput.p_space stores data on the sampled parameter space.
- Return type:
BayesianOptimizerOutput
- class climada.util.calibrate.bayesian_optimizer.BayesianOptimizerOutputEvaluator(input: Input, output: BayesianOptimizerOutput)[source]#
Bases:
OutputEvaluator
Evaluate the output of
BayesianOptimizer.
- Parameters:
input (Input) – The input object for the optimization task.
output (BayesianOptimizerOutput) – The output object returned by the Bayesian optimization task.
- Raises:
TypeError – If
output is not of type BayesianOptimizerOutput.
- plot_impf_variability(p_space_df: DataFrame | None = None, plot_haz: bool = True, plot_opt_kws: dict | None = None, plot_impf_kws: dict | None = None, plot_hist_kws: dict | None = None, plot_axv_kws: dict | None = None)[source]#
Plot impact function variability with parameter combinations of almost equal cost function values
- Parameters:
p_space_df (pd.DataFrame, optional) – Parameter space to plot functions from. If
None, this uses the space returned by p_space_to_dataframe(). Use select_best() for a convenient subselection of parameters close to the optimum.
plot_haz (bool, optional) – Whether or not to plot the hazard intensity distribution. Defaults to True.
plot_opt_kws (dict, optional) – Keyword arguments for optimal impact function plot. Defaults to None.
plot_impf_kws (dict, optional) – Keyword arguments for all impact function plots. Defaults to None.
plot_hist_kws (dict, optional) – Keyword arguments for hazard intensity histogram plot. Defaults to None.
plot_axv_kws (dict, optional) – Keyword arguments for hazard intensity range plot (axvspan).
- __init__(input: Input, output: BayesianOptimizerOutput) None#
- plot_at_event(data_transf: ~typing.Callable[[~pandas.core.frame.DataFrame], ~pandas.core.frame.DataFrame] = <function OutputEvaluator.<lambda>>, **plot_kwargs)#
Create a bar plot comparing estimated model output and data per event.
Every row of the
Input.data is considered an event. The data to be plotted can be transformed with a generic function data_transf.
- Parameters:
data_transf (Callable (pd.DataFrame -> pd.DataFrame), optional) – A function that transforms the data to plot before plotting. It receives a dataframe whose rows represent events and whose columns represent the modelled impact and the calibration data, respectively. By default, the data is not transformed.
plot_kwargs – Keyword arguments passed to the
DataFrame.plot.bar method.
- Returns:
ax – The plot axis returned by
DataFrame.plot.bar.
- Return type:
matplotlib.axes.Axes
Note
This plot does not include the ignored impact, see
Input.data.
- plot_at_region(data_transf: ~typing.Callable[[~pandas.core.frame.DataFrame], ~pandas.core.frame.DataFrame] = <function OutputEvaluator.<lambda>>, **plot_kwargs)#
Create a bar plot comparing estimated model output and data per region.
Every column of the
Input.data is considered a region. The data to be plotted can be transformed with a generic function data_transf.
- Parameters:
data_transf (Callable (pd.DataFrame -> pd.DataFrame), optional) – A function that transforms the data to plot before plotting. It receives a dataframe whose rows represent regions and whose columns represent the modelled impact and the calibration data, respectively. By default, the data is not transformed.
plot_kwargs – Keyword arguments passed to the
DataFrame.plot.bar method.
- Returns:
ax – The plot axis returned by
DataFrame.plot.bar.
- Return type:
matplotlib.axes.Axes
Note
This plot does not include the ignored impact, see
Input.data.
- plot_event_region_heatmap(data_transf: ~typing.Callable[[~pandas.core.frame.DataFrame], ~pandas.core.frame.DataFrame] = <function OutputEvaluator.<lambda>>, **plot_kwargs)#
Plot a heatmap comparing all events per all regions
Every column of the
Input.data is considered a region, and every row is considered an event. The data to be plotted can be transformed with a generic function data_transf.
- Parameters:
data_transf (Callable (pd.DataFrame -> pd.DataFrame), optional) – A function that transforms the data to plot before plotting. It receives a dataframe whose rows represent events and whose columns represent the regions, respectively. By default, the data is not transformed.
plot_kwargs – Keyword arguments passed to the
DataFrame.plot.bar method.
- Returns:
ax – The plot axis returned by
DataFrame.plot.bar.
- Return type:
matplotlib.axes.Axes
Scipy Optimizer#
Calibration based on the scipy.optimize module.
- class climada.util.calibrate.scipy_optimizer.ScipyMinimizeOptimizerOutput(params: Mapping[str, Number], target: Number, result: OptimizeResult)[source]#
Bases:
Output
Output of a calibration with
ScipyMinimizeOptimizer
- result#
The OptimizeResult instance returned by
scipy.optimize.minimize.
- Type:
scipy.minimize.OptimizeResult
- __init__(params: Mapping[str, Number], target: Number, result: OptimizeResult) None#
- classmethod from_hdf5(filepath: Path | str)#
Create an output object from an H5 file
- to_hdf5(filepath: Path | str, mode: str = 'x')#
Write the output into an H5 file
This stores the data as attributes because we only store single numbers, not arrays
- Parameters:
filepath (Path or str) – The filepath to store the data.
mode (str (optional)) – The mode for opening the file. Defaults to
x (Create file, fail if exists).
- class climada.util.calibrate.scipy_optimizer.ScipyMinimizeOptimizer(input: Input)[source]#
Bases:
Optimizer
An optimization using scipy.optimize.minimize
By default, this optimizer uses the
"trust-constr" method. This is advertised as the most general minimization method of the scipy package and supports bounds and constraints on the parameters. Users are free to choose any method of the catalogue, but must be aware that these might require different input parameters. These can be supplied via additional keyword arguments to run().
- Parameters:
input (Input) – The input data for this optimizer. Supported data types for
constraints might vary depending on the minimization method used.
- run(**opt_kwargs) ScipyMinimizeOptimizerOutput[source]#
Execute the optimization
- Parameters:
params_init (Mapping (str, Number)) – The initial guess for all parameters as key-value pairs.
method (str, optional) – The minimization method applied. Defaults to
"trust-constr". See https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html for details.
kwargs – Additional keyword arguments passed to
scipy.optimize.minimize.
- Returns:
output – The output of the optimization. The
ScipyMinimizeOptimizerOutput.result attribute stores the associated scipy.optimize.OptimizeResult instance.
- Return type:
ScipyMinimizeOptimizerOutput
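For intuition, the following mirrors the underlying scipy call on a toy quadratic cost with bounded parameters (plain scipy, independent of CLIMADA; the cost function and bounds are made up for illustration):

```python
import numpy as np
from scipy.optimize import Bounds, minimize

# Toy cost with a known minimum at (1, 2), restricted to [0, 3] x [0, 3].
def toy_cost(x):
    return (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2

result = minimize(
    toy_cost,
    x0=np.array([0.5, 0.5]),  # initial guess, analogous to params_init
    method="trust-constr",    # the optimizer's default method
    bounds=Bounds([0.0, 0.0], [3.0, 3.0]),
)
```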
Ensemble Optimizers#
Ensemble optimizers calibrate an ensemble of optimized parameter sets from subsets of the original input by employing multiple instances of the above “default” optimizers.
This gives a better sense of uncertainty in the calibration results:
By selecting only a subset of events to calibrate on, and by repeating this process several times, one receives a varying set of impact functions that may spread considerably, as some events might dominate the calibration.
We distinguish two cases:
The AverageEnsembleOptimizer samples a subset of all events with or without replacement.
The resulting “average ensemble” contains uncertainty information on the average impact function for all events.
The TragedyEnsembleOptimizer calibrates one impact function for each single event.
The resulting “ensemble of tragedies” encodes the inter-event uncertainty.
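The two sampling strategies can be sketched with plain NumPy. This is an illustration of the idea only, not the library's internal sampling code:

```python
import numpy as np

n_events = 5
rng = np.random.default_rng(1)  # plays the role of the random_state attribute

# "Average ensemble": every ensemble member is a random draw over all events,
# here with replacement, so individual events can appear more than once.
average_samples = [
    rng.choice(n_events, size=n_events, replace=True).tolist()
    for _ in range(3)
]

# "Ensemble of tragedies": every ensemble member is exactly one event.
tragedy_samples = [[event] for event in range(n_events)]

print(average_samples)  # three random multisets of event indices
print(tragedy_samples)  # [[0], [1], [2], [3], [4]]
```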
- climada.util.calibrate.ensemble.sample_data(data: DataFrame, sample: list[tuple[int, int]])[source]#
Return a DataFrame containing only the sampled values from the input data.
The resulting data frame has the same shape and indices as data and is filled with NaNs, except for the row and column indices specified by sample.
- Parameters:
data (pandas.DataFrame) – The input DataFrame from which values will be sampled.
sample (list of tuple of int) – A list of (row, column) index pairs indicating which positions to copy from data into the returned DataFrame.
- Returns:
A DataFrame of the same shape as data with NaNs in all positions except those specified in sample, which contain the corresponding values from data.
- Return type:
pandas.DataFrame
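The documented behavior can be sketched as a hypothetical re-implementation. It assumes the (row, column) pairs are positional indices; the real function may differ in detail:

```python
import numpy as np
import pandas as pd

def sample_data_sketch(data: pd.DataFrame, sample: list) -> pd.DataFrame:
    # Start from an all-NaN frame with the same shape and indices as `data` ...
    out = pd.DataFrame(np.nan, index=data.index, columns=data.columns)
    # ... and copy over only the sampled positions.
    for row, col in sample:
        out.iloc[row, col] = data.iloc[row, col]
    return out

data = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})
sampled = sample_data_sketch(data, [(0, 0), (1, 1)])
print(sampled)
#      a    b
# 0  1.0  NaN
# 1  NaN  4.0
```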
- climada.util.calibrate.ensemble.sample_weights(weights: DataFrame, sample: list[tuple[int, int]])[source]#
Return an updated DataFrame containing the appropriate weights for a sample.
Weights that are not in sample are set to zero, whereas weights that are sampled multiple times are effectively multiplied by their number of occurrences in sample.
- Parameters:
weights (pandas.DataFrame) – The original weights for the data
sample (list of tuple of int) – A list of (row, column) index pairs indicating which weights will be used, and how often.
- Returns:
Updated weights for sample.
- Return type:
pandas.DataFrame
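Analogously to the data sampling above, the documented weight update can be sketched as follows (a hypothetical re-implementation assuming positional indices, not the library's code):

```python
import pandas as pd

def sample_weights_sketch(weights: pd.DataFrame, sample: list) -> pd.DataFrame:
    # Count how often each (row, column) position occurs in the sample ...
    counts = pd.DataFrame(0.0, index=weights.index, columns=weights.columns)
    for row, col in sample:
        counts.iloc[row, col] += 1.0
    # ... so unsampled weights become zero and repeated ones are scaled up.
    return weights * counts

weights = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})
updated = sample_weights_sketch(weights, [(0, 0), (0, 0), (1, 1)])
print(updated)
#      a    b
# 0  2.0  0.0
# 1  0.0  4.0
```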
- climada.util.calibrate.ensemble.event_info_from_input(inp: Input) dict[str, Any][source]#
Get information on the event(s) for which we calibrated
This tries to retrieve the event IDs, region IDs, and event names.
- Returns:
With keys event_id, region_id, event_name
- Return type:
dict
- class climada.util.calibrate.ensemble.SingleEnsembleOptimizerOutput(params: ~typing.Mapping[str, ~numbers.Number], target: ~numbers.Number, event_info: dict[str, ~typing.Any] = <factory>)[source]#
Bases: Output
Output for a single member of an ensemble optimizer
This extends a regular Output by information on the particular event(s) this calibration was performed on.
- event_info#
Information on the events for this calibration instance
- Type:
dict(str, any)
- __init__(params: ~typing.Mapping[str, ~numbers.Number], target: ~numbers.Number, event_info: dict[str, ~typing.Any] = <factory>) None#
- classmethod from_hdf5(filepath: Path | str)#
Create an output object from an H5 file
- to_hdf5(filepath: Path | str, mode: str = 'x')#
Write the output into an H5 file
This stores the data as attributes because we only store single numbers, not arrays
- Parameters:
filepath (Path or str) – The filepath to store the data.
mode (str (optional)) – The mode for opening the file. Defaults to "x" (create file, fail if exists).
- climada.util.calibrate.ensemble.optimize(optimizer_type: type[Optimizer], inp: Input, opt_init_kwargs: Mapping[str, Any], opt_run_kwargs: Mapping[str, Any]) SingleEnsembleOptimizerOutput[source]#
Instantiate an optimizer, run it, and return its output
- Parameters:
optimizer_type (type) – The type of the optimizer to use
inp (Input) – The optimizer input
opt_init_kwargs – Keyword arguments for initializing the optimizer
opt_run_kwargs – Keyword arguments for running the optimizer
- Returns:
The output of the optimizer
- Return type:
SingleEnsembleOptimizerOutput
- class climada.util.calibrate.ensemble.EnsembleOptimizerOutput(data: DataFrame)[source]#
Bases: object
The collective output of an ensemble optimization
- classmethod from_outputs(outputs: Sequence[SingleEnsembleOptimizerOutput])[source]#
Build data from a list of outputs
- to_input_var(impact_func_creator: Callable[[...], ImpactFuncSet], **impfset_kwargs) InputVar[source]#
Build Unsequa InputVar from the parameters stored in this object
- plot(impact_func_creator: Callable[[...], ImpactFuncSet], **impf_set_plot_kwargs)[source]#
Plot all impact functions into the same plot
This uses the basic plot functions of
ImpactFuncSet.
- plot_shiny(impact_func_creator: Callable[[...], ImpactFuncSet], haz_type: str, impf_id: int, inp: Input | None = None, impf_plot_kwargs: Mapping[str, Any] | None = None, hazard_plot_kwargs: Mapping[str, Any] | None = None, legend: bool = True)[source]#
Plot all impact functions with appropriate color coding and event data
- Parameters:
impact_func_creator (Callable) – A function taking parameters and returning an ImpactFuncSet.
haz_type (str) – The hazard type of the impact function to plot.
impf_id (int) – The ID of the impact function to plot.
inp (Input, optional) – The input object used for the calibration. If provided, a histogram of the hazard intensity will be drawn behind the impact functions.
impf_plot_kwargs – Keyword arguments for the function plotting the impact functions.
hazard_plot_kwargs – Keyword arguments for the function plotting the hazard intensity histogram.
legend (bool) – Whether to create a legend. The legend may become cluttered for results of AverageEnsembleOptimizer, so it is advisable to disable it in these cases.
- plot_category(impact_func_creator: Callable[[...], ImpactFuncSet], haz_type: str, impf_id: int, category: str, category_colors: Mapping[str, str | tuple] | None = None, **impf_set_plot_kwargs)[source]#
Plot impact functions with coloring according to a certain category
- Parameters:
impact_func_creator (Callable) – A function taking parameters and returning an ImpactFuncSet.
haz_type (str) – The hazard type of the impact function to plot.
impf_id (int) – The ID of the impact function to plot.
category (str) – The event information on which to categorize (can be "region_id", "event_id", or "event_name")
category_colors (dict(str, str or tuple), optional) – Specify which categories to plot (keys) and what colors to use for them (values). If None, will categorize by unique values in the category column and color automatically.
- __init__(data: DataFrame) None#
- class climada.util.calibrate.ensemble.EnsembleOptimizer(input: ~climada.util.calibrate.base.Input, optimizer_type: type[~climada.util.calibrate.base.Optimizer], optimizer_init_kwargs: dict[str, ~typing.Any] = <factory>)[source]#
Bases: ABC
Abstract base class for defining an ensemble optimizer.
An ensemble optimizer uses a user-defined optimizer type to run multiple calibration tasks. The tasks are defined by the samples attribute: For each entry in samples, a new Input is created and passed to an instance of optimizer_type. Derived classes need to set samples during initialization and define the input_from_sample() method.
The calibration tasks can be conducted in parallel by executing run() with processes set to a value larger than 1.
- optimizer_init_kwargs#
run()withprocessesset to a value larger than 1.- optimizer_init_kwargs#
Keyword arguments for initializing an instance of the chosen optimizer_type.
- Type:
dict[str, Any]
- samples#
The samples for each calibration task. Each entry is a list of tuples that encode row and column indices of the Input data that are selected for the particular calibration task. See sample_data().
- Type:
list of list of tuple(int, int)
- run(processes=1, **optimizer_run_kwargs) EnsembleOptimizerOutput[source]#
Execute the ensemble optimization
- Parameters:
processes (int, optional) – The number of processes to distribute the optimization tasks to. Defaults to 1 (no parallelization).
optimizer_run_kwargs – Additional keyword arguments for the run() method of the particular optimizer used.
- abstractmethod input_from_sample(sample: list[tuple[int, int]]) Input[source]#
Define how an input is created from a sample
- __init__(input: ~climada.util.calibrate.base.Input, optimizer_type: type[~climada.util.calibrate.base.Optimizer], optimizer_init_kwargs: dict[str, ~typing.Any] = <factory>) None#
- class climada.util.calibrate.ensemble.AverageEnsembleOptimizer(input: ~climada.util.calibrate.base.Input, optimizer_type: type[~climada.util.calibrate.base.Optimizer], optimizer_init_kwargs: dict[str, ~typing.Any] = <factory>, sample_fraction: dataclasses.InitVar[float] = 1.0, ensemble_size: dataclasses.InitVar[int] = 20, random_state: dataclasses.InitVar[int] = 1, replace: dataclasses.InitVar[bool] = True)[source]#
Bases: EnsembleOptimizer
An optimizer for the "average ensemble".
This optimizer samples a fraction of the original events in input.data.
- sample_fraction#
input.data.- sample_fraction#
The fraction of data points to use for each calibration. For values > 1, replace must be True.
- Type:
float
- ensemble_size#
The number of calibration tasks to perform (and hence size of the ensemble).
- Type:
int
- random_state#
The seed for the random number generator selecting the samples
- Type:
int
- replace#
If samples of the input data should be drawn with replacement
- Type:
bool
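How these four parameters might interact can be sketched with NumPy. This is illustrative only and does not show the class's actual sampling code:

```python
import numpy as np

n_events = 10          # number of data points in input.data
sample_fraction = 1.5  # values > 1 are only valid together with replace=True
ensemble_size = 4
replace = True
rng = np.random.default_rng(1)  # seeded like random_state

n_draws = int(round(sample_fraction * n_events))
samples = [rng.choice(n_events, size=n_draws, replace=replace) for _ in range(ensemble_size)]
print([len(s) for s in samples])  # [15, 15, 15, 15]
```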
- input_from_sample(sample: list[tuple[int, int]])[source]#
Shallow-copy the input and update the data
- __init__(input: ~climada.util.calibrate.base.Input, optimizer_type: type[~climada.util.calibrate.base.Optimizer], optimizer_init_kwargs: dict[str, ~typing.Any] = <factory>, sample_fraction: dataclasses.InitVar[float] = 1.0, ensemble_size: dataclasses.InitVar[int] = 20, random_state: dataclasses.InitVar[int] = 1, replace: dataclasses.InitVar[bool] = True) None#
- run(processes=1, **optimizer_run_kwargs) EnsembleOptimizerOutput#
Execute the ensemble optimization
- Parameters:
processes (int, optional) – The number of processes to distribute the optimization tasks to. Defaults to 1 (no parallelization).
optimizer_run_kwargs – Additional keyword arguments for the run() method of the particular optimizer used.
- class climada.util.calibrate.ensemble.TragedyEnsembleOptimizer(input: ~climada.util.calibrate.base.Input, optimizer_type: type[~climada.util.calibrate.base.Optimizer], optimizer_init_kwargs: dict[str, ~typing.Any] = <factory>, ensemble_size: dataclasses.InitVar[typing.Optional[int]] = None, random_state: dataclasses.InitVar[int] = 1)[source]#
Bases: EnsembleOptimizer
An optimizer for the "ensemble of tragedies".
Each sample (and thus calibration task) of this optimizer contains only a single event from input.data.
- ensemble_size#
input.data.- ensemble_size#
The number of calibration tasks to perform. Defaults to None, which means one for each data point. Must be smaller than or equal to the number of data points. If smaller, random events will be left out of the ensemble calibration.
- Type:
int, optional
- random_state#
The seed for the random number generator selecting the samples
- Type:
int
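The role of ensemble_size can be sketched as follows (illustrative only, not the library's code): with fewer tasks than events, a random subset of single-event samples would be drawn without replacement.

```python
import numpy as np

n_events = 6
ensemble_size = 4  # fewer tasks than events: two events are left out
rng = np.random.default_rng(1)  # seeded like random_state

chosen = rng.choice(n_events, size=ensemble_size, replace=False)
samples = [[int(event)] for event in chosen]  # one single-event sample per task
print(samples)
```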
- __init__(input: ~climada.util.calibrate.base.Input, optimizer_type: type[~climada.util.calibrate.base.Optimizer], optimizer_init_kwargs: dict[str, ~typing.Any] = <factory>, ensemble_size: dataclasses.InitVar[typing.Optional[int]] = None, random_state: dataclasses.InitVar[int] = 1) None#
- run(processes=1, **optimizer_run_kwargs) EnsembleOptimizerOutput#
Execute the ensemble optimization
- Parameters:
processes (int, optional) – The number of processes to distribute the optimization tasks to. Defaults to 1 (no parallelization).
optimizer_run_kwargs – Additional keyword arguments for the run() method of the particular optimizer used.