Impact Function Calibration Module#

What’s New?#

Since CLIMADA v6.0.1, some functionality of this module has been changed. When upgrading to a newer version of CLIMADA, please mind the following changes:

  • Input received additional attributes. We now support optional weights that are passed to the cost function. Therefore, the cost function must support an additional, optional argument.

  • cost_func now receives numpy arrays. An additional attribute df_to_numpy was added to transform pandas.DataFrame objects to np.ndarray. By default, it returns a flattened array.

  • This module now exports cost functions that support optional weights, see climada.util.calibrate.cost_func.

  • Ensemble optimizers have been added.
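The new cost-function signature can be illustrated with a minimal weighted mean squared error. This is an illustrative sketch, not the implementation shipped in climada.util.calibrate.cost_func:

```python
import numpy as np

# Illustrative sketch of a cost function matching the new signature: the
# third (weights) argument is optional and may be None.
def weighted_mse(data: np.ndarray, predicted: np.ndarray, weights=None) -> float:
    """Mean squared error, optionally weighted per entry."""
    err = (data - predicted) ** 2
    if weights is None:
        return float(err.mean())
    return float(np.average(err, weights=weights))
```

A function with this signature can be passed as the cost_func argument of Input.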

Base Classes#

Generic classes for defining the data structures of this module.

class climada.util.calibrate.base.Input(hazard: ~climada.hazard.base.Hazard, exposure: ~climada.entity.exposures.base.Exposures, data: ~pandas.core.frame.DataFrame, impact_func_creator: ~typing.Callable[[...], ~climada.entity.impact_funcs.impact_func_set.ImpactFuncSet], impact_to_dataframe: ~typing.Callable[[~climada.engine.impact.Impact], ~pandas.core.frame.DataFrame], cost_func: ~typing.Callable[[~numpy.ndarray, ~numpy.ndarray, ~numpy.ndarray | None], ~numbers.Number], bounds: ~typing.Mapping[str, ~scipy.optimize._constraints.Bounds | ~typing.Tuple[~numbers.Number, ~numbers.Number]] | None = None, constraints: ~scipy.optimize._constraints.LinearConstraint | ~scipy.optimize._constraints.NonlinearConstraint | ~typing.Mapping | list[~scipy.optimize._constraints.LinearConstraint | ~scipy.optimize._constraints.NonlinearConstraint | ~typing.Mapping] | None = None, impact_calc_kwds: ~typing.Mapping[str, ~typing.Any] = <factory>, missing_data_value: float = nan, df_to_numpy: ~typing.Callable[[~pandas.core.frame.DataFrame], ~numpy.ndarray] = <function Input.<lambda>>, *, data_weights: ~pandas.core.frame.DataFrame | None = None, missing_weights_value: float = 0.0, assign_centroids: dataclasses.InitVar[bool] = True)[source]#

Define the static input for a calibration task

hazard#

Hazard object to compute impacts from

Type:

climada.Hazard

exposure#

Exposures object to compute impacts from

Type:

climada.Exposures

data#

The data to compare computed impacts to. Index: Event IDs matching the IDs of hazard. Columns: Arbitrary columns. NaN values in the data frame have special meaning: Corresponding impact values computed by the model are ignored in the calibration.

Type:

pandas.DataFrame

impact_func_creator#

Function that takes the parameters as keyword arguments and returns an impact function set. This will be called each time the optimization algorithm updates the parameters.

Type:

Callable

impact_to_dataframe#

Function that takes an impact object as input and transforms its data into a pandas.DataFrame that is compatible with the format of data. The return value of this function will be passed to the cost_func as the predicted values.

Type:

Callable

cost_func#

Function that computes the scalar “cost” between the data and the predicted impact. The optimization algorithm will try to minimize this number. The first argument is the true/correct values (data), the second argument is the estimated/predicted values, and the third argument is the data_weights. The cost function operates on numpy.ndarray objects; dataframes are transformed beforehand using df_to_numpy.

Type:

Callable

bounds#

The bounds for the parameters. Keys: parameter names. Values: scipy.minimize.Bounds instance or tuple of minimum and maximum value. Unbounded parameters need not be specified here. See the documentation for the selected optimization algorithm on which data types are supported.

Type:

Mapping (str, {Bounds, tuple(float, float)}), optional

constraints#

One or multiple instances of scipy.minimize.LinearConstraint, scipy.minimize.NonlinearConstraint, or a mapping. See the documentation for the selected optimization algorithm on which data types are supported.

Type:

Constraint or list of Constraint, optional

impact_calc_kwds#

Keyword arguments to climada.engine.impact_calc.ImpactCalc.impact(). Defaults to {"assign_centroids": False} (by default, centroids are assigned here via the assign_centroids parameter, to avoid assigning them each time the impact is calculated).

Type:

Mapping (str, Any), optional

missing_data_value#

If the impact model returns impact data for which no values exist in data, insert this value. Defaults to NaN, in which case the impact from the model is ignored. Set this to zero to explicitly calibrate to zero impacts in these cases.

Type:

float, optional

df_to_numpy#

A function that transforms a pandas.DataFrame into a numpy.ndarray to be inserted into the cost_func. By default, this will flatten the data frame.

Type:

Callable, optional

data_weights#

Weights for each entry in data. Must have the exact same index and columns. If None, the weights will be ignored (equivalent to the same weight for each event).

Type:

pandas.DataFrame, optional

missing_weights_value#

Same as missing_data_value, but for data_weights.

Type:

float, optional

assign_centroids#

If True (default), assign the hazard centroids to the exposure when this object is created.

Type:

bool, optional
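The flattening behaviour of the default df_to_numpy can be sketched as follows. This is illustrative only; the actual default may differ in details such as flattening order:

```python
import pandas as pd

# Illustrative sketch of a flattening df_to_numpy, as used to transform
# dataframes before they are passed to the cost function.
def df_to_numpy(df: pd.DataFrame):
    return df.to_numpy().ravel()

df = pd.DataFrame({"region_a": [1.0, 2.0], "region_b": [3.0, 4.0]})
flat = df_to_numpy(df)  # row-major: [1.0, 3.0, 2.0, 4.0]
```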

impact_to_aligned_df(impact: Impact, fillna: float = nan) Tuple[DataFrame, DataFrame][source]#

Create a dataframe from an impact and align it with the data.

When aligning, two general cases might occur, which are not mutually exclusive:

  1. There are data points for which no impact was computed. This will always be treated as an impact of zero.

  2. There are impacts for which no data points exist. For these points, the input data will be filled with the value of Input.missing_data_value.

This method performs the following steps:

  • Transform the impact into a dataframe using impact_to_dataframe.

  • Align the data with the impact dataframe, using missing_data_value as fill value.

  • Align the impact dataframe with the data, using zeros as fill value.

  • In the aligned impact, set all values to zero where the data is NaN.

  • Fill remaining NaNs in data with fillna.

Parameters:

  • impact (climada.engine.Impact) – The impact computed by the model. It is transformed into a dataframe by Input.impact_to_dataframe.

  • fillna (float, optional) – The value used to fill remaining NaNs in the aligned data. Defaults to NaN.

Returns:

  • data_aligned (pd.DataFrame) – The data aligned to the impact dataframe

  • impact_df_aligned (pd.DataFrame) – The impact transformed to a dataframe and aligned with the data
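The alignment steps above can be sketched in plain pandas. This is an illustrative reimplementation under the stated rules, not the actual method:

```python
import numpy as np
import pandas as pd

# Illustrative sketch of the alignment logic: fill data missing for modelled
# impacts with missing_data_value, treat data points without impacts as zero
# impact, ignore impacts where the data is NaN, then fill remaining NaNs.
def impact_to_aligned(data, impact_df, missing_data_value=np.nan, fillna=np.nan):
    idx = data.index.union(impact_df.index)
    cols = data.columns.union(impact_df.columns)
    # Impacts for which no data points exist: fill data with missing_data_value
    data_aligned = data.reindex(index=idx, columns=cols, fill_value=missing_data_value)
    # Data points for which no impact was computed: treat as zero impact
    impact_aligned = impact_df.reindex(index=idx, columns=cols, fill_value=0.0)
    # NaN in the data means "ignore this impact"
    impact_aligned = impact_aligned.where(data_aligned.notna(), 0.0)
    return data_aligned.fillna(fillna), impact_aligned
```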

__init__(hazard: ~climada.hazard.base.Hazard, exposure: ~climada.entity.exposures.base.Exposures, data: ~pandas.core.frame.DataFrame, impact_func_creator: ~typing.Callable[[...], ~climada.entity.impact_funcs.impact_func_set.ImpactFuncSet], impact_to_dataframe: ~typing.Callable[[~climada.engine.impact.Impact], ~pandas.core.frame.DataFrame], cost_func: ~typing.Callable[[~numpy.ndarray, ~numpy.ndarray, ~numpy.ndarray | None], ~numbers.Number], bounds: ~typing.Mapping[str, ~scipy.optimize._constraints.Bounds | ~typing.Tuple[~numbers.Number, ~numbers.Number]] | None = None, constraints: ~scipy.optimize._constraints.LinearConstraint | ~scipy.optimize._constraints.NonlinearConstraint | ~typing.Mapping | list[~scipy.optimize._constraints.LinearConstraint | ~scipy.optimize._constraints.NonlinearConstraint | ~typing.Mapping] | None = None, impact_calc_kwds: ~typing.Mapping[str, ~typing.Any] = <factory>, missing_data_value: float = nan, df_to_numpy: ~typing.Callable[[~pandas.core.frame.DataFrame], ~numpy.ndarray] = <function Input.<lambda>>, *, data_weights: ~pandas.core.frame.DataFrame | None = None, missing_weights_value: float = 0.0, assign_centroids: dataclasses.InitVar[bool] = True) None#
class climada.util.calibrate.base.Output(params: Mapping[str, Number], target: Number)[source]#

Generic output of a calibration task

params#

The optimal parameters

Type:

Mapping (str, Number)

target#

The target function value for the optimal parameters

Type:

Number

to_hdf5(filepath: Path | str, mode: str = 'x')[source]#

Write the output into an H5 file

This stores the data as attributes because we only store single numbers, not arrays

Parameters:
  • filepath (Path or str) – The filepath to store the data.

  • mode (str (optional)) – The mode for opening the file. Defaults to x (Create file, fail if exists).

classmethod from_hdf5(filepath: Path | str)[source]#

Create an output object from an H5 file

__init__(params: Mapping[str, Number], target: Number) None#
class climada.util.calibrate.base.OutputEvaluator(input: Input, output: Output)[source]#

Evaluate the output of a calibration task

Parameters:
  • input (Input) – The input object for the optimization task.

  • output (Output) – The output object returned by the optimization task.

impf_set#

The impact function set built from the optimized parameters

Type:

climada.entity.ImpactFuncSet

impact#

An impact object calculated using the optimal impf_set

Type:

climada.engine.Impact

plot_at_event(data_transf: ~typing.Callable[[~pandas.core.frame.DataFrame], ~pandas.core.frame.DataFrame] = <function OutputEvaluator.<lambda>>, **plot_kwargs)[source]#

Create a bar plot comparing estimated model output and data per event.

Every row of the Input.data is considered an event. The data to be plotted can be transformed with a generic function data_transf.

Parameters:
  • data_transf (Callable (pd.DataFrame -> pd.DataFrame), optional) – A function that transforms the data to plot before plotting. It receives a dataframe whose rows represent events and whose columns represent the modelled impact and the calibration data, respectively. By default, the data is not transformed.

  • plot_kwargs – Keyword arguments passed to the DataFrame.plot.bar method.

Returns:

ax – The plot axis returned by DataFrame.plot.bar

Return type:

matplotlib.axes.Axes

Note

This plot does not include the ignored impact, see Input.data.

plot_at_region(data_transf: ~typing.Callable[[~pandas.core.frame.DataFrame], ~pandas.core.frame.DataFrame] = <function OutputEvaluator.<lambda>>, **plot_kwargs)[source]#

Create a bar plot comparing estimated model output and data per region.

Every column of the Input.data is considered a region. The data to be plotted can be transformed with a generic function data_transf.

Parameters:
  • data_transf (Callable (pd.DataFrame -> pd.DataFrame), optional) – A function that transforms the data to plot before plotting. It receives a dataframe whose rows represent regions and whose columns represent the modelled impact and the calibration data, respectively. By default, the data is not transformed.

  • plot_kwargs – Keyword arguments passed to the DataFrame.plot.bar method.

Returns:

ax – The plot axis returned by DataFrame.plot.bar.

Return type:

matplotlib.axes.Axes

Note

This plot does not include the ignored impact, see Input.data.

plot_event_region_heatmap(data_transf: ~typing.Callable[[~pandas.core.frame.DataFrame], ~pandas.core.frame.DataFrame] = <function OutputEvaluator.<lambda>>, **plot_kwargs)[source]#

Plot a heatmap comparing all events per all regions

Every column of the Input.data is considered a region, and every row is considered an event. The data to be plotted can be transformed with a generic function data_transf.

Parameters:
  • data_transf (Callable (pd.DataFrame -> pd.DataFrame), optional) – A function that transforms the data to plot before plotting. It receives a dataframe whose rows represent events and whose columns represent the regions, respectively. By default, the data is not transformed.

  • plot_kwargs – Keyword arguments passed to the DataFrame.plot.bar method.

Returns:

ax – The plot axis returned by DataFrame.plot.bar.

Return type:

matplotlib.axes.Axes

__init__(input: Input, output: Output) None#
class climada.util.calibrate.base.Optimizer(input: Input)[source]#

Abstract base class (interface) for an optimization

This defines the interface for optimizers in CLIMADA. New optimizers can be created by deriving from this class and overriding at least the run() method.

input#

The input object for the optimization task. See Input.

Type:

Input

_target_func(data: ndarray, predicted: ndarray, weights: ndarray | None) Number[source]#

Target function for the optimizer

The default version of this function simply returns the value of the cost function evaluated on the arguments.

Parameters:
  • data (np.ndarray) – The reference data used for calibration. By default, this is Input.data.

  • predicted (np.ndarray) – The impact predicted by the calibration, after it has been transformed into a dataframe by Input.impact_to_dataframe.

  • weights (np.ndarray or None) – The relative weight of each data entry.

Return type:

The value of the target function for the optimizer.

_kwargs_to_impact_func_creator(*_, **kwargs) Dict[str, Any][source]#

Define how the parameters to _opt_func() must be transformed

Optimizers may implement different ways of representing the parameters (e.g., key-value pairs, arrays, etc.). Depending on this representation, the parameters must be transformed to match the syntax of the impact function generator used, see Input.impact_func_creator.

In this default version, the method simply returns its keyword arguments as mapping. Override this method if the optimizer used does not represent parameters as key-value pairs.

Parameters:

kwargs – The parameters as key-value pairs.

Return type:

The parameters as key-value pairs.

_opt_func(*args, **kwargs) Number[source]#

The optimization function iterated by the optimizer

This function takes arbitrary arguments from the optimizer, generates a new set of impact functions from it, computes the impact, and finally calculates the target function value and returns it.

Parameters:

args, kwargs – Arbitrary arguments from the optimizer, including parameters

Return type:

Target function value for the given arguments

abstractmethod run(**opt_kwargs) Output[source]#

Execute the optimization

__init__(input: Input) None#
climada.util.calibrate.cost_func.mse(data: ndarray, predicted: ndarray, weights: ndarray | None) float[source]#

Weighted mean squared error

See https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_squared_error.html

climada.util.calibrate.cost_func.msle(data: ndarray, predicted: ndarray, weights: ndarray | None) float[source]#

Weighted mean squared logarithmic error

See https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_squared_log_error.html

Bayesian Optimizer#

Calibration based on Bayesian optimization.

climada.util.calibrate.bayesian_optimizer.select_best(p_space_df: DataFrame, cost_limit: float, absolute: bool = True, cost_col=('Calibration', 'Cost Function')) DataFrame[source]#

Select the best parameter space samples defined by a cost function limit

The limit is a factor of the minimum value relative to itself (absolute=True) or to the range of cost function values (absolute=False). A cost_limit of 0.1 will select all rows where the cost function is within

  • 110% of the minimum value if absolute=True.

  • 10% of the range between minimum and maximum cost function value if absolute=False.

Parameters:
  • p_space_df (pd.DataFrame) – The parameter space to select from.

  • cost_limit (float) – The limit factor used for selection.

  • absolute (bool, optional) – Whether the limit factor is applied to the minimum value (True) or the range of values (False). Defaults to True.

  • cost_col (Column specifier, optional) – The column indicating cost function values. Defaults to ("Calibration", "Cost Function").

Returns:

A subselection of the input data frame.

Return type:

pd.DataFrame
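The selection rule can be sketched as follows, simplified to a flat cost column instead of the ("Calibration", "Cost Function") multi-index. This is illustrative, not the actual implementation:

```python
import pandas as pd

# Illustrative sketch of select_best: keep rows whose cost function value
# lies within a limit derived from the minimum (absolute) or the range
# between minimum and maximum (relative).
def select_best_sketch(p_space_df, cost_limit, absolute=True, cost_col="Cost Function"):
    cost = p_space_df[cost_col]
    if absolute:
        limit = cost.min() * (1.0 + cost_limit)
    else:
        limit = cost.min() + cost_limit * (cost.max() - cost.min())
    return p_space_df[cost <= limit]
```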

class climada.util.calibrate.bayesian_optimizer.BayesianOptimizerOutput(params: Mapping[str, Number], target: Number, p_space: TargetSpace)[source]#

Bases: Output

Output of a calibration with BayesianOptimizer

p_space#

The parameter space sampled by the optimizer.

Type:

bayes_opt.target_space.TargetSpace

p_space_to_dataframe()[source]#

Return the sampled parameter space as pandas.DataFrame

Returns:

Data frame whose columns are the parameter values and the associated cost function value (Cost Function) and whose rows are the optimizer iterations.

Return type:

pandas.DataFrame

to_hdf5(filepath: Path | str, mode: str = 'x')[source]#

Write this output to an H5 file

classmethod from_hdf5(filepath: Path | str)[source]#

Read BayesianOptimizerOutput from an H5 file

Warning

This results in an object with a broken p_space attribute. Do not further modify this parameter space. This function is only intended to load the parameter space again for analysis/plotting.

plot_p_space(p_space_df: DataFrame | None = None, x: str | None = None, y: str | None = None, min_def: str | Tuple[str, str] | None = 'Cost Function', min_fmt: str = 'x', min_color: str = 'r', **plot_kwargs) Axes | List[Axes][source]#

Plot the parameter space as scatter plot(s)

Produce a scatter plot where each point represents a parameter combination sampled by the optimizer. The coloring represents the cost function value. If there are more than two parameters in the input data frame, this method will produce one plot for each combination of two parameters. Explicit parameter names to plot can be given via the x and y arguments. If no data frame is provided as argument, the output of p_space_to_dataframe() is used.

Parameters:
  • p_space_df (pd.DataFrame, optional) – The parameter space to plot. Defaults to the one returned by p_space_to_dataframe()

  • x (str, optional) – The parameter to plot on the x-axis. If y is not given, this will plot x against all other parameters.

  • y (str, optional) – The parameter to plot on the y-axis. If x is not given, this will plot y against all other parameters.

  • min_def (str, optional) – The name of the column in p_space_df defining which parameter set represents the minimum, which is plotted separately. Defaults to "Cost Function". Set to None to avoid plotting the minimum.

  • min_fmt (str, optional) – Plot format string for plotting the minimum. Defaults to "x".

  • min_color (str, optional) – Color for plotting the minimum. Defaults to "r" (red).

__init__(params: Mapping[str, Number], target: Number, p_space: TargetSpace) None#
class climada.util.calibrate.bayesian_optimizer.Improvement(iteration, sample, random, target, improvement)#

Bases: tuple

__init__()#
count(value, /)#

Return number of occurrences of value.

improvement#

Alias for field number 4

index(value, start=0, stop=9223372036854775807, /)#

Return first index of value.

Raises ValueError if the value is not present.

iteration#

Alias for field number 0

random#

Alias for field number 2

sample#

Alias for field number 1

target#

Alias for field number 3

exception climada.util.calibrate.bayesian_optimizer.StopEarly[source]#

Bases: Exception

An exception for stopping an optimization iteration early

__init__(*args, **kwargs)#
add_note()#

Exception.add_note(note) – add a note to the exception

with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

class climada.util.calibrate.bayesian_optimizer.BayesianOptimizerController(init_points: int, n_iter: int, min_improvement: float = 0.001, min_improvement_count: int = 2, kappa: float = 2.576, kappa_min: float = 0.1, max_iterations: int = 10, utility_func_kwargs: dict[str, int | float | str] = <factory>, _last_it_improved: int = 0, _last_it_end: int = 0)[source]#

Bases: object

A class for controlling the iterations of a BayesianOptimizer.

Each iteration in the optimizer consists of a random sampling of the parameter space with init_points steps, followed by a Gaussian process sampling with n_iter steps. During the latter, the kappa parameter is reduced to reach kappa_min at the end of the iteration. The iteration is stopped prematurely if improvements of the best guess are below min_improvement for min_improvement_count consecutive times. At the beginning of the next iteration, kappa is reset to its original value.

Optimization stops if max_iterations is reached or if an entire iteration saw no improvement.
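The kappa reduction within one iteration might look like the following geometric schedule. This is an assumption for illustration; the exact schedule used by the controller may differ:

```python
# Illustrative sketch: decay kappa geometrically from its initial value to
# kappa_min over the n_iter Gaussian-process steps of a single iteration.
def kappa_schedule(kappa: float, kappa_min: float, n_iter: int) -> list:
    if n_iter < 2:
        return [kappa] * n_iter
    decay = (kappa_min / kappa) ** (1.0 / (n_iter - 1))
    return [kappa * decay**step for step in range(n_iter)]
```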

init_points#

Number of randomly sampled points during each iteration.

Type:

int

n_iter#

Maximum number of points using Gaussian process sampling during each iteration.

Type:

int

min_improvement#

Minimal relative improvement. If improvements are below this value min_improvement_count times, the iteration is stopped.

Type:

float

min_improvement_count#

Number of times the min_improvement must be undercut to stop the iteration.

Type:

int

kappa#

Parameter controlling exploration of the upper-confidence-bound acquisition function of the sampling algorithm. Lower values mean less exploration of the parameter space and more exploitation of local information. This value is reduced throughout one iteration, reaching kappa_min at the last iteration step.

Type:

float

kappa_min#

Minimal value of kappa after n_iter steps.

Type:

float

max_iterations#

Maximum number of iterations before optimization is stopped, irrespective of convergence.

Type:

int

utility_func_kwargs#

Further keyword arguments to the bayes_opt.UtilityFunction.

Type:

dict[str, int | float | str]

classmethod from_input(inp: Input, sampling_base: float = 4, **kwargs)[source]#

Create a controller from a calibration input

This uses the number of parameters to determine the appropriate values for init_points and n_iter. Both values are set to \(b^N\), where \(b\) is the sampling_base parameter and \(N\) is the number of estimated parameters.

Parameters:
  • inp (Input) – Input to the calibration

  • sampling_base (float, optional) – Base for determining the sample size. Increase this for denser sampling. Defaults to 4.

  • kwargs – Keyword arguments for the default constructor.
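The sizing rule b**N described above can be sketched as:

```python
# Illustrative sketch: init_points and n_iter are both set to
# sampling_base ** N, with N the number of estimated parameters.
def controller_sizes(num_params: int, sampling_base: float = 4) -> dict:
    n_points = int(sampling_base**num_params)
    return {"init_points": n_points, "n_iter": n_points}
```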

optimizer_params() dict[str, int | float | str | UtilityFunction][source]#

Return parameters for the optimizer

In the current implementation, these do not change.

update(event: str, instance: BayesianOptimization)[source]#

Update the step tracker of this instance.

For step events, check if the latest guess is the new maximum. Also check if the iteration will be stopped early.

For end events, check if any improvement occurred. If not, stop the optimization.

Parameters:
  • event (bayes_opt.Events) – The event descriptor

  • instance (bayes_opt.BayesianOptimization) – Optimization instance triggering the event

Raises:
  • StopEarly – If the optimization only achieves minimal improvement, stop the iteration early with this exception.

  • StopIteration – If an entire iteration did not achieve improvement, stop the optimization.

improvements() DataFrame[source]#

Return improvements as nicely formatted data

Returns:

improvements

Return type:

pd.DataFrame

__init__(init_points: int, n_iter: int, min_improvement: float = 0.001, min_improvement_count: int = 2, kappa: float = 2.576, kappa_min: float = 0.1, max_iterations: int = 10, utility_func_kwargs: dict[str, int | float | str] = <factory>, _last_it_improved: int = 0, _last_it_end: int = 0) None#
class climada.util.calibrate.bayesian_optimizer.BayesianOptimizer(input: Input, verbose: int = 0, random_state: dataclasses.InitVar[int] = 1, allow_duplicate_points: dataclasses.InitVar[bool] = True, bayes_opt_kwds: dataclasses.InitVar[Optional[Mapping[str, Any]]] = None)[source]#

Bases: Optimizer

An optimization using bayes_opt.BayesianOptimization

This optimizer reports the target function value for each parameter set and maximizes that value. Therefore, a higher target function value is better. The cost function, however, is still minimized: The target function is defined as the inverse of the cost function.

For details on the underlying optimizer, see bayesian-optimization/BayesianOptimization.
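Turning the minimized cost into a maximized target can be sketched by negation. Negation is one plausible form of "inversion"; the exact transform used internally is an implementation detail:

```python
# Illustrative sketch: BayesianOptimization maximizes its target, so a
# minimized cost function is wrapped into a target whose value grows as
# the cost shrinks.
def make_target(cost_func):
    def target(data, predicted, weights=None):
        return -cost_func(data, predicted, weights)
    return target
```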

Parameters:
  • input (Input) – The input data for this optimizer. See the Notes below for input requirements.

  • verbose (int, optional) – Verbosity of the optimizer output. Defaults to 0. The output is not affected by the CLIMADA logging settings.

  • random_state (int, optional) – Seed for initializing the random number generator. Defaults to 1.

  • allow_duplicate_points (bool, optional) – Allow the optimizer to sample the same points in parameter space multiple times. This may happen if the parameter space is tightly bound or constrained. Defaults to True.

  • bayes_opt_kwds (dict) – Additional keyword arguments passed to the BayesianOptimization constructor.

Notes

The following requirements apply to the parameters of Input when using this class:

bounds

Setting bounds is required because the optimizer first “explores” the bounded parameter space and then narrows its search to regions where the cost function is low.

constraints

Must be an instance of scipy.minimize.LinearConstraint or scipy.minimize.NonlinearConstraint. See bayesian-optimization/BayesianOptimization for further information. Supplying constraints is optional.

optimizer#

The optimizer instance of this class.

Type:

bayes_opt.BayesianOptimization

run(**opt_kwargs) BayesianOptimizerOutput[source]#

Execute the optimization

BayesianOptimization maximizes a target function. Therefore, this class inverts the cost function and uses that as the target function. The cost function is still minimized.

Parameters:
  • controller (BayesianOptimizerController) – The controller instance used to set the optimization iteration parameters.

  • kwargs – Further keyword arguments passed to BayesianOptimization.maximize. Note that some arguments are also provided by BayesianOptimizerController.optimizer_params().

Returns:

output – Optimization output. BayesianOptimizerOutput.p_space stores data on the sampled parameter space.

Return type:

BayesianOptimizerOutput

__init__(input: Input, verbose: int = 0, random_state: dataclasses.InitVar[int] = 1, allow_duplicate_points: dataclasses.InitVar[bool] = True, bayes_opt_kwds: dataclasses.InitVar[Optional[Mapping[str, Any]]] = None) None#
class climada.util.calibrate.bayesian_optimizer.BayesianOptimizerOutputEvaluator(input: Input, output: BayesianOptimizerOutput)[source]#

Bases: OutputEvaluator

Evaluate the output of BayesianOptimizer.

Parameters:
  • input (Input) – The input object for the optimization task.

  • output (BayesianOptimizerOutput) – The output object returned by the Bayesian optimization task.

Raises:

TypeError – If output is not of type BayesianOptimizerOutput

plot_impf_variability(p_space_df: DataFrame | None = None, plot_haz: bool = True, plot_opt_kws: dict | None = None, plot_impf_kws: dict | None = None, plot_hist_kws: dict | None = None, plot_axv_kws: dict | None = None)[source]#

Plot impact function variability with parameter combinations of almost equal cost function values

Parameters:
  • p_space_df (pd.DataFrame, optional) – Parameter space to plot functions from. If None, this uses the space returned by p_space_to_dataframe(). Use select_best() for a convenient subselection of parameters close to the optimum.

  • plot_haz (bool, optional) – Whether or not to plot the hazard intensity distribution. Defaults to True.

  • plot_opt_kws (dict, optional) – Keyword arguments for optimal impact function plot. Defaults to None.

  • plot_impf_kws (dict, optional) – Keyword arguments for all impact function plots. Defaults to None.

  • plot_hist_kws (dict, optional) – Keyword arguments for hazard intensity histogram plot. Defaults to None.

  • plot_axv_kws (dict, optional) – Keyword arguments for the hazard intensity range plot (axvspan). Defaults to None.

__init__(input: Input, output: BayesianOptimizerOutput) None#
plot_at_event(data_transf: ~typing.Callable[[~pandas.core.frame.DataFrame], ~pandas.core.frame.DataFrame] = <function OutputEvaluator.<lambda>>, **plot_kwargs)#

Create a bar plot comparing estimated model output and data per event.

Every row of the Input.data is considered an event. The data to be plotted can be transformed with a generic function data_transf.

Parameters:
  • data_transf (Callable (pd.DataFrame -> pd.DataFrame), optional) – A function that transforms the data to plot before plotting. It receives a dataframe whose rows represent events and whose columns represent the modelled impact and the calibration data, respectively. By default, the data is not transformed.

  • plot_kwargs – Keyword arguments passed to the DataFrame.plot.bar method.

Returns:

ax – The plot axis returned by DataFrame.plot.bar

Return type:

matplotlib.axes.Axes

Note

This plot does not include the ignored impact, see Input.data.

plot_at_region(data_transf: ~typing.Callable[[~pandas.core.frame.DataFrame], ~pandas.core.frame.DataFrame] = <function OutputEvaluator.<lambda>>, **plot_kwargs)#

Create a bar plot comparing estimated model output and data per region.

Every column of the Input.data is considered a region. The data to be plotted can be transformed with a generic function data_transf.

Parameters:
  • data_transf (Callable (pd.DataFrame -> pd.DataFrame), optional) – A function that transforms the data to plot before plotting. It receives a dataframe whose rows represent regions and whose columns represent the modelled impact and the calibration data, respectively. By default, the data is not transformed.

  • plot_kwargs – Keyword arguments passed to the DataFrame.plot.bar method.

Returns:

ax – The plot axis returned by DataFrame.plot.bar.

Return type:

matplotlib.axes.Axes

Note

This plot does not include the ignored impact, see Input.data.

plot_event_region_heatmap(data_transf: ~typing.Callable[[~pandas.core.frame.DataFrame], ~pandas.core.frame.DataFrame] = <function OutputEvaluator.<lambda>>, **plot_kwargs)#

Plot a heatmap comparing all events per all regions

Every column of the Input.data is considered a region, and every row is considered an event. The data to be plotted can be transformed with a generic function data_transf.

Parameters:
  • data_transf (Callable (pd.DataFrame -> pd.DataFrame), optional) – A function that transforms the data to plot before plotting. It receives a dataframe whose rows represent events and whose columns represent the regions, respectively. By default, the data is not transformed.

  • plot_kwargs – Keyword arguments passed to the DataFrame.plot.bar method.

Returns:

ax – The plot axis returned by DataFrame.plot.bar.

Return type:

matplotlib.axes.Axes

Scipy Optimizer#

Calibration based on the scipy.optimize module.

class climada.util.calibrate.scipy_optimizer.ScipyMinimizeOptimizerOutput(params: Mapping[str, Number], target: Number, result: OptimizeResult)[source]#

Bases: Output

Output of a calibration with ScipyMinimizeOptimizer

result#

The OptimizeResult instance returned by scipy.optimize.minimize.

Type:

scipy.minimize.OptimizeResult

__init__(params: Mapping[str, Number], target: Number, result: OptimizeResult) None#
classmethod from_hdf5(filepath: Path | str)#

Create an output object from an H5 file

to_hdf5(filepath: Path | str, mode: str = 'x')#

Write the output into an H5 file

This stores the data as attributes because we only store single numbers, not arrays

Parameters:
  • filepath (Path or str) – The filepath to store the data.

  • mode (str (optional)) – The mode for opening the file. Defaults to x (Create file, fail if exists).

class climada.util.calibrate.scipy_optimizer.ScipyMinimizeOptimizer(input: Input)[source]#

Bases: Optimizer

An optimization using scipy.optimize.minimize

By default, this optimizer uses the "trust-constr" method. This is advertised as the most general minimization method of the scipy package and supports bounds and constraints on the parameters. Users are free to choose any method of the catalogue, but must be aware that they might require different input parameters. These can be supplied via additional keyword arguments to run().

See https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html for details.

Parameters:

input (Input) – The input data for this optimizer. Supported data types for constraint might vary depending on the minimization method used.

run(**opt_kwargs) ScipyMinimizeOptimizerOutput[source]#

Execute the optimization

Parameters:

opt_kwargs – Additional keyword arguments passed to scipy.optimize.minimize.

Returns:

output – The output of the optimization. The ScipyMinimizeOptimizerOutput.result attribute stores the associated scipy.optimize.OptimizeResult instance.

Return type:

ScipyMinimizeOptimizerOutput

__init__(input: Input) None#
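For reference, the standalone scipy call that this optimizer wraps looks like the following. The quadratic cost is a toy stand-in for the calibration cost function; only the `"trust-constr"` default and bounds support are taken from the documentation above:

```python
import numpy as np
from scipy.optimize import Bounds, minimize

# Toy stand-in for the calibration cost function.
def cost(x):
    return (x[0] - 2.0) ** 2 + (x[1] - 1.0) ** 2

# "trust-constr" is the optimizer's default method and supports
# bounds and constraints on the parameters.
result = minimize(
    cost,
    x0=np.array([0.0, 0.0]),
    method="trust-constr",
    bounds=Bounds([0.0, 0.0], [5.0, 5.0]),
)
```

The returned `scipy.optimize.OptimizeResult` is what `ScipyMinimizeOptimizerOutput.result` stores.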

Ensemble Optimizers#

Ensemble optimizers calibrate an ensemble of optimized parameter sets from subsets of the original input by employing multiple instances of the above “default” optimizers. This gives a better sense of the uncertainty in the calibration results: by selecting only a subset of events to calibrate on, and by repeating this process several times, one obtains a varying set of impact functions that may spread considerably, as some events might dominate the calibration. We distinguish two cases: the AverageEnsembleOptimizer samples a subset of all events with or without replacement; the resulting “average ensemble” contains uncertainty information on the average impact function for all events. The TragedyEnsembleOptimizer calibrates one impact function for each single event; the resulting “ensemble of tragedies” encodes the inter-event uncertainty.
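The two sampling strategies can be illustrated with a small sketch (not the CLIMADA source) for a dataset of four events in a single region:

```python
import numpy as np

# (row, column) index pairs: 4 events, 1 region.
events = [(row, 0) for row in range(4)]
rng = np.random.default_rng(seed=1)

# Average ensemble: each member redraws the full event set WITH
# replacement, so individual events can appear several times or not at all.
average_samples = [
    [events[i] for i in rng.integers(0, len(events), size=len(events))]
    for _ in range(3)  # ensemble of 3 members
]

# Ensemble of tragedies: exactly one member per single event.
tragedy_samples = [[event] for event in events]
```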

climada.util.calibrate.ensemble.sample_data(data: DataFrame, sample: list[tuple[int, int]])[source]#

Return a DataFrame containing only the sampled values from the input data.

The resulting data frame has the same shape and indices as data and is filled with NaNs, except for the row and column indices specified by sample.

Parameters:
  • data (pandas.DataFrame) – The input DataFrame from which values will be sampled.

  • sample (list of tuple of int) – A list of (row, column) index pairs indicating which positions to copy from data into the returned DataFrame.

Returns:

A DataFrame of the same shape as data with NaNs in all positions except those specified in sample, which contain the corresponding values from data.

Return type:

pandas.DataFrame
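The documented behavior can be sketched with a minimal re-implementation (illustrative only, not the CLIMADA source):

```python
import numpy as np
import pandas as pd

# Sketch of the documented behavior: keep only the sampled cells,
# fill everything else with NaN.
def sample_data(data: pd.DataFrame, sample):
    out = pd.DataFrame(np.nan, index=data.index, columns=data.columns)
    for row, col in sample:
        out.iloc[row, col] = data.iloc[row, col]
    return out

data = pd.DataFrame([[10.0, 20.0], [30.0, 40.0]])
sampled = sample_data(data, [(0, 1), (1, 0)])
```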

climada.util.calibrate.ensemble.sample_weights(weights: DataFrame, sample: list[tuple[int, int]])[source]#

Return an updated DataFrame containing the appropriate weights for a sample.

Weights that are not in sample are set to zero, whereas weights that are sampled multiple times are effectively multiplied by their number of occurrences in sample.

Parameters:
  • weights (pandas.DataFrame) – The original weights for the data

  • sample (list of tuple of int) – A list of (row, column) index pairs indicating which weights will be used, and how often.

Returns:

Updated weights for sample.

Return type:

pandas.DataFrame
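Again, a minimal re-implementation sketch of the documented behavior (not the CLIMADA source): unsampled weights become zero, and repeated samples scale the corresponding weight by their count:

```python
import pandas as pd

# Sketch: start from zero and add the original weight once per occurrence
# of a (row, column) pair in the sample.
def sample_weights(weights: pd.DataFrame, sample):
    out = pd.DataFrame(0.0, index=weights.index, columns=weights.columns)
    for row, col in sample:
        out.iloc[row, col] += weights.iloc[row, col]
    return out

weights = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]])
# (0, 0) sampled twice, (1, 1) once, the rest not at all.
sampled = sample_weights(weights, [(0, 0), (0, 0), (1, 1)])
```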

climada.util.calibrate.ensemble.event_info_from_input(inp: Input) dict[str, Any][source]#

Get information on the event(s) used for calibration

This tries to retrieve the event IDs, region IDs, and event names.

Returns:

With keys event_id, region_id, event_name

Return type:

dict

class climada.util.calibrate.ensemble.SingleEnsembleOptimizerOutput(params: ~typing.Mapping[str, ~numbers.Number], target: ~numbers.Number, event_info: dict[str, ~typing.Any] = <factory>)[source]#

Bases: Output

Output for a single member of an ensemble optimizer

This extends a regular Output by information on the particular event(s) this calibration was performed on.

event_info#

Information on the events for this calibration instance

Type:

dict(str, any)

__init__(params: ~typing.Mapping[str, ~numbers.Number], target: ~numbers.Number, event_info: dict[str, ~typing.Any] = <factory>) None#
classmethod from_hdf5(filepath: Path | str)#

Create an output object from an H5 file

to_hdf5(filepath: Path | str, mode: str = 'x')#

Write the output into an H5 file

This stores the data as attributes because we only store single numbers, not arrays

Parameters:
  • filepath (Path or str) – The filepath to store the data.

  • mode (str (optional)) – The mode for opening the file. Defaults to x (Create file, fail if exists).

climada.util.calibrate.ensemble.optimize(optimizer_type: type[Optimizer], inp: Input, opt_init_kwargs: Mapping[str, Any], opt_run_kwargs: Mapping[str, Any]) SingleEnsembleOptimizerOutput[source]#

Instantiate an optimizer, run it, and return its output

Parameters:
  • optimizer_type (type) – The type of the optimizer to use

  • inp (Input) – The optimizer input

  • opt_init_kwargs – Keyword arguments for initializing the optimizer

  • opt_run_kwargs – Keyword arguments for running the optimizer

Returns:

The output of the optimizer

Return type:

SingleEnsembleOptimizerOutput

class climada.util.calibrate.ensemble.EnsembleOptimizerOutput(data: DataFrame)[source]#

Bases: object

The collective output of an ensemble optimization

classmethod from_outputs(outputs: Sequence[SingleEnsembleOptimizerOutput])[source]#

Build data from a list of outputs

to_hdf(filepath: Path | str)[source]#

Store data to HDF5

classmethod from_hdf(filepath: Path | str)[source]#

Load data from HDF

classmethod from_csv(filepath: Path | str)[source]#

Load data from CSV

to_csv(filepath: Path | str)[source]#

Store data as CSV

to_input_var(impact_func_creator: Callable[[...], ImpactFuncSet], **impfset_kwargs) InputVar[source]#

Build Unsequa InputVar from the parameters stored in this object

plot(impact_func_creator: Callable[[...], ImpactFuncSet], **impf_set_plot_kwargs)[source]#

Plot all impact functions into the same plot

This uses the basic plot functions of ImpactFuncSet.

plot_shiny(impact_func_creator: Callable[[...], ImpactFuncSet], haz_type: str, impf_id: int, inp: Input | None = None, impf_plot_kwargs: Mapping[str, Any] | None = None, hazard_plot_kwargs: Mapping[str, Any] | None = None, legend: bool = True)[source]#

Plot all impact functions with appropriate color coding and event data

Parameters:
  • impact_func_creator (Callable) – A function taking parameters and returning an ImpactFuncSet.

  • haz_type (str) – The hazard type of the impact function to plot.

  • impf_id (int) – The ID of the impact function to plot.

  • inp (Input, optional) – The input object used for the calibration. If provided, a histogram of the hazard intensity will be drawn behind the impact functions.

  • impf_plot_kwargs – Keyword arguments for the function plotting the impact functions.

  • hazard_plot_kwargs – Keyword arguments for the function plotting the hazard intensity histogram.

  • legend (bool) – Whether to create a legend. The legend may become cluttered for results of AverageEnsembleOptimizer, so it is advisable to disable it in those cases.

plot_category(impact_func_creator: Callable[[...], ImpactFuncSet], haz_type: str, impf_id: int, category: str, category_colors: Mapping[str, str | tuple] | None = None, **impf_set_plot_kwargs)[source]#

Plot impact functions with coloring according to a certain category

Parameters:
  • impact_func_creator (Callable) – A function taking parameters and returning an ImpactFuncSet.

  • haz_type (str) – The hazard type of the impact function to plot.

  • impf_id (int) – The ID of the impact function to plot.

  • category (str) – The event information on which to categorize (can be "region_id", "event_id", or "event_name")

  • category_colors (dict(str, str or tuple), optional) – Specify which categories to plot (keys) and what colors to use for them (values). If None, will categorize for unique values in the category column and color automatically.

__init__(data: DataFrame) None#
class climada.util.calibrate.ensemble.EnsembleOptimizer(input: ~climada.util.calibrate.base.Input, optimizer_type: type[~climada.util.calibrate.base.Optimizer], optimizer_init_kwargs: dict[str, ~typing.Any] = <factory>)[source]#

Bases: ABC

Abstract base class for defining an ensemble optimizer.

An ensemble optimizer uses a user-defined optimizer type to run multiple calibration tasks. The tasks are defined by the samples attribute: For each entry in samples, a new Input is created and passed to an instance of optimizer_type. Derived classes need to set the samples during initialization and define the input_from_sample() method.

The calibration tasks can be conducted in parallel by executing run() with processes set to a value larger than 1.

input#

The generic input for the optimization

Type:

Input

optimizer_type#

The type of the optimizer to use for each calibration task

Type:

type[Optimizer]

optimizer_init_kwargs#

Keyword arguments for initializing an instance of the chosen optimizer_type.

Type:

dict[str, Any]

samples#

The samples for each calibration task. Each entry is a list of tuples that encode row and column indices of the Input data that are selected for the particular calibration task. See sample_data().

Type:

list of list of tuple(int, int)

run(processes=1, **optimizer_run_kwargs) EnsembleOptimizerOutput[source]#

Execute the ensemble optimization

Parameters:
  • processes (int, optional) – The number of processes to distribute the optimization tasks to. Defaults to 1 (no parallelization)

  • optimizer_run_kwargs – Additional keyword arguments for the run() method of the particular optimizer used.

abstractmethod input_from_sample(sample: list[tuple[int, int]]) Input[source]#

Define how an input is created from a sample

__init__(input: ~climada.util.calibrate.base.Input, optimizer_type: type[~climada.util.calibrate.base.Optimizer], optimizer_init_kwargs: dict[str, ~typing.Any] = <factory>) None#
class climada.util.calibrate.ensemble.AverageEnsembleOptimizer(input: ~climada.util.calibrate.base.Input, optimizer_type: type[~climada.util.calibrate.base.Optimizer], optimizer_init_kwargs: dict[str, ~typing.Any] = <factory>, sample_fraction: dataclasses.InitVar[float] = 1.0, ensemble_size: dataclasses.InitVar[int] = 20, random_state: dataclasses.InitVar[int] = 1, replace: dataclasses.InitVar[bool] = True)[source]#

Bases: EnsembleOptimizer

An optimizer for the “average ensemble”.

This optimizer samples a fraction of the original events in input.data.

sample_fraction#

The fraction of data points to use for each calibration. For values > 1, replace must be True.

Type:

float

ensemble_size#

The number of calibration tasks to perform (and hence size of the ensemble).

Type:

int

random_state#

The seed for the random number generator selecting the samples

Type:

int

replace#

If samples of the input data should be drawn with replacement

Type:

bool
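A hedged sketch of how such samples could be generated from the attributes above; the function name and details are illustrative and may differ from the CLIMADA implementation:

```python
import numpy as np

# Illustrative sample generation for the average ensemble: draw
# sample_fraction of all (event, region) cells, with or without
# replacement, for each ensemble member.
def draw_samples(n_events, n_regions, sample_fraction=1.0, ensemble_size=20,
                 random_state=1, replace=True):
    cells = [(r, c) for r in range(n_events) for c in range(n_regions)]
    n_draw = round(sample_fraction * len(cells))
    if n_draw > len(cells) and not replace:
        raise ValueError("sample_fraction > 1 requires replace=True")
    rng = np.random.default_rng(random_state)
    return [
        [cells[i] for i in rng.choice(len(cells), size=n_draw, replace=replace)]
        for _ in range(ensemble_size)
    ]

samples = draw_samples(n_events=5, n_regions=2, sample_fraction=0.8,
                       ensemble_size=4)
```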

input_from_sample(sample: list[tuple[int, int]])[source]#

Shallow-copy the input and update the data

__init__(input: ~climada.util.calibrate.base.Input, optimizer_type: type[~climada.util.calibrate.base.Optimizer], optimizer_init_kwargs: dict[str, ~typing.Any] = <factory>, sample_fraction: dataclasses.InitVar[float] = 1.0, ensemble_size: dataclasses.InitVar[int] = 20, random_state: dataclasses.InitVar[int] = 1, replace: dataclasses.InitVar[bool] = True) None#
run(processes=1, **optimizer_run_kwargs) EnsembleOptimizerOutput#

Execute the ensemble optimization

Parameters:
  • processes (int, optional) – The number of processes to distribute the optimization tasks to. Defaults to 1 (no parallelization)

  • optimizer_run_kwargs – Additional keyword arguments for the run() method of the particular optimizer used.

class climada.util.calibrate.ensemble.TragedyEnsembleOptimizer(input: ~climada.util.calibrate.base.Input, optimizer_type: type[~climada.util.calibrate.base.Optimizer], optimizer_init_kwargs: dict[str, ~typing.Any] = <factory>, ensemble_size: dataclasses.InitVar[typing.Optional[int]] = None, random_state: dataclasses.InitVar[int] = 1)[source]#

Bases: EnsembleOptimizer

An optimizer for the “ensemble of tragedies”.

Each sample (and thus calibration task) of this optimizer only contains a single event from input.data.

ensemble_size#

The number of calibration tasks to perform. Defaults to None, which means one for each data point. Must be smaller or equal to the number of data points. If smaller, random events will be left out from the ensemble calibration.

Type:

int, optional

random_state#

The seed for the random number generator selecting the samples

Type:

int

input_from_sample(sample: list[tuple[int, int]])[source]#

Subselect the entire input for the sampled event

__init__(input: ~climada.util.calibrate.base.Input, optimizer_type: type[~climada.util.calibrate.base.Optimizer], optimizer_init_kwargs: dict[str, ~typing.Any] = <factory>, ensemble_size: dataclasses.InitVar[typing.Optional[int]] = None, random_state: dataclasses.InitVar[int] = 1) None#
run(processes=1, **optimizer_run_kwargs) EnsembleOptimizerOutput#

Execute the ensemble optimization

Parameters:
  • processes (int, optional) – The number of processes to distribute the optimization tasks to. Defaults to 1 (no parallelization)

  • optimizer_run_kwargs – Additional keyword arguments for the run() method of the particular optimizer used.