Metrics - Core metrics and classes

Function/Class Documentation

Module containing verification and performance metrics

With the exception of the ContingencyNxN and Contingency2x2 classes, the inputs for all metrics are assumed to be array-like and 1D. Bad values are assumed to be stored as NaN and these are excluded in metric calculations.

Author: Steve Morley Institution: Los Alamos National Laboratory

Copyright (c) 2017, Triad National Security, LLC All rights reserved.

verify.metrics.MASE(predicted, observed)

Mean Absolute Scaled Error

  • predicted (array-like) – predicted data for which to calculate MASE

  • observed (array-like or float) – observation vector (or climatological value (scalar)) to use as reference value


out – the mean absolute scaled error of the data set

Return type:



References: R.J. Hyndman and A.B. Koehler, Another look at measures of forecast accuracy, Intl. J. Forecasting, 22, pp. 679-688, 2006.
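As a rough illustration of the Hyndman and Koehler (2006) definition, the sketch below scales each error by the in-sample mean absolute error of the one-step naive (persistence) forecast of the observations, then averages the absolute scaled errors. It is a minimal sketch, not the package's implementation. The function names `scaled_error` and `mase` are hypothetical, only an observation vector (not a scalar climatology) is handled, and errors are taken as observed minus predicted following Hyndman and Koehler.

```python
import numpy as np


def scaled_error(predicted, observed):
    """Scaled errors q_t (hypothetical sketch of Hyndman & Koehler, 2006).

    Only an observation vector is handled; errors are observed - predicted.
    """
    pred = np.asarray(predicted, dtype=float)
    obs = np.asarray(observed, dtype=float)
    # In-sample MAE of the one-step naive (persistence) forecast of the observations
    scale = np.nanmean(np.abs(np.diff(obs)))
    return (obs - pred) / scale


def mase(predicted, observed):
    """Mean absolute scaled error, ignoring NaN bad values."""
    return np.nanmean(np.abs(scaled_error(predicted, observed)))


obs = [1.0, 2.0, 3.0, 4.0, 5.0]
pred = [1.0, 2.0, 3.0, 4.0, 7.0]
print(mase(pred, obs))  # 0.4: mean |error| of 0.4 over a naive MAE of 1.0
```

A MASE below 1 means the predictions beat the in-sample persistence forecast on average.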

verify.metrics.RMSE(data, climate=None)

Calculate the root mean squared error of a data set relative to a reference value

  • data (array-like) – data to calculate root mean squared error, default reference is persistence

  • climate (array-like or float, optional) – Array-like (list, numpy array, etc.) or float of observed values of scalar quantity. If climate is None (default) then the accuracy is assessed relative to persistence.


out – the root-mean-squared error of the data set relative to the chosen reference

Return type:



The chosen reference can be persistence, a provided climatological mean (scalar) or a provided climatology (observation vector).
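The three reference choices can be sketched as follows. This assumes that "persistence" means comparing each value of `data` with the preceding value in the same series, and that a scalar or vector `climate` is simply subtracted from `data`. The helper names `reference_errors` and `rmse` are hypothetical and this is not the package's implementation; NaN bad values are skipped with `np.nanmean`.

```python
import numpy as np


def reference_errors(data, climate=None):
    """Errors of `data` relative to a reference (hypothetical sketch).

    Assumption: with climate=None the reference is persistence, i.e. each
    value is compared with the preceding value in the series.
    """
    data = np.asarray(data, dtype=float)
    if climate is None:
        return data[1:] - data[:-1]
    # Scalar climatological mean or observation vector; broadcasting covers both
    return data - np.asarray(climate, dtype=float)


def rmse(data, climate=None):
    err = reference_errors(data, climate)
    # NaNs (bad values) are excluded from the mean
    return np.sqrt(np.nanmean(err ** 2))


series = [1.0, 3.0, 5.0, 7.0]
print(rmse(series))                                # persistence: every step is 2, so 2.0
print(rmse(series, climate=4.0))                   # scalar climatological mean
print(rmse(series, climate=[1.0, 3.0, 5.0, 9.0]))  # observation vector: 1.0
```

The same error vector could feed the mean squared, mean absolute and median absolute errors; only the final reduction changes.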

verify.metrics.Sn(data, scale=True, correct=True)

Sn statistic, a robust measure of scale

  • data (array-like) – data to calculate Sn statistic for

  • scale (boolean) – Scale so that the output is the same as the standard deviation if the distribution is normal (default=True)

  • correct (boolean) – Apply the small-sample correction factor c (default=True)


Sn – the Sn statistic

Return type:



Sn is more efficient than the median absolute deviation, and is not constructed with the assumption of a symmetric distribution, because it does not measure distance from an assumed central location. To quote RC1993, “…Sn looks at a typical distance between observations, which is still valid at asymmetric distributions.”

[RC1993] P.J. Rousseeuw and C. Croux (1993), “Alternatives to the Median Absolute Deviation”, J. Amer. Stat. Assoc., 88 (424), pp. 1273-1283. See Equation 2.1, but note that they use “low” and “high” medians: Sn = c * 1.1926 * LOMED_{i} ( HIMED_{j} (|x_i - x_j|) )

Note that the implementation of the original formulation is slow for large n. As the original formulation is identical to using a true median for odd-length series, we do so here automatically to gain a significant speedup.
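A brute-force version of Equation 2.1 makes the nested medians concrete. This sketch (hypothetical name `sn_naive`) builds the full matrix of pairwise differences, so it is O(n^2) in time and memory. It uses true medians throughout and omits the correction factor c, so it is not expected to match `Sn` exactly for small or even-length samples.

```python
import numpy as np


def sn_naive(data):
    """Naive O(n^2) Sn estimator (hypothetical sketch).

    Uses true medians for both the inner and outer median and applies the
    1.1926 consistency constant, but omits the small-sample correction c.
    """
    x = np.asarray(data, dtype=float)
    x = x[np.isfinite(x)]  # drop NaN bad values
    # Pairwise absolute differences |x_i - x_j|, shape (n, n)
    diffs = np.abs(x[:, None] - x[None, :])
    inner = np.median(diffs, axis=1)  # median over j for each i
    return 1.1926 * np.median(inner)  # median over i


print(sn_naive([1, 2, 3, 4, 5]))  # 1.1926
```

For [1, 2, 3, 4, 5] the inner medians are 2, 1, 1, 1, 2, whose median is 1, so the result is just the consistency constant.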

verify.metrics.absPercError(predicted, observed)

Absolute percentage error

  • predicted (array-like) – Array-like (list, numpy array, etc.) of predictions

  • observed (array-like) – Array-like (list, numpy array, etc.) of observed values of scalar quantity


perc – Array of absolute percentage errors

Return type:


verify.metrics.accuracy(data, climate=None)

Convenience function to calculate a selection of unscaled accuracy measures

  • data (array-like) – Array-like (list, numpy array, etc.) of predictions

  • climate (array-like or float, optional) – Array-like (list, numpy array, etc.) or float of observed values of scalar quantity. If climate is None (default) then the accuracy is assessed relative to persistence.


out – Dictionary containing unscaled accuracy measures:

  • MSE – mean squared error

  • RMSE – root mean squared error

  • MAE – mean absolute error

  • MdAE – median absolute error

Return type:


verify.metrics.bias(predicted, observed)

Scale-dependent bias as measured by the mean error

  • predicted (array-like) – Array-like (list, numpy array, etc.) of predictions

  • observed (array-like) – Array-like (list, numpy array, etc.) of observed values of scalar quantity


bias – Mean error of prediction

Return type:


verify.metrics.forecastError(predicted, observed, full=True)

Forecast error, defined using the sign convention of J&S ch. 5

  • predicted (array-like) – Array-like (list, numpy array, etc.) of predictions

  • observed (array-like) – Array-like (list, numpy array, etc.) of observed values of scalar quantity

  • full (boolean, optional) – Switch determining nature of return value. When it is True (the default) the function returns the errors as well as the predicted and observed values as numpy arrays of floats, when False only the array of forecast errors is returned.


  • err (array) – the forecast error

  • pred (array) – Optional return array of predicted values as floats, included if full is True

  • obse (array) – Optional return array of observed values as floats, included if full is True


J&S: Jolliffe and Stephenson (Ch. 5)
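The sketch below assumes the J&S convention that the error is the prediction minus the observation, so a positive mean error (the quantity reported by `bias`) indicates over-prediction. The names `forecast_error` and `mean_error` are hypothetical; they only mirror the `full` switch described for `forecastError` and are not the package's implementation.

```python
import numpy as np


def forecast_error(predicted, observed, full=True):
    """Forecast error, assuming error = predicted - observed (hypothetical sketch)."""
    pred = np.asarray(predicted, dtype=float)
    obse = np.asarray(observed, dtype=float)
    err = pred - obse
    if full:
        # Also return the inputs as float arrays, mirroring the `full` switch
        return err, pred, obse
    return err


def mean_error(predicted, observed):
    """Mean error (bias); positive values indicate over-prediction."""
    return np.nanmean(forecast_error(predicted, observed, full=False))


err, pred, obse = forecast_error([2, 4, 6], [1, 4, 8])
print(err)                               # errors of 1, 0 and -2
print(mean_error([2, 4, 6], [1, 4, 8]))  # -1/3: slight under-prediction on average
```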

verify.metrics.logAccuracy(predicted, observed, base=10, mask=True)

Log Accuracy Ratio, defined as log(predicted/observed) or log(predicted)-log(observed)

  • predicted (array-like) – Array-like (list, numpy array, etc.) of predictions

  • observed (array-like) – Array-like (list, numpy array, etc.) of observed values of scalar quantity

  • base (number, optional) – Base to use for logarithmic transform (allows 10, 2, and ‘e’) (default=10)

  • mask (boolean, optional) – Switch to set masking behaviour. If True (default) the function will mask out NaN and negative values, and will return a masked array. If False, the presence of negative numbers will raise a ValueError and NaN will propagate through the calculation.


logacc – Array of log accuracy ratios

Return type:

array or masked array


Using base 2 is computationally much faster, so unless the base is important to interpretation we recommend using that.
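A minimal sketch of the log accuracy ratio and its two masking behaviours follows. The name `log_accuracy` is hypothetical and this is not the package's implementation. It assumes that zero values should be masked along with negative values when `mask=True`, since the logarithm is undefined there.

```python
import numpy as np


def log_accuracy(predicted, observed, base=10, mask=True):
    """Log accuracy ratio log(predicted/observed) (hypothetical sketch)."""
    logfuncs = {10: np.log10, 2: np.log2, 'e': np.log}
    if base not in logfuncs:
        raise ValueError("base must be 10, 2 or 'e'")
    logf = logfuncs[base]
    pred = np.asarray(predicted, dtype=float)
    obs = np.asarray(observed, dtype=float)
    if mask:
        # Mask NaN and non-positive values (assumption: zero is masked too)
        bad = ~(np.isfinite(pred) & np.isfinite(obs) & (pred > 0) & (obs > 0))
        ratio = np.ones_like(pred)  # placeholder of 1 where masked, so log is 0
        ratio[~bad] = pred[~bad] / obs[~bad]
        return np.ma.masked_array(logf(ratio), mask=bad)
    if np.any(pred < 0) or np.any(obs < 0):
        raise ValueError("negative values are not allowed when mask=False")
    # NaN propagates through the calculation
    return logf(pred / obs)


print(log_accuracy([10.0, 1.0, -1.0], [1.0, 10.0, 1.0]))
```

Over- and under-prediction by the same factor give log accuracy ratios of equal magnitude and opposite sign, which is what makes this ratio useful as the basis for the symmetric metrics further down.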

verify.metrics.meanAPE(predicted, observed, mfunc=<function mean>)

Mean absolute percentage error

  • predicted (array-like) – predicted data for which to calculate mean absolute percentage error

  • observed (array-like or float) – observation vector (or climatological value (scalar)) to use as reference value

  • mfunc (function) – function to calculate mean (default=np.mean)


mape – the mean absolute percentage error

Return type:

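The sketch below computes the mean absolute percentage error from percentage errors and shows how passing a median as `mfunc` gives the median absolute percentage error (MdAPE) instead. It assumes percentage error is defined as 100*(predicted - observed)/observed; the sign convention does not affect the absolute value. The names `perc_error` and `mean_ape` are hypothetical, and NaN-aware reducers are used here, whereas the documented default for `meanAPE` is `np.mean`.

```python
import numpy as np


def perc_error(predicted, observed):
    """Percentage error, assuming 100 * (predicted - observed) / observed."""
    pred = np.asarray(predicted, dtype=float)
    obs = np.asarray(observed, dtype=float)
    return 100.0 * (pred - obs) / obs


def mean_ape(predicted, observed, mfunc=np.nanmean):
    """Central tendency (mean by default) of the absolute percentage errors."""
    return mfunc(np.abs(perc_error(predicted, observed)))


pred = [110.0, 90.0, 150.0]
obs = [100.0, 100.0, 100.0]
print(mean_ape(pred, obs))                      # MAPE: mean of 10, 10 and 50
print(mean_ape(pred, obs, mfunc=np.nanmedian))  # MdAPE: 10.0
```
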

verify.metrics.meanAbsError(data, climate=None)

Mean absolute error of a data set relative to some reference value

  • data (array-like) – data to calculate mean absolute error, default reference is persistence

  • climate (array-like or float, optional) – Array-like (list, numpy array, etc.) or float of observed values of scalar quantity. If climate is None (default) then the accuracy is assessed relative to persistence.


out – the mean absolute error of the data set relative to the chosen reference

Return type:



The chosen reference can be persistence, a provided climatological mean (scalar) or a provided climatology (observation vector).

verify.metrics.meanPercentageError(predicted, observed)

Order-dependent bias as measured by the mean percentage error

  • predicted (array-like) – Array-like (list, numpy array, etc.) of predictions

  • observed (array-like) – Array-like (list, numpy array, etc.) of observed values of scalar quantity


mpe – Mean percentage error of prediction

Return type:


verify.metrics.meanSquaredError(data, climate=None)

Mean squared error of a data set relative to a reference value

  • data (array-like) – data to calculate mean squared error, default reference is persistence

  • climate (array-like or float, optional) – Array-like (list, numpy array, etc.) or float of observed values of scalar quantity. If climate is None (default) then the accuracy is assessed relative to persistence.


out – the mean-squared-error of the data set relative to the chosen reference

Return type:


See also

RMSE, meanAbsError


The chosen reference can be persistence, a provided climatological mean (scalar), or a provided climatology (observation vector).

verify.metrics.medAbsDev(series, scale=False, median=False)

Computes the median absolute deviation from the median

  • series (array-like) – Input data

  • scale (boolean) – Scale so that median absolute deviation is the same as the standard deviation for normal distributions (default=False)

  • median (boolean) – Return the median of the series as well as the median absolute deviation (default=False)


  • mad (float) – median absolute deviation

  • perc50 (float) – median of series, optional output
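For reference, a minimal sketch of the median absolute deviation with the optional scaling and median output. The name `med_abs_dev` is hypothetical and this is not the package's implementation; the 1.4826 factor is the one quoted in the notes for the robust standard deviation.

```python
import numpy as np


def med_abs_dev(series, scale=False, median=False):
    """Median absolute deviation from the median (hypothetical sketch)."""
    x = np.asarray(series, dtype=float)
    x = x[np.isfinite(x)]  # drop NaN bad values
    perc50 = np.median(x)
    mad = np.median(np.abs(x - perc50))
    if scale:
        mad *= 1.4826  # consistency factor for normal distributions
    if median:
        return mad, perc50
    return mad


print(med_abs_dev([1, 2, 3, 4, 100]))  # 1.0: the outlier barely matters
```
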

verify.metrics.medAbsError(data, climate=None)

Median absolute error of a data set relative to some reference value

  • data (array-like) – data to calculate median absolute error, default reference is persistence

  • climate (array-like or float, optional) – Array-like (list, numpy array, etc.) or float of observed values of scalar quantity. If climate is None (default) then the accuracy is assessed relative to persistence.


out – the median absolute error of the data set relative to the chosen reference

Return type:



The chosen reference can be persistence, a provided climatological mean (scalar) or a provided climatology (observation vector).

verify.metrics.medSymAccuracy(predicted, observed, mfunc=<function median>, method=None)

Scaled measure of accuracy that is not biased to over- or under-predictions.

  • predicted (array-like) – predicted data for which to calculate the median symmetric accuracy

  • observed (array-like or float) – observation vector (or climatological value (scalar)) to use as reference value

  • mfunc (function) – function for calculating the median (default=np.median)

  • method (string, optional) – Method to use for calculating the median symmetric accuracy (MSA). Options are ‘log’, which uses the median of the re-exponentiated absolute log accuracy; ‘UPE’, which calculates MSA using the unsigned percentage error; and None (default), in which case the method is implemented as described in the notes. The UPE method has reduced accuracy compared to the other methods and is included primarily for testing purposes.


msa – the median symmetric accuracy

Return type:



The accuracy ratio is given by (prediction/observation). To avoid the bias inherent in mean/median percentage error metrics, we use the log of the accuracy ratio, which is symmetric about 0 for changes of the same factor. Specifically, the Median Symmetric Accuracy is found by calculating the median of the absolute log accuracy and re-exponentiating: g = exp( median( |ln(pred) - ln(obs)| ) )

This can be expressed as a symmetric percentage error by shifting by one unit and multiplying by 100: MSA = 100*(g-1)

It can also be shown that this is equivalent to the median unsigned percentage error, where the unsigned relative error is given by: (y’ - x’)/x’

where y’ is always the larger of the (observation, prediction) pair, and x’ is always the smaller.
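The equivalence between the log-based definition and the median unsigned percentage error can be checked numerically. In this sketch (hypothetical names `med_sym_accuracy` and `med_unsigned_perc_error`), NaN and non-positive values are simply dropped, which is an assumption rather than documented behaviour, and this is not the package's implementation.

```python
import numpy as np


def _positive_pairs(predicted, observed):
    """Keep only pairs where both values are finite and positive (assumption)."""
    pred = np.asarray(predicted, dtype=float)
    obs = np.asarray(observed, dtype=float)
    good = np.isfinite(pred) & np.isfinite(obs) & (pred > 0) & (obs > 0)
    return pred[good], obs[good]


def med_sym_accuracy(predicted, observed):
    """MSA = 100*(g - 1), with g = exp(median(|ln(pred) - ln(obs)|))."""
    pred, obs = _positive_pairs(predicted, observed)
    g = np.exp(np.median(np.abs(np.log(pred) - np.log(obs))))
    return 100.0 * (g - 1.0)


def med_unsigned_perc_error(predicted, observed):
    """Median of 100*(y' - x')/x', with y' the larger and x' the smaller value."""
    pred, obs = _positive_pairs(predicted, observed)
    larger = np.maximum(pred, obs)
    smaller = np.minimum(pred, obs)
    return 100.0 * np.median((larger - smaller) / smaller)


pred = [2.0, 1.0, 3.0, 5.0, 1.0]
obs = [1.0, 2.0, 3.0, 4.0, 4.0]
print(med_sym_accuracy(pred, obs), med_unsigned_perc_error(pred, obs))  # both about 100
```

The example uses an odd number of pairs. With an even number, `np.median` averages the two middle values, and averaging in log space versus ratio space can then give slightly different numbers.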

Reference: Morley, S.K., Brito, T.V., and Welling, D.T. (2018), Measures of Model Performance Based on the Log Accuracy Ratio, Space Weather, 16(1), pp. 69-88, doi: 10.1002/2017SW001669.

verify.metrics.median(data, ws=None)

Weighted median

  • data (array) – Array of data values

  • ws (None or array) – None, which implies equal weighting, or an array of weights.


wmedian – (Weighted) median of input series

Return type:

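One common way to compute a weighted median is sketched below: sort the data, accumulate the weights, and take the first value whose cumulative weight reaches half the total, averaging with the next value when the half-way point falls exactly on a boundary (which reproduces the ordinary median for equal weights). Weighted-median conventions differ, so this tie handling is an assumption and may not match `median`. The name `weighted_median` is hypothetical and weights are assumed positive.

```python
import numpy as np


def weighted_median(data, ws=None):
    """Weighted median (hypothetical sketch); equal weights if ws is None."""
    x = np.asarray(data, dtype=float)
    w = np.ones_like(x) if ws is None else np.asarray(ws, dtype=float)
    # Exclude NaN bad values (and their weights)
    good = np.isfinite(x) & np.isfinite(w)
    x, w = x[good], w[good]
    order = np.argsort(x)
    x, w = x[order], w[order]
    cum = np.cumsum(w)
    half = 0.5 * cum[-1]
    # First index where the cumulative weight reaches half of the total
    idx = np.searchsorted(cum, half)
    if np.isclose(cum[idx], half) and idx + 1 < x.size:
        # Half-way point falls exactly on a boundary: average the two neighbours
        return 0.5 * (x[idx] + x[idx + 1])
    return x[idx]


print(weighted_median([1, 2, 3, 4]))             # 2.5, same as np.median
print(weighted_median([1, 2, 3], ws=[1, 1, 5]))  # 3.0: the heavy point dominates
```
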

verify.metrics.medianLogAccuracy(predicted, observed, mfunc=<function median>, base=10)

Order-dependent bias as measured by the median of the log accuracy ratio

  • predicted (array-like) – Array-like (list, numpy array, etc.) of predictions

  • observed (array-like) – Array-like (list, numpy array, etc.) of observed values of scalar quantity

  • mfunc (function, optional) – Function to use for central tendency (default: numpy.median)

  • base (number, optional) – Base to use for logarithmic transform (default: 10)


mla – Median log accuracy of prediction

Return type:



Reference: Morley, S.K. (2016), Alternatives to accuracy and bias metrics based on percentage errors for radiation belt modeling applications, Los Alamos National Laboratory Report, LA-UR-15-24592.

verify.metrics.nRMSE(predicted, observed)

normalized root mean squared error of a data set relative to a reference value

  • predicted (array-like) – predicted data for which to calculate normalized root mean squared error

  • observed (array-like or float) – observation vector (or climatological value (scalar)) to use as reference value


out – the normalized root-mean-squared-error of the data set relative to the observations

Return type:



The chosen reference can be an observation vector or a provided climatological mean (scalar). This definition is due to Yu and Ridley (2008).

References: Yu, Y., and A. J. Ridley (2008), Validation of the space weather modeling framework using ground-based magnetometers, Space Weather, 6, S05002, doi:10.1029/2007SW000345.
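Assuming the Yu and Ridley (2008) form, in which the summed squared error is normalized by the summed squared observations, nRMSE = sqrt( sum((pred - obs)^2) / sum(obs^2) ), a minimal sketch looks like this. The name `nrmse` is hypothetical, the exact formula used by `nRMSE` is an assumption, and a scalar `observed` is broadcast to the length of `predicted`.

```python
import numpy as np


def nrmse(predicted, observed):
    """Normalized RMSE, assuming sqrt(sum((pred - obs)^2) / sum(obs^2))."""
    pred, obs = np.broadcast_arrays(np.asarray(predicted, dtype=float),
                                    np.asarray(observed, dtype=float))
    # Drop pairs where either value is NaN (bad values)
    good = np.isfinite(pred) & np.isfinite(obs)
    pred, obs = pred[good], obs[good]
    return np.sqrt(np.sum((pred - obs) ** 2) / np.sum(obs ** 2))


print(nrmse([0.0, 0.0], [3.0, 4.0]))  # 1.0: predicting zero everywhere scores 1
```

Under this definition a perfect prediction scores 0 and a prediction of zero everywhere scores 1, which gives a convenient benchmark.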

verify.metrics.normSn(data, **kwargs)

Computes the normalized Sn statistic, a scaled measure of spread.

  • data (array-like) – data to calculate normSn statistic for

  • **kwargs (dict) – Optional keyword arguments (see Sn)


normSn – the normalized Sn statistic

Return type:



We here scale the Sn estimator by the median, giving a non-symmetric alternative to the robust coefficient of variation (rCV).

verify.metrics.percBetter(predict1, predict2, observed)

The percentage of cases when method A was closer to the observation than method B

  • predict1 (array-like) – Array-like (list, numpy array, etc.) of predictions from model A

  • predict2 (array-like) – Array-like (list, numpy array, etc.) of predictions from model B

  • observed (array-like) – Array-like (list, numpy array, etc.) of observed values of scalar quantity


percBetter – The percentage of observations where method A was closer to observation than method B

Return type:



For example, if we want to know whether a new forecast performs better than a reference forecast…


>>> import verify
>>> data = [3, 4, 5, 6, 7, 8]
>>> p_ref = [5.5]*6  # mean prediction
>>> p_good = [4, 5, 4, 7, 7, 8]  # "good" model prediction
>>> verify.percBetter(p_good, p_ref, data)

That is, two-thirds (66.67%) of the predictions have a lower absolute error in p_good than in p_ref.
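The comparison in the example above can be reproduced with a short sketch. It assumes that ties (equal absolute errors) count as model A not being closer; the name `perc_better` is hypothetical and this is not the package's implementation.

```python
import numpy as np


def perc_better(predict1, predict2, observed):
    """Percentage of cases where predict1 is closer to observed (hypothetical sketch)."""
    p1 = np.asarray(predict1, dtype=float)
    p2 = np.asarray(predict2, dtype=float)
    obs = np.asarray(observed, dtype=float)
    # Ignore any case with a NaN bad value
    good = np.isfinite(p1) & np.isfinite(p2) & np.isfinite(obs)
    err1 = np.abs(p1[good] - obs[good])
    err2 = np.abs(p2[good] - obs[good])
    # Ties count as "not better" (assumption)
    return 100.0 * np.mean(err1 < err2)


data = [3, 4, 5, 6, 7, 8]
p_ref = [5.5] * 6
p_good = [4, 5, 4, 7, 7, 8]
print(round(perc_better(p_good, p_ref, data), 2))  # 66.67
```
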

verify.metrics.percError(predicted, observed)

Percentage Error

  • predicted (array-like) – Array-like (list, numpy array, etc.) of predictions

  • observed (array-like) – Array-like (list, numpy array, etc.) of observed values of scalar quantity


perc – Array of forecast errors expressed as a percentage

Return type:



robust coefficient of variation


predicted (array-like) – Predicted input


rcv – robust coefficient of variation (see notes)

Return type:



Computes the “robust coefficient of variation”, i.e. median absolute deviation divided by the median

By analogy with the coefficient of variation, which is the standard deviation divided by the mean, rCV gives the median absolute deviation (aka rSD) divided by the median, thereby providing a scaled measure of precision/spread.


robust standard deviation


predicted (array-like) – Predicted input


rsd – robust standard deviation, i.e. the scaled median absolute deviation

Return type:



Computes the “robust standard deviation”, i.e. the median absolute deviation times a correction factor

The median absolute deviation (medAbsDev) scaled by a factor of 1.4826 recovers the standard deviation when applied to a normal distribution. However, unlike the standard deviation, the medAbsDev has a high breakdown point and is therefore considered a robust estimator.
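Both claims can be checked numerically. This sketch (hypothetical name `rsd`, not the package's implementation) computes 1.4826 times the median absolute deviation for a large normal sample, then appends a small fraction of extreme outliers: the standard deviation blows up while the robust estimate barely moves.

```python
import numpy as np


def rsd(predicted):
    """Robust standard deviation: 1.4826 * median absolute deviation."""
    x = np.asarray(predicted, dtype=float)
    x = x[np.isfinite(x)]  # drop NaN bad values
    return 1.4826 * np.median(np.abs(x - np.median(x)))


rng = np.random.default_rng(42)
sample = rng.normal(loc=10.0, scale=2.0, size=100_000)
print(rsd(sample), np.std(sample))  # both close to 2.0

# Append about 0.1% extreme outliers
contaminated = np.append(sample, np.full(100, 1.0e6))
print(rsd(contaminated), np.std(contaminated))  # rsd stays near 2.0; std is huge
```
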

verify.metrics.scaledAccuracy(predicted, observed)

Calculate scaled and relative accuracy measures

  • predicted (array-like) – Array-like (list, numpy array, etc.) of predictions

  • observed (array-like) – Array-like (list, numpy array, etc.) of observed values of scalar quantity


out – Dictionary containing scaled or relative accuracy measures:

  • nRMSE – normalized root mean squared error

  • MASE – mean absolute scaled error

  • MAPE – mean absolute percentage error

  • MdAPE – median absolute percentage error

  • MdSymAcc – median symmetric accuracy

Return type:


verify.metrics.scaledError(predicted, observed)

Scaled errors, see Hyndman and Koehler (2006)

  • predicted (array-like) – predicted data for which to calculate scaled error

  • observed (array-like or float) – observation vector (or climatological value (scalar)) to use as reference value


q – the scaled error

Return type:



References: R.J. Hyndman and A.B. Koehler, Another look at measures of forecast accuracy, Intl. J. Forecasting, 22, pp. 679-688, 2006.


verify.metrics.skill(A_data, A_ref, A_perf=0)

Generic forecast skill score for quantifying forecast improvement

  • A_data (float) – Accuracy measure of data set

  • A_ref (float) – Accuracy measure for reference forecast

  • A_perf (float, optional) – Accuracy measure for “perfect forecast” (Default = 0)


ss_ref – Forecast skill for the given forecast, relative to the reference, using the chosen accuracy measure

Return type:



See section 7.1.4 of Wilks [2006] (Statistical methods in the atmospheric sciences) for details.
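Wilks' generic skill score is SS = (A - A_ref) / (A_perf - A_ref), which equals 1 for a perfect forecast and 0 for a forecast no better than the reference. A minimal sketch follows, returning a fraction; whether `skill` also multiplies by 100 is not stated here, so treat the scaling as an assumption.

```python
def skill(A_data, A_ref, A_perf=0):
    """Generic skill score (A - A_ref) / (A_perf - A_ref), returned as a fraction."""
    return (A_data - A_ref) / (A_perf - A_ref)


# e.g. MSE of 1.0 for the new forecast vs 2.0 for the reference; perfect MSE is 0
print(skill(1.0, 2.0))  # 0.5: half of the possible improvement was achieved
```

Negative values mean the forecast is worse than the reference under the chosen accuracy measure.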

verify.metrics.symmetricSignedBias(predicted, observed)

Symmetric signed bias, expressed as a percentage

  • predicted (array-like) – List of predicted values

  • observed (array-like) – List of observed values


bias – symmetric signed bias, as a percentage

Return type:



Reference: Morley, S.K., Brito, T.V., and Welling, D.T. (2018), Measures of Model Performance Based on the Log Accuracy Ratio, Space Weather, 16(1), pp. 69-88, doi: 10.1002/2017SW001669.
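Following Morley et al. (2018), the symmetric signed percentage bias can be written in terms of M, the median of the natural-log accuracy ratio: SSPB = 100 * sgn(M) * (exp(|M|) - 1). The sketch below (hypothetical name `symmetric_signed_bias`) drops NaN and non-positive values, which is an assumption, and is not the package's implementation.

```python
import numpy as np


def symmetric_signed_bias(predicted, observed):
    """SSPB = 100 * sgn(M) * (exp(|M|) - 1), with M = median(ln(pred/obs))."""
    pred = np.asarray(predicted, dtype=float)
    obs = np.asarray(observed, dtype=float)
    # Keep finite, positive pairs only (assumption)
    good = np.isfinite(pred) & np.isfinite(obs) & (pred > 0) & (obs > 0)
    m = np.median(np.log(pred[good] / obs[good]))  # median log accuracy, base e
    return 100.0 * np.sign(m) * (np.exp(np.abs(m)) - 1.0)


print(symmetric_signed_bias([2.0, 2.0, 2.0], [1.0, 1.0, 1.0]))  # about +100
print(symmetric_signed_bias([1.0, 1.0, 1.0], [2.0, 2.0, 2.0]))  # about -100
```

Over-predicting and under-predicting by a factor of two give biases of equal magnitude and opposite sign, whereas a mean percentage error would report +100% and -50% for the same two cases.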