Contingency Tables

Function/Class Documentation

Module containing contingency table classes and related metrics

Contingency tables are widely used in analysis of categorical data. The simplest, and most common, is binary event analysis, where two categories are given (event/non-event). This can be generalized to N categories that constitute different types of event.

ContingencyNxN and Contingency2x2 classes are currently provided. Each class can be constructed with the numbers of true positives, false positives, false negatives, and true negatives. The Contingency2x2 class can also be constructed using the fromBoolean class method by providing arrays of True/False.

Author: Steve Morley
Institution: Los Alamos National Laboratory

class verify.categorical.Contingency2x2(input_array, attrs=None, dtype=None)

Class to work with 2x2 contingency tables for forecast verification

The table is defined following the standard presentation in works such as Wilks [2006], where the columns are observations and the rows are predictions. For a binary forecast, this gives a table

                 Observed
                 Y                 N
Predicted   Y    True Positive     False Positive
Predicted   N    False Negative    True Negative

Note that in many machine learning applications this table is called a “confusion matrix” and the columns and rows are often transposed.

Wilks, D.S. (2006), Statistical Methods in the Atmospheric Sciences, 2nd Ed. Academic Press, Elsevier, Burlington, MA.
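To make the note about transposed confusion matrices concrete, the sketch below converts a binary confusion matrix from a layout common in machine learning (rows are true classes, columns are predicted classes, negative class first) into the layout described above. The helper name to_wilks_layout is hypothetical and is not part of verify.

```python
def to_wilks_layout(confusion):
    """Convert [[TN, FP], [FN, TP]] into [[TP, FP], [FN, TN]].

    Input: rows are true classes, columns are predicted classes,
    with the negative class listed first (an assumed convention).
    Output: rows are predictions, columns are observations, "yes" first.
    """
    (tn, fp), (fn, tp) = confusion
    return [[tp, fp], [fn, tn]]

# Finley tornado counts written in the machine-learning layout
ml_layout = [[2680, 72], [23, 28]]
table = to_wilks_layout(ml_layout)
print(table)
```

The result can then be passed to a constructor that expects predictions in rows and observations in columns.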

Examples

Duplicating the Finley [1884] tornado forecasts [Wilks, 2006, pp. 267-268]

>>> import verify
>>> tt = verify.Contingency2x2([[28,72],[23,2680]])
>>> tt.sum()
2803
>>> tt.threat()
0.22764227642276422
>>> tt.heidke()
0.35532486145845693
>>> tt.peirce()
0.52285681714546284

FAR(ci=None)

False Alarm Ratio, the fraction of “yes” forecasts that were incorrect

Parameters: ci (NoneType, str, boolean) – confidence interval (options None, ‘bootstrap’, True (for ‘Wald’), or ‘AC’) (default=None)
Returns: far – The false alarm ratio of the contingency table data. This is also added to the attrs attribute of the table object.
Return type: float
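The ‘Wald’ and ‘AC’ options presumably refer to the Wald and Agresti-Coull intervals for a binomial proportion. The sketch below shows the textbook forms of both, applied to the false alarm ratio of the Finley table (72 false alarms out of 100 “yes” forecasts). The function names and the 95% level are assumptions made for illustration; the library's own implementation and defaults may differ.

```python
from math import sqrt
from statistics import NormalDist

def wald_interval(successes, trials, level=0.95):
    """Wald (normal approximation) interval for a proportion."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    p = successes / trials
    half = z * sqrt(p * (1.0 - p) / trials)
    return p - half, p + half

def agresti_coull_interval(successes, trials, level=0.95):
    """Agresti-Coull interval: a Wald-style interval on adjusted counts."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    n_adj = trials + z ** 2
    p_adj = (successes + z ** 2 / 2.0) / n_adj
    half = z * sqrt(p_adj * (1.0 - p_adj) / n_adj)
    return p_adj - half, p_adj + half

# Finley counts: 72 false alarms out of 100 "yes" forecasts
false_alarms, yes_forecasts = 72, 100
far = false_alarms / yes_forecasts
wald = wald_interval(false_alarms, yes_forecasts)
ac = agresti_coull_interval(false_alarms, yes_forecasts)
print(far, wald, ac)
```

Neither interval is clipped to [0, 1] in this sketch, and the Wald interval is known to behave poorly when the proportion is close to 0 or 1.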

MatthewsCC(ci=None)

Matthews Correlation Coefficient

Examples

>>> event_series = [ True,  True,  True, False]
>>> pred_series  = [ True, False,  True,  True]
>>> ct = verify.Contingency2x2.fromBoolean(pred_series, event_series)
>>> ct.MatthewsCC()
-0.333...
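The two series in this example contain 2 true positives, 1 false positive, 1 false negative, and 0 true negatives. As a cross-check, the sketch below evaluates the standard Matthews correlation coefficient formula on those counts. The helper matthews_cc is hypothetical and independent of the library.

```python
from math import sqrt

def matthews_cc(tp, fp, fn, tn):
    """Standard Matthews correlation coefficient from 2x2 counts.

    Raises ZeroDivisionError if any marginal total is zero,
    in which case the coefficient is undefined.
    """
    numerator = tp * tn - fp * fn
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator

mcc = matthews_cc(tp=2, fp=1, fn=1, tn=0)
print(mcc)
```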

PC(ci=None)

Returns the Proportion Correct (PC) for the 2x2 contingency table

Parameters: ci (NoneType, str, boolean) – confidence interval (options None, ‘bootstrap’, True (for ‘Wald’), or ‘AC’) (default=None)
Returns: pc – The proportion correct; the ‘PC’ attribute is also updated
Return type: float

POD(ci=None)

Calculate the Probability of Detection, a.k.a. hit rate (ratio of correct “yes” forecasts to the number of observed events)

Parameters: ci (NoneType, str, boolean) – confidence interval (options None, ‘bootstrap’, True (for ‘Wald’), or ‘AC’) (default=None)
Returns: hitrate – The hit rate of the contingency table data. This is also added to the attrs attribute of the table object.
Return type: float

POFD(ci=None)

Calculate the Probability of False Detection (POFD), a.k.a. False Alarm Rate

Parameters: ci (NoneType, str, boolean) – confidence interval (options None, ‘bootstrap’, True (for ‘Wald’), or ‘AC’) (default=None)
Returns: pofd – The probability of false detection of the contingency table data. This is also added to the attrs attribute of the table object.
Return type: float
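The false alarm rate (POFD) and the false alarm ratio (FAR) are easily confused because they share a numerator but use different denominators. The sketch below computes POD, POFD, and FAR directly from the Finley counts using the standard definitions. It is an illustration in plain Python and does not call the library.

```python
# Finley tornado counts in the table layout used by Contingency2x2
tp, fp = 28, 72      # predicted yes: observed yes, observed no
fn, tn = 23, 2680    # predicted no:  observed yes, observed no

pod = tp / (tp + fn)    # hits / observed events
pofd = fp / (fp + tn)   # false alarms / observed non-events (false alarm RATE)
far = fp / (tp + fp)    # false alarms / "yes" forecasts (false alarm RATIO)
print(pod, pofd, far)
```

For a rare event such as this one, POFD is small because true negatives dominate its denominator, while FAR remains large.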

bias(ci=None)

The frequency bias of the forecast, calculated as the ratio of the number of “yes” forecasts to the number of observed “yes” events

Returns: bias – The bias of the contingency table data. This is also added to the attrs attribute of the table object.
Return type: float

Notes

An unbiased forecast will have bias=1, showing that the number of forecasts is the same as the number of events. Bias>1 means that more events were forecast than observed (overforecast); bias<1 means that fewer events were forecast than observed (underforecast).
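As a small worked example of the definition above, the sketch below computes the frequency bias for the Finley counts and for a made-up table in which forecasts and events are equally frequent. The helper frequency_bias is hypothetical.

```python
def frequency_bias(tp, fp, fn):
    """Number of "yes" forecasts divided by number of observed events."""
    return (tp + fp) / (tp + fn)

# Finley: 100 "yes" forecasts against 51 observed events (overforecast)
finley_bias = frequency_bias(tp=28, fp=72, fn=23)
# Made-up counts: 15 "yes" forecasts against 15 observed events
unbiased = frequency_bias(tp=10, fp=5, fn=5)
print(finley_bias, unbiased)
```

Note that a bias of 1 says nothing about whether the individual forecasts were correct, only that the totals match.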

equitableThreat(ci=None)

Calculate the Equitable Threat Score (a.k.a. Gilbert Skill Score)

Parameters: ci (NoneType, str, boolean) – confidence interval (options None, ‘bootstrap’, True (for ‘Wald’), or ‘AC’) (default=None)
Returns: thr – The equitable threat score of the contingency table data. This is also added to the attrs attribute of the table object.
Return type: float

Notes

This is the ratio of verification (threat score) adjusted for the number of correct “yes” forecasts expected by chance.
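The usual textbook form of the Equitable Threat Score subtracts the number of hits expected from random forecasts with the same marginal totals. The sketch below applies that form to the Finley counts. The helper equitable_threat is hypothetical, and the library may compute the score through an equivalent expression.

```python
def equitable_threat(tp, fp, fn, tn):
    """Threat score with chance hits removed (Gilbert Skill Score)."""
    n = tp + fp + fn + tn
    # Hits expected by chance given the forecast and observed "yes" totals
    hits_random = (tp + fp) * (tp + fn) / n
    return (tp - hits_random) / (tp + fp + fn - hits_random)

ets = equitable_threat(tp=28, fp=72, fn=23, tn=2680)
print(ets)
```

Unlike the plain threat score, this score depends on the number of true negatives through the chance term.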

classmethod fromBoolean(predicted, observed)

Construct a 2x2 contingency table from two boolean input arrays
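Conceptually, building the table from two boolean series amounts to counting the four combinations of predicted and observed values. A minimal sketch of that counting is given below, using plain lists. The function counts_from_boolean is hypothetical and only mirrors the table layout used by this class, not the library's actual code.

```python
def counts_from_boolean(predicted, observed):
    """Count outcomes into [[TP, FP], [FN, TN]] (rows are predictions)."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    pairs = list(zip(predicted, observed))
    tp = sum(1 for p, o in pairs if p and o)
    fp = sum(1 for p, o in pairs if p and not o)
    fn = sum(1 for p, o in pairs if not p and o)
    tn = sum(1 for p, o in pairs if not p and not o)
    return [[tp, fp], [fn, tn]]

observed = [True, True, True, False]
predicted = [True, False, True, True]
table = counts_from_boolean(predicted, observed)
print(table)
```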

heidke(ci=None)

Calculate the Heidke Skill Score for the 2x2 contingency table

Parameters: ci (NoneType, str, boolean) – confidence interval (options None, ‘bootstrap’, True (for ‘Wald’), or ‘AC’) (default=None)
Returns: hss – The Heidke skill score of the contingency table data. This is also added to the attrs attribute of the table object.
Return type: float

Notes

This is a skill score based on the proportion of correct forecasts, measured relative to the proportion expected to be correct by chance.
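The note above can be written out as a generic skill score, (PC - PC_chance) / (1 - PC_chance), where PC_chance is the proportion correct expected from random forecasts with the same marginal totals. The sketch below evaluates this for the Finley counts. The helper heidke is hypothetical and written in plain Python.

```python
def heidke(tp, fp, fn, tn):
    """Heidke Skill Score as proportion correct relative to chance."""
    n = tp + fp + fn + tn
    pc = (tp + tn) / n
    # Chance agreement: product of forecast and observed marginal
    # frequencies, summed over the "yes" and "no" categories
    pc_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return (pc - pc_chance) / (1.0 - pc_chance)

hss = heidke(tp=28, fp=72, fn=23, tn=2680)
print(hss)
```

The value should agree with the heidke() result shown in the class example for the same table.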

majorityClassFraction()

Proportion Correct (a.k.a. “accuracy” in machine learning) for majority classifier
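A majority classifier always predicts the more frequent class, so its proportion correct is a useful baseline for PC. The sketch below compares the two for the Finley counts, assuming the majority class is taken from the observed categories. This is an illustration, not the library's implementation.

```python
# Finley tornado counts
tp, fp, fn, tn = 28, 72, 23, 2680
n = tp + fp + fn + tn

pc = (tp + tn) / n               # proportion correct of the actual forecasts

observed_yes = tp + fn           # number of observed events
observed_no = fp + tn            # number of observed non-events
majority = max(observed_yes, observed_no) / n   # always predict majority class
print(pc, majority)
```

Here the baseline exceeds the forecast PC, which is the classic argument against judging rare-event forecasts by proportion correct alone.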

oddsRatio()

Calculate the odds ratio for the 2x2 contingency table

Returns: odds – The odds ratio for the contingency table data. This is also added to the attrs attribute of the table object.
Return type: float
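The odds ratio is conventionally the product of the correct counts divided by the product of the incorrect counts. The sketch below evaluates it for the Finley counts. The helper odds_ratio is hypothetical.

```python
def odds_ratio(tp, fp, fn, tn):
    """Odds ratio (tp * tn) / (fp * fn); undefined if fp or fn is zero."""
    return (tp * tn) / (fp * fn)

theta = odds_ratio(tp=28, fp=72, fn=23, tn=2680)
print(theta)
```

A value of 1 indicates no association between forecasts and observations, and larger values indicate stronger association.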

peirce(ci=None)

Calculate the Peirce Skill Score for the 2x2 contingency table

Parameters: ci (NoneType, str, boolean) – confidence interval (options None, ‘bootstrap’, True (for ‘Wald’), or ‘AC’) (default=None)
Returns: pss – The Peirce skill score of the contingency table data. This is also added to the attrs attribute of the table object.
Return type: float
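For a 2x2 table the Peirce Skill Score equals the hit rate minus the false alarm rate (POD - POFD). The sketch below checks this identity against the closed-form expression for the Finley counts. Both helpers are hypothetical and written in plain Python.

```python
def peirce_from_rates(tp, fp, fn, tn):
    """Peirce Skill Score as POD minus POFD."""
    pod = tp / (tp + fn)
    pofd = fp / (fp + tn)
    return pod - pofd

def peirce_closed_form(tp, fp, fn, tn):
    """Equivalent closed form in terms of the four counts."""
    return (tp * tn - fp * fn) / ((tp + fn) * (fp + tn))

pss_rates = peirce_from_rates(28, 72, 23, 2680)
pss_closed = peirce_closed_form(28, 72, 23, 2680)
print(pss_rates, pss_closed)
```

Both values should agree with the peirce() result shown in the class example for the same table.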

summary(verbose=False, ci=None)

Summary table

Parameters:

verbose (boolean) – Print output to stdout (default=False)
ci (NoneType, str, boolean) – confidence interval (options None, ‘bootstrap’, True (for ‘Wald’), or ‘AC’) (default=None)

threat(ci=None)

Calculate the Threat Score (a.k.a. critical success index)

Parameters: ci (NoneType, str, boolean) – confidence interval (options None, ‘bootstrap’, True (for ‘Wald’), or ‘AC’) (default=None)
Returns: thr – The threat score of the contingency table data. This is also added to the attrs attribute of the table object.
Return type: float

Notes

This is also known as the ratio of verification, i.e., the proportion of correct forecasts after removing correct “no” forecasts (‘true negatives’) from consideration.
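Because true negatives are excluded, the threat score is unchanged when the number of correct “no” forecasts changes, whereas the proportion correct is not. The sketch below illustrates this with the Finley counts and a made-up smaller true-negative count. The helper names are hypothetical.

```python
def threat(tp, fp, fn):
    """Threat score: hits over all cases that were forecast or observed."""
    return tp / (tp + fp + fn)

def proportion_correct(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)

ts = threat(28, 72, 23)                              # no dependence on tn
pc_many = proportion_correct(28, 72, 23, 2680)       # Finley true negatives
pc_few = proportion_correct(28, 72, 23, 100)         # made-up smaller count
print(ts, pc_many, pc_few)
```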

yuleQ()

Calculate Yule’s Q (odds ratio skill score) for the 2x2 contingency table

Returns: yule – Yule’s Q for the contingency table data. This is also added to the attrs attribute of the table object.
Return type: float
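Yule’s Q rescales the odds ratio θ to the range [-1, 1] through Q = (θ - 1) / (θ + 1). The sketch below computes Q for the Finley counts directly from the four counts and confirms that it matches the rescaled odds ratio. The helper yule_q is hypothetical.

```python
def yule_q(tp, fp, fn, tn):
    """Yule's Q (odds ratio skill score) from 2x2 counts."""
    return (tp * tn - fp * fn) / (tp * tn + fp * fn)

q = yule_q(tp=28, fp=72, fn=23, tn=2680)

# Same quantity via the odds ratio theta = (tp * tn) / (fp * fn)
theta = (28 * 2680) / (72 * 23)
q_from_theta = (theta - 1.0) / (theta + 1.0)
print(q, q_from_theta)
```

Unlike the raw odds ratio, this form stays finite when fp or fn is zero, provided the denominator is nonzero.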

class verify.categorical.ContingencyNxN(input_array, attrs=None, dtype=None)

Class to work with NxN contingency tables for forecast verification

Examples

>>> import verify
>>> tt = verify.ContingencyNxN([[28,72],[23,2680]])
>>> tt.sum()
2803
>>> tt.threat()
0.22764227642276422
>>> tt.heidke()
0.35532486145845693
>>> tt.peirce()
0.52285681714546284

PC()

Returns the Proportion Correct (PC) for the NxN contingency table

get2x2(category)

Get 2x2 sub-table from multicategory contingency table

Examples

Goldsmith’s non-probabilistic forecasts for freezing rain (cat 0), snow (cat 1), and rain (cat 2). [see Wilks, 1995, p273]

>>> import verify
>>> tt = verify.ContingencyNxN([[50,91,71],[47,2364,170],[54,205,3288]])
>>> tt2 = tt.get2x2(0)
>>> print(tt2)
[[  50  162]
 [ 101 6027]]
>>> tt2.bias()
1.4039735099337749
>>> tt2.summary()
>>> tt2.attrs
{'Bias': 1.4039735099337749,
 'FAR': 0.76415094339622647,
 'HeidkeScore': 0.25474971797571822,
 'POD': 0.33112582781456956,
 'POFD': 0.026175472612699952,
 'PeirceScore': 0.30495035520187008,
 'ThreatScore': 0.15974440894568689}
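The sub-table in this example can be reproduced by treating the chosen category as “yes” and pooling all other categories as “no”. The sketch below does this with plain nested lists, assuming rows are predictions and columns are observations as in the 2x2 class. The function collapse_to_2x2 is hypothetical and is not the library's code.

```python
def collapse_to_2x2(table, category):
    """Collapse an NxN table to [[TP, FP], [FN, TN]] for one category."""
    n = sum(sum(row) for row in table)
    tp = table[category][category]
    fp = sum(table[category]) - tp                  # rest of the forecast row
    fn = sum(row[category] for row in table) - tp   # rest of the observed column
    tn = n - tp - fp - fn                           # everything else
    return [[tp, fp], [fn, tn]]

goldsmith = [[50, 91, 71], [47, 2364, 170], [54, 205, 3288]]
sub = collapse_to_2x2(goldsmith, 0)
print(sub)
```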

heidke()

Calculate the generalized Heidke Skill Score for the NxN contingency table

Returns: hss – The Heidke skill score of the contingency table data. This is also added to the attrs attribute of the table object.
Return type: float

Examples

Goldsmith’s non-probabilistic forecasts for freezing rain (cat 0), snow (cat 1), and rain (cat 2). [see Wilks, 1995, p273-274]

>>> import verify
>>> tt = verify.ContingencyNxN([[50,91,71],[47,2364,170],[54,205,3288]])
>>> tt.heidke()
0.80535269033647217
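The generalized Heidke score compares the proportion on the diagonal with the chance agreement implied by the forecast and observed marginal totals. The sketch below applies the standard multicategory formula to the Goldsmith table using plain lists. The function heidke_nxn is hypothetical, and the library may organize the calculation differently.

```python
def heidke_nxn(table):
    """Generalized Heidke Skill Score for a square contingency table."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]                        # forecast totals
    col_tot = [sum(row[j] for row in table) for j in range(k)]   # observed totals
    pc = sum(table[i][i] for i in range(k)) / n
    chance = sum(r * c for r, c in zip(row_tot, col_tot)) / n ** 2
    return (pc - chance) / (1.0 - chance)

goldsmith = [[50, 91, 71], [47, 2364, 170], [54, 205, 3288]]
hss = heidke_nxn(goldsmith)
print(hss)
```

The value should agree with the heidke() result shown above.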

peirce()

Calculate the generalized Peirce Skill Score for the NxN contingency table

Returns: pss – The Peirce skill score of the contingency table data. This is also added to the attrs attribute of the table object.
Return type: float

Examples

Goldsmith’s non-probabilistic forecasts for freezing rain (cat 0), snow (cat 1), and rain (cat 2). [see Wilks, 1995, p273-274]

>>> import verify
>>> tt = verify.ContingencyNxN([[50,91,71],[47,2364,170],[54,205,3288]])
>>> tt.peirce()
0.81071330546125309
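The generalized Peirce score uses the same numerator as the Heidke score, but its denominator depends only on the observed marginal frequencies. The sketch below applies the standard multicategory formula to the Goldsmith table, assuming columns hold the observations. The function peirce_nxn is hypothetical and written in plain Python.

```python
def peirce_nxn(table):
    """Generalized Peirce Skill Score for a square contingency table."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]                        # forecast totals
    col_tot = [sum(row[j] for row in table) for j in range(k)]   # observed totals
    pc = sum(table[i][i] for i in range(k)) / n
    chance = sum(r * c for r, c in zip(row_tot, col_tot)) / n ** 2
    # Reference term built from observed frequencies only
    obs_sq = sum(c * c for c in col_tot) / n ** 2
    return (pc - chance) / (1.0 - obs_sq)

goldsmith = [[50, 91, 71], [47, 2364, 170], [54, 205, 3288]]
pss = peirce_nxn(goldsmith)
print(pss)
```

The value should agree with the peirce() result shown above.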
