Contingency Tables

Function/Class Documentation

Module containing contingency table classes and related metrics

Contingency tables are widely used in the analysis of categorical data. The simplest, and most common, is binary event analysis, where two categories are given (event/non-event). This can be generalized to N categories that constitute different types of event.

ContingencyNxN and Contingency2x2 classes are currently provided. Contingency2x2 is constructed from a 2x2 array holding the numbers of true positives, false positives, false negatives, and true negatives; ContingencyNxN is constructed from an NxN array of counts. The Contingency2x2 class can also be constructed with the fromBoolean class method by providing arrays of predicted and observed True/False values.

Author: Steve Morley
Institution: Los Alamos National Laboratory

Copyright (c) 2017, Triad National Security, LLC. All rights reserved.

class verify.categorical.Contingency2x2(input_array, attrs=None, dtype=None)

Class to work with 2x2 contingency tables for forecast verification

The table is defined following the standard presentation in works such as Wilks [2006], where the columns are observations and the rows are predictions. For a binary forecast, this gives the table:

                          Observed
                          Y                 N
    Predicted    Y        True Positive     False Positive
                 N        False Negative    True Negative

Note that in many machine learning applications this table is called a “confusion matrix” and the columns and rows are often transposed.
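Because the two conventions differ, a conversion step may be needed before building a table. The sketch below assumes a confusion matrix with observed classes as rows and predicted classes as columns, ordered [no, yes], i.e. [[TN, FP], [FN, TP]] (the layout scikit-learn's `confusion_matrix` returns for labels `[False, True]`). The helper `confusion_to_wilks` is hypothetical and is not part of this module.

```python
import numpy as np


def confusion_to_wilks(cm):
    """Reorder a 2x2 confusion matrix into the Wilks layout (hypothetical helper).

    Assumes ``cm`` is [[TN, FP], [FN, TP]] (rows = observed, cols = predicted,
    ordered [no, yes]) and returns [[TP, FP], [FN, TN]] (rows = predicted,
    cols = observed, ordered [yes, no]).
    """
    cm = np.asarray(cm)
    # Flip both axes so "yes" comes first, then transpose so rows are predictions
    return cm[::-1, ::-1].T


# Finley tornado counts written in the confusion-matrix layout
print(confusion_to_wilks([[2680, 72], [23, 28]]))
# [[  28   72]
#  [  23 2680]]
```

The reordered array has the same layout as the input used in the Finley example.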

Wilks, D.S. (2006), Statistical Methods in the Atmospheric Sciences, 2nd Ed. Academic Press, Elsevier, Burlington, MA.

Examples

Duplicating the Finley [1884] tornado forecasts [Wilks, 2006, pp. 267-268]

>>> import verify
>>> tt = verify.Contingency2x2([[28,72],[23,2680]])
>>> tt.sum()
2803
>>> tt.threat()
0.22764227642276422
>>> tt.heidke()
0.35532486145845693
>>> tt.peirce()
0.52285681714546284

FAR(ci=None)

False Alarm Ratio, the fraction of incorrect “yes” forecasts

Parameters:

ci (NoneType, str, boolean) – confidence interval (options None, ‘bootstrap’, True (for ‘Wald’), or ‘AC’) (default=None)

Returns:

far – The false alarm ratio of the contingency table data. This is also added to the attrs attribute of the table object

Return type:

float
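As a minimal sketch of the definition (assuming the Wilks layout above), the false alarm ratio divides the false positives by all "yes" forecasts. The function name `false_alarm_ratio` is illustrative only and does not handle the `ci` option.

```python
def false_alarm_ratio(tp, fp):
    """FAR = FP / (TP + FP): fraction of "yes" forecasts that were wrong (illustrative)."""
    return fp / (tp + fp)


# Finley tornado forecasts: 72 of the 100 "yes" forecasts were false alarms
print(false_alarm_ratio(28, 72))  # 0.72
```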

MatthewsCC(ci=None)

Matthews Correlation Coefficient
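For reference, a minimal sketch of the standard Matthews correlation coefficient written in terms of the four cell counts; `matthews_cc` is an illustrative name, not the method's implementation.

```python
import math


def matthews_cc(tp, fp, fn, tn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)) (illustrative)."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator


# Counts from the boolean example below: TP=2, FP=1, FN=1, TN=0
print(matthews_cc(2, 1, 1, 0))  # -0.3333333333333333
```

The coefficient is undefined (division by zero) when any row or column of the table sums to zero.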

Examples

>>> import verify
>>> event_series = [ True,  True,  True, False]
>>> pred_series  = [ True, False,  True,  True]
>>> ct = verify.Contingency2x2.fromBoolean(pred_series, event_series)
>>> ct.MatthewsCC()  # doctest: +ELLIPSIS
-0.333...

PC(ci=None)

Returns the Proportion Correct (PC) for the 2x2 contingency table

Parameters:

ci (NoneType, str, boolean) – confidence interval (options None, ‘bootstrap’, True (for ‘Wald’), or ‘AC’) (default=None)

Returns:

pc – The proportion correct of the contingency table data. This also updates the ‘PC’ entry of the attrs attribute of the table object

Return type:

float
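A minimal sketch of the proportion correct for a 2x2 table, assuming the Wilks layout; `proportion_correct` is an illustrative name and the `ci` option is not reproduced.

```python
def proportion_correct(tp, fp, fn, tn):
    """PC = (TP + TN) / N: fraction of all forecasts, "yes" or "no", that were correct."""
    return (tp + tn) / (tp + fp + fn + tn)


print(proportion_correct(28, 72, 23, 2680))  # about 0.966 for the Finley data
```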

POD(ci=None)

Calculate the Probability of Detection, a.k.a. hit rate (the ratio of correct “yes” forecasts to the number of observed events)

Parameters:

ci (NoneType, str, boolean) – confidence interval (options None, ‘bootstrap’, True (for ‘Wald’), or ‘AC’) (default=None)

Returns:

hitrate – The hit rate of the contingency table data. This is also added to the attrs attribute of the table object

Return type:

float
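POD is a binomial proportion, so the `ci` options can be illustrated with textbook intervals for a proportion. The sketch below is not the verify implementation: it assumes that True selects a Wald interval and that ‘AC’ refers to the Agresti-Coull interval, and it assumes a 95% level (z = 1.96); the function `pod_with_ci` and its `method` strings are hypothetical.

```python
import math


def pod_with_ci(tp, fn, method="wald", z=1.96):
    """POD = TP / (TP + FN) with an approximate binomial confidence interval.

    Illustrative sketch only: ``method`` is "wald" or "agresti-coull", and
    z=1.96 is an assumed nominal 95% level.
    """
    n = tp + fn
    pod = tp / n
    if method == "wald":
        center, n_eff = pod, n
    elif method == "agresti-coull":
        # Add z^2/2 pseudo-successes and z^2/2 pseudo-failures before estimating
        n_eff = n + z**2
        center = (tp + z**2 / 2) / n_eff
    else:
        raise ValueError("method must be 'wald' or 'agresti-coull'")
    half_width = z * math.sqrt(center * (1 - center) / n_eff)
    return pod, (center - half_width, center + half_width)


# Finley: 28 of the 51 observed tornadoes were forecast
print(pod_with_ci(28, 23))
print(pod_with_ci(28, 23, method="agresti-coull"))
```

The Agresti-Coull interval is centred slightly towards 0.5 and behaves better than the Wald interval when counts are small or the proportion is near 0 or 1.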

POFD(ci=None)

Calculate the Probability of False Detection (POFD), a.k.a. False Alarm Rate

Parameters:

ci (NoneType, str, boolean) – confidence interval (options None, ‘bootstrap’, True (for ‘Wald’), or ‘AC’) (default=None)

Returns:

pofd – The probability of false detection of the contingency table data. This is also added to the attrs attribute of the table object

Return type:

float
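A minimal sketch of the POFD definition (illustrative name `pofd`, Wilks layout assumed). Note that it is normalized by the observed non-events, unlike the false alarm ratio, which is normalized by the "yes" forecasts.

```python
def pofd(fp, tn):
    """POFD = FP / (FP + TN): fraction of observed non-events forecast as events."""
    return fp / (fp + tn)


print(pofd(72, 2680))  # about 0.0262 for the Finley data
```

For the same Finley table the false alarm ratio is 0.72, so the two similarly named quantities can differ greatly.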

bias(ci=None)

The frequency bias of the forecast, calculated as the ratio of the number of “yes” forecasts to the number of observed events

Returns:

bias – The bias of the contingency table data. This is also added to the attrs attribute of the table object

Return type:

float

Notes

An unbiased forecast will have bias=1, showing that the number of “yes” forecasts is the same as the number of observed events. Bias>1 means that more events were forecast than observed (overforecasting); bias<1 means that fewer events were forecast than observed (underforecasting).
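A minimal sketch of the frequency bias (illustrative name `frequency_bias`, Wilks layout assumed):

```python
def frequency_bias(tp, fp, fn):
    """Bias = (TP + FP) / (TP + FN): "yes" forecasts divided by observed events."""
    return (tp + fp) / (tp + fn)


print(frequency_bias(28, 72, 23))  # about 1.96: Finley overforecast tornadoes
```

Applied to the freezing-rain sub-table in the get2x2 example below (TP=50, FP=162, FN=101), the same formula gives 212/151, about 1.404.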

equitableThreat(ci=None)

Calculate the Equitable Threat Score (a.k.a. Gilbert Skill Score)

Parameters:

ci (NoneType, str, boolean) – confidence interval (options None, ‘bootstrap’, True (for ‘Wald’), or ‘AC’) (default=None)

Returns:

thr – The equitable threat score of the contingency table data. This is also added to the attrs attribute of the table object

Return type:

float

Notes

This is the threat score adjusted for the number of hits expected by random chance, given the numbers of forecast and observed events, so that random forecasts score zero on average.
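A minimal sketch of the standard equitable threat score (illustrative name `equitable_threat`, Wilks layout assumed), where the chance-expected hits are computed from the forecast and observed marginal totals:

```python
def equitable_threat(tp, fp, fn, tn):
    """ETS = (TP - R) / (TP + FP + FN - R), R = hits expected by chance."""
    n = tp + fp + fn + tn
    hits_random = (tp + fp) * (tp + fn) / n
    return (tp - hits_random) / (tp + fp + fn - hits_random)


print(equitable_threat(28, 72, 23, 2680))  # about 0.216 for the Finley data
```

The result is a little below the Finley threat score (about 0.228) because roughly 1.8 of the 28 hits would be expected by chance.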

classmethod fromBoolean(predicted, observed)

Construct a 2x2 contingency table from two boolean input arrays
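The counting that such a constructor needs can be sketched with NumPy boolean masks. This is an illustration of the table layout, not the fromBoolean source; `table_from_boolean` is a hypothetical name, and it takes (predicted, observed) in the same order as fromBoolean.

```python
import numpy as np


def table_from_boolean(predicted, observed):
    """Count a 2x2 table [[TP, FP], [FN, TN]] from paired boolean sequences."""
    pred = np.asarray(predicted, dtype=bool)
    obs = np.asarray(observed, dtype=bool)
    tp = np.sum(pred & obs)     # forecast yes, observed yes
    fp = np.sum(pred & ~obs)    # forecast yes, observed no
    fn = np.sum(~pred & obs)    # forecast no, observed yes
    tn = np.sum(~pred & ~obs)   # forecast no, observed no
    return np.array([[tp, fp], [fn, tn]])


print(table_from_boolean([True, False, True, True], [True, True, True, False]))
# [[2 1]
#  [1 0]]
```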

heidke(ci=None)

Calculate the Heidke Skill Score for the 2x2 contingency table

Parameters:

ci (NoneType, str, boolean) – confidence interval (options None, ‘bootstrap’, True (for ‘Wald’), or ‘AC’) (default=None)

Returns:

hss – The Heidke skill score of the contingency table data. This is also added to the attrs attribute of the table object

Return type:

float

Notes

This is a skill score based on the proportion of correct forecasts, measured relative to the proportion expected to be correct by chance.
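For a 2x2 table, (PC - PC_chance) / (1 - PC_chance) can be written directly in terms of the cell counts. The sketch below uses that closed form; `heidke_2x2` is an illustrative name.

```python
def heidke_2x2(tp, fp, fn, tn):
    """HSS = 2(TP*TN - FP*FN) / [(TP+FN)(FN+TN) + (TP+FP)(FP+TN)]."""
    numerator = 2 * (tp * tn - fp * fn)
    denominator = (tp + fn) * (fn + tn) + (tp + fp) * (fp + tn)
    return numerator / denominator


print(heidke_2x2(28, 72, 23, 2680))  # about 0.3553, as in the Finley example
```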

majorityClassFraction()

Proportion Correct (a.k.a. “accuracy” in machine learning) for a majority classifier, i.e., one that always predicts the more frequently observed category
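Assuming the method follows the usual definition of a majority-class baseline, the value can be sketched as follows; `majority_class_fraction` here is an illustrative stand-alone function.

```python
def majority_class_fraction(tp, fp, fn, tn):
    """PC of a classifier that always predicts the more common observed class."""
    n = tp + fp + fn + tn
    observed_yes = tp + fn
    observed_no = fp + tn
    return max(observed_yes, observed_no) / n


# Always forecasting "no tornado" for the Finley data
print(majority_class_fraction(28, 72, 23, 2680))  # about 0.982
```

Finley's own proportion correct (about 0.966) is lower than this baseline, which is why PC alone is a misleading measure for rare events.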

oddsRatio()

Calculate the odds ratio for the 2x2 contingency table

Returns:

odds – The odds ratio for the contingency table data. This is also added to the attrs attribute of the table object

Return type:

float
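A minimal sketch of the odds ratio, assuming the Wilks layout; `odds_ratio` is an illustrative name.

```python
def odds_ratio(tp, fp, fn, tn):
    """Odds ratio theta = (TP * TN) / (FP * FN)."""
    return (tp * tn) / (fp * fn)


print(odds_ratio(28, 72, 23, 2680))  # about 45.3 for the Finley data
```

The ratio is undefined when either off-diagonal cell (FP or FN) is zero.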

peirce(ci=None)

Calculate the Peirce Skill Score for the 2x2 contingency table

Parameters:

ci (NoneType, str, boolean) – confidence interval (options None, ‘bootstrap’, True (for ‘Wald’), or ‘AC’) (default=None)

Returns:

pss – The Peirce skill score of the contingency table data. This is also added to the attrs attribute of the table object

Return type:

float
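For a 2x2 table the Peirce skill score reduces to the hit rate minus the false alarm rate. A minimal sketch (illustrative name `peirce_2x2`):

```python
def peirce_2x2(tp, fp, fn, tn):
    """Peirce skill score = POD - POFD."""
    pod = tp / (tp + fn)
    pofd = fp / (fp + tn)
    return pod - pofd


print(peirce_2x2(28, 72, 23, 2680))  # about 0.5229, as in the Finley example
```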

summary(verbose=False, ci=None)

Summary table of verification metrics for the contingency table

Parameters:
  • verbose (boolean) – Print output to stdout (default=False)

  • ci (NoneType, str, boolean) – confidence interval (options None, ‘bootstrap’, True (for ‘Wald’), or ‘AC’) (default=None)

threat(ci=None)

Calculate the Threat Score (a.k.a. critical success index)

Parameters:

ci (NoneType, str, boolean) – confidence interval (options None, ‘bootstrap’, True (for ‘Wald’), or ‘AC’) (default=None)

Returns:

thr – The threat score of the contingency table data. This is also added to the attrs attribute of the table object

Return type:

float

Notes

This is the proportion of correct forecasts after removing correct “no” forecasts (true negatives) from consideration.
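A minimal sketch of the threat score (illustrative name `threat_score`); true negatives do not enter the calculation at all.

```python
def threat_score(tp, fp, fn):
    """Threat score / CSI = TP / (TP + FP + FN)."""
    return tp / (tp + fp + fn)


print(threat_score(28, 72, 23))  # about 0.2276, as in the Finley example
```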

yuleQ()

Calculate Yule’s Q (odds ratio skill score) for the 2x2 contingency table

Returns:

yule – Yule’s Q for the contingency table data. This is also added to the attrs attribute of the table object

Return type:

float
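Yule's Q rescales the odds ratio theta to the range [-1, 1] as (theta - 1)/(theta + 1). The sketch below (illustrative name `yule_q`) uses the equivalent cross-product form, which avoids dividing by zero when FP or FN is zero.

```python
def yule_q(tp, fp, fn, tn):
    """Yule's Q = (TP*TN - FP*FN) / (TP*TN + FP*FN), equal to (theta - 1)/(theta + 1)."""
    concordant = tp * tn
    discordant = fp * fn
    return (concordant - discordant) / (concordant + discordant)


print(yule_q(28, 72, 23, 2680))  # about 0.957 for the Finley data
```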

class verify.categorical.ContingencyNxN(input_array, attrs=None, dtype=None)

Class to work with NxN contingency tables for forecast verification

Examples

>>> import verify
>>> tt = verify.ContingencyNxN([[28,72],[23,2680]])
>>> tt.sum()
2803
>>> tt.get2x2(0).threat()
0.22764227642276422
>>> tt.heidke()
0.35532486145845693
>>> tt.peirce()
0.52285681714546284

PC()

Returns the Proportion Correct (PC) for the NxN contingency table
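For an NxN table the correct forecasts lie on the diagonal, so PC is the trace divided by the total count. A minimal NumPy sketch (illustrative name `proportion_correct_nxn`):

```python
import numpy as np


def proportion_correct_nxn(table):
    """PC = sum of the diagonal (correct forecasts) / total count."""
    table = np.asarray(table, dtype=float)
    return np.trace(table) / table.sum()


# Goldsmith's freezing rain / snow / rain forecasts (rows = forecast, cols = observed)
goldsmith = [[50, 91, 71], [47, 2364, 170], [54, 205, 3288]]
print(proportion_correct_nxn(goldsmith))  # about 0.899
```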

get2x2(category)

Get 2x2 sub-table from multicategory contingency table
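The collapse to a 2x2 table treats the chosen category as "yes" and all other categories as "no". The sketch below reproduces the sub-table shown in the example; `collapse_to_2x2` is a hypothetical name and is not the get2x2 source.

```python
import numpy as np


def collapse_to_2x2(table, category):
    """Collapse an NxN table (rows = forecast, cols = observed) to [[TP, FP], [FN, TN]]."""
    table = np.asarray(table)
    tp = table[category, category]
    fp = table[category, :].sum() - tp   # forecast this category, observed another
    fn = table[:, category].sum() - tp   # observed this category, forecast another
    tn = table.sum() - tp - fp - fn      # everything not involving this category
    return np.array([[tp, fp], [fn, tn]])


goldsmith = [[50, 91, 71], [47, 2364, 170], [54, 205, 3288]]
print(collapse_to_2x2(goldsmith, 0))
# [[  50  162]
#  [ 101 6027]]
```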

Examples

Goldsmith’s non-probabilistic forecasts for freezing rain (cat 0), snow (cat 1), and rain (cat 2). [see Wilks, 1995, p. 273]

>>> import verify
>>> tt = verify.ContingencyNxN([[50,91,71],[47,2364,170],[54,205,3288]])
>>> tt2 = tt.get2x2(0)
>>> print(tt2)
[[  50  162]
 [ 101 6027]]
>>> tt2.bias()
1.4039735099337749
>>> tt2.summary()
>>> tt2.attrs
{'Bias': 1.4039735099337749,
 'FAR': 0.76415094339622647,
 'HeidkeScore': 0.25474971797571822,
 'POD': 0.33112582781456956,
 'POFD': 0.026175472612699952,
 'PeirceScore': 0.30495035520187008,
 'ThreatScore': 0.15974440894568689}

heidke()

Calculate the generalized Heidke Skill Score for the NxN contingency table

Returns:

hss – The Heidke skill score of the contingency table data. This is also added to the attrs attribute of the table object

Return type:

float
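The generalized Heidke score compares the proportion correct with the proportion expected correct by chance, E = sum_i p(y_i) p(o_i), where p(y_i) and p(o_i) are the forecast (row) and observed (column) marginal frequencies. A minimal NumPy sketch (illustrative name `heidke_nxn`):

```python
import numpy as np


def heidke_nxn(table):
    """Generalized HSS = (PC - E) / (1 - E), rows = forecasts, cols = observations."""
    p = np.asarray(table, dtype=float)
    p /= p.sum()                                       # joint relative frequencies
    pc = np.trace(p)                                   # proportion correct
    expected = np.sum(p.sum(axis=1) * p.sum(axis=0))   # chance agreement E
    return (pc - expected) / (1 - expected)


goldsmith = [[50, 91, 71], [47, 2364, 170], [54, 205, 3288]]
print(heidke_nxn(goldsmith))  # about 0.8054, as in the example below
```

For a 2x2 input this reduces to the ordinary Heidke score (about 0.3553 for the Finley table).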

Examples

Goldsmith’s non-probabilistic forecasts for freezing rain (cat 0), snow (cat 1), and rain (cat 2). [see Wilks, 1995, pp. 273-274]

>>> import verify
>>> tt = verify.ContingencyNxN([[50,91,71],[47,2364,170],[54,205,3288]])
>>> tt.heidke()
0.80535269033647217

peirce()

Calculate the generalized Peirce Skill Score for the NxN contingency table

Returns:

pss – The Peirce skill score of the contingency table data. This is also added to the attrs attribute of the table object

Return type:

float
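The generalized Peirce score uses the same numerator as the Heidke score but normalizes by 1 - sum_i p(o_i)^2, which depends only on the observed (column) frequencies. A minimal NumPy sketch (illustrative name `peirce_nxn`):

```python
import numpy as np


def peirce_nxn(table):
    """Generalized PSS = (PC - E) / (1 - sum_i p(o_i)^2), rows = forecasts."""
    p = np.asarray(table, dtype=float)
    p /= p.sum()                        # joint relative frequencies
    pc = np.trace(p)                    # proportion correct
    p_fcst = p.sum(axis=1)              # forecast marginals p(y_i)
    p_obs = p.sum(axis=0)               # observed marginals p(o_i)
    expected = np.sum(p_fcst * p_obs)   # chance agreement E
    return (pc - expected) / (1 - np.sum(p_obs ** 2))


goldsmith = [[50, 91, 71], [47, 2364, 170], [54, 205, 3288]]
print(peirce_nxn(goldsmith))  # about 0.8107, as in the example below
```

For a 2x2 input this reduces to POD - POFD (about 0.5229 for the Finley table).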

Examples

Goldsmith’s non-probabilistic forecasts for freezing rain (cat 0), snow (cat 1), and rain (cat 2). [see Wilks, 1995, pp. 273-274]

>>> import verify
>>> tt = verify.ContingencyNxN([[50,91,71],[47,2364,170],[54,205,3288]])
>>> tt.peirce()
0.81071330546125309