Contingency Tables
Function/Class Documentation
Module containing contingency table classes and related metrics
Contingency tables are widely used in the analysis of categorical data. The simplest and most common case is binary event analysis, where there are two categories (event/non-event). This generalizes to N categories that represent different types of event.
ContingencyNxN and Contingency2x2 classes are currently provided. Each class can be constructed with the numbers of true positives, false positives, false negatives, and true negatives. The Contingency2x2 class can also be constructed using the fromBoolean class method by providing arrays of True/False.
Author: Steve Morley
Institution: Los Alamos National Laboratory
Copyright (c) 2017, Triad National Security, LLC. All rights reserved.
- class verify.categorical.Contingency2x2(input_array, attrs=None, dtype=None)
Class to work with 2x2 contingency tables for forecast verification
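The table is defined following the standard presentation in works such as Wilks [2006], where the columns are observations and the rows are predictions. For a binary forecast, this gives the table:
+-------------+----------------+----------------+
|             | Observed Y     | Observed N     |
+-------------+----------------+----------------+
| Predicted Y | True Positive  | False Positive |
+-------------+----------------+----------------+
| Predicted N | False Negative | True Negative  |
+-------------+----------------+----------------+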
Note that in many machine learning applications this table is called a “confusion matrix” and the columns and rows are often transposed.
Wilks, D.S. (2006), Statistical Methods in the Atmospheric Sciences, 2nd Ed. Academic Press, Elsevier, Burlington, MA.
Examples
Duplicating the Finley [1884] tornado forecasts [Wilks, 2006, pp. 267-268]
>>> import verify
>>> tt = verify.Contingency2x2([[28,72],[23,2680]])
>>> tt.sum()
2803
>>> tt.threat()
0.22764227642276422
>>> tt.heidke()
0.35532486145845693
>>> tt.peirce()
0.52285681714546284
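The three scores in the example above can be checked by hand from the four cell counts. The following is a minimal sketch using the textbook 2x2 formulas in plain Python; it does not call the `verify` package, and the variable names are illustrative only.

```python
# Finley tornado counts, laid out as in the table above:
# rows are predictions (yes, no), columns are observations (yes, no).
a, b = 28, 72      # true positives, false positives
c, d = 23, 2680    # false negatives, true negatives
n = a + b + c + d  # total number of forecasts

# Threat score: hits divided by all cases except correct "no" forecasts.
threat = a / (a + b + c)

# Heidke skill score: proportion correct measured against random chance.
heidke = 2 * (a * d - b * c) / ((a + c) * (c + d) + (a + b) * (b + d))

# Peirce skill score: hit rate minus probability of false detection.
peirce = (a * d - b * c) / ((a + c) * (b + d))

print(n, threat, heidke, peirce)
```

The printed values should agree with the doctest output above to within floating-point rounding.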
- FAR(ci=None)
False Alarm Ratio, the fraction of incorrect “yes” forecasts
- Parameters:
ci (NoneType, str, boolean) – confidence interval (options None, ‘bootstrap’, True (for ‘Wald’), or ‘AC’) (default=None)
- Returns:
far – The false alarm ratio of the contingency table data. This is also added to the attrs attribute of the table object.
- Return type:
float
- MatthewsCC(ci=None)
Matthews Correlation Coefficient
Examples
>>> event_series = [ True, True, True, False]
>>> pred_series = [ True, False, True, True]
>>> ct = verify.Contingency2x2.fromBoolean(pred_series, event_series)
>>> ct.MatthewsCC()
-0.333...
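To see where the value of -0.333 comes from, the cell counts can be tallied directly from the two boolean series and inserted into the standard Matthews correlation formula. This is a sketch in plain Python; `count_cells` and `matthews_cc` are hypothetical helpers, not part of the `verify` API.

```python
import math

def count_cells(predicted, observed):
    """Count (tp, fp, fn, tn) from paired boolean sequences."""
    tp = fp = fn = tn = 0
    for p, o in zip(predicted, observed):
        if p and o:
            tp += 1      # event forecast and observed
        elif p:
            fp += 1      # event forecast but not observed
        elif o:
            fn += 1      # event observed but not forecast
        else:
            tn += 1      # no event forecast, none observed
    return tp, fp, fn, tn

def matthews_cc(tp, fp, fn, tn):
    """Matthews correlation coefficient; nan if any marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return float("nan")
    return (tp * tn - fp * fn) / denom

event_series = [True, True, True, False]
pred_series = [True, False, True, True]
cells = count_cells(pred_series, event_series)
mcc = matthews_cc(*cells)
print(cells, mcc)
```

With two hits, one false alarm, one miss, and no correct negatives, the numerator is -1 and the denominator is 3, giving -1/3.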
- PC(ci=None)
Returns the Proportion Correct (PC) for the 2x2 contingency table
- Parameters:
ci (NoneType, str, boolean) – confidence interval (options None, ‘bootstrap’, True (for ‘Wald’), or ‘AC’) (default=None)
- Returns:
pc – Returns and updates ‘PC’ attribute
- Return type:
float
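The `ci` options are not described further here. As a rough illustration only, the sketch below computes a Wald interval and an Agresti-Coull interval for a proportion such as PC. It assumes that ‘AC’ stands for Agresti-Coull and that a 95% level (z = 1.96) is wanted; neither assumption is confirmed by this documentation, and the function names are hypothetical.

```python
import math

def wald_interval(successes, n, z=1.96):
    """Wald (normal approximation) interval for a binomial proportion."""
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

def agresti_coull_interval(successes, n, z=1.96):
    """Agresti-Coull interval: Wald form applied to adjusted counts."""
    n_adj = n + z ** 2
    p_adj = (successes + z ** 2 / 2) / n_adj
    half = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return p_adj - half, p_adj + half

# Proportion correct for the Finley table: (TP + TN) / total.
correct, n = 28 + 2680, 2803
pc = correct / n
wald = wald_interval(correct, n)
ac = agresti_coull_interval(correct, n)
print(pc, wald, ac)
```

The Agresti-Coull interval is centred on an adjusted proportion pulled slightly toward 0.5, which tends to behave better than the Wald interval when the proportion is close to 0 or 1.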
- POD(ci=None)
Calculate the Probability of Detection, a.k.a. hit rate (ratio of correct forecasts to number of event occurrences)
- Parameters:
ci (NoneType, str, boolean) – confidence interval (options None, ‘bootstrap’, True (for ‘Wald’), or ‘AC’) (default=None)
- Returns:
hitrate – The hit rate of the contingency table data. This is also added to the attrs attribute of the table object.
- Return type:
float
- POFD(ci=None)
Calculate the Probability of False Detection (POFD), a.k.a. False Alarm Rate
- Parameters:
ci (NoneType, str, boolean) – confidence interval (options None, ‘bootstrap’, True (for ‘Wald’), or ‘AC’) (default=None)
- Returns:
pofd – The probability of false detection of the contingency table data. This is also added to the attrs attribute of the table object.
- Return type:
float
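The false alarm *ratio* (FAR) and the false alarm *rate* (POFD) are easy to confuse because both use the false positive count but divide by different totals. The sketch below applies the standard definitions to the Finley counts in plain Python, without calling `verify`.

```python
# Finley tornado counts: rows are predictions, columns are observations.
a, b, c, d = 28, 72, 23, 2680  # TP, FP, FN, TN

# POD (hit rate): fraction of observed events that were forecast.
pod = a / (a + c)

# POFD (false alarm rate): fraction of observed non-events forecast as events.
pofd = b / (b + d)

# FAR (false alarm ratio): fraction of "yes" forecasts that were wrong.
far = b / (a + b)

# The Peirce skill score equals POD minus POFD.
peirce = pod - pofd

print(pod, pofd, far, peirce)
```

FAR is conditioned on the forecasts (row total) while POFD is conditioned on the observations (column total), which is why the two differ so strongly for rare events.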
- bias(ci=None)
The frequency bias of the forecast calculated as the ratio of yes forecasts to number of yes events
- Returns:
bias – The bias of the contingency table data. This is also added to the attrs attribute of the table object.
- Return type:
float
Notes
An unbiased forecast will have bias=1, showing that the number of forecasts is the same as the number of events. Bias>1 means that more events were forecast than observed (overforecast).
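The note above can be illustrated with two small tables. This is a sketch in plain Python; `frequency_bias` is a hypothetical helper and the second set of counts is invented purely to show the unbiased case.

```python
def frequency_bias(tp, fp, fn):
    """Ratio of "yes" forecasts (tp + fp) to "yes" observations (tp + fn)."""
    return (tp + fp) / (tp + fn)

# Finley tornado counts: 100 "yes" forecasts against 51 observed events.
finley_bias = frequency_bias(28, 72, 23)

# Invented counts: 50 "yes" forecasts against 50 observed events.
balanced_bias = frequency_bias(40, 10, 10)

print(finley_bias, balanced_bias)
```

The Finley forecasts issue nearly twice as many “yes” forecasts as there were events, so the bias is well above 1. True negatives do not enter the calculation.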
- equitableThreat(ci=None)
Calculate the Equitable Threat Score (a.k.a. Gilbert Skill Score)
- Parameters:
ci (NoneType, str, boolean) – confidence interval (options None, ‘bootstrap’, True (for ‘Wald’), or ‘AC’) (default=None)
- Returns:
thr – The equitable threat score of the contingency table data. This is also added to the attrs attribute of the table object.
- Return type:
float
Notes
This is the threat score adjusted for the number of correct “yes” forecasts expected by chance.
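The Equitable Threat Score is commonly defined as the threat score with the number of hits expected by chance subtracted from both numerator and denominator. The sketch below assumes that standard definition (it is not taken from the `verify` source) and applies it to the Finley counts.

```python
# Finley tornado counts: rows are predictions, columns are observations.
a, b, c, d = 28, 72, 23, 2680  # TP, FP, FN, TN
n = a + b + c + d

# Hits expected if forecasts and observations were independent.
hits_random = (a + b) * (a + c) / n

# Equitable threat score (Gilbert skill score).
ets = (a - hits_random) / (a + b + c - hits_random)

# Plain threat score, for comparison.
threat = a / (a + b + c)

print(hits_random, ets, threat)
```

Because some hits are credited to chance, the equitable score is lower than the plain threat score for the same table.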
- classmethod fromBoolean(predicted, observed)
Construct a 2x2 contingency table from two boolean input arrays
- heidke(ci=None)
Calculate the Heidke Skill Score for the 2x2 contingency table
- Parameters:
ci (NoneType, str, boolean) – confidence interval (options None, ‘bootstrap’, True (for ‘Wald’), or ‘AC’) (default=None)
- Returns:
hss – The Heidke skill score of the contingency table data. This is also added to the attrs attribute of the table object.
- Return type:
float
Notes
This is a skill score based on the proportion of correct forecasts relative to the proportion expected to be correct by chance.
- majorityClassFraction()
Proportion Correct (a.k.a. “accuracy” in machine learning) for majority classifier
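A majority classifier always predicts whichever class is observed most often, so its proportion correct is simply the share of the larger observed class. The sketch below assumes that interpretation and compares it with the proportion correct of the actual Finley forecasts; it does not call `verify`.

```python
# Finley tornado counts: rows are predictions, columns are observations.
a, b, c, d = 28, 72, 23, 2680  # TP, FP, FN, TN
n = a + b + c + d

observed_yes = a + c  # column total for observed events
observed_no = b + d   # column total for observed non-events

# Accuracy obtained by always forecasting the more common observed class.
majority_fraction = max(observed_yes, observed_no) / n

# Accuracy of the actual forecasts.
proportion_correct = (a + d) / n

print(majority_fraction, proportion_correct)
```

For rare events such as tornadoes, always forecasting “no” gives a higher proportion correct than the real forecasts, which is why proportion correct alone is a poor measure of skill.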
- oddsRatio()
Calculate the odds ratio for the 2x2 contingency table
- Returns:
odds – The odds ratio for the contingency table data. This is also added to the attrs attribute of the table object.
- Return type:
float
- peirce(ci=None)
Calculate the Peirce Skill Score for the 2x2 contingency table
- Parameters:
ci (NoneType, str, boolean) – confidence interval (options None, ‘bootstrap’, True (for ‘Wald’), or ‘AC’) (default=None)
- Returns:
pss – The Peirce skill score of the contingency table data. This is also added to the attrs attribute of the table object.
- Return type:
float
- summary(verbose=False, ci=None)
Summary table
- Parameters:
verbose (boolean) – Print output to stdout (default=False)
ci (NoneType, str, boolean) – confidence interval (options None, ‘bootstrap’, True (for ‘Wald’), or ‘AC’) (default=None)
- threat(ci=None)
Calculate the Threat Score (a.k.a. critical success index)
- Parameters:
ci (NoneType, str, boolean) – confidence interval (options None, ‘bootstrap’, True (for ‘Wald’), or ‘AC’) (default=None)
- Returns:
thr – The threat score of the contingency table data. This is also added to the attrs attribute of the table object.
- Return type:
float
Notes
This is a ratio of verification, i.e., the proportion of correct forecasts after removing correct “no” forecasts (or ‘true negatives’).
- yuleQ()
Calculate Yule’s Q (odds ratio skill score) for the 2x2 contingency table
- Returns:
yule – Yule’s Q for the contingency table data. This is also added to the attrs attribute of the table object.
- Return type:
float
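Yule’s Q is a rescaling of the odds ratio onto the interval from -1 to 1. The sketch below uses the standard definitions of both quantities on the Finley counts in plain Python; it is an illustration and not the `verify` implementation.

```python
# Finley tornado counts: rows are predictions, columns are observations.
a, b, c, d = 28, 72, 23, 2680  # TP, FP, FN, TN

# Odds ratio: odds of a hit divided by odds of a false alarm.
odds = (a * d) / (b * c)

# Yule's Q (odds ratio skill score).
yule = (a * d - b * c) / (a * d + b * c)

# Equivalent form expressed through the odds ratio.
yule_from_odds = (odds - 1) / (odds + 1)

print(odds, yule, yule_from_odds)
```

An odds ratio of 1 (no association) maps to Q = 0, and larger odds ratios push Q toward 1. Both are undefined when `b * c` is zero, which this sketch does not guard against.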
- class verify.categorical.ContingencyNxN(input_array, attrs=None, dtype=None)
Class to work with NxN contingency tables for forecast verification
Examples
>>> import verify
>>> tt = verify.ContingencyNxN([[28,72],[23,2680]])
>>> tt.sum()
2803
>>> tt.threat()
0.22764227642276422
>>> tt.heidke()
0.35532486145845693
>>> tt.peirce()
0.52285681714546284
- PC()
Returns the Proportion Correct (PC) for the NxN contingency table
- get2x2(category)
Get 2x2 sub-table from multicategory contingency table
Examples
Goldsmith’s non-probabilistic forecasts for freezing rain (cat 0), snow (cat 1), and rain (cat 2). [see Wilks, 1995, p. 273]
>>> import verify
>>> tt = verify.ContingencyNxN([[50,91,71],[47,2364,170],[54,205,3288]])
>>> tt2 = tt.get2x2(0)
>>> print(tt2)
[[  50  162]
 [ 101 6027]]
>>> tt2.bias()
1.4039735099337749
>>> tt2.summary()
>>> tt2.attrs
{'Bias': 1.4039735099337749, 'FAR': 0.76415094339622647, 'HeidkeScore': 0.25474971797571822, 'POD': 0.33112582781456956, 'POFD': 0.026175472612699952, 'PeirceScore': 0.30495035520187008, 'ThreatScore': 0.15974440894568689}
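The 2x2 sub-table in the example treats the chosen category as “yes” and merges all other categories into “no”. The sketch below reproduces that collapse with nested lists; `collapse_to_2x2` is a hypothetical helper written for illustration, not the `verify` implementation.

```python
def collapse_to_2x2(table, category):
    """Collapse an NxN table (rows=predicted, columns=observed) to 2x2."""
    n = sum(sum(row) for row in table)
    a = table[category][category]                    # predicted and observed
    b = sum(table[category]) - a                     # predicted, not observed
    c = sum(row[category] for row in table) - a      # observed, not predicted
    d = n - a - b - c                                # neither
    return [[a, b], [c, d]]

goldsmith = [[50, 91, 71],
             [47, 2364, 170],
             [54, 205, 3288]]

sub = collapse_to_2x2(goldsmith, 0)
(a, b), (c, d) = sub
bias = (a + b) / (a + c)  # "yes" forecasts over "yes" observations
print(sub, bias)
```

For category 0 (freezing rain) this gives the same four counts and the same bias as the doctest above.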
- heidke()
Calculate the generalized Heidke Skill Score for the NxN contingency table
- Returns:
hss – The Heidke skill score of the contingency table data. This is also added to the attrs attribute of the table object.
- Return type:
float
Examples
Goldsmith’s non-probabilistic forecasts for freezing rain (cat 0), snow (cat 1), and rain (cat 2). [see Wilks, 1995, pp. 273-274]
>>> import verify
>>> tt = verify.ContingencyNxN([[50,91,71],[47,2364,170],[54,205,3288]])
>>> tt.heidke()
0.80535269033647217
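The generalized Heidke score compares the proportion of correct forecasts (the diagonal) with the proportion expected by chance from the row and column totals. The sketch below uses the standard multicategory formula in plain Python; `generalized_heidke` is a hypothetical helper, not the `verify` method.

```python
def generalized_heidke(table):
    """Multicategory Heidke skill score (rows=predicted, columns=observed)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Proportion of forecasts on the diagonal (correct category).
    correct = sum(table[i][i] for i in range(k)) / n
    # Proportion expected correct if forecasts and observations were independent.
    chance = sum(r * c for r, c in zip(row_tot, col_tot)) / n ** 2
    return (correct - chance) / (1 - chance)

goldsmith = [[50, 91, 71],
             [47, 2364, 170],
             [54, 205, 3288]]

hss = generalized_heidke(goldsmith)
print(hss)
```

For a 2x2 table this reduces to the binary Heidke score, and for the Goldsmith table it should agree with the doctest value above.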
- peirce()
Calculate the generalized Peirce Skill Score for the NxN contingency table
- Returns:
pss – The Peirce skill score of the contingency table data. This is also added to the attrs attribute of the table object.
- Return type:
float
Examples
Goldsmith’s non-probabilistic forecasts for freezing rain (cat 0), snow (cat 1), and rain (cat 2). [see Wilks, 1995, pp. 273-274]
>>> import verify
>>> tt = verify.ContingencyNxN([[50,91,71],[47,2364,170],[54,205,3288]])
>>> tt.peirce()
0.81071330546125309
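The generalized Peirce score has the same numerator as the generalized Heidke score, but its denominator uses only the observed (column) marginals, so the reference forecast is an unbiased random one. The sketch below uses the standard multicategory formula in plain Python; `generalized_peirce` is a hypothetical helper, not the `verify` method.

```python
def generalized_peirce(table):
    """Multicategory Peirce skill score (rows=predicted, columns=observed)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Proportion of forecasts on the diagonal (correct category).
    correct = sum(table[i][i] for i in range(k)) / n
    # Proportion expected correct by chance from both sets of marginals.
    chance = sum(r * c for r, c in zip(row_tot, col_tot)) / n ** 2
    # Chance agreement for random forecasts matching the observed frequencies.
    obs_chance = sum(c * c for c in col_tot) / n ** 2
    return (correct - chance) / (1 - obs_chance)

goldsmith = [[50, 91, 71],
             [47, 2364, 170],
             [54, 205, 3288]]

pss = generalized_peirce(goldsmith)
print(pss)
```

For a 2x2 table this reduces to POD minus POFD, and for the Goldsmith table it should agree with the doctest value above.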