Plot - Visual Analysis Tools
Function/Class Documentation
Sub-module to provide simple plotting for exploratory data analysis
- verify.plot.qqPlot(predicted, observed, xyline=True, addTo=None, modelName='', legend=False, plot_kwargs=None)
Quantile-quantile plot for predictions and observations
- Parameters:
predicted (array-like) – predicted data (model output)
observed (array-like) – observation vector (reference values)
xyline (boolean) – Toggles the display of a line of y=x (perfect model). Default True.
addTo (figure, axes, or None) – The object on which plotting will happen. If None (default) then a figure and axes will be created. If a matplotlib figure is supplied a set of axes will be made, and if matplotlib axes are given then the plot will be made on those axes.
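modelName (string) – Name of model for the legend. Default empty string.
legend (boolean) – Toggles the display of a legend with the labels defined in ‘modelName’. Default False.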
plot_kwargs (dict) – Dictionary containing plot keyword arguments to pass to matplotlib’s scatter function.
- Returns:
out_dict – A dictionary containing the Figure and the Axes
- Return type:
dict
Example
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from verify.plot import qqPlot
>>> np.random.seed(46)
>>> model1 = np.random.randint(0, 40, 101).astype(float)
>>> model2 = model1*(1 + np.arange(101)/70)
>>> obs = model1 + np.random.randn(101)
>>> obs[[2,3,8,9,15,16,30,31]] = obs[[31,30,16,15,9,8,3,2]]
>>> obs *= 0.25 + (5-(np.arange(101)/30.))/4
>>> observed = obs[:71]  # QQ plots don't require equal sample lengths
>>> plot_settings = {'marker': 'X', 'c': np.arange(71), 'cmap': 'cool'}
>>> out1 = qqPlot(model1, observed, modelName='1',
...               plot_kwargs=plot_settings)
>>> plot_settings = {'marker': 'o'}
>>> out2 = qqPlot(model2, observed, modelName='2', legend=True,
...               plot_kwargs=plot_settings, addTo=out1['Axes'])
>>> plt.show()
Notes
The q-q plot is formed by:
- Vertical axis: Estimated quantiles from observed data
- Horizontal axis: Estimated quantiles from predicted data
Both axes are in units of their respective data sets. That is, the actual quantile level is not plotted. For a given point on the q-q plot, we know that the quantile level is the same for both coordinates, but not what that quantile level actually is.
For equal length samples, the q-q plot displays sorted(sample1) against sorted(sample2). If the samples are not of equal length, the quantiles for the smaller sample are calculated and data from the larger sample are interpolated to those quantiles. See, e.g., NIST’s Engineering Statistics Handbook.
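The pairing rule above can be sketched with NumPy as follows. This is a minimal illustration, not the `verify.plot` implementation: the helper name `qq_points` is hypothetical, and the plotting positions `(i - 0.5)/n` used for the smaller sample are an assumption (several conventions exist).

```python
import numpy as np


def qq_points(predicted, observed):
    """Return (x, y) quantile pairs for a q-q plot (illustrative sketch).

    x holds quantiles of `predicted` (horizontal axis) and y holds
    quantiles of `observed` (vertical axis).
    """
    pred = np.sort(np.asarray(predicted, dtype=float))
    obs = np.sort(np.asarray(observed, dtype=float))
    if pred.size == obs.size:
        # Equal lengths: sorted data are plotted directly against each other
        return pred, obs
    # Unequal lengths: the smaller sample fixes the quantile levels
    small, large = (pred, obs) if pred.size < obs.size else (obs, pred)
    n = small.size
    levels = (np.arange(1, n + 1) - 0.5) / n  # assumed plotting positions
    # Interpolate the larger sample to those quantile levels
    large_q = np.quantile(large, levels)
    if pred.size < obs.size:
        return pred, large_q
    return large_q, obs
```

With equal-length inputs the function reduces to `sorted(predicted)` against `sorted(observed)`; otherwise the number of points equals the length of the smaller sample.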
- verify.plot.reliabilityDiagram(predicted, observed, norm=False, addTo=None, modelName='', bins=None, xyline=True, legend=False, plotkwargs={}, histkwargs={})
Reliability diagram for a probabilistic forecast model
- Parameters:
predicted (array-like) – predicted data, continuous data (e.g. probability)
observed (array-like) – observation vector of binary events (boolean or 0,1)
norm (boolean) – Normalize input scores into the [0,1] interval. Default False.
addTo (figure, axes, or None) – The object on which plotting will happen. If None (default) then a figure and axes will be created. If a matplotlib figure is supplied a set of axes will be made, and if matplotlib axes are given then the plot will be made on those axes.
modelName (string) – Name for model to be supplied to legend. Default empty string.
bins (None, int, sequence of scalars, or str) – Provides bin information as required by numpy.histogram. An integer sets the number of equal-width bins used for the calibration function and refinement distribution. A sequence of scalars will be used to define the bin edges, and a string must be a valid binning method per numpy.histogram. The default is to use numpy’s ‘auto’ method, limited to a maximum of 100 bins.
xyline (boolean) – Toggles the display of a line of y=x (perfect model). Default True.
legend (boolean) – Toggles the display of a legend with the labels defined in ‘modelName’. Default False.
plotkwargs (dict) – Dictionary of keyword arguments for the calibration function
histkwargs (dict) – Dictionary of keyword arguments for the refinement distribution
- Returns:
out_dict – A dictionary containing the Figure, Axes, …
- Return type:
dict
Example
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from sklearn import svm, datasets, linear_model
>>> from verify.plot import reliabilityDiagram, setTarget
>>> np.random.seed(0)
>>> classifiers = {'Logistic regression': linear_model.LogisticRegression(),
...                'SVC': svm.SVC(kernel='linear',
...                               decision_function_shape='ovr',
...                               probability=True)}
>>> X, y = datasets.make_classification(n_samples=10000, n_features=21,
...                                     n_informative=8, n_redundant=2,
...                                     n_classes=2, n_repeated=1,
...                                     n_clusters_per_class=3,
...                                     flip_y=0.2)
>>> train_samples = 100  # Samples used for training the models
>>> X_train = X[:train_samples]
>>> X_test = X[train_samples:]
>>> y_train = y[:train_samples]
>>> y_test = y[train_samples:]
>>> fig, ax = setTarget(None)
>>> output = []
>>> for method, model in classifiers.items():
...     model.fit(X_train, y_train)
...     pred = model.predict_proba(X_test)[:, 1]
...     output.append(reliabilityDiagram(pred, y_test, norm=True,
...                                      modelName=method, addTo=fig))
>>> plt.show()
Notes
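Reliability diagrams show whether the predictions from a probabilistic binary classifier are well calibrated. For a well-calibrated model, events forecast with probability p occur with an observed relative frequency close to p, so the calibration function lies near the y=x line; the refinement distribution shows how often each range of forecast probabilities is issued.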
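A minimal NumPy sketch of the two quantities behind a reliability diagram: the calibration function (observed event frequency per forecast bin) and the refinement distribution (number of forecasts per bin). The helper `calibration_points` is hypothetical. Min-max scaling for `norm=True`, plotting against bin centres, and enforcing the 100-bin cap by falling back to 100 equal-width bins are assumptions modelled on the `bins` description, not the library's code.

```python
import numpy as np


def calibration_points(predicted, observed, bins=None, norm=False):
    """Calibration function and refinement distribution (illustrative sketch)."""
    pred = np.asarray(predicted, dtype=float)
    obs = np.asarray(observed, dtype=float)  # booleans become 0.0 / 1.0
    if norm:
        # Assumed min-max scaling of the scores into [0, 1]
        pred = (pred - pred.min()) / (pred.max() - pred.min())
    if bins is None:
        # numpy's 'auto' rule, capped (by assumption) at 100 bins
        edges = np.histogram_bin_edges(pred, bins='auto')
        if edges.size - 1 > 100:
            edges = np.histogram_bin_edges(pred, bins=100)
    else:
        edges = np.histogram_bin_edges(pred, bins=bins)
    # Refinement distribution: how many forecasts fall in each bin
    counts, _ = np.histogram(pred, bins=edges)
    # Number of observed events in each bin
    events, _ = np.histogram(pred, bins=edges, weights=obs)
    # Calibration function: observed relative frequency per bin (NaN if empty)
    with np.errstate(divide='ignore', invalid='ignore'):
        frequency = events / counts
    centres = 0.5 * (edges[:-1] + edges[1:])
    return centres, frequency, counts
```

Plotting `frequency` against `centres` gives the calibration curve, and a bar chart of `counts` gives the refinement distribution.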
- verify.plot.rocCurve(predicted, observed, low=None, high=None, nthresh=100, addTo=None, xyline=True, modelName='', legend=False)
Receiver Operating Characteristic curve for assessing model skill
- Parameters:
predicted (array-like) – predicted data, continuous data (e.g. probability)
observed (array-like) – observation vector of binary events (boolean or 0,1)
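low (float or None) – Set the lowest threshold to use.
high (float or None) – Set the highest threshold to use.
nthresh (int) – Number of thresholds to use. Default 100.
xyline (boolean) – Toggles the display of the line y=x (no-skill reference, where POD equals POFD). Default True.
addTo (figure, axes, or None) – The object on which plotting will happen. If None (default) then a figure and axes will be created. If a matplotlib figure is supplied a set of axes will be made, and if matplotlib axes are given then the plot will be made on those axes.
modelName (string) – Name of model for the legend. Default empty string.
legend (boolean) – Toggles the display of a legend with the labels defined in ‘modelName’. Default False.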
- Returns:
out_dict – A dictionary containing the Figure, Axes, POD, POFD, and Thresholds
- Return type:
dict
Example
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from sklearn import svm, datasets, linear_model
>>> from verify.plot import rocCurve, setTarget
>>> np.random.seed(0)
>>> classifiers = {'Logistic regression': linear_model.LogisticRegression(),
...                'SVC': svm.SVC(kernel='linear',
...                               decision_function_shape='ovr',
...                               probability=True)}
>>> X, y = datasets.make_classification(n_samples=10000, n_features=21,
...                                     n_informative=8, n_redundant=2,
...                                     n_classes=2, n_repeated=1,
...                                     n_clusters_per_class=3,
...                                     flip_y=0.2)
>>> Nsample_train = 100
>>> X_train = X[:Nsample_train]
>>> X_test = X[Nsample_train:]
>>> y_train = y[:Nsample_train]
>>> y_test = y[Nsample_train:]
>>> fig, ax = setTarget(None)
>>> output = []
>>> for method, model in classifiers.items():
...     model.fit(X_train, y_train)
...     pred = model.predict_proba(X_test)[:, 1]
...     output.append(rocCurve(pred, y_test, modelName=method, addTo=ax))
>>> output[-1]['Axes'].legend()
>>> plt.show()
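The POD, POFD, and Thresholds entries returned by rocCurve can be understood through the following NumPy sketch. It is an illustration under stated assumptions, not the library's code: `roc_points` is a hypothetical helper, the thresholds are assumed to be `nthresh` evenly spaced values from `low` to `high` (falling back to the range of the predictions when these are None), and an event is assumed to be forecast when the prediction is greater than or equal to the threshold.

```python
import numpy as np


def roc_points(predicted, observed, low=None, high=None, nthresh=100):
    """POD and POFD over a range of thresholds (illustrative sketch)."""
    pred = np.asarray(predicted, dtype=float)
    obs = np.asarray(observed).astype(bool)
    # Assumed defaults: span the range of the predictions
    low = pred.min() if low is None else low
    high = pred.max() if high is None else high
    thresholds = np.linspace(low, high, nthresh)
    n_events = np.count_nonzero(obs)
    n_nonevents = obs.size - n_events
    pod = np.empty(nthresh)
    pofd = np.empty(nthresh)
    for i, thresh in enumerate(thresholds):
        forecast_yes = pred >= thresh  # assumed "event forecast" rule
        hits = np.count_nonzero(forecast_yes & obs)
        false_alarms = np.count_nonzero(forecast_yes & ~obs)
        pod[i] = hits / n_events              # probability of detection
        pofd[i] = false_alarms / n_nonevents  # probability of false detection
    return pod, pofd, thresholds
```

Plotting `pod` against `pofd` traces the ROC curve; a model with no skill lies along POD = POFD.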
- verify.plot.setTarget(target, figsize=None, loc=111, polar=False)
Given a target on which to plot a figure, determine whether that target is None, a matplotlib figure, or a matplotlib axes object. Based on the type of target, a figure and/or axes will be either located or generated. Both the figure and axes objects are returned to the caller for further manipulation.
- Parameters:
target (object) – The object on which plotting will happen.
figsize (tuple) – A two-item tuple/list giving the dimensions of the figure, in inches. Defaults to Matplotlib defaults.
loc (integer) – The subplot triple that specifies the location of the axes object. Defaults to 111.
polar (bool) – Set the axes object to polar coordinates. Defaults to False.
- Returns:
fig (object) – A matplotlib figure object on which to plot.
ax (object) – A matplotlib subplot object on which to plot.
Examples
>>> import matplotlib.pyplot as plt
>>> from verify.plot import setTarget
>>> fig = plt.figure()
>>> fig, ax = setTarget(target=fig, loc=211)
Notes
Implementation from SpacePy’s plot module. SpacePy is available at https://github.com/spacepy/spacepy under a Python Software Foundation license.
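The type-based dispatch described for setTarget can be sketched with Matplotlib as below. This is a hypothetical re-implementation (`set_target_sketch`) for illustration only; the real function follows SpacePy's version and may handle additional cases. Passing `loc` and `polar` straight to `Figure.add_subplot`, and raising `TypeError` for unsupported targets, are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
from matplotlib.axes import Axes
from matplotlib.figure import Figure


def set_target_sketch(target, figsize=None, loc=111, polar=False):
    """Return (fig, ax) for a None, Figure, or Axes target (illustrative sketch)."""
    if target is None:
        # Nothing supplied: create a new figure and add axes to it
        fig = plt.figure(figsize=figsize)
        ax = fig.add_subplot(loc, polar=polar)
    elif isinstance(target, Figure):
        # Figure supplied: add a new set of axes at the requested location
        fig = target
        ax = fig.add_subplot(loc, polar=polar)
    elif isinstance(target, Axes):
        # Axes supplied: plot on them and look up their parent figure
        ax = target
        fig = ax.figure
    else:
        raise TypeError(f"Unsupported plot target: {type(target)!r}")
    return fig, ax
```

Returning both objects in every branch lets callers chain further plotting onto the same axes, as the Example above does with a figure target.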
- verify.plot.taylorDiagram(predicted, observed, norm=False, addTo=None, modelName='', isoSTD=True)
Taylor diagrams for comparing model performance
- Parameters:
predicted (array-like) – predicted data
observed (array-like) – observation vector
norm (boolean or float) – Selects whether the values should be normalized (default is False). If a numeric value is given, it will be used to normalize the inputs.
addTo (axes, or None) – The object on which plotting will happen. If None (default) then a figure and axes will be created. If matplotlib axes are given then the plot will be made on those axes, assuming that the point is being added to a previously generated Taylor diagram.
modelName (string) – Name used to label the model’s point on the Taylor diagram.
isoSTD (boolean) – Toggle for isocontours of standard deviation. Default is True, but turning them off can reduce visual clutter or prevent interference with custom plot styles that alter background grid behavior.
- Returns:
out_dict – A dictionary containing the Figure, the Axes, and ‘Norm’ (the value used to normalize the inputs/outputs).
- Return type:
dict
Example
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from verify.plot import taylorDiagram
>>> model1 = np.random.randint(0, 40, 101).astype(float)
>>> model2 = model1*(1 + np.arange(101)/70)
>>> obs = model1 + np.random.randn(101)
>>> obs[[2,3,8,9,15,16,30,31]] = obs[[31,30,16,15,9,8,3,2]]
>>> obs *= 0.25 + (5-(np.arange(101)/30.))/4
>>> result1 = taylorDiagram(model1, obs, norm=True, modelName='A')
>>> result2 = taylorDiagram(model2, obs, norm=result1['Norm'],
...                         modelName='B', addTo=result1['Axes'])
>>> plt.show()
Notes
Based on ‘Summarizing multiple aspects of model performance in a single diagram’ by K.E. Taylor (Journal of Geophysical Research, 2001; doi: 10.1029/2000JD900719) and ‘Taylor Diagram Primer’ by Taylor (document at https://pcmdi.llnl.gov/staff/taylor/CV/Taylor_diagram_primer.pdf). Some implementation aspects were inspired by the public domain code of GitHub user ycopin (https://gist.github.com/ycopin/3342888).
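The statistics a Taylor diagram summarizes can be computed as in this sketch. It relies on the relation from Taylor (2001) linking the centred RMS difference E′, the standard deviations σ_p and σ_o, and the correlation R: E′² = σ_p² + σ_o² − 2σ_pσ_oR. Because this is the law of cosines, one point at radius σ_p and angle arccos(R) encodes all three quantities relative to the observation point at (σ_o, 0). The helper `taylor_statistics` is hypothetical, and treating `norm=True` as division by the standard deviation of the observations is an assumption, not a description of taylorDiagram's internals.

```python
import numpy as np


def taylor_statistics(predicted, observed, norm=False):
    """Summary statistics shown on a Taylor diagram (illustrative sketch)."""
    pred = np.asarray(predicted, dtype=float)
    obs = np.asarray(observed, dtype=float)
    # Assumed normalization: True -> std of observations, number -> that value
    if norm is True:
        scale = obs.std()
    elif norm is False:
        scale = 1.0
    else:
        scale = float(norm)
    # Population standard deviations (ddof=0) keep the cosine law exact
    sigma_p = pred.std() / scale
    sigma_o = obs.std() / scale
    corr = np.corrcoef(pred, obs)[0, 1]
    # Centred RMS difference: RMS of the anomalies after removing each mean
    anomaly_diff = (pred - pred.mean()) - (obs - obs.mean())
    crmsd = np.sqrt(np.mean(anomaly_diff ** 2)) / scale
    return {'sigma_pred': sigma_p, 'sigma_obs': sigma_o,
            'corr': corr, 'crmsd': crmsd, 'norm': scale}
```

Reusing the returned `'norm'` value for a second model mirrors how the Example passes `result1['Norm']`, so both points share one scale.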