Confusionmatrix
Latest revision as of 22:52, 13 April 2018

Purpose

Create a confusion matrix showing classification rates from a classification model or from a list of actual classes and a list of predicted classes.

Synopsis

[misclassed, classids, texttable] = confusionmatrix(model);  % create confusion matrix from classifier model
[misclassed, classids, texttable] = confusionmatrix(model, usecv);  % create confusion matrix from model using CV results
[misclassed, classids, texttable] = confusionmatrix(model, usecv, predrule); % create confusion matrix from model specifying CV and predrule
[misclassed, classids, texttable] = confusionmatrix(trueClass, predClass); % create confusion matrix from vectors of true and pred classes

Description

Confusionmatrix creates a table of results showing True Positive, False Positive, True Negative and False Negative rates (TPR FPR TNR FNR) as a matrix for each class modeled in an input model. The 'most probable' predicted class is used when a model is input, or the 'strict' predicted class if 'predrule' = 'strict' is specified. Input models must be of type PLSDA, SVMDA, KNN, or SIMCA.

Optional second parameter "usecv" specifies use of the cross-validation based "model.detail.cvclassification" instead of the default self-prediction classifications "model.classification".

Input can consist of vectors of true class and predicted class instead of a model.

Classification rates are defined as:

TPR: proportion of positive cases that were correctly identified (Sensitivity), = TP/(TP+FN)
FPR: proportion of negative cases that were incorrectly classified as positive, = FP/(FP+TN)
TNR: proportion of negative cases that were classified correctly (Specificity), = TN/(TN+FP)
FNR: proportion of positive cases that were incorrectly classified as negative, = FN/(FN+TP)
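As a minimal illustration, the four rates above can be computed one-vs-rest from vectors of true and predicted classes. The following Python sketch is illustrative only (the toolbox function itself is MATLAB, and `class_rates` is a hypothetical name, not toolbox API):

```python
def class_rates(true_cls, pred_cls):
    """One-vs-rest TPR/FPR/TNR/FNR for each class (illustrative sketch)."""
    rates = {}
    for c in sorted(set(true_cls)):
        # Count one-vs-rest outcomes for class c
        tp = sum(t == c and p == c for t, p in zip(true_cls, pred_cls))
        fn = sum(t == c and p != c for t, p in zip(true_cls, pred_cls))
        fp = sum(t != c and p == c for t, p in zip(true_cls, pred_cls))
        tn = sum(t != c and p != c for t, p in zip(true_cls, pred_cls))
        rates[c] = {
            "TPR": tp / (tp + fn),  # sensitivity
            "FPR": fp / (fp + tn),
            "TNR": tn / (tn + fp),  # specificity
            "FNR": fn / (fn + tp),
        }
    return rates

true_cls = [1, 1, 1, 2, 2, 2]
pred_cls = [1, 1, 2, 2, 2, 1]
print(class_rates(true_cls, pred_cls))
```

Each class is treated as "positive" in turn, with all other classes pooled as "negative", matching the per-class rows of the confusion matrix.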

Four additional fields are also shown for each class:

N:   Number of samples belonging to each class
Err: Misclassification error = proportion of samples which were incorrectly classified, 
        = 1-accuracy, = (FP+FN)/(TP+TN+FP+FN)
P:   Precision, = TP/(TP+FP)
F1:  F1 Score, = 2*TP/(2*TP+FP+FN)

where TP/TN/FP/FN refer to the counts rather than the rates for these quantities. Note that the Misclassification Error for a class A represents: 1) samples of class A which were incorrectly classified as not class A, and 2) samples not of class A which were incorrectly classified as being class A.
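The four extra fields can likewise be computed from the one-vs-rest counts. A Python sketch (`extra_fields` is a hypothetical name, not toolbox API); the sample counts used here are chosen to be consistent with the class K row of the example table below (TP=10, FP=3, TN=30, FN=7):

```python
def extra_fields(tp, fp, tn, fn):
    """N, Err, P, F1 from one-vs-rest counts (counts, not rates)."""
    return {
        "N":   tp + fn,                          # samples truly in the class
        "Err": (fp + fn) / (tp + tn + fp + fn),  # = 1 - accuracy
        "P":   tp / (tp + fp),                   # precision
        "F1":  2 * tp / (2 * tp + fp + fn),      # F1 score
    }

# Counts consistent with the class K row of the example table:
# N=17, Err=0.20000, P=0.76923, F1=0.66667
print(extra_fields(10, 3, 30, 7))
```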

Note: Prior to version 8.5 the Confusion Matrix values were based on the "most probable" prediction classification, and this is what was reported by the "Show Confusion Matrix and Table" icon in the Analysis window. Beginning with version 8.5 the confusionmatrix function can report either "most probable" or "strict" prediction results, and the "Show Confusion Matrix and Table" icon in Analysis provides both: first the "most probable" classification results labeled as:

PLSDA Classification Using Rule: Pred Most Probable

then followed by the "strict" classification results labeled as:

PLSDA Classification Using Rule: Pred Strict (using strictthreshold = 0.50)

in the case of PLSDA, for example.

Inputs

  • model = previously generated classifier model or pred structure,
  • usecv = 0 or 1. 0 indicates confusion matrix should be based on self-prediction results, 1 indicates it is based on using cross-validation results (assuming they are available in the model),
  • trueClass = vector of numeric values indicating the true sample classes,
  • predClass = vector of numeric values indicating the predicted sample classes,
  • predrule = the classification rule used. 'mostprobable' makes predictions based on choosing the class that has the highest probability. 'strict' makes predictions based on the rule that each sample belongs to a class if the probability is greater than a specified threshold probability value for one and only one class.
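The two prediction rules described above can be sketched for a single sample's per-class probabilities. This Python sketch is illustrative only (function names are hypothetical, not toolbox API; the 0.5 threshold mirrors the default strictthreshold mentioned below):

```python
def most_probable(probs):
    """Index of the class with the highest probability."""
    return max(range(len(probs)), key=lambda i: probs[i])

def strict(probs, threshold=0.5):
    """Assign a class only if exactly one class exceeds the threshold;
    otherwise leave the sample unassigned (None)."""
    above = [i for i, p in enumerate(probs) if p > threshold]
    return above[0] if len(above) == 1 else None

probs = [0.55, 0.30, 0.15]
print(most_probable(probs))        # 0
print(strict(probs))               # 0
print(strict([0.45, 0.35, 0.20]))  # None: no class exceeds 0.5
```

Note that under the strict rule a sample can remain unassigned, which is why the strict confusion matrix can differ from the most-probable one even on the same predictions.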

Outputs

  • misclassed = confusion matrix, nclasses x 4 array, one row per class, columns are True/False Positive/Negative rates (TPR FPR TNR FNR),
  • classids = class names (identifiers),
  • texttable = cell array containing a text representation of the confusion matrix. The i-th element of the cell array, texttable{i}, is the i-th line of the texttable. If there are only two classes then the Matthews Correlation Coefficient value is included as the last line. Note that this text representation of the confusion matrix is displayed if the function is called with no output assignment.
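For reference, the two-class Matthews Correlation Coefficient mentioned above can be computed from the one-vs-rest counts using the standard definition (a sketch under the assumption that the toolbox uses the standard formula; `mcc` is a hypothetical name):

```python
import math

def mcc(tp, fp, tn, fn):
    """Matthews Correlation Coefficient from binary confusion counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

print(mcc(5, 0, 5, 0))     # 1.0 for a perfect two-class classifier
print(mcc(10, 3, 30, 7))   # between -1 (total disagreement) and +1
```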

Example

Calling confusionmatrix with no output variables assigned: 'confusionmatrix(model)' displays the output:

>> confusionmatrix(model)
Confusion Matrix:

   Class:      TPR         FPR         TNR         FNR         N      Err         P           F1     
       K       0.58824     0.09091     0.90909     0.41176     17     0.20000     0.76923     0.66667
       BL      0.83333     0.11364     0.88636     0.16667      6     0.12000     0.50000     0.62500
       SH      0.72727     0.10256     0.89744     0.27273     11     0.14000     0.66667     0.69565
       AN      0.87500     0.02941     0.97059     0.12500     16     0.06000     0.93333     0.90323

See Also

confusiontable, plsda, svmda, knn, simca, Sample_Classification_Predictions