Evrishapley
Latest revision as of 10:27, 11 December 2023
Purpose
Calculate a variable's contribution using Shapley Values.
Synopsis
- results = evrishapley(calx,expx,model,options)
- results = evrishapley(x,model,options)
Description
Shapley Values are a variable importance and explanation tool widely used in the AI community. They give each variable's individual contribution to a model's predictions. The algorithm is model-agnostic: any model's predictions can be explained, including ANN, SVM, and XGB models. Variables that Shapley Values identify as important are good candidates for variable selection.
See the shapleygui page for calculating Shapley Values in a graphical interface, along with examples and guidance on interpretation.
See Scott M. Lundberg and Su-In Lee. 2017. A unified approach to interpreting model predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS'17). Curran Associates Inc., Red Hook, NY, USA, 4768–4777.
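To make "individual variable contributions" concrete, the following is a minimal Python sketch of the textbook Shapley Value definition, computed exactly by enumerating every coalition of variables. It is an illustration only and not the evrishapley implementation: the function name `exact_shapley`, the toy linear model, and the data are all made up for this example, and the choice to replace "absent" variables with values from background (calibration-like) rows is an assumption.

```python
from itertools import combinations
from math import factorial


def exact_shapley(predict, background, sample):
    """Exact Shapley values for one sample by enumerating all coalitions.

    predict    : function mapping one row (list of floats) to a float
    background : list of rows standing in for calibration data (assumption)
    sample     : the row whose prediction is being explained
    """
    n = len(sample)

    def value(coalition):
        # Average prediction when the coalition's variables take the sample's
        # values and every other variable keeps its background value.
        total = 0.0
        for row in background:
            mixed = [sample[j] if j in coalition else row[j] for j in range(n)]
            total += predict(mixed)
        return total / len(background)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                phi[i] += weight * (value(subset + (i,)) - value(subset))
    # value(()) is the average prediction over the background rows.
    return phi, value(())


# Toy linear model and data, invented for illustration.
weights = [2.0, -1.0, 0.5]


def predict(x):
    return sum(w * v for w, v in zip(weights, x)) + 0.3


background = [[0.0, 1.0, 2.0], [2.0, 3.0, 0.0], [1.0, -1.0, 1.0]]
sample = [1.0, 2.0, -1.0]
phi, base = exact_shapley(predict, background, sample)
```

For this toy model the contributions come out at approximately 0, -1, and -1, and the average background prediction plus the sum of the contributions equals the model's prediction for the sample. Exact enumeration is only practical for a handful of variables, which is why approximations based on perturbed samples are used in practice.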
Notes
Calculating Shapley Values can be computationally very expensive. There are two ways to reduce the cost:
- 1) Group variables based on interval width or dependency.
- 2) Provide fewer explanation samples.
If Model_Exporter is installed, it is used to speed up generating predictions on the perturbed samples.
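As a rough illustration of why grouping helps, the sketch below counts the coalitions that an exact Shapley calculation would have to evaluate, with and without grouping. The function names and the example of 200 variables are hypothetical, and evrishapley itself works from perturbed samples rather than full enumeration, so this shows only how the size of the problem scales, not the toolbox's actual cost.

```python
def grouped_player_count(n_variables, width):
    """Number of groups when variables are binned into windows of `width`."""
    return -(-n_variables // width)  # ceiling division


def coalition_count(n_players):
    """Number of distinct coalitions (subsets) of n_players."""
    return 2 ** n_players


n_variables = 200  # hypothetical number of variables
n_groups = grouped_player_count(n_variables, 10)

ungrouped = coalition_count(n_variables)
grouped = coalition_count(n_groups)
```

With these numbers, 200 variables collapse to 20 groups, and the coalition count drops from 2^200 to 1,048,576. The cost also grows in proportion to the number of explanation samples, since each one is explained separately.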
Inputs
Standard input is:
- calx = double or dataset used to calibrate the model,
- expx = double or dataset used to generate Shapley Values (samples whose predictions will be explained),
- model = EVRIModel,
- options = options structure for evrishapley.
Outputs
The output is a results structure with the following fields:
- shap: The Shapley Values for all samples in expx and all predictors from the model.
- baseprediction: The average predictions from the model on the calibration data for each predictor.
- model: The calibrated model.
- explainpred: The predictions on the explanation data (expx).
- calx: The calibration data.
- x: The explanation data.
- shapoptions: Options structure for evrishapley.
Options
- options = options structure containing the fields:
- int_width: [ {10} ] The window size of variables to group for the Shapley Value calculation. Grouping highly correlated variables can provide a better explanation as well as significantly speed up the algorithm.
- nbatches: [{'auto'} double] Number of batches into which the computation is split. When set to 'auto', the number of batches is chosen to conserve memory.
- n_iter: [{'auto'} double] Number of perturbed samples to create per iteration. When set to 'auto', this is (number of variables * 2) + 1. Increasing this value gives a more faithful representation of the contributions but can consume a large amount of memory.
- random_state: [{1}] Random seed number. Set this to a number for reproducibility.
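The following Python sketch shows one plausible reading of the int_width and n_iter 'auto' settings described above. It is an assumption-laden illustration, not the evrishapley code: `window_groups` and `auto_n_iter` are hypothetical names, contiguous non-overlapping windows are assumed, and this page does not say whether "variables" in the 'auto' rule means raw variables or grouped windows.

```python
def window_groups(n_variables, int_width=10):
    """Split variable indices 0..n_variables-1 into contiguous windows.

    The last window is shorter when n_variables is not a multiple of
    int_width (an assumption about how a remainder would be handled).
    """
    return [
        list(range(start, min(start + int_width, n_variables)))
        for start in range(0, n_variables, int_width)
    ]


def auto_n_iter(n_variables):
    """The documented 'auto' rule: (variables * 2) + 1."""
    return n_variables * 2 + 1


groups = window_groups(25, int_width=10)
n_iter_raw = auto_n_iter(25)               # counting the 25 raw variables
n_iter_grouped = auto_n_iter(len(groups))  # counting the 3 windows instead
```

Here 25 variables with a window size of 10 give three groups, the last holding indices 20 to 24. The 'auto' rule yields 51 perturbed samples if raw variables are counted, or 7 if the windows are counted.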
See Also
selectvars, genalg, ipls, plotloads, pls, plsda, sratio, rpls, Sample and Variable Selection, Variable Selection