Measures how important each variable is for your model's predictions
Sample model explanations with contribution from each variable
This explanation evaluates the importance of each of the variables considered by the model.
Information from this explanation can be used for several purposes:
- Model validation by experts: knowing the most important variables for a prediction may help experts validate whether the model works as expected.
- Knowledge generation: this information may help humans discover unknown relevant features in the process.
- Process optimization: knowing how your model works helps you identify useless variables and focus your resources on the most relevant features. This knowledge no longer comes from intuition but from data.
- Feature selection: variables which aren't relevant for the prediction can be removed from the data to increase efficiency.
- Model comparison: discovering how different models behave on the same data may help you understand which one gets closer to the expected behaviour.
The intuition is simple: how does performance change if a variable is removed from the data? To measure this effect, we randomly permute the values of the variable across samples so that they no longer match their original rows. This breaks the relationship between the variable and the target while preserving the variable's distribution.
What we expect, as presented by Breiman (2001a), is that the performance of the model decreases after the values of this variable are permuted. The larger the decrease, the more important the variable is.
If permuting the values has no effect on performance, the model does not rely on this variable: its predictions stay equally accurate even after the variable's information has been removed.
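As a small illustration of why permuting works, the sketch below uses synthetic data (an assumption made only for this example) to show that shuffling a variable keeps its set of values intact but destroys its association with the target. The names `x`, `y`, and `x_perm` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: the target depends strongly on x (assumed for illustration).
x = rng.normal(size=1000)
y = 2.0 * x + rng.normal(scale=0.1, size=1000)

# Permute x so its values no longer match their original samples.
x_perm = rng.permutation(x)

# The values themselves are unchanged, only their order differs.
same_values = np.array_equal(np.sort(x), np.sort(x_perm))

# The correlation with the target is strong before and close to zero after.
corr_before = np.corrcoef(x, y)[0, 1]
corr_after = np.corrcoef(x_perm, y)[0, 1]

print(same_values, corr_before, corr_after)
```

A model that relied on `x` would therefore lose the information it carried, which is what the drop in performance measures.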
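Let:
- $f$ be the model we are trying to explain.
- $X$ be the matrix containing the input data for the model.
- $x_z$ be the variable whose impact we want to compute, stored in the z-th column of $X$.
- $L(y, \hat{y})$ be the loss function used to measure model performance, given the ground-truth target $y$ and the prediction $\hat{y}$ made by the model.

The importance is then computed as follows:
1. Execute the model on dataset $X$ to obtain $\hat{y} = f(X)$.
2. Compute the loss $L(y, \hat{y})$ for this prediction.
3. Generate $X^{perm}$ by permuting the z-th column of $X$, which contains variable $x_z$.
4. Execute the model on dataset $X^{perm}$ to obtain $\hat{y}^{perm} = f(X^{perm})$.
5. Compute the loss $L(y, \hat{y}^{perm})$ for the prediction after permutation.
6. Measure the importance of variable $x_z$ by computing the change in loss caused by the permutation, for example as the difference $I_z = L(y, \hat{y}^{perm}) - L(y, \hat{y})$.
7. Once the importance of every variable has been computed, sum the values and calculate the relative importance (%) of each variable, $100 \cdot I_z / \sum_k I_k$.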
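The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: it assumes the importance of a variable is the difference between the loss after and before permutation, and it clips negative differences to zero before normalising (both choices are assumptions). The names `permutation_importance`, `predict`, and `mse`, as well as the synthetic data, are hypothetical.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error, used here as the loss function."""
    return float(np.mean((y_true - y_pred) ** 2))


def permutation_importance(predict, X, y, loss, seed=0):
    """Return the relative permutation importance (%) of each column of X."""
    rng = np.random.default_rng(seed)
    base_loss = loss(y, predict(X))               # steps 1-2
    raw = np.zeros(X.shape[1])
    for z in range(X.shape[1]):
        X_perm = X.copy()
        X_perm[:, z] = rng.permutation(X_perm[:, z])  # step 3
        perm_loss = loss(y, predict(X_perm))          # steps 4-5
        raw[z] = perm_loss - base_loss                # step 6 (assumed: difference)
    # Assumption: a negative change is treated as "no importance".
    raw = np.clip(raw, 0.0, None)
    total = raw.sum()
    # Step 7: express each importance as a share of the total.
    return 100.0 * raw / total if total > 0 else raw


# Synthetic example: the target depends on columns 0 and 1, never on column 2.
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(500, 3))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1]


def predict(X):
    # Stand-in "model" that reproduces the true relationship.
    return 3.0 * X[:, 0] + 1.0 * X[:, 1]


importance = permutation_importance(predict, X, y, mse)
print(importance)
```

Because the stand-in model ignores the third column, permuting it leaves the loss unchanged and its relative importance is zero, while the first column, which has the largest coefficient, receives the largest share. A single permutation per variable is noisy; averaging over several permutations is a common refinement.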
Fisher, Aaron, Cynthia Rudin, and Francesca Dominici. 2019. “All Models Are Wrong, but Many Are Useful: Learning a Variable’s Importance by Studying an Entire Class of Prediction Models Simultaneously.” Journal of Machine Learning Research 20 (177): 1–81. http://jmlr.org/papers/v20/18-760.html.