Variable importance

Measures how important each variable is for your model's predictions

What are we trying to explain?

This explanation evaluates the importance of each of the variables considered by the model.

Remember that EXPAI also allows you to explain how the model works on a limited and meaningful subgroup of your data.

Why is it useful?

Information from this explanation can be used for different purposes.

For business

  • Model validation by experts: knowing the most important variables for a prediction may help experts validate whether the model works as expected.

  • Knowledge generation: this information may help people discover relevant features of the process that were previously unknown.

  • Process optimization: knowing how your model works helps you identify irrelevant variables and focus your resources on the most relevant features. This knowledge no longer comes from intuition but from data.

For developers

  • Feature selection: variables which aren't relevant for the prediction can be removed from the data to increase efficiency.

  • Model comparison: discovering how different models behave on the same data may help you understand which one gets closer to the expected behaviour.

How we do it

This method is described in detail by Fisher et al. (2019). In this section, we summarize it so that anyone can understand the idea behind our algorithms.

Plain English

The intuition is quite simple. How does performance change if a variable is removed from our data? To measure this effect without retraining the model, we randomly permute the values of that variable across samples so that they no longer match their original rows. The variable keeps the same distribution, but its relationship with the target is broken.

What we expect, as presented by Breiman (2001a), is that the performance of the model decreases after permuting the values of this variable. The larger the decrease, the more important the variable is.

If permuting the values has no effect on performance, the model does not rely on this variable: its predictions remain just as accurate without the information the variable carried.
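The intuition above can be sketched in a few lines of Python. This is a minimal illustration, not EXPAI's implementation: the helper names (`accuracy`, `permutation_drop`), the toy model, the synthetic data, and the use of accuracy as the performance metric are all assumptions made for the example.

```python
import random

def accuracy(model, rows, labels):
    """Fraction of rows for which the model predicts the correct label."""
    hits = sum(1 for row, y in zip(rows, labels) if model(row) == y)
    return hits / len(rows)

def permutation_drop(model, rows, labels, column, seed=0):
    """Performance drop after shuffling one column across the samples."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    # Shuffle the values of one variable so they no longer match their rows.
    shuffled = [row[column] for row in rows]
    rng.shuffle(shuffled)
    permuted_rows = [
        row[:column] + [value] + row[column + 1:]
        for row, value in zip(rows, shuffled)
    ]
    return baseline - accuracy(model, permuted_rows, labels)

# Hypothetical toy model: uses only the first variable, ignores the second.
def toy_model(row):
    return 1 if row[0] > 0 else 0

data_rng = random.Random(42)
rows = [[data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)] for _ in range(200)]
labels = [toy_model(row) for row in rows]

drop_used = permutation_drop(toy_model, rows, labels, column=0)
drop_ignored = permutation_drop(toy_model, rows, labels, column=1)
print(f"drop for used variable:    {drop_used:.3f}")
print(f"drop for ignored variable: {drop_ignored:.3f}")
```

In this toy setup, shuffling the ignored variable leaves accuracy unchanged, so its drop is zero, while shuffling the variable the model actually uses produces a clear drop.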

More formally

Let:

Procedure:

  1. Once the importance of every variable has been computed, we sum these values and express each variable's importance as a percentage of the total (its relative importance).
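As a rough sketch of this normalization step, the snippet below converts raw importances into percentages. The function name `relative_importance`, the variable names, and the raw values are made up for illustration, and the sketch assumes the raw importances are non-negative; how negative values are treated is not specified here.

```python
def relative_importance(importances):
    """Express each raw importance as a percentage of the total."""
    total = sum(importances.values())
    if total == 0:
        # No variable changed performance: report 0% for all of them.
        return {name: 0.0 for name in importances}
    return {name: 100.0 * value / total for name, value in importances.items()}

# Hypothetical raw importances (performance drops) for three variables.
raw = {"age": 0.30, "income": 0.15, "zip_code": 0.05}
shares = relative_importance(raw)

for name, share in shares.items():
    print(f"{name}: {share:.1f}%")
```

With these made-up values the shares are roughly 60%, 30%, and 10%, and they always add up to 100% whenever the total is non-zero.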

Note that, because permutation is a random process, results may differ slightly between executions.

To make the results more robust, the permutation is performed 10 times and the results are averaged.
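The repeat-and-average step could look like the sketch below. It is only an illustration under stated assumptions: the names `averaged_importance` and `n_repeats`, the toy model, the synthetic data, and accuracy as the metric are not taken from EXPAI.

```python
import random
from statistics import mean

def accuracy(predict, rows, labels):
    """Fraction of rows for which the prediction matches the label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)

def averaged_importance(predict, rows, labels, column, n_repeats=10, seed=0):
    """Average the performance drop over several independent permutations."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    drops = []
    for _ in range(n_repeats):
        # Each repeat draws a fresh random permutation of the column.
        values = [row[column] for row in rows]
        rng.shuffle(values)
        permuted = [
            row[:column] + [v] + row[column + 1:]
            for row, v in zip(rows, values)
        ]
        drops.append(baseline - accuracy(predict, permuted, labels))
    return mean(drops), drops

# Hypothetical toy model and synthetic data.
def toy_model(row):
    return 1 if row[0] > 0 else 0

data_rng = random.Random(7)
rows = [[data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)] for _ in range(200)]
labels = [toy_model(row) for row in rows]

avg_drop, drops = averaged_importance(toy_model, rows, labels, column=0)
print(f"individual drops range from {min(drops):.3f} to {max(drops):.3f}")
print(f"averaged importance: {avg_drop:.3f}")
```

The individual drops vary from one permutation to the next, which is why a single run can be noisy; the average is a more stable estimate.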

References

Fisher, Aaron, Cynthia Rudin, and Francesca Dominici. 2019. “All Models Are Wrong, but Many Are Useful: Learning a Variable’s Importance by Studying an Entire Class of Prediction Models Simultaneously.” Journal of Machine Learning Research 20 (177): 1–81. http://jmlr.org/papers/v20/18-760.html.

Breiman, Leo. 2001a. “Random Forests.” Machine Learning 45: 5–32. https://doi.org/10.1023/a:1010933404324.
