Variable Explanation

Represents the prediction as a function of a given variable

What are we trying to explain?

This explanation shows how the prediction changes, on average, across the possible values that a variable can take. For each value, it shows how the predictions differ from the average prediction.

Remember that EXPAI also allows you to explain how the model works on a limited and meaningful subgroup of your data.

Why is it useful?

For business

Model validation by experts: knowing how the model behaves for all possible values of a variable helps experts validate whether the model works as expected.
Knowledge generation: this information may help people discover previously unknown effects of a variable on the process being modelled.

For developers

Model testing at boundaries: it is usually difficult to see how the model behaves at the boundaries or in uncommon regions of the data distribution. This plot can reveal inconsistent predictions in any region of a variable.
Ensure model consistency: a noisy average prediction along a variable may be a symptom of inconsistencies in the model. A similar prediction is expected for similar input values in a robust predictor.

How we do it

Plain English

To check how the model behaves along a variable, we need to consider all of its possible values. To do this efficiently, we first select a sufficiently large subset of the data. Then, we expand this subset so that every sample takes every possible value of the explored variable, while its remaining variables stay unchanged.

The result of this process is a dataset in which all possible values of the variable are considered along with different combinations of the remaining variables. Some of these combinations weren't even represented in the initial dataset. This will allow the explanation to check whether the model works properly even for rare input samples.

Finally, we compute the predictions for the expanded dataset and average over each of the values the variable can take. Thus, we obtain the average prediction in very different scenarios.
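The expand-and-average procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not EXPAI's actual implementation: the helper names (`subsample`, `expand_dataset`, `average_prediction_by_value`), the toy model and the toy data are all hypothetical, and samples are assumed to be plain dictionaries.

```python
import random
from statistics import mean


def subsample(rows, max_rows, seed=0):
    """Pick at most `max_rows` samples at random (hypothetical helper)."""
    if len(rows) <= max_rows:
        return list(rows)
    return random.Random(seed).sample(rows, max_rows)


def expand_dataset(rows, variable, values):
    """Copy every sample once per candidate value of `variable`."""
    return [{**row, variable: value} for row in rows for value in values]


def average_prediction_by_value(predict, rows, variable, values):
    """Average the predictions of the expanded dataset for each value."""
    grouped = {value: [] for value in values}
    for row in expand_dataset(rows, variable, values):
        grouped[row[variable]].append(predict(row))
    return {value: mean(preds) for value, preds in grouped.items()}


# Hypothetical toy model and data, for illustration only.
def toy_model(row):
    return 2.0 * row["age"] + (10.0 if row["plan"] == "premium" else 0.0)


rows = [
    {"age": 20, "plan": "basic"},
    {"age": 40, "plan": "premium"},
    {"age": 60, "plan": "basic"},
]

subset = subsample(rows, max_rows=1000)
profile = average_prediction_by_value(
    toy_model, subset, "plan", ["basic", "premium"]
)
print(profile)  # {'basic': 80.0, 'premium': 90.0}
```

Note that the expanded dataset contains combinations such as a 20-year-old on the premium plan, which does not appear in the original three samples.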

More formally

We implement Partial Dependence (PD) Plots as presented by Friedman (2000).

Formally, we define the PD profile of the model $f()$ for variable $j$ of the input $X$, evaluated at value $z$, as:

$$g_{PD}^j(z) = E_{X^{-j}}\left[f\left(X^{j|=z}\right)\right]$$

In other words, it is computed as the expectation of the model predictions when variable $j$ in $X$ is fixed at $z$, taken over the joint distribution of all remaining variables ($X^{-j}$).

Since the true joint distribution is usually not known in Machine Learning problems, we estimate it using the empirical distribution of $n$ samples in our dataset. This results in a simplified formulation using our existing dataset:

$$\hat{g}_{PD}^j(z) = \frac{1}{n} \sum_{i=1}^{n} f\left(x_i^{j|=z}\right)$$
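As a small worked example of this empirical estimator, the sketch below evaluates it directly for a toy model. The function name `pd_estimate`, the model `f` and the three samples are assumptions made purely for illustration; samples are represented as lists indexed by variable position.

```python
def pd_estimate(f, samples, j, z):
    """Empirical PD estimate: mean of f over the samples with variable j set to z."""
    total = 0.0
    for x in samples:
        x_fixed = list(x)  # copy so the original sample is left untouched
        x_fixed[j] = z     # this is x_i^{j|=z}
        total += f(x_fixed)
    return total / len(samples)


# Hypothetical model with an interaction between variables 1 and 2.
def f(x):
    return 3 * x[0] + x[1] * x[2]


samples = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# PD profile of variable 0 on a small grid of values.
profile = [pd_estimate(f, samples, 0, z) for z in (0, 2, 4)]
print(profile)  # [36.0, 42.0, 48.0]
```

For this toy model the result can be checked by hand: the average of $x^1 x^2$ over the three samples is $(6 + 30 + 72) / 3 = 36$, so the estimate equals $3z + 36$ for every $z$.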

References

Friedman, Jerome H. 2000. “Greedy Function Approximation: A Gradient Boosting Machine.” Annals of Statistics 29: 1189–1232.

