EXPAI Docs

Python Client Docs

Variable Explanation

Represents the prediction as a function of a given variable

Figure: sample numerical variable explanation.

What are we trying to explain?

This explanation represents how the prediction changes, in general, across the possible values a variable can take: for each value, it shows how the predictions differ from the average.

Remember that EXPAI also allows you to explain how the model works on a limited and meaningful subgroup of your data.

Why is it useful?

For developers

How we do it

Plain English

To check how a variable behaves, we need to consider all of its possible values. To do this efficiently, we first select a sufficiently large subset of the data. Then we expand this subset so that every sample takes every possible value of the explored variable.

The result of this process is a dataset in which all possible values of the variable are considered along with different combinations of the remaining variables. Some of these combinations weren't even represented in the initial dataset. This will allow the explanation to check whether the model works properly even for rare input samples.

Finally, we compute the predictions for the expanded dataset and average over each of the values the variable can take. Thus, we obtain the average prediction in very different scenarios.
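The steps above can be sketched in a few lines of NumPy. This is an illustrative implementation, not the EXPAI client API: the `predict` callable, the function name, and all parameters are assumptions for the example.

```python
import numpy as np

def variable_explanation(predict, X, j, values, subset_size=500, rng=None):
    """Average prediction for each candidate value of variable j.

    predict: callable mapping an (n, d) array to (n,) predictions
    X:       (n, d) data matrix
    j:       column index of the explored variable
    values:  candidate values for variable j
    """
    rng = np.random.default_rng(rng)
    # 1. Select a big enough subset of the data.
    idx = rng.choice(len(X), size=min(subset_size, len(X)), replace=False)
    subset = X[idx]
    averages = []
    for z in values:
        # 2. Expand the subset: every sample takes the value z for variable j.
        expanded = subset.copy()
        expanded[:, j] = z
        # 3. Predict on the expanded data and average.
        averages.append(predict(expanded).mean())
    return np.array(averages)
```

Because variable `j` is overwritten in every row, some of the resulting combinations never occurred in the original data, which is exactly what lets the explanation probe the model on rare input samples.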

More formally

Formally, we define the PD profile for a model $f$, when variable $j$ of a sample $X$ takes the value $z$, as:

$g_{PD}^j(z) = E_{X^{-j}}[f(X^{j|=z})]$

In other words, it is the expectation of the model's predictions, when variable $j$ in $X$ is fixed at $z$, over the joint distribution of all remaining variables $X^{-j}$.

Since the true joint distribution is usually unknown in Machine Learning problems, we estimate it with the empirical distribution of the $n$ samples in our dataset. This yields the simplified formulation:

$\hat{g}_{PD}^j(z) = \frac{1}{n}\sum_{i=1}^{n} f(x_i^{j|=z})$
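This estimator is a direct average over the dataset. A minimal numeric illustration, with a toy model and toy data (none of it from the EXPAI implementation): fix variable $j$ at $z$ in every sample and average the predictions.

```python
import numpy as np

# Toy model and data, purely for illustration.
f = lambda A: A[:, 0] ** 2 + A[:, 1]   # f(x) = x0^2 + x1
x = np.array([[1.0, 0.0],
              [2.0, 1.0],
              [3.0, 2.0]])             # n = 3 samples

def g_hat(z, j=0):
    # \hat{g}_PD^j(z) = (1/n) * sum_i f(x_i with variable j set to z)
    X_z = x.copy()
    X_z[:, j] = z
    return f(X_z).mean()

print(g_hat(2.0))  # rows give f = 4.0, 5.0, 6.0, so the mean is 5.0
```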

References

Friedman, Jerome H. 2001. "Greedy Function Approximation: A Gradient Boosting Machine." *Annals of Statistics* 29 (5): 1189–1232.
