This explanation shows how the model's prediction changes, on average, across the possible values that a variable can take. For each value, it shows how the average prediction differs from the overall average.
Model validation by experts: knowing how the model behaves for all possible values of a variable helps experts validate whether the model works as expected.
Knowledge generation: this information may help humans discover previously unknown effects of a variable on the process being modeled.
Model testing at boundaries: it is usually difficult to see how the model behaves at the boundaries or in uncommon regions of the data distribution. This plot can reveal inconsistent predictions in any region of a variable's range.
Ensure model consistency: a noisy average prediction along a variable may be a symptom of inconsistencies in the model, since a robust predictor is expected to give similar predictions for similar input values.
To check how a variable behaves, we need to consider all possible values. To do this efficiently, we first select a big enough subset of the data. Then, we expand this dataset so that every sample takes every possible value of the explored variable.
The result of this process is a dataset in which all possible values of the variable are considered along with different combinations of the remaining variables. Some of these combinations weren't even represented in the initial dataset. This will allow the explanation to check whether the model works properly even for rare input samples.
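The expansion step can be sketched with NumPy as below. This is a minimal illustration, assuming the data is a numeric 2-D array, the explored variable is column `j`, and its possible values are given as a finite grid; `expand_dataset` is a hypothetical helper, not part of any library.

```python
import numpy as np


def expand_dataset(X: np.ndarray, j: int, grid: np.ndarray) -> np.ndarray:
    """Hypothetical helper: repeat X once per grid value, overwriting column j.

    Block k of the result (rows k*n .. (k+1)*n - 1) contains every sample
    of X with column j set to grid[k], so the result has n * len(grid) rows.
    """
    n = X.shape[0]
    # Stack len(grid) copies of X on top of each other.
    expanded = np.tile(X, (len(grid), 1)).astype(float)
    # Repeat each grid value n times so it lines up with one stacked copy.
    expanded[:, j] = np.repeat(grid, n)
    return expanded


X = np.array([[1.0, 10.0],
              [2.0, 20.0]])
grid = np.array([0.0, 5.0, 9.0])
X_exp = expand_dataset(X, j=0, grid=grid)
print(X_exp.shape)  # (6, 2)
```

Rows such as `[9.0, 10.0]` never appeared in `X`, which is how the expansion exposes the model to combinations missing from the original data. The expanded dataset grows as n × len(grid) rows, which is one reason to start from a subset.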
Finally, we compute the predictions for the expanded dataset and average them separately for each value the variable can take. Thus, we obtain the average prediction over very different scenarios.
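The whole procedure (select a subset, fix the variable at each value, predict, average) can be sketched as follows. This is an illustrative sketch, not this library's implementation: `partial_dependence`, `subset_size`, `seed` and `toy_model` are hypothetical names, and the model is assumed to be any function mapping a 2-D NumPy array to a 1-D array of predictions.

```python
from typing import Callable, Optional

import numpy as np


def partial_dependence(
    predict: Callable[[np.ndarray], np.ndarray],
    X: np.ndarray,
    j: int,
    grid: np.ndarray,
    subset_size: Optional[int] = None,
    seed: int = 0,
) -> np.ndarray:
    """Hypothetical helper: average prediction for each value in `grid` of column j."""
    rng = np.random.default_rng(seed)
    # Select a random subset of samples to bound the cost of the expansion.
    if subset_size is not None and subset_size < X.shape[0]:
        rows = rng.choice(X.shape[0], size=subset_size, replace=False)
        X = X[rows]
    averages = []
    for z in grid:
        # Copy the subset and give every sample the same value z in column j.
        X_z = X.astype(float)
        X_z[:, j] = z
        # Average the model's predictions over all samples for this value.
        averages.append(float(np.mean(predict(X_z))))
    return np.array(averages)


def toy_model(A: np.ndarray) -> np.ndarray:
    # Hypothetical model: product of the two input columns.
    return A[:, 0] * A[:, 1]


X = np.array([[1.0, 2.0],
              [3.0, 4.0]])
pd_values = partial_dependence(toy_model, X, j=0, grid=np.array([0.0, 1.0, 2.0]))
print(pd_values)  # [0. 3. 6.]
```

Looping over the values and overwriting one column per iteration gives the same averages as predicting on the fully expanded dataset and grouping by value, while keeping only n rows in memory at a time. Predicting on the expanded dataset in a single call can be faster for models with high per-call overhead.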
We implement Partial Dependence (PD) Plots as presented by Friedman (2000).
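Formally, we define the PD profile of model $f$ for variable $j$ taking value $z$ as:

$$g^{j}_{PD}(z) = E_{\underline{X}^{-j}}\left[ f\left(\underline{X}^{j|=z}\right) \right],$$

where $\underline{X}^{j|=z}$ denotes a sample $\underline{X}$ whose $j$-th variable has been set to $z$.
In other words, it is the expectation of the model predictions when variable $j$ in $\underline{X}$ is fixed at $z$, taken over the joint distribution of all remaining variables ($\underline{X}^{-j}$).
Since the true joint distribution is usually unknown in machine learning problems, we estimate it with the empirical distribution of the $n$ samples $\underline{x}_1, \dots, \underline{x}_n$ in our dataset. This results in a simplified formulation using our existing dataset:

$$\hat{g}^{j}_{PD}(z) = \frac{1}{n} \sum_{i=1}^{n} f\left(\underline{x}_i^{j|=z}\right).$$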
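As a sanity check on the empirical estimator, consider a linear model $f(\underline{x}) = b_0 + \sum_k b_k x_k$. Averaging over the samples with variable $j$ fixed at $z$ gives the closed form $b_0 + b_j z + \sum_{k \neq j} b_k \bar{x}_k$, where $\bar{x}_k$ is the sample mean of variable $k$. The sketch below, with a hypothetical `pd_estimate` helper, made-up coefficients and synthetic data, confirms numerically that the two agree.

```python
import numpy as np


def pd_estimate(f, X: np.ndarray, j: int, z: float) -> float:
    """Empirical PD estimate: (1/n) * sum_i f(x_i with variable j set to z)."""
    X_z = X.astype(float)  # astype returns a copy, so X is left unchanged
    X_z[:, j] = z
    return float(np.mean(f(X_z)))


rng = np.random.default_rng(42)
X = rng.normal(size=(500, 3))  # synthetic data for illustration only
b0, b = 1.5, np.array([2.0, -1.0, 0.5])  # hypothetical coefficients


def linear_model(A: np.ndarray) -> np.ndarray:
    return b0 + A @ b


j, z = 1, 0.7
estimate = pd_estimate(linear_model, X, j, z)
# Closed form for a linear model: b0 + b_j * z + sum over k != j of b_k * mean(x_k).
others = [k for k in range(X.shape[1]) if k != j]
closed_form = b0 + b[j] * z + float(X[:, others].mean(axis=0) @ b[others])
print(np.isclose(estimate, closed_form))  # True
```

For non-linear models no such closed form exists in general, which is why the empirical average over the dataset is used instead.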
Friedman, Jerome H. 2000. “Greedy Function Approximation: A Gradient Boosting Machine.” Annals of Statistics 29: 1189–1232.