Categorical Model Explanation using the Python Client

This sample code shows, step by step, how to use the EXPAI Client with a categorical model.

Download the files

You can access the complete code here.
Download the dataset here.
Download the model here. Remember that we have a guide on creating a Pipeline model.

We highly recommend using Jupyter Lab to work with the EXPAI Client.

Imports

Remember to install our Python client by using pip install -U expai

from expai import ExpaiAccount
import numpy as np
import pandas as pd
import os

# Trust the notebook so that plots render correctly
!jupyter trust Categorical\ Model\ Explanation.ipynb

email = 'YOUR-EMAIL'
user_pass = 'YOUR-PASS'
expai_client = ExpaiAccount(email=email, user_pass=user_pass)

Create a Project

The first step to interact with your account is creating a Project.

expai_client.create_project("Fairness Project")

Interact with your Project

We will first obtain a Project object that will allow us to interact with its content and generate explanations.

project = expai_client.get_project(project_name = "Fairness Project")

Create a model

In this step, we use the model provided here. All parameters and their descriptions can be found in the Python client docs.

project.create_model(model_path=os.path.abspath("../models/fairness_pipeline.pkl"),
                     model_name="Potential model",
                     model_summary="Annual revenue over 100K$ model",
                     model_library='pickle',
                     model_objective='classification',
                     model_prediction_type="binary",
                     output_classes=["No", "Yes"])

Create a sample

In this step, we use the dataset provided here. All parameters and their descriptions can be found in the Python client docs.

# Sample - Categorical
project.create_sample(sample_path=os.path.abspath("../datasets/fairness.csv"),
                      sample_name="Dataset",
                      sample_encoding='utf-8',
                      sample_separator=";",
                      sample_target_col="target",
                      protected_columns=['race', 'sex'],
                      drop_columns=['race', 'sex'],
                      is_display=False)
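
Before uploading a sample, it can help to confirm locally that the file parses with the same separator and that the target and protected columns are present. The following is a minimal sketch using pandas only; the CSV text is a made-up stand-in for the real file, and the check itself is our suggestion rather than something the EXPAI client requires.

```python
import io

import pandas as pd

# Hypothetical stand-in for the contents of the CSV file (not the real dataset)
csv_text = "age;race;sex;target\n39;White;Male;No\n50;Black;Female;Yes\n"

# Parse with the same separator that is passed as sample_separator
check_df = pd.read_csv(io.StringIO(csv_text), sep=";")

# Columns referenced by sample_target_col and protected_columns
required = ["target", "race", "sex"]
missing = [col for col in required if col not in check_df.columns]
print(missing)  # []
```

For a real file, pass its path to pd.read_csv together with sep=";" and encoding="utf-8". If the separator is wrong, pandas typically reads the whole row as one column and every required name shows up in the missing list.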

Explain the model

Once we have added a model and a sample to our project, we can run all available explanations. First, we must generate a Model Explainer object for our model.

# Set global variables for explanation
model_name = "Potential model"
sample_name = "Dataset"

# Get Model Explainer object
explainer = project.get_model_explainer(model_name)

# None means the whole sample is used. The next section shows how to replace it with a subgroup.
subset_indexes = None

Defining a subgroup for explanations

This step can be skipped if you are not interested in defining a subgroup.

Sometimes we don't want to study how our model works for the whole dataset, but for a specific, meaningful subgroup. In this case, we could be interested in studying how the model behaves for people aged 40 or younger.

# Get the stored sample for filtering
df = project.get_sample(sample_name=sample_name)

# Filter the dataframe
subset = df[df['age']<=40]

# Obtain the indexes for our samples of interest
subset_indexes = list(subset.index)
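
To see what ends up in subset_indexes, the sketch below applies the same filter to a small hypothetical DataFrame. The values are invented for illustration and are not taken from the dataset above. Note that the resulting list holds index labels of the original dataframe, not positions within the filtered one.

```python
import pandas as pd

# Hypothetical toy data standing in for the stored sample
toy_df = pd.DataFrame(
    {"age": [25, 52, 40, 61, 33], "capital-gain": [0, 5000, 0, 1200, 300]}
)

# Same filter as above: keep rows with age <= 40
toy_subset = toy_df[toy_df["age"] <= 40]

# Index labels of the rows of interest
toy_subset_indexes = list(toy_subset.index)
print(toy_subset_indexes)  # [0, 2, 4]
```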

Generate Explanations and Plots

Once we have obtained the Model Explainer, we can use it to generate all available explanations for our model. Each time an explanation is generated, an Explanation object is returned.

Model Explanation

This explanation represents the importance of each variable for the predictions. The importance of a variable is computed as the increase in the prediction error when that variable is removed.

# Generate the explanation
exp_model = explainer.explain_model(sample_name=sample_name, subset_indexes=subset_indexes)

# Plot
exp_model.plot_all()
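
The idea of measuring importance as an increase in error can be illustrated without the client. A common way to approximate "removing" a variable is to shuffle its column so that it no longer carries information (permutation importance). The sketch below does this with NumPy on a hypothetical toy model and toy data; whether EXPAI computes importance this way internally is an assumption on our part.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: the target depends on the first column only
X = rng.normal(size=(500, 3))
y = (X[:, 0] > 0).astype(int)


def toy_model(X):
    """Hypothetical classifier: predicts 1 when the first column is positive."""
    return (X[:, 0] > 0).astype(int)


def error_rate(y_true, y_pred):
    """Fraction of misclassified rows."""
    return float(np.mean(y_true != y_pred))


def permutation_importance(model, X, y, n_repeats=5):
    """Mean increase in error rate after shuffling each column in turn."""
    baseline = error_rate(y, model(X))
    importances = []
    for col in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_shuffled = X.copy()
            X_shuffled[:, col] = rng.permutation(X_shuffled[:, col])
            increases.append(error_rate(y, model(X_shuffled)) - baseline)
        importances.append(float(np.mean(increases)))
    return importances


importances = permutation_importance(toy_model, X, y)
print(importances)
```

In this toy setup the first importance should be close to 0.5, because shuffling the only informative column reduces the model to chance, while the other two are exactly 0 because the model ignores them.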

Variable Explanation

In this case, we will explore the effect of a given variable on our predictions. In other words, we will represent the average prediction of the model as a function of that variable.

# Variables that we want to explain
variables = ['marital-status', 'capital-gain']

# Type for the variables (categorical or numerical)
variables_type = {'marital-status': 'categorical',
                 'capital-gain': 'numerical'}

# Generate explanation
exp_var = explainer.explain_variable_effect(sample_name=sample_name, variables=variables, variables_type=variables_type, subset_indexes=subset_indexes)

# Plot all of them
exp_var.plot_all()
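
One standard way to obtain an "average prediction as a function of a variable" is a partial-dependence style curve: force the variable to a fixed value for every row, average the predictions, and repeat over a grid of values. The sketch below shows this with NumPy on a hypothetical model. The function names are ours, and we are assuming, not asserting, that the client does something comparable.

```python
import numpy as np


def toy_predict_proba(X):
    """Hypothetical model: probability rises with column 0 and ignores column 1."""
    return 1.0 / (1.0 + np.exp(-X[:, 0]))


def average_prediction_curve(predict, X, col, grid):
    """Average prediction when column `col` is forced to each value in `grid`."""
    curve = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, col] = value
        curve.append(float(np.mean(predict(X_mod))))
    return curve


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
grid = [-2.0, 0.0, 2.0]

curve_col0 = average_prediction_curve(toy_predict_proba, X, col=0, grid=grid)
curve_col1 = average_prediction_curve(toy_predict_proba, X, col=1, grid=grid)
print(curve_col0)
print(curve_col1)
```

The curve for column 0 increases along the grid, while the curve for column 1 is flat, which is how an influential and a non-influential variable would look in such a plot.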

Explain a unique entry

This explanation shows the impact that each variable had on the prediction for a single entry in our dataset.

# Select the index to be explained. If a subgroup was defined, the index must belong to the filtered dataframe.
index_to_explain = 2

# Generate explanation
exp_sample = explainer.explain_sample(sample_name=sample_name, index=index_to_explain, subset_indexes=subset_indexes)

# Plot
exp_sample.plot_all()

WHAT IF

This explanation lets you see what would happen if we changed only one variable in the entry while keeping all remaining variables the same. It plots the prediction for each possible value of that variable.

# Select index to analyze
index_to_explain = 2

# Variables to analyze in this sample
variables = ['age']

# Type for the variables
variables_type = {'age': 'numerical'}

exp_what_if = explainer.what_if(sample_name=sample_name, index=index_to_explain, variables=variables, variables_type=variables_type, subset_indexes=subset_indexes)

# Plot
exp_what_if.plot_all()
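
Conceptually, a what-if analysis copies one entry many times, changes a single variable across the copies, and records the prediction for each copy. The following NumPy sketch illustrates that idea with a hypothetical model and a hypothetical entry. It is not the client's implementation.

```python
import numpy as np


def toy_predict_proba(X):
    """Hypothetical model over the columns [age, capital_gain]."""
    z = 0.05 * (X[:, 0] - 40.0) + 0.0002 * X[:, 1]
    return 1.0 / (1.0 + np.exp(-z))


def what_if_curve(predict, row, col, values):
    """Predictions for copies of `row` in which only column `col` changes."""
    rows = np.tile(row, (len(values), 1))
    rows[:, col] = values
    return predict(rows)


# Hypothetical entry: age 30, no capital gain
entry = np.array([30.0, 0.0])
ages = np.array([20.0, 40.0, 60.0])

what_if_preds = what_if_curve(toy_predict_proba, entry, col=0, values=ages)
print(what_if_preds)
```

Because only the age column varies, any difference between the returned predictions is attributable to age alone, and the original entry is left untouched.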

WHAT IF BATTLE

This explanation is similar to the previous one, but now we can replace as many values as we want at the same time and study how the prediction changes.

# Select index to analyze
index_to_explain = 2

# Values to be replaced 
replace_dict = {'age': 35,
               'capital-gain': 5000}

exp_what_if_battle = explainer.what_if_battle(sample_name=sample_name, index=index_to_explain, replace_dict=replace_dict, subset_indexes=subset_indexes)

# Plot
exp_what_if_battle.plot_all()
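
The comparison behind this explanation can be sketched with pandas: take one entry, overwrite several of its values from a dictionary, and compare the prediction before and after. The model, data and variable names below are hypothetical and only mirror the shape of replace_dict used above.

```python
import numpy as np
import pandas as pd


def toy_predict_proba(df):
    """Hypothetical model using the age and capital-gain columns."""
    z = 0.05 * (df["age"] - 40.0) + 0.0002 * df["capital-gain"]
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical toy data standing in for the stored sample
toy_df = pd.DataFrame({"age": [25, 52, 61], "capital-gain": [0, 0, 1200]})

# Entry to analyze, kept as a one-row dataframe
original = toy_df.loc[[2]]

# Values to be replaced, all at once
toy_replace_dict = {"age": 35, "capital-gain": 5000}

modified = original.copy()
for column, value in toy_replace_dict.items():
    modified[column] = value

p_original = float(toy_predict_proba(original).iloc[0])
p_modified = float(toy_predict_proba(modified).iloc[0])
print(p_original, p_modified)
```

Working on a copy keeps the original entry intact, so both predictions can be plotted side by side.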