by AryaXAI
xai_evals is a Python package for generating and benchmarking explanations of machine learning and deep learning models. It offers tools for creating and evaluating explanations of popular models and supports widely used explanation methods. The package aims to streamline model interpretability, allowing practitioners to gain insight into how their models make predictions, and it includes several metrics for assessing the quality of these explanations.
Technical Report: xai_evals: A Framework for Evaluating Post-Hoc Local Explanation Methods
To install xai_evals, you can use pip. Either install the package directly, or clone the repository to your local environment and install its dependencies:
pip install xai_evals
Name | Dataset | Link |
---|---|---|
Tabular ML Models Illustration and Evaluation Metrics | IRIS Dataset | Colab Link |
Tabular Deep Learning Model Illustration and Evaluation Metrics | Lending Club | Colab Link |
Image Deep Learning Model Illustration and Evaluation Metrics | CIFAR10 | Colab Link |
Supported machine learning models for the SHAPExplainer and LIMEExplainer classes are as follows:
Library | Supported Models |
---|---|
scikit-learn | LogisticRegression, RandomForestClassifier, SVC, SGDClassifier, GradientBoostingClassifier, AdaBoostClassifier, DecisionTreeClassifier, KNeighborsClassifier, GaussianNB, LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis, KMeans, NearestCentroid, BaggingClassifier, VotingClassifier, MLPClassifier, LogisticRegressionCV, RidgeClassifier, ElasticNet |
xgboost | XGBClassifier |
catboost | CatBoostClassifier |
lightgbm | LGBMClassifier |
sklearn.ensemble | HistGradientBoostingClassifier, ExtraTreesClassifier |
The SHAPExplainer
class allows you to compute and visualize SHAP values for your trained model. It supports various types of models, including tree-based models (e.g., RandomForest
, XGBoost
) and deep learning models (e.g., PyTorch models).
Example:
from xai_evals.explainer import SHAPExplainer
from sklearn.ensemble import RandomForestClassifier
import pandas as pd
from sklearn.datasets import load_iris
# Load dataset and train a model
data = load_iris()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target
model = RandomForestClassifier()
model.fit(X, y)
# Initialize SHAP explainer
shap_explainer = SHAPExplainer(model=model, features=X.columns, task="multiclass-classification", X_train=X)
# Explain a specific instance (e.g., the first instance in the test set)
shap_attributions = shap_explainer.explain(X, instance_idx=0)
# Print the feature attributions
print(shap_attributions)
Feature | Value | Attribution |
---|---|---|
petal_length_(cm) | 1.4 | 0.360667 |
petal_width_(cm) | 0.2 | 0.294867 |
sepal_length_(cm) | 5.1 | 0.023467 |
sepal_width_(cm) | 3.5 | 0.010500 |
The LIMEExplainer
class allows you to generate LIME explanations, which work by perturbing the input data and fitting a locally interpretable model.
Example:
from xai_evals.explainer import LIMEExplainer
from sklearn.linear_model import LogisticRegression
import pandas as pd
from sklearn.datasets import load_iris
# Load dataset and train a model
data = load_iris()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target
model = LogisticRegression(max_iter=200)
model.fit(X, y)
# Initialize LIME explainer
lime_explainer = LIMEExplainer(model=model, features=X.columns, task="multiclass-classification", X_train=X)
# Explain a specific instance (e.g., the first instance in the test set)
lime_attributions = lime_explainer.explain(X, instance_idx=0)
# Print the feature attributions
print(lime_attributions)
Feature | Value | Attribution |
---|---|---|
petal_length_(cm) | 1.4 | 0.497993 |
petal_width_(cm) | 0.2 | 0.213963 |
sepal_length_(cm) | 5.1 | 0.127047 |
sepal_width_(cm) | 3.5 | 0.053926 |
The LIMEExplainer and SHAPExplainer classes accept the following attributes:
Attribute | Description | Values |
---|---|---|
model | Trained model which you want to explain | [sklearn model] |
features | Features present in the Training/Testing Set | [list of features] |
X_train | Training Set Data | {pd.dataframe,numpy.array} |
task | Task performed by the model | {binary,multiclass} |
model_classes (Only for LIME) | List of Classes to be predicted by model | [list of classes] |
subset_samples (Only for SHAP) | Whether to use k-means-based sampling to select a background subset for the SHAP explainer | True/False |
subset_number (Only for SHAP) | Number of samples to sample if subset_samples is True | int |
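For large training sets, the SHAP-specific options above can shrink the background data passed to SHAP. A minimal sketch, reusing the model and data from the SHAPExplainer example above and assuming the same constructor; the parameter values here are illustrative:
from xai_evals.explainer import SHAPExplainer
# Reuses `model` and `X` from the SHAPExplainer example above
shap_explainer = SHAPExplainer(
    model=model,
    features=X.columns,
    task="multiclass-classification",
    X_train=X,
    subset_samples=True,   # summarize the background set with k-means sampling
    subset_number=50,      # illustrative number of background samples to keep
)
print(shap_explainer.explain(X, instance_idx=0))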
The TorchTabularExplainer class allows you to generate explanations for PyTorch deep learning models. Available explanation methods include 'integrated_gradients', 'deep_lift', 'gradient_shap', 'saliency', 'input_x_gradient', 'guided_backprop', 'shap_kernel', 'shap_deep', and 'lime'. A usage sketch follows the attribute table below.
Attribute | Description | Values |
---|---|---|
model | Trained Torch model which you want to explain | [Torch Model] |
method | Explanation method. Options:'integrated_gradients', 'deep_lift', 'gradient_shap','saliency', 'input_x_gradient', 'guided_backprop','shap_kernel', 'shap_deep', 'lime' | string |
X_train | Training Set Data | {pd.dataframe,numpy.array} |
feature_names | Features present in the Training/Testing Set | [list of features] |
task | Task performed by the model | {binary-classification,multiclass-classification} |
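A minimal usage sketch for TorchTabularExplainer, assuming the constructor arguments listed in the table above and an explain(data, instance_idx) call analogous to the SHAP/LIME examples; the small PyTorch model here is illustrative and untrained:
import torch.nn as nn
import pandas as pd
from sklearn.datasets import load_iris
from xai_evals.explainer import TorchTabularExplainer

# Load data
data = load_iris()
X = pd.DataFrame(data.data, columns=data.feature_names)

# A small (untrained) PyTorch classifier over the 4 iris features
model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 3))
model.eval()

# Initialize the explainer; attribute names follow the table above
explainer = TorchTabularExplainer(
    model=model,
    method="integrated_gradients",
    X_train=X,
    feature_names=list(X.columns),
    task="multiclass-classification",
)

# Assumed to mirror SHAPExplainer/LIMEExplainer: explain one instance by index
attributions = explainer.explain(X, instance_idx=0)
print(attributions)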
The TFTabularExplainer class allows you to generate explanations for TensorFlow/Keras deep learning models. Available explanation methods include 'shap_kernel', 'shap_deep', and 'lime'. A usage sketch follows the attribute table below.
Attribute | Description | Values |
---|---|---|
model | Trained Tf/Keras model which you want to explain | [Tf/Keras Model] |
method | Explanation method. Options:'shap_kernel', 'shap_deep', 'lime' | string |
X_train | Training Set Data | {pd.dataframe,numpy.array} |
feature_names | Features present in the Training/Testing Set | [list of features] |
task | Task performed by the model | {binary-classification,multiclass-classification} |
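A minimal usage sketch for TFTabularExplainer, assuming the constructor arguments listed in the table above and an explain(data, instance_idx) call analogous to the SHAP/LIME examples; the small Keras model is illustrative:
import tensorflow as tf
import pandas as pd
from sklearn.datasets import load_iris
from xai_evals.explainer import TFTabularExplainer

# Load data and fit a small Keras classifier over the 4 iris features
data = load_iris()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(X, y, epochs=1, verbose=0)

# Initialize the explainer; attribute names follow the table above
explainer = TFTabularExplainer(
    model=model,
    method="shap_kernel",
    X_train=X,
    feature_names=list(X.columns),
    task="multiclass-classification",
)

# Assumed to mirror SHAPExplainer/LIMEExplainer: explain one instance by index
attributions = explainer.explain(X, instance_idx=0)
print(attributions)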
The DlBacktraceTabularExplainer class is based on DLBacktrace, a method for analyzing neural networks by tracing the relevance of each component from output to input to understand how each part contributes to the final prediction. It offers two modes, Default and Contrast, and is compatible with TensorFlow and PyTorch (https://github.com/AryaXAI/DLBacktrace). A usage sketch follows the attribute table below.
Attribute | Description | Values |
---|---|---|
model | Trained Tf/Keras/Torch model which you want to explain | [Torch/Tf/Keras Model] |
method | Explanation method. Options:"default" or "contrastive" | string |
X_train | Training Set Data | {pd.dataframe,numpy.array} |
scaler | Total / Starting Relevance at the Last Layer | Integer (Default: 1) |
feature_names | Features present in the Training/Testing Set | [list of features] |
thresholding | Thresholding for Model Prediction | float (Default : 0.5) |
task | Task performed by the model | {binary-classification,multiclass-classification} |
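A minimal usage sketch for DlBacktraceTabularExplainer, assuming a simple sequential Keras model compatible with DLBacktrace, the constructor arguments listed in the table above, and an explain(data, instance_idx) call analogous to the other tabular explainers:
import tensorflow as tf
import pandas as pd
from sklearn.datasets import load_iris
from xai_evals.explainer import DlBacktraceTabularExplainer

# Load data and fit a small Keras classifier; DLBacktrace traces relevance layer by layer
data = load_iris()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(X, y, epochs=1, verbose=0)

# Initialize the explainer; attribute names follow the table above
explainer = DlBacktraceTabularExplainer(
    model=model,
    method="default",        # or "contrastive"
    X_train=X,
    scaler=1,                # total relevance assigned at the last layer
    feature_names=list(X.columns),
    thresholding=0.5,
    task="multiclass-classification",
)

# Assumed to mirror the other tabular explainers: explain one instance by index
attributions = explainer.explain(X, instance_idx=0)
print(attributions)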
The TorchImageExplainer
class allows you to generate explanations for PyTorch-based CNN models. This class wraps around several attribution methods available in Captum, including:
- Integrated Gradients
- Saliency
- DeepLift
- GradientShap
- GuidedBackprop
- Occlusion
- LayerGradCam
Example:
import torch
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
from xai_evals.explainer import TorchImageExplainer
from torchvision import models
import numpy as np
import matplotlib.pyplot as plt
# Load CIFAR-10 dataset
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
trainloader = DataLoader(trainset, batch_size=4, shuffle=True)
# Load pre-trained ResNet model
model = models.resnet18(pretrained=True)
model.eval()
# Initialize the TorchImageExplainer
explainer = TorchImageExplainer(model)
# Example 1: Explain using DataLoader (batch of images)
idx = 0 # Index for the image in the DataLoader
method = "integrated_gradients"
task = "classification"
attribution_map = explainer.explain(trainloader, idx, method, task)
# Visualize attribution map (simplified)
plt.imshow(attribution_map)
plt.title(f"Attribution Map - {method} for Dataloader Torch")
plt.show()
# Example 2: Explain using a single image (torch.Tensor)
single_image_tensor = torch.randn(3, 32, 32) # Random image as a tensor, [C, H, W]
attribution_map = explainer.explain(single_image_tensor, idx=None, method=method, task=task)
# Visualize attribution map for the single image
plt.imshow(attribution_map)
plt.title(f"Attribution Map - {method} for Single Image (Tensor)")
plt.show()
# Example 3: Explain using a single image (np.ndarray)
single_image_numpy = np.random.randn(3, 32, 32) # Random image as a NumPy array, [C, H, W]
attribution_map = explainer.explain(single_image_numpy, idx=None, method=method, task=task)
# Visualize attribution map for the single image (NumPy)
plt.imshow(attribution_map)
plt.title(f"Attribution Map - {method} for Single Image (NumPy)")
plt.show()
Attribute | Description | Values |
---|---|---|
testdata | The input data, which can be a DataLoader, NumPy array, or Tensor. | [torch.utils.data.DataLoader, np.ndarray, torch.Tensor] |
idx | The index of the test sample to explain. | int or None (for explaining a single sample or all samples) |
method | The explanation method to use. | {grad_cam, integrated_gradients, saliency, deep_lift, gradient_shap, guided_backprop, occlusion, layer_gradcam, feature_ablation} |
task | The type of model task (e.g., classification). | {classification} |
The TFImageExplainer
class provides a similar functionality for TensorFlow/Keras-based models, allowing you to generate explanations for images using methods like GradCAM and Occlusion Sensitivity.
Example:
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.datasets import cifar10
import numpy as np
import matplotlib.pyplot as plt
from xai_evals.explainer import TFImageExplainer
# Step 1: Define a Custom CNN Model
def create_custom_model():
model = models.Sequential([
layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),
layers.MaxPooling2D((2, 2)),
layers.Conv2D(64, (3, 3), activation='relu'),
layers.MaxPooling2D((2, 2)),
layers.Conv2D(64, (3, 3), activation='relu'),
layers.Flatten(),
layers.Dense(64, activation='relu'),
layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
return model
# Load CIFAR-10 dataset
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
# Normalize pixel values to be between 0 and 1
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0
# Create a TensorFlow Dataset from the test data
test_dataset = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(32)
# Initialize and train the custom model
model = create_custom_model()
# Train the model
model.fit(x_train, y_train, epochs=1, batch_size=64)
# Step 2: Use the TFImageExplainer with the Custom Model
explainer = TFImageExplainer(model)
# Example 1: Explain a Single Image (NumPy Array)
image = x_test[0] # Select the first image
label = y_test[0] # Get the label for the first image
# Generate the Grad-CAM explanation for the image
attribution_map = explainer.explain(image, idx=None, method="grad_cam", task="classification")
# Visualize the attribution map
plt.imshow(attribution_map, cmap="jet")
plt.colorbar()
plt.title("Grad-CAM Attribution Map for CIFAR-10 Image")
plt.show()
# Example 2: Explain an Image from the TensorFlow Dataset (Using idx)
idx = 10 # Select the 10th image from the test dataset
# Generate the Grad-CAM explanation for the image at index `idx`
attribution_map = explainer.explain(test_dataset, idx, method="grad_cam", task="classification")
# Visualize the attribution map
plt.imshow(attribution_map, cmap="jet")
plt.colorbar()
plt.title(f"Grad-CAM Attribution Map for Image Index {idx} in CIFAR-10")
plt.show()
Attribute | Description | Values |
---|---|---|
testset | The input data, which can be a NumPy array, TensorFlow tensor, or Dataset. | [np.ndarray, tf.Tensor, tf.data.Dataset] |
idx | The index of the test sample to explain. | int or None (for explaining a single sample or all samples) |
method | The explanation method to use. | {grad_cam, occlusion} |
task | The type of model task (e.g., classification). | {classification} |
label | The class label for the input sample (used for classification tasks). | int |
The DlBacktraceImageExplainer class is based on DLBacktrace, a method for analyzing neural networks by tracing the relevance of each component from output to input to understand how each part contributes to the final prediction. It offers two modes, Default and Contrast, and is compatible with TensorFlow and PyTorch (https://github.com/AryaXAI/DLBacktrace).
Example: Tensorflow Model DlBacktraceImageExplainer
# Imports for this example (the explainer import path is assumed to match the other explainers in this README)
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.datasets import cifar10
import matplotlib.pyplot as plt
from xai_evals.explainer import DlBacktraceImageExplainer
# Load CIFAR-10 data
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
# Normalize pixel values to be between 0 and 1
x_train, x_test = x_train / 255.0, x_test / 255.0
# Create a simple CNN model for CIFAR-10
def create_cnn_model():
model = models.Sequential([
layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),
layers.MaxPooling2D((2, 2)),
layers.Conv2D(64, (3, 3), activation='relu'),
layers.MaxPooling2D((2, 2)),
layers.Conv2D(64, (3, 3), activation='relu'),
layers.Flatten(),
layers.Dense(64, activation='relu'),
layers.Dense(10)
])
model.compile(optimizer='adam',
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=['accuracy'])
return model
# Create the model
model = create_cnn_model()
# Train the model
model.fit(x_train, y_train, epochs=10, validation_data=(x_test, y_test))
# Save the model for later use
model.save('cifar10_cnn_model.h5')
explainer = DlBacktraceImageExplainer(model=model)
# Choose an image from the test set
test_image = x_test[0:1] # Selecting the first image for testing
# Get the explanation for the test image
explanation = explainer.explain(test_image, instance_idx=0,mode='default', scaler=1, thresholding=0, task='multi-class-classification')
# Plot the explanation (relevance map)
plt.imshow(explanation, cmap='hot')
plt.colorbar()
plt.title("Feature Relevance for CIFAR-10 Image")
plt.show()
Example: Torch Model DlBacktraceImageExplainer
# Imports for this example (the explainer import path is assumed to match the other explainers in this README)
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt
from xai_evals.explainer import DlBacktraceImageExplainer
# Define a simple CNN model for CIFAR-10 without using `view()`
class SimpleCNN(nn.Module):
def __init__(self, num_classes=10):
super(SimpleCNN, self).__init__()
self.identity = nn.Identity()
self.conv1 = nn.Conv2d(3, 16, 5,2)
self.relu1 = nn.ReLU()
self.conv2 = nn.Conv2d(16, 32, 3,2)
self.relu2 = nn.ReLU()
self.flatten = nn.Flatten()
self.fc1 = nn.Linear(32 * 6 * 6, 512)
self.relu3 = nn.ReLU()
self.fc2 = nn.Linear(512, num_classes)
def forward(self, x):
x = self.identity(x)
x = self.conv1(x)
x = self.relu1(x)
x = self.conv2(x)
x = self.relu2(x)
x = self.flatten(x)
x = self.fc1(x)
x = self.relu3(x)
x = self.fc2(x)
return x
# Load CIFAR-10 data with transforms for normalization
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
trainloader = DataLoader(trainset, batch_size=4, shuffle=True)
testloader = DataLoader(testset, batch_size=4, shuffle=False)
# Initialize and train the model
model = SimpleCNN()
model.train()
# Define loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
# Training loop
for epoch in range(1): # Just a couple of epochs for testing
running_loss = 0.0
for inputs, labels in trainloader:
optimizer.zero_grad()
outputs = model(inputs)
loss = criterion(outputs, labels)
loss.backward()
optimizer.step()
running_loss += loss.item()
print("Finished Training")
# Explain the model's predictions using the DlBacktraceImageExplainer
explainer = DlBacktraceImageExplainer(model=model)
# Get the explanation for the first image
explanation = explainer.explain(testloader, instance_idx=0, mode='default', scaler=1, thresholding=0, task='multi-class-classification')
# Plot the explanation (relevance map)
plt.imshow(explanation, cmap='hot')
plt.colorbar()
plt.title("Feature Relevance for CIFAR-10 Image")
plt.show()
Attribute | Description | Values |
---|---|---|
test_data | The input data, which can be a NumPy array, TensorFlow tensor, or Dataset. | [np.ndarray, tf.Tensor, tf.data.Dataset] |
instance_idx | The index of the test sample to explain. | int (explaining a single sample) |
mode | The explanation mode to use. | {default, contrast} |
task | The type of model task (e.g., classification). | {binary-classification, multiclass-classification} |
scaler | Total/starting relevance at the last layer. | float (Default: None, Preferred: 1) |
thresholding | Thresholding on the model prediction to determine the predicted class. | float |
contrast_mode | Mode to use when using the 'contrast' mode of the DLBacktrace algorithm. | {Positive, Negative} |
The xai_evals
package provides a powerful class, ExplanationMetricsTabular
, to evaluate the quality of explanations generated by SHAP and LIME. This class allows you to calculate several metrics, helping you assess the robustness, reliability, and interpretability of your model explanations. [NOTE: Metrics currently support scikit-learn ML models only.]
The ExplanationMetricsTabular
class in xai_evals
provides a structured way to evaluate the quality and reliability of explanations generated by SHAP or LIME for machine learning models. By assessing multiple metrics, you can better understand how well these explanations align with your model's predictions and behavior.
- Initialize ExplanationMetricsTabular: Begin by creating an instance of the ExplanationMetricsTabular class with the necessary inputs, including the model, explainer type, dataset, and the task type.
from xai_evals.metrics import ExplanationMetricsTabular
from xai_evals.explainer import SHAPExplainer
from sklearn.ensemble import RandomForestClassifier
import pandas as pd
from sklearn.datasets import load_iris

# Load dataset and train a model
data = load_iris()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target
model = RandomForestClassifier()
model.fit(X, y)

# Initialize ExplanationMetrics with SHAP explainer
explanation_metrics = ExplanationMetricsTabular(
    model=model,
    explainer_name="shap",
    X_train=X,
    X_test=X,
    y_test=y,
    features=X.columns,
    task="binary"
)
The ExplanationMetricsTabular class accepts the following attributes:
Attribute | Description | Values |
---|---|---|
model | Trained model which you want to explain | [trained model] |
X_train | Training Set Data | {pd.dataframe,numpy.array} |
explainer_name | Which explanation method to use | {'shap','lime','torch','tensorflow', 'backtrace'} |
X_test | Test Set Data | {pd.dataframe,numpy.array} |
y_test | Test Set Labels | pd.dataseries |
features | Features present in the Training/Testing Set | [list of features] |
task | Task performed by the model | {binary-classification,multiclass-classification} |
metrics | List of metrics to calculate | ['faithfulness', 'infidelity', 'sensitivity', 'comprehensiveness', 'sufficiency', 'monotonicity', 'complexity', 'sparseness'] |
method | Which explanation method to use for the Torch/TensorFlow/Backtrace explainer | Torch-{'integrated_gradients', 'deep_lift', 'gradient_shap', 'saliency', 'input_x_gradient', 'guided_backprop', 'shap_kernel', 'shap_deep', 'lime'}, Tensorflow-{'shap_kernel', 'shap_deep', 'lime'}, Backtrace-{'Default', 'Contrastive'} |
start_idx | Starting index of the dataset to evaluate | int |
end_idx | Ending index of the dataset to evaluate | int |
scaler | Total/starting relevance at the last layer (for Backtrace) | int (Default: None, Preferred: 1) |
thresholding | Thresholding Model Prediction | float (default=0.5) |
subset_samples | Whether to use k-means-based sampling to select a background subset for the SHAP explainer (Only for SHAP) | True/False |
subset_number | Number of samples to sample if subset_samples is True (Only for SHAP) | int |
- Calculate Explanation Metrics: Use the calculate_metrics method to compute various metrics for evaluating explanations. The method returns a DataFrame with the results.
# Calculate metrics
metrics_df = explanation_metrics.calculate_metrics()
print(metrics_df)
The ExplanationMetricsTabular class supports the following key metrics for evaluating explanations:
Metric | Purpose | Description |
---|---|---|
Faithfulness | Measures consistency between attributions and prediction changes. | Correlation between attribution values and changes in model output when features are perturbed. |
Infidelity | Assesses how closely attributions align with the actual prediction impact. | Squared difference between predicted and actual impact when features are perturbed. |
Sensitivity | Evaluates the robustness of attributions to small changes in inputs. | Compares attribution values before and after perturbing input features. |
Comprehensiveness | Assesses the explanatory power of the top-k features. | Measures how much model prediction decreases when top-k important features are removed. |
Sufficiency | Determines whether top-k features alone are sufficient to explain the model's output. | Compares predictions based only on the top-k features to baseline predictions. |
Monotonicity | Verifies the consistency of attribution values with the direction of predictions. | Ensures that changes in attributions match consistent changes in predictions. |
Complexity | Measures the sparsity of explanations. | Counts the number of features with non-zero attribution values. |
Sparseness | Assesses how minimal the explanation is. | Calculates the proportion of features with zero attribution values. |
Reference Values for Available Metrics:
Metric | Typical Range | Interpretation | "Better" Direction |
---|---|---|---|
Faithfulness | -1 to 1 | Measures correlation between attributions and changes in model output when removing features. Higher indicates that more important features (according to the explanation) indeed cause larger changes in the model’s prediction. | Higher is better (closer to 1) |
Infidelity | ≥ 0 | Measures how well attributions predict changes in the model’s output under input perturbations. Lower infidelity means the attributions closely match the model’s behavior under perturbations. | Lower is better (closer to 0) |
Sensitivity | ≥ 0 | Measures how stable attributions are to small changes in the input. Lower values mean more stable (robust) explanations. | Lower is better (closer to 0) |
Comprehensiveness | Depends on model output | Measures how much the prediction drops when the top-k most important features are removed. If removing them significantly decreases the prediction, it suggests these features are truly important. | Higher difference indicates more comprehensive explanations |
Sufficiency | Depends on model output | Measures how well the top-k features alone approximate or even match the original prediction. A higher (or less negative) value means these top-k features are sufficient on their own, capturing most of what the model uses. | Higher (or closer to zero if baseline is the original prediction) is generally better |
Monotonicity | 0 to 1 (as an average) | Checks if attributions are in a non-increasing order. A higher average indicates that the explanation presents a consistent ranking of feature importance. | Higher is better (closer to 1) |
Complexity | Depends on number of features | Measures the number of non-zero attributions. More features with non-zero attributions mean a more complex explanation. Fewer important features make it easier to interpret. | Lower is typically preferred |
Sparseness | 0 to 1 | Measures the fraction of attributions that are zero. Higher sparseness means fewer features are highlighted, making the explanation simpler. | Higher is generally preferred |
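To make the perturbation-based definitions above concrete, here is a short, package-independent sketch of how a faithfulness-style correlation can be computed for a single instance. It is a simplified illustration, not the package's internal implementation, and it uses the model's global feature importances as a stand-in for per-instance attributions:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data = load_iris()
X, y = data.data, data.target
model = RandomForestClassifier(random_state=0).fit(X, y)

x = X[0]                        # instance to explain
cls = model.predict([x])[0]     # predicted class
baseline = X.mean(axis=0)       # per-feature baseline values

# Stand-in attributions (a real run would use per-instance SHAP/LIME values)
attributions = model.feature_importances_

# Drop in the predicted-class probability when each feature is set to its baseline
drops = []
for j in range(X.shape[1]):
    x_pert = x.copy()
    x_pert[j] = baseline[j]
    drops.append(model.predict_proba([x])[0][cls] - model.predict_proba([x_pert])[0][cls])

# Faithfulness-style score: correlation between attributions and prediction drops
print("Illustrative faithfulness correlation:", np.corrcoef(attributions, drops)[0, 1])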
1. Faithfulness Correlation
- Correlates feature attributions with prediction changes when features are perturbed.
- Higher correlation indicates that the explanation aligns well with model predictions.
faithfulness_score = explanation_metrics.calculate_metrics()['faithfulness']
print("Faithfulness:", faithfulness_score)
2. Infidelity
- Computes the squared difference between predicted and actual changes in model output.
- Lower scores indicate higher alignment of explanations with model behavior.
infidelity_score = explanation_metrics.calculate_metrics()['infidelity']
print("Infidelity:", infidelity_score)
3. Comprehensiveness
- Evaluates whether removing the top-k features significantly reduces the model's prediction confidence.
- A higher score indicates that the top-k features are critical for the prediction.
comprehensiveness_score = explanation_metrics.calculate_metrics()['comprehensiveness']
print("Comprehensiveness:", comprehensiveness_score)
After calculating the metrics, the method returns a DataFrame summarizing the results:
Metric | Value |
---|---|
Faithfulness | 0.89 |
Infidelity | 0.05 |
Sensitivity | 0.13 |
Comprehensiveness | 0.62 |
Sufficiency | 0.45 |
Monotonicity | 1.00 |
Complexity | 7 |
Sparseness | 0.81 |
The xai_evals
package provides a powerful class, ExplanationMetricsImage
, to evaluate the quality of explanations generated for image-based deep learning models. This class allows you to calculate several metrics, helping you assess the robustness, reliability, and interpretability of your image explanations. [NOTE: Metrics currently support image-based deep learning models built with PyTorch and TensorFlow.]
The ExplanationMetricsImage
class in xai_evals
provides a structured way to evaluate the quality and reliability of image-based explanations, such as GradCAM, Integrated Gradients, and Occlusion. By assessing multiple metrics, you can better understand how well these image explanations align with your model's predictions and behavior. This class uses Quantus to calculate the various metrics for evaluating explanations.
- Initialize ExplanationMetricsImage: Begin by creating an instance of the ExplanationMetricsImage class with the necessary inputs, including the model, dataset, and evaluation settings.
- Evaluate Explanation Metrics: Use the evaluate method to compute various metrics for evaluating image-based explanations. The method returns a dictionary with the results.
import torch
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
from tensorflow.keras.datasets import cifar10
from xai_evals.metrics import ExplanationMetricsImage
from torchvision import models
import tensorflow as tf
import numpy as np
import torch.optim as optim
import tensorflow.keras as keras

# --- TensorFlow Setup ---
# Load CIFAR-10 dataset (for TensorFlow example)
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # Normalize the images
train_data = (x_train, y_train)  # Tuple of data and labels
test_data = (x_test, y_test)  # Tuple of data and labels

# Convert to TensorFlow Dataset
train_dataset_tf = tf.data.Dataset.from_tensor_slices(train_data).batch(32)
test_dataset_tf = tf.data.Dataset.from_tensor_slices(test_data).batch(32)

# --- PyTorch Setup ---
# PyTorch Dataset for CIFAR-10
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
trainloader = DataLoader(trainset, batch_size=4, shuffle=True)

# --- Custom Model Setup ---
# Custom PyTorch model (simple CNN for CIFAR-10)
class SimpleCNN(torch.nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        self.conv1 = torch.nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1)
        self.conv2 = torch.nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1)
        self.fc1 = torch.nn.Linear(64*8*8, 128)
        self.fc2 = torch.nn.Linear(128, 10)  # 10 classes for CIFAR-10

    def forward(self, x):
        x = torch.relu(self.conv1(x))
        x = torch.max_pool2d(x, 2)
        x = torch.relu(self.conv2(x))
        x = torch.max_pool2d(x, 2)
        x = x.view(x.size(0), -1)
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)
        return x

# --- TensorFlow Model Setup ---
model_tf = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, kernel_size=3, activation='relu', input_shape=(32, 32, 3)),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
    tf.keras.layers.Conv2D(64, kernel_size=3, activation='relu'),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10)  # 10 classes for CIFAR-10
])

# Compile the model for training
model_tf.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# --- TensorFlow Model Training (1 Epoch) ---
model_tf.fit(train_dataset_tf, epochs=1)
print("Finished TensorFlow Training")

# Initialize PyTorch model
model_torch = SimpleCNN()
model_torch.train()  # Set model to training mode

# --- Training PyTorch Model for 1 Epoch ---
criterion = torch.nn.CrossEntropyLoss()
optimizer = optim.SGD(model_torch.parameters(), lr=0.001, momentum=0.9)

for epoch in range(1):  # Training for 1 epoch
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data
        optimizer.zero_grad()
        outputs = model_torch(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        if i % 2000 == 1999:  # Print every 2000 mini-batches
            print(f"[{epoch + 1}, {i + 1}] loss: {running_loss / 2000:.3f}")
            running_loss = 0.0
print("Finished PyTorch Training")

# --- Example 1: PyTorch Metrics Calculation ---
metrics_image_pytorch = ExplanationMetricsImage(
    model=model_torch,
    data_loader=trainloader,
    framework="torch",
    num_classes=10
)

# Example: Calculate metrics using the PyTorch DataLoader
metrics_results_pytorch = metrics_image_pytorch.evaluate(
    start_idx=0,
    end_idx=32,
    metric_names=["FaithfulnessCorrelation", "MaxSensitivity", "MPRT", "SmoothMPRT", "AvgSensitivity", "FaithfulnessEstimate"],
    xai_method_name="IntegratedGradients"
)
print("PyTorch Example Metrics:", metrics_results_pytorch)

# --- Example 2: TensorFlow Metrics Calculation ---
metrics_image_tensorflow = ExplanationMetricsImage(
    model=model_tf,  # Use TensorFlow model for TensorFlow example
    data_loader=train_dataset_tf,
    framework="tensorflow",
    num_classes=10
)

# Example: Calculate metrics using the TensorFlow Dataset
metrics_results_tensorflow = metrics_image_tensorflow.evaluate(
    start_idx=0,
    end_idx=32,
    metric_names=["FaithfulnessCorrelation", "MaxSensitivity", "MPRT", "SmoothMPRT", "AvgSensitivity", "FaithfulnessEstimate"],
    xai_method_name="GradCAM"
)
print("TensorFlow Example Metrics:", metrics_results_tensorflow)

# --- Example 3: Explain using a single image (NumPy array) ---
single_image_numpy = np.random.randn(1, 3, 32, 32)  # Random image as a NumPy array, [N, C, H, W]
label = np.random.randint(0, 9, size=1)

# Initialize ExplanationMetricsImage for a single image (use the PyTorch framework even for a NumPy array)
metrics_image_single = ExplanationMetricsImage(
    model=model_torch,  # Use PyTorch model
    data_loader=(single_image_numpy, label),  # Pass the single image as a NumPy array
    framework="torch",  # Use the torch framework for a single image
    num_classes=10,
)

# Calculate metrics for the single image
metrics_single_image = metrics_image_single.evaluate(
    start_idx=0,
    end_idx=1,
    metric_names=["FaithfulnessCorrelation", "MaxSensitivity", "MPRT", "SmoothMPRT", "AvgSensitivity", "FaithfulnessEstimate"],
    xai_method_name="IntegratedGradients"
)
print("Single Image Example Metrics:", metrics_single_image)

# --- Example 4: TensorFlow Model with Single Image ---
single_image_numpy = np.random.randn(1, 32, 32, 3)  # Random image as a NumPy array, [N, H, W, C]
label = np.random.randint(0, 9, size=1)

# For TensorFlow, the single-image example using the TensorFlow framework
metrics_image_single_tf = ExplanationMetricsImage(
    model=model_tf,  # Use TensorFlow model
    data_loader=(single_image_numpy, label),  # Pass the single image as a NumPy array
    framework="tensorflow",  # Use the tensorflow framework for a single image
    num_classes=10
)

# Calculate metrics for the single image
metrics_single_image_tf = metrics_image_single_tf.evaluate(
    start_idx=0,
    end_idx=1,
    metric_names=["FaithfulnessCorrelation", "MaxSensitivity", "MPRT", "SmoothMPRT", "AvgSensitivity", "FaithfulnessEstimate"],
    xai_method_name="GradCAM"
)
print("TensorFlow Single Image Example Metrics:", metrics_single_image_tf)
The ExplanationMetricsImage
class supports the following key metrics for evaluating image explanations:
Metric | Purpose | Description |
---|---|---|
FaithfulnessCorrelation | Measures the correlation between attribution values and model output changes when perturbing image features. | Higher values indicate that important features (according to the explanation) indeed cause significant changes in the model’s prediction. |
MaxSensitivity | Measures the maximum sensitivity of an attribution method to input perturbations. | Higher values suggest that the attribution method highlights the most sensitive parts of the image. |
MPRT | Measures the relevance of features based on perturbations. | Helps evaluate the robustness of the explanation when features are perturbed. |
SmoothMPRT | A smoother version of MPRT that reduces noise from perturbations. | Ensures more stable results by averaging perturbations. |
AvgSensitivity | Measures the average sensitivity of the model to input perturbations across all features. | Indicates how sensitive the model is to small changes in the input. |
FaithfulnessEstimate | Estimates the faithfulness of the attribution by comparing against a perturbation baseline. | Compares how well the explanation reflects the model’s behavior under feature perturbations. |
Reference Values for Available Metrics:
Metric | Typical Range | Interpretation | "Better" Direction |
---|---|---|---|
FaithfulnessCorrelation | -1 to 1 | Measures correlation between attribution values and changes in model output when features are perturbed. Higher indicates that more important features (according to the explanation) indeed cause larger changes in the model’s prediction. | Higher is better (closer to 1) |
MaxSensitivity | ≥ 0 | Measures how well attributions match model sensitivity when perturbing image features. Lower scores indicate that the explanations focus on the most sensitive features. | Lower is better (closer to 0) |
MPRT | ≥ 0 | Measures how the perturbation of features affects the model’s prediction. A higher score indicates that the model's prediction is heavily influenced by the perturbed features. | Higher is better |
SmoothMPRT | ≥ 0 | Measures the stability of MPRT under perturbation noise. Higher values suggest more stable explanations. | Higher is better |
AvgSensitivity | ≥ 0 | Measures the average change in prediction for small changes in input features. Indicates model robustness. | Lower is better |
FaithfulnessEstimate | 0 to 1 | Compares model predictions under perturbations and attributions. Higher values indicate better alignment. | Higher is better |
Attribute | Description | Values |
---|---|---|
model | The trained model for which explanations will be evaluated. | [PyTorch model, TensorFlow model] |
data_loader | The data loader or dataset containing the test data. | [PyTorch Dataset, PyTorch DataLoader, TensorFlow Dataset, tuple of (image as np.ndarray/torch.Tensor/tf.Tensor, label as np.ndarray/torch.Tensor/tf.Tensor)] |
framework | The framework used for the model ('torch', 'tensorflow', or 'backtrace'). | {'torch', 'tensorflow', 'backtrace'} |
device | The device (CPU/GPU) used for performing computations (for PyTorch models). | [torch.device (Optional)] |
num_classes | The number of classes for classification tasks. | Integer (default: 10) |
Attribute | Description | Values |
---|---|---|
start_idx | The starting index of the batch for evaluation. | Integer (e.g., 0) |
end_idx | The ending index of the batch for evaluation. | Integer (e.g., 100, or None for the entire batch) |
metric_names | The list of metric names to evaluate. | List of strings representing the metrics to compute (e.g., ["FaithfulnessCorrelation", "MaxSensitivity", "MPRT", "SmoothMPRT", "AvgSensitivity", "FaithfulnessEstimate"]) |
xai_method_name | The name of the XAI method used for explanations (e.g., 'IntegratedGradients', 'GradCAM'). | String; for Torch: {grad_cam, integrated_gradients, saliency, deep_lift, gradient_shap, guided_backprop, occlusion, layer_gradcam, feature_ablation}; for TensorFlow: {VanillaGradients, GradCAM, GradientsInput, IntegratedGradients, OcclusionSensitivity, SmoothGrad}; for Backtrace: {default, contrast-positive, contrast-negative} |
1. Faithfulness Correlation
- Correlates feature attributions with prediction changes when features (pixels) in the image are perturbed.
- Higher correlation indicates that the explanation aligns well with model predictions.
faithfulness_score = metrics_image.evaluate(
start_idx=0, end_idx=5, metric_names=["FaithfulnessCorrelation"], xai_method_name="IntegratedGradients"
)['FaithfulnessCorrelation']
print("Faithfulness:", faithfulness_score)
2. Max Sensitivity
- Measures the sensitivity of the explanation method by observing the effect of perturbing different parts of the image.
- A higher score indicates that the explanation method is sensitive to the most influential pixels.
max_sensitivity_score = metrics_image.evaluate(
start_idx=0, end_idx=5, metric_names=["MaxSensitivity"], xai_method_name="IntegratedGradients"
)['MaxSensitivity']
print("Max Sensitivity:", max_sensitivity_score)
After calculating the metrics, the method returns a dictionary summarizing the results:
Metric | Value |
---|---|
FaithfulnessCorrelation | 0.88 |
MaxSensitivity | 0.92 |
- Interpretability: Quantifies how well feature attributions explain the model's predictions.
- Robustness: Evaluates the stability of explanations under input perturbations.
- Comprehensiveness and Sufficiency: Provides insights into the contribution of top features to the model’s predictions.
- Scalability: Works with various tasks, including binary classification, multi-class classification, and regression.
By leveraging these metrics, you can ensure that your explanations are meaningful, robust, and align closely with your model's decision-making process.
We would like to extend our heartfelt thanks to the developers and contributors of the libraries Quantus, Captum, tf-explain, LIME, and SHAP, which have been instrumental in enabling the explainability methods implemented in this package.
-
Quantus provides a comprehensive suite of metrics that allow us to evaluate and assess the quality of explanations, ensuring that our interpretability methods are both reliable and robust.
-
Captum is an invaluable tool for PyTorch users, offering a variety of powerful attribution methods like Integrated Gradients, Saliency, and Gradient Shap, which are crucial for generating insights into the inner workings of deep learning models.
-
tf-explain simplifies the process of explaining TensorFlow/Keras models, with methods like GradCAM and Occlusion Sensitivity, enabling us to generate visual explanations that help interpret the decision-making of complex models.
-
LIME (Local Interpretable Model-Agnostic Explanations) has been a key library for providing local explanations for machine learning models, allowing us to generate understandable explanations for individual predictions.
-
SHAP (SHapley Additive exPlanations) is essential for computing Shapley values and provides a unified approach to explaining machine learning models, making it easier to understand feature contributions across a range of model types.
We are deeply grateful for the contributions these libraries have made in advancing model interpretability, and their seamless integration in our package ensures that users can leverage state-of-the-art methods for understanding machine learning and deep learning models.
This project is licensed under the MIT License - see the LICENSE file for details.
In the future, we will continue to improve this library.
This code is free, so if you use it anywhere, please cite us:
@misc{seth2025xaievalsframeworkevaluating,
title={xai_evals : A Framework for Evaluating Post-Hoc Local Explanation Methods},
author={Pratinav Seth and Yashwardhan Rathore and Neeraj Kumar Singh and Chintan Chitroda and Vinay Kumar Sankarapu},
year={2025},
eprint={2502.03014},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2502.03014},
}
Contact us at AryaXAI.