Extended Isolation Forest Feature Importance (ExIFFI) is a model-specific, post-hoc interpretability method that explains the predictions of the popular Extended Isolation Forest (EIF) Anomaly Detection (AD) algorithm.
ExIFFI was introduced in the paper Enhancing interpretability and generalizability in extended isolation forests
by Arcudi et al. (2024). This paper also introduces EIF+, an extension of the EIF AD model that improves AD performance, especially in one-class settings (i.e. normal training data, contaminated test data).
In the official paper, ExIFFI is presented from a methodological perspective and benchmarked on a set of hand-crafted synthetic datasets and on real-world data from the Outlier Detection DataSets (ODDS) library. These datasets, however, are not representative of real industrial applications.
For this reason, ExIFFI was also tested on a set of large-scale industrial datasets in Interpretable Data-driven Anomaly Detection in Industrial Processes with ExIFFI by Frizzo et al. This paper was recently extended in Towards Transparent and Efficient Anomaly Detection in Industrial Processes through ExIFFI, which adds new industrial benchmark datasets and further experiments validating ExIFFI against other popular state-of-the-art approaches.
Code and datasets from the official ExIFFI paper are publicly available in
the official GitHub repository.
Detailed code documentation is available here.
Code and datasets from the extension paper are available in the official GitHub repository.
The repository contains a detailed guide on how to reproduce the results.
Moreover, to get a taste of the visualizations ExIFFI produces to explain the EIF's predictions, some additional results are available.
ExIFFI is a natural extension of DIFFI, which is also part of the AI Toolbox.
DIFFI enhances the interpretability of the Isolation Forest (IF) model, which however lacks flexibility because it uses one-dimensional, axis-aligned partitions to divide the feature space and isolate anomalous samples. For this reason, IF cannot recognize anomalies emerging from multivariate interactions, e.g. anomalies associated with multiple faults of a machine in an industrial context.
This limitation is overcome by the EIF algorithm, which extends IF by using multidimensional hyperplanes to partition the feature space.
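The difference between the two partitioning schemes can be illustrated with a minimal sketch. It assumes the usual formulation in which IF picks one feature and a threshold, while EIF draws a random normal vector and an intercept point inside the data's bounding box. The function names are hypothetical and are not taken from the ExIFFI code base.

```python
import random


def axis_aligned_split(points, rng):
    """IF-style split: one random feature, one threshold within its range."""
    dim = len(points[0])
    feature = rng.randrange(dim)
    values = [p[feature] for p in points]
    threshold = rng.uniform(min(values), max(values))
    left = [p for p in points if p[feature] <= threshold]
    right = [p for p in points if p[feature] > threshold]
    return left, right


def hyperplane_split(points, rng):
    """EIF-style split: random normal vector and intercept point (assumed form)."""
    dim = len(points[0])
    # Each component of the normal is drawn from a standard normal distribution,
    # so the cut can involve several features at once.
    normal = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    # The intercept is drawn uniformly inside the bounding box of the points.
    intercept = [
        rng.uniform(min(p[j] for p in points), max(p[j] for p in points))
        for j in range(dim)
    ]

    def side(p):
        # Signed position of p relative to the hyperplane.
        return sum(n * (x - b) for n, x, b in zip(normal, p, intercept))

    left = [p for p in points if side(p) <= 0]
    right = [p for p in points if side(p) > 0]
    return left, right
```

Because the hyperplane's normal vector generally has several non-zero components, a single cut can separate points that are anomalous only in a combination of features, which an axis-aligned cut cannot do in one step.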
As for DIFFI, ExIFFI is equipped with:
ExIFFI Pipeline
The fitting process of ExIFFI is very similar to that of DIFFI, with the difference that the Induced Imbalance Coefficients and the Cumulative Feature Importances are computed on d-dimensional vectors (where d is the dimension of the feature space) instead of scalar values.
Another key difference is that in ExIFFI the importance scores are not weighted by the depth of a sample in an isolation tree, since in the multidimensional case the depth information has a negligible impact on the importance computation.
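The vector-valued accumulation can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes each node on a sample's path contributes its imbalance coefficient multiplied by the absolute value of the splitting hyperplane's normal vector, and the `(coefficient, normal)` path representation and the function name are hypothetical.

```python
def accumulate_importance(path, dim):
    """Sum per-node contributions into a vector with one entry per feature.

    `path` is a list of (imbalance_coefficient, normal_vector) pairs,
    a hypothetical representation of the nodes a sample traverses.
    No depth weighting is applied, in line with the description above.
    """
    importance = [0.0] * dim
    for coefficient, normal in path:
        for j in range(dim):
            # Assumption: a feature's share of a node's contribution is
            # proportional to the magnitude of its normal-vector component.
            importance[j] += coefficient * abs(normal[j])
    return importance


path = [(2.0, [0.5, -1.0, 0.0]), (1.5, [0.0, 2.0, -1.0])]
print(accumulate_importance(path, dim=3))  # prints [1.0, 5.0, 1.5]
```

With scalar importances, as in DIFFI, each node would update a single feature; here every node updates all features at once, which is where the extra cost proportional to the feature-space dimension comes from.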
ExIFFI Highlights
- ExIFFI works on an already fitted EIF model without altering its AD performance.
- ExIFFI is asymptotically heavier than DIFFI (a factor of d, the dimension of the feature space, has to be considered because all importance accumulations are performed in d dimensions). Nevertheless, in Towards Transparent and Efficient Anomaly Detection in Industrial Processes through ExIFFI it outperforms DIFFI in computational speed thanks to an efficiency-focused implementation. The speedup is achieved by using parallel computing and integrating the standard Python code with C.
- Like DIFFI, ExIFFI can be used to perform unsupervised feature selection, i.e. to select the best features to use in an AD setting. Moreover, feature selection can serve as a proxy task for quantitatively evaluating the quality of an explanation when ground truth on feature importance is missing. This is achieved through a score introduced in the ExIFFI paper.
The principal use cases of ExIFFI are:
- Interpreting EIF and EIF+ models by providing explanations of the root causes underlying anomalous behaviors.
- Since EIF is an extension of IF, explanations of the IF model's outputs can also be produced.
Limitations of ExIFFI are:
- ExIFFI is a model-specific interpretation algorithm, so it is tailored to explain only predictions produced by isolation-based AD methods (i.e. IF, EIF, EIF+).
- ExIFFI does not work well with clustered anomalies, i.e. anomalies located within dense clusters. This is because the data partitions used in IF, EIF and EIF+ are designed to isolate single points located far from inliers. When an anomalous point is surrounded by dense clusters, normal points will also be included in the partition.
ExIFFI
@article{exiffi,
author = {Alessio Arcudi and others},
title = {Enhancing interpretability and generalizability in extended isolation forests},
journal = {Engineering Applications of Artificial Intelligence},
volume = {138},
pages = {109409},
year = {2024},
issn = {0952-1976},
doi = {10.1016/j.engappai.2024.109409},
url = {https://www.sciencedirect.com/science/article/pii/S0952197624015677},
}
ExIFFI Industrial Application
@inproceedings{exiffi_ind,
title={Interpretable Data-driven Anomaly Detection in Industrial Processes with ExIFFI},
author={Frizzo, Davide and others},
booktitle={2024 IEEE 8th Forum on Research and Technologies for Society and Industry Innovation (RTSI)},
pages={595--600},
year={2024},
organization={IEEE}
}
ExIFFI Industrial Application Extension
@misc{exiffi_ind_extended,
title={Towards Transparent and Efficient Anomaly Detection in Industrial Processes through ExIFFI},
author={Davide Frizzo and Francesco Borsatti and Alessio Arcudi and Antonio De Moliner and Roberto Oboe and Gian Antonio Susto},
year={2026},
eprint={2405.01158},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2405.01158},
}