
Buschjaeger/Honysz/2020a: Generalized Isolation Forest: Some Theory and More Applications -- Extended Abstract

Bibtype Inproceedings
Bibkey Buschjaeger/Honysz/2020a
Author Buschjäger, Sebastian and Honysz, Philipp-Jan and Morik, Katharina
Ls8autor Buschjäger, Sebastian
Honysz, Philipp-Jan
Morik, Katharina
Title Generalized Isolation Forest: Some Theory and More Applications -- Extended Abstract
Booktitle Proceedings 2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA 2020)
Organization IEEE
Abstract Isolation Forest is a popular outlier detection algorithm that isolates outlier observations from regular observations by building multiple random decision trees.
Multiple extensions enhance the original Isolation Forest algorithm, including the Extended Isolation Forest, which allows for non-rectangular splits, and the SCiForest, which improves the fitting of individual trees. All these approaches rate the outlierness of an observation by its average path length. However, we find a lack of theoretical explanation of why these isolation-based algorithms perform so well in practice.
In this paper, we present a theoretical framework that explains the effectiveness of isolation-based approaches from a distributional viewpoint. We show that these algorithms fit a mixture of distributions, where the average path length of an observation can be viewed as a (somewhat crude) approximation of the mixture coefficient. Using this framework, we derive the Generalized Isolation Forest (GIF), which also trains random trees but combines them in a way that goes beyond the average path length.
In an extensive evaluation of over 350,000 experiments, we show that GIF outperforms the other methods on a variety of datasets while having comparable runtime.
Year 2020
Project SFB876-A1
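The abstract notes that Isolation Forest and its extensions rate outlierness by the average path length. As an illustration only, here is a minimal sketch of that baseline scoring rule from the original Isolation Forest, s(x) = 2^(-E[h(x)] / c(n)). It assumes axis-parallel random splits, a height limit of ceil(log2 n), a subsample size of 64 and 100 trees. It is not the GIF method from this paper, which goes beyond the average path length, and the names `fit_forest`, `path_length` and `anomaly_score` are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c(n):
    """Average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Grow one random isolation tree with axis-parallel splits."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # all points identical along this dimension
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(x, node, depth=0):
    """Depth at which x lands in a leaf, plus c(size) for unexpanded leaves."""
    if "split" not in node:
        return depth + c(node["size"])
    child = node["left"] if x[node["dim"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)


def fit_forest(data, n_trees=100, sample_size=64, seed=0):
    """Train n_trees isolation trees, each on a random subsample (hypothetical API)."""
    if len(data) < 2:
        raise ValueError("need at least two observations")
    rng = random.Random(seed)
    m = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(m))
    trees = [build_tree(rng.sample(data, m), 0, max_depth, rng) for _ in range(n_trees)]
    return trees, m


def anomaly_score(x, trees, m):
    """Scores near 1 suggest an outlier; shorter average paths give higher scores."""
    mean_h = sum(path_length(x, t) for t in trees) / len(trees)
    return 2.0 ** (-mean_h / c(m))
```

For example, with a Gaussian cluster around the origin plus one distant point at (8.0, 8.0), the distant point is isolated after few splits on average and should therefore receive a higher score than the origin.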