Bibtype  Inproceedings 

Bibkey  Pfahler/Morik/2020a 
Author  Pfahler, Lukas and Morik, Katharina 
Ls8autor 
Morik, Katharina
Pfahler, Lukas 
Title  Semantic Search in Millions of Equations 
Booktitle  KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining 
Publisher  ACM 
Abstract  Given the increase of publications, search for relevant papers becomes tedious. In particular, search across disciplines or schools of thinking is not supported. This is mainly due to the retrieval with keyword queries: technical terms differ in different sciences or at different times. Relevant articles might better be identified by their mathematical problem descriptions. Just looking at the equations in a paper already gives a hint to whether the paper is relevant.
Hence, we propose a new approach for retrieval of mathematical expressions based on machine learning. We design an unsupervised representation learning task that combines embedding learning with self-supervised learning. Using graph convolutional neural networks, we embed mathematical expressions into low-dimensional vector spaces that allow efficient nearest neighbor queries. To train our models, we collect a huge dataset with over 29 million mathematical expressions from over 900,000 publications published on arXiv.org. The math is converted into an XML format, which we view as graph data. Our empirical evaluations involving a new dataset of manually annotated search queries show the benefits of using embedding models for mathematical retrieval. 
Year  2020 
Projekt  SFB876A1 
Url  https://dl.acm.org/doi/pdf/10.1145/3394486.3403056 

Publicfile 
pfahler_morik_2020a.pdf [672 KB]

