Pfahler/etal/2017a: Learning Low-Rank Document Embeddings with Weighted Nuclear Norm Regularization

Bibtype Inproceedings
Bibkey Pfahler/etal/2017a
Author Pfahler, Lukas and Morik, Katharina and Elwert, Frederik and Tabti, Samira and Krech, Volkhard
Ls8author Morik, Katharina and Pfahler, Lukas
Title Learning Low-Rank Document Embeddings with Weighted Nuclear Norm Regularization
Booktitle Proceedings of the 4th IEEE International Conference on Data Science and Advanced Analytics
Abstract Recently, neural embeddings of documents have shown success in various language processing tasks. These low-dimensional and dense feature vectors of text documents capture semantic similarities better than traditional methods. However, the underlying optimization problem is non-convex and usually solved using stochastic gradient descent. Hence, solutions are most likely sub-optimal and not reproducible, as they are the result of a randomized algorithm. We present an alternative formulation for learning low-rank representations based on convex optimization. Instead of explicitly learning low-dimensional features, we compute a low-rank representation implicitly by regularizing full-dimensional solutions. Our approach uses the weighted nuclear norm, a regularizer that penalizes singular values of matrices. We optimize the regularized objective using accelerated proximal gradient descent. We apply the approach to learn embeddings of documents. These embeddings are guaranteed to converge to a global optimum in a deterministic manner. We show that our convex approach outperforms traditional convex approaches in a numerical study. Furthermore, we demonstrate that the embeddings are useful for detecting similarities on a standard dataset. Then we apply our approach in an interdisciplinary research project to detect topics in religious online discussions. The topic descriptions obtained from a clustering of embeddings are coherent and insightful. In comparison to existing approaches, they are also reproducible.
Year 2017
Project relnet
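Note: the abstract names two ingredients, a weighted nuclear norm penalty on singular values and accelerated proximal gradient descent. The following is a minimal sketch of how these two pieces typically fit together, not the authors' implementation. The function names (weighted_svt, accelerated_proximal_gradient), the squared-error loss, the toy data, and the weight schedule are all assumptions made for illustration. The sketch also assumes non-decreasing weights (larger singular values are penalized less), the setting in which per-value shrinkage of singular values is commonly used as the proximal step.

```python
import numpy as np


def weighted_svt(Z, thresholds):
    """Shrink the i-th largest singular value of Z by thresholds[i], clipping at zero."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    s_shrunk = np.maximum(s - thresholds, 0.0)
    return (U * s_shrunk) @ Vt


def accelerated_proximal_gradient(grad, X0, step, thresholds, n_iter=100):
    """FISTA-style loop: gradient step on the smooth loss, then weighted shrinkage."""
    X, Y, t = X0.copy(), X0.copy(), 1.0
    for _ in range(n_iter):
        X_next = weighted_svt(Y - step * grad(Y), step * thresholds)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        # Momentum: extrapolate along the direction of the last update.
        Y = X_next + ((t - 1.0) / t_next) * (X_next - X)
        X, t = X_next, t_next
    return X


# Toy problem (hypothetical data): recover a rank-2 matrix from a noisy copy M
# using the smooth loss 0.5 * ||X - M||_F^2, whose gradient is X - M.
rng = np.random.default_rng(0)
low_rank = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 20))
M = low_rank + 0.01 * rng.standard_normal((30, 20))

# Assumed weight schedule: non-decreasing, one weight per singular value.
weights = np.linspace(0.5, 5.0, 20)

X_hat = accelerated_proximal_gradient(lambda X: X - M, np.zeros_like(M), 1.0, weights)
estimated_rank = np.linalg.matrix_rank(X_hat, tol=1e-6)
```

The solution stays full-dimensional (30 by 20 here), and the low rank arises only because the shrinkage step sets small singular values to zero, which mirrors the "implicit" low-rank idea described in the abstract. No random initialization is involved in the solver itself, so repeated runs on the same input give the same result.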