MSc. Stats Defence: Conditional Replicated Softmax for Topic Modelling with Metadata
Date and Time
Location
Summerlee Science Complex Room 1511
Details
CANDIDATE: CHARLES AUSTRIA
ABSTRACT:
Topic models are popular tools that model documents with the goal of extracting semantic topics from text. Documents often come with metadata such as authors, dates, or publication venues; however, current state-of-the-art topic models do not incorporate metadata. This thesis introduces the conditional replicated softmax model, an undirected graphical model that models document word counts and document-specific metadata using restricted Boltzmann machines. An additional input layer associated with the metadata is added to the replicated softmax model, thereby making the states of the hidden units conditional upon the metadata. This thesis compares the conditional replicated softmax model to other state-of-the-art topic models on the NIPS conference proceedings from 1987 to 1999. The learned topics appear richer and more interpretable than those of Dirichlet multinomial regression, and comparable to those of replicated softmax. However, the added complexity of the new model was associated with higher test perplexity, which scores a model's ability to predict unseen documents from a test set, and higher penalized perplexity, which penalizes perplexity for model complexity.
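For readers unfamiliar with test perplexity, the following is a minimal sketch of one common definition, the exponential of the negative average per-word log-probability over held-out documents. The announcement does not give the exact formula or the complexity penalty used in the thesis, so the function name `corpus_perplexity` and its inputs are illustrative assumptions only. Lower values indicate better prediction of unseen documents.

```python
import math


def corpus_perplexity(doc_log_probs, doc_lengths):
    """Perplexity = exp(-(1/N) * sum_n (1/D_n) * log p(v_n)).

    doc_log_probs: natural-log probability a model assigns to each test document
    doc_lengths:   number of word tokens D_n in each test document
    Both names are hypothetical; they are not taken from the thesis.
    """
    if not doc_log_probs or len(doc_log_probs) != len(doc_lengths):
        raise ValueError("need one log-probability and one length per document")
    # Average log-probability per word, computed separately for each document.
    per_word = [lp / d for lp, d in zip(doc_log_probs, doc_lengths)]
    # Average over documents, negate, and exponentiate.
    return math.exp(-sum(per_word) / len(per_word))
```

As a sanity check under these assumptions, a model that assigns every word a uniform probability of 1/4 has a perplexity of 4, regardless of document length.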
Advisory Committee
- A. Ali, Advisor
- T. Desmond
Examining Committee
- J. Balka, Chair
- A. Ali
- T. Desmond
- K. Nadeem