Degree Name:

Master of Science (MS)


Computer Science

Committee Chair(s)

John Edwards


John Edwards


Christopher Hartwell


Curtis Dyreson


We explored 682,176 employee reviews of Fortune 50 companies using topic discovery techniques such as Latent Dirichlet Allocation (LDA) and Structural Topic Modeling (STM) to identify salient aspects of the reviews and automatically infer the latent topics that tend to drive employee satisfaction. We also studied how various satisfaction factors relate to employee turnover. We discovered important topics in the reviews, including Management and Leadership, Advancement Opportunity, Pay and Benefits, Work-Life Balance, and Culture, which we compare to the five Job Descriptive Index (JDI) facets. Both LDA and STM discovered well-separated and distinguishable topics. We also incorporated a “Job Status” covariate in STM, which helped distinguish the topics discussed most by “Former” employees from those discussed most by “Current” employees, and consequently helped us analyze the factors that could have caused employee turnover. We found that Management and Leadership and Overwork and Stressful Environment were the topics that differed most between former and current employees, suggesting that they might be leading causes of employee turnover. Furthermore, we post-processed the topic probabilities produced by the STM model to determine the sector-wise contribution to each topic, and also the company-wise contribution to each topic within each sector. We found that employees in the Retail sector talked the most about Pay and Benefits and Length of Breaks, whereas employees in the Technology sector were more concerned about Work-Life Balance. Our results can directly support companies' behavioral management decision makers in conceiving and evaluating initiatives intended to enhance employee satisfaction. Furthermore, our techniques, including a novel visualization of topic composition and quality, generalize to any setting that uses topic discovery from unstructured text, especially settings that compare topics across entities.
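
The sector-wise post-processing mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each review has a vector of topic probabilities and a sector label, and that a sector's contribution to a topic is the mean of that topic's probability over the sector's reviews. The averaging rule, the function name `sector_topic_contribution`, and the toy values are illustrative assumptions, not the thesis's actual procedure or data.

```python
from collections import defaultdict


def sector_topic_contribution(doc_topics, sectors):
    """Average per-review topic probabilities within each sector.

    doc_topics: list of per-review topic probability vectors (assumed input).
    sectors:    list of sector labels, one per review (assumed input).
    Returns a dict mapping sector -> list of mean topic probabilities.
    """
    sums = {}
    counts = defaultdict(int)
    for theta, sector in zip(doc_topics, sectors):
        if sector not in sums:
            sums[sector] = [0.0] * len(theta)
        for k, p in enumerate(theta):
            sums[sector][k] += p
        counts[sector] += 1
    return {s: [v / counts[s] for v in sums[s]] for s in sums}


# Toy, made-up values: two topics, three reviews.
topics = ["Pay and Benefits", "Work-Life Balance"]
doc_topics = [[0.75, 0.25], [0.5, 0.5], [0.25, 0.75]]
sectors = ["Retail", "Retail", "Technology"]

result = sector_topic_contribution(doc_topics, sectors)
for sector, means in result.items():
    print(sector, dict(zip(topics, means)))
```

The same grouping could be applied with company labels in place of sector labels to obtain a company-wise breakdown within a sector.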