Date of Award

8-2024

Degree Type

Creative Project

Degree Name

Master of Science (MS)

Department

Economics and Finance

Committee Chair(s)

Carly Fox

Committee

Carly Fox

Committee

Todd Griffith

Committee

Pedram Jahangiry

Abstract

This thesis evaluates a range of natural language processing (NLP) techniques and machine learning models for identifying linguistic signatures of dementia in transcribed speech from the DementiaBank Pitt corpus. Framing the task as binary classification, the study pairs embedding methods (Doc2Vec, Word2Vec, GloVe, and BERT) with traditional machine learning algorithms (Random Forest, Multinomial Naïve Bayes, AdaBoost, k-nearest neighbors (KNN), and Logistic Regression) as well as with deep learning architectures (LSTM, Bi-LSTM, and CNN-LSTM). Each approach is evaluated on its ability to distinguish transcribed speech from participants with dementia from that of control subjects. To improve interpretability, the study also applies feature importance methods (LIME, SHAP, permutation importance, and integrated gradients) to identify the variables that most strongly drive model predictions. The results highlight the potential of combining NLP and machine learning for medical screening and contribute further insights to research on NLP-based dementia screening.
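Of the interpretability methods named above, permutation importance is the simplest to state: shuffle one feature column, re-score the fitted model, and record how much the score drops. The following is a minimal standard-library sketch of that idea, not the thesis's implementation. The two features, the toy data, and the threshold rule standing in for a trained classifier are all hypothetical and are chosen only so the effect is easy to see.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows whose prediction matches the binary label."""
    correct = sum(1 for row, label in zip(X, y) if model(row) == label)
    return correct / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled independently."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the label
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical toy data: each row is [feature_0, feature_1]; label 1 = dementia, 0 = control.
X = [[0.9, 0.1], [0.8, 0.7], [0.7, 0.3], [0.6, 0.9],
     [0.2, 0.8], [0.1, 0.2], [0.3, 0.6], [0.4, 0.4]]
y = [1, 1, 1, 1, 0, 0, 0, 0]


def toy_model(row):
    # Hypothetical stand-in for a trained classifier: it looks only at feature_0.
    return 1 if row[0] > 0.5 else 0


importances = permutation_importance(toy_model, X, y)
```

Because the toy model ignores `feature_1`, shuffling that column leaves accuracy unchanged, so its importance is exactly zero. Shuffling `feature_0` destroys the model's only signal and produces a positive importance. With a real text classifier, the columns would be embedding dimensions or token features rather than these two illustrative values.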
