Automated Narrative Analysis
The accuracy of four machine learning methods in predicting narrative macrostructure scores was compared to scores obtained by human raters using a criterion-referenced progress monitoring rubric. The methods explored included approaches that rely on hand-engineered features as well as approaches that learn directly from raw text. The predictive models were trained on a corpus of 414 narratives from a normative sample of school-aged children (5;0-9;11) who were given a standardized measure of narrative proficiency. Performance was measured using Quadratic Weighted Kappa, a metric of inter-rater reliability. The results indicated that one model, BERT, not only achieved significantly higher scoring accuracy than the other methods but was also consistent with scores obtained by human raters using a valid and reliable rubric. These findings suggest that a machine learning method, specifically BERT, shows promise as a way to automate the scoring of narrative macrostructure for potential use in clinical practice.
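Quadratic Weighted Kappa penalizes disagreements between two raters in proportion to the squared distance between their scores, so a model that is off by one rubric point is penalized far less than one that is off by three. A minimal sketch of the computation (the example score vectors below are invented for illustration, not taken from the dataset):

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_rating=0, max_rating=3):
    """QWK between two integer score vectors on the same ordinal scale."""
    n = max_rating - min_rating + 1
    total = len(rater_a)

    # Observed joint-count matrix O[i][j]
    O = [[0.0] * n for _ in range(n)]
    for a, b in zip(rater_a, rater_b):
        O[a - min_rating][b - min_rating] += 1

    # Marginal histograms for each rater
    hist_a = [sum(row) for row in O]
    hist_b = [sum(O[i][j] for i in range(n)) for j in range(n)]

    # Weighted observed vs. weighted chance-expected disagreement
    num = den = 0.0
    for i in range(n):
        for j in range(n):
            w = (i - j) ** 2 / (n - 1) ** 2     # quadratic weight
            expected = hist_a[i] * hist_b[j] / total
            num += w * O[i][j]
            den += w * expected
    return 1.0 - num / den
```

Perfect agreement yields 1.0; agreement no better than chance yields 0.0.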
Author ORCID Identifier
Ronald B Gillam https://orcid.org/0000-0002-6077-6885
There are two CSV files. AutomatedNarrativeAnalysisMISLData.csv contains de-identified MISL scores for 414 participants in response to the Aliens story from the Test of Narrative Language, as well as their associated de-identified transcripts and full Coh-Metrix measures. ExpertScores.csv contains MISL double-scores, produced by an expert doctoral student, for a randomly selected set of 50 narrative transcripts.
Utah State University
Data were collected as part of the TNL norming database, part of a national norming sample. Audio collected during sampling was digitally recorded and transcribed in Systematic Analysis of Language Transcripts (SALT) software by trained research assistants who were blinded to the purposes of the study. Transcripts were cleaned in R to remove unwanted characters. MISL data were cleaned in Excel and processed using R and Python. Expert scores were obtained by randomly selecting 50 narrative transcripts and double-scoring them on the MISL. These scores were produced by an expert doctoral student with more than three years of scoring experience.
Jones, S., Fox, C., Gillam, S., & Gillam, R. B. (2019). An exploration of automated narrative analysis via machine learning. PLOS ONE, 14(10), e0224634. https://doi.org/10.1371/journal.pone.0224634
ID = de-identified assigned ID number
vecOfNarratives = narrative transcript
Char = character score
Sett = setting score
E = initiating event score
Plan = plan score
Act = action score
Con = consequence score
ENP = elaborated noun phrase score
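The columns above can be read with Python's standard `csv` module. A hedged sketch, using an invented two-row excerpt in place of the real AutomatedNarrativeAnalysisMISLData.csv (the header names follow the codebook; the IDs, transcripts, and scores below are illustrative only, and summing the element scores is shown simply as one way to aggregate them):

```python
import csv
import io

# Invented excerpt mimicking the codebook columns; not real participant data.
sample_csv = io.StringIO(
    "ID,vecOfNarratives,Char,Sett,E,Plan,Act,Con,ENP\n"
    '1001,"once there was an alien",2,1,2,0,2,1,1\n'
    '1002,"the aliens landed",3,2,1,1,2,2,0\n'
)

rows = list(csv.DictReader(sample_csv))
score_cols = ["Char", "Sett", "E", "Plan", "Act", "Con", "ENP"]

# Sum the listed element scores per transcript
totals = {r["ID"]: sum(int(r[c]) for c in score_cols) for r in rows}
```

Reading the real file would replace the `io.StringIO` object with `open("AutomatedNarrativeAnalysisMISLData.csv")`.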
Communication Sciences and Disorders
This work is licensed under a Creative Commons Attribution 4.0 License.
Gillam, R., Jones, S. K., & Fox, C. (2019). Automated Narrative Analysis. Utah State University. https://doi.org/10.26078/CATV-BM75
Additional Files
README.txt (6 kB)
AutomatedNarrativeAnalysisMISLData.csv (606 kB)
ExpertScores.csv (22 kB)