Title

Automated Narrative Analysis

Description

The accuracy of four machine learning methods in predicting narrative macrostructure scores was compared to scores obtained by human raters using a criterion-referenced progress monitoring rubric. The methods explored included those that rely on hand-engineered features as well as those that learn directly from the raw text. The predictive models were trained on a corpus of 414 narratives from a normative sample of school-aged children (5;0-9;11) who were given a standardized measure of narrative proficiency. Performance was measured using Quadratic Weighted Kappa, a metric of inter-rater reliability. The results indicated that one model, BERT, not only achieved significantly higher scoring accuracy than the other methods but was also consistent with scores obtained by human raters using a valid and reliable rubric. These findings suggest that a machine learning method, specifically BERT, shows promise as a way to automate the scoring of narrative macrostructure for potential use in clinical practice.
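For readers unfamiliar with the metric, the following is a minimal sketch of how Quadratic Weighted Kappa can be computed for two sets of ordinal ratings. It is not the study's own code. The function name, the 0 to 3 score range, and the example ratings are assumptions made for illustration only.

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_score, max_score):
    """Agreement between two lists of integer ratings, penalizing
    disagreements by the squared distance between the two scores."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("rating lists must be non-empty and of equal length")
    k = max_score - min_score + 1  # number of score categories
    if k < 2:
        raise ValueError("at least two score categories are required")
    n = len(rater_a)

    # Observed counts: observed[i][j] = items scored i by A and j by B.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_score][b - min_score] += 1

    # Marginal histograms for each rater.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2
            weighted_observed += weight * observed[i][j]
            # Expected count if the two raters were independent.
            weighted_expected += weight * hist_a[i] * hist_b[j] / n

    if weighted_expected == 0:
        return 1.0  # both raters used a single identical score throughout
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical ratings on an assumed 0-3 scale (not taken from the dataset).
human = [0, 1, 2, 3, 2, 1]
model = [0, 1, 2, 3, 2, 1]
kappa = quadratic_weighted_kappa(human, model, 0, 3)
```

A value of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate systematic disagreement. If scikit-learn is available, `sklearn.metrics.cohen_kappa_score(y1, y2, weights="quadratic")` computes the same statistic.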

Document Type

Dataset

DCMI Type

Dataset

File Format

.csv, .txt

Viewing Instructions

There are two CSV files. AutomatedNarrativeAnalysisMISLData.csv contains de-identified MISL scores for 414 participants in response to the Aliens story from the Test of Narrative Language, as well as their associated de-identified transcripts and full Coh-Metrix measures. ExpertScores.csv contains the MISL double-scores, produced by an expert doctoral student, for a randomly selected set of 50 narrative transcripts.

Publication Date

6-6-2019

Funder

Lillywhite Endowment

Publisher

Utah State University

Methodology

Data were collected as part of the Test of Narrative Language (TNL) norming database, which is part of a national norming sample. Audio collected during sampling was digitally recorded and transcribed in Systematic Analysis of Language Transcripts (SALT) software by trained research assistants who were blinded to the purposes of the study. Transcripts were cleaned in R to remove unwanted characters. MISL data were cleaned in Excel and processed using R and Python. Expert scores were obtained by randomly selecting 50 narrative transcripts and double-scoring them on the MISL. Expert scores were produced by an expert doctoral student with more than three years of scoring experience.

Start Date

1-2003

End Date

11-2003

Language

eng

Code Lists

ID = de-identified assigned ID number

vecOfNarratives = narrative transcript

Char = character score

Sett = setting score

E = initiating event score

Plan = plan score

Act = action score

Con = consequence score

ENP = elaborated noun phrase score
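The code list above can be read into Python roughly as follows. This is a sketch under two assumptions that the README should be checked against: that the CSV header names match the codes listed here exactly, and that every score cell holds an integer. The function name `read_misl_rows` and the sample row are hypothetical.

```python
import csv
import io

# Score columns as named in the code list above.
SCORE_COLUMNS = ["Char", "Sett", "E", "Plan", "Act", "Con", "ENP"]


def read_misl_rows(handle):
    """Parse an open CSV file into a list of dicts with the ID, the
    transcript text, and the rubric scores cast to int.
    Any other columns (for example Coh-Metrix measures) are ignored."""
    rows = []
    for record in csv.DictReader(handle):
        scores = {name: int(record[name]) for name in SCORE_COLUMNS}
        rows.append(
            {
                "ID": record["ID"],
                "narrative": record["vecOfNarratives"],
                "scores": scores,
            }
        )
    return rows


# Hypothetical one-row sample standing in for the real file.
sample = io.StringIO(
    "ID,vecOfNarratives,Char,Sett,E,Plan,Act,Con,ENP\n"
    '101,"Once there was an alien family.",2,1,2,0,1,1,2\n'
)
rows = read_misl_rows(sample)
```

To read the published file instead of the sample, the same function can be called on `open("AutomatedNarrativeAnalysisMISLData.csv", newline="", encoding="utf-8")`; the encoding is also an assumption.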

Comments

See README for additional information.

Disciplines

Communication Sciences and Disorders

License

This work is licensed under a Creative Commons Attribution 4.0 License.

Files

README.txt (6 kB)
MD5: b3aead1b9c2252a76a69f5a838ddb284

AutomatedNarrativeAnalysisMISLData.csv (606 kB)
MD5: a41039f0a14b062c6e4317cba3caf11a

ExpertScores.csv (22 kB)
MD5: 8c3457a382a0e1c6bb7517da533fe647
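The MD5 values listed above can be used to confirm that a download is intact. The following is a minimal sketch using Python's standard `hashlib` module. The function names and the assumption that the files sit in the current working directory are illustrative, not part of the record.

```python
import hashlib

# Checksums as published in the file list above.
EXPECTED_MD5 = {
    "README.txt": "b3aead1b9c2252a76a69f5a838ddb284",
    "AutomatedNarrativeAnalysisMISLData.csv": "a41039f0a14b062c6e4317cba3caf11a",
    "ExpertScores.csv": "8c3457a382a0e1c6bb7517da533fe647",
}


def md5_of_file(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of the file at `path`,
    reading it in chunks so large files are not loaded at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path, filename):
    """Compare a local file against the published checksum for `filename`."""
    return md5_of_file(path) == EXPECTED_MD5[filename]
```

For example, `matches_published_md5("ExpertScores.csv", "ExpertScores.csv")` returns `True` when the local copy matches the published checksum.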
