1. Dataset Title: AutomatedNarrativeAnalysisMISLData.csv, ExpertScores.csv
2. Name and contact information of PI:
   a. Name: Ronald Gillam, PhD
   b. Institution: Utah State University
   c. Address: ECERC 224, Department of Communicative Disorders and Deaf Education
   d. Email: ron.gillam@usu.edu
   e. ORCiD ID: https://orcid.org/0000-0002-6077-6885
3. Name and contact information of Co-PI:
   a. Name: Sharad Jones
   b. Institution: Utah State University
   c. Address: Animal Sciences Building, Department of Mathematics and Statistics
   d. Email: sharad.k.jones@gmail.com
   e. ORCiD ID: N/A
4. Name and contact information of Co-PI:
   a. Name: Carly Fox
   b. Institution: Utah State University
   c. Address: ECERC 226, Department of Special Education and Rehabilitation
   d. Email: carlyb.fox@gmail.com
   e. ORCiD ID: N/A
5. (Repeat if needed)
6. Funding source (Agency, Grant Number) if applicable: Lilywhite Endowment
7. Project summary, description or abstract:
   Four machine learning methods were used to predict narrative macrostructure scores, and their accuracy was compared against scores assigned by human raters using a criterion-referenced progress-monitoring rubric. The methods explored included some that use hand-engineered features and others that learn directly from the raw text. The predictive models were trained on a corpus of 414 narratives from a normative sample of school-aged children (5;0-9;11) who were given a standardized measure of narrative proficiency. Performance was measured using Quadratic Weighted Kappa, a metric of inter-rater reliability. The results indicated that one model, BERT, not only achieved significantly higher scoring accuracy than the other methods, but was also consistent with scores obtained by human raters using a valid and reliable rubric. The findings from this study suggest that a machine learning method, specifically BERT, shows promise as a way to automate the scoring of narrative macrostructure for potential use in clinical practice.
8. Brief description of collection and processing of data:
   Data were collected as part of the TNL norming database, which is part of a national norming sample. Audio was digitally recorded during sampling and transcribed in Systematic Analysis of Language Transcripts (SALT) software by trained research assistants who were blinded to the purposes of the study. Transcripts were cleaned in R to remove unwanted characters. MISL data were cleaned in Excel and processed using R and Python. Expert scores were obtained by randomly selecting 50 narrative transcripts and double-scoring them on the MISL. Expert scores were produced by an expert doctoral student with more than three years of scoring experience.
9. Description of files (names, or if too numerous, number of files, file type(s)):
   There are two csv files. AutomatedNarrativeAnalysisMISLData.csv contains de-identified MISL scores for 414 participants in response to the Aliens story from the Test of Narrative Language, as well as their associated de-identified transcripts and full Coh-Metrix measures. ExpertScores.csv contains the MISL double-scores for a randomly selected set of 50 narrative transcripts, which were produced by an expert doctoral student.
10. Definition of acronyms, codes, and abbreviations:
   ID = de-identified assigned ID number
   vecOfNarratives = narrative transcript
   Char = character score
   Sett = setting score
   IE = initiating event score
   Plan = plan score
   Act = action score
   Con = consequence score
   ENP = elaborated noun phrase score
   Full Coh-Metrix indices are listed under documentation here: http://www.cohmetrix.com/
11. Description or definition of any other unique information that would help others use your data: Expert scores included on second tab of Excel sheet
12. Descriptions of parameters/variables:
   All MISL scores pertain to either macrostructure or microstructure elements of narratives. The macrostructure elements include: character, setting, initiating event, plan, action and consequence.
   The microstructure element included is ENP, which stands for elaborated noun phrase. The Coh-Metrix variables include a large number of surface-level and deep language features related to cohesion and connectiveness. There are also measures related to narrativity and text easability. A full description of Coh-Metrix measures is available at: http://www.cohmetrix.com/ under documentation.
   a. Temporal (beginning and end dates of data collection): January 2003 - November 2003
   b. Instruments used and units of measurement:
      Monitoring Indicators of Scholarly Language (MISL) is a criterion-referenced progress-monitoring tool for school-aged narrative samples. Each element on the MISL is scored from 0 to 3 (0 meaning not present and 3 meaning mastered), and total macrostructure scores are calculated by combining all element scores. Transcripts were elicited from the Aliens story subtask of the Test of Narrative Language and digitally recorded before being transcribed in Systematic Analysis of Language Transcripts (SALT) software. Prior to analysis, transcripts were cleaned of unwanted characters and converted to .txt files.
   c. Column headings of data files (for tabular data):
      AutomatedNarrativeAnalysisMISLData.csv:
      ID vecOfNarratives Char Sett IE Plan Act Con DESPC DESSC DESWC DESPL DESPLd DESSL DESSLd DESWLsy DESWLsyd DESWLlt DESWLltd PCNARz PCNARp PCSYNz PCSYNp PCCNCz PCCNCp PCREFz PCREFp PCDCz PCDCp PCVERBz PCVERBp PCCONNz PCCONNp PCTEMPz PCTEMPp CRFNO1 CRFAO1 CRFSO1 CRFNOa CRFAOa CRFSOa CRFCWO1 CRFCWO1d CRFCWOa CRFCWOad CRFANP1 CRFANPa LSASS1 LSASS1d LSASSp LSASSpd LSAPP1 LSAPP1d LSAGN LSAGNd LDTTRc LDTTRa LDMTLD LDVOCD CNCAll CNCCaus CNCLogic CNCADC CNCTemp CNCTempx CNCAdd CNCPos CNCNeg SMCAUSv SMCAUSvp SMINTEp SMCAUSr SMINTEr SMCAUSlsa SMCAUSwn SMTEMP SYNLE SYNNP SYNMEDpos SYNMEDwrd SYNMEDlem SYNSTRUTa SYNSTRUTt DRNP DRVP DRAP DRPP DRPVAL DRNEG DRGERUND DRINF WRDNOUN WRDVERB WRDADJ WRDADV WRDPRO WRDPRP1s WRDPRP1p WRDPRP2 WRDPRP3s WRDPRP3p WRDFRQc WRDFRQa WRDFRQmc WRDAOAc WRDFAMc WRDCNCc WRDIMGc WRDMEAc WRDPOLc WRDHYPn WRDHYPv WRDHYPnv RDFRE RDFKGL RDL2
      ExpertScores.csv:
      ID Char Sett IE Plan Act Con vecOfNarratives
   d. Location/GIS Coverage (if applicable to data):
   e. Symbol used for missing data: NA
13. Special software required to use data: Analyses were completed in R and Python
14. Publications that cite or use this data: None have used these MISL scores
15. Was data derived from another data source? If so, what source?
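Example of reading the score columns (illustrative only):
The following is a minimal Python sketch for readers who want to work with the two files; it is not the authors' analysis code. It reads the six macrostructure columns listed above, treats the NA symbol as missing, computes a total macrostructure score, and computes Quadratic Weighted Kappa between two sets of scores. Several points are assumptions rather than facts from this README: that "combining" element scores means summing them, that scores are stored as whole numbers, and that IDs match across the two files. The function names and the small in-memory tables are hypothetical, and the demo values are made up, not taken from the dataset.

```python
import csv
import io

# Macrostructure columns, per the acronym list in item 10.
MACRO_COLS = ["Char", "Sett", "IE", "Plan", "Act", "Con"]


def read_scores(handle):
    """Read MISL rows from an open CSV handle; the 'NA' symbol becomes None.

    Assumption: non-missing scores are stored as whole numbers (0-3).
    """
    rows = {}
    for rec in csv.DictReader(handle):
        rows[rec["ID"]] = {
            col: None if rec[col] == "NA" else int(rec[col]) for col in MACRO_COLS
        }
    return rows


def macro_total(scores):
    """Sum the six element scores; return None if any element is missing.

    Assumption: 'combining' element scores means a plain sum (range 0-18).
    """
    values = [scores[col] for col in MACRO_COLS]
    return None if None in values else sum(values)


def quadratic_weighted_kappa(a, b, n_levels=4):
    """Quadratic Weighted Kappa for two equal-length lists of ratings in 0..n_levels-1."""
    n = len(a)
    observed = [[0] * n_levels for _ in range(n_levels)]
    for x, y in zip(a, b):
        observed[x][y] += 1
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n_levels)) for j in range(n_levels)]
    num = 0.0  # weighted observed disagreement
    den = 0.0  # weighted disagreement expected by chance
    for i in range(n_levels):
        for j in range(n_levels):
            weight = (i - j) ** 2 / (n_levels - 1) ** 2
            num += weight * observed[i][j]
            den += weight * hist_a[i] * hist_b[j] / n
    if den == 0:
        return float("nan")  # undefined when both raters use one identical category
    return 1.0 - num / den


# Made-up demo tables that mimic the documented column layout.
DEMO_MAIN = """ID,vecOfNarratives,Char,Sett,IE,Plan,Act,Con
demo1,"a family saw aliens, and they ran",2,1,3,0,2,1
demo2,"the aliens landed",3,2,2,1,NA,2
demo3,"they went home",1,0,1,0,1,0
"""
DEMO_EXPERT = """ID,Char,Sett,IE,Plan,Act,Con,vecOfNarratives
demo1,2,1,3,1,2,1,"a family saw aliens, and they ran"
demo3,2,0,1,0,1,0,"they went home"
"""

main = read_scores(io.StringIO(DEMO_MAIN))
expert = read_scores(io.StringIO(DEMO_EXPERT))

for row_id, scores in main.items():
    print(row_id, "total macrostructure:", macro_total(scores))

# Compare the Char element on IDs present in both tables, skipping missing values.
pairs = [
    (main[i]["Char"], expert[i]["Char"])
    for i in expert
    if i in main and main[i]["Char"] is not None and expert[i]["Char"] is not None
]
first, second = zip(*pairs)
print("QWK on Char:", quadratic_weighted_kappa(list(first), list(second)))
```

To use the real files, the in-memory strings could be replaced with open(..., newline="") handles for AutomatedNarrativeAnalysisMISLData.csv and ExpertScores.csv. Returning None for a total when any element is NA is one possible choice; dropping incomplete rows before analysis would be another.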