Friday 1:55 p.m.–2:25 p.m.

Measuring and modeling the complexity of children's books

Jeff Elmore

Audience level: Intermediate
Category: Science

Description

Researchers have been modeling text difficulty for over 50 years. A variety of models have been developed, but few have focused on books for emerging readers (Grades K-2). We used Python for nearly every aspect of the project including collecting data from reading educators, analyzing text features, and creating a predictive model. Tools used include scipy, scikit-learn, PiCloud, and others.

Abstract

Introduction

Researchers have been modeling the difficulty of text for over 50 years using a variety of approaches.

There are features of text in beginning reading books that are not well modeled by existing approaches.

An extremely brief introduction to psychometrics

To predict the difficulty of text, we must first establish empirical measures of difficulty. We use the Rasch model to place reading materials on a difficulty scale on which students can also be placed using reading assessments. Such a model is called a 'conjoint measurement model.'
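As a minimal sketch of the idea, the dichotomous Rasch model gives the probability of a successful response as a logistic function of the gap between a reader's ability and a text's difficulty, both expressed on the same scale (in logits). The function and parameter names below are illustrative and are not taken from the project's code.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of success under the dichotomous Rasch model.

    Both arguments are on the same logit scale, which is what lets
    readers and texts be compared directly.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Hypothetical values: a reader whose ability equals the text's difficulty
# succeeds half the time; an easier text raises that probability.
matched = rasch_probability(ability=1.0, difficulty=1.0)
easier_text = rasch_probability(ability=1.0, difficulty=0.0)
```

When ability and difficulty are equal, the probability is exactly 0.5, and it rises as the text becomes easier relative to the reader.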

Collection of datasets

In consultation with experts in the field, we compiled a representative sample of early reading materials.

Empirical measures of difficulty were established for the texts in our dataset. The first measure of difficulty was established through a paired-comparisons task.
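The abstract does not say how the paired comparisons were converted into difficulty measures. One common option, shown here purely as an assumption, is the Bradley-Terry model, which is closely related to the Rasch model. The sketch below fits it with a simple iterative update; the function name, the win-count matrix, and the iteration count are all hypothetical, and the code assumes every text is judged harder at least once and easier at least once.

```python
import math


def fit_bradley_terry(wins, iterations=200):
    """Estimate log-scale difficulties from paired-comparison counts.

    wins[i][j] is the number of times text i was judged harder than
    text j (diagonal entries are zero). Returns one difficulty per text,
    centered so the values sum to zero.
    """
    n = len(wins)
    strengths = [1.0] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            total_wins = sum(wins[i])
            # Comparisons involving text i, weighted by current strengths.
            denom = sum(
                (wins[i][j] + wins[j][i]) / (strengths[i] + strengths[j])
                for j in range(n)
                if j != i
            )
            updated.append(total_wins / denom)
        # Rescale so the log-strengths average to zero (fixes the scale origin).
        log_mean = sum(math.log(s) for s in updated) / n
        strengths = [s / math.exp(log_mean) for s in updated]
    return [math.log(s) for s in strengths]


# Hypothetical judgments for three texts (five comparisons per pair).
example_wins = [
    [0, 2, 1],
    [3, 0, 2],
    [4, 3, 0],
]
difficulties = fit_bradley_terry(example_wins)
```

In this made-up example, the third text is judged harder most often, so it receives the highest estimated difficulty.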

For a smaller set of texts, empirical difficulties were established using an assessment task completed by 1,200 first- and second-grade students.