Friday 1:55 p.m.–2:25 p.m.

Measuring and modeling the complexity of children's books

Jeff Elmore

Audience level:


Researchers have been modeling text difficulty for over 50 years. A variety of models have been developed, but few have focused on books for emerging readers (Grades K-2). We used Python for nearly every aspect of the project including collecting data from reading educators, analyzing text features, and creating a predictive model. Tools used include scipy, scikit-learn, PiCloud, and others.



Researchers have been modeling the difficulty of text for over 50 years using a variety of approaches.

There are features of text in beginning reading books that are not well modeled by existing approaches.

An extremely brief introduction to psychometrics

To predict the difficulty of text, we must first establish empirical measures of difficulty. We use the Rasch model to place reading materials on the same difficulty scale on which students are placed through reading assessments. This is called a 'conjoint measurement model.'
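As a minimal sketch of the idea, the standard Rasch formula gives the probability of a successful response as a logistic function of the gap between a reader's ability and a text's difficulty, both in logits on one shared scale. The function name and the numeric values below are hypothetical and chosen only for illustration; they are not taken from the project.

```python
import math

def rasch_probability(ability, difficulty):
    """Probability of a successful response under the Rasch model.

    Both arguments are in logits on the same scale, which is what lets
    readers and texts be compared directly.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# Hypothetical values, for illustration only.
reader_ability = 0.5
easy_text = -1.0
hard_text = 2.0

p_easy = rasch_probability(reader_ability, easy_text)
p_hard = rasch_probability(reader_ability, hard_text)
# A reader matched exactly to a text has an even chance of success.
p_matched = rasch_probability(reader_ability, reader_ability)
```

Because only the difference between ability and difficulty enters the formula, a text measure and a reader measure can be read off the same scale.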

Collection of datasets

In consultation with experts in the field, we compiled a representative sample of early reading materials.

Empirical measures of difficulty were established on the texts in our dataset. The first measure of difficulty was established through a paired-comparisons task.
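The abstract does not say how the paired comparisons were turned into a scale. One common option, shown here purely as an assumption, is a Bradley-Terry model fitted with a simple iterative update. The function name, the win-count matrix, and the iteration count are all hypothetical.

```python
import math

def bradley_terry(wins, iterations=200):
    """Estimate log-difficulties from paired-comparison counts.

    wins[i][j] is the number of times text i was judged harder than text j.
    Assumes every text has at least one "harder" and one "easier" judgment.
    """
    n = len(wins)
    strengths = [1.0] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            denom = sum(
                (wins[i][j] + wins[j][i]) / (strengths[i] + strengths[j])
                for j in range(n)
                if j != i
            )
            updated.append(total_wins / denom)
        # Fix the scale by forcing the log-strengths to sum to zero.
        log_mean = sum(math.log(s) for s in updated) / n
        strengths = [s / math.exp(log_mean) for s in updated]
    return [math.log(s) for s in strengths]

# Made-up judgments for three texts, ten comparisons per pair.
wins = [
    [0, 2, 1],
    [8, 0, 3],
    [9, 7, 0],
]
difficulties = bradley_terry(wins)
```

The resulting values are on a logit scale, so they can be compared in the same way as Rasch difficulties.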

For a smaller set of texts, empirical difficulties were established using an assessment task completed by 1,200 first- and second-grade students.

Feature development

Based on previous research in the field and consultation with reading experts, we developed a set of 166 unique quantifiable text features.

Features were developed to address these unique aspects of beginning reading books:

  • highly decodable words
  • low density of content and information
  • repeated syntactic patterns
  • formatting features
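To make the list above concrete, here is a small sketch of how features of this kind might be quantified. The four features are illustrative stand-ins invented for this example, not any of the project's 166. The function assumes non-empty text, and mean word length is only a crude proxy for decodability.

```python
import re

def text_features(text):
    """Compute a few illustrative (hypothetical) text features.

    Assumes the text contains at least one word, sentence, and line.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    # Use the first two words of each sentence as a crude pattern key.
    openers = [" ".join(re.findall(r"[a-z']+", s.lower())[:2]) for s in sentences]
    repeated = sum(1 for opener in openers if openers.count(opener) > 1)
    return {
        # Crude proxy for decodability: shorter words tend to be simpler.
        "mean_word_length": sum(len(w) for w in words) / len(words),
        # Lower values mean more word repetition (lower content density).
        "type_token_ratio": len(set(words)) / len(words),
        # Share of sentences whose opening pattern recurs elsewhere.
        "repeated_opener_share": repeated / len(sentences),
        # Simple formatting feature: how much text sits on each line.
        "words_per_line": len(words) / len(lines),
    }

sample = "I see a cat.\nI see a dog.\nI see a red hen.\nThe hen sees me."
features = text_features(sample)
```

Each book then becomes one row of numeric features, which is the form a regression model expects.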

Variable selection and model creation using Random Forest Regression

Using an iterative process, we reduced the set to 12 variables.
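The abstract does not describe the iterative process in detail. One plausible form, sketched here as an assumption, is backward elimination: fit a random forest, drop the least important variables, and repeat until the target count remains. The data is synthetic, and the function name, drop fraction, and forest size are hypothetical choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def select_features(X, y, n_keep, drop_fraction=0.25, random_state=0):
    """Iteratively drop the least important columns until n_keep remain."""
    kept = np.arange(X.shape[1])
    while len(kept) > n_keep:
        forest = RandomForestRegressor(n_estimators=50, random_state=random_state)
        forest.fit(X[:, kept], y)
        # Drop a fraction of the remaining columns, but never overshoot n_keep.
        n_drop = max(1, int(len(kept) * drop_fraction))
        n_drop = min(n_drop, len(kept) - n_keep)
        order = np.argsort(forest.feature_importances_)  # least important first
        kept = np.sort(kept[order[n_drop:]])
    return kept

# Synthetic data: only columns 0 and 1 actually drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 40))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

selected = select_features(X, y, n_keep=12)
```

Refitting after each round matters because importances shift once correlated or noisy variables are removed.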

Because of the large number of models evaluated, we employed PiCloud to speed up the process.

Model tuning and evaluation

Using this reduced variable set, we achieved high correlations between predicted and empirically derived measures of text difficulty.
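How the correlations were computed is not stated here. A reasonable sketch, under the assumption that predictions are made on held-out texts, is to generate cross-validated predictions and correlate them with the empirical measures. The data below is synthetic, and the fold count and forest size are hypothetical.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

# Synthetic stand-in for 12 selected features and an empirical difficulty measure.
rng = np.random.default_rng(1)
X = rng.normal(size=(150, 12))
y = 2.0 * X[:, 0] + X[:, 1] + rng.normal(scale=0.2, size=150)

model = RandomForestRegressor(n_estimators=100, random_state=0)
# Each prediction comes from a model that never saw that text during fitting.
predicted = cross_val_predict(model, X, y, cv=5)
r, p_value = pearsonr(y, predicted)
```

Correlating out-of-fold predictions rather than in-sample fits avoids overstating how well the model would do on new books.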