Sunday 10 a.m.–1 p.m. in

Converting unstructured web data into sequenced STEM educational games

Itay Livni, Michael Wehar


Vocabulary, repetition, and examples are fundamental to human learning. These tools help humans teach each other, communicate, and innovate. Yet vocabulary-building and reading-comprehension games geared specifically to science, technology, engineering, and math (STEM) disciplines are lacking. Such games are expensive to make, and their content has a short life span. First, the game designer must choose a topic and grade level. Second, an age-appropriate curriculum must be developed. Third, the content must be researched and edited. Finally, one or more game developers must transform the content into a game. This process has to be repeated for each topic. However, it can be automated using natural language processing (NLP), digitized primary-source information, and vibrant open source ecosystems. Automating the process enables educators to create STEM educational games with just four user inputs: (1) Term, (2) Topic, (3) Grade Level, (4) Game Type. The corresponding output is a set of sequenced games that can be adjusted to the reading comprehension level of particular students. The content for the games is built with open source packages such as Beautiful Soup, pandas, textacy, gensim, scikit-learn, and networkx. Client-side work is done in JavaScript and is served by Flask.
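To make the idea of "sequenced" content concrete, here is a minimal sketch of one way the four inputs and the ordering step could look. It is not the authors' implementation. The names `GameRequest` and `sequence_terms` are hypothetical, the prerequisite rule (a term must come after any other term that appears in its definition) is an assumption, and the standard-library `graphlib` module stands in for a graph package such as networkx so the example has no dependencies.

```python
import re
from dataclasses import dataclass
from graphlib import TopologicalSorter  # Python 3.9+


@dataclass(frozen=True)
class GameRequest:
    """The four user inputs named in the abstract (field names are hypothetical)."""
    term: str
    topic: str
    grade_level: int
    game_type: str


def sequence_terms(definitions: dict[str, str]) -> list[str]:
    """Order terms so that a term used in another term's definition comes first.

    Assumption: single-word terms and a crude plural rule (strip one trailing "s").
    Circular definitions raise graphlib.CycleError.
    """
    by_lower = {term.lower(): term for term in definitions}
    graph: dict[str, set[str]] = {}
    for term, text in definitions.items():
        tokens = re.findall(r"[a-z]+", text.lower())
        # Include a naive singular form so "atoms" matches the term "atom".
        words = set(tokens) | {t[:-1] for t in tokens if t.endswith("s")}
        # Map each term to its prerequisites (other terms found in its definition).
        graph[term] = {by_lower[w] for w in words if w in by_lower and by_lower[w] != term}
    # static_order() yields prerequisites before the terms that depend on them.
    return list(TopologicalSorter(graph).static_order())


# Toy glossary, hand-written for illustration rather than scraped from the web.
request = GameRequest(term="molecule", topic="chemistry", grade_level=6, game_type="matching")
glossary = {
    "molecule": "A group of atoms bonded together.",
    "atom": "The smallest unit of an element.",
    "element": "A pure substance.",
}
print(request.term, "->", sequence_terms(glossary))
```

In a fuller pipeline the glossary would presumably come from scraped and cleaned web text, and the resulting order would decide which game is presented first.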