PyCon 2022

Talks: Enhancing Code Collaboration with Conflict-Free Replicated Data Types

Presented by:

Description

Jupyter Notebook may be one of the most controversial open source projects released in the last ten years! Love notebooks or hate them, they’ve become a mainstay of data science and machine learning, and a significant part of the Python ecosystem. While Jupyter can simplify experimentation, rapid prototyping, documentation, and visualization, it often impedes version control, code review, and test coverage. Dev teams must accept the good with the bad… but what if they didn’t have to? In this talk we introduce conflict-free replicated data types (CRDTs), a family of data structures that guarantee strong eventual consistency across replicas, and which can be used to enhance Jupyter notebooks for a truly collaborative experience.

First proposed by Shapiro et al. in 2011, conflict-free replicated data types (CRDTs) evolved out of the distributed systems community as a way to replicate data across a network of replicas. CRDTs are objects that come with a special guarantee: strong eventual consistency. Copies of the same object can be updated independently, and any two copies that have received the same set of updates converge to the same state without further coordination. While CRDTs have enjoyed a good amount of attention from academia in recent years, primarily amongst database and cloud researchers, they have not led to many practical applications for everyday developers. However, recent work by Kleppmann et al. shows CRDTs can be used for real-time rich-text collaboration, creating a “Google Docs”-style experience with any document in a networked file system.
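To make the convergence guarantee concrete, here is a minimal sketch of one of the simplest CRDTs, a grow-only counter (G-Counter). The class name, method names, and replica names below are illustrative choices, not taken from the talk or from any particular library.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class GCounter:
    """State-based grow-only counter: one non-negative count per replica."""

    replica_id: str
    counts: Dict[str, int] = field(default_factory=dict)

    def increment(self, amount: int = 1) -> None:
        # A replica only ever increases its own entry.
        if amount < 0:
            raise ValueError("a grow-only counter cannot decrease")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        # The counter's value is the sum over all replicas' entries.
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Element-wise maximum is commutative, associative, and idempotent,
        # so replicas converge regardless of merge order or repetition.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)


# Two replicas update independently, then exchange state.
alice = GCounter("alice")
bob = GCounter("bob")
alice.increment(2)
bob.increment(3)

alice.merge(bob)
bob.merge(alice)
bob.merge(alice)  # merging the same state again changes nothing

print(alice.value(), bob.value())
```

Because each replica writes only to its own entry and merging takes the maximum per entry, no update can be lost or double-counted, which is why no locking or central coordinator is needed.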

In this talk, we’ll present the basics of CRDTs and demonstrate how they work with examples written in Python. Next, we’ll explain how CRDTs can enable more collaborative Jupyter notebooks, opening up features such as synchronous insertions, diffs, and auto-merges, even with multiple simultaneous contributors!
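As a rough illustration of how concurrent cell insertions could merge automatically, the sketch below models a notebook as an insert-only collection of cells keyed by a fractional position, a replica id, and a per-replica sequence number. This is a deliberately simplified assumption for demonstration: it has no deletion or editing, the names `CellList`, `insert`, and `merge` are hypothetical, and it is not the algorithm used by Jupyter or by any published sequence CRDT.

```python
from dataclasses import dataclass, field
from fractions import Fraction
from typing import Dict, List, Tuple

# (position, replica_id, sequence_number); unique because each replica
# numbers its own insertions.
CellKey = Tuple[Fraction, str, int]


@dataclass
class CellList:
    """Insert-only ordered collection of notebook cells (illustrative)."""

    replica_id: str
    cells: Dict[CellKey, str] = field(default_factory=dict)
    next_seq: int = 1

    def keys(self) -> List[CellKey]:
        # Sorting the keys gives every replica the same cell order.
        return sorted(self.cells)

    def sources(self) -> List[str]:
        return [self.cells[k] for k in self.keys()]

    def insert(self, index: int, source: str) -> None:
        # Pick a position halfway between the neighbouring cells.
        # Caveat: if both neighbours share a position (after a concurrent
        # insert), the new cell is ordered by replica id and sequence number
        # and may not land exactly at the requested index.
        ks = self.keys()
        lo = ks[index - 1][0] if index > 0 else Fraction(0)
        hi = ks[index][0] if index < len(ks) else Fraction(1)
        self.cells[((lo + hi) / 2, self.replica_id, self.next_seq)] = source
        self.next_seq += 1

    def merge(self, other: "CellList") -> None:
        # Set union of immutable cells: commutative, associative, idempotent.
        self.cells.update(other.cells)


alice = CellList("alice")
bob = CellList("bob")

alice.insert(0, "import pandas as pd")
bob.merge(alice)

# Both contributors insert a cell at index 1 at the same time.
alice.insert(1, "x = 1")
bob.insert(1, "y = 2")

alice.merge(bob)
bob.merge(alice)

print(alice.sources())
print(bob.sources())
```

Both insertions receive the same fractional position, so the replica id breaks the tie and both replicas end up with the same cell order without either contributor resolving a conflict by hand.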