Talks: Consistency and isolation for Python programmers

Friday - April 21st, 2023 12:15 p.m.-12:45 p.m. in 355DEF

Presented by:


Experience Level:

Some experience

Description

When you use a SQL database like Postgres, you have to understand the subtleties of isolation levels from "read committed" to "serializable". And distributed databases like MongoDB offer a range of consistency levels, from "eventually consistent" to "linearizable" and many options in between. Plus, non-experts usually confuse "isolation" with "consistency"! If we don't understand these concepts we risk losing data, or money, or worse. So what's the bottom line?

Isolation: in a simple world, your database runs on one machine and executes each request one-at-a-time. In reality, databases execute requests in parallel, leading to weird phenomena called "anomalies". To see why anomalies happen, we'll look at Python code that simulates how a database executes operations. The various isolation levels make different tradeoffs between the anomalies they allow, versus the parallelism they can achieve.

Consistency: distributed databases keep copies of your data on several machines, but these copies go out of sync. This leads to new anomalies: weird phenomena that reveal the out-of-sync data, and make your application feel like it's in a time warp. The various consistency levels make tradeoffs between anomalies versus latency. It depends how long you're willing to wait for your data changes to be synced across all the machines. Again, we'll look at a Python simulation to understand these anomalies.

You don't need to know all the names and details of every consistency and isolation level. You can refer to this handy chart. And you don't need to read all the academic papers, but I'll name four or five that are worth your time. Now, make informed decisions about consistency and isolation, and use your database with confidence!