Paper club · est. 2009 · 30 cities

The papers that built distributed data — read carefully.

From Codd's relational model (1970) to vector databases and RAG (2026). Twenty-nine foundational papers, annotated for the modern engineer. One archive, fifty-six years.

Papers 29 · Earliest 1970 · Latest read 2026 · Cities 30


This week's read

All 29 papers

distributed systems · 2007

Amazon's Dynamo

by DeCandia et al. (Amazon)

A distributed-systems milestone from 2007 whose ideas keep shaping modern data infrastructure.

Year 2007 · Category distributed systems

From the blog

All articles

Jul 9, 2026 · ai and databases

AI Agents and Distributed Consensus in 2026

Multi-agent AI systems face the same coordination problem distributed databases solved decades ago: how do independent processes agree on shared state without a single point of failure? This article traces the line from Lamport's Paxos to today's age

13 min read

Jul 9, 2026 · case studies

A Practical Guide to Database Migration Patterns at Scale

Migrating a production database — relational to NoSQL, or between NoSQL engines — is one of the riskiest operations in distributed systems engineering. This guide walks through the dual-write, backfill, and shadow-traffi...

10 min read

Jul 9, 2026 · distributed systems

Event Sourcing and CQRS for Distributed Systems in 2026

Event sourcing and CQRS aren't new — they're a direct descendant of the transaction-log thinking formalized in Jim Gray's Transaction Concept paper. This article explains why append-only event logs and read/write model separation remain the cleanest

12 min read

Jul 9, 2026 · case studies

Interview: Chaos Engineering for Distributed Databases

We interviewed a fictional staff site reliability engineer who has run chaos engineering programs at two Fortune 500 fintechs. This interview covers how deliberate fault injection tests the same trade-offs formalized by t...

12 min read

Jul 9, 2026 · modern nosql

Interview: A Sharding Architect on Scaling Databases in 2026

We sat down with a fictional principal database engineer who has resharded production systems three times without a customer-facing outage. This interview covers the practical decisions behind sharding strategy, the anti...

12 min read

Jul 9, 2026 · modern nosql

Kafka and Streaming Architectures for Distributed Databases

Event streaming platforms like Kafka have quietly become the connective tissue between distributed databases, turning point-to-point replication into a durable, replayable log. This article traces how streaming architectures apply — and complicate —

13 min read

Browse by topic

Featured categories

All categories

Classic papers

7 papers

Foundational works that defined how we think about data and distributed computation — Codd, Lamport, Gray, Brewer.

Distributed systems

11 papers

Coordination, replication, consensus, and the practical engineering of internet-scale services.

AI & databases

0 papers

Vector databases, RAG, embeddings — how LLMs are reshaping distributed data systems.

From the archive

Recently re-read

All 29 papers

#27

2011 · modern nosql

The Graph Traversal Pattern

Marko A. Rodriguez & Peter Neubauer

A NoSQL milestone from 2011 whose ideas keep shaping modern data infrastructure.

#29

2011 · tutorials

CRDTs: Consistency without concurrency control

Shapiro, Preguiça, Baquero & Zawirski (INRIA)

An engineering milestone from 2011 whose ideas keep shaping modern data infrastructure.

#27

2010 · modern nosql

Benchmarking Cloud Serving Systems with YCSB

Cooper et al. (Yahoo!)

A NoSQL milestone from 2010 whose ideas keep shaping modern data infrastructure.

#26

2009 · modern nosql

Cassandra - A Decentralized Structured Storage System

Lakshman & Malik (Facebook)

A NoSQL milestone from 2009 whose ideas keep shaping modern data infrastructure.

#23

2008 · case studies

BASE: an Acid Alternative

Dan Pritchett (eBay)

A production milestone from 2008 whose ideas keep shaping modern data infrastructure.

#24

2008 · case studies

Eventually Consistent

Werner Vogels (Amazon CTO)

A production milestone from 2008 whose ideas keep shaping modern data infrastructure.

What is nosqlsummer?

Nosqlsummer began as a reading club that ran from 2009 to 2013. Across 30 cities worldwide, it brought together database enthusiasts and professionals to read foundational academic papers. The club met weekly and covered one paper per session, which gave each paper a thorough reading and left room for discussion and the exchange of ideas.

The papers were selected in 2009, as the first wave of NoSQL databases was gathering momentum. The industry had just seen the publication of pivotal papers such as Google's Bigtable (2006), Amazon's Dynamo (2007), and Facebook's Cassandra (2009). These papers laid the groundwork for a new approach to data storage and retrieval and challenged the long-standing dominance of relational databases.

The club ran through local meetups, coordinated with shared reading lists on Google Groups. Each session typically lasted two hours, enough time for members to dissect and debate the chosen paper. As the NoSQL hype cycle matured, the club gradually wound down around 2013: demand for this kind of focused reading waned as NoSQL technologies became mainstream. For a more detailed account of the club's history, read the full history on the about page.

Why these 29 papers

The 29 papers curated by nosqlsummer are a cross-section of database and distributed systems literature. The selection spans 1970 to 2011, from the foundations of relational theory through the evolution of distributed systems to the rise of NoSQL and the early foundations of AI. This breadth lets readers see the historical and technical context that shaped modern database systems.

Among the seminal works, Codd's 1970 paper on the relational model is a cornerstone of database theory: it introduced concepts that changed how data is structured and queried. Lamport's 1978 work on causality showed how to order events in a distributed system without a shared clock, while Gray's 1981 paper on the transaction concept laid the groundwork for reasoning about database consistency and concurrency.

The Dynamo paper (2007) marks a watershed in the collection: it captures the shift from traditional ACID guarantees to the more flexible BASE properties that underpin many NoSQL systems. Reading the papers in sequence lets you trace the rationale behind these shifts: why Dynamo broke from ACID, how BASE emerged, and why the CAP theorem remains relevant in distributed systems. The Dynamo paper defined the decade by showing the trade-offs, chiefly relaxed consistency, needed to achieve high availability and partition tolerance in large-scale distributed systems.
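
One way to make that trade-off concrete is the quorum arithmetic the Dynamo paper describes. The sketch below is illustrative only: the names N, R, and W follow the paper's notation rather than anything on this page, and the QuorumConfig helper is hypothetical. It checks whether read and write quorums are guaranteed to share at least one replica, and how many replica failures each operation can absorb.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuorumConfig:
    """Hypothetical helper; n, r, w mirror the N, R, W of the Dynamo paper."""

    n: int  # replicas that store each key
    r: int  # replicas that must answer a read
    w: int  # replicas that must acknowledge a write

    def __post_init__(self) -> None:
        # Reject configurations that could never be satisfied.
        if not (1 <= self.r <= self.n and 1 <= self.w <= self.n):
            raise ValueError("r and w must each be between 1 and n")

    def quorums_overlap(self) -> bool:
        # Any R replicas and any W replicas out of the same N share a member
        # exactly when R + W > N.
        return self.r + self.w > self.n

    def read_failures_tolerated(self) -> int:
        # Replicas that can be down while a read can still gather R answers.
        return self.n - self.r

    def write_failures_tolerated(self) -> int:
        # Replicas that can be down while a write can still gather W acks.
        return self.n - self.w


balanced = QuorumConfig(n=3, r=2, w=2)
fast = QuorumConfig(n=3, r=1, w=1)

print(balanced.quorums_overlap(), balanced.write_failures_tolerated())  # True 1
print(fast.quorums_overlap(), fast.write_failures_tolerated())          # False 2
```

Lowering R and W lets more replicas fail before an operation is refused, but the read and write sets may then miss each other, so a read can return stale data. This is only the strict-quorum arithmetic; a production system adds further mechanisms that the sketch does not model.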


Distributed databases in 2026 — what changed

In 2026, distributed databases continue to adapt to new technical challenges. Vector databases and Retrieval-Augmented Generation (RAG) sit at the cutting edge of those challenges: they manage and query high-dimensional data, which is increasingly important for machine learning applications.

Running LLM inference layers on top of distributed stores shows how AI and database technologies are converging. The combination enables more intelligent data retrieval and processing across a range of industries.

In this context, the disaggregated storage model, exemplified by platforms like Snowflake and Databricks, has gained prominence. By separating compute from storage, these systems let each scale independently to match the demands of modern data workloads.

The relational model, too, has evolved to absorb the innovations of the NoSQL wave, as seen in Postgres extensions like pgvector and pg_embedding. These extensions show how relational databases continue to adapt to new data paradigms.

Tracing the thread from Dynamo through DynamoDB, cloud-native architectures, serverless computing, and now vector indexes, we see a continuum of trade-offs that keeps reshaping distributed databases. The CAP theorem remains relevant, particularly in vector stores, where approximate nearest neighbor search must balance consistency against availability.

Our article on vector databases and RAG covers this evolution in detail, highlighting how the principles that guided early NoSQL developments continue to influence the database technologies of today.

How to use this site

The site, nosqlsummer.org, is organized into three primary sections: Papers, Blog, and Categories. The Papers section is an archive of the 29 foundational papers that shaped the NoSQL movement and beyond. The Blog offers essays on specific topics and current trends in distributed databases. The Categories section organizes content into six topics, so you can find material relevant to your interests.

Reader profile	Suggested starting point	Then move to
Junior / early-career engineer	Amazon Dynamo	CAP theorem → eventual consistency
Machine-learning engineer	LSM-trees	CRDTs → vector databases & RAG
Backend / platform architect	Codd's relational model	Consensus papers → NewSQL

For junior engineers, the site provides a structured entry point into distributed systems. Start with the Dynamo paper, move on to the CAP theorem, and then explore eventual consistency to build a solid foundation in the trade-offs and design principles of modern databases.

For machine learning engineers, the site offers a different path: begin with LSM-trees, continue with CRDTs, and advance to vector databases and RAG. This progression prepares ML engineers to integrate database technologies into machine learning workflows.

Key takeaway: There is no single correct reading order — the papers are cross-linked so you can follow the thread that matches your own background instead of the club's original 2009 chronology.

We invite you to browse all 29 papers to begin your journey through the rich and evolving landscape of distributed databases and NoSQL systems.