distributed systems · 2007
Amazon's Dynamo
A distributed-systems milestone from 2007 whose ideas keep shaping modern data infrastructure.

Paper club · est. 2009 · 30 cities
From Codd's relational model (1970) to vector databases and RAG (2026). Twenty-nine foundational papers, annotated for the modern engineer. One archive, fifty-six years.

Featured paper · Amazon's Dynamo · 2007

Latest articles
The distributed key-value store has evolved from a pragmatic response to web-scale failures into the foundational layer for global data platforms. Early designs accepted relaxed consistency to survive partitions, yet pro...
12 min read
Distributed systems are the invisible backbone of modern technology, powering everything from your favorite social media app to global financial markets. Yet, truly understanding them — their complexities, trade-offs, an...
19 min read
We're thrilled today to host Dr. Margaret Hollis, a distinguished independent database historian, to discuss the profound evolution of database systems over the last five decades. From the foundational theories of the re...
14 min read
Dr. Aiden Vasquez is a senior distributed-systems engineer with over 20 years of experience in building consensus systems. Formerly an engineer at a Spanner-like project, Dr. Vasquez is now the CTO of ConsensusLabs, wher...
11 min read
Distributed-database research papers are treasure maps drawn in jargon. Without a shared vocabulary, the same term can mean different things to different systems—CAP theorem's 'consistency' is not the same in Dynamo's [e...
10 min read
NoSQL systems continue to underpin the data layer for nearly every large-scale service, yet the foundational papers that defined their trade-offs are often read in isolation. In 2026 the same questions of partition toler...
12 min read
Amazon's Dynamo paper introduced the industry to eventual consistency and consistent hashing as core principles for building distributed key-value stores that prioritize availability and partition tolerance. The system w...
18 min read
In 2026 the boundary between retrieval and generation has collapsed for production AI systems. Backend teams now treat vector search not as an add-on but as a first-class storage and indexing problem that must satisfy th...
12 min read
Browse by topic
Classic papers
7 papers
Foundational works that defined how we think about data and distributed computation — Codd, Lamport, Gray, Brewer.
Distributed systems
11 papers
Coordination, replication, consensus, and the practical engineering of internet-scale services.
AI & databases
0 papers
Vector databases, RAG, embeddings — how LLMs are reshaping distributed data systems.
From the archive
A NoSQL milestone from 2011 whose ideas keep shaping modern data infrastructure.
An engineering milestone from 2011 whose ideas keep shaping modern data infrastructure.
A NoSQL milestone from 2010 whose ideas keep shaping modern data infrastructure.
A NoSQL milestone from 2009 whose ideas keep shaping modern data infrastructure.
A production milestone from 2008 whose ideas keep shaping modern data infrastructure.
A production milestone from 2008 whose ideas keep shaping modern data infrastructure.
Nosqlsummer began as a reading club that ran from 2009 to 2013. Spanning over 30 cities worldwide, it brought together enthusiasts and professionals in database systems to read foundational academic papers. The club met weekly and covered one paper per session, a format that left room for close reading and for discussion among participants.
The selection of these papers in 2009 was shaped by the rising wave of NoSQL databases. The industry was shifting quickly after the publication of Google's Bigtable paper (2006) and Amazon's Dynamo paper (2007) and Facebook's open-source release of Cassandra (2008). These systems laid the groundwork for a new paradigm in data storage and retrieval, challenging the long-standing dominance of relational databases.
The club ran as a set of local meetups that shared reading lists managed via Google Groups. Each session typically lasted two hours, giving members time to dissect and debate the chosen paper. As the NoSQL hype cycle matured, the club gradually wound down around 2013: demand for this kind of focused academic reading waned once NoSQL technologies became mainstream and part of everyday use. A more detailed account of the club's history is available on the about page.
The 29 papers curated by nosqlsummer are a cross-section of database and distributed systems literature. The editorial selection spans 1970 to 2011, from the foundational principles of relational theory through the evolution of distributed systems and the rise of NoSQL to the early foundations of AI. This breadth lets readers see the historical and technical contexts that shaped modern database systems.
Among the seminal works, Codd's 1970 paper on the relational model stands as a cornerstone of database theory, introducing concepts that revolutionized how data was structured and queried. Lamport's 1978 work on causality provided profound insights into the ordering of events in distributed systems, while Gray's 1981 exploration of the transaction concept laid critical groundwork for understanding database consistency and concurrency.
The Dynamo paper (2007) marks a watershed in the collection: it captures the shift from traditional ACID guarantees to the looser BASE properties that underpin many NoSQL systems. Reading the papers in sequence lets one trace the rationale behind that shift: why Dynamo broke from ACID, how BASE emerged, and why the CAP theorem remains relevant to distributed systems. Dynamo made the trade-off explicit, accepting weaker consistency in exchange for high availability and partition tolerance at scale.

As we step into 2026, distributed databases continue to adapt to new technical challenges. Vector databases and Retrieval-Augmented Generation (RAG) are among the newest of these. Both depend on storing and querying high-dimensional data, which is increasingly central to machine learning applications.
The integration of LLM inference layers atop distributed stores exemplifies the convergence of AI and database technologies. This synergy facilitates more intelligent data retrieval and processing, enabling sophisticated applications across a range of industries.
In this context, the disaggregated storage model, exemplified by platforms like Snowflake and Databricks, gains prominence. By separating compute from storage, these systems let each scale independently, which suits modern data workloads whose compute and storage demands grow at different rates.
The relational model, too, has evolved to absorb the innovations of the NoSQL wave, as seen in the development of Postgres extensions like pgvector and pg_embedding. These extensions demonstrate how the foundational principles of relational databases continue to adapt and thrive in the face of new data paradigms.
Tracing the thread from Dynamo through DynamoDB, cloud-native architectures, serverless computing, and now vector indexes, we see a continuum of trade-offs and innovations that keep reshaping distributed databases. The CAP theorem remains relevant to vector stores as well: a replicated index still has to choose between consistency and availability during a partition, in addition to the trade-off between recall and latency inherent in approximate nearest neighbor search.
Our article on vector databases and RAG covers this evolution in detail, highlighting how the principles that guided early NoSQL developments continue to influence the database technologies of today.
nosqlsummer.org is designed to be easy to navigate for both newcomers and veterans of database technology. The site has three primary sections: Papers, Blog, and Categories. The Papers section is an archive of the 29 foundational papers that shaped the NoSQL movement and beyond. The Blog offers eight essays on specific topics and current trends in distributed databases. The Categories section organizes content into six topics, making it easy to find material relevant to your interests.
For junior engineers, the site provides a structured entry point into the world of distributed systems. Starting with the Dynamo paper, moving through the CAP theorem, and exploring the concepts of eventual consistency, one can build a solid foundation in understanding the trade-offs and design principles of modern databases.
For machine learning engineers, the site offers a different path, beginning with an understanding of LSM-trees and CRDTs, and advancing towards the latest innovations in vector databases. This progression equips ML engineers with the knowledge to integrate advanced database technologies into machine learning workflows effectively.
We invite you to browse all 29 papers to begin your journey through the rich and evolving landscape of distributed databases and NoSQL systems.