Introduction
While pondering which database I wanted to use, I had a good conversation with a few friends about designing a chat system. There were various options like MySQL, MongoDB, Postgres, and Cassandra. During the discussion, one thing led to another, and we started discussing the CAP theorem.
After the discussion, we were left wondering:
This hypothetical scenario beckons us to reassess our understanding of distributed systems within a landscape fortified with High Availability (HA) clusters, robust replication, and advanced sharding techniques. What happens then? How does the CAP theorem work in this case?
This article aims to record the research process and discoveries in response to the question.
Before we begin, let’s do a quick database refresher.
There are two types of databases:
- Relational databases
- Organize data into tables with predefined relationships between them.
- Use Structured Query Language (SQL) to query and manage data.
- Examples: MySQL, PostgreSQL, Microsoft SQL Server.
2. NoSQL Databases
- Designed to handle unstructured or semi-structured data.
- Do not require fixed table schemas. Provide flexible data models.
- Examples: MongoDB, Cassandra, Redis.
You might be thinking, where does a Graph database like Neo4J fall? They generally fall under the NoSQL category.
All these databases can be categorized into two sets based on their characteristics and behavior. These sets include:
- ACID-compliant database
- Atomicity: All or nothing.
- Consistency: Database is always in a valid state.
- Isolation: Transactions don’t interfere with each other.
- Durability: Committed transactions are saved.
2. BASE-compliant database
- Basically available: System is available for operations, possibly with reduced guarantees.
- Soft state: State may change over time.
- Eventual consistency: System will become consistent eventually.
All right, this should be enough to help you understand the following parts of the article.
Let’s get started.
CAP Theorem: What Does It Mean?
CAP theorem is a fundamental concept that guides architects and developers in making critical decisions about system behavior. The theorem asserts that in the presence of network partitions (P), a distributed system must choose between providing either Consistency (C) or Availability (A). However, it’s important to note that the system will always maintain Partition Tolerance (P).
Let’s look into each of these further.
Consistency (‘C’)
Consistency in the context of databases ensures that any read operation on the system will reflect the most recent write operation. No matter how many nodes a system spans, all nodes present the same data view at any given time. In other words, consistency guarantees that once a write is acknowledged, any subsequent read will reflect that write.
Availability (‘A’)
Availability means that every request for data receives a response. In an available system, even if some nodes fail, the system will always (at some capacity) respond, albeit possibly with stale data. This concept is crucial in systems where uptime and responsiveness are paramount.
Partition Tolerance (‘P’)
Partition Tolerance is the ability of a system to function even when there are network partitions, meaning some nodes are temporarily unreachable or there’s a network delay. In distributed systems, network partitions are an eventuality rather than an exception.
The system can continue functioning even if certain network parts become unreachable or experience delays. This is a critical property, as network partitions are inevitable in real-world scenarios due to factors like hardware failures, network congestion, or geographic distance.
Imagine a widely used distributed blogging platform with servers distributed across multiple data centers globally, as shown in the image below. This setup ensures high availability and responsiveness for users worldwide.
Now, let’s say there’s an unexpected network issue at SYD region, possibly due to a regional internet outage. This results in a partition, where some servers in one region lose connection with the rest of the network.
Despite the partition, the blogging platform aims to remain available. Users from the USA who are accessing the platform from unaffected regions should still be able to read existing blogs, write new ones, and interact with the platform’s features. This is what partition toleration means.
Let’s look at the following example:
A user from the USA updated the title of a blog from blog to blog+1. The user in SYD will not see it immediately reflected in the title section due to the network failure. This is what we call a partition.
However, users in SYD will still see the blog title as blog even if it is not consistent. The database system would then have to implement techniques like replication, sharding, and eventual consistency to converge from a partition. After the network partition is resolved, the system must reconcile any conflicting updates that may have occurred during the partition.
For example, if two users in different partitions edited the same blog simultaneously, the system must determine which version to keep. Here, the database needs to decide which one to keep. This is where techniques like LWW (Last Writer Wins) strategy. When a partition occurs, each partition continues to operate independently with its own data set, as we mentioned above.
PACELC Theorem
Now, we clearly understand the CAP theorem and its motivation.
However, in distributed systems, we cannot avoid partition. Hence, a distributed system must choose between consistency (C) and availability (A), which are both important.
The ACID-type database chooses consistency (C), which means if data is not consistent, no response is provided.
The BASE database chooses availability and compromises consistency. This means it responds with local data that might not be consistent.
But what if there is no partition, i.e., the network is stable, and all nodes in the distributed system can communicate without any interruptions or delays? In this scenario, the system can move between latency (‘L’) and consistency (‘C’). This movement is what we call a PACELC theorem. Daniel Abadi, a professor of computer science at the University of Maryland, coined this term in this blog.
I recommend reading this blog as well. It’s definitely worth it. Now, let’s carry on.
PACELC states the following:
- If there is a partition (‘P’): The system can strike a balance between availability (‘A’) and consistency (‘C’) — a trade-off between responsiveness and ensuring that data remains coherent.
- Else (‘E’): In the absence of a partition, the system can opt for a trade-off between latency (‘L’) and consistency (‘C’) — a choice between speed and data integrity.
Choosing the Right Trade-Off: PACELC
Different types of databases lean towards specific trade-offs outlined by the CAP theorem. Here’s a breakdown of how other databases align with the CAP theorem:
- MongoDB
- MongoDB is known for being a CP/EC database.
- It emphasizes Consistency (C) and Partition Tolerance (P), even in the presence of a partition, but if there is no partition, then it emphasizes Consistency over Latency.
2. DynamoDB
- DynamoDB is also a PA/EL database.
- It opts for Availability (A) over Consistency (C) in the presence of a partition (P) and Consistency (C) over Latency (L) when there’s no partition.
3. Cassandra
- Cassandra falls into the PA/EL category.
- It prioritizes Availability (A) over Consistency (C) when there is a partition (P) and Consistency over latency (L) when there isn’t.
4. PostgreSQL
- It is known for being a CP/EC database.
- It emphasizes Consistency (C) and Partition Tolerance (P), even in the presence of a partition, but Consistency over Latency when there is no partition.
Professor Abadi states:
“A fully ACID compliant system refuses to give up consistency and will pay the Availability and Latency costs to achieve it.”
Conclusion
It’s important to note that these categorizations are broad, and databases can be configured and used in ways that may not strictly adhere to these categories. Additionally, advancements in database technologies and strategies continue to push the boundaries of these classifications.
So, while these general trends can be a helpful guide, always consider your application’s specific needs and constraints when choosing a database.
Understanding the CAP/PACELC theorem and its implications is crucial for designing robust distributed systems. Different databases make trade-offs based on their intended use cases, design philosophies, implementation, and architecture.
By knowing the strengths and trade-offs of various databases, architects and developers can decide which database is best suited for their specific applications and requirements.
Remember, there’s no one-size-fits-all solution, and choosing the right database always depends on the specific needs of your system.
Now that you’re familiar with the CAP/PACELC theorem, it’s equally important to realize that categorizing a database as strictly CP, AP, or CA is an oversimplification. The trade-off between Consistency (C), Availability (A), and Partition tolerance (P) is influenced by various factors, including the specific database, its configuration, and a range of other considerations that go beyond a simple label.
Martin Kleppmann (Author of DDIA) provides an insightful perspective on this in his article, Please Stop Calling Databases CP or AP.
Thank you for taking the time to read this article. If you found it informative, please leave a comment as a gesture of support.
When CAP Is Not Enough: Understanding PACELC in Distributed Databases was originally published in Better Programming on Medium, where people are continuing the conversation by highlighting and responding to this story.