Faster than ever, Apache Cassandra 4.0 beta is on its way

The popular NoSQL database management system Apache Cassandra promises to be faster and more stable than ever in its next release.
Written by Steven Vaughan-Nichols, Senior Contributing Editor

If you want a fast database management system (DBMS) that can handle petabytes of data for web and mobile applications, chances are you're using the NoSQL Apache Cassandra database. After all, companies such as Hulu, Netflix, and Reddit already do. Oh, it has competitors, such as MongoDB, DynamoDB, and Cosmos DB, but Cassandra's arguably the most popular DBMS of its breed.

And, with its new beta release coming out shortly, it may become more popular than ever. With the addition of Zero Copy streaming, Cassandra promises five-times faster data streaming between nodes. So, what does that mean in terms of real-world speed? The developers claim this will mean a five-times faster Mean Time to Recovery when there are problems. This, in turn, should reduce your Total Cost of Ownership (TCO) because you'll need fewer cloud, server, and network resources.
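To see how a streaming speed-up becomes a recovery speed-up, here is a rough back-of-the-envelope sketch in Python. It assumes recovery time is dominated by streaming data to a replacement node. The data size and throughput figures are hypothetical, picked only for illustration, and are not Cassandra benchmarks.

```python
def recovery_hours(data_gb, throughput_mb_per_s):
    """Hours needed to stream data_gb of data at throughput_mb_per_s.

    Assumes streaming is the only cost of recovery, which is a simplification.
    """
    seconds = (data_gb * 1024) / throughput_mb_per_s
    return seconds / 3600


# Hypothetical node holding 1,000 GB, streamed at a made-up 50 MB/s baseline.
baseline_hours = recovery_hours(1000, 50)
# The same node with the claimed five-times faster streaming.
faster_hours = recovery_hours(1000, 5 * 50)

print(f"baseline: {baseline_hours:.1f} h, five-times faster: {faster_hours:.1f} h")
```

Under that assumption, recovery time falls by the same factor as streaming time. In practice, other recovery steps would likely dilute the gain.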

Its programmers also promise that this will be the most stable Apache Cassandra in history. Indeed, they recommend that you start using Apache Cassandra 4.0 beta as soon as possible in your test and quality assurance environments. The Cassandra community is on a mission to deliver a 4.0.0 general availability release that will be ready for production deployment.

It's doing this in part by hardening Cassandra against show-stopping problems. Cassandra keeps its data replicas in sync via a process called repair. This release includes many improvements that harden and optimize incremental repair, making it a faster, less resource-intensive way to maintain consistency across data replicas.

With more than 1,000 bug fixes, improvements, and new features, backed by replay, fuzz, property-based, fault-injection, and performance tests, this update looks better than ever. Its developers claim, "Cassandra 4.0 redefines what users should expect from any open or closed-source database." Internally, they're already testing it at scale in the cloud on clusters as large as 200 nodes and with hundreds of real-world use-cases and schemas.

Despite all these changes, there will be no new features or breaking API changes in future builds. In other words, the time you put into testing the beta should carry over directly when you move your production workloads to 4.0 in the near future.

This new release also includes improved cluster control with real-time audit logging and traffic replay. With audit logging, it will be much easier to ensure regulatory and security compliance with SOX, PCI, or GDPR. There is also a new full query logging tool (fqltool), which enables you to capture and replay production workloads for analysis.
 
It also has new data center controls that will make it easier to securely manage data access on a per-data-center basis. For example, if you have one data center in the US and another in Europe, you will be able to configure a Cassandra role to have access to only a single data center using the new CassandraNetworkAuthorizer.
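As a rough illustration of what that could look like, the Python sketch below builds the kind of CQL statement that restricts a role to specific data centers. It relies on the `ACCESS TO DATACENTERS` clause as I understand it from Cassandra 4.0's role syntax. The role name, password, and data center name are hypothetical, and the sketch assumes the cluster has already been set up with `network_authorizer: CassandraNetworkAuthorizer` in cassandra.yaml.

```python
def create_role_cql(role, password, datacenters):
    """Build a CQL CREATE ROLE statement limited to the given data centers.

    The role name is inserted unquoted, so it must be a plain identifier.
    """
    if not datacenters:
        raise ValueError("at least one data center is required")

    def quote(value):
        # CQL string literals escape a single quote by doubling it.
        return "'" + value.replace("'", "''") + "'"

    dc_set = ", ".join(quote(dc) for dc in sorted(datacenters))
    return (
        f"CREATE ROLE {role} WITH PASSWORD = {quote(password)} "
        f"AND LOGIN = true AND ACCESS TO DATACENTERS {{{dc_set}}};"
    )


# Hypothetical role that may only connect to a data center named 'eu-west'.
statement = create_role_cql("eu_analyst", "change-me", ["eu-west"])
print(statement)
```

You would then run the resulting statement in cqlsh as a user with permission to create roles. This is a sketch under the stated assumptions, so check it against the documentation for your build.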
 
You can also watch Cassandra clusters in action through virtual tables. You query these like any other Cassandra table. Older approaches such as Java Management Extensions (JMX), used with tools such as Instaclustr's Cassandra Exporter and DataStax's Metrics Collector, are still supported.

Looking ahead, Cassandra's developers expect great things from Java 11's Z Garbage Collector (ZGC). This promises to reduce garbage collector pause times to no more than a few milliseconds with no latency degradation as heap sizes increase. But, they warn, this feature is still experimental and thorough testing should be performed before deploying to production. They're including it, though, because if it works as hoped, it will significantly improve node availability by cutting the time nodes spend paused for garbage collection.

The beta's not even publicly out yet, but open-source Cassandra's business partners are already adding support for Cassandra 4.0. These include the client driver libraries, Spring Boot and Spring Data, Quarkus, the DataStax Kafka Connector and Bulk Loader, The Last Pickle's Cassandra Reaper tool for managing repairs, Medusa for handling backup and restore, the Spark Cassandra Connector, The Definitive Guide for Apache Cassandra, and the list goes on. 
 
When the Cassandra 4 public beta is available for download, you can get it from the Apache Cassandra website. Come that day (I estimate it will be in late June or early July), if you're a Cassandra user, I'd download it immediately. This release promises to be the best one yet.
