
Power tools: Sorting through the crowded specialized database toolbox

With so many choices today, matching database to need isn't getting any easier.

Choosing a database is pretty similar—it's all about the right fit.

When you think of game development, the first thing that comes to mind probably isn't a database. But in the world of Jamaa, the setting for WildWorks' massively multiplayer online kids' game Animal Jam, a database keeps millions of cartoon animal characters frolicking and the cartoon trees from crashing down. The database chosen for this job was a specialized, non-relational database from Basho called Riak—one among the herd of new databases that have risen to handle Web-generated gluts of non-structured data.

The database landscape is increasingly complicated. As of April, Solid IT's DB-Engines initiative was tracking 303 separate relational and non-relational databases. In the golden years of relational databases, benchmarks such as TPC could theoretically give you some sort of way to compare databases directly. But today, it's difficult to assign a one-size-fits-all measurement to the world of non-relational databases such as Riak and Apache Cassandra (the distributed database project originally developed at Facebook). WildWorks ran its benchmarks and decided on Riak for Animal Jam, and Uber did the same for its dispatch platform. IoT car tech company VCARO decided the exact opposite: Cassandra beat Riak at handling vehicle-generated sensor data. Software company Nuance Communications opted for something else entirely, choosing Couchbase for handling speech and imaging apps.

The "why" behind decisions like these is as complicated as the database technologies themselves. It may hinge on which two of the three CAP theorem guarantees (consistency, availability, and partition tolerance) a business values most. The tipping point could alternatively be which database works well with software containers or what skills you already have on hand. The list of factors is seemingly endless.

Why WildWorks runs on Riak

The name creation page for WildWorks' Animal Jam, an MMO targeted at kids to teach them about wildlife.

WildWorks (originally called Smart Bomb Interactive) launched Animal Jam in 2010 through a partnership with the National Geographic Society. Since then, it has become the fastest-growing gaming site in the US. Beau Brewer, Web director and software architect, says the company supports 50,000 concurrent players.

WildWorks' tale is like many online businesses: it started out on the desktop, but it got to the point where the company had to have a mobile presence. "That's a duh," Brewer says. "Everybody was going mobile."

WildWorks developed a version of Animal Jam called Play Wild for Android and iOS mobile operating systems. While developing, the company realized it had a captive audience of millions. WildWorks didn't want to lose any of the herd by forcing it to cross bridges while migrating to mobile. It would be better to see players who already had accounts just pick up a device and seamlessly start playing.

In other words, WildWorks needed a single login for both the online and mobile games. This setup also needed to scale, big time. "We'd never played in the mobile space," Brewer says. "I knew it could start a wildfire and just go off."

Besides single sign-on and the potential to scale like mad, Brewer says that the company chose Riak because it's...

  • Fault tolerant.
  • Written in Erlang, a general-purpose language that supports concurrency, distribution, and fault tolerance. Arguably, it requires more discipline to learn and hence requires more experienced developers, Brewer says. That's a plus in his book. "This helps preserve the quality of code and stability of Riak, in my opinion," he says.
  • Accessible through an HTTP API and a native Erlang interface. Creating an API-centric app helps build functionality that can be used on any device, be it browser, mobile, tablet, or even desktop.
  • Able to use Solr to search and index data. Solr, pronounced "solar," is an open source enterprise search platform written in Java that is said to be "blazing-fast."

By contrast, this is why WildWorks passed on DataStax Enterprise's Apache Cassandra:

  • Cassandra splits a hash ring across a cluster, but it's still a single ring, Brewer says. With Riak, you have independent clusters that act as separate individuals connected by a kind of pipeline. With Riak, WildWorks could slam one replication with searches and it wouldn't affect the actual operating cluster. Imagine a player trying to log into Animal Jam, hitting on the primary cluster, while a business analyst queries how many active users there are. She could run on a separate cluster with Riak and not affect the live player cluster. That's opposed to Cassandra, which in WildWorks' testing...
  • Allowed writes during a failure but could cause reads to fail. Riak always performs reads and writes, Brewer says. The results might not be the most up to date, but WildWorks can live with that.
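
The trade-off Brewer describes can be illustrated with a toy model. The sketch below is neither Riak's nor Cassandra's actual code, and it says nothing about how either product is configured by default; the `Replica` class, the three-replica setup, and the `write`, `quorum_read`, and `any_read` functions are hypothetical names chosen for illustration. It shows how a store that insists on a majority of replicas for a read will refuse to answer when too many nodes are down, while a store that accepts any live replica always answers but may return stale data.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of the data; `up` simulates whether the node is reachable."""
    up: bool = True
    store: dict = field(default_factory=dict)  # key -> (version, value)


def write(replicas, key, value, version):
    """Write to every reachable replica and return how many accepted it."""
    acks = 0
    for replica in replicas:
        if replica.up:
            replica.store[key] = (version, value)
            acks += 1
    return acks


def quorum_read(replicas, key):
    """Consistency-first read: fail unless a majority of replicas respond."""
    majority = len(replicas) // 2 + 1
    responders = [r for r in replicas if r.up]
    if len(responders) < majority:
        raise RuntimeError("not enough replicas reachable for a quorum read")
    answers = [r.store[key] for r in responders if key in r.store]
    if not answers:
        return None
    # The highest version number among the responders wins.
    return max(answers, key=lambda answer: answer[0])[1]


def any_read(replicas, key):
    """Availability-first read: answer from the first live replica, even if stale."""
    for replica in replicas:
        if replica.up and key in replica.store:
            return replica.store[key][1]
    return None
```

If one replica misses an update while it is down and later becomes the only node reachable, `any_read` still returns its older value, whereas `quorum_read` raises an error. Which behavior is preferable depends on whether the application, like a game login, can tolerate slightly out-of-date answers.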

Why VCARO opted for Cassandra

VCARO is on the car side of the Internet of Things (IoT). The company deals with tons of sensor data, such as engine diagnostics, teens' driving behavior, and real-time vehicle location.

It never had to migrate from a traditional database; the company started out on DataStax Enterprise Cassandra. VCARO was lucky to already have people with Cassandra skills in place, which helped push the decision. It's important, after all—you have to have the resources to implement Cassandra correctly, and most don't know how to properly set up a data schema, according to VCARO CIO Zach Altneu.

He says that VCARO went through an extensive selection process; Riak was the closest competitor. The company also looked at the more "obvious" solutions (i.e., relational databases) including Oracle, VoltDB, and Amazon Redshift. VCARO even evaluated specialized data management platforms like Google Cloud Bigtable and Amazon DynamoDB.

Ultimately, much of the decision centered on cost. Cassandra is open source, so it costs very little. VCARO uses DigitalOcean, a cloud hosting company for lightweight Linux boxes. "You can set up a Cassandra cluster for virtually nothing if you wanted to," Altneu says. The company pays less than $1,000/month. A similar system through Amazon would have cost $10,000/month, he says, and "we have faster performance."

Of course, your mileage may vary. VCARO benchmarked performance with one specific system: a straight Cassandra cluster. In this particular use case, the company saw performance that was five to six times faster.

The database also had to handle big data volume and VCARO's time-sensitive, unstructured, voluminous data flowing from sensors. That pointed to a nontraditional database that could scale linearly and could house billions of rows of sensor data.
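
One common way to keep billions of time-stamped rows spread evenly across a cluster is to group readings into bounded partitions, for example one partition per vehicle per day. The sketch below illustrates that idea in plain Python. It is an assumption about how such data might be modeled, not VCARO's actual schema, and the names `SensorReading`, `partition_key`, and `group_by_partition` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SensorReading:
    vehicle_id: str
    timestamp: datetime  # expected to be timezone-aware
    metric: str
    value: float


def partition_key(reading):
    """Bucket by vehicle and UTC day so no single partition grows without bound."""
    day = reading.timestamp.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return (reading.vehicle_id, day)


def group_by_partition(readings):
    """Group readings by partition key, ordered by time within each partition."""
    partitions = defaultdict(list)
    for reading in readings:
        partitions[partition_key(reading)].append(reading)
    for rows in partitions.values():
        rows.sort(key=lambda r: r.timestamp)
    return dict(partitions)
```

With a key like this, a query for one vehicle's readings on one day touches a single partition, and new days create new partitions rather than enlarging old ones.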

One of the things VCARO particularly liked about Cassandra was its integration with tools like the Apache Spark open source cluster computing framework and Apache Solr search. DataStax even started adding support for Docker, a software container system that packages applications into isolated, sandboxed environments that share the host's Linux kernel. You can run as many containers as you want on bare metal, Altneu says, and you can ship them anywhere that supports Docker.

VCARO didn't want a database-as-a-service (DBaaS) model, such as Google's or Amazon's. DBaaS vendors set up the database, configure the software, and maintain it, including all operational activities. "For us, that was a negative," Altneu says. "We wanted to have full control over the entire operation."

Other options didn't hold up under examination for other reasons:

  • VoltDB is a distributed, in-memory, massively parallel "NewSQL" relational database that makes SQL scream. But while it's fast, it has memory limitations, Altneu says. VCARO also found few options for persisting data to disk. You can do it, he says, but that's not what the database is meant for, which can be a problem when you really do want to keep data around.
  • VCARO liked what Basho Riak had to offer: a "pure implementation of Amazon's Dynamo design," Altneu says. But the company preferred DataStax's Cassandra implementation. It had all you got with Riak, plus community support.
  • VCARO does use MongoDB to store data such as user profiles and other low-volume items. That boils down to convenience and MongoDB's famous ease of use. VCARO could store such data in Cassandra, but some of its contract developers just don't have the skills to work with the database.
