Apache Cassandra 3.x and Materialized Views

June 07, 2016
By Ben Bromhead

Overview

Apache Cassandra 3.0 introduces a new feature called materialized views. Materialized views behave as they do in other database systems: you create a table that is populated by the results of a query, and Cassandra keeps it up to date as you insert data into the base table. Whilst the feature itself sounds very simple, it becomes very powerful when working with a denormalized schema, where you often end up writing the same data multiple times in a shape that will fit future reads.
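To make the idea concrete before the main example, here is a minimal sketch of the general shape of a materialized view. All names in it (the demo keyspace, the events table, the events_by_type view) are hypothetical and exist only for illustration, and the single-node replication settings are an assumption suited to a local test cluster.

```cql
// Hypothetical keyspace, table and view names, for illustration only.
CREATE KEYSPACE IF NOT EXISTS demo
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

// Base table: keyed by event_id, so it can only be read efficiently by event_id.
CREATE TABLE IF NOT EXISTS demo.events (
    event_id uuid,
    event_type text,
    payload text,
    PRIMARY KEY (event_id)
);

// The view re-keys the same rows by event_type. Every column in the view's
// primary key must be restricted with IS NOT NULL, and the view's primary key
// must contain all of the base table's primary key columns.
CREATE MATERIALIZED VIEW IF NOT EXISTS demo.events_by_type AS
    SELECT event_id, event_type, payload
    FROM demo.events
    WHERE event_type IS NOT NULL AND event_id IS NOT NULL
    PRIMARY KEY (event_type, event_id);

// Writes go to the base table only; Cassandra propagates them to the view.
INSERT INTO demo.events (event_id, event_type, payload)
VALUES (uuid(), 'login', 'example payload');

// Reads that need the alternate key go to the view.
SELECT event_id, payload FROM demo.events_by_type WHERE event_type = 'login';
```

The application never writes to the view directly; it is maintained from the base table, which is what removes the duplicate-write logic from the client.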

By leveraging materialized views, you can have some of this logic live in the database and let Cassandra keep everything up to date (in a somewhat eventually consistent manner). Materialized views also allow you to replace some of the functionality provided by secondary indexes, which can be painful to manage and can create performance bottlenecks. Below is a basic example of how you can use materialized views to replace a secondary index for much better performance.

This example is based on the idea of a dating app in which users are matched with other users by an arbitrary algorithm (the details are not relevant to this example). The matches are stored in Cassandra, and the users can accept or reject the matches as they see fit.

Example

First, let's create our basic schema.

 CREATE KEYSPACE netflix_and_chill WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter1' : 3};

Our application will have a replication factor of 3.
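As a quick sanity check, the replication settings can be read back from the schema tables. This is a small sketch that assumes Cassandra 3.0 or later, where keyspace definitions live in system_schema.keyspaces.

```cql
// Read back the replication map stored for the keyspace.
SELECT keyspace_name, replication
FROM system_schema.keyspaces
WHERE keyspace_name = 'netflix_and_chill';
```

Note that with NetworkTopologyStrategy the name datacenter1 has to match the data center name your cluster actually reports; if it does not, no replicas are placed for that entry.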

 CREATE TABLE IF NOT EXISTS netflix_and_chill.users (
user_id uuid,
first_name text,
last_name text,
email text,
PRIMARY KEY (user_id)
);

Our base table will contain all our users and some basic information.

 CREATE TABLE IF NOT EXISTS netflix_and_chill.user_matches (
user uuid,
matched_user uuid,
state text,
year int,
month int,
day int,
PRIMARY KEY ((user, matched_user))
);

CREATE INDEX user_match_idx ON netflix_and_chill.user_matches (state);

This table will contain a set of matched user pairs. Normally this is generated by our magic machine learning pipeline built with Spark… but for this example we’ll just use some dummy data. We also create a secondary index so we can query the user_matches table by user and state. This will allow our app to display a list of ACTIVE matches to the logged-in user.

 INSERT INTO netflix_and_chill.users (user_id, first_name, last_name, email) VALUES (95fc8d63-673d-4d43-8735-582bfe26e6d6, 'Jordan', 'Smith', 'jordan@example.com');
INSERT INTO netflix_and_chill.users (user_id, first_name, last_name, email) VALUES (8c3fb75c-a713-4750-b945-c51074257643, 'Charlie', 'Green', 'charlie@example.com');
INSERT INTO netflix_and_chill.users (user_id, first_name, last_name, email) VALUES (3bf70628-9f24-46bc-95ff-c70eb2486ea4, 'Jamie', 'Fletcher', 'jamie@example.com');
INSERT INTO netflix_and_chill.users (user_id, first_name, last_name, email) VALUES (25fc8d63-673d-4d43-8735-582bfe2646d6, 'Emerson', 'Jones', 'emerson@example.com');
INSERT INTO netflix_and_chill.users (user_id, first_name, last_name, email) VALUES (4c3fb75c-a713-4750-b945-c51074257643, 'Casey', 'Ali', 'casey@example.com');
INSERT INTO netflix_and_chill.users (user_id, first_name, last_name, email) VALUES (5bf70628-9f24-46bc-95ff-c70eb2476e96, 'Amari', 'Wiat', 'amari@example.com');
// Some basic matches
INSERT INTO netflix_and_chill.user_matches (user, matched_user, state, year, month, day)
VALUES (95fc8d63-673d-4d43-8735-582bfe26e6d6, 8c3fb75c-a713-4750-b945-c51074257643, 'ACTIVE', 2015, 11, 17);
INSERT INTO netflix_and_chill.user_matches (user, matched_user, state, year, month, day)
VALUES (8c3fb75c-a713-4750-b945-c51074257643, 3bf70628-9f24-46bc-95ff-c70eb2486ea4, 'ACTIVE', 2015, 11, 17);
INSERT INTO netflix_and_chill.user_matches (user, matched_user, state, year, month, day)
VALUES (5bf70628-9f24-46bc-95ff-c70eb2476e96, 25fc8d63-673d-4d43-8735-582bfe2646d6, 'ACTIVE', 2015, 11, 17);
// What about this super popular person, Jordan?
INSERT INTO netflix_and_chill.user_matches (user, matched_user, state, year, month, day)
VALUES (95fc8d63-673d-4d43-8735-582bfe26e6d6, 95fc8d63-673d-4d43-8735-582bfe26e6d6, 'ACTIVE', 2015, 11, 17);
INSERT INTO netflix_and_chill.user_matches (user, matched_user, state, year, month, day)
VALUES (95fc8d63-673d-4d43-8735-582bfe26e6d6, 8c3fb75c-a713-4750-b945-c51074257643, 'ACTIVE', 2015, 11, 17);
INSERT INTO netflix_and_chill.user_matches (user, matched_user, state, year, month, day)
VALUES (95fc8d63-673d-4d43-8735-582bfe26e6d6, 3bf70628-9f24-46bc-95ff-c70eb2486ea4, 'ACTIVE', 2015, 11, 17);
INSERT INTO netflix_and_chill.user_matches (user, matched_user, state, year, month, day)
VALUES (95fc8d63-673d-4d43-8735-582bfe26e6d6, 25fc8d63-673d-4d43-8735-582bfe2646d6, 'ACTIVE', 2015, 11, 18);
INSERT INTO netflix_and_chill.user_matches (user, matched_user, state, year, month, day)
VALUES (95fc8d63-673d-4d43-8735-582bfe26e6d6, 4c3fb75c-a713-4750-b945-c51074257643, 'ACTIVE', 2015, 11, 18);
INSERT INTO netflix_and_chill.user_matches (user, matched_user, state, year, month, day)
VALUES (95fc8d63-673d-4d43-8735-582bfe26e6d6, 5bf70628-9f24-46bc-95ff-c70eb2476e96, 'ACTIVE', 2015, 11, 18);
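With the dummy data loaded, the secondary index can be exercised with queries along the following lines. This is a sketch rather than the app's actual query: the first statement relies on the user_match_idx index on state, and the second is a direct lookup of one pair using the full composite partition key.

```cql
// Served by the secondary index user_match_idx on state.
// Returns ACTIVE matches across all users, not just one.
SELECT user, matched_user, year, month, day
FROM netflix_and_chill.user_matches
WHERE state = 'ACTIVE';

// Direct lookup of a single pair: both partition key columns are supplied.
SELECT state
FROM netflix_and_chill.user_matches
WHERE user = 95fc8d63-673d-4d43-8735-582bfe26e6d6
  AND matched_user = 8c3fb75c-a713-4750-b945-c51074257643;
```

Because INSERT in Cassandra is an upsert, the pair that appears twice in the dummy data ends up as a single row. Also note that the partition key is the composite (user, matched_user), so restricting on user alone is not a partition lookup; the index query has to find matching rows across the cluster, which is part of why secondary indexes can become a bottleneck.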
