Learn what you need to consider when moving from the world of relational databases to a NoSQL document store.
Hear from Developer Advocate Glynn Bird as he explains the key differences between relational databases and JSON document stores like Cloudant, as well as how to dodge the pitfalls of migrating from a relational database to NoSQL.
SQL To NoSQL - Top 6 Questions Before Making The Move
1. SQL to NoSQL: Top 6 Questions
Glynn Bird
Developer Advocate @ IBM
@glynn_bird
2. Agenda
2
• Top 6 Questions When Moving to NoSQL
1. Why NoSQL?
2. Rows and Tables Become ... What?
3. Will I Have to Rebuild My App?
4. How do I query data?
5. What's _rev?
6. Does it replicate?
• Live Q&A
4. But, What Is NoSQL, Really?
4
• Umbrella term for databases using non-SQL query languages
• Key-Value stores
• Column-family stores
• Document stores
• Graph stores
• Some also say "non-relational," because data is not decomposed
into separate tables, rows, and columns
• It’s still possible to represent relationships in NoSQL
• The question is, are these relationships always necessary?
5. NoSQL Document Stores
5
• That's databases like MongoDB, Apache CouchDB™, Cloudant,
and Dynamo
• Optimized for "semi-structured" or "schema-optional" data
• People say "unstructured," but that's inaccurate
• Each document has its own structure
7. Schema Flexibility
7
• Cloudant uses JavaScript Object Notation (JSON) as its data format
• Cloudant is based on Apache CouchDB. In both systems, a "database" is simply
a collection of JSON documents
{
"docs": [
{
"_id": "df8cecd9809662d08eb853989a5ca2f2",
"_rev": "1-
8522c9a1d9570566d96b7f7171623270",
"Movie_runtime": 162,
"Movie_rating": "PG-13",
"Person_name": "Zoe Saldana",
"Actor_actor_id": "0757855",
"Movie_genre": "AVYS",
"Movie_name": "Avatar",
"Actor_movie_id": "0499549",
"Movie_earnings_rank": "1",
"Person_pob": "New Jersey, USA",
"Person_id": "0757855",
"Movie_id": "0499549",
"Movie_year": 2009,
"Person_dob": "1978-06-19"
}
]
}
8. The Cloudant Data Layer
8
• Distributed NoSQL data persistence
layer
• Available as a fully-managed DBaaS,
or managed by you on-premises
• Transactional JSON document
database with REST API
• Spreads data across data centers &
devices for scale & high availability
• Ideal for apps that require:
• Massive, elastic scalability
• High availability
• Geo-location services
• Full-text search
• Offline-first design for occasionally
connected users
9. Not One DB Server; a Cluster of Servers
• A Cloudant cluster
• Horizontal scale
• Redundant load balancers
backed by multiple DB servers
• Designed for durability
• Saves multiple copies of data
• Spreads copies across cluster
• All replicas do reads & writes
• Access Cloudant over the Web
• Developers get an API
• Cloudant manages it all
behind the scenes
9
10. Horizontal Scaling
• Shard across many commodity servers vs. few expensive ones
• Performance improves linearly with cost, not exponentially
10
12. ... This!
SQL Terms/Concepts
database -->
table -->
row -->
column -->
materialized view -->
primary key -->
table JOIN operations -->
Document Store Terms/Concepts
database
bunch of documents
document
field
index/database view/secondary index
"_id":
entity relations
12
13. Rows --> Documents
13
• Use some field to group documents by schema
• Example:
"type":"user" or "type":"book"
"_id":"user:456" or "_id":"book:9988"
14. Tables --> Databases
14
• Put all tables in one database; use "type": to distinguish
• Model entity relationships with secondary indexes
• http://wiki.apache.org/couchdb/EntityRelationship
16. Indexes and Queries
16
• An "index" in Cloudant is not strictly a performance optimization
• Instead, more akin to "materialized view" in RDBMS terms
• Index also called a "database view" in Cloudant
• Index, then query
• You need one before you can do the other
• Create index, then query by URL
• Can create a secondary index on any field within a document
• You get primary index (based on reserved "_id": field) by default
• Indexes precomputed, updated in real time
• Indexes are updated using incremental MapReduce
• You don't need to rebuild the entire index every time a document is changed,
added, or deleted
• Performant at big-honkin' scale
20. Yes
20
By ripping out the bad parts:
• Extract, Transform, Load
• Schema migrations
• JOINs that don't scale
21. Each of My Tables Becomes a Different
Type of JSON Document?
21
22. No
• Fancy explanation:
• Best practice is to denormalize
data into 3rd normal form
• Or, less fancy:
• Smoosh relationships for each
entry all together into one JSON
doc
• Denormalization
• Approach to data modeling that
shards well and scales well
• Works well with data that is
somewhat static, or infrequently
updated
22
A smooshed and griddled cheese sandwich
26. 26
www.glynnbird.com
• My home page
• Cloudant database of articles
• Replicated to PouchDB
• Appcache for offline first
• http://www.glynnbird.com/
26
27. 6. How do I get data in and out?
27
• Yes
• https://cloudant.com/for-developers/migrating-data/
• But every use case is different and everyone’s data is different
• Lots of DIY tools on github that could work for you
• Cloudant’s Homegrown CSV --> JSON Tools
• python: https://github.com/claudiusli/csv-import
• Java: https://github.com/cavanaugh-ibm/db-data-loader
• Node: https://github.com/glynnbird/couchimport