DB Weekly Issue 284: December 13, 2019

As we reach the end of the year, things are getting pretty quiet in database land. Next week, we'll be presenting our best of 2019 roundup, so keep an eye out for that :-)

Back Under a SQL Umbrella — If anything will define the year databases have had in 2019, it’ll probably be the huge shift into thinking about everything (well, almost) in terms of SQL. This post digs into two papers that respectively cover Google’s Procella data system and a SQL-based distributed machine learning system.

Adrian Colyer

How Scylla Scaled to One Billion Rows a Second — The technical details behind how Scylla (a Cassandra-compatible NoSQL data store) has managed to scale to reading a billion rows per second across 83 nodes on Packet’s bare metal servers.

Glauber Costa (ScyllaDB)

How to Build Your Architecture with TimescaleDB & AWS — Combine AWS and TimescaleDB to manage, store, and analyze your time-series data at scale. Check out recommendations for setting up your architecture with AWS & TimescaleDB and get started with $300 in free Timescale Cloud credits.

Timescale sponsor

Project Gemini: An Open Source Automated Random Testing Suite for Scylla and Cassandra Clusters — Project Gemini is ScyllaDB’s new automated random testing suite for data integrity of Scylla and Cassandra databases, now released as open source software.

Pekka Enberg and Henrik Johansson

Field Level Encryption Now Generally Available in MongoDB — MongoDB 4.2 introduces the ability to selectively encrypt and decrypt document fields in the application before data is sent to the database. Drivers for JavaScript, Python, Java, C#/.NET, and Go support the feature now, as does v4.2.2 of the Mongo shell.

Asya Kamsky and Kenneth White (MongoDB, Inc.)

MongoDB user? Check out MongoDB Memo, our bi-weekly newsletter :-)

Google Introduces Storage Transfer Service for On-Prem Data — Unlike AWS Snowball, this isn’t a physical solution; it’s software that both speeds up the data ingress process and makes it more reliable.

Google Cloud Blog

💻 Jobs

Data Pipelines, Reinvented. Find Your Place at Fivetran — Rooted in Oakland, we are a fast-growing company hiring across software engineering, SRE, product, and data analytics. Come join us.

Fivetran

Find a Job Through Vettery — Make a profile, name your salary, and connect with hiring managers from top employers. Vettery is completely free for job seekers.

Vettery

📒 Everything else

The History of Data Exchange — IBM and General Electric invented the first databases in the early 1960s. Things, naturally, have advanced since then, but there’s one format dating from the 1970s that we’re still using a lot: CSV! Liquidata thinks, however, that there’s a replacement on the horizon.

Tim Sehn

Why Databases Use Ordered Indexes But 'Programming' Uses Hash Tables — "I'll briefly explain the high level differences between hash tables and B-Trees, then discuss how persistent data has different needs than in-memory data."

Evan Jones
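
To make the distinction in the item above concrete, here is a minimal Python sketch. It is not taken from the linked post: a sorted list searched with `bisect` stands in for a B-Tree (it is only an approximation of one), a plain `dict` plays the hash table, and the data and function names are made up for illustration. The point it tries to show is that an ordered structure can answer a range query by jumping to the start of the range, while a hash table has to look at every key.

```python
import bisect

# Hypothetical data: primary key -> row value.
rows = {17: "q", 3: "c", 42: "z", 8: "h", 23: "w"}

# Ordered index: the keys kept in sorted order (a stand-in for a B-Tree).
sorted_keys = sorted(rows)


def range_scan_ordered(lo, hi):
    """Return keys in [lo, hi] by binary-searching the ordered index."""
    start = bisect.bisect_left(sorted_keys, lo)
    end = bisect.bisect_right(sorted_keys, hi)
    return sorted_keys[start:end]


def range_scan_hash(lo, hi):
    """Same query against the hash table: every key must be examined."""
    return sorted(k for k in rows if lo <= k <= hi)


# Point lookups are where the hash table shines.
print(rows[8])
# Range queries are where the ordered index shines.
print(range_scan_ordered(5, 25))
print(range_scan_hash(5, 25))
```

Both range functions return the same keys; the difference is in how much of the data each one has to touch to find them.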

A Beginner's Guide to SQL's CROSS JOIN — CROSS JOIN gives you the Cartesian product of two sets (i.e. every pairing of an item from one set with an item from the other). Here it’s used to create a deck of cards and then some hands of cards.

Vlad Mihalcea
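
If you want to try the deck-of-cards idea without setting up a database server, the following is a rough sketch using Python's built-in `sqlite3` module. The table and column names (`ranks`, `suits`, `name`) are our own assumptions for illustration and are not taken from the linked article, which may build its deck differently.

```python
import random
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ranks (name TEXT);
    CREATE TABLE suits (name TEXT);
""")

ranks = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
suits = ["Clubs", "Diamonds", "Hearts", "Spades"]
conn.executemany("INSERT INTO ranks VALUES (?)", [(r,) for r in ranks])
conn.executemany("INSERT INTO suits VALUES (?)", [(s,) for s in suits])

# CROSS JOIN pairs every rank with every suit: 13 x 4 = 52 cards.
deck = conn.execute(
    "SELECT ranks.name, suits.name FROM ranks CROSS JOIN suits"
).fetchall()

# Deal a five-card hand by sampling the deck without replacement.
hand = random.sample(deck, 5)

print(len(deck))
print(hand)
```

The join has no ON clause, which is what makes it a Cartesian product: the row count of the result is the product of the row counts of the two inputs.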

Managed PostgreSQL by DigitalOcean — Deploy a highly scalable PostgreSQL cluster with no admin overhead.

DigitalOcean sponsor

Writing Diagnostic Queries is Hard Because SQL Server Still Has (Tiny) Bugs

Brent Ozar

Fear Database Changes? Get Them Under Control with CI/CD

Jason Skowronski

Dolt: It's Git for Data — A database inspired by Git that supports fine-grained, value-wise version control, where all changes to data and schema are stored in a commit log.

Liquidata

Fx: A Command-Line JSON Processing Tool — If you’ve got some files full of JSON that you want to process, Fx will slice and dice it however you want, including using JavaScript one-liners to add a bit of logic to the process.

Anton Medvedev