#353 — May 7, 2021

Database Weekly

Querying a Petabyte of Cloud Storage in 10 Minutes — Elastic’s move into supporting numerous ‘data tiers’ continues to interest me. The idea is you have different levels of responsiveness (e.g. cache vs disk vs ‘cold’ storage) and while Elastic’s ‘frozen’ data tier doesn’t promise rapid response times, it opens up the opportunity to query huge datasets. This post explains what the frozen data tier is and benchmarks how it performs.

Yannick Welsch (Elastic)

You Might As Well Timestamp It — This is entirely one developer’s opinion, but I’ve gotta say, I’m (mostly) convinced. Jerod asks: rather than storing booleans, why not store timestamps? Rather than knowing merely whether a user is active, isn’t it better to know when they became active?

Jerod Santo
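
To make the idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `users` table, the `activated_at` column, and the helper names are all hypothetical choices for illustration, not something taken from Jerod's post. A nullable timestamp column stands in for the boolean: NULL means "not active", and any value records when activation happened.

```python
import sqlite3
from datetime import datetime, timezone


def make_db():
    """Create an in-memory database with a hypothetical users table."""
    conn = sqlite3.connect(":memory:")
    # activated_at replaces an is_active boolean: NULL means "not active".
    conn.execute(
        "CREATE TABLE users ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, activated_at TEXT)"
    )
    return conn


def activate(conn, user_id, when=None):
    """Record the activation time, keeping the first one if already set."""
    when = when or datetime.now(timezone.utc)
    conn.execute(
        "UPDATE users SET activated_at = ? "
        "WHERE id = ? AND activated_at IS NULL",
        (when.isoformat(), user_id),
    )


def is_active(conn, user_id):
    """The old boolean is still available, derived from the timestamp."""
    row = conn.execute(
        "SELECT activated_at IS NOT NULL FROM users WHERE id = ?",
        (user_id,),
    ).fetchone()
    return bool(row and row[0])
```

The boolean question ("is this user active?") remains a one-line query, while the stored value can also answer "since when?" without a schema change later.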

Document Database. SQL Queries. In-Memory Speed — The No. 1 reason developers choose Couchbase? You can use your existing SQL skills to easily query and access JSON. That’s more power and flexibility with less training. Learn more.

Couchbase sponsor

Timescale Raises $40M Series B Investment — First appearing on our radar just two years ago, Timescale, the company behind a Postgres-based time-series data storage system, has grown rapidly. This post outlines their mission and intentions (‘10+ launches’ throughout this month, apparently).

Ajay Kulkarni (Timescale)

Hosting SQLite Databases on GitHub Pages (or Any Static File Host) — A clever bit of hacking around here. sql.js provides an SQLite client in the browser but the author created a virtual file system to fetch chunks of remotely hosted SQLite databases over HTTP. Some interesting potential here, I think.

Phiresky
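
As a rough illustration of the chunked-fetch idea, the sketch below works out which fixed-size chunks of a remote file a single read would touch, expressed as HTTP `Range` header values. It assumes the chunks are requested with standard byte-range requests and uses a 4096-byte chunk size; the function name and the chunk size are hypothetical, and this is not the author's actual virtual file system code.

```python
from typing import List


def chunk_ranges(offset: int, length: int, chunk_size: int = 4096) -> List[str]:
    """Return Range header values covering a read of `length` bytes at `offset`.

    Each value addresses one whole chunk, so repeated reads of nearby bytes
    map to the same chunks and could be served from a local cache.
    """
    if offset < 0 or length <= 0 or chunk_size <= 0:
        raise ValueError("offset must be >= 0; length and chunk_size must be > 0")
    first = offset // chunk_size                  # index of first chunk touched
    last = (offset + length - 1) // chunk_size    # index of last chunk touched
    # HTTP byte ranges are inclusive at both ends.
    return [
        f"bytes={i * chunk_size}-{(i + 1) * chunk_size - 1}"
        for i in range(first, last + 1)
    ]
```

A read that stays inside one chunk produces a single range, while a read that straddles a chunk boundary produces two, which is why only a small part of a large database file needs to cross the network for a typical query.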

Google Dataset Search: A Search Engine for Datasets — A handy way to dig around for just the right data you need or, perhaps, to find a dataset upon which to test your data processing skills (of course, not every dataset is open licensed, so check that if relevant).

Google

How the New York Times Manages Readers’ Data Privacy — At the scale the New York Times operates, big problems often require big solutions (their paywall system allegedly cost $40 million on its own) and so they built a system to allow a single team to implement data privacy changes across a suite of 70 products.

Kelsey Johnson (NY Times)

📺 Livestream: CAP Theorem and Distributed Systems - Availability — How does the CAP theorem affect database implementations? Join us on 5/12 for this session on availability.

Cockroach Labs sponsor

DynamoDB vs. MongoDB: A Comparison and How to Choose — A real-world comparison of DynamoDB and MongoDB, two wildly successful modern replacements for traditional database systems.

Brian Scanlan

How Airbnb Achieved Metric Consistency at Scale — An introduction to Minerva, Airbnb’s metrics platform, which they use to model and transform data into accurate, analysis-ready datasets. Here’s the story of how it was built and what they get out of it.

Robert Chang, Amit Pahwa, Shao Xie

In Brief

Jobs

Find Data Engineering Jobs with Hired — Take 5 minutes to build your free profile & start getting interviews for your next job. Companies on Hired are actively hiring right now.

Hired