#247 — March 29, 2019 |
Database Weekly |
Going Deep on the Design of Amazon Aurora — the morning paper recently featured two papers on how Amazon’s Aurora database system works. Aurora is a MySQL- and PostgreSQL-compatible database that relies on many optimizations under the hood, and these papers examine them in detail. The second paper covers the distributed consensus model. Adrian Colyer |
Was MongoDB Ever the Right Choice? — If you’re looking for a place to rant and rave about MongoDB, this isn’t it. MongoDB has received a lot of criticism over the years, but the author argues that it was an innovation in the database world and that the problems it solves are important. (And if you're a MongoDB user, head over to our MongoDB newsletter :-)) Justin Etheredge |
Why to Use a Relational Database for Your IoT Applications — Don’t make the mistake of creating data silos. With a relational database like TimescaleDB, you can unify time-series, metadata, & geospatial data in a single database system that scales from the cloud to the edge. Which is why Azure partnered with Timescale to power IoT & time-series workloads. Timescale sponsor |
How to Build and Run the Open Distro For Elasticsearch SQL Plugin with Elasticsearch OSS — We recently mentioned Amazon’s controversial new distribution of Elasticsearch and now we get to see one part of it in action: a plugin that lets you query an Elasticsearch store using SQL. Jon Handler |
McDonald's Bites on Big Data With $300M Acquisition — McDonald’s brings Big Data to its Big Macs with its largest acquisition in 20 years. The plan is to use data analysis to suggest products customers are more likely to buy. Brian Barrett |
Neo4j Unveils Its 'Startup Program' Offering Graph Database Tech to Small Companies — The company behind the Neo4j graph database has started a program that opens up its enterprise-level offerings to certain types of startups and small companies. Neo4j |
📖 Tutorials |
Redis Streams as a Pure Data Structure — The creator of Redis, the data structure server, looks at streams as a pure data structure and how they can be like “CSV files on steroids.” Salvatore Sanfilippo |
The Best Way to Count Distinct Indexed Things — It’s significantly more efficient to do a count of a subquery’s results than to try to do a count with Peter Bengtsson |
pganalyze eBook: How to Get a 3x Performance Improvement on Your Postgres Database — Learn our best practices for optimizing Postgres query performance for customers like Atlassian and how to reduce data loaded from disk by 500x. pganalyze sponsor |
Indexes in Postgres: A Look at B-Trees — The latest in a series of extensive posts taking a deep dive into how indexes work. Egor Rogov |
Speeding Up Hans-Jürgen Schönig |
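For the distinct-count item above, here is a minimal sketch of the two query shapes. The blurb is cut off before it names the slower alternative, so treating it as `COUNT(DISTINCT ...)` is an assumption. The table `things`, the column `name`, and the helper `distinct_counts` are hypothetical, and SQLite stands in for a real server only so the snippet runs anywhere. It shows that both shapes return the same number on data without NULLs and makes no claim about their relative speed.

```python
import sqlite3


def distinct_counts(values):
    """Return (count via DISTINCT subquery, count via COUNT(DISTINCT))."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table and index, used only for illustration.
    conn.execute("CREATE TABLE things (name TEXT)")
    conn.execute("CREATE INDEX things_name_idx ON things (name)")
    conn.executemany(
        "INSERT INTO things (name) VALUES (?)", [(v,) for v in values]
    )

    # Shape 1: count the rows produced by a DISTINCT subquery.
    via_subquery = conn.execute(
        "SELECT COUNT(*) FROM (SELECT DISTINCT name FROM things) AS t"
    ).fetchone()[0]

    # Shape 2 (assumed alternative): aggregate with COUNT(DISTINCT ...).
    via_count_distinct = conn.execute(
        "SELECT COUNT(DISTINCT name) FROM things"
    ).fetchone()[0]

    conn.close()
    return via_subquery, via_count_distinct


if __name__ == "__main__":
    print(distinct_counts(["a", "b", "a", "c", "b"]))
```

To compare performance on your own database, run both statements under your server's query plan tooling rather than relying on this sketch. Note that the two shapes can differ when the column contains NULLs, since `COUNT(DISTINCT ...)` ignores them while `SELECT DISTINCT` returns a NULL row.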
💬 Stories and Opinions |
How We Moved a Massively Parallel Postgres DB onto Kubernetes Oz Basarir (Pivotal) |
Some Numbers You'll Know by Heart If You Have Been Working with SQL Server for A While — A light-hearted, tongue-in-cheek piece for SQL Server users. I think every database could have an equivalent article! Denis Gobo |
A Tale Of Two Queries: Standing Up for ANSI-89 SQL Allan Hirt |
🛠 Code & Tools |
automl-gs: Quickly Perform Machine Learning on CSV Files — OK, the title doesn’t quite get at what this is, but it’s neat. Give automl-gs a CSV file and it’ll create a model for predicting the values of a field of your choice that you can work with from Python. Max Woolf |
Bitraft: A Bitcask Distributed Key/Value Store using Raft for Consensus with a Redis Compatible API — Bitcask is both a high-performance Go key/value store and a storage format used by Riak. Bitraft adds Raft-powered consensus to distribute the store. James Mills |
csvq: Use an SQL-like Query Language on CSV Files — Includes an interactive REPL. Mithrandie |