#260 — June 28, 2019

Database Weekly

Using AWK and R to Parse 25TB of Data — This is a fun, practical look at several approaches taken to process a large data set, including the dead ends and lessons learned before settling on a reasonably ‘rustic’ solution.

Nick Strayer

MongoDB 4.2 Previewed at MongoDB World — The 4.2 release of the popular document-oriented database will be packed with new features. Here’s a look at four of them, including distributed transactions and client-side field-level encryption. (Via our MongoDB-focused newsletter.)

Dj Walker-Morgan (MongoDB)

Studio 3T Makes SQL Migration to MongoDB, Powerfully Simple — Now you can import an entire SQL database to MongoDB using Studio 3T and its new SQL Migration feature.

Studio 3T sponsor

The Major Features Coming in PostgreSQL 12 — The much-esteemed Postgres expert Bruce Momjian has released a simple slide deck highlighting the most significant improvements and features coming to PostgreSQL 12, including JIT compilation and REINDEX CONCURRENTLY. It’ll only take you a minute to scan.

Bruce Momjian

The Cloud is Now the Default Platform for Databases, Gartner Says — The overall database market is growing, says Gartner, and cloud deployments are responsible for 68% of the growth, principally on AWS and Azure. While cloud deployments are rapidly becoming the ‘default’, the cloud accounts for only 23% of the market’s revenue, so there’s still a long way to go.

Datanami

A New Redis Benchmark Hits 200 Million Ops/Sec — Redis Labs have pushed the enterprise version of Redis, the data structure store, up to 200 million operations per second with under 1 ms latency on 40 EC2 instances, a 4x improvement on a similar test last year.

Redis Labs

Analyze BigQuery Data with Kaggle Kernels Notebooks — Kaggle is a sort of social network and sharing platform for data scientists. Google acquired it in 2017, and it’s now integrated into BigQuery, enabling BigQuery users to use Kaggle’s neat analysis tools.

Google Cloud

💻 Jobs

Senior Software Engineer (Santa Barbara or Remote) — Join a team where everyone is striving to constantly improve their knowledge of software development tools, practices, and processes.

Invoca

Find a DB Job on Vettery — Vettery specializes in tech roles and is completely free for job seekers.

Vettery

📒 Tutorials and Stories

▶ Advanced NoSQL Data Modeling with Amazon DynamoDB — The latest in a series of YouTube videos that dig deep into using DynamoDB, Amazon’s scalable NoSQL database. DynamoDB is quite distinctive in how it works, so content like this is valuable if you plan to use it.

Amazon Web Services

An Introduction to Hypothetical Indexes in PostgreSQL — Why would you want to create imaginary indexes for Postgres’s optimizer to chew over? It’s a way to find out if an index would be useful before you endure the expense of creating a real one. SQL Server and Oracle can do this too.

Avinash Vallarapu

Building a Data Stream for IoT with NiFi & InfluxDB — Combining NiFi & InfluxDB results in secure, accessible, and usable IoT data streams. This solution enables a single data view across all facilities providing proactive maintenance, failure detection, and more.

InfluxData sponsor

Analyzing the Performance and Cost of Large-Scale Data Processing with AWS Lambda — A serverless approach isn’t suitable for every data analytics use case, but AWS Lambda has a lot going for it thanks to its low TCO and flexibility.

Amazon Web Services

SQLsmith: Randomized SQL Testing in CockroachDB — Randomized testing lets you automate the discovery of interesting test cases that would be difficult to come up with on your own, and CockroachDB has adopted the idea for its ultra-resilient SQL database.

Matt Jibson (Cockroach Labs)

MongoDB's Plan to Stop Breaches With Dead Simple Database Encryption — MongoDB has been working on a new encryption scheme that should help keep customers’ data more secure.

Lily Hay Newman (Wired)

Spring Cleaning at OverOps: How (and Why) We Changed Our DB Cleaning Strategy — “after years of writing and executing code, our DB’s free disk space started to run out...” Here’s how they addressed the issue.

Aviv Danziger

RedisTimeSeries: A Redis Module for Working with Time Series Data — Redis is already a great fit for time series work, but this module introduces some cool new features like timed retention policies on streams, downsampling, and integration with other tools.

Redis Labs