Issue #262: The largest file transfer in history? 2.9 petabytes

#262 — July 12, 2019

Why Benchmarking Distributed Databases Is So Hard — Benchmarks are hard to get right, and many articles touting benchmarks are actually ‘benchmarketing’, showcasing skewed outcomes to sell products. This post introduces some of the motivations for benchmarking and the common tools, and discusses a few things to keep in mind when benchmarking.

Ana Hobden (PingCAP)

This Summer's Database News Roundup — We give you a roundup of the database world every week, but sometimes it helps to reflect on a longer timescale, and SQL guru Markus Winand has done that here.

Markus Winand

Get Started with Timescale Cloud on AWS, GCP & Azure — Timescale Cloud gives developers the freedom to deploy and migrate time-series workloads across a variety of regions around the world, on the cloud provider of their choice, with just a few clicks. Sign up here for $300 in trial credits.

Timescale sponsor

BlobCity: An ACID-Compliant NoSQL 'Data Lake' for Documents — We’ve all heard of document-oriented databases, but BlobCity takes it to another level by directly supporting 17 data formats, including Excel, PDF, CSV, XML, and JSON. It has full SQL and DML capabilities along with Java stored procedures for advanced data processing. It’s built in Java and is GPL 3 licensed.

BlobCity, Inc

Amazon Aurora PostgreSQL Serverless Now Generally Available — Amazon Aurora is a performance-oriented AWS-based database that provides MySQL and Postgres compatibility and charges per instance by the hour. However, the serverless variant auto-scales and lets you simply ‘pay as you go’ (to a point - you have to define a minimum and maximum amount of capacity). I think it stretches the term ‘serverless’ somewhat, but hey.

Amazon Web Services

When SQL Isn’t the Right Answer — If you’re a database expert, there won’t be a lot you can get out of this, but otherwise it’s a cogent, high-level argument for taking a NoSQL approach to many data management problems.

Aphinya Dechalert

How Twitter Is Democratizing Data Analysis with Google BigQuery — Twitter has been migrating parts of its data infrastructure to Google Cloud and has been leaning heavily on BigQuery and Data Studio to open up access to its data. Here’s the full story.

Prasad Wagle (Twitter)

The Largest Single File Transfer in History — A team of scientists at Argonne National Laboratory has broken a data transfer record by moving a staggering 2.9 petabytes of data (created by three large cosmological simulations). Tape was involved! 😬

Datanami

SQL Server Introduces Official UTF-8 Support — SQL Server 2019 introduces support for the widely used UTF-8 character encoding. This is a long-requested feature, and UTF-8 can be set as the default encoding for Unicode string data at the database or column level.

Microsoft

Hera: A MySQL and Oracle Multiplexer from PayPal — Built in Go, Hera (‘High Efficiency Reliable Access’) multiplexes connections for MySQL and Oracle databases and supports sharding for horizontal scaling.

PayPal

💻 Jobs

Senior Software Engineer (Santa Barbara or Remote) — Join a team where everyone is striving to constantly improve their knowledge of software development tools, practices, and processes.

Invoca

Land a New Dev Job on Vettery — Vettery specializes in tech roles and is completely free for job seekers.

Vettery

📒 Tutorials and Stories

A Data Model for Online Concert Ticket Sales — Great to see Vertabelo back with another of their popular data modelling articles. Just how would you model a relational database for handling ticket sales?

Tihomir Babic

The Fastest Way to Load Data into PostgreSQL with Python — For when you’ve got a large collection of ‘dirty’ data that needs to be fetched and transformed and then brought into PostgreSQL.

Haki Benita

SQL, Python, and R. All in One Platform. Free Forever — Mode combines a SQL editor, native Python & R notebooks, and viz builder in one platform. Connect, analyze, & share.

Mode sponsor

Migrating 6.5TB of Data to AWS S3 - A Journey Concluded — The tale of taking 6.5TB of FileStream data from SQL Server and getting it into AWS, which wasn’t entirely straightforward, complete with a ‘near-heart-attack moment’(!)

Michael Saunders

Goodbye Hadoop. Building a Streaming Data Processing Pipeline on Google Cloud — It’s on the Google Cloud blog, but is really a customer case study where Qubit discusses how they moved from Apache Hadoop and MapReduce to Google’s BigQuery, Dataflow and Cloud Pub/Sub.

Qubit

S3 or DynamoDB? — How to choose the right storage system for AWS Lambda functions.

Gojko Adzic