ScyllaDB Open Source Release 4.3

The ScyllaDB team is pleased to announce the release of ScyllaDB Open Source 4.3, a production-ready release of our open source database.

ScyllaDB is an open source, NoSQL database with superior performance and consistently low latencies. Find the ScyllaDB Open Source 4.3 repository for your Linux distribution here. ScyllaDB 4.3 Docker is also available.

ScyllaDB 4.3 includes production-ready Change Data Capture (CDC), an experimental version of Alternator Streams (an API compatible with AWS DynamoDB Streams), the Unified Installer, a first step toward removing the seed node concept, and many other improvements and bug fixes (below).

Please note that only the last two minor releases of the ScyllaDB Open Source project are supported. Starting today, only ScyllaDB Open Source 4.3 and ScyllaDB 4.2 are supported, and ScyllaDB 4.1 is retired.

Deployment

  • Unified Installer is now available (see below)
  • ScyllaDB now has an official GCP Image (see below)
  • Support for direct upgrade from 2018 vintage releases was removed. One can still upgrade via intermediate versions (e.g. 2.3 → 3.0 → 3.1 → 3.3 → 4.0 → 4.1).

New features in ScyllaDB 4.3

Change Data Capture (CDC)

Change Data Capture (CDC) allows users to track data updates in their ScyllaDB database. While it is similar to the feature with the same name in Apache Cassandra and other databases, how it is implemented in ScyllaDB is unique, more accessible, and more powerful.

Originally introduced in ScyllaDB 4.1 as experimental, CDC is now graduating to a production ready feature.

With ScyllaDB’s CDC, you can choose to keep track of the updates, original values and/or new values. The data is stored in regular ScyllaDB tables (SSTables) and can be queried asynchronously using a standard ScyllaDB/Cassandra CQL driver. Data in a CDC table is set with a TTL, minimizing the possibility of an overflow.

Example: creating a table with CDC for delta and preimage:

CREATE TABLE base_table (
   pk text,
   ck text,
   val1 text,
   val2 text,
   PRIMARY KEY (pk, ck)
) WITH cdc = { 'enabled': 'true', 'preimage': 'true' };

CDC is also used internally to implement Alternator Streams, which is compatible with the AWS DynamoDB Streams API (see below).

Updates from ScyllaDB 4.2:

  • The format of stream IDs: the lower 8 bytes were previously generated randomly, now some of them have semantics. In particular, the least significant byte contains a version (stream IDs might evolve with further releases).
  • Removed delta_mode::off. CDC information is not useful without at least delta==keys, since pre/post image data contains no information on what was actually done in base table operations. #7128

Changes in CDC from ScyllaDB 4.2

  • Improved the way pre-images and post-images for batches are generated. The commit includes a detailed description of the improvement. Fixes #6597, #6598
    More information here
  • Add an “end-of-record” column. Adds an “eor” (end-of-record) column to the CDC log. It is non-null only on the last row of each timestamp group, i.e. at the end of a single source “event”. A client can use it as a shortcut to know whether it has a full CDC “record” for a given source mutation (single row change). #7436
  • Add a version number and grouping to the stream ID to allow changing it in the future #6948
  • Ensure that CDC generation write is flushed to commitlog before ack #7619
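
As an illustration of how a client might use the end-of-record flag, the following minimal Python sketch groups already-fetched log rows into complete records. It assumes rows arrive as dictionaries, ordered as they appear in the log for a single stream, and that the flag is exposed under a key named "eor". That key name mirrors the wording above and is an assumption; the actual column name should be checked against the CDC log table schema.

```python
from typing import Any, Dict, Iterable, Iterator, List

Row = Dict[str, Any]


def group_cdc_records(rows: Iterable[Row], eor_column: str = "eor") -> Iterator[List[Row]]:
    """Yield lists of consecutive log rows, each ending at a row whose
    end-of-record flag is non-null.

    `eor_column` is a hypothetical key name used for illustration only.
    """
    pending: List[Row] = []
    for row in rows:
        pending.append(row)
        # The flag is described as non-null only on the last row of a group.
        if row.get(eor_column) is not None:
            yield pending
            pending = []
    # Rows still in `pending` belong to a record whose final row has not
    # been read yet, so they are deliberately not yielded.
```

A consumer built this way never acts on a partially read record: trailing rows without the flag are held back until a later read delivers the closing row.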

Alternator Streams (Experimental)

Based on CDC, the following DynamoDB API stream operations are now supported in ScyllaDB:

  • DescribeStream
  • GetRecords
  • GetShardIterator
  • ListStreams

#5065

To enable the Streams API with Alternator, enable the experimental feature, either by passing --experimental-features=alternator-streams on the command line or by adding the following to scylla.yaml:

experimental_features:
  - alternator-streams

Note that Alternator Streams differ in some respects from DynamoDB Streams:

  • The number of separate “shards” in Alternator’s streams is significantly larger than is typical on DynamoDB.
  • While in DynamoDB data usually appears in the stream less than a second after it was written, in Alternator Streams there is currently a 10 second delay by default. #6929
  • Some events are represented differently in Alternator Streams. For example, a single PutItem is represented by a REMOVE + MODIFY event, instead of just a single MODIFY or INSERT. #6930 #6918
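
The four supported operations are typically chained: list the streams of a table, describe each stream to find its shards, obtain an iterator per shard, then fetch records. The Python sketch below shows that flow under stated assumptions. It takes any client object exposing list_streams, describe_stream, get_shard_iterator and get_records with DynamoDB Streams style keyword arguments (for example a boto3 "dynamodbstreams" client pointed at an Alternator endpoint, which is not created here). It reads only one page of records per shard and does not paginate ListStreams.

```python
from typing import Any, Dict, Iterator


def iter_stream_records(client: Any, table_name: str, limit: int = 100) -> Iterator[Dict[str, Any]]:
    """Yield one page of records from every shard of every stream of a table.

    `client` is assumed to follow the DynamoDB Streams API naming; this is a
    sketch, not a complete consumer.
    """
    streams = client.list_streams(TableName=table_name)["Streams"]
    for stream in streams:
        arn = stream["StreamArn"]
        start_shard = None
        while True:
            kwargs = {"StreamArn": arn}
            if start_shard is not None:
                kwargs["ExclusiveStartShardId"] = start_shard
            desc = client.describe_stream(**kwargs)["StreamDescription"]
            for shard in desc["Shards"]:
                iterator = client.get_shard_iterator(
                    StreamArn=arn,
                    ShardId=shard["ShardId"],
                    ShardIteratorType="TRIM_HORIZON",  # start from the oldest record
                )["ShardIterator"]
                page = client.get_records(ShardIterator=iterator, Limit=limit)
                yield from page["Records"]
            # The shard list may be paginated, which matters when a stream
            # has many shards, as noted above.
            start_shard = desc.get("LastEvaluatedShardId")
            if start_shard is None:
                break
```

Because recently written data is delayed by the confidence window, a real consumer would keep polling with the NextShardIterator returned by GetRecords rather than stop after one page.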

More here

ScyllaDB Unified Installer

ScyllaDB is now available as an all-in-one binary tar file. You can download the tar file from the download page.

Unified Installer should be used when one does not have root privileges on the server.

For installation on an air-gapped server with no external access, it is recommended to download the standard packages, copy them to the air-gapped server, and install them using the standard package manager. More here. #6626 #6949

GCP Images

ScyllaDB is now available as a GCE image on Google Cloud Platform (GCP) #7080 #6631

More on launching ScyllaDB Image for GCE here

Remove the seed concept in gossip

The seed concept, and the different behavior of seed and non-seed nodes, causes a lot of confusion, complications, and errors for users. Starting from this release, seed nodes are ignored in the Gossip protocol. They are still in use (for now) as part of the init process. #6845

More on seedless nodes here

Security related updates

  • gnutls vulnerability: GNUTLS-SA-2020-09-04 #7212
  • New optional CQL ports  (19042, 19142, see Shard aware CQL ports below)
  • Allow users to disable CQL unencrypted native transport by setting it to zero #6997

CQL Extensions

  • A new CQL protocol extension allows client drivers to distinguish between lightweight transactions (LWT) and non-LWT statements. The intent is to prefer the primary replica when sending LWT statements, to reduce transaction aborts due to contention.
    More information
  • Shard aware CQL ports: a new CQL port (19042 by default) is opened for so-called “advanced shard-aware drivers”. It works exactly as the typical port 9042 does for existing drivers (connector libraries), but it allows the client to choose the specific shard to connect to by binding the client-side (ephemeral) port precisely. A TLS alternative is also supported, on port 19142.
    More information
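
To make the port-binding idea concrete, here is a small Python sketch that lists local source ports a driver could bind to in order to reach a given shard. It assumes the server routes a connection on the shard-aware port to shard `source_port % number_of_shards`. The text above only says the shard is chosen through the client-side port, so the modulo rule, the function name, and the default port range are all assumptions of this sketch.

```python
from typing import Iterator


def ports_for_shard(shard: int, nr_shards: int, low: int = 49152, high: int = 65535) -> Iterator[int]:
    """Yield local ports in [low, high] for which port % nr_shards == shard.

    The default range is an arbitrary choice for illustration; a real driver
    would use whatever ephemeral range the host allows.
    """
    if nr_shards <= 0 or not 0 <= shard < nr_shards:
        raise ValueError("shard must be in range [0, nr_shards)")
    # Smallest port >= low that is congruent to `shard` modulo nr_shards.
    first = low + (shard - low) % nr_shards
    yield from range(first, high + 1, nr_shards)
```

A driver would try these candidates in turn, binding its socket to one of them before connecting to port 19042 (or 19142 for TLS), and fall back to the next candidate if the port is already in use.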

Additional Features

  • Support for SSTable “md” format (CASSANDRA-14861) #4442
  • Docker: a new ‘--io-setup N’ command line option lets users specify whether to run the “scylla_io_setup” script. This is useful if users want to specify I/O settings themselves in environments such as Kubernetes, where running “iotune” is problematic. #6587
  • The role or user issuing a request is now tracked in the tracing output #6737
  • Tracing improvements: better messages for single-key queries, and activities in the cache. The commits come with some examples.
  • REST API: Add long polling to StorageServiceRepairAsyncByKeyspaceGet #6445
  • Docker: ScyllaDB on Docker now supports the passing of extra arguments to scylla #7458

Tools

  • node_exporter is an agent used to report OS level metrics to ScyllaDB Monitoring Stack. In this release, the installed node_exporter is updated from 0.17.0 to 1.0.1
  • nodetool getendpoints could return a wrong list of nodes #7134
  • The JMX management interface was enabled for Java 11 on Debian systems. It was already supported, but a packaging error resulted in installation problems.
  • nodetool now supports the gettraceprobability command #7265

Performance Optimizations

  • Cleanup: nodetool cleanup procedure is used after adding nodes, to remove data that was migrated to the new nodes. The calculation that determines which sstables needed to be rewritten (cleaned up) was inefficient and could cause stalls in queries running at the same time, so it was optimized #6662
  • JSON: More of the code base was migrated from the jsoncpp library to the rjson library, improving JSON performance. JSON is heavily used in ScyllaDB Alternator.
  • MV: After a repair, bootstrap, or decommission (or a similar operation), the node receiving new data must update materialized views. This is done by reading staging sstables containing the new data, row by row, and updating the view rows corresponding to those rows. There can be large numbers of such sstables (one per vnode per peer), and reading from such large numbers of sstables requires a lot of memory. This memory usage is now dramatically reduced by exploiting the property that per-vnode sstables have few overlaps, and reusing the partitioned_sstable_set (which we use for leveled compaction strategy, which has similar properties) to avoid reading from all those sstables at once. #6707
  • Repair: moving to btree_set for repair_hash eliminates the need for large allocations, which caused stalls #6190
  • The ScyllaDB cache and memtable implementations now use a btree instead of a red-black tree. This improves performance in cache-intensive and write-intensive workloads. More here
  • UTF-8 validation of large cells caused latency spikes. In this release, UTF-8 validation was updated to work on fragmented buffers to fix this. #7448

Configuration

The following new configuration parameters are available in this release:

  • max_concurrent_requests_per_shard: Maximum number of concurrent requests a single shard can handle before it starts shedding extra load. By default, no requests will be shed. Default: max (disabled)
  • native_shard_aware_transport_port_ssl: Like native_transport_port_ssl, but clients are forwarded to specific shards, based on the client-side port numbers. Default: 19142.
  • enable_sstables_md_format: Enable SSTables ‘md’ format to be used as the default file format (requires enable_sstables_mc_format). Default: true
  • max_memory_for_unlimited_query was replaced by two new parameters:
    • max_memory_for_unlimited_query_soft_limit: Maximum amount of memory a query, whose memory consumption is not naturally limited, is allowed to consume, e.g. non-paged and reverse queries. This is the soft limit; a warning is logged for queries violating this limit. Default: 1 MB
    • max_memory_for_unlimited_query_hard_limit: Maximum amount of memory a query, whose memory consumption is not naturally limited, is allowed to consume, e.g. non-paged and reverse queries. This is the hard limit, queries violating this limit will be aborted. Default: 100 MB

#5870

  • schema_registry_grace_period: Time period in seconds after which unused schema versions will be evicted from the local schema registry cache. Default: 1 second
  • alternator_streams_time_window_s: CDC query confidence window for Alternator Streams. Default: 10 seconds
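
The soft and hard limits that replace max_memory_for_unlimited_query in this list can be read as a two-threshold check. The Python sketch below only illustrates that described behavior and is not ScyllaDB's implementation. The function name and return values are hypothetical, and treating "MB" as 2**20 bytes is an assumption.

```python
import logging

MB = 1024 * 1024  # assumption: "MB" is treated as 2**20 bytes in this sketch


def check_unlimited_query_memory(
    used_bytes: int,
    soft_limit: int = 1 * MB,
    hard_limit: int = 100 * MB,
) -> str:
    """Return "abort" above the hard limit, "warn" above the soft limit,
    and "ok" otherwise. Defaults mirror the documented default values."""
    if used_bytes > hard_limit:
        # Hard limit: the query would be aborted.
        return "abort"
    if used_bytes > soft_limit:
        # Soft limit: the query continues, but a warning is logged.
        logging.warning("query memory %d bytes exceeds soft limit %d", used_bytes, soft_limit)
        return "warn"
    return "ok"
```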

Redis

The following commands have been added to ScyllaDB Redis API:

Build and Debugging

  • New documentation on ScyllaDB debugging
  • Toolchain: regenerated for gcc 10.2. As a side effect, this also brings in xxhash 0.7.4 #6813

Monitoring

For Metrics updates from 4.2 to 4.3, see here

ScyllaDB 4.3 dashboard is available in the latest ScyllaDB Monitoring Stack release 3.5.3 or later.

Other bug fixes and updates in the release

  • CQL: Numbers of the ‘decimal’ type that had negative scale (which translates to positive exponent: 1.2e1 has negative scale, while 1.2e-1 has positive scale), when cast to a floating-point type (‘CAST x AS float’), threw the node into a long loop, effectively making it unavailable. #6720
  • CQL: A bug where impossible range restrictions (WHERE a > 0 AND a < 0) were processed incorrectly #5799
  • Stability: Internal schema change CQL queries should not be used for distributed tables #6700
  • CQL: min()/max() return wrong results on some collections #6768
  • Stability: a rare memory leak caused by improper commitlog usage from hints manager #6776 #6409
  • Stability: Row-level repair was made more robust against hash collisions. Row-level repair uses a hash to identify mismatched rows. A weak hash is used to reduce computation and network costs, but this results in the possibility of collisions. Now, when a collision is detected, repair will copy the colliding rows (even if there was no problem) rather than fail the repair.
  • Stability: Repair now uses a uuid to identify repair jobs #6786
  • Stability: compaction should print a unique id to correlate start and finish log messages #6840
  • Alternator: tracing was fixed for GetItem and BatchItem #6891
  • Stability: TWCS: compaction: partition estimate can become 0, causing an assert in sstables::prepare_summary() #6913
  • scyllatop: using `metricPattern` fails with “dictionary changed size during iteration” #7488
  • Stability: Cleanup compaction in KA/LA sstables may crash the node in some cases #7553
  • Stability: secondary index updates failing after upgrade to 4.2.0 due to missing  system_schema.computed_columns. The problem is limited to secondary indexes created *before* ScyllaDB 3.2, which had their `idx_token` column incorrectly not marked as computed #7515
  • Stability: schema integrity issues that can arise in races involving materialized views may cause a segmentation fault and coredump when starting after scylla was killed #7420
  • Stability: provide strong exception guarantees from load_sstable() #6658
  • Stability: Incrementally delete resharded sstables as they’re retired #7463
  • Stability: Node may get stuck in schema disagreement loop when bootstrapping sequentially #7396
  • CQL: token_restriction: invalid_request_exception on SELECTs with both normal and token restrictions #7441
  • lwt: store column_mapping’s for each table schema version upon a DDL change
  • Code refactoring: Move write() methods from class sstable to class sstable_writer #3012
  • Stability: Clean cluster issued ‘Exceptional future ignored’ right after being started with no load on it #7352
  • UX: Make batchlog size warning clearer #7367
  • Stability: ascii validation of large cells causes large allocations #7393
  • Stability: ScyllaDB 4.1.7 crashing on repairs (uncaught exceptions) #7285
  • Stability: Shutting down database hangs in dtest-debug #7331
  • Stability: Useless linearization of large data during validation, of either type bytes or string-derived, potentially causes stalls due to reclaiming #7318
  • Stability: NEW_NODE should be sent after listening for CQL clients has started #7301
  • Stability (counters): runtime error: signed integer overflow cannot be represented in type ‘long int’ #7330
  • Non root install (Logging): can’t find scylla log by journalctl --user -xe #7131
  • Non root install: nonroot install: ubuntu18 failed to start for `NOFILE rlimit too low` #7133
  • CQL (found with Jepsen): Aborted reads of writes which fail with “Not enough replicas available” #7258
  • Stability: Unsynchronized cross-shard memory operations caused by incorrectly used updateable_value #7310
  • Stability: Reduce unnecessary VIEW_BACKLOG updates in gossip #5970
  • Non root install: nonroot: systemctl --user enable scylla-server.service failed on Ubuntu 18 #7288
  • Non root install: Got offline mode warnings on nonroot mode #7286
  • CQL: (LOCAL_/EACH_)QUORUM consistency calculation is broken when RF=0 #6905
  • Install: scylla_prepare: ‘get_set_nic_and_disks_config_value‘ is not defined #7276
  • Stability: RPC server still has handlers registered in dtests involving repair-based operations #7262
  • Stability: Query pager can try to get keys from empty vector of result::partitions #7263
  • Install: scylla_cpuscaling_setup: Got warning when installing scylla-cpupower.service #7230
  • Stability: disable_autocompaction_nodetool_test failed: std::runtime_error (Too early: storage service not initialized yet) #7199
  • Stability: abstract_replication_strategy::do_get_ranges is passed a reference to token_metadata that may be invalidated if it yields #7044
  • Install: scylla_setup doesn’t support to skip install of hugepages or libhugetlbfs-bin package #7182
  • Stability: init – Startup failed: std::runtime_error (Could not find tokens for 10.0.0.155 to replace) during large-partition-4d test #7166
  • Init: scylla4 process fails to restart (perftune.py: error: argument --mode: invalid choice: 'None') #6847
  • CQL: Forbid adding new fields to UDTs used in partition key columns #6941
  • Stability: Make allocation_section decay the reserves over time #325 (from 2015!)
  • Init: scylla_setup failing with dependency errors #7153
  • CQL (found with Jepsen): Weird return values from batch updates #7113
  • Non root install: scylla-python3 isn’t loaded for setup scripts #7130
  • Stability: some non-prepared statements can leak memory (with set/map/tuple/udt literals) #7019
  • Stability: Reactor stall for 6 ms in sstables::seal_summary() #7108
  • Stability: coredump while node hit enospc: “Assertion `!this->_future && this->_state && !this->_task' failed” #7085
  • Stability: gossip: Apply state for local node in shadow round #7073
  • Stability (CDC): CDC: within a batch, partition deletes and range deletes do not affect postimage #6900
  • CDC: cdc delta == keys does not produce cdc$operation nor cdc$ttl #7095
  • Stability: Hinted handoff uses a very long timeout when sending some hints #7051
  • Build: dist: scylla-python3 should be a separate repository #6751
  • Stability: Track repair_meta created on both repair follower and master #7009
  • Stability: Failed compaction : compaction_manager – compaction failed: sstables::malformed_sstable_exception (consumer not at partition boundary) #6529
  • Stability: sstable code needs to close files in error paths #6448
  • Stability: ScyllaDB setup failed: unit mdmonitor.service is not found or invalid #7000
  • UX: scylla_setup: include swap size in the prompt #6947
  • Untyped result sets may cause segfaults when parsing disengaged optionals #6915
  • UX: scylla_setup: default “done” when there is no disk to choose from #6760
  • Stability: Startup failed: std::runtime_error ({shard 0: fmt::v6::format_error (invalid type specifier), shard 1: fmt::v6::format_error (invalid type specifier)}), caused by format strings in log messages that needed adjusting after the string format library was changed #6874
  • Setup: scylla_setup does not abort RAID setup when no free disk available #6860
  • CQL: Filtering captures uninitialized/deleted values of certain types #6295
  • Stability: repair: Relax node selection in bootstrap when nodes are less than RF #6744
  • UX: scylla_prepare: Improve error message on missing CPU features #6528
  • CQL: NULL counters treated as 0 #6382
  • CQL: IN(NULL) yields different results with prepared statements #6369
  • CQL: LIKE filter ignored on column key #6371
  • CQL: NULL and empty text are considered equal #6372
  • Stability: scylla_coredump_setup always fails on CentOS 7 #6789
  • Setup: node_exporter_install --force failing #6782
  • repair: inaccurate log from check_failed_ranges #6785
  • repair: log recoverable errors as warnings rather than info messages #5612
  • scylla setup fails on Oracle Linux Artifact: OS variant not recognized #6761
  • scylla_util.py: duplication on detecting distribution #6691
  • scylla_setup: on RAID prompt, strange output when passing comma separated multiple disks #6724
  • Stability: connection storm when attempting to achieve shard-per-connection #5239
  • Stability: Nodetool Repair failing on keyspace with std::runtime_error (row_diff.size() != set_diff.size()) #6252
  • scylla_setup: scylla_setup failed to set up RAID when called without the --raiddev argument #7627
  • scylla_setup: scylla_raid_setup: use sysfs to detect an existing RAID volume, because checking for the device file may fail to detect it #7383
  • Stability: a rare race condition in the compaction_writer destructor may cause a segmentation fault during scylla stop, for example with CDC traffic #7821

12 Jan 2021

About Tzach Livyatan

Tzach Livyatan has a B.A. and MSc in Computer Science (Technion, Summa Cum Laude), and has had a 15 year career in development, system engineering and product management. In the past he worked in the Telecom domain, focusing on carrier grade systems, signalling, policy and charging applications.