
What’s new in ArangoDB 3.6: OneShard Deployments and Performance Improvements

Estimated reading time: 9 minutes

Welcome 2020! To kick off this new year, we are pleased to announce the next version of our native multi-model database. So here is ArangoDB 3.6, a release that focuses heavily on improving overall performance and adds a powerful new feature that combines the performance characteristics of a single server with the fault tolerance of clusters.

If you would like to learn more about the released features in a live demo, join our Product Manager, Ingo Friepoertner, on January 22, 2020 – 10am PT/ 1pm ET/ 7pm CET for a webinar on “What’s new in ArangoDB 3.6?”.

Need to know more about multi-model?

Get our technical White Paper

tl;dr: Highlights of ArangoDB 3.6:

  • OneShard Feature
  • Performance Optimizations
    • Subquery acceleration (up to 30x)
    • Late document materialization
    • Early pruning of non-matching documents
    • Parallel AQL execution in clusters
    • Streamlined update and replace queries
  • ArangoSearch Enhancements
  • New Cloud Service Pricing Model

ArangoDB 3.6 is also available on ArangoDB ArangoGraph – the cloud service for ArangoDB. Start your free 14-day trial today!

You will not regret upgrading to 3.6, as it will most likely improve your experience with your existing ArangoDB setup.

In 3.6 we concentrated on performance optimizations for the everyday use of ArangoDB, picking the ones with the biggest impact first so that as many users as possible experience notable improvements. There is more in the pipeline for future releases.

Subquery performance has been improved by up to 30 times, parallel execution of AQL queries significantly reduces the time needed to gather data that is distributed over several nodes, and late document materialization avoids retrieving non-relevant documents in the first place. Simple UPDATE and REPLACE operations that modify multiple documents are more efficient because several processing steps have been removed. The performance package is rounded off by early pruning of non-matching documents: the filter condition is applied directly while scanning the documents, so copying documents that do not match the filter condition into the AQL scope can be avoided. Read more details in the AQL Subquery Benchmark or in the feature descriptions further on in this blog post.

The feature with probably the greatest impact is OneShard. Available in the Enterprise Edition of ArangoDB, it lets customers run use cases such as graph analytics on a single database node, with high availability and synchronous replication. Because the data is not distributed across multiple nodes, graph traversals can be performed efficiently on a single node. OneShard cluster deployments are also available from our managed service, ArangoDB ArangoGraph.

With every release, we also improve the capabilities of ArangoSearch, our integrated full-text search engine with ranking capabilities. In 3.6 we have added edge n-gram support to the existing `text` Analyzer to cover word-based auto-completion queries, and improved the `ngram` Analyzer with UTF-8 support and the ability to mark the beginning/end of the input sequence. ArangoSearch now also supports expressions with array comparison operators in AQL, and the `TOKENS()` and `PHRASE()` functions accept arrays. Both features enable dynamic search expressions.

If you are working with dates, you should know that AQL in 3.6 enforces a valid date range for date/time operations. This restriction allows for faster date calculations.
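As a quick illustration (a minimal sketch; see the release notes for the exact supported bounds), date arithmetic works as before for values inside the valid range:

// regular date arithmetic within the supported date range
RETURN DATE_ADD("2019-12-31", 1, "day")    // "2020-01-01T00:00:00.000Z"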

Of course, there are many other small features and improvements under the hood that you can leverage. Please have a look at the Release Notes and the Changelog for all details.

ArangoDB 3.6 is already available on our Managed Cloud Service ArangoDB ArangoGraph, which offers you enterprise-quality ArangoDB clusters on AWS, Google Compute and soon Azure as well. Take ArangoDB 3.6 for a spin there with just a few clicks. First 14 days are on us!

New Cloud Service Pricing Model

In parallel to the 3.6 release, we are pleased to also introduce a new, attractive pricing model for ArangoDB ArangoGraph. You can now have your own highly available and scalable deployment from as little as $0.21 per hour (3 nodes, 4 GB RAM & 10 GB storage per node).

Some sample configurations for a 3-node OneShard deployment and their starting prices are listed in the table below (Please find the exact price for your desired setup within your ArangoGraph Account).

Memory per node | Storage per node | Starting at
4 GB            | 10 GB            | $0.21/hour
8 GB            | 20 GB            | $0.52/hour
16 GB           | 40 GB            | $0.91/hour
32 GB           | 80 GB            | $1.74/hour
64 GB           | 160 GB           | $3.42/hour
128 GB          | 320 GB           | $6.52/hour

The team worked very hard to further reduce the footprint of the ArangoGraph sidecars, optimize the use of cloud resources, and automate the ArangoGraph deployment process. In addition, we have been able to onboard far more customers than expected in recent weeks, allowing us to pass on lower cloud costs and add support for more regions.

We hope that this makes ArangoGraph an even better solution for even more people in the community, and we will continue to drive prices down further.

Register for the webinar “What’s new in ArangoDB 3.6” on January 22, 2020 – 10am PT/ 1pm ET/ 7pm CET to see a live demo of the newly released features.

For those who are curious about what these features offer, here are the highlights with a brief description of each:

OneShard (Enterprise Edition)

Not all use cases require horizontal scalability. In such cases, a OneShard deployment offers a practical solution that enables significant performance improvements by massively reducing cluster-internal communication.

A database created with OneShard enabled is bound to a single DB-Server node but is still replicated synchronously to other nodes to ensure resilience. This configuration allows running transactions with ACID guarantees on shard leaders.

This setup is highly recommended for most Graph use cases and join-heavy queries.

If an AQL query accesses only collections that reside on the same DB-Server node, the whole execution is transferred from the Coordinator to the DB-Server.
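For example, a join-heavy query like the following sketch (using hypothetical orders and customers collections that live in a OneShard database) is executed entirely on the DB-Server that holds the data:

/* hypothetical collections in a OneShard database: the whole query,
   including the join, runs on the single DB-Server holding the data */
FOR o IN orders
  FOR c IN customers
    FILTER o.customerId == c._key
    RETURN { order: o._key, customer: c.name }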

The possibilities are a lot broader than this, so please continue to read more about multi-tenancy use cases, ACID transactions and mixed-mode in the OneShard documentation.

Early pruning of non-matching documents

ArangoDB 3.6 evaluates `FILTER` conditions on non-index attributes directly while doing a full collection scan or an index scan. Any documents that don’t match the `FILTER` conditions will then be discarded immediately.

Previous versions of ArangoDB needed to copy the data of non-matching documents from the scan operation into some buffers for further processing and finally filtering them.

With this scanning and filtering now happening in lockstep, queries that filter on non-index attributes will see a speedup. The speedup can be quite substantial if the `FILTER` condition is very selective and will filter out many documents, and/or if the filtered documents are large.

For example, the following query will run about 30 to 50% faster in 3.6 than in 3.5:

FOR doc IN collection
  FILTER doc.nonIndexedValue == "test123456"
  RETURN doc

(Mileage may vary depending on the actual data; the tests here were done using a single-server deployment with the RocksDB storage engine and a collection of one million documents, each having only a single, non-indexed `nonIndexedValue` attribute with unique values.)

Subquery Performance Optimization

Subquery splicing inlines the execution of certain subqueries using a newly introduced optimizer rule. On subqueries with few results per input, the performance impact is significant.

Here is a self-join example query:

FOR c IN colC
  LET sub = (FOR s IN colS FILTER s.attr1 == c.attr1 RETURN s) 
  RETURN LENGTH(sub)

Inlining this basic subquery yields up to 28x faster query execution in a cluster setup with a collection of 10k documents.

Explore further details in this Subquery Performance Benchmark.

Late document materialization (RocksDB)

Queries that use a combination of `SORT` and `LIMIT` will benefit from an optimization that uses index values for sorting first, then applies the `LIMIT`, and in the end only fetches the document data for the documents that remain after the `LIMIT`.

Sorting will be done on the index data alone, which is often orders of magnitude smaller than the actual document data. Sorting smaller data helps reduce memory usage and allocations and makes better use of caches. This approach is often considerably faster than fetching all documents first, sorting them by their sort attributes and then discarding all of those beyond the `LIMIT` value.

Queries such as the following can see a substantial speedup:

FOR doc IN collection
  FILTER doc.indexedValue1 == "test3"
  SORT doc.indexedValue3
  LIMIT 100
  RETURN doc

The speedup we observed for this query is about 300%. For other queries we have seen similar speedups.

(Mileage may vary depending on the actual data; the tests here were done using a single-server deployment with the RocksDB storage engine and a collection of one million documents with a combined index on the attributes `indexedValue1`, `indexedValue2` and `indexedValue3`. There were 10 distinct values for `indexedValue1`.)

This optimization is applied to collections when using the RocksDB storage engine, as well as to ArangoSearch Views.

Parallel Execution of AQL Queries

ArangoDB 3.6 can parallelize work in many cluster AQL queries when there are multiple database servers involved. For example, if the shards for a given collection are distributed to 3 different database servers, data will be fetched concurrently from the 3 database servers that host the shards’ data. The coordinator will then aggregate the results from multiple servers into a final result.
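As a sketch (using a hypothetical `orders` collection sharded across several database servers), a plain filter query like the following now has its per-shard results fetched in parallel by the coordinator:

/* hypothetical collection sharded across several database servers;
   the coordinator fetches results from all shards in parallel
   instead of contacting one server after another */
FOR doc IN orders
  FILTER doc.status == "open"
  RETURN doc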

Querying multiple database servers in parallel can reduce latency of cluster AQL queries a lot. For some typical queries that need to perform substantial work on the database servers we have observed speedups of 30 to 40%.

The actual query speedup varies greatly, depending on the cluster size (number of database servers), number of shards per server, document count and size, plus result set size.

Parallelization is currently restricted to certain types of queries. These restrictions may be lifted in future versions of ArangoDB.

Optimizations for UPDATE and REPLACE queries

Cluster query execution plans for simple `UPDATE` and `REPLACE` queries that modify multiple documents and do not use `LIMIT` will now run more efficiently, as the optimizer can remove several execution steps automatically. Removing these steps reduces the cluster-internal traffic, which can greatly speed up query execution times.

For example, a simple data-modification query such as:

FOR doc IN collection
  UPDATE doc WITH { updated: true } IN collection

For this query, one intermediate hop to the coordinator can be removed, which also makes the query eligible for parallel execution. We have seen speedups of 40% to 50% due to this optimization, but the actual mileage can vary greatly depending on the sharding setup, document sizes and the capabilities of the I/O subsystem.

The optimization will automatically be applied for simple `UPDATE`, `REPLACE` and `REMOVE` operations on collections sharded by `_key` (which is the default), provided the query does not use a `LIMIT` clause.
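As another sketch (mirroring the UPDATE example above), a multi-document REMOVE without a `LIMIT` clause qualifies as well:

// a simple bulk REMOVE on a collection sharded by _key (the default);
// like the UPDATE above, it avoids the extra coordinator round-trip
FOR doc IN collection
  REMOVE doc IN collection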

ArangoSearch Enhancements

We continuously improve the capabilities of ArangoSearch. The late document materialization mentioned above also accelerates searches by reading only the necessary documents from the underlying collections.
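A hypothetical View query of the following shape (assuming a View `myView` whose `text` attribute is indexed with the `text_en` Analyzer) benefits from this: ranking and `LIMIT` are applied on the index data before the remaining documents are fetched.

// only the 10 top-ranked documents are materialized from the collection
FOR doc IN myView
  SEARCH PHRASE(doc.text, "graph database", "text_en")
  SORT BM25(doc) DESC
  LIMIT 10
  RETURN doc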

Search conditions now support array comparison operators with dynamic arrays as left operand:


LET tokens = TOKENS("some input", "text_en")                 // ["some", "input"]
FOR doc IN myView SEARCH tokens  ALL IN doc.title RETURN doc // dynamic conjunction
FOR doc IN myView SEARCH tokens  ANY IN doc.title RETURN doc // dynamic disjunction
FOR doc IN myView SEARCH tokens NONE IN doc.title RETURN doc // dynamic negation
FOR doc IN myView SEARCH tokens  ALL >  doc.title RETURN doc // dynamic conjunction with comparison
FOR doc IN myView SEARCH tokens  ANY <= doc.title RETURN doc // dynamic disjunction with comparison

In addition, the `TOKENS()` and `PHRASE()` functions can be used with arrays as parameters. For more information on the array support, see the release notes.
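A brief sketch (reusing the hypothetical `myView` and a `title` attribute) of both array variants:

// TOKENS() applied to an array of strings returns an array of token arrays
RETURN TOKENS(["quick brown fox", "jumps over"], "text_en")

// PHRASE() now accepts the phrase parts as a single array argument
FOR doc IN myView
  SEARCH PHRASE(doc.title, ["quick", "brown", "fox"], "text_en")
  RETURN doc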

In ArangoDB 3.6 we have added edge n-gram support to the ArangoSearch Analyzer type `text`. Edge n-grams are generated for each token (word), with the beginning of each n-gram anchored to the beginning of the token, whereas the `ngram` Analyzer generates all possible substrings of a single input token (within the defined length restrictions).

Edge n-grams can be used to cover word-based auto-completion queries with an index.

The `ngram` Analyzer type has gained UTF-8 support and the ability to mark the start/end of the input sequence. The markers become part of the generated n-grams and allow searching for these positions within tokens.

Example Analyzer and Query:

arangosh> var analyzers = require("@arangodb/analyzers");
arangosh> analyzers.save("myNgram", "ngram", { min: 2, max: 3, preserveOriginal: false, startMarker: "^", endMarker: "$", streamType: "utf8" });

FOR d IN myView
  SEARCH ANALYZER(d.category == "^new", "myNgram")
  RETURN d

The start marker “^” restricts matches to category values that begin with “new”.

Take ArangoDB 3.6 for a test drive. Any feedback is, as always, highly appreciated! If you are upgrading from a previous version, check our General Upgrade Guide.

Join the “What’s new in ArangoDB 3.6?” webinar to get a hands-on overview of the new features with our Product Manager, Ingo Friepoertner, on January 22, 2020 – 10am PT/ 1pm ET/ 7pm CET.

We hope you find many useful new features and improvements in ArangoDB 3.6. If you would like to join the ArangoDB community, you can do so on GitHub, Stack Overflow and Slack.

Download ArangoDB 3.6



Ingo Friepoertner

Ingo is dealing with all the good ideas from the ArangoDB community, customers and industry experts to improve the value provided by the company’s native multi-model approach. In former positions he worked as a product owner and tech consultant, building custom software solutions for large companies in various industries. Ingo holds a diploma in business informatics from FHDW University of Applied Sciences.
