Memgraph 1.1: Up to 50% Better Memory Usage and Higher Throughput

October 1, 2020
Memgraph

Introduction

At Memgraph, we put great effort into delivering a high-performance in-memory graph storage and analytics engine. We do this by investing a lot of time optimizing and continuously improving various aspects of the Memgraph core engine. In this blog post, we will explore some of the improvements we have made on the storage layer and their impact on performance and memory usage.

The two most significant improvements we introduced in recent releases are a new storage engine and a new way of storing properties on both nodes and edges. Prior to version v0.50.0, Memgraph had an MVCC storage engine that versioned data by keeping whole copies of nodes and edges. Memgraph v0.50.0 introduced a new way of managing graph data where each node or edge consists of the latest version of the data plus the associated changes (deltas) required to reconstruct previous versions.
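To illustrate the difference between the two schemes, here is a minimal Python sketch contrasting copy-based versioning with delta-based versioning. The class and method names and the delta representation are purely illustrative; Memgraph's actual implementation is a concurrent, transactional C++ engine.

```python
class CopyVersioned:
    """Pre-v0.50.0 style (simplified): every modification stores a full
    copy of the object's data, so N versions cost N full copies."""

    def __init__(self, props):
        self.versions = [dict(props)]

    def set_prop(self, key, value):
        new_version = dict(self.versions[-1])  # whole copy per change
        new_version[key] = value
        self.versions.append(new_version)


class DeltaVersioned:
    """v0.50.0+ style (simplified): only the latest data is kept in full,
    plus a chain of inverse deltas that reconstruct older versions."""

    def __init__(self, props):
        self.latest = dict(props)
        self.deltas = []  # (key, previous_value) pairs, newest first

    def set_prop(self, key, value):
        # Record only what is needed to undo this change.
        self.deltas.insert(0, (key, self.latest.get(key)))
        self.latest[key] = value

    def version(self, steps_back):
        """Reconstruct the state as it was `steps_back` modifications ago."""
        state = dict(self.latest)
        for key, previous in self.deltas[:steps_back]:
            if previous is None:  # the property did not exist before
                state.pop(key, None)
            else:
                state[key] = previous
        return state


user = DeltaVersioned({"name": "Ana", "age": 25})
user.set_prop("age", 26)
user.set_prop("city", "Zagreb")
print(user.latest)      # {'name': 'Ana', 'age': 26, 'city': 'Zagreb'}
print(user.version(2))  # {'name': 'Ana', 'age': 25}
```

The memory difference is clear even in this toy: the copy-based scheme holds a full dictionary per version, while the delta-based scheme holds one dictionary plus a pair per change.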

Memgraph v1.1.0 introduced a new way of storing properties. Each node or edge has a property store that takes at least 16B of memory, and Memgraph tries to hold the encoded property data directly within those 16B whenever possible. If the data exceeds 16B, the first 8B of the buffer indicate the total number of bytes required to store the properties, and the second 8B hold a pointer to a heap-allocated array that stores them. This technique is known as small buffer optimization.
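The idea is analogous to the small-string optimization found in many C++ standard libraries. The following Python sketch mimics only the decision logic; the constant and function names are hypothetical, not Memgraph internals.

```python
import struct

SMALL_BUFFER_SIZE = 16  # bytes available inline in every node/edge

def store_properties(encoded: bytes):
    """Mimic the small buffer optimization for an encoded property blob.

    If the blob fits in 16 bytes it is kept inline; otherwise the first
    8 bytes record the total size needed and the second 8 bytes would be
    a pointer to a heap array (simulated here by a separate bytes object).
    """
    if len(encoded) <= SMALL_BUFFER_SIZE:
        return ("inline", encoded.ljust(SMALL_BUFFER_SIZE, b"\x00"))
    size_header = struct.pack("<Q", len(encoded))  # first 8B: total size
    heap_block = bytes(encoded)                    # second 8B would point here
    return ("heap", size_header, heap_block)

print(store_properties(b'{"id":42}')[0])  # inline (9 bytes fit in 16)
print(store_properties(b"x" * 100)[0])    # heap (100 bytes do not)
```

Small property sets, which are very common in practice, therefore cost no extra allocation at all.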

Additionally, there were various other smaller improvements to existing internal data structures, most notably the skip list, which Memgraph uses as its indexing data structure. The combined result is a massive improvement in memory usage and a substantial reduction in memory fragmentation, with equivalent or better performance. Before jumping into the analysis and explanations, let's have a look at the benchmark setup.

Setup

Target Systems

The benchmark tests different Memgraph versions. The release dates and details of each version are the following:

  • v0.15.2, October 23, 2019, the last version of Memgraph that used an in-memory storage engine based on copies of nodes/edges.
  • v0.50.0, December 11, 2019, introduced the new in-memory storage engine based on data changes and a C-based storage API.
  • v1.0.0, April 6, 2020, introduced a Python-based storage API.
  • v1.1.0, July 1, 2020, added encoding and compression to node and edge properties.

The Memgraph Changelog Docs page contains more details about changes in each version.

Hardware

  • Server: HP DL360 G6
  • CPU: 2x Intel Xeon X5650 6C12T @ 2.67GHz
  • RAM: 144GB
  • Disk: 120GB SSD
  • OS: Debian 9 Stretch

Workload

The benchmark consists of several different queries that test the performance of the graph database. Typical graph database workloads consist of READ, CREATE, and ANALYZE queries.

Memgraph is particularly well suited for hybrid transactional-analytical workloads where it's crucial to ingest data as fast as possible and simultaneously deliver analytics as quickly as possible. For this reason, we are mostly interested in traversal queries (1-Hop, 2-Hop, etc.) which represent the majority of analytical queries.

In the following table, you can find the exact queries we have used for benchmarking.

|            Query Name             |                                                Query                                                        |  Query Type |
| --------------------------------- | ----------------------------------------------------------------------------------------------------------- | ----------- |
| Aggregation                       | `MATCH (n:User) RETURN n.age, COUNT(*);`                                                                    | ANALYZE     |
| 1-Hop Expand                      | `MATCH (s:User {id: $id})-->(n:User) RETURN n.id;`                                                          | ANALYZE     |
| 2-Hop Expand                      | `MATCH (s:User {id: $id})-->()-->(n:User) RETURN DISTINCT n.id;`                                            | ANALYZE     |
| 3-Hop Expand                      | `MATCH (s:User {id: $id})-->()-->()-->(n:User) RETURN DISTINCT n.id;`                                       | ANALYZE     |
| 4-Hop Expand                      | `MATCH (s:User {id: $id})-->()-->()-->()-->(n:User) RETURN DISTINCT n.id;`                                  | ANALYZE     |
| 2-Hop Variable Expand             | `MATCH (s:User {id: $id})-[*1..2]->(n:User) RETURN DISTINCT n.id;`                                          | ANALYZE     |
| 2-Hop Variable Expand with Result | `MATCH (s:User {id: $id})-[*1..2]->(n:User) RETURN DISTINCT n.id, n;`                                       | ANALYZE     |
| Shortest Path                     | `MATCH p=(n:User {id: $from})-[*bfs..15]->(m:User {id: $to}) RETURN extract(n in nodes(p) \| n.id) AS path;` | ANALYZE     |
| Insert New Relationship           | `MATCH (n:User {id: $from}), (m:User {id: $to}) WITH n, m CREATE (n)-[e:Temp]->(m) RETURN e;`               | CREATE      |
| Find Node                         | `MATCH (n:User {id : $id}) RETURN n;`                                                                       | READ        |
| Insert a New Node                 | `CREATE (n:UserTemp {id : $id}) RETURN n;`                                                                  | CREATE      |

The benchmarking harness executed all queries against the Pokec dataset which contains 1.6M nodes and 30.6M edges. Given that the goal of each benchmark is to saturate the target system to extract its peak characteristics, we took great care in carefully designing and polishing our setup. Some of the essential elements of the harness are:

  • optimized C/C++ client to minimize client overhead.
  • fresh dataset load before each execution.
  • concurrent execution (up to 12 cores) to push Memgraph to its limits.

Data Import Analysis

Before delving deep into the workload queries execution analysis, a couple of notes about the data import.

The harness imported the data in real time by running regular queries, equivalent to normal query execution, using 8 concurrent clients. Throughput and peak memory usage were measured during the import, and a couple of interesting insights emerged. On the memory usage side, there is already a massive difference between Memgraph versions. Before v0.50.0, Memgraph stored all data modifications as whole copies of the modified objects. By removing the need to make those copies, versions from v0.50.0 onward use significantly less memory. The same benefits apply at runtime since the import uses regular queries.

At this point, you might be wondering about the import speed. As the chart below shows, throughput on basic CREATE queries also improved. Since copying data is generally a fast operation, the throughput improvement is not huge, but it is still significant. One important thing to notice is the difference between v1.0.0 and v1.1.0: v1.1.0 has almost the same throughput as earlier versions even though property encoding and compression involve extra work.

Query Execution Analysis

Let's start analyzing the workload queries. The following radar chart shows peak memory usage during query execution across different Memgraph versions. Keep in mind that less is better.

As you can see, v1.1.0 uses ~50% less memory compared to v0.15.2. v0.50.0 and v1.0.0 fall in between, with almost no difference between them because little changed in the storage layer between those two versions. The most significant drops are between v0.15.2 and v0.50.0 (introduction of the new storage engine), and between v1.0.0 and v1.1.0 (introduction of encoded and compressed properties).

On the other hand, while looking at memory, it's also critical to observe what is happening with the throughput. Generally, there is a well-known trade-off between space and time. But, as you can see in the following chart, v1.1.0 has the best performance. Please note that in this case, more is better.

Of course, not all queries yield significant throughput improvements. E.g., the performance of the Insert a New Node query stayed similar across Memgraph versions. Nonetheless, the following chart illustrates well how Memgraph scales with the number of concurrent requests.

The new storage engine also introduced the option to disable storing properties on edges. Graph datasets sometimes don't have any data attached to edges, so Memgraph offers a configuration option that removes all the data structures required to store data on edges. As you can see in the following chart, without properties on edges, memory usage goes down by almost 50% (in addition to the reductions mentioned above). v0.15.2 can't disable property storage on edges, so the chart shows nothing for it. Using all the improvements in the new versions of Memgraph, you can store the same dataset in 3.36x less memory (comparing v0.15.2 with edge properties enabled and v1.1.0 with edge properties disabled).
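For reference, edge properties are controlled by a startup flag. The flag name below comes from the Memgraph configuration reference; it may differ between versions, so check the docs for the release you run:

```shell
# Launch Memgraph without allocating per-edge property stores.
# Flag name per the Memgraph configuration reference; verify for your version.
memgraph --storage-properties-on-edges=false
```

Note that this is a storage-level decision: with the flag disabled, queries that try to set a property on an edge will fail.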

The rest of the charts show performance for each query with regard to linear scalability. Linear scalability is the ability of a system to handle more work by adding resources proportionally; e.g., by doubling the number of cores, the system can execute twice as many queries. In practice, perfect linear scalability is unreachable due to various overheads. The real question is how far Memgraph is from linear scalability. As you can see below, not far.
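One simple way to quantify "how far from linear" is to divide measured throughput by the ideal throughput (single-core throughput times core count). The sketch below uses made-up numbers, not the benchmark's measured results:

```python
def scaling_efficiency(single_core_qps, measured_qps, cores):
    """Fraction of ideal linear scaling achieved (1.0 == perfectly linear)."""
    return measured_qps / (single_core_qps * cores)

# Hypothetical Find Node throughput figures (queries per second);
# these are NOT the benchmark's measured numbers.
runs = {1: 20_000, 2: 39_000, 4: 76_000, 8: 144_000, 12: 204_000}
single = runs[1]
for cores, qps in sorted(runs.items()):
    eff = scaling_efficiency(single, qps, cores)
    print(f"{cores:>2} cores: {qps:>7} QPS -> {eff:.0%} of linear")
```

An efficiency that stays close to 1.0 as cores are added means the system scales nearly linearly; a steadily dropping value reveals contention or other overheads.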

The following chart shows throughput data for the simple read query (Find Node). The node lookup inside Memgraph has O(log N) complexity.
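The O(log N) lookup comes from the skip-list index mentioned earlier. As a refresher on how a skip list works, here is a minimal single-threaded Python sketch; Memgraph's lock-free implementation is considerably more involved.

```python
import random

class _Node:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * level

class SkipList:
    """A sorted linked list with probabilistic "express lanes", giving
    expected O(log N) search and insert."""

    MAX_LEVEL = 16

    def __init__(self):
        self.head = _Node(None, self.MAX_LEVEL)
        self.level = 1  # number of levels currently in use

    def _random_level(self):
        # Each extra level is taken with probability 1/2.
        level = 1
        while level < self.MAX_LEVEL and random.random() < 0.5:
            level += 1
        return level

    def insert(self, key):
        # Find the rightmost node before `key` on every level.
        update = [self.head] * self.MAX_LEVEL
        node = self.head
        for i in range(self.level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        level = self._random_level()
        self.level = max(self.level, level)
        new = _Node(key, level)
        for i in range(level):
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new

    def contains(self, key):
        node = self.head
        for i in range(self.level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
        node = node.forward[0]
        return node is not None and node.key == key

sl = SkipList()
for key in [30, 10, 50, 20, 40]:
    sl.insert(key)
print(sl.contains(20), sl.contains(25))  # True False
```

Because the tall "express lane" nodes let a search skip over large runs of keys, the expected number of comparisons grows logarithmically with the number of indexed nodes.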

In the next case, instead of reading, Memgraph creates an edge. The operation consists of two node lookups and one edge write, making it very similar to the Find Node query, just with an extra lookup and a write. Accordingly, the chart looks very similar to the Find Node chart.

By moving towards more complex queries, there is a more significant scalability difference between previous versions of Memgraph and the latest ones. v0.15.2 is far behind the latest versions.

Conclusion

In this post, we have explored how the performance of Memgraph has evolved across versions from v0.15.2 to v1.1.0. By introducing a new storage engine, encoding and compressing properties on both nodes and edges, and leveraging skip lists as an indexing data structure, we were able to reduce memory usage by as much as 50% and improve throughput towards near-linear scalability.

In our effort to bring you the most powerful graph analytics platform for your applications, we have many more improvements on the roadmap. In the meantime, if you would like to try out the latest version of Memgraph, you can download it for free or sign up for our hassle-free fully-managed service.

If you have any questions, feedback, or need help with your graph project, please let us know on our Discord server.
