Graph Database with Neo4j and a .NET Client

Sep 14th, 2016 10:25am by Chris Skardon and Michael Hunger

Feature image: “Random Number Multiples – RGB” by Jer Thorp is licensed under CC BY 2.0.

Chris Skardon is the owner of Tournr and is a .NET developer in charge of maintaining the primary .NET client for Neo4j (a.k.a. the Neo4jClient).

Chris Skardon, Neo4j .NET expert, and Michael Hunger, developer advocate for Neo4j, detail some interesting development attributes of the graph database approach.

We’ve all experienced the challenges of trying to map complex domain object networks into a relational database. For all the help you might get from an ORM such as Entity Framework or APIs like LINQ, it’s never been simple to do, especially when queries get complex and entities get more connected.

Maybe it’s time to try a different approach: graph databases. In this article, we describe some work using one, Neo4j, with .NET as the client, in order to show results that would be much harder to achieve using an RDBMS.

Neo4j 101

Graph databases are well suited to storing related data. Neo4j is particularly well suited because it was architected as a native graph database, with a design emphasis from the start on fast management, storage and traversal of nodes and relationships.

The underlying storage model treats relationships as “first-class citizens” that represent pre-materialized connections between entities, allowing constant-time navigation (the graph equivalent of a JOIN) from one node to another.

And by taking a different approach to storing and querying connections between entities, the optimized Neo4j graph engine provides traversal performance of up to 4 million hops per second and per core. Note: As most graph searches are local to the larger neighborhood of a node, the total amount of data stored in your database doesn’t affect the runtime of your operations, which is another clear advantage of native graph technology.

Scaling

For business-critical and high-performance operations, Neo4j can be deployed as a scalable, fault-tolerant cluster of machines. Because a single machine already scales well, Neo4j clusters typically need a single-digit number of machines, not hundreds or thousands, which saves cost and operational complexity. Neo4j uses a replicated master-slave cluster setup and supports hot backups and extensive monitoring. Neo4j can also store very large numbers of entities while keeping storage compact.

The Property Graph Model

The underlying property graph model of Neo4j is one of labeled nodes connected by directed, named relationships, both of which can hold arbitrary properties. This allows for an expressive representation of any domain or use case.

Nodes are not constrained by a rigid schema, and query efficiency depends on the length of the pathways that are searched rather than the overall size of the graph. The structure is very simple: a network of nodes connected to each other by way of relationship objects, as shown here:

This is called the labeled property graph model, and it lets you use the same model consistently through conception, design, implementation, storage and visualization. This allows not only developers but all business stakeholders to participate actively and usefully throughout the development of the application, as the model is intuitive and easier for non-technical people to grasp than relational database schemas and tables. You can evolve your domain model as quickly as your requirements change, as no costly schema changes and migrations are necessary.
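The model described above can be sketched in a few lines of C#. This is only an in-memory illustration under our own assumptions, not how Neo4j stores data: nodes are label sets plus free-form property bags, a relationship is a directed, named link with its own properties, and the `role` and `released` values are sample data chosen for the example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for a labeled property graph (not Neo4j's storage format).
// A node is a set of labels plus a free-form property bag.
var tomHanks = (Labels: new HashSet<string> { "Person", "Actor" },
                Props: new Dictionary<string, object> { ["name"] = "Tom Hanks" });
var castAway = (Labels: new HashSet<string> { "Movie" },
                Props: new Dictionary<string, object> { ["title"] = "Cast Away" });

// A relationship is directed, has a type name, and carries its own properties.
var actedIn = (From: tomHanks, Type: "ACTED_IN", To: castAway,
               Props: new Dictionary<string, object> { ["role"] = "Chuck Noland" });

// No schema migration: one node gains a property that no other node declares.
castAway.Props["released"] = 2000;

Console.WriteLine(
    $"({actedIn.From.Props["name"]})-[:{actedIn.Type}]->({actedIn.To.Props["title"]})");
```

Because the property bag belongs to the individual node, adding `released` to one movie changes nothing for any other node, which is the point the paragraph makes about evolving the model without migrations.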

Querying Graphs

Michael Hunger, head of developer relations of Neo4j, Neo Technology

Michael Hunger has been passionate about software development for a very long time. For the last few years, he has been working with Neo Technology on the open source Neo4j graph database filling many roles. As caretaker of the Neo4j community and ecosystem he especially loves to work with graph-related projects, users and contributors. As a developer Michael enjoys many aspects of programming languages, learning new things every day, participating in exciting and ambitious open source projects and contributing and writing software related books and articles.

However, while it is straightforward to draw graph models and pose useful application flow simulations with them on a whiteboard, expressing them with SQL is not so easy. For several years now we have evolved Cypher — our graph query language — to be able to query, modify, create and describe data in graphs. Last year Cypher turned into an open source project (openCypher.org) for anyone to use and implement.

Cypher uses ASCII-art to represent graph patterns of nodes and relationships, deeply integrating the graph model into the query language. Centered around the patterns expressing concepts or questions from a domain are additional clauses and functions that form a highly capable, expressive and readable query language. The language also offers means for custom extensions for specific purposes.

The relationship shown in this diagram would be expressed in Cypher as:

(:Actor {name:'Tom Hanks'})-[:ACTED_IN]->(:Movie {title:'Cast Away'})

The node with the label ‘Actor’ is connected to the node labeled ‘Movie’ through the relationship ‘ACTED_IN’. The direction of the arrow defines the direction of the relationship; in this case, Tom Hanks acted in Cast Away.

Nodes can have properties as well as labels. In this example, the Actor node has a single property ‘name’ with the value ‘Tom Hanks’ and the Movie node has the property ‘title’ set to the value ‘Cast Away.’ Relationships can also have properties that qualify the relationship between entities, like cost, weight, ratings, distance, time-intervals.

Searching is carried out by using label and property-based lookups and then following node and relationship pathways. Returning all the movies acted in by Tom Hanks would involve looking in the Actor index for the name ‘Tom Hanks,’ to locate his node, and then following all the outgoing ACTED_IN relationships to find all of the Movie nodes.

Relationships can be traversed in either direction, so finding the cast of Cast Away involves a similar process: first, locate the ‘Cast Away’ node, and then follow all incoming ACTED_IN relationships.
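The two lookups just described can be sketched in plain C#. This is an in-memory illustration under our own assumptions (Neo4j’s real storage and indexes work differently), and the three relationships are sample data. The comments show roughly what the equivalent Cypher might look like.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory sketch; each relationship is stored once as (start, type, end).
var relationships = new List<(string Start, string Type, string End)>
{
    ("Tom Hanks", "ACTED_IN", "Cast Away"),
    ("Helen Hunt", "ACTED_IN", "Cast Away"),
    ("Tom Hanks", "ACTED_IN", "Apollo 13"),
};

// Lookups keyed by node, so a hop never scans relationships of unrelated nodes.
var outgoing = relationships.ToLookup(r => r.Start);
var incoming = relationships.ToLookup(r => r.End);

// Roughly: MATCH (:Actor {name:'Tom Hanks'})-[:ACTED_IN]->(m:Movie) RETURN m.title
var movies = outgoing["Tom Hanks"]
    .Where(r => r.Type == "ACTED_IN")
    .Select(r => r.End)
    .ToList();

// Roughly: MATCH (a:Actor)-[:ACTED_IN]->(:Movie {title:'Cast Away'}) RETURN a.name
var cast = incoming["Cast Away"]
    .Where(r => r.Type == "ACTED_IN")
    .Select(r => r.Start)
    .ToList();

Console.WriteLine(string.Join(", ", movies));
Console.WriteLine(string.Join(", ", cast));
```

The same stored relationship serves both questions; only the side you start from changes.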

Using Neo4j with .NET

Let’s consider some coding examples in context. We have selected .NET as the environment and, as .NET is a mainly Windows-based development ecosystem, the following covers installation on Windows. The examples should nonetheless work in your choice of delivery client.

Windows Installation

To play with Neo4j, it will need to be installed. It can be downloaded here. Note you have the option of downloading the executable installer, or the zipped up package with more manual installation required.

First Look

With an RDBMS, you typically have a management studio or some other heavyweight environment, but with Neo4j you have the “Neo4j Browser,” a lightweight, web-based tool.

The Neo4j Browser is accessible at the port you’ve installed Neo4j to and, by default, that’s 7474, so if you open http://localhost:7474/ in your web browser of choice, you’ll be taken to the start page of the database.

Log in with the default credentials and choose a new password. There’s not a lot to see at the moment; we’ve got an empty database, and an empty database is no fun for anyone. The Neo4j Browser comes with a number of built-in guides which familiarize you with the property graph concepts mentioned before and also walk you through the basics of Cypher.

Those guides cover both a movie graph (:play movie graph) and show how to import the well-known Northwind database (:play northwind graph). After you’ve checked them out, you can clean out your database with a quick command: “MATCH (n) DETACH DELETE n”.

But being developers, we want to access Neo4j programmatically. So let’s look at how we can connect.

Connectivity

Neo4j 2.3.x has one way we (as .NET developers) can connect to it, and that’s via an HTTP API exposed via the server. With the Neo4j 3.0 release, we got access to a newer API, over the bespoke Bolt binary protocol.

If you need to run against 2.x database instances, you’ll need to use an HTTP-based client to Neo4j (such as Neo4jClient). If you’re running against a 3.x database, you can choose between an HTTP-based client or a Bolt-based client such as the Neo4j-Dotnet-Driver.

The best practice is first to try your Cypher in the Neo4j Browser, and then translate to your driver of choice.

Neo4jClient

Neo4jClient is the most commonly used client to access Neo4j on the .NET platform, originally written by Tatham Oddie et al. at Readify Australia. It is now maintained by Chris Skardon, with contributions from a lot of others. It’s based on the HTTP API at the moment, but there are plans to make it work with Bolt soon.

Neo4jClient uses a fluent interface, and the writing of code closely matches the Cypher statement you would write to access graph information directly in the Neo4j Browser.

Neo4j.Driver

Neo4j.Driver is the new kid on the block, built by the engineers at Neo4j and Chris Skardon, and designed to be a low-level, high-performance driver for executing Cypher against the database. It uses the same simple concepts as the other official drivers for Java, JavaScript and Python to lower the overall learning curve. All those drivers are idiomatic to their language but built on similar principles.
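For comparison with the Neo4jClient examples that follow, here is a minimal sketch of what a Bolt query might look like with Neo4j.Driver. It assumes the 1.x synchronous API (namespace Neo4j.Driver.V1) that shipped alongside Neo4j 3.x; later driver releases changed the namespace and lean on async sessions, so check the version you install. The Bolt URL, the credentials and the movie title are placeholders.

```csharp
using System;
using System.Collections.Generic;
using Neo4j.Driver.V1; // assumption: 1.x driver; newer releases use a different namespace

// Assumed defaults: Bolt on port 7687 and placeholder credentials.
using (var driver = GraphDatabase.Driver(
    "bolt://localhost:7687", AuthTokens.Basic("neo4j", "password")))
using (var session = driver.Session())
{
    // {title} is the parameter syntax used elsewhere in this article;
    // newer Neo4j versions write the same parameter as $title.
    var result = session.Run(
        "MATCH (p:Person)-[:ACTED_IN]->(:Movie {title: {title}}) RETURN p.name AS name",
        new Dictionary<string, object> { ["title"] = "Top Gun" });

    // Each record exposes the returned columns by name.
    foreach (var record in result)
    {
        Console.WriteLine(record["name"].As<string>());
    }
}
```

The driver takes the Cypher as a plain string, which is why trying the statement in the Neo4j Browser first is a useful habit.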

Neo4jClient Examples

For the examples that follow, we’ll be using the example movies database that you’ve seen in the guide before. You can install it from the Neo4j Browser by typing:

:play movies

Then, follow the instructions to insert the data.

As we now have some data in our database, the first step should be to get some of it back! The assumption is that the database is running at localhost:7474 and that you have connected to it with your driver of choice:

var client = new GraphClient(new Uri("http://localhost:7474/db/data"), "neo4j", "password");
client.Connect();

Example 1 — Reading

We want to find the Persons who ACTED_IN a given Movie. A Person has two properties — when they were born, and their name, but we’re only interested in their names in this case.

IEnumerable<string> results = client.Cypher
    .Match("(p:Person)-[:ACTED_IN]->(m:Movie {title: 'Top Gun'})")
    .Return<string>("p.name")
    .Results;

We can then parse the results in whatever way we choose.

Example 2 — Reading More

We probably want more information than just the names, and that’s where we need to define some Plain Old CLR Objects (POCO) for our results, in this case, a Person class and a Movie class:

class Movie {
    public string title { get; set; }
}

class Person {
    public string name { get; set; }
    public int born { get; set; }
}

The property names are lowercase to match the property names in the sample database. To make them more .NET-y, you can use [JsonProperty("title")] to name your properties as you like. I’ve opted to match the case of the database to keep the code smaller in this case.
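As a rough illustration of the attribute mentioned above, the sketch below uses Json.NET directly, outside Neo4jClient, to show how a PascalCase C# property can be mapped to the lowercase name used in the database. The class is an alternative to the lowercase Movie class above, not the one used in the rest of the examples.

```csharp
using System;
using Newtonsoft.Json;

// Serializing writes the database-style lowercase name.
var json = JsonConvert.SerializeObject(new Movie { Title = "Top Gun" });
Console.WriteLine(json); // {"title":"Top Gun"}

// Deserializing reads the lowercase name back into the PascalCase property.
var movie = JsonConvert.DeserializeObject<Movie>("{\"title\":\"Cast Away\"}");
Console.WriteLine(movie.Title);

// Alternative POCO: idiomatic C# naming, database naming kept in the attribute.
class Movie
{
    [JsonProperty("title")]
    public string Title { get; set; }
}
```

If you go this way, the later examples would read results.Movie.Title instead of results.Movie.title.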

var results = client.Cypher
    .Match(
        "(actor:Person)-[:ACTED_IN]->(movie:Movie {title: {nameParam}})",
        "(movie)<-[:DIRECTED]-(director:Person)"
    )
    .WithParam("nameParam", movieName)
    .Return((actor, director, movie) => new
    {
        Movie = movie.As<Movie>(),
        Actors = actor.CollectAs<Person>(),
        Director = director.As<Person>()
    })
    .Results.Single();

In this query, we’re returning a Movie, its Director and all the Actors. We’re also passing in the name of the movie as a parameter. This is important both for performance (it allows the database to compile and cache queries) and for security (it prevents Cypher injection attacks). The response is returned in an anonymous type, with all the normal benefits and restrictions that that entails. We can parse through it with something like:

Console.WriteLine($"{results.Movie.title} directed by {results.Director.name}");
foreach (var actor in results.Actors)
{
    Console.WriteLine($"\t{actor.name}");
}

As you can see, we’re accessing the properties directly, and Neo4jClient has automatically parsed the results (via Json.NET).
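To see why the movie name was passed as a parameter rather than pasted into the query text, consider this small sketch. It builds strings only and talks to no database; the hostile input and the variable names are invented for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical hostile input, e.g. typed into a search box.
string movieName = "x'}) DETACH DELETE m //";

// Unsafe: the input becomes part of the query text and can change the query's structure.
string concatenated = "MATCH (m:Movie {title: '" + movieName + "'}) RETURN m";

// Safer: the query text is constant and the value travels separately as a parameter.
string parameterized = "MATCH (m:Movie {title: {nameParam}}) RETURN m";
var parameters = new Dictionary<string, object> { ["nameParam"] = movieName };

Console.WriteLine(concatenated);
Console.WriteLine(parameterized);
```

The concatenated version now contains a DETACH DELETE clause the developer never wrote, while the parameterized text is identical for every movie name, which is also what lets the database cache the compiled query.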

Example 3 — Updating in Transactions

Neo4j is a database, and real databases have transactions, so let’s update a movie and add an actor. First, we need to start a transaction. With Neo4jClient, the solution is to cast your IGraphClient instance to an ITransactionalGraphClient instance.

var txClient = (ITransactionalGraphClient)client;

Then we need to perform our queries within a transaction:

using (var tx = txClient.BeginTransaction()) {
    txClient.Cypher
        .Match("(m:Movie)")
        .Where((Movie m) => m.title == originalMovieName)
        .Set("m.title = {newMovieNameParam}")
        .WithParam("newMovieNameParam", newMovieName)
        .ExecuteWithoutResults();

    txClient.Cypher
        .Match("(m:Movie)")
        .Where((Movie m) => m.title == newMovieName)
        .Create("(p:Person {name: {actorNameParam}})-[:ACTED_IN]->(m)")
        .WithParam("actorNameParam", newActorName)
        .ExecuteWithoutResults();

    tx.Commit();
}

We’re doing two things with this query: First, we rename a given movie (given by the originalMovieName parameter) to a new name (newMovieName parameter). After that, we add a new actor (newActorName) to the renamed movie.

The transaction is only committed by the tx.Commit() call. If an exception is thrown or tx.Rollback() is called, the transaction is aborted and the changes are rolled back.
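An explicit rollback might look like the sketch below, which renames a movie and then commits only if exactly one movie ends up with the new title. It assumes a Neo4jClient release with the synchronous calls used in this article (Connect, Results, ExecuteWithoutResults); later releases may expose async equivalents instead. The connection details and movie names are placeholders.

```csharp
using System;
using System.Linq;
using Neo4jClient;
using Neo4jClient.Transactions;

// Assumed connection details; adjust for your own server.
var client = new GraphClient(new Uri("http://localhost:7474/db/data"), "neo4j", "password");
client.Connect();
var txClient = (ITransactionalGraphClient)client;

string originalMovieName = "Top Gun";      // illustrative values
string newMovieName = "Top Gun (1986)";

using (var tx = txClient.BeginTransaction())
{
    txClient.Cypher
        .Match("(m:Movie)")
        .Where((Movie m) => m.title == originalMovieName)
        .Set("m.title = {newMovieNameParam}")
        .WithParam("newMovieNameParam", newMovieName)
        .ExecuteWithoutResults();

    // Still inside the transaction: count the movies that now carry the new title.
    long renamed = txClient.Cypher
        .Match("(m:Movie)")
        .Where((Movie m) => m.title == newMovieName)
        .Return(m => m.Count())
        .Results
        .Single();

    if (renamed == 1)
        tx.Commit();    // keep the rename
    else
        tx.Rollback();  // zero or several matches: undo the rename
}

class Movie
{
    public string title { get; set; }
}
```

Reading inside the transaction sees the uncommitted rename, so the check and the update succeed or fail together.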

Real-world Uses of .NET Neo4j

It’s simple to use Neo4j for test or demo projects, but are there any real-world uses of Neo4j from a .NET point of view? There are great examples you can check out, from ACL / filesystem management systems (Aikux) to competition-running sites (Tournr) to a private story matching system (Onty), and even an atlas of history (The Codex).

The graph model has proved to be compelling enough for the likes of Tournr to switch from a document database to Neo4j. That’s because the team there found runtime issues around lots of DB hits to look up information, or (to prevent that) use of larger documents with lots of duplication of information which was held elsewhere in the DB.

From a visual standpoint, looking at a tournament and registrants is far simpler and more intuitive — and the important thing is that you can rough out the model on a whiteboard/pad of paper/napkin, and it’s the same in the database. There is no conceptual leap needed, unlike with RDBMS.

Document DB Representation

To increase page performance in the document model, the information about a user is duplicated in many places, so that only one call to the DB is needed. Mentally, it takes a bit of work to follow.

Simple queries, such as who is registered for a class, are easy: just look at the class in question. But slightly more complex queries, such as finding who is registered for the competition, take more work, as you need to look through each class and then join the results.
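The difference between the two queries can be sketched with plain C# collections. The class names and registrants below are invented sample data, and the dictionary merely stands in for class documents that embed their registrants; it is not Tournr’s actual model.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical document-style data: each class document embeds its own registrant names.
var competition = new Dictionary<string, List<string>>
{
    ["Class A"] = new List<string> { "Alice", "Bob" },
    ["Class B"] = new List<string> { "Bob", "Carol" },
};

// Simple query: one class, one lookup.
var classA = competition["Class A"];

// Competition-wide query: visit every class, then merge and de-duplicate the results.
var everyone = competition.Values
    .SelectMany(names => names)
    .Distinct()
    .OrderBy(name => name)
    .ToList();

Console.WriteLine(string.Join(", ", classA));
Console.WriteLine(string.Join(", ", everyone));
```

Bob is registered in both classes, so the competition-wide answer needs the extra de-duplication step that the per-class query never had to think about.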

Graph DB Representation

The graph way of doing that is far easier to understand, as the relationships are modeled to make it simple to ask basic questions: each one is a simple traversal from the root Tournament.

In summary, the native graph approach to working with data as instantiated in Neo4j is better because it allows developers to:

● maintain a rich data model.
● handle relationships efficiently.
● write queries easily.
● develop applications quickly.

For .NET developers, the list grows, as you also have access to:

● a Neo4j installer.
● drivers for Neo4j from .NET.
● a hosted database on Azure.
● the ability to deploy apps to Azure.

If you are frustrated with RDBMS slowness, rigid schema structures and type-safe entities, graphs could be an ideal next move.

Neo4j’s API will, of course, be unfamiliar at first, but stick with it and you’ll see how graph technology can help you build great apps for your users quickly and easily.
