Why You Should Use Neo4j in Your Next Ruby App

I have needed to store a lot of data in my time, and I’ve used a lot of the big contenders: PostgreSQL, MySQL, SQLite, Redis, and MongoDB. While I’ve built up extensive experience with these tools, I wouldn’t say that any of them has ever made the task fun. I fell in love with Ruby because it was fun and because it let me do more powerful things by not getting in my way. I didn’t realize it at the time, but the usual suspects of data persistence were getting in my way. Now I’ve found a new love: let me tell you about Neo4j.

What is Neo4j?

Neo4j is a graph database! That means that it is optimized for managing and querying connections (relationships) between entities (nodes), as opposed to a relational database, which organizes data into tables.

Why is this great? Imagine a world with no foreign keys. Each entity in your database can have many relationships referring directly to other entities. If you want to explore those relationships, there are no table or index scans, just a few connections to follow. This maps well onto the typical object model. It is more powerful, though: while providing much of the database functionality we expect, Neo4j also gives us tools to query for complex patterns in our data.
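To make the idea concrete, here is a toy in-memory model in plain Ruby. The GraphNode and Relationship names are hypothetical, and this is not how Neo4j actually stores data on disk; it only illustrates the principle that each node holds direct references to its own relationships, so exploring a neighborhood means following those references rather than scanning a table or an index.

```ruby
# Toy in-memory graph: every node keeps direct references to its relationships.
# Hypothetical illustration only; not the neo4j gem's API or Neo4j's storage format.
Relationship = Struct.new(:type, :from, :to)

class GraphNode
  attr_reader :label, :props, :outgoing, :incoming

  def initialize(label, props = {})
    @label = label
    @props = props
    @outgoing = []
    @incoming = []
  end

  # Create a typed relationship and register it on both endpoints.
  def relate(type, other)
    rel = Relationship.new(type, self, other)
    @outgoing << rel
    other.incoming << rel
    rel
  end

  # Follow this node's own outgoing relationships; no other node is inspected.
  def out_nodes(type)
    @outgoing.select { |rel| rel.type == type }.map(&:to)
  end

  # Follow this node's own incoming relationships of the given type.
  def in_nodes(type)
    @incoming.select { |rel| rel.type == type }.map(&:from)
  end
end

# Hypothetical sample data.
ruby_asset  = GraphNode.new(:Asset, title: 'Ruby')
rails_asset = GraphNode.new(:Asset, title: 'Ruby on Rails')
languages   = GraphNode.new(:Category, name: 'Languages')
ruby_asset.relate(:HAS_CATEGORY, languages)
rails_asset.relate(:HAS_CATEGORY, languages)

p ruby_asset.out_nodes(:HAS_CATEGORY).map { |c| c.props[:name] } # => ["Languages"]
p languages.in_nodes(:HAS_CATEGORY).map { |a| a.props[:title] }  # => ["Ruby", "Ruby on Rails"]
```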

Introducing ActiveNode

To connect to Neo4j we’ll be using the neo4j gem. You can find instructions for connecting to Neo4j in your Rails application in the gem’s documentation. The code shown below is also available as a running Rails app in this GitHub repository (use the sitepoint Git branch). Once your database is up and running, use the rake load_sample_data command to populate it.

Here is a basic example of an Asset model from an asset management Rails app:

app/models/asset.rb

class Asset
  include Neo4j::ActiveNode

  property :title

  has_many :out, :categories, type: :HAS_CATEGORY
end

Let’s break this down:

  • The neo4j gem gives us the Neo4j::ActiveNode module, which we include to make a model.
  • The class name Asset means that this model will be responsible for all nodes in Neo4j labeled Asset (labels play a similar role to table names except that a node can have many labels).
  • We have a title property to describe the individual nodes.
  • We have an outgoing has_many association for categories. This association helps us find Category objects by following HAS_CATEGORY relationships in the database.

With this model we can perform a basic query to find an asset and get its categories:

2.2.0 :001 > asset = Asset.first
 => #<Asset uuid: "0098d2b7-a577-407a-a9f2-7ec4153cfa60", title: "ICC World Cup 2015 ">
2.2.0 :002 > asset.categories.to_a
 => [#<Category uuid: "91cd5369-605c-4aff-aad1-b51d8aa9b5f3", name: "Classification">]

Anybody familiar with ActiveRecord or Mongoid will have seen this hundreds of times. To get a bit more interesting, let’s define a Category model:

class Category
  include Neo4j::ActiveNode

  property :name

  has_many :in, :assets, origin: :categories
end

Here our association has an origin option to reference the categories association on the Asset model. We could instead specify type: :HAS_CATEGORY again if we wanted to.

Creating Recommendations

What if we wanted to get all assets that share a category with our asset?

2.2.0 :003 > asset.categories.assets.to_a
 => [#<Asset uuid: "d2ef17b5-4dbf-4a99-b814-dee2e96d4a09", title: "WineGraph">, ...]

So what just happened? ActiveNode generated a query to the database which specified a path from our asset to all other assets which share a category. The database then returned just those assets to us. Here’s the query that it used:

MATCH
  asset436, asset436-[rel1:`HAS_CATEGORY`]->(node3:`Category`),
  node3<-[rel2:`HAS_CATEGORY`]-(result_assets:`Asset`)
WHERE (ID(asset436) = {ID_asset436})
RETURN result_assets

Parameters: {ID_asset436: 436}

This is a query language called Cypher, which is Neo4j’s equivalent to SQL. Note particularly the ASCII art style of parentheses surrounding node definitions and arrows representing relationships. This Cypher query is a bit more verbose because ActiveNode generated it algorithmically. If a human were to write the query it would look something like:

MATCH source_asset-[:HAS_CATEGORY]->(:Category)<-[:HAS_CATEGORY]-(result_assets:Asset)
WHERE ID(source_asset) = {source_asset_id}
RETURN result_assets

Parameters: {source_asset_id: 436}
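One way to picture what that pattern matches is as a two-step walk: out along HAS_CATEGORY from the source asset to a Category, then back in along a different HAS_CATEGORY relationship to another Asset. The sketch below performs that walk in plain Ruby over a hypothetical list of [from, type, to] triples (the asset and category names are made up). Within a single MATCH, Cypher will not use the same relationship twice in one path, so the sketch skips the relationship it arrived on; that is also why the source asset does not come back as its own result.

```ruby
# Relationships stored as [from, type, to] triples. Hypothetical sample data.
rels = [
  ['Ruby',          :HAS_CATEGORY, 'Languages'],
  ['Ruby on Rails', :HAS_CATEGORY, 'Languages'],
  ['Ruby on Rails', :HAS_CATEGORY, 'Web'],
  ['Sinatra',       :HAS_CATEGORY, 'Web']
]

# Walks (source)-[r1:HAS_CATEGORY]->(category)<-[r2:HAS_CATEGORY]-(result)
# and returns one [category, result] row per matched path.
# As in a single Cypher MATCH, r1 and r2 must be different relationships.
def shared_category_paths(source, rels)
  rels.each_with_index.flat_map do |(from, type, category), i|
    next [] unless from == source && type == :HAS_CATEGORY

    rels.each_with_index.filter_map do |(other, other_type, other_category), j|
      next if j == i # never reuse r1 as r2

      [category, other] if other_type == :HAS_CATEGORY && other_category == category
    end
  end
end

p shared_category_paths('Ruby on Rails', rels)
# => [["Languages", "Ruby"], ["Web", "Sinatra"]]
```

Each returned row is one matched path, so an asset sharing two categories with the source would appear twice. That repetition is exactly what the counting below takes advantage of.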

I find Cypher easier and more powerful than SQL, but we won’t worry too much about Cypher in this article. If you want to learn more later you can find great tutorials and a thorough refcard.

As you can see, we can use Neo4j to span across our entities. Big deal! We can also do this in SQL with a couple of JOINs. While Cypher seems cool, we’re not breaking any major ground yet. What if we wanted to use this query to make some asset recommendations based on shared categories? We’ll want to sort the assets to rank those with the most categories in common. Let’s create a method on our model:

class Asset
  ...

  Recommendation = Struct.new(:asset, :categories, :score)

  def asset_recommendations_by_category(common_links_required = 3)
    categories(:c)
      .assets(:asset)
      .order('count(c) DESC')
      .pluck('asset, collect(c), count(c)')
      .reject { |_, _, count| count < common_links_required }
      .map do |other_asset, shared_categories, count|
        Recommendation.new(other_asset, shared_categories, count)
      end
  end
end

There are a few interesting things to note here:

  • We are defining variables as part of our chain to use later (c and asset).
  • We are using the Cypher collect function to give us a result column containing an array of the shared categories (see the table below). Also note that we are getting full objects, not just columns/properties:
  asset       collect(c)                       count(c)
  #<Asset>    [#<Category>]                    1
  #<Asset>    [#<Category>, #<Category>, …]    4
  #<Asset>    [#<Category>, #<Category>]       2

Did you notice that there is no GROUP BY clause? Neo4j is smart enough to realize that collect and count are aggregation functions, and it groups by the non-aggregated columns in our result (in this case, just the asset variable).

Take that, SQL!
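If the implicit grouping feels like magic, here is the same aggregation done by hand in plain Ruby. It is a sketch over hypothetical rows, one per matched (category, other asset) path, and not what the gem executes: group_by plays the part of the implicit grouping on the non-aggregated column, the collected category list and its size stand in for collect(c) and count(c), and the threshold and descending sort mirror asset_recommendations_by_category.

```ruby
# One row per matched path: [shared_category, other_asset]. Hypothetical data.
rows = [
  ['Languages', 'Ruby'], ['Web', 'Sinatra'], ['Open Source', 'Ruby'],
  ['Web', 'Ruby'], ['Open Source', 'Sinatra'], ['Testing', 'RSpec']
]

Recommendation = Struct.new(:asset, :categories, :score)

def recommendations_from_rows(rows, common_links_required)
  rows
    .group_by { |_category, asset| asset } # implicit grouping by the non-aggregated column
    .map do |asset, asset_rows|
      shared = asset_rows.map(&:first)               # collect(c)
      Recommendation.new(asset, shared, shared.size) # count(c)
    end
    .reject { |rec| rec.score < common_links_required }
    .sort_by { |rec| -rec.score } # ORDER BY count(c) DESC
end

recommendations_from_rows(rows, 2).each do |rec|
  puts "#{rec.asset}: #{rec.score} (#{rec.categories.join(', ')})"
end
# Ruby: 3 (Languages, Open Source, Web)
# Sinatra: 2 (Web, Open Source)
```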

As a last step, we can make recommendations based on more than just shared categories. Imagine that we have a sub-graph in Neo4j in which the assets “Ruby” and “Ruby on Rails” share one category and two viewers, but no creators.

In addition to shared categories, let’s account for how many creators and viewers assets have in common:

class Asset
  ...
  # A separate struct name, so it doesn't clash with the three-field Recommendation above
  ScoredRecommendation = Struct.new(:asset, :score)

  def secret_sauce_recommendations
    query_as(:source)
      .match('source-[:HAS_CATEGORY]->(category:Category)<-[:HAS_CATEGORY]-(asset:Asset)').break
      .optional_match('source<-[:CREATED]-(creator:User)-[:CREATED]->asset').break
      .optional_match('source<-[:VIEWED]-(viewer:User)-[:VIEWED]->asset')
      .order('score DESC')
      .limit(5)
      .pluck(
        :asset,
        # DISTINCT stops the rows from each OPTIONAL MATCH from inflating the other counts
        '(count(DISTINCT category) * 2) + (count(DISTINCT creator) * 4) + (count(DISTINCT viewer) * 0.1) AS score'
      ).map do |other_asset, score|
        ScoredRecommendation.new(other_asset, score)
      end
  end
end

Here we delve deeper and start forming our own query. The structure is the same but, rather than finding just one path between two assets via a shared category, we also specify two more optional paths. We could make all three paths optional, but then Neo4j would need to compare our asset with every other asset in the database. By using a match rather than an optional_match for our path through Category nodes we require that there be at least one shared category. This vastly limits our search space.

In that example there is one shared category, zero shared creators, and two shared viewers. This means that the score between “Ruby” and “Ruby on Rails” would be:

(1 * 2) + (0 * 4) + (2 * 0.1) = 2.2

Also note that we’re doing a calculation (and sorting) on a count aggregation of these three paths. That’s so cool to me that it makes me tingle a little to think about it…
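To double-check the arithmetic, here is the same weighting in plain Ruby. It is a sketch with hypothetical data that encodes the example above (one shared category, no shared creators, two shared viewers). Shared neighbors are computed as set intersections, so each one counts once however many paths lead to it, and an asset with no shared category gets no score at all, mirroring the required match on the Category path.

```ruby
require 'set'

# Weights copied from secret_sauce_recommendations.
WEIGHTS = { categories: 2, creators: 4, viewers: 0.1 }.freeze

# source and other are hashes of Sets keyed by :categories, :creators, :viewers.
# Returns nil when no category is shared, like the required (non-optional) MATCH.
def secret_sauce_score(source, other)
  shared = WEIGHTS.keys.to_h { |kind| [kind, source[kind] & other[kind]] }
  return nil if shared[:categories].empty?

  WEIGHTS.sum { |kind, weight| shared[kind].size * weight }
end

# Hypothetical neighbors for the two assets in the example.
ruby  = { categories: Set['Languages'], creators: Set['Matz'], viewers: Set['alice', 'bob', 'carol'] }
rails = { categories: Set['Languages', 'Web'], creators: Set['DHH'], viewers: Set['alice', 'bob'] }

puts secret_sauce_score(ruby, rails).round(2) # prints 2.2
```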

Easy Authorization

Let’s tackle another common problem. Suppose your CEO comes by your desk and says “We’ve built a great app, but customers want to be able to control who can see their stuff. Could you build in some privacy controls?” It seems simple enough. Let’s just throw on a flag to allow for private assets:

class Asset
  ...
  property :public, default: true

  def self.visible_to(user)
    query_as(:asset)
      .match_nodes(user: user)
      .where("asset.public OR asset<-[:CREATED]-user")
      .pluck(:asset)
  end
end

With this you can display all of the assets a user can see, either because the asset is public or because the user created it. No problem, but again, not a big deal: in another database you could just query on two columns/properties. Let’s get a bit crazier!

The Product Manager comes to you and says “Hey, thanks for that, but now people want to be able to give other users direct access to their private stuff”. No problem! You can build a UI to let users add and remove VIEWABLE_BY relationships for their assets and then query them like so:

class Asset
  ...

  def self.visible_to(user)
    query_as(:asset)
      .match_nodes(user: user)
      .where("asset.public OR asset<-[:CREATED]-user OR asset-[:VIEWABLE_BY]->user")
      .pluck(:asset)
  end
end

That would have been a join table otherwise. Here you just throw in another path by which users can have access to an asset. You take a moment to appreciate Neo4j’s schemaless nature.
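Each OR in that WHERE clause is simply another route from the asset to the user. Here is the same rule as a plain-Ruby sketch over hypothetical in-memory records (AssetRecord and its fields are made up for illustration; is_public stands in for the public property). The data uses the same cast as the example that follows, Matz created Ruby, tenderlove was given direct access to it, and DHH created Ruby on Rails, plus a made-up public asset.

```ruby
# Hypothetical in-memory stand-in for Asset nodes and their relationships.
AssetRecord = Struct.new(:title, :is_public, :created_by, :viewable_by, keyword_init: true)

# Same rule as the WHERE clause:
# asset.public OR asset<-[:CREATED]-user OR asset-[:VIEWABLE_BY]->user
def visible_to(assets, user)
  assets.select do |asset|
    asset.is_public ||
      asset.created_by.include?(user) ||
      asset.viewable_by.include?(user)
  end
end

assets = [
  AssetRecord.new(title: 'Ruby', is_public: false, created_by: ['Matz'], viewable_by: ['tenderlove']),
  AssetRecord.new(title: 'Ruby on Rails', is_public: false, created_by: ['DHH'], viewable_by: []),
  AssetRecord.new(title: 'Sinatra', is_public: true, created_by: [], viewable_by: [])
]

p visible_to(assets, 'tenderlove').map(&:title) # => ["Ruby", "Sinatra"]
```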

Satisfied with your day’s work, you lean back in your chair and sip your afternoon coffee. Of course, that’s when the Social Media Customer Care Representative drops by to say “Users love the new feature, but they want to be able to create groups and assign access to groups. Can you do that? Oh, also, could you allow for an arbitrary hierarchy of groups?” You stare deeply into their eyes for a few minutes before responding: “Sure!” Since this is starting to get complicated, let’s work through an example with two assets, “Ruby” and “Ruby on Rails”, and the users Matz, tenderlove, DHH, and Yehuda.

If both assets are private, your code so far gives Matz and tenderlove access to Ruby and DHH access to Ruby on Rails. To add group support, you start by following directly assigned groups:

class Asset
  ...

  def self.visible_to(user)
    query_as(:asset)
      .match_nodes(user: user)
      .where("asset.public OR asset<-[:CREATED]-user OR asset-[:VIEWABLE_BY]->user OR asset-[:VIEWABLE_BY]->(:Group)<-[:BELONGS_TO]-user")
      .pluck('DISTINCT asset')
  end
end

That was pretty easy, since you just needed to add another path. It’s two hops, sure, but that’s old hat for us by now. Tenderlove and Yehuda will be able to see the “Ruby on Rails” asset because they are members of the “Railsists” group. Also note that some users now have multiple paths to an asset (like Matz to Ruby via the Rubyists group and via the CREATED relationship). Because these paths are only tested inside the WHERE predicate, they don’t multiply the asset rows on their own, but returning DISTINCT asset is a cheap safeguard against duplicates if any of these paths later moves into a MATCH clause.
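To see where duplicate rows could come from, the sketch below lists every path from a user to an asset as a separate row, which is what you would get if each path were matched and returned individually; calling uniq then plays the role of DISTINCT. The hash layout is a hypothetical in-memory stand-in for the example graph (both assets private), not something the gem provides.

```ruby
# Hypothetical in-memory stand-in for the example graph (both assets private).
graph = {
  assets: ['Ruby', 'Ruby on Rails'],
  created: { 'Ruby' => ['Matz'], 'Ruby on Rails' => ['DHH'] },
  viewable_by_user: { 'Ruby' => ['tenderlove'], 'Ruby on Rails' => [] },
  viewable_by_group: { 'Ruby' => ['Rubyists'], 'Ruby on Rails' => ['Railsists'] },
  members: { 'Rubyists' => ['Matz'], 'Railsists' => ['tenderlove', 'Yehuda'] }
}

# Returns one [asset, route] row for every path from the user to an asset.
def access_paths(graph, user)
  graph[:assets].flat_map do |asset|
    rows = []
    rows << [asset, :created] if graph[:created][asset].include?(user)
    rows << [asset, :viewable_by] if graph[:viewable_by_user][asset].include?(user)
    graph[:viewable_by_group][asset].each do |group|
      rows << [asset, :group] if graph[:members][group].include?(user)
    end
    rows
  end
end

paths = access_paths(graph, 'Matz')
p paths                   # => [["Ruby", :created], ["Ruby", :group]]
p paths.map(&:first).uniq # => ["Ruby"]
```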

Specifying an arbitrary path through a hierarchy of groups takes you a bit more time, though. You look through the Neo4j documentation until you find something called “variable length relationships” and give it a shot:

class Asset
  ...

  def self.visible_to(user)
    query_as(:asset)
      .match_nodes(user: user)
      .where("asset.public OR asset<-[:CREATED]-user OR asset-[:VIEWABLE_BY]->user OR asset-[:VIEWABLE_BY]->(:Group)<-[:HAS_SUBGROUP*0..5]-(:Group)<-[:BELONGS_TO]-user")
      .pluck('DISTINCT asset')
  end
end

You’ve done it! This query finds assets accessible to a group, traverses any chain of zero to five HAS_SUBGROUP relationships, and finally checks whether the user is in the last group. You’re the hero of the story, and your company showers you with bonuses for getting the job done so quickly!
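The variable length part amounts to a bounded breadth-first search. The plain-Ruby sketch below (hypothetical group names, users, and helpers, not the gem's API) collects every group reachable in zero to five hops from a group the asset is VIEWABLE_BY, following HAS_SUBGROUP in the same direction as the Cypher pattern (from a group to the groups that contain it), and then checks whether the user belongs to any of them.

```ruby
require 'set'

# Hypothetical hierarchy: parent group => its direct subgroups (HAS_SUBGROUP).
HAS_SUBGROUP = {
  'Programmers' => ['Rubyists'],
  'Rubyists'    => ['Railsists']
}.freeze

# Groups reachable from start_group in 0..max_hops steps, walking HAS_SUBGROUP
# backwards like (start:Group)<-[:HAS_SUBGROUP*0..5]-(group:Group).
def groups_within(start_group, has_subgroup, max_hops: 5)
  containing = Hash.new { |hash, key| hash[key] = [] }
  has_subgroup.each do |parent, children|
    children.each { |child| containing[child] << parent }
  end

  found = Set[start_group] # zero hops: the start group itself
  frontier = [start_group]
  max_hops.times do
    frontier = frontier.flat_map { |group| containing[group] }
                       .uniq
                       .reject { |group| found.include?(group) }
    break if frontier.empty?

    found.merge(frontier)
  end
  found
end

# True if the user belongs to any group reachable from a group the asset is VIEWABLE_BY.
def visible_via_groups?(user, asset_groups, has_subgroup, belongs_to)
  user_groups = belongs_to.fetch(user, [])
  asset_groups.any? do |group|
    groups_within(group, has_subgroup).any? { |reachable| user_groups.include?(reachable) }
  end
end

belongs_to = { 'alice' => ['Programmers'], 'bob' => [] }
p groups_within('Railsists', HAS_SUBGROUP).to_a # => ["Railsists", "Rubyists", "Programmers"]
p visible_via_groups?('alice', ['Railsists'], HAS_SUBGROUP, belongs_to) # => true
p visible_via_groups?('bob', ['Railsists'], HAS_SUBGROUP, belongs_to)   # => false
```

Breadth-first search reaches each group at its shortest distance, which is enough here: any group reachable by some path of at most five hops is also reachable by a shortest one. Reversing the lookup (and the arrow in the Cypher pattern) would instead let members of subgroups see assets shared with their parent groups, so it is worth checking which direction matches how your groups are modeled.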

Conclusion

There are many awesome things that you can do with Neo4j (including using its amazing web interface to explore your data with Cypher) that I’m not able to cover here. Not only is it a great way to store your data in an easy and intuitive way, it provides a lot of benefits for efficient querying of highly connected data (and believe me, your data is highly connected, even if you don’t realize it). I encourage you to check out Neo4j and give it a try for your next project!

Frequently Asked Questions about Using Neo4j in Your Next Ruby App

What are the benefits of using Neo4j with Ruby?

Neo4j is a graph database that provides a flexible and efficient way to store, process, and query data. When used with Ruby, a dynamic, open-source programming language, it allows developers to create powerful, data-driven applications. The combination of Ruby’s simplicity and Neo4j’s powerful graph processing capabilities makes it an excellent choice for developing complex applications that require efficient data handling and manipulation.

How do I get started with Neo4j in Ruby?

To get started with Neo4j in Ruby, you first need to install the ‘neo4j’ gem. This can be done by adding gem 'neo4j' to your Gemfile and running bundle install. Once the gem is installed, you can establish a connection to your Neo4j database using the Neo4j::Session.open method.

How do I perform CRUD operations in Neo4j using Ruby?

CRUD operations in Neo4j using Ruby can be performed with the model layer of the ‘neo4j’ gem (renamed ActiveGraph in later releases). It provides a set of methods that allow you to create, read, update, and delete nodes and relationships in your Neo4j database. For example, to create a new node from a Person model, you can use the create method, like so: Person.create(name: 'John Doe').

How can I use Cypher queries in Ruby?

Cypher is Neo4j’s query language, and it can be used in Ruby through the query method provided by the ‘neo4j’ gem. This method allows you to write and execute Cypher queries directly from your Ruby code. For example, to find all people named ‘John Doe’, you could use the following code: Neo4j::Session.query("MATCH (p:Person {name: 'John Doe'}) RETURN p").

What are the best practices for using Neo4j with Ruby?

When using Neo4j with Ruby, it’s important to follow best practices to ensure your application is efficient and maintainable. These include using indexes to speed up queries, keeping your Cypher queries as simple as possible, and using the ActiveGraph gem to handle CRUD operations.

How can I handle relationships in Neo4j using Ruby?

Relationships in Neo4j can be handled in Ruby through the associations you declare on your models. The gem provides methods for creating, querying, and manipulating relationships between nodes. For example, with the has_many :out, :categories association from the Asset model above, asset.categories << category creates a HAS_CATEGORY relationship between the two nodes.

How can I optimize my Neo4j queries in Ruby?

Optimizing your Neo4j queries in Ruby can be done by using indexes, keeping your queries as simple as possible, and avoiding expensive operations like full graph scans. Additionally, you can use the EXPLAIN and PROFILE keywords in your Cypher queries to understand how they are being executed and identify potential performance issues.

How can I handle errors in Neo4j using Ruby?

Errors in Neo4j can be handled in Ruby using standard error handling techniques. The ‘neo4j’ gem provides a set of custom error classes that you can use to catch and handle Neo4j-specific errors. For example, you can use a begin-rescue block to catch and handle a Neo4j::ActiveNode::Labels::RecordNotFound error.

Can I use Neo4j with Ruby on Rails?

Yes, you can use Neo4j with Ruby on Rails. The ‘neo4j’ gem provides Rails integration, including ActiveModel support such as validations and callbacks. Its models offer an ActiveRecord-like API, so you can use Neo4j in place of ActiveRecord as the database in a Rails application.

How can I secure my Neo4j database when using it with Ruby?

Securing your Neo4j database when using it with Ruby can be done by following standard security practices. These include using strong, unique passwords for your database, enabling encryption for data in transit and at rest, and keeping your Neo4j and Ruby software up to date. Additionally, the ‘neo4j’ gem provides support for using SSL to secure your database connections.

Brian Underwood

Brian Underwood is a developer advocate for Neo4j and one of the maintainers of the neo4j.rb project. He is currently traveling the world with his wife and three-year-old son. You can find him as cheerfulstoic on GitHub, Twitter, Google+, or his website.
