Graph Databases FTW

I always try to slip in some new technology when working on a new idea or project, to keep myself from doing the same ol’ thing. Recently, I’ve been working on my own weekend project (make that very many weekends). Because this is a social website, I had to go through the paces of persisting the “social graph”: namely, who your friends are and who their friends are (rinse and repeat).

Having chosen MongoDB as my somewhat trusty back-end database, it would have been convenient to shoe-horn that fat social graph into the database. I mean, blogs and books have been written about how to do this in your typical relational database. Some are still figuring out the best way. Some even got a really big shoe-horn. Even MongoDB, which is a document db but similar enough, has its own solution page for this common task.

Getting it right with one of these solutions isn’t easy, especially when you start to scale and new features demand deeper traversals into the social graph. Not to mention it is quite the maintenance nightmare.

So from day one, I figured I’d go all out and get myself a shiny graph database to add to my technology stack and see what a difference it would make. Neo4J seemed like as good a choice as any, so I downloaded it and plugged away. Luckily, I caught it right in the transition to a REST-based server, which is a much better design for my needs.

The first thing I noticed, having read through all the documentation and their cheat sheet on reproducing IMDB, is that designing a graph database schema is a pretty natural thought process. I literally grabbed a piece of paper and was planning to rewrite my whole data layer to just use Neo4J! I mean, it has transactions, embedded Lucene for full-text search and indexing, and a nifty back-end management tool, and it solves my use cases.

Having some sanity though, I stopped myself before actually doing it and scaled it back to just dealing with the social graph. I figured I’d give MongoDB a chance to do what I already wrote it to do, which is persist all the crap I gave it, and I added Neo4J to contain only the social graph information, i.e. user identifiers and their relationships. Each user document in MongoDB kept the node ID of the Neo4J node, and each node in Neo4J kept the user ID of the user document in MongoDB. That way I could look up any user node in the graph and start any traversals as needed. And when I found what I was looking for, I could look it up in MongoDB to get all the details. (Side note: the built-in Lucene in Neo4J is also a great solution for many lookup/search use cases, but I personally didn’t have any use for it since I’m using Solr already.)
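To make the cross-referencing concrete, here is a minimal in-memory sketch of the idea. It does not talk to MongoDB or Neo4J at all: plain dictionaries stand in for both stores, and every name in it (`SocialStore`, `GraphNode`, `create_user`, `add_friendship`, `friends_of`, the `node_id` field) is made up for illustration rather than taken from my actual code or from either database's API.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class GraphNode:
    """Stand-in for a graph node: holds only the user ID and its edges."""
    node_id: int
    user_id: str
    friends: set = field(default_factory=set)  # node IDs of friends


class SocialStore:
    """Toy model of the split: details in one store, relationships in another."""

    def __init__(self):
        self.user_docs = {}    # stand-in for the document store, keyed by user ID
        self.graph_nodes = {}  # stand-in for the graph store, keyed by node ID
        self._next_node_id = count(1)

    def create_user(self, user_id, **details):
        node = GraphNode(node_id=next(self._next_node_id), user_id=user_id)
        self.graph_nodes[node.node_id] = node
        # The document remembers its graph node; the node remembers its document.
        self.user_docs[user_id] = {"_id": user_id, "node_id": node.node_id, **details}
        return self.user_docs[user_id]

    def add_friendship(self, user_a, user_b):
        a = self.graph_nodes[self.user_docs[user_a]["node_id"]]
        b = self.graph_nodes[self.user_docs[user_b]["node_id"]]
        a.friends.add(b.node_id)
        b.friends.add(a.node_id)

    def friends_of(self, user_id):
        """Go document -> node -> neighbouring nodes -> their documents."""
        node = self.graph_nodes[self.user_docs[user_id]["node_id"]]
        friend_user_ids = sorted(self.graph_nodes[n].user_id for n in node.friends)
        return [self.user_docs[uid] for uid in friend_user_ids]


store = SocialStore()
store.create_user("alice", name="Alice")
store.create_user("bob", name="Bob")
store.create_user("carol", name="Carol")
store.add_friendship("alice", "bob")
store.add_friendship("alice", "carol")
print([doc["name"] for doc in store.friends_of("alice")])  # ['Bob', 'Carol']
```

The point of the two-way reference is that either side can be the entry point: start from a document to find its node, or finish a traversal with a handful of user IDs and fetch the full documents in one go.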

Honestly, it was a painless process and works pretty damn nicely. It’s also very fast and flexible now. Do I want to know who your friends are? No problem. Do I want to know who your friends’ friends’ friends’ friends are? I can do that too by changing a function parameter. And the results will come back pretty much immediately.
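For a feel of what "changing a function parameter" means, here is a rough sketch of a depth-limited traversal. It is a plain breadth-first search over an in-memory adjacency dictionary, not Neo4J's traversal API, and the function name `friends_at_depth` and the sample data are hypothetical.

```python
def friends_at_depth(graph, start, depth):
    """Return users whose shortest path from `start` is exactly `depth` hops."""
    seen = {start}
    frontier = [start]
    for _ in range(depth):
        next_frontier = []
        for user in frontier:
            for friend in graph.get(user, ()):
                if friend not in seen:  # skip anyone already reached sooner
                    seen.add(friend)
                    next_frontier.append(friend)
        frontier = next_frontier
    return sorted(frontier)


# Hypothetical friendships, stored as an undirected adjacency list.
graph = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice", "dave"],
    "dave": ["bob", "carol", "erin"],
    "erin": ["dave"],
}

print(friends_at_depth(graph, "alice", 1))  # ['bob', 'carol']
print(friends_at_depth(graph, "alice", 2))  # ['dave']
print(friends_at_depth(graph, "alice", 3))  # ['erin']
```

Going one level deeper is just a bigger `depth`. In a relational schema the same change usually means another self-join per level, which is where the maintenance pain comes from.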

Overall, I think I’ll be using Neo4J, and graph databases in general, for those tasks involving non-trivial relationships. For everything? Probably not... but getting it to support paging is a good start.

