Quite a topical technology, given its recent use by the International Consortium of Investigative Journalists (ICIJ) in their analysis of the Mossack Fonseca “Panama Papers”, where they used it to help uncover highly interconnected networks of offshore tax structures.
Whilst, in the past, I’ve been a hard-core programmer (originally in assembler and since in various other languages, including object-oriented varieties), for many years now I’ve been a data professional, working with all manner of relational database systems and becoming proficient with SQL. However, until recently, I hadn’t done much with web or app programming, nor with NoSQL databases.
So, where to start? I wasn’t looking to get into semantic graphs or RDF / SPARQL; my first thought was to get myself a decent grounding in general Graph Databases. Some research (for “research” read “Googling”) seemed to lead me to two options: Apache TinkerPop or Neo4j. I could download and try out either for free, so how to choose?
Some time ago I’d read “Seven Databases in Seven Weeks”, so I’d heard of Neo4j (which features as one of the seven). For no other reason than that, it seemed like a good place to start.
My next dilemma was, what query language to use to interact with it?
Neo4j comes with its own language, “Cypher”; surely that would be the best place to start?
But wait a minute, I’m trying to learn Graph Databases in general, not necessarily get tied in to a proprietary language. Are there any “open” graph languages out there? And so I came across “Gremlin” and saw that it too could be used with Neo4j.
So, Cypher or Gremlin?
Well, without too deep an investigation I was leaning towards Gremlin, given that it was an open source language, but then I found that Neo Technology were moving in the same direction with openCypher, so that put that argument to bed; time to dig deeper.
As I said, in a past life I’ve been a low-level programmer and I’ve used object-oriented languages, but I’ve not seriously used Java. I can read the code, I know the basic syntax and have used it within tools like Talend, but I don’t know all the libraries, and at this point I’d certainly not been anywhere near Groovy.
Why is this of interest, I hear you ask? Well, a quick look at Gremlin showed me that it was very programmatic; with a deeper dig I found that it was based on Groovy, and so on Java too. Now this isn’t a bad thing, especially if you are a web developer or an all-round Java-head, but I’m not; these days I am much more familiar and comfortable with SQL. This Gremlin is starting to look, for me, like a bit of a mountain.
This is where my pragmatic side kicks in… “I’m looking to learn this technology as quickly as I can, it’s all unfamiliar to me at the moment; surely if I learn Cypher the basic understanding of graphs will transfer if I need to learn Gremlin later” …and so I went for Cypher as it has a much more SQL-like feel to it; however, SQL it most certainly is not.
Like SQL, Cypher is a declarative query language: you describe the results you want rather than spelling out how to fetch them, as opposed to the Java-like imperative style of Gremlin. Unlike SQL, it uses an ASCII Art style in which the queries form pictograms of nodes and relationships. This certainly seems to suit my more recent SQL-based experience. I got to work, learnt the language and, after a couple of projects, even got myself certified!
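To make that contrast a little more concrete, here is a minimal sketch in Python. Everything in it is hypothetical and invented for illustration: the toy data, the labels and relationship names (Officer, Company, OFFICER_OF) and the helper function. The Cypher and Gremlin snippets are held as plain strings and are not executed; the loop underneath simply spells out, step by step, the kind of walk that the Cypher pattern describes in a single picture.

```python
# Hypothetical toy data: (source, relationship, target) triples.
EDGES = [
    ("Alice", "OFFICER_OF", "Acme Holdings"),
    ("Bob", "OFFICER_OF", "Acme Holdings"),
    ("Bob", "OFFICER_OF", "Blue Sky Ltd"),
    ("Acme Holdings", "REGISTERED_IN", "Panama"),
]

# Declarative style: the query "draws" the pattern it wants matched.
# Illustrative only, not executed; labels and properties are assumptions.
CYPHER_SKETCH = """
MATCH (p:Officer)-[:OFFICER_OF]->(c:Company)
WHERE p.name = 'Bob'
RETURN c.name
"""

# Traversal style: a chain of steps, read left to right.
# Illustrative only, not executed; property names are assumptions.
GREMLIN_SKETCH = "g.V().has('name', 'Bob').out('OFFICER_OF').values('name')"


def companies_for(officer, edges=EDGES):
    """Walk the edges by hand and collect the companies for one officer."""
    results = []
    for source, relationship, target in edges:
        # Keep only edges that start at the officer and have the right type.
        if source == officer and relationship == "OFFICER_OF":
            results.append(target)
    return results


if __name__ == "__main__":
    print(companies_for("Bob"))
```

Reading the two strings side by side gives a feel for the difference: the Cypher version looks like a small drawing of the nodes and the arrow between them, while the Gremlin version reads as a sequence of method calls.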
So, which should you choose?
Well, if you are a Java/Groovy programmer, why wouldn’t you choose Gremlin? Absolutely no reason whatsoever, it’ll be right up your street.
However, if, like me, you have a lot of experience in declarative database query languages like SQL, you’ll probably learn Cypher an awful lot faster.
If you choose Cypher, you’ll probably hear or see arguments that Gremlin is more powerful and that there are things you can do in Gremlin that you can’t in Cypher. Well, some of that may be true, but I’ve developed a few solutions now and I have yet to come across anything that I’ve had to resort to Gremlin for.
In the past the Java guys would tell you that Gremlin can provide you with custom procedures that Cypher can’t, but the latest version of Cypher released with Neo4j 3 addresses that, so I’ll stick with my assertion above. If you come from a Java background then embrace Gremlin; you won’t regret it (and it has a cool logo after all). But if you come from a SQL background, you’ll get up to speed and be productive much more quickly with Cypher.
Mark Fulgoni is a Principal Consultant in Red Olive’s Data Practice.