A Standard Graph Query Language Could Be Coming—Here’s What To Know

IT standards make everything better. It is not a stretch to say that coming together as a community and developing, evolving, and committing to standards make better, cheaper, and more secure software and systems.

The International Organization for Standardization (ISO) is considering a standard for Graph Query Language (GQL), which would ultimately allow for interoperability not only between graph DBMSs but eventually with the older, admittedly pervasive, SQL. A standardized graph query language would help not only vendors but also customers, who could accelerate digital transformation and re-platforming. The current lack of a standard means that adjacent technologies, such as BI tools, have difficulty integrating with a graph database. Organizations are unlikely to adopt a new technology, such as a graph database, when that adoption means they will be unable to leverage their existing investments in things such as BI tools.

Few people have been as articulate on this topic as Alastair Green, Director of Project Management at Neo4j. I had the opportunity to connect with Alastair and discuss the GQL initiative.

Clark: Can you tell us a little about where the GQL initiative came from?

Alastair: The idea was to get a single property graph query language which would do for graph what SQL does for relational data. The idea came from two sources. First, there are three existing languages – Cypher, PGQL and a research language called G-CORE – which are very similar. They are like dialects of one language, and we thought it would be a good idea to bring them together. Second, we didn’t just want to bring them together; we also wanted to try to get a more advanced graph query language. Graph deserves its own language – and it’s not really going to happen as an extension to SQL.

Clark: When FactGem was designing our product, we explored a number of potential directions. But what was attractive about Cypher – and I think is attractive to many in the community – is its accessibility. It has a very shallow learning curve. Even a layperson can read it and get the basic intent out of the language. It’s also very concise. You can be very expressive in a small number of words. I think those are all traits that the community wants to see continue in a unified graph query language. Beyond that, there is a desire to have more advanced features, as Alastair mentions. For example, there are things you can do in SQL that you can’t yet do in standard Cypher – like obtaining locks on nodes, which is very useful in advanced scenarios. Those are all things that will help graph databases gain adoption in the marketplace.

Clark: What is the next step in the process, moving forward from the vote that is currently under way?

Alastair: We put forward The GQL Manifesto, asking people in the property graph data community to vote on whether GQL for property graphs, side by side with SQL, would be a good idea. At this point, we’re running towards 3,000 votes (in just over a month) with 95 percent in favor.  The vote was an attempt to see what level of interest there is, and to point out that there is a group of people from different companies and academia working on these three languages.  In fact, they are all trying to take the languages in a similar direction.

Soon after the Manifesto went out, there was a discussion at the ISO SQL standards meeting, which took place in May. There has also been support for GQL from the LDBC (Linked Data Benchmark Council). They’ve had a working group for the last couple of years which produced G-CORE. A majority were very enthusiastic about the idea of taking G-CORE forward in the context of GQL. This was discussed again at a recent meeting of LDBC in Austin, Texas.

There is going to be a more extended discussion of the GQL initiative within the American National Standards body in July. We’re trying to move in two directions. One is very open and easily accessible – starting the groundwork on technicalities, like analyzing the different languages and how they could overlap, intersect or combine. The other is working on getting the starting framework to ultimately go into the international standards process. We want to combine the openness of something like Cypher or the Apache-licensed PGQL language from Oracle with the official stamp and character of SQL as an international standard. It’s early days, but I think there is a lot of interest, and we’re trying to bring people together to start working on what GQL is going to look like: what the scope is going to be, what the roadmap will be, and what kinds of GQL language artifacts are needed, like specifications, technology compatibility kits or reference implementations.

Clark: This is incredibly important for our customers that are looking to simplify their ETL and MDM processes. I think it will be a factor for any organization that wants to adopt graph technology. While graph technology adoption has increased considerably over the last several years, it is still a fairly new technology, and organizations always worry about lock-in, particularly with new technologies. GQL allows us to support multiple vendors with much less effort in a much shorter period of time, which gives consumers the ability to choose the graph database on the market that best fits their needs. Additionally, it will encourage vendors of supporting technologies, such as BI, to invest in the engineering required to support graph databases. Most of these companies have some ability to auto-generate SQL, but they can’t invest the engineering effort to do the same for graph until there is a standard language to target.

Clark: How do you see GQL relating to the semantic and Triples world of RDF and OWL?

Alastair: There is kind of a balance here. On the one hand, we have the property graph data model as a well-established separate model. There is a strong demand and a strong push, as Clark was just describing, to arrive at a single declarative language for property graph query, and we think that is achievable. The basis is there in terms of these pre-existing languages, two of which are out there in production use. So that’s a community that can come together to create one language. On the other hand, there are pre-existing standards. You referred to OWL. There’s the SPARQL query language. These are part of a repertoire of standards that were developed by W3C for the RDF triplestore graph model.

The property graph world is more immature, so we want to catch up in terms of standards. But that poses the question: how do we profitably co-exist? There are things you can do in the RDF world, in terms of schema and ontology, for example, that are much more advanced than what we have in the property graph world. There are reasons for using that kind of model, and plenty of data is stored in that model. We’re not trying to compete at a language level with the RDF world. We’re trying to provide something that can stand alongside what you can do in that world, much as we want property graph querying to stand alongside relational or tabular querying in SQL. In both of those cases, whether you take SQL or SPARQL, we want these languages to cleanly interrelate.

Clark: Through your dialogue with the community, are you seeing any patterns about the types of organizations that have been interested in GQL, either by industry or use case, anything like that?

Alastair: There’s interest both from users and vendors in respect to graph Cloud services, like Amazon Neptune and Azure Cosmos DB.  Those have come out initially with Gremlin support for property graphs.  Gremlin is an engine with a valid API and a particular style of programming, but it doesn’t do the kind of job that SQL does for relational databases. I think that there is a lot of interest in changing that situation and getting to the point where there is declarative querying for all property graph databases.  An interesting point about these Cloud services is that they are not built on preexisting relational technology.  They are built on other technologies or they are built in a more native way for the graph problem, much as Neo4j’s database was.  So there is interest in a language which is really oriented toward property graph query and isn’t based on how we adapt to or cover over existing SQL capabilities.

People are also looking at complex advanced graph problems and the utilization of graph in AI environments, for example. In some cases, people discussed this as being of interest – being able to really support a graph data model as the underpinning for artificial intelligence and machine learning work. Lastly, property graphs are a kind of superset data model. People are thinking about much more ubiquitous sets of graphs, data sets that can be managed independently and so forth. It’s like a “graphs everywhere” theme. There is a lot of interest in more sophisticated, concise, regular expression-based extensions to the pattern matching principle of Cypher and GQL, to make it easier for people to express complex explorations of the graph very concisely. I think this is a testament to the fact that people are looking at more and more complex problems where the graph model makes handling those problems easier and more tractable.
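To make the contrast between an imperative traversal API and declarative pattern matching concrete, here is a minimal Python sketch. It is not Gremlin, Cypher or GQL: the toy graph, the function names (`imperative_colleagues`, `match_path`) and the pattern format are all assumptions invented for illustration. The first function spells out every step of the traversal by hand; the second accepts a description of the shape being sought and leaves the "how" to a generic matcher, which is roughly the job a declarative query language does.

```python
# Toy property graph (hypothetical data): nodes carry a label and
# properties, edges are (source, type, target) triples.
nodes = {
    1: {"label": "Person", "name": "Ann"},
    2: {"label": "Person", "name": "Bob"},
    3: {"label": "Company", "name": "Acme"},
}
edges = [
    (1, "KNOWS", 2),
    (2, "WORKS_AT", 3),
    (1, "WORKS_AT", 3),
]


def imperative_colleagues(start_name):
    """Traversal style: the caller hand-codes each hop."""
    result = []
    for src, etype, dst in edges:
        if etype == "WORKS_AT" and nodes[src]["name"] == start_name:
            # Second hop: walk back from the company to other employees.
            for src2, etype2, dst2 in edges:
                if etype2 == "WORKS_AT" and dst2 == dst and src2 != src:
                    result.append(nodes[src2]["name"])
    return result


def node_matches(node_id, label, props):
    node = nodes[node_id]
    return node["label"] == label and all(
        node.get(key) == value for key, value in props.items()
    )


def match_path(pattern):
    """Declarative style: 'pattern' alternates node specs (label, props)
    and edge specs (type, direction), where direction is 'out' or 'in'.
    Returns every matching path as a list of node ids. A node may appear
    only once per path, a simplification chosen for this sketch."""
    first_label, first_props = pattern[0]
    paths = [[n] for n in nodes if node_matches(n, first_label, first_props)]
    for i in range(1, len(pattern), 2):
        etype, direction = pattern[i]
        label, props = pattern[i + 1]
        extended = []
        for path in paths:
            last = path[-1]
            for src, t, dst in edges:
                if t != etype:
                    continue
                if direction == "out" and src == last:
                    nxt = dst
                elif direction == "in" and dst == last:
                    nxt = src
                else:
                    continue
                if nxt not in path and node_matches(nxt, label, props):
                    extended.append(path + [nxt])
        paths = extended
    return paths


# Roughly the shape a Cypher user would write as:
#   MATCH (a:Person {name: 'Ann'})-[:WORKS_AT]->(c:Company)
#         <-[:WORKS_AT]-(b:Person) RETURN b.name
colleague_pattern = [
    ("Person", {"name": "Ann"}),
    ("WORKS_AT", "out"),
    ("Company", {}),
    ("WORKS_AT", "in"),
    ("Person", {}),
]
declarative_colleagues = [
    nodes[path[-1]]["name"] for path in match_path(colleague_pattern)
]
```

Both approaches find the same colleague here, but only the pattern can be changed (a different label, an extra hop) without rewriting the traversal logic, which is the property that lets tools generate queries automatically.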

Clark: GQL is good for the industry and there is plenty of precedent for it. You can go back in time to the standardization of ANSI C – the impact of which was vast – or to the open sourcing of Java by Sun Microsystems almost 15 years ago. That had a large impact on the community, as it opened up the language to be managed by the community. GQL is a way to bring what was a very new – and for a long time niche – technology into much wider adoption across the spectrum of IT. It allows companies to rely on a unified access point by way of the standardized language.

In chatting with Alastair, I realized this GQL initiative reminds me a bit of the decisions around the standardization of Java that occurred many years ago. I worked for BEA in the relatively early days of Java, and we saw some of the same kinds of problems being caused by a lack of standard specifications. BEA had a very well-regarded app server, but there was a lot of concern about lock-in because only some of the specifications were standardized. Opening all the specifications and releasing them to a community process to be governed and further developed was helpful to everyone who was trying to sell technology based upon those standards.

The world is just starting to see the potential of graph databases. A standard like GQL will be a key step toward wide adoption.

Clark Richey (@crichey) is the Chief Technology Officer at FactGem. He has over 20 years of experience designing and developing software, primarily for the defense and intelligence sectors. He has also taught in the master’s program at Loyola University and undergraduate program at UMBC. Clark has investigated non-traditional methods and technologies that use data more efficiently for over 10 years.