Reflections on using DGraph
Categories: programming
Tags: graph
Recently I have been playing around with Dgraph in an attempt to describe relationships between people and implement a flexible permission system. There was a lot of hype around this particular database, I assume due to the backing company. These are my pain points after a short time with it.
Overview of a Graph Database
A graph database organizes information according to graph theory, linking nodes with edges. Most graph databases treat a node as a container for key-value pairs, with edges carrying either scalar values or purely connective information. Edges may be directed or bidirectional to describe the relationship between nodes.
Dgraph takes a node-centric view of the graph. In Dgraph's model a node contains a number of edges, some holding scalar values while others hold a unidirectional connection to another node. The datastore offers no mechanism for hard logical separation between graphs. Dgraph uses multiple cooperating processes to run the system. In my time working with the database I have been unable to pin down the data reliability story, such as backup and restore tooling; it appears these might only be available in the enterprise edition.
The database claims to be schema free; however, like many NoSQL databases that started that way, it requires you to declare the queryable predicates ahead of time.
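Roughly, declaring those queryable predicates looks like the sketch below. The predicate names mirror the examples later in this post, the index choices are my own guesses, and the exact syntax varies a bit between releases ([uid] is the newer list form; older releases use plain uid):

graph: string @index(exact) .
member: string @index(term) .
attributes: string .
name: string @index(term) .
friend: [uid] .
some_relationship: [uid] .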
Querying is a bit strange
Dgraph uses a derivative of GraphQL for locating data. As far as I can tell you may only choose one property on a node to query by without using an extension for further filtering. Although you can choose any name for the result set, the func parameter must be set to your initial query. For example:
{
  query(func: eq(graph, "example")) {
    member
    attributes
    some_relationship {
      member
      attributes
    }
  }
}
This is rather exciting, since one can query for sub-attributes. This means you can pull back the desired data through however many hops you need.
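For what it's worth, such a query can be posted straight to the HTTP endpoint with curl. The port and Content-Type below assume a default local setup and a recent release; older releases expected application/graphql+- instead of application/dql:

curl -s -H "Content-Type: application/dql" localhost:8080/query -d '{
  query(func: eq(graph, "example")) {
    member
    attributes
  }
}'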
Selection operators use a GraphQL extension
Filtering on what might be core attributes feels a bit clunky to me; per the official documentation, the best method for sub-selection appears to be the @filter GraphQL extension. I was hoping this would have been either more Prolog-like (constraint based) or closer to SQL. It works, though.
{
  query(func: eq(graph, "selection")) @filter(has(friend)) {
    name
  }
}
There are additional features, such as subqueries and pagination, that I haven't quite gotten to yet. These will be exciting.
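Pagination at least looks straightforward: the root function takes first and offset arguments. A quick sketch reusing the placeholder predicates from earlier:

{
  query(func: eq(graph, "example"), first: 10, offset: 20) {
    member
    attributes
  }
}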
No mechanism to query for everything
This one stumped me for a bit. Surely this can't be right? As of now I have been unable to find a mechanism to reliably query for all live values within the data store. Their opinion is simply that you should know what you are looking for.
When developing there are many legitimate reasons to query for all available data, and plenty of diagnostic cases where seeing everything is perfectly valid, such as ensuring the proper edges have been established.
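The closest workaround I have seen is querying by predicate presence with has(), which still requires knowing at least one predicate name; member here is just the placeholder from the earlier examples:

{
  everything(func: has(member)) {
    uid
    member
    attributes
  }
}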
Deletions do not Delete
I have only been able to get deletions to work using their RDF format; unfortunately, I could not get the JSON version to work. The JSON version is translated into RDF according to the documentation, so one must be familiar with both.
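For reference, the RDF form that does the job looks roughly like this. The UID is a placeholder, and the subject-wildcard-wildcard pattern asks Dgraph to drop every outgoing edge of that node:

{
  delete {
    <0x2a> * * .
  }
}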
Even once you trudge past figuring out how to actually issue the command, the nodes stick around for a period of time. Flushing the indexes seems to remove them from query results. For a database claiming to be ACID compliant, that is a bit worrying.
DevXP is not friendly
To get a minimal viable example up and running you either run two daemons locally or orchestrate multiple Docker containers. Dgraph has made this easier by providing a docker-compose.yml file; however, to get the standard web UI they include, one must expose it on port 8080.
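For the curious, that Compose setup boils down to something like the sketch below. Image tags, command flags, and the exact home of the web UI have shifted between releases (newer ones ship the Ratel UI as its own container on port 8000, and older alphas also want an --lru_mb flag), so treat this as a starting point rather than gospel:

version: "3"
services:
  zero:
    image: dgraph/dgraph:latest
    command: dgraph zero --my=zero:5080
  alpha:
    image: dgraph/dgraph:latest
    command: dgraph alpha --my=alpha:7080 --zero=zero:5080
    ports:
      - "8080:8080"  # HTTP endpoint for queries and mutations
      - "9080:9080"  # gRPC endpoint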
Conclusion
None of these are deal breakers yet, although they do leave me a bit on the fence, especially since the community posts where the authors advocate Dgraph's position are rather opinionated. At this point I know I don't know enough to really render a judgement. From what I've seen, operational data isn't available yet. However, the approach taken seems to be that of Cassandra: they only want to work on giant data sets and are not concerned with the small-to-medium group. This is backed up by the minimum 1GB commitment to run the data nodes in the system.