Relational to Graph Databases

In reality, relational databases do not actually store the relationships between data elements; they store key values, and the relationships are reassembled at query time. In many cases this means JOINs, which can require significant computing power and ultimately increase query response times. The problem is exacerbated with recursive self-joins, such as walking a management hierarchy stored in a single table. Relational schemas also accommodate only minor changes easily, and in the ever-evolving tech and corporate landscape this has posed a problem. One alternative is a graph-based system.
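To make the recursive self-join concrete, here is a minimal sketch using Python's built-in sqlite3 module. The employees table, its columns, and the sample rows are hypothetical. The query walks a reporting chain with a recursive common table expression, and each level of the hierarchy requires another pass that joins the table against the rows found so far.

```python
import sqlite3

# Hypothetical schema: each row stores its manager's id as a foreign key back
# into the same table. The relationship itself is not stored as its own object.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        manager_id INTEGER REFERENCES employees(id)
    );
    INSERT INTO employees VALUES
        (1, 'Ada', NULL),
        (2, 'Grace', 1),
        (3, 'Linus', 2),
        (4, 'Barbara', 3);
""")

# Recursive CTE: every iteration joins employees against the previous result,
# so a deeper hierarchy means more passes of the self-join.
REPORTS_QUERY = """
    WITH RECURSIVE reports(id, name, depth) AS (
        SELECT id, name, 0 FROM employees WHERE id = ?
        UNION ALL
        SELECT e.id, e.name, r.depth + 1
        FROM employees AS e
        JOIN reports AS r ON e.manager_id = r.id
    )
    SELECT name, depth FROM reports ORDER BY depth;
"""


def all_reports(root_id):
    """Return (name, depth) for root_id and everyone who reports up to them."""
    return conn.execute(REPORTS_QUERY, (root_id,)).fetchall()


print(all_reports(1))
# → [('Ada', 0), ('Grace', 1), ('Linus', 2), ('Barbara', 3)]
```

The hierarchy here is a single chain only to keep the output readable; the same query works for branching org charts, but the cost still grows with each level it has to join through.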

In a graph-based system, the relationships and logical connections themselves are stored. The system is composed of nodes, edges, and properties. Nodes act like records (rows) in a relational model. Edges, or relationships, act like the connections that a standard SQL query makes with JOINs. They represent logical connections and can be created whenever the need arises. This is the most important part of the graph structure: rather than relying on key pairs (primary and foreign keys) stored in each table, the graph stores the relationships themselves. In a relational model, multiple JOINs might be needed just to gather enough data to establish a relationship.

Properties are key-value pairs that describe nodes and relationships. For example, given two nodes, an Employee and a Company, there would be a relationship between them with a type such as “belongs to” and a property such as “since: arbitrary date”. As a traditional relational database grows, queries tend to need more JOINs, whereas a graph database simply creates a new connection. This can substantially reduce the computation each query requires and improve overall efficiency.
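The Employee and Company example can be sketched as a toy in-memory property graph in Python (3.9 or later). This is only an illustration of the idea, not how Neo4j or any other product is implemented; the classes Node and Relationship and the helpers relate and neighbors are hypothetical names. Each node keeps a list of its outgoing relationships, so following a connection is a direct lookup rather than a JOIN on key columns.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(eq=False)  # identity semantics: two nodes are never "equal" by value
class Node:
    """A record, comparable to a row: a label plus key-value properties."""
    label: str
    props: dict[str, Any]
    # Outgoing relationships are stored on the node itself.
    out: list["Relationship"] = field(default_factory=list, repr=False)


@dataclass(eq=False)
class Relationship:
    """A stored connection with a type and its own properties."""
    rel_type: str
    start: Node
    end: Node
    props: dict[str, Any]


def relate(start: Node, rel_type: str, end: Node, **props: Any) -> Relationship:
    """Create a new relationship; no schema change or join table is needed."""
    rel = Relationship(rel_type, start, end, props)
    start.out.append(rel)
    return rel


def neighbors(node: Node, rel_type: str) -> list[tuple[Node, dict[str, Any]]]:
    """Follow the stored outgoing relationships of one type from a node."""
    return [(r.end, r.props) for r in node.out if r.rel_type == rel_type]


acme = Node("Company", {"name": "Acme"})
alice = Node("Employee", {"name": "Alice"})
relate(alice, "BELONGS_TO", acme, since="2015-06-01")  # hypothetical date

for company, props in neighbors(alice, "BELONGS_TO"):
    print(alice.props["name"], "belongs to", company.props["name"],
          "since", props["since"])
# → Alice belongs to Acme since 2015-06-01
```

Adding a new kind of connection is just another relate call with a new type string. In a relational schema, the same change might mean adding a foreign key column or a join table and then writing the JOINs to use it.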

There are some major disadvantages to graph databases. First of all, they are not widely used; the most prominent examples are social networks such as Facebook and Twitter, whose backends are composed of nodes (profiles) and the relationships between them. SQL is the industry standard for relational databases, but no equivalent standard exists for graph databases. Many graph databases are proprietary, and there is no universally adopted query language; the most popular are Gremlin, SPARQL, and Cypher. I have found that the easiest way to get started with graph databases is Neo4j (neo4j.com), an open source GDBMS (graph database management system). It also supports labels and an unlimited number of properties, which allow for even faster queries. The company has recently announced landmark partnerships with technology giants such as IBM, Microsoft, and Amazon.

Many Big Data startups are implementing graph databases, and I feel that they will become the standard very soon. Large companies hesitate because of the cost of migrating and rebuilding their databases, as well as the imprint factor for developers. Many developers are familiar with SQL and do not want to learn a new language, even one that is markedly similar. From speaking with many business students who have gone on co-op, particularly supply chain majors, I have learned that one of the first things they learn is basic SQL. The effort needed to retrain and adjust is simply too much for many companies and, honestly, may not be worth it in many cases. These developers have relied, and will continue to rely, on simply upgrading hardware to improve query processing.

March 29th, 2018