Spotting a Graph-Shaped Problem

How to tell if your data would benefit from a different database

Ljubica Lazarevic
Geek Culture


Alina Grubnyak on Unsplash

Introduction

Heard of graph databases but not exactly sure what they are? Or maybe you have, but you’re not quite sure where they apply. Either way, you’re in the right place: in this post we’re going to look at both, and by the end of it, hopefully you’ll have those creative graph juices flowing! You can also check out this webinar afterwards.

So what is a graph?

Originating from a branch of mathematics, a graph is a set of discrete objects, each of which may have a set of relationships with other objects. Graph theory has many important applications, such as route planning, finding dependencies and managing topologies. To read more about graphs and graph theory, check out the Wikipedia page.

Graphs are really powerful — nearly anything can be represented with a graph.

Connections in the Internet

For example, the Internet can be represented as a graph in many different ways. We may think about how we are connected to each other via the Internet: we email each other or send instant messages. Perhaps we’re having a Voice over IP (VoIP) call, or using Google Meet or Zoom to chat to each other. We can use a graph to represent all of the different communication mechanisms the Internet provides. Alternatively, we can think about all of the different devices that use and make up the Internet: the devices we connect with, such as phones, tablets and laptops, and other systems such as smart speakers, light bulbs, fridges, cars and other Internet of Things (IoT) devices. These are joined together through Wi-Fi and Ethernet cables, and managed by routers, switches, load balancers and firewalls, all the way through to the servers and down to the physical ‘tin and wires’ that give us the Internet as we know it today.

Water molecule.

Another example of how anything can be represented by a graph is a water molecule. It’s not enough to know that a water molecule consists of one oxygen atom and two hydrogen atoms; it’s specifically how they’re bonded to each other (each hydrogen attached to the oxygen) that gives us a water molecule.
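To make this concrete, here is a minimal Python sketch (our own illustration, not taken from any graph library) that stores a molecule as nodes (atoms) and edges (bonds). It compares water with a hypothetical H-H-O chain: the same atoms wired differently, which is not a real molecule. The atom IDs and the helper names `molecule` and `degree` are made up for illustration.

```python
from collections import Counter


def molecule(atoms, bonds):
    """A molecule as a graph: atoms are nodes, bonds are undirected edges."""
    return {"atoms": atoms, "bonds": {frozenset(b) for b in bonds}}


def degree(mol, atom_id):
    """Number of bonds attached to one atom."""
    return sum(1 for bond in mol["bonds"] if atom_id in bond)


# Water: each hydrogen is bonded to the oxygen (H-O-H).
water = molecule({"O1": "O", "H1": "H", "H2": "H"}, [("O1", "H1"), ("O1", "H2")])
# Hypothetical H-H-O chain: identical atoms, different connections.
chain = molecule({"O1": "O", "H1": "H", "H2": "H"}, [("H1", "H2"), ("H2", "O1")])

same_atoms = Counter(water["atoms"].values()) == Counter(chain["atoms"].values())
same_bonds = water["bonds"] == chain["bonds"]
print(same_atoms, same_bonds)  # prints: True False
```

Counting atoms alone cannot tell the two apart; only the relationships (bonds) do, which is exactly the information a graph keeps.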

What is a graph database?

A graph database continues this philosophy of capturing not only information about distinct entities, but also how these entities are connected to each other. It is those specific relationships that define the structure and context of the data as a whole.

Graph databases straddle the worlds of Relational and NoSQL databases — separating the structure of the data from the data itself. Relationships between data are considered as important as the data itself, and as such, the storage of relationships is given equal weighting.

Why are graph databases so special?

Rather than computing where the joins may lie between data entities, a graph database creates a physical manifestation of that connection as soon as it is known — think ‘joins on write’. This allows the graph to be ‘traversed’ by chasing pointers from node to node via these relationships, rather than trying to scan foreign key indexes to create joins at runtime (think ‘joins on read’), typical of traditional Relational Database Management Systems (RDBMS).

For queries that hop across many relationships, this can deliver substantially better query performance, making graph databases a natural storage system for querying, interpreting and analyzing “graphy”, connected data.
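To illustrate the difference between ‘joins on read’ and ‘joins on write’, here is a toy Python sketch. It is a deliberately simplified model, not how Neo4j or any RDBMS actually lays out storage: the ‘relational’ side searches a list of relationship rows at query time (a real RDBMS would typically use an index lookup per hop rather than a linear scan), while the ‘graph’ side stores direct references between node objects when a relationship is written, so a traversal simply follows them. The names and helper functions are hypothetical.

```python
from dataclasses import dataclass, field

# Relationships stored as rows, much like a foreign-key table.
FRIEND_ROWS = [("Ann", "Bob"), ("Bob", "Cat"), ("Cat", "Dan")]


def neighbours_on_read(rows, name):
    """'Joins on read': search the relationship rows at query time."""
    return [dst for src, dst in rows if src == name]


def walk_on_read(rows, start, hops):
    """Follow `hops` relationships, repeating the search at every hop."""
    frontier = [start]
    for _ in range(hops):
        frontier = [n for name in frontier for n in neighbours_on_read(rows, name)]
    return frontier


@dataclass
class Node:
    name: str
    out: list = field(default_factory=list)  # direct references to neighbour nodes


def build_graph_on_write(rows):
    """'Joins on write': link the node objects once, when relationships are stored."""
    nodes = {}
    for src, dst in rows:
        a = nodes.setdefault(src, Node(src))
        b = nodes.setdefault(dst, Node(dst))
        a.out.append(b)
    return nodes


def walk_on_write(node, hops):
    """Follow `hops` relationships by chasing the stored references."""
    frontier = [node]
    for _ in range(hops):
        frontier = [nxt for n in frontier for nxt in n.out]
    return [n.name for n in frontier]
```

Both `walk_on_read(FRIEND_ROWS, "Ann", 3)` and `walk_on_write(build_graph_on_write(FRIEND_ROWS)["Ann"], 3)` arrive at Dan; the difference is where the work happens. The on-read version repeats a search for every hop, whereas the on-write version paid that cost once, when each relationship was stored.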

How to spot when a graph database is a good fit — the graph-shaped problem

Whilst graph databases can be used in place of an RDBMS for the vast majority of use cases, they have a clear sweet spot: solving ‘graph-shaped’ problems. Let’s look at four scenarios to help us identify them:

  • Does our problem involve understanding relationships between entities?
  • Does the problem involve a lot of self-referencing to the same type of entity?
  • Does the problem explore relationships of varying or unknown depth?
  • Does our problem involve discovering lots of different routes or paths?

Scenario 1: Does our problem involve understanding relationships between entities?

Let’s look at an example where we want to understand relationships between entities. Figure 1 below describes the buying behaviors of customers. Although the customers are buying two different t-shirts, both belong to the same category: t-shirt. If we look at the other categories of products being bought, we can start to form a view of behaviors based on these relationships. For example, what if Lisa also buys a pair of shorts, a baseball cap and sunglasses? And what if these four product categories are bought together over and over again by different people? We could use this information to make product recommendations to Jane, simply by understanding these relationships.

Figure 1 — understanding relationships between entities

We commonly see this scenario in the following use cases:

  • Recommendations
  • Next best action
  • Fraud detection
  • Identity resolution
  • Data lineage
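A minimal Python sketch of the Figure 1 idea, assuming a simple in-memory model: customers connect to products, products connect to categories, and we recommend the categories bought by customers who share at least one category with Jane, ranked by how many such customers bought them. Lisa and Jane come from the example; Sam, the product names and the scoring rule are hypothetical, and this is only one simple way to score co-purchases.

```python
from collections import Counter

# Hypothetical purchase data: customer -> products bought.
PURCHASES = {
    "Lisa": ["Blue t-shirt", "Denim shorts", "Baseball cap", "Aviator sunglasses"],
    "Jane": ["Striped t-shirt"],
    "Sam": ["Plain t-shirt", "Cargo shorts"],
}
# Each product belongs to exactly one category.
CATEGORY = {
    "Blue t-shirt": "T-shirt",
    "Striped t-shirt": "T-shirt",
    "Plain t-shirt": "T-shirt",
    "Denim shorts": "Shorts",
    "Cargo shorts": "Shorts",
    "Baseball cap": "Cap",
    "Aviator sunglasses": "Sunglasses",
}


def categories_of(customer):
    """Follow customer -> product -> category relationships."""
    return {CATEGORY[p] for p in PURCHASES[customer]}


def recommend_categories(customer):
    """Categories bought by similar customers that this customer hasn't bought yet."""
    mine = categories_of(customer)
    scores = Counter()
    for other in PURCHASES:
        if other == customer:
            continue
        theirs = categories_of(other)
        if mine & theirs:  # the other customer shares at least one category
            for cat in theirs - mine:
                scores[cat] += 1
    return [cat for cat, _ in scores.most_common()]
```

For Jane, Shorts ranks first because both Lisa and Sam bought it alongside a t-shirt, followed by Cap and Sunglasses.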

Scenario 2: Does the problem involve a lot of self-referencing to the same type of entity?

With graph databases, we commonly emphasize the benefit for queries that would typically require many joins. Self-referencing to the same entity type still involves joins, even if all of the data is in the same table. In our next example in Figure 2, we’ve been asked: who are Jane’s direct and indirect reports? This is a common question when looking at an organizational hierarchy. In this data set, all of the nodes have an ‘Employee’ type (or label, when explicitly talking about graphs!). What is potentially computationally expensive in a traditional database is quick and easy in a graph database, because of the different way the data is stored, with the relationships between distinct data elements treated as being as important as the elements themselves.

Figure 2 — self-referencing to the same entity type

We commonly see this scenario in the following use cases:

  • Organizational hierarchies
  • Social influencers
  • Friends of friends
  • Churn detection
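To see that a self-referencing hierarchy still needs joins, here is a Python sketch that answers ‘who are Jane’s direct and indirect reports?’ two ways: with a recursive SQL query over a single `employee` table (using the standard-library `sqlite3` module, where each level of recursion joins the table to itself), and by following manager-to-report relationships outwards, graph-style. Apart from Jane, the employees are hypothetical.

```python
import sqlite3
from collections import deque

# Hypothetical org chart as (employee, manager) pairs; None means no manager.
REPORTS_TO = [
    ("Jane", None),
    ("Ravi", "Jane"),
    ("Mei", "Jane"),
    ("Tom", "Ravi"),
    ("Ana", "Tom"),
    ("Omar", None),  # someone outside Jane's hierarchy
]


def reports_sql(manager):
    """Relational approach: one self-referencing table, one join per level."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE employee (name TEXT PRIMARY KEY, manager TEXT)")
    con.executemany("INSERT INTO employee VALUES (?, ?)", REPORTS_TO)
    rows = con.execute(
        """
        WITH RECURSIVE chain(name) AS (
            SELECT name FROM employee WHERE manager = ?
            UNION
            SELECT e.name FROM employee e JOIN chain c ON e.manager = c.name
        )
        SELECT name FROM chain
        """,
        (manager,),
    ).fetchall()
    con.close()
    return {name for (name,) in rows}


def reports_graph(manager):
    """Graph-style approach: breadth-first walk along MANAGES relationships."""
    manages = {}
    for emp, mgr in REPORTS_TO:
        if mgr is not None:
            manages.setdefault(mgr, []).append(emp)
    found, queue = set(), deque([manager])
    while queue:
        for report in manages.get(queue.popleft(), []):
            if report not in found:
                found.add(report)
                queue.append(report)
    return found
```

Both return Ravi, Mei, Tom and Ana for Jane. The SQL version has to express the unknown number of levels as a recursive self-join; the graph version simply keeps following relationships until there are none left.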

Scenario 3: Does the problem explore relationships of varying or unknown depth?

Another very “graphy” use case is supply chains, where there are many interdependencies, of unknown depth, between suppliers, producers, traders and so forth. In Figure 3, Pencils R Us produces pencils for wholesale stationery stores, and buys its raw material, wood, from We Sell Wood. We Love Stationary sells an assortment of stationery items, including selling pens to We Sell Wood. We Sell Wood, a family business, has decided to call it a day: the current owners are retiring and liquidating the business. We can quickly see a number of dependencies. What happens to Pencils R Us, who have now lost a source of their raw materials? Do they now experience a drop in production, and in product to sell? We Love Stationary potentially suffers a double impact: it may need to find another source of pencils to stock, as well as taking a (probably minimal!) loss of sales from no longer selling pens to We Sell Wood.

Figure 3 — exploring relationships of varying or unknown depths

We commonly see this scenario in the following use cases:

  • Supply chain visibility
  • Bill of materials
  • Network management
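A sketch of exploring relationships of unknown depth, assuming the Figure 3 businesses are stored as supplier-to-customer relationships. A breadth-first search from the failing business finds everyone who depends on it for supply, however many hops away, and records the hop count. One assumption beyond the text: we model We Love Stationary as stocking the pencils made by Pencils R Us, which is what its need for ‘another source of pencils’ suggests. Note the cycle (We Love Stationary sells pens back to We Sell Wood), which is why the search tracks visited nodes. The function names are hypothetical.

```python
from collections import deque

# (supplier, customer, product) relationships from the Figure 3 story.
# Assumption: We Love Stationary stocks the pencils made by Pencils R Us.
SUPPLIES = [
    ("We Sell Wood", "Pencils R Us", "wood"),
    ("Pencils R Us", "We Love Stationary", "pencils"),
    ("We Love Stationary", "We Sell Wood", "pens"),
]


def downstream_impact(failed):
    """Businesses that rely on `failed` for supply, directly or indirectly,
    mapped to the number of supply hops. No depth limit is fixed in advance."""
    customers = {}
    for supplier, customer, _ in SUPPLIES:
        customers.setdefault(supplier, []).append(customer)
    depth = {failed: 0}
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for nxt in customers.get(current, []):
            if nxt not in depth:  # visited check stops us looping round the cycle
                depth[nxt] = depth[current] + 1
                queue.append(nxt)
    del depth[failed]
    return depth


def lost_sales(failed):
    """Suppliers that lose `failed` as a customer."""
    return [supplier for supplier, customer, _ in SUPPLIES if customer == failed]
```

Calling `downstream_impact("We Sell Wood")` returns Pencils R Us at depth 1 and We Love Stationary at depth 2, and `lost_sales("We Sell Wood")` also lists We Love Stationary: the double impact described above.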

Scenario 4: Does our problem involve discovering lots of different routes or paths?

The last scenario we’re going to look at, in Figure 4, is reminiscent of a journey I made in 2019. If you’re based in London and looking to head to Edinburgh by train, you’ve broadly got two routes to choose from: the east coast, via York and Newcastle, or the west coast, via Rugby, Crewe etc. Which route you take will depend on a number of factors. Which route covers the shortest distance between London and Edinburgh? Which has the cheapest train ticket, or any availability at all? Or has a train broken down on a major line, meaning we need to pick a different route altogether? Discovering routes and paths is arguably one of the most quintessential applications of graphs, and for good reason!

Figure 4 — discovering different routes or paths

We commonly see this scenario in the following use cases:

  • Logistics and routing
  • Infrastructure management
  • Dependency tracing
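A minimal Python sketch of route discovery, assuming a heavily simplified network: only the stations named above appear (the stops hidden behind ‘etc.’ are collapsed into single links), and the link costs are made-up illustrative numbers, not real distances or fares. A depth-first search enumerates every route that never revisits a station, and a `closed` set of links models a broken-down train. The function names are hypothetical.

```python
# Simplified, undirected rail links with ILLUSTRATIVE costs (not real data).
TRACKS = {
    ("London", "York"): 3,
    ("York", "Newcastle"): 1,
    ("Newcastle", "Edinburgh"): 2,
    ("London", "Rugby"): 1,
    ("Rugby", "Crewe"): 1,
    ("Crewe", "Edinburgh"): 5,
}


def neighbours(station, closed):
    """Stations one link away, skipping links listed in `closed` (either direction)."""
    for (a, b), cost in TRACKS.items():
        if (a, b) in closed or (b, a) in closed:
            continue
        if a == station:
            yield b, cost
        elif b == station:
            yield a, cost


def all_routes(start, end, closed=frozenset()):
    """Depth-first search for every simple path, returned cheapest first."""
    routes = []

    def dfs(station, path, cost):
        if station == end:
            routes.append((cost, path))
            return
        for nxt, step in neighbours(station, closed):
            if nxt not in path:  # never visit the same station twice
                dfs(nxt, path + [nxt], cost + step)

    dfs(start, [start], 0)
    return sorted(routes)
```

With these made-up costs, `all_routes("London", "Edinburgh")` finds both the east and west coast routes with the east coast cheapest, while passing `closed={("York", "Newcastle")}` leaves only the west coast route.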

What next?

Hopefully this article has sparked some thinking around why graph databases are special, and how you might go about getting the best out of them. If you’d like to have a go at some of the scenarios discussed above with pre-canned data, why not try out the following use cases available in a Neo4j sandbox, with examples such as:

  • Movies (includes recommendations example)
  • Network and IT management
  • Fraud detection
  • Open Street Map
  • Contact Tracing

If you’d like to learn more about graphs in general, why not investigate the Neo4j Graph Academy which offers self-paced training for all things graph.

And if you’ve got some questions about whether your problem is graphy or not, come join the Neo4j community forum, or the Discord server.

Happy graphing!
