How to recreate a dimensional model in Neo4j — a quick example

Ljubica Lazarevic
3 min readAug 20, 2019

--

Introduction

We recently received a tweet on migrating dimensionally modelled data into Neo4j, so this would be a great opportunity to pull together an example of how to approach it.

We’re using this post which gives a simple example and data set for how we might approach this in Neo4j.

Model

The data model we’re going to use is the same as the star schema used in the example blog. This should not come to too much of a surprise due to the nature of modelling for graph databases.

Example data model — in this scenario it looks the same as the dimensional table provided in the original example — some fictitious data used!

Data Load

Neo4j is schema on write, so we don’t need to declare a model ahead of time. I’ve made the data available from here. Just copy and paste the CSVs into the import folder of your Neo4j database (instructions here). A confession beforehand, I’ve just randomly generated the ratings for the movies!

To load the data, first ensure you’ve enabled multi-line statements in Browser (go to Settings, and tick ‘ Enable multi statement query editor’), and then run the following code:

//Just setting these up in case we MERGE in the future, or do searches by id or title for speedier queries
CREATE INDEX ON :Movie(id);
CREATE INDEX ON :Movie(title);
//Create Movie and RevenueFact nodes, link together
//ignoring the percentage values for now
LOAD CSV WITH HEADERS FROM 'file:///MovieRevenue.csv' AS line
CREATE (m:Movie {id:line.Rank, studio:line.Studio,
title:line.Title, year:line.Year})
CREATE (r:RevenueFact {rank:line.Rank,
worldwide:toFloat(line.Worldwide),
domestic:toFloat(line.Domestic),
overseas:toFloat(line.Overseas)})
CREATE (m)-[:HAS_REVENUE_FACT]->(r);
//Create RatingFact nodes and lookup Movie node, link together
LOAD CSV WITH HEADERS FROM 'file:///MovieRatingGenerated.csv' AS line
MATCH (m:Movie {id:line.Rank})
CREATE (r:RatingFact {rating:toFloat(line.Rating)})
CREATE (m)-[:HAS_RATING_FACT]->(r);

If we look at CALL db.schema() we can see our model, generated by the import:

Querying the database

Now let’s ask some questions. Let’s start with the one provided in the example blog:

//Movies by worldwide revenue and rating
//sorted by greatest worldwide revenue
MATCH (revFact:RevenueFact)<--(m:Movie)-->(rateFact:RatingFact)
RETURN m.title AS Title,
revFact.worldwide AS Worldwide,
rateFact.rating AS Rating
ORDER BY Worldwide DESC;

How about some aggregation?

//Studios by worldwide revenue 
//sorted by greatest worldwide revenue
MATCH (revFact:RevenueFact)<--(m:Movie)
WITH m.studio AS Studio, sum(revFact.worldwide) AS totalRev
RETURN Studio,
totalRev as `Total Worldwide Revenue`
ORDER BY totalRev DESC;

Summary

In this very brief post we’ve shown how you might use your existing dimensionally modelled data into Neo4j, along with some example queries.

--

--

Ljubica Lazarevic
Ljubica Lazarevic

Written by Ljubica Lazarevic

Technologist — data geek — solver of problems

No responses yet