Neo4J’s Cypher is a powerful query language for graph databases. The language shares a heritage with traditional database tools like Structured Query Language (SQL), but it occupies a distinct place within that heritage.
To differentiate the two languages, one might liken them to two related yet mostly no longer mutually intelligible natural languages, such as Dutch and English.
Whereas SQL is the classic, table-oriented data storage, retrieval, and manipulation tool, Cypher is not quite based on that same notion of a table.
Cypher, and the databases it allows you to query (like AuraDB), are instead concerned with the components of graphs as we know them in the traditional computer science sense: nodes and the relationships (edges) between them.
This crash course will attempt to cover a similar curriculum for Cypher Query Language (CQL?) as I covered in the past for Kusto Query Language (KQL). However, this one will diverge somewhat from the KQL course since the nature of this language is a bit different.
Also, the KQL course contains a mostly flavor-agnostic and in-depth introduction to regex; we will discuss regex in this course, but I’m not reproducing that section specifically because it would be almost entirely duplicate content. Check out the KQL course for more on regex, and you can even follow along with the KQL course’s regex introduction in Cypher using a setup for regex testing we develop later on.
In this course, I’m using a free AuraDB instance with the Northwind dataset — in case you want to follow along.
There are a few different ways of running Neo4J Cypher queries.
The most normal among you will probably just use the web interface the Neo4J folks give you for running queries against your databases. Of all the options, this approach is perhaps the most well-adjusted.
Another option is to use the Cypher Shell tool, which allows you to interact with your Neo4J databases from text-based interfaces like your terminal. If you want to do this, I’m not going to stop you.
But these options, in my view, are not accessible to everyone. Not everyone has regular access to computing devices which are amenable to running those tools.
In that vein, I’m providing here a small Node.js/JavaScript framework for running Cypher queries. I happen to run this on Replit since they have an easy-to-use, mobile IDE available for free, and it works great with Node.js.
Another benefit is this allows you to dip your toe into the world of writing a server which can communicate with a Neo4J database 🤓.
The first thing your setup needs is a few environment variables containing the connection details for your Neo4J DataBase Management System (DBMS). These secrets should be provided to you in a simple text file when you spin up your instance. On Replit, you add these in the Secrets tab. These are the names of the secrets you need to add:
NEO4J_URI
NEO4J_USERNAME
NEO4J_PASSWORD
Next from NPM you need to install the official neo4j-driver package. On Replit you do this in the Packages tab of your Node.js REPL.
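If you're working from a plain terminal instead of Replit, the usual install command should do it:

npm install neo4j-driver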
We’re going to make for ourselves a simple config file called query.config.json in the root directory:
{
  "outputMode": "csv",
  "permissibleOutputModeValues": [
    "console",
    "json",
    "csv"
  ],
  "driverOptions": {
    "maxConnectionLifetime": 3000000,
    "disableLosslessIntegers": true
  }
}
The outputMode value, when read by the script, will determine the method used to output the query’s result set. permissibleOutputModeValues is just there to remind you what options there are. And finally, the driverOptions are optional but just a quality-of-life preference for me; if you need high precision integer support then leave disableLosslessIntegers as false.
Now we’re ready to write our script. In index.js:
const neo4j = require('neo4j-driver')
const fs = require('fs/promises')

const neo4jURI = process.env['NEO4J_URI']
const neo4jUsername = process.env['NEO4J_USERNAME']
const neo4jPassword = process.env['NEO4J_PASSWORD']

;(async function () {
  let driver, session, before
  try {
    // Read the Cypher query and the config file.
    const query = await fs.readFile('query.cypher', 'utf8')
    const { outputMode, driverOptions } = JSON.parse(
      await fs.readFile('query.config.json', 'utf8')
    )

    // Connect to the DBMS and open a session.
    driver = neo4j.driver(
      neo4jURI,
      neo4j.auth.basic(neo4jUsername, neo4jPassword),
      driverOptions
    )
    session = driver.session()
    console.log(`
n----------------n
| Session Opened |
u----------------u
`)

    // Run the query and time it.
    before = performance.now()
    const result = await session.run(query)
    const after = performance.now()
    const durationMS = after - before
    console.log(`Query Duration (milliseconds): ${durationMS}`)

    if (typeof outputMode != 'undefined' && outputMode === 'json') {
      // JSON output: serialize every record to a plain object.
      await fs.writeFile(
        'results.json',
        JSON.stringify(result.records.map(record => record.toObject())),
        { encoding: 'utf8' }
      )
    } else if (typeof outputMode != 'undefined' && outputMode === 'csv') {
      // CSV output: escape each field by doubling any double quotes,
      // wrap it in quotes, then join fields with commas and records with newlines.
      await fs.writeFile(
        'results.csv',
        [
          !result.records.length
            ? ''
            : result.records[0].keys
                .sort()
                .map(k => `"${k.replace(/"/g, '""')}"`)
                .join(','),
          '\n',
          result.records
            .map(record =>
              record.keys
                .sort()
                .map(key => `"${String(record.get(key)).replace(/"/g, '""')}"`)
                .join(',')
            )
            .join('\n')
        ].join(''),
        { encoding: 'utf8' }
      )
    } else {
      // Default output: pretty-print each record to the console.
      result.records.forEach(record => {
        console.log(`
n---------n
| Record: |
u---------u`)
        record.keys.forEach(key => {
          console.log(`\nKey: ${key} | Value: ${JSON.stringify(record.get(key))}`)
        })
      })
    }
  } catch (e) {
    const after = performance.now()
    const durationMS = after - before
    console.log(`Duration (milliseconds): ${durationMS}`)
    console.log(`
n---------------n
| Caught Error: |
u---------------u
`)
    console.log(e)
  } finally {
    // Always clean up the session and driver, even on failure.
    if (session) await session.close()
    console.log(`
n----------------n
| Session Closed |
u----------------u
`)
    if (driver) await driver.close()
  }
})()
Again, if you don’t care, move along to the next section and query the DB however you want — no judgement. But, if you’re curious about the mobile-compatible and JavaScript-infused detour we’ve taken, keep reading.
First we require the neo4j-driver and fs/promises, the latter for file reading and writing.
We import our environment variables we set earlier for connecting to our Neo4J DB.
And then we are off to the races. Our business of querying the database and handling the output is taken care of within the async IIFE (Immediately Invoked Function Expression). For convenience, I’d just like to use async/await syntax for JavaScript promises. So we define an asynchronous function and immediately invoke that function definition. Inside that function’s scope we can await promises to our heart’s content.
Naturally, the behaviors we implement inside of the async function are encapsulated within a try/catch/finally setup. try is for the business logic of querying the database and handling its responses. When those behaviors fail, we catch those errors and log them clearly. And finally, we always make a good-faith effort to clean up the resources we created for querying.
We get started by reading the query file, where our Cypher queries will be stored, as well as the query config JSON file. Using those contents, we instantiate the driver, make a session with it, and fire off a query.
With the query results, we take one of three avenues (which is determined by our config) for rendering the output.
My personal favorite is the console output, and you can see I enjoyed some simple ASCII rendering.
JSON is easily supported, since JavaScript has in-built support for serializing objects to JSON.
The CSV rendering is slightly trickier since JavaScript doesn’t have native support for CSV. We tackle this by first CSV-escaping each field, where each doublequote " (if any) is replaced by two doublequotes ""; then, the resulting field value is surrounded in double quotes. The double-quoted fields are joined by commas to form records. And finally we join that array of records with newlines.
Now you’re ready to pop a query in the query file and tap the run button!
The first thing you may be curious about is which databases are available to you with the DBMS connection info you provided.
To see those databases, run this Cypher command:
SHOW DATABASES;
You’ll notice one or more databases you’re interested in, plus a so-called system database which controls all the other databases under its care.
Great. How do you know what your actual capabilities are with the connection info you’ve used?
To understand this properly, you’ll need a way to enumerate these details. If this sounds boring to you, I admonish you to not skip this section. Many of the secrets of how Neo4J and Cypher work will be unlocked when you solve this problem.
Cypher is willing to deliver to you lists of functions and procedures you can utilize in your queries. It’s a kind of dictionary pertaining to the Cypher language, and you can use Cypher itself to query this dictionary!
First, what is the difference between procedures and functions in Cypher? The general theme is that functions are more broadly usable, as you can use them within any expression. Procedures, on the other hand, are more limited in terms of the types of Cypher clauses within which they can be invoked, mainly CALL clauses.
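To make the distinction concrete, here is a quick sketch. A procedure (such as db.labels, which we will meet again later) has to be invoked via CALL:

CALL db.labels()
YIELD label
RETURN label
LIMIT 5;

Whereas a function, like the built-in toUpper, can sit inside any expression:

RETURN toUpper("cypher") AS Shouted;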
Let’s see an example of a query you can use to list invokable procedures:
SHOW PROCEDURES EXECUTABLE BY CURRENT USER
YIELD *
SKIP 0 LIMIT 10;
The first line is self-explanatory. The second line means, for each record representing a procedure, YIELD all fields in that record. You could provide an allowlist of fields in place of the asterisk. Finally, the third line is what allows you to page through the results. Since you probably don’t know how many procedures are available or want to read them all, it might be better to grab, say, ten at a time to read about. Set the LIMIT to whatever you want, and when you’re ready to read the next “page” of results, increment the SKIP value by the quantity you chose for your LIMIT.
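For instance, keeping the LIMIT at ten, the second page of procedures would be fetched like this:

SHOW PROCEDURES EXECUTABLE BY CURRENT USER
YIELD *
SKIP 10 LIMIT 10;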
The above approach is great for acquainting yourself with the language. But let’s say you’re looking for a particular kind of procedure. Now the paging strategy alone is not likely to satisfy you, as it will be a linear search through a potentially large quantity of result pages.
Actually, you can utilize a WHERE clause, like in many query languages, to narrow down the returned results. It only requires a few modifications to the query:
SHOW PROCEDURES EXECUTABLE BY CURRENT USER
YIELD *
WHERE name CONTAINS "graph" OR description CONTAINS "graph"
RETURN *
SKIP 0 LIMIT 10;
We are searching for procedures where the name or description fields contain the text graph. After filtering and before the paging at the end of the query, we have to RETURN * for the query to work. This is similar to YIELD, but we will learn more about the details of this syntax later.
Let’s modify the above to search for functions instead. And this time, instead of a simple CONTAINS, we’ll try a regex filter to search for anything that fits one of a number of functionalities we might be interested in:
SHOW FUNCTIONS EXECUTABLE BY CURRENT USER
YIELD *
WHERE name =~ "(?i).*(?:regex|rex|ip|coalesce|json|xml|jpath|xpath|parse|bin|float|int|decimal|double|rand|extract).*"
RETURN COUNT(*);
Here we’re just fishing through the function dictionary for any hits based on popular terms from other languages we might already know. And we don’t come up dry!
Instead of returning the fields directly, we returned the result of our first aggregation function, count. When I ran this, it told me there are 52 functions that matched! So let’s page through these 52 records and see all the functions that matched:
SHOW FUNCTIONS EXECUTABLE BY CURRENT USER
YIELD *
WHERE name =~ "(?i).*(?:regex|rex|ip|coalesce|json|xml|jpath|xpath|parse|bin|float|int|decimal|double|rand|extract).*"
RETURN name, description
SKIP 0 LIMIT 10;
What a cornucopia of tools we unearthed with a bit of regex. If you don’t understand the regex pattern — don’t worry. What we did with regex can be done via the CONTAINS syntax shown earlier by chaining multiple CONTAINS predicates together with OR, as shown with procedures.
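If you would rather skip regex entirely, a rough CONTAINS-only equivalent covering a few of those search terms might look like this (note that CONTAINS is case-sensitive, unlike the (?i)-flagged regex):

SHOW FUNCTIONS EXECUTABLE BY CURRENT USER
YIELD *
WHERE name CONTAINS "json" OR name CONTAINS "xml" OR name CONTAINS "parse"
RETURN name, description
SKIP 0 LIMIT 10;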
We will cover regex in more detail later. However, for those of you who are curious about the regex above (Cypher uses the Java flavor of regex, by the way), here’s what we did in that pattern:
- (?i) — a flag indicating case-insensitive matching
- .* — 0 or more of any character; you’ll see this at the end of the pattern too because, in this kind of regex, patterns are implicitly fully anchored, which means the pattern you define has to somehow describe the string in its entirety
- (?: — the start of a non-capturing group
- | — meaning alternation, or a series of possible options
- ) — the end of the non-capturing group
- .* — 0 or more of any character

Overall, we want the pattern to match one member from a set of search terms, where the matched search term can have an arbitrary, optional prefix and an arbitrary, optional suffix. You can reuse this pattern to search with your own search terms, as long as you’re mindful of regex syntax constraints.
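If you want to see that full anchoring in action, here’s a tiny sketch you can run as-is:

RETURN "graphql" =~ "graph" AS WithoutWildcards,
       "graphql" =~ "(?i).*GRAPH.*" AS WithWildcards

The first comparison returns false because the pattern fails to describe the entire string; the second returns true.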
At last, you’d like to learn how to start exploring the contents of your graph database.
In a traditional, table-oriented database, this would probably start
with pulling a list of tables from which you’re able to
SELECT
.
As mentioned earlier, Neo4J is a graph-based database and so is not based on the same notion of tables. Instead of tables of records, we store nodes and the relationships between them.
So how do we categorize these entities if not with tables? We do so in this way:
- each node is assigned 0 or more labels (potentially multiple)
- each relationship is assigned exactly one type
In fact, there are procedures you can CALL to enumerate the labels and types accessible to your queries within the database:
CALL db.labels()
YIELD label
RETURN *
ORDER BY label
SKIP 0 LIMIT 100;
This enumerates labels you can use to filter nodes in queries of your database graph.
The same kind of enumeration, but of types you can use to filter relationships in queries of your database graph:
CALL db.relationshipTypes()
YIELD relationshipType
RETURN relationshipType
ORDER BY relationshipType
SKIP 0 LIMIT 100;
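While we’re enumerating, the same family includes a db.propertyKeys() procedure, which lists the property keys in use anywhere in the graph:

CALL db.propertyKeys()
YIELD propertyKey
RETURN propertyKey
ORDER BY propertyKey
SKIP 0 LIMIT 100;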
Databases always use some form of data indexing in order to support fast retrieval as well as various types of constraints, like uniqueness of stored values. Even though you don’t explicitly choose which indexes your query uses in Cypher, it can be very important to understand what indexes are in place before you start querying certain types of databases.
To page through the different indexes your Neo4J database has configured:
SHOW INDEXES
YIELD *
RETURN *
SKIP 0 LIMIT 20
I’ll draw attention to a few things:
- state, where you want this value to be ONLINE. If an index is not in that state, the query planner can’t use it to help your query be fast.
- In my Northwind database, I only have a series of Range indexes (the most general purpose type), and two Token Lookup indexes. The latter two are critical indexes that come with every database to enable you to quickly look up nodes and relationships by their labels and types; hence, under ordinary circumstances these two indexes should never be dropped.
Finally, we are ready for the MATCH game.
These are like SQL SELECT statements, but more graph-oriented. Cypher MATCH statements offer a delightfully nerdy ASCII syntax for declaratively matching sub-graphs within your database.
Here’s a very simple MATCH query:
MATCH (n)
RETURN n
LIMIT 2
We request to match at most two nodes. We know we are matching nodes because of the pair of parentheses. Within the pair of parentheses, we specify n as the variable name or key to which each node should be assigned, which allows us to reference the matched nodes within the RETURN clause by way of that key. In this case, we just return n instead of transforming it (more on that later); owing to that, you’ll see each record’s key, in the result set returned to you, is also n.
You can rename that key in your RETURN clause:
MATCH (n)
RETURN n AS MyNode
LIMIT 2
A slightly more verbose but similar syntax allows you to pull two relationships:
MATCH ()-[r]-()
RETURN r
LIMIT 2
The empty parentheses indicate to match nodes, but don’t bother to remember them for later in the query.
Under ordinary circumstances, relationships always have two nodes associated with them, a start and an end node. This does mean that all relationships in Neo4J are “directed”. A directed graph is one where the edges or relationships are like one-way streets or lanes.
You can try modifying the above query to also return to you the nodes alongside the connecting relationship so you can figure out which is the start and which is the end:
MATCH (one)-[rel]-(two)
RETURN one, rel, two
LIMIT 2
This is a more “investigative” approach to determining which node is the start and which is the end. The match syntax may be modified to add a left or right angle bracket character to indicate which name to assign the start and end nodes:
MATCH (start)-[rel]->(end)
RETURN start, rel, end
LIMIT 2
You can flip the syntax the other direction just as well:
MATCH (end)<-[rel]-(start)
RETURN start, rel, end
LIMIT 2
If you wish, you can assign a name to the entire ASCII path you’ve defined and return that:
MATCH MyPath=(end)<-[rel]-(start)
RETURN MyPath
LIMIT 2
Your paths can easily span more than 2 nodes:
MATCH (start)-[rel1]->(mid)-[rel2]->(end)
RETURN start, rel1, mid, rel2, end
LIMIT 2
So far we are just returning the names we give to the matched nodes and relationships. You see that, for each name you return, you get in the response a key with that name set to an object with multiple properties.
However, the RETURN clause allows you to use all kinds of expressions (and functions) to derive the ultimate form of the response:
MATCH (start)-[rel1]->(mid)-[rel2]->(end)
RETURN id(start) AS StartId,
labels(start) AS StartLabels,
type(rel1) AS Relation1Type,
id(mid) AS MidId,
labels(mid) AS MidLabels,
type(rel2) AS Relation2Type,
id(end) AS EndId,
labels(end) as EndLabels
LIMIT 2
Here we have enumerated some labels and types that are associated with parts of the path matched by the query.
With that knowledge in hand, we can refine the query further so the MATCH statement ensures matched elements have particular labels or types. When that happens, we can more reliably dig into the properties associated with a particular element and selectively return them:
MATCH (start:Employee)-[rel1:IN_TERRITORY]->(mid:Territory)-[rel2:IN_REGION]->(end:Region)
RETURN apoc.text.join(
[
start.titleOfCourtesy,
start.firstName,
start.lastName
], " "
) AS FullName,
mid.territoryDescription AS Territory,
end.regionDescription AS Region
LIMIT 10
We added label filters to the nodes by appending, within the parentheses, a colon after our chosen variable name and then the label by which we are filtering. For instance, for the node which we are calling start, only allow matches where that node has the label Employee.
The relationship type filters are a similar syntax, but of course within the square brackets.
We return the results of more complex expressions than we have seen before.
The first field, FullName, is constructed by utilizing the apoc.text.join function, which takes two arguments:
- a list of strings to join (here a list with three entries, built using the . accessor syntax to access name properties of the node with name start)
- the delimiter string to place between them (here a single space)
The remaining two fields are more simply defined by using the dot accessor syntax and a renaming operation.
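If you’d like to see that function in isolation first, here’s a minimal sketch with made-up strings:

RETURN apoc.text.join(["Ms.", "Jane", "Doe"], " ") AS FullName

This returns the single string "Ms. Jane Doe".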
It can seem redundant to assign names to the matched relationships if those names are not used later in the query. Feel free to drop those names if they are not used:
MATCH (start:Employee)-[:IN_TERRITORY]->(mid:Territory)-[:IN_REGION]->(end:Region)
RETURN apoc.text.join(
[
start.titleOfCourtesy,
start.firstName,
start.lastName
], " "
) AS FullName,
mid.territoryDescription AS Territory,
end.regionDescription AS Region
LIMIT 10
The MATCH clause also allows you to declaratively match nodes and relationships based on the values of properties they may have.
Let’s say you want a match mostly like what we have been matching in the last few queries, but you also want all of the matched Region nodes to have a regionDescription property equal to Eastern:
MATCH (start:Employee)-[:IN_TERRITORY]->(mid:Territory)-[:IN_REGION]->(end:Region { regionDescription: "Eastern" })
RETURN apoc.text.join(
[
start.titleOfCourtesy,
start.firstName,
start.lastName
], " "
) AS FullName,
mid.territoryDescription AS Territory,
end.regionDescription AS Region
LIMIT 10
Now we only have Employees from that Region.
This syntax is convenient, but limited to exact matches. Remember WHERE statements from before? Those allow you to perform various kinds of partial matching, like prefix, suffix, substring, and regex. Here’s the above example modified to use an additional prefix filter:
MATCH (start:Employee)-[:IN_TERRITORY]->(mid:Territory)-[:IN_REGION]->(end:Region { regionDescription: "Eastern" })
WHERE mid.territoryDescription STARTS WITH "B"
RETURN apoc.text.join(
[
start.titleOfCourtesy,
start.firstName,
start.lastName
], " "
) AS FullName,
mid.territoryDescription AS Territory,
end.regionDescription AS Region
LIMIT 10
Now we have only Employee nodes which are in a Territory that starts with the letter “B” in the “Eastern” Region.
After your MATCH statement, you’ll commonly want to transform your results. So far we have done this to a limited extent within the confines of the RETURN statement. Yet, you might miss the fluent pipelines of transformations you composed with languages like Splunk Processing Language (SPL) and Kusto Query Language (KQL).
Fret not! Cypher also possesses this pipeline-esque capability. The key to accomplishing this is the WITH statement. Basically, WITH statements are much like RETURN statements in functionality, with the difference being that RETURN must be the final stop. You can chain as many or as few WITH statements as you like (even with other types of statements in between), as long as at the end there is a RETURN statement. In case you are familiar with KQL, these WITH and RETURN statements function a bit like the KQL project command.
Here’s an example which chains several of these statements together, producing some statistical summarizations along the way:
MATCH (o:Order)
WITH toFloat(o.freight) AS Freight,
o.shipCountry AS ShipCountry
WITH avg(Freight) AS AvgFreight,
stdev(Freight) AS StdevFreight,
ShipCountry
RETURN StdevFreight/AvgFreight AS CVFreight,
ShipCountry
ORDER BY CVFreight
- MATCH — We match only nodes with label Order, assigning them to key o.
- WITH — We transform that resulting set of Order nodes into a set of results where each record has two fields:
  - Freight — the freight property of the Order node converted to a float value.
  - ShipCountry — the shipCountry value of the Order node, renamed.
- WITH — We transform the result set of the previous WITH statement so that each record has three fields:
  - AvgFreight — the average of all Freight values, calculated once per unique combination of the grouping keys, if any. In this case, we only have one grouping key, the ShipCountry field. We know ShipCountry is a grouping key because it is not produced by way of an aggregation function (like avg).
  - StdevFreight — the standard deviation of all Freight values, calculated once per unique value of ShipCountry. This is a measure of the dispersal or spread of the data points under consideration; as it gets bigger, it typically means the values are more spread out.
  - ShipCountry — this value is not transformed but passed on “as is”. In this case, it’s a grouping key.
- RETURN — We produce the final output:
  - CVFreight — we calculate the coefficient of variation by dividing the standard deviation by the average. This is also known as relative standard deviation, and overcomes some limitations of the standard deviation.
  - ShipCountry — again, passing this value along untouched.

The final ordering statement impacts the order of the returned results, so they are sorted in the manner you specify.
Here’s an expanded version of that query which additionally includes calculation of percentiles, followed by taking ratios of them to the calculated averages:
MATCH (o:Order)
WITH toFloat(o.freight) as Freight,
o.shipCountry as ShipCountry
WITH avg(Freight) AS AvgFreight,
stdev(Freight) AS StdevFreight,
percentileCont(Freight,0.05) AS P05Freight,
percentileCont(Freight,0.95) AS P95Freight,
ShipCountry
RETURN StdevFreight/AvgFreight AS CVFreight,
P05Freight/AvgFreight AS P5ToAvgFreight,
P95Freight/AvgFreight AS P95ToAvgFreight,
ShipCountry
ORDER BY CVFreight
Now we are at a level of calculating summary statistics and performing simple table projections.
What if we want to produce a time series? This is a table where we calculate statistical aggregation functions once for each span or bin of time (say, 5 minutes) across a larger range of time (say, 24 hours).
This requires a type of transformation where we truncate the timestamps associated with particular nodes in order to remove a specified degree of precision from them.
Here’s a query that demonstrates that:
MATCH (o:Order)
WITH apoc.text.regexGroups(
o.orderDate,
"^([0-9]+)[-]([0-9]+)[-]([0-9]+).*$"
) AS DateParts, o.orderDate AS OgOrderDate
WITH DateParts[0][1] AS Year, DateParts[0][2] AS Month, DateParts[0][3] AS Day, OgOrderDate
WITH datetime({
year: toInteger(Year),
month: toInteger(Month),
day: toInteger(Day)
}) AS DateTime, OgOrderDate
WITH toString(
datetime.truncate(
"month",
DateTime
)
) AS Truncated, OgOrderDate
RETURN COUNT(*), Truncated
ORDER BY Truncated
The purpose of this query is to return a separate count of orders for each month where there are orders in the graph.
It follows this logic:
- Match nodes with label Order and assign them to variable o.
- Use a WITH to project two fields, DateParts and OgOrderDate:
  - OgOrderDate is just a new name for the orderDate property of a given Order node.
  - DateParts is a set of extracted substrings from the orderDate property. The regex pattern is fully anchored and describes the orderDate string like this:
    - ^([0-9]+) — starting at the beginning of the orderDate string, match a sequence made of one or more digits (and remember that sequence).
    - [-] — match a literal hyphen character (but don’t bother to remember it).
    - ([0-9]+) — match another sequence made of one or more digits (and remember that sequence).
    - [-] — match another literal hyphen character (and don’t bother to remember it).
    - ([0-9]+) — match another sequence made of one or more digits (and remember that sequence).
    - .*$ — match a sequence made of zero or more of any character all the way to the end of the string.
- The next WITH projects OgOrderDate simply remaining the same, and Year, Month, and Day variables set to the respective digit sequences pulled out above via regex.
- We then construct a DateTime variable by using the datetime function, passing into it the result of converting our Year, Month, and Day strings to integers.
- Finally, we truncate: where the DateTime variable has a precision of one day granularity, the truncation removes that level of granularity and produces new values which have the specified granularity of one month. So all timestamps that fall in the same month have the same new timestamp, regardless of which individual day they originally had. This effectively bins our orderDate values by month and year.

I’ve written about this before in my Epic Kusto Query Language (KQL) Crash Course. People often schedule events on a recurring basis. For example, a paycheck may commonly be received from an employer every other week on a Thursday or Friday.
Building upon the paycheck example, it may be more useful to you to know which days of the week tend to have the most paychecks issued, rather than a timechart of daily paycheck volumes going back several months or years. This is an example of trying to comprehend the seasonality of a phenomenon.
To do this, you only need to manipulate your time range bin labels a bit more:
MATCH (o:Order)
WITH apoc.text.regexGroups(
o.orderDate,
"^([0-9]+)[-]([0-9]+)[-]([0-9]+).*$"
) AS DateParts, o.orderDate AS OgOrderDate
WITH DateParts[0][1] AS Year,
DateParts[0][2] AS Month,
DateParts[0][3] AS Day,
OgOrderDate
WITH datetime({
year: toInteger(Year),
month: toInteger(Month),
day: toInteger(Day)
}) AS DateTime, OgOrderDate
WITH DateTime.weekDay AS WeekDay,
DateTime.week AS Week, OgOrderDate
WITH apoc.text.lpad(
toString(Week),
2,
"0"
) + " " + toString(WeekDay) AS Week_WeekDay,
OgOrderDate
RETURN COUNT(*) AS Count, Week_WeekDay
ORDER BY Count
Here we take our earlier produced DateTime field and separate out two of its properties: weekDay and week. The first is an integer referring to which of the 7 days of the week the DateTime lands on. The second is a number referring to which of the 52 weeks in a year the DateTime lands on. We convert these integers to strings, padding the week value with a 0 on the left in cases where the week number is less than 10. Finally, we produce a count of orders binned by our week and weekday label field.
This gives us an idea of the highest volume days for this business, regardless of how many years of data exist in the dataset. It also allows you to start noticing trends, like high volume days being clustered in particular weeks or if a particular day of the week tends to be high volume regardless of which week of the year contains it.
Properties are not limited to holding scalar values, like integers. They can also hold multi-valued data types, like lists. As in other languages, Cypher provides facilities for “expanding” these multi-valued datatypes so that each item within the multi-valued field becomes its own record.
First, run this query:
WITH [
"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxyyyyy"
] AS X
RETURN X
Notice how it returns a single record with a single property set to an array of strings.
Now run this modified version:
WITH [
"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxyyyyy"
] AS X
UNWIND X AS UX
WITH apoc.text.regexGroups(
UX,
"^([xXoO]{75,200}?)"
)[0][1] AS EX
RETURN SIZE(EX) AS Length
Notice how this returns 3 separate records. This is owing to the UNWIND command. The regex pattern demonstrates a lazy match inside a capturing group. Feel free to re-use this setup to test out various regex patterns you come up with.
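As a sketch of that kind of reuse, you could swap in your own strings and pattern; here, a hypothetical pattern pulls a word and a number out of each string:

WITH [
  "alpha-1",
  "beta-22",
  "gamma-333"
] AS X
UNWIND X AS UX
WITH apoc.text.regexGroups(
  UX,
  "^([a-z]+)[-]([0-9]+)$"
) AS Groups
RETURN Groups[0][1] AS Word, Groups[0][2] AS Number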
Let’s further modify the last query to “wind” the result back up into a single array by using the collect function:
WITH [
"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxyyyyy"
] AS X
UNWIND X AS UX
WITH apoc.text.regexGroups(
UX,
"^([xXoO]{75,200}?)"
)[0][1] AS EX
WITH SIZE(EX) AS Length
RETURN collect(Length) AS Lengths
Taking an example from my Kusto Query Language course:
WITH [
'{"id": 1, "brand": "TechMaster", "model": "X9000", "cpu": "QuadCore 3.0GHz", "ram": "16GB", "storage": "512GB SSD", "graphics": "NVIDIA GTX 1080", "price": "$1299.99"}',
'{"id": 2, "brand": "QuantumTech", "model": "UltraCore", "cpu": "OctaCore 2.5GHz", "ram": "32GB", "storage": "1TB HDD", "graphics": "AMD RX 5700", "price": "$1499.99"}',
'{"id": 3, "brand": "CyberFusion", "model": "StealthX", "cpu": "HexaCore 2.8GHz", "ram": "64GB", "storage": "256GB NVMe", "graphics": "GeForce RTX 3060", "price": "$1699.99"}',
'{"id": 4, "brand": "SysGenius", "model": "ElitePro", "cpu": "QuadCore 3.2GHz", "ram": "8GB", "storage": "1TB SSD", "graphics": "Intel UHD Graphics", "price": "$1099.99"}',
'{"id": 5, "brand": "MegaByte", "model": "PowerSpeed", "cpu": "OctaCore 3.5GHz", "ram": "128GB", "storage": "2TB HDD", "graphics": "NVIDIA RTX 3080", "price": "$1999.99"}',
'{"id": 6, "brand": "NeoGaming", "model": "TitanX", "cpu": "HexaCore 3.0GHz", "ram": "16GB", "storage": "512GB NVMe", "graphics": "AMD RX 5600 XT", "price": "$1299.99"}',
'{"id": 7, "brand": "Futurix", "model": "FusionWave", "cpu": "QuadCore 2.7GHz", "ram": "32GB", "storage": "1TB SSD", "graphics": "GeForce GTX 1660", "price": "$1399.99"}',
'{"id": 8, "brand": "TechWave", "model": "SwiftForce", "cpu": "OctaCore 2.2GHz", "ram": "64GB", "storage": "256GB SSD", "graphics": "NVIDIA GTX 1070", "price": "$1599.99"}',
'{"id": 9, "brand": "QuantumPulse", "model": "Xplorer", "cpu": "HexaCore 2.5GHz", "ram": "8GB", "storage": "512GB HDD", "graphics": "AMD RX 550", "price": "$1099.99"}',
'{"id": 10, "brand": "EpicGen", "model": "ThunderBlaze", "cpu": "QuadCore 3.5GHz", "ram": "128GB", "storage": "2TB SSD", "graphics": "Intel Iris Xe", "price": "$1899.99"}',
'{"id": 11, "brand": "CyberPulse", "model": "NovaSpeed", "cpu": "OctaCore 3.2GHz", "ram": "16GB", "storage": "1TB NVMe", "graphics": "NVIDIA RTX 3070", "price": "$1799.99"}',
'{"id": 12, "brand": "TechFusion", "model": "VelocityX", "cpu": "HexaCore 2.8GHz", "ram": "32GB", "storage": "512GB SSD", "graphics": "AMD RX 570", "price": "$1299.99"}',
'{"id": 13, "brand": "InnoTech", "model": "BlazePro", "cpu": "QuadCore 2.5GHz", "ram": "64GB", "storage": "1TB HDD", "graphics": "GeForce GTX 1650", "price": "$1499.99"}',
'{"id": 14, "brand": "FusionTech", "model": "SwiftNova", "cpu": "OctaCore 3.0GHz", "ram": "8GB", "storage": "256GB NVMe", "graphics": "NVIDIA GTX 1060", "price": "$1199.99"}',
'{"id": 15, "brand": "MegaTech", "model": "TurboWave", "cpu": "HexaCore 2.2GHz", "ram": "128GB", "storage": "2TB SSD", "graphics": "AMD RX 580", "price": "$1699.99"}',
'{"id": 16, "brand": "QuantumFusion", "model": "RapidX", "cpu": "QuadCore 3.2GHz", "ram": "16GB", "storage": "512GB HDD", "graphics": "Intel UHD Graphics", "price": "$1299.99"}',
'{"id": 17, "brand": "EpicPulse", "model": "BlitzTech", "cpu": "OctaCore 2.8GHz", "ram": "32GB", "storage": "1TB SSD", "graphics": "NVIDIA RTX 3060", "price": "$1499.99"}',
'{"id": 18, "brand": "TechNova", "model": "XtremeSpeed", "cpu": "HexaCore 3.5GHz", "ram": "64GB", "storage": "256GB SSD", "graphics": "GeForce GTX 1660", "price": "$1599.99"}',
'{"id": 19, "brand": "CyberBlaze", "model": "ThunderX", "cpu": "QuadCore 2.2GHz", "ram": "8GB", "storage": "512GB SSD", "graphics": "AMD RX 5600 XT", "price": "$1199.99"}',
'{"id": 20, "brand": "FutureTech", "model": "SwiftFusion", "cpu": "HexaCore 3.0GHz", "ram": "128GB", "storage": "1TB NVMe", "graphics": "NVIDIA GTX 1070", "price": "$1799.99"}'
] AS JSONs
UNWIND JSONs AS JSON
RETURN JSON
This is a set of JSON-encoded data pertaining to an inventory of computers. It’s not part of our actual Neo4J database, but a kind of in-memory table of strings we conjured up to practice Cypher manipulations. Think of it as a small sample of the kind of language you could find in much larger quantities in a real database.
I’ll say at the outset, ordinarily you would want to keep parsing logic, as much as possible, outside of the database tier. Some say it’s a better task for application servers. Nonetheless, there are plenty of situations where doing this within the database may be your best option.
We want to separate out this data into proper fields, the kinds which we could use to perform valid statistical calculations. Of course you can pull these values out with some regexes, but this is ill-advised. In Cypher, we have ready-made JSON parsing functionality, and we should use it.
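Before the full script, here’s a minimal sketch of that ready-made functionality applied to a single hypothetical string:

RETURN apoc.convert.fromJsonMap(
  '{"brand": "TechMaster", "price": "$1299.99"}'
).brand AS Brand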
Here’s the entirety of the Cypher script we are looking for, with explanations following:
WITH [
'{"id": 1, "brand": "TechMaster", "model": "X9000", "cpu": "QuadCore 3.0GHz", "ram": "16GB", "storage": "512GB SSD", "graphics": "NVIDIA GTX 1080", "price": "$1299.99"}',
'{"id": 2, "brand": "QuantumTech", "model": "UltraCore", "cpu": "OctaCore 2.5GHz", "ram": "32GB", "storage": "1TB HDD", "graphics": "AMD RX 5700", "price": "$1499.99"}',
'{"id": 3, "brand": "CyberFusion", "model": "StealthX", "cpu": "HexaCore 2.8GHz", "ram": "64GB", "storage": "256GB NVMe", "graphics": "GeForce RTX 3060", "price": "$1699.99"}',
'{"id": 4, "brand": "SysGenius", "model": "ElitePro", "cpu": "QuadCore 3.2GHz", "ram": "8GB", "storage": "1TB SSD", "graphics": "Intel UHD Graphics", "price": "$1099.99"}',
'{"id": 5, "brand": "MegaByte", "model": "PowerSpeed", "cpu": "OctaCore 3.5GHz", "ram": "128GB", "storage": "2TB HDD", "graphics": "NVIDIA RTX 3080", "price": "$1999.99"}',
'{"id": 6, "brand": "NeoGaming", "model": "TitanX", "cpu": "HexaCore 3.0GHz", "ram": "16GB", "storage": "512GB NVMe", "graphics": "AMD RX 5600 XT", "price": "$1299.99"}',
'{"id": 7, "brand": "Futurix", "model": "FusionWave", "cpu": "QuadCore 2.7GHz", "ram": "32GB", "storage": "1TB SSD", "graphics": "GeForce GTX 1660", "price": "$1399.99"}',
'{"id": 8, "brand": "TechWave", "model": "SwiftForce", "cpu": "OctaCore 2.2GHz", "ram": "64GB", "storage": "256GB SSD", "graphics": "NVIDIA GTX 1070", "price": "$1599.99"}',
'{"id": 9, "brand": "QuantumPulse", "model": "Xplorer", "cpu": "HexaCore 2.5GHz", "ram": "8GB", "storage": "512GB HDD", "graphics": "AMD RX 550", "price": "$1099.99"}',
'{"id": 10, "brand": "EpicGen", "model": "ThunderBlaze", "cpu": "QuadCore 3.5GHz", "ram": "128GB", "storage": "2TB SSD", "graphics": "Intel Iris Xe", "price": "$1899.99"}',
'{"id": 11, "brand": "CyberPulse", "model": "NovaSpeed", "cpu": "OctaCore 3.2GHz", "ram": "16GB", "storage": "1TB NVMe", "graphics": "NVIDIA RTX 3070", "price": "$1799.99"}',
'{"id": 12, "brand": "TechFusion", "model": "VelocityX", "cpu": "HexaCore 2.8GHz", "ram": "32GB", "storage": "512GB SSD", "graphics": "AMD RX 570", "price": "$1299.99"}',
'{"id": 13, "brand": "InnoTech", "model": "BlazePro", "cpu": "QuadCore 2.5GHz", "ram": "64GB", "storage": "1TB HDD", "graphics": "GeForce GTX 1650", "price": "$1499.99"}',
'{"id": 14, "brand": "FusionTech", "model": "SwiftNova", "cpu": "OctaCore 3.0GHz", "ram": "8GB", "storage": "256GB NVMe", "graphics": "NVIDIA GTX 1060", "price": "$1199.99"}',
'{"id": 15, "brand": "MegaTech", "model": "TurboWave", "cpu": "HexaCore 2.2GHz", "ram": "128GB", "storage": "2TB SSD", "graphics": "AMD RX 580", "price": "$1699.99"}',
'{"id": 16, "brand": "QuantumFusion", "model": "RapidX", "cpu": "QuadCore 3.2GHz", "ram": "16GB", "storage": "512GB HDD", "graphics": "Intel UHD Graphics", "price": "$1299.99"}',
'{"id": 17, "brand": "EpicPulse", "model": "BlitzTech", "cpu": "OctaCore 2.8GHz", "ram": "32GB", "storage": "1TB SSD", "graphics": "NVIDIA RTX 3060", "price": "$1499.99"}',
'{"id": 18, "brand": "TechNova", "model": "XtremeSpeed", "cpu": "HexaCore 3.5GHz", "ram": "64GB", "storage": "256GB SSD", "graphics": "GeForce GTX 1660", "price": "$1599.99"}',
'{"id": 19, "brand": "CyberBlaze", "model": "ThunderX", "cpu": "QuadCore 2.2GHz", "ram": "8GB", "storage": "512GB SSD", "graphics": "AMD RX 5600 XT", "price": "$1199.99"}',
'{"id": 20, "brand": "FutureTech", "model": "SwiftFusion", "cpu": "HexaCore 3.0GHz", "ram": "128GB", "storage": "1TB NVMe", "graphics": "NVIDIA GTX 1070", "price": "$1799.99"}'
] AS JSONs
UNWIND JSONs AS JSON
WITH apoc.convert.fromJsonMap(JSON) as parsed
WITH parsed.id AS id,
parsed.brand AS brand,
parsed.model AS model,
tail(
apoc.text.regexGroups(
parsed.cpu,
"^([Qq][Uu][Aa][Dd]|[Hh][Ee][Xx][Aa]|[Oo][Cc][Tt][Aa])[Cc][Oo][Rr][Ee][ ](\\d+[.]?\\d*)GHz$"
)[0]
) AS cpuSpecs,
toInteger(
apoc.text.regexGroups(
parsed.ram,
"^([0-9]+)[Gg][Bb]$"
)[0][1]
) AS ramGB,
tail(
apoc.text.regexGroups(
parsed.storage,
"^([0-9]+)([GgTt][Bb])[ ]([A-Za-z0-9_.-]+)$"
)[0]
) AS storage,
parsed.graphics AS graphics,
toFloat(
apoc.text.regexGroups(
parsed.price,
"^[$](.*)$"
)[0][1]
) AS priceDollars
WITH id, brand, model, graphics, priceDollars,
CASE cpuSpecs[0]
WHEN "Quad" THEN 4
WHEN "Hexa" THEN 6
WHEN "Octa" THEN 8
ELSE -1
END AS cpuCores, toFloat(
cpuSpecs[1]
) AS cpuGHz, ramGB,
toInteger(
storage[0]
) AS storageSize, storage[1] AS storageUnit, storage[2] AS storageType
RETURN id, brand, model, graphics,
priceDollars, cpuCores, cpuGHz, ramGB,
CASE storageUnit
WHEN "GB" THEN storageSize
ELSE storageSize * 1000
END AS storageGB
- After the UNWIND, each record has a JSON property containing one of the strings from the original array.
- We use apoc.convert.fromJsonMap to create a Cypher map data structure based on the JSON string. This parses the JSON string and allows us to start referencing the values stored with its keys.
- We use a WITH to destructure the key-value pairs from the parsed field containing the map we just created. For several of them, we perform various transformations via regex:
  - cpuSpecs is produced, capturing the core quantity and gigahertz rating. We pass the resulting array to the tail function to return the entire array passed to it except for the first element (we don’t want to retain the entire matched string, only the individual capturing groups).
  - The ramGB value is recognized, extracted, and converted to a Cypher integer.
  - A similar regex treatment is applied to storage as we used for cpuSpecs.
  - For the price, we strip the leading $ character and then convert the resulting value to a Cypher float.
- The next WITH clause we use to project the results into a further transformed format, which accomplishes a few things:
  - The CASE clause (Cypher’s version of a switch statement) allows us to convert the core count indicator words into integer values.
- We RETURN the final version of our results, performing a final transformation via another CASE statement that ensures our storage values are all in gigabyte units.

What a drastic transformation! It shows what kind of transformations you can accomplish with the assistance of JSON parsing utilities.
This is yet another example derived from my KQL course. It’s true these fictional documents are quite contrived, but this is a good practice problem for learning how to construct your own rudimentary parsing routines in a language like Cypher.
Following is the initial table of strings. I’m not going to copy and paste the table again as we proceed, so just remember that the subsequent commands in this section are operating on the result of this initial set of strings:
WITH [
"computer = server016; <batches><batch>{\"DurationMS\": 90, \"Payload\": \"- Name: John Doe\\n DOB: 1990-05-15\\n Email: john.doe@email.com\\n- Name: Jane Smith\\n DOB: 1985-08-22\\n Email: jane.smith@email.com\\n- Name: Alex Brown\\n DOB: 1995-11-10\\n Email: alex.brown@email.com\"}</batch></batches>",
"computer = server081; <batches><batch>{\"DurationMS\": 81, \"Payload\": \"- name: Emily White\\n dob: 1992-09-18\\n email: emily.white@email.com\\n- name: Mark Johnson\\n dob: 1980-03-05\\n email: mark.johnson@email.com\\n- name: Sarah Miller\\n dob: 1995-11-30\\n email: sarah.miller@email.com\"}</batch><batch>{\"DurationMS\": 65, \"Payload\": \"- name: Alex Turner\\n dob: 1987-06-24\\n email: alex.turner@email.com\\n- name: Jessica Adams\\n dob: 1994-12-15\\n email: jessica.adams@email.com\\n- name: Brian Parker\\n dob: 1982-08-03\\n email: brian.parker@email.com\\n- name: Rachel Lee\\n dob: 1998-02-09\\n email: rachel.lee@email.com\\n- name: Kevin Scott\\n dob: 1989-05-17\\n email: kevin.scott@email.com\"}</batch></batches>",
"computer = server056; <batches><batch>{\"DurationMS\": 72, \"Payload\": \"- name: Olivia Jenkins\\n dob: 1991-10-07\\n email: olivia.jenkins@email.com\\n- name: Marcus Turner\\n dob: 1985-04-18\\n email: marcus.turner@email.com\"}</batch></batches>",
"computer = server081; <batches><batch>{\"DurationMS\": 15, \"Payload\": \"- name: Laura Davis\\n dob: 1993-08-26\\n email: laura.davis@email.com\\n- name: Ethan Wilson\\n dob: 1984-12-03\\n email: ethan.wilson@email.com\\n- name: Amanda Parker\\n dob: 1990-02-14\\n email: amanda.parker@email.com\\n- name: Tyler Mitchell\\n dob: 1988-06-11\\n email: tyler.mitchell@email.com\"}</batch><batch>{\"DurationMS\": 48, \"Payload\": \"- name: Grace Roberts\\n dob: 1997-03-29\\n email: grace.roberts@email.com\"}</batch><batch>{\"DurationMS\": 7, \"Payload\": \"- name: Benjamin Hall\\n dob: 1981-09-22\\n email: benjamin.hall@email.com\\n- name: Natalie Adams\\n dob: 1996-11-15\\n email: natalie.adams@email.com\\n- name: Christopher Hill\\n dob: 1986-07-01\\n email: christopher.hill@email.com\"}</batch></batches>",
"computer = server056; <batches><batch>{\"DurationMS\": 33, \"Payload\": \"- name: Lily Moore\\n dob: 1999-01-12\\n email: lily.moore@email.com\\n- name: Jackson Brown\\n dob: 1983-05-04\\n email: jackson.brown@email.com\\n- name: Chloe Taylor\\n dob: 1995-09-17\\n email: chloe.taylor@email.com\"}</batch><batch>{\"DurationMS\": 13, \"Payload\": \"- name: Daniel Wright\\n dob: 1989-12-30\\n email: daniel.wright@email.com\\n- name: Sophia Smith\\n dob: 1992-04-23\\n email: sophia.smith@email.com\"}</batch></batches>",
"computer = server016; <batches><batch>{\"DurationMS\": 9, \"Payload\": \"- name: Ryan Miller\\n dob: 1987-11-08\\n email: ryan.miller@email.com\"}</batch></batches>",
"computer = server091; <batches><batch>{\"DurationMS\": 8, \"Payload\": \"- name: Emma Turner\\n dob: 1994-06-20\\n email: emma.turner@email.com\\n- name: Noah Johnson\\n dob: 1980-02-01\\n email: noah.johnson@email.com\\n- name: Madison Scott\\n dob: 1998-07-14\\n email: madison.scott@email.com\\n- name: Owen Harris\\n dob: 1982-10-27\\n email: owen.harris@email.com\"}</batch></batches>"
] AS Documents
Next, we unwind the document strings so each record has one Document string:
UNWIND Documents as Document
We use regex to extract the computer name listed at the start, as well as the XML document that takes up most of the rest of the string. We use tail as we have before to skip the initial element in the regex matches array, which is the entire matched string itself. This is all assigned to the DocParts field:
WITH tail(
apoc.text.regexGroups(
Document,
"^[cC]omputer[ ]*[=][ ]*([^;]+)[;][ ]*([\\s\\S]+)$"
)[0]
) AS DocParts
A random UUID is produced and assigned to DocumentID. The ComputerName and XML are separated out from the DocParts field into their own fields:
WITH randomUUID() AS DocumentID, DocParts[0] AS ComputerName, DocParts[1] AS XML
We parse the XML document field into a Cypher map:
WITH DocumentID, ComputerName, apoc.xml.parse(
XML
) as parsed
We UNWIND the _children array within the parsed field which was produced from the XML parsing. These represent the batch elements from the XML documents. We assign the result to the BatchJSON field:
UNWIND parsed._children AS BatchJSON
We reach into the BatchJSON field and pull out the _text field, parsing the JSON it contains into a Cypher map which we call Batch:
WITH DocumentID,
ComputerName,
apoc.convert.fromJsonMap(
BatchJSON._text
) AS Batch
Due to the earlier unwinding, some of the DocumentID values are now repeated across multiple records. We create another UUID for each record and call it BatchID. We also split out the DurationMS integer value and the Payload (YAML-formatted) string value from the Batch field:
WITH DocumentID,
ComputerName,
randomUUID() AS BatchID,
Batch.DurationMS AS DurationMS,
Batch.Payload AS Payload
We split the Payload string on newline characters to separate the YAML records. This produces a list of strings we call YAMLs:
WITH DocumentID,
BatchID,
ComputerName,
DurationMS,
split(
Payload,
"\n"
) as YAMLs
In addition to the list of YAML strings, we require a companion array which contains the indices of each element in the YAML list. We accomplish this with the range function, which creates a list of integers ranging from 0 to one less than the length of the list of YAML strings:
WITH DocumentID,
BatchID,
ComputerName,
DurationMS,
YAMLs,
range(
0,
size(YAMLs) - 1
) AS YAMLsIndices
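Note that range is inclusive on both ends, which is why we subtract one from the list size. A quick illustration:

RETURN range(0, 4) AS Indices

This returns [0, 1, 2, 3, 4]: five indices for a five-element list.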
Now we need to unwind the list of YAML indices:
UNWIND YAMLsIndices As YAMLIndex
We filter all records so that only the ones with a YAML index divisible by 3 remain, as these are the indices of the start of each YAML record (each one spans 3 lines):
WITH DocumentID,
BatchID,
ComputerName,
DurationMS,
YAMLs,
YAMLIndex
WHERE YAMLIndex % 3 = 0
Wrapping up, we create a unique UUID to serve as record ID for final records. Additionally, we split out the YAML strings into dedicated fields by way of our index value and some arithmetic, subjecting the values to some final post-processing with regex (optional).
RETURN DocumentID,
BatchID,
randomUUID() AS RecordID,
ComputerName,
DurationMS,
apoc.text.regexGroups(
YAMLs[YAMLIndex],
"^[-][ ]name[:][ ](.*)$"
)[0][1] AS Name,
apoc.text.regexGroups(
YAMLs[YAMLIndex + 1],
"^[ ]{2}dob[:][ ](.*)$"
)[0][1] AS DOB,
apoc.text.regexGroups(
YAMLs[YAMLIndex + 2],
"^[ ]{2}email[:][ ](.*)$"
)[0][1] AS Email
The primary way to create new nodes and relationships in your database is with the CREATE command. The syntax is similar to the syntax we’ve used with the MATCH command.
Let’s add a new type of node to our Northwind database:
CREATE (:Reviewer { name: "Mr. Smithers" }), (:Reviewer:Approver { name: "Mr. Mann" })
We added two nodes to our database graph, each with a label of Reviewer and each having its own name property. We also gave Mr. Mann an additional label of Approver.
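You can verify the new nodes landed with a quick match:

MATCH (r:Reviewer)
RETURN r.name AS Name, labels(r) AS Labels
LIMIT 10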
We can create entire paths with this syntax which includes relationships. Additionally, we can match existing nodes first, assign them to variables, and reference those matched nodes in a path we are creating.
For example:
MATCH (MrSmithers:Reviewer { name: "Mr. Smithers" }), (MrMann:Approver { name: "Mr. Mann" })
CREATE (MrMann)-[:SUPERVISES]->(MrSmithers)
This last command first retrieves nodes we created earlier and assigns them to variables. Then, we refer to those variables to specify the nodes that are being connected by the SUPERVISES relationship we are creating. Of course, we could also have created the relationship in the first command that we ran.
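For illustration, creating two brand-new nodes and the relationship between them in a single command could look something like this (the names here are invented):

CREATE (boss:Approver { name: "Ms. Big" })-[:SUPERVISES]->(:Reviewer { name: "Mr. Newhire" })
RETURN boss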
A different method of creating nodes is available. This method, called MERGE, will do the same thing as CREATE if the node described doesn’t exist; if it does exist, the node will simply be MATCHed and bound to the specified variable.
MERGE MyPath=(MrSmithers:Reviewer { name: "Mr. Smithers" })<-[:EMPLOYS]-(MrMann:Approver { name: "Mr. Mann" })
RETURN MyPath
A simple way of updating details of your graph elements is with the SET command. It can update node labels and properties, and for relationships it can update properties.
MERGE MyPath=(MrSmithers:Reviewer { name: "Mr. Smithers" })<-[e:EMPLOYS]-(MrMann:Approver { name: "Mr. Mann" })
SET MrSmithers.Rating = 85, MrMann.Rating = 75, e.since = "1995-03-15"
RETURN MyPath
That should be enough to get you started exploring the Matrix with Neo4J Cypher. Thanks for reading, hope this was useful.