January 2026
After the initial release of my JavaScript Adaptive Radix Tree project where we compiled massive regular expressions that match all the unique words in Moby Dick and the Complete Works of Shakespeare, I was mulling over diagram-as-code options for planning future projects.
I happened upon Mermaid.js State Diagrams. This would seem to be exactly what I was after! The label on the tin hits a lot of notes:
As I was researching this Mermaid.js library, it struck me that these features could be a great candidate for possibly rendering visual representations of the non-deterministic finite automata that correspond to the huge regular expressions we compiled for unique words in Moby Dick and Shakespeare.
So, I set out to do just that!
And actually, I had a lot of problems with Mermaid — enough that I went elswhere. This isn’t a hit piece by any means, but I was disappointed and so do want to note why I ended up abandoning the library.
The first reason for my chagrin was that (as of this writing in late December 2025) the Mermaid-CLI tool is not usable on ARM64 processors. This means you can’t use it in Termux on Android (huge bummer!), but it also means you can’t even run this on a Raspberry PI.
That’s seriously disappointing. Let’s say I don’t want something resource- or time-intensive running on my phone; in that case I might want to throw it on the PI and check back later. But we cannot do this because ARM64 is not supported.
Wow! I guess that means Mermaid is doing some really fancy and excellent things. So I plan on using my laptop for the build and meanwhile move on with development.
The Mermaid documentation, while giving some syntax examples, does
not appear to have actual BNF or other grammar. It’s not the end of the
world but it does require some more elbow grease to derive correct
syntax in some cases. It did not prevent me from ultimately producing a
valid .mmd file for mermaid-cli.
Finally, it was the moment of truth. I had compiled my
.mmd file of commands from the intermediate result returned
by my Adaptive Radix Tree’s toRegex method. The resulting
file was in the neighborhood of 3MB in size. I was ready to run the
mermaid-cli to compile an epic SVG.
I opened my gaming laptop and hit run.
Before it would really run I had to research and create two config files to override various default limits (a timeout and a text limit) to give Mermaid a chance to actually run the file. But in the end, it did start running the file.
It ran for a while.
And then it ran some more.
And it kept running…
After some time it became clear to me it could run for an unrealistic
amount of time. Some cursory research gave weight to that theory given
the size of my .mmd file.
While it was still running, I looked into alternate options.
GraphVizGraphViz has a number of things going for it.
pkg managerI modified my compilation script to target graphviz format and
voila, I had a .gv command file ready. I plugged
my phone in and kicked off this command in Termux:
dot -Tsvg mobyDick.gv -o mobyDick.svg
I laid down for a nap and upon waking a little more than an hour later, GraphViz had finished rendering the SVG… and Mermaid was still churning on my gaming laptop.
We have a winner!
The SVG for the NFA which recognizes all the words in Moby Dick is around 44MB in size. The aspect ratio when properly rendered is a wild 1×~307. That means if I physically printed it on a banner with a height of 1 foot, the banner would be longer than an American football field!
The SVG for the Complete Works of Shakespeare NFA (compiled after Moby Dick) is about 66MB. Actually, it took more than 4 hours to complete in Termux (with full battery and mostly plugged in, too); since the file sizes are not even greater by a factor of two, this to me indicates even the GraphViz algorithms are superlinear somehow in terms of time complexity. However, clearly far faster than Mermaid. The aspect ratio for the Shakespeare SVG is about 1×328, or more than 1.25 American football fields if printed on an a 1-foot tall banner.
And so it is hard to really get a proper view of these huge state machine diagrams in any medium, but I found LibreOffice Draw as usual just works for this random need I had.
These extreme aspect ratios are a testament to one of the chief advantages of trie data structures and also trie-style NFAs (trie regexes) for recognizing natural language words. That is, the height of the data structures, and therefore time required for membership tests, is limited to the length of words stored therein. As additional words are added, primarily the data structures keep growing horizontally.
A bit about the compilation procedure and how it differs from what we’ve done so far with the Radix Tree project.
toRegex Intermediate ResultThe intermediate result from the ART’s toRegex is a
tree-like data structure. The root is always a Group (Array-like with a
boolean instance property called optional). Each Group has
a layout like this:
ByteStack — if you recall from the ART project
overview this is a vector data structure for byte values.ByteStack or another group.Based on the source representation of this data in the ART, we
interpret differently the meaning of the boundaries in the array between
ByteStack and Group instances depending on the order of the
entries. In cases where a ByteStack preceeds a group, this is a place
where a character in a regex pattern would be followed by a
concatenation operator and then the start of a group. Conversely, in
cases where a ByteStack succeeds a Group, this means the
regex alternation operator would go between them. Finally, abutting
ByteStacks naturally indicate the alternation operator
between two strings.
And so in the regular expression compilation adventures we determined, in a somewhat odd way, whether two neighboring entries in a Group were separated by an alternation or concatenation operator by way of a quick analysis of the last token pushed onto the output tokens array.
While okay for the purposes of compiling a regex pattern, a more robust approach was called for when compiling diagram-as-code commands for an NFA.
Before commencing my iterative depth-first traversal of the intermediate result to produce rules, I did a recursive traversal which reduced each Group array to another array of entries potentially smaller in size. Mainly, cases where a ByteStack had a Group concatenated after it were transformed to an instance of a single compound type.
This makes managing the current prefix of the depth-first traversal tremendously easier and more accurate! Importantly, both Mermaid.js and GraphViz formats require unique identifier strings for each vertex defined. Naturally for that I use the prefix of the node/vertex.
Anyway, for a closer look at the compilation logic check the
*.diagram.js files in the ART project’s
benchmarks/ directory.