jsongrep Smokes jq on a 190MB File — Rust DFA Engine Does O(1) Per Node
Honestly, someone finally pointed a compiler at the “grep for JSON” problem and the benchmarks are rude.
233 GitHub stars in weeks. 10 releases. 190MB GeoJSON file parsed and queried while jq is still warming up. One dev, 99.8% Rust, MIT licensed.
Micah Kepe built jsongrep because he wanted ripgrep-style speed for JSON path queries. Instead of interpreting queries at every tree node like jq does, jsongrep compiles your query into a deterministic finite automaton (DFA) first — then walks the JSON tree doing O(1) work per node. The tradeoff: compilation is expensive. The payoff: search is basically free.

🧩 Dumb Mode Dictionary
| Term | Translation |
|---|---|
| DFA (Deterministic Finite Automaton) | A state machine that reads input one piece at a time and never backtracks. Like a GPS that never recalculates. |
| Glushkov’s Algorithm | A way to build that state machine from a pattern, without creating useless “empty” states. Soviet-era math, still slaps. |
| NFA (Nondeterministic Finite Automaton) | The messy draft version of a DFA — can be in multiple states at once. Gets cleaned up into a DFA later. |
| O(1) per node | Each piece of JSON gets looked at exactly once with constant work. No re-scanning, no backtracking. |
| jq | The current king of CLI JSON processing. Been around since 2012. Does transformations, filtering, everything — but interprets queries on the fly. |
| serde_json_borrow | A Rust library that reads JSON without copying it into new memory (zero-copy). Fast because strings are borrowed from the input buffer instead of being allocated again. |
| Kleene star | The * operator in regex that means “zero or more of this thing.” Named after a mathematician, not a cleaning product. |
| GeoJSON | JSON but for maps. Usually enormous files because geography is detailed. |
📖 Backstory — Why Does This Exist?
Micah Kepe was directly inspired by Andrew Gallant’s ripgrep (the tool that embarrassed grep). Same thesis: take a search problem that everyone solves with interpretation, and throw a compiler at it instead.
- jq has been the default CLI JSON tool since 2012
- It does everything — filtering, transformation, reformatting
- But for the common case of “find me this path in a huge file,” it’s dragging along a full interpreter
- jsongrep intentionally does less (search only, no transforms) so it can do that one thing much faster
- The author openly calls this the “anti-pitch” — if you need jq’s full language, jsongrep isn’t a replacement
Okay but seriously: most real-world JSON querying is path matching. You’re grepping logs, checking API responses, finding keys in config dumps. You don’t need a Turing-complete language for that.
⚙️ How the Engine Actually Works
The pipeline has five stages. This is where it gets interesting (for a certain definition of interesting):
1. Parse JSON: uses serde_json_borrow for zero-copy deserialization. ~50ms for 190MB.
2. Parse the query: your path expression becomes an AST (abstract syntax tree).
3. Build an NFA: Glushkov’s construction turns the AST into a nondeterministic automaton. No epsilon transitions.
4. Compile to DFA: subset construction determinizes it. This is the expensive step.
5. Walk the tree: DFS traversal with DFA state transitions. O(1) per node. Done.
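Steps 3 and 4 can be sketched in a few dozen lines of Python. This is a minimal illustration of Glushkov's construction followed by subset construction, not jsongrep's actual code: the tuple-based AST, the function names, and the sample query are all hypothetical, and wildcards are left out to keep the alphabet simple.

```python
from itertools import count

# Hypothetical query AST: ("sym", label) | ("cat", a, b) | ("alt", a, b) | ("star", a)

def glushkov(ast):
    """Compute the position sets that define a Glushkov NFA.

    Every symbol occurrence gets its own position; NFA states are those
    positions plus one start state, so no epsilon transitions are needed.
    """
    symbols, follow, ids = {}, {}, count(1)

    def visit(node):
        kind = node[0]
        if kind == "sym":
            p = next(ids)
            symbols[p], follow[p] = node[1], set()
            return {p}, {p}, False
        if kind == "star":
            first, last, _ = visit(node[1])
            for p in last:                 # loop back to the start of the body
                follow[p] |= first
            return first, last, True
        f1, l1, n1 = visit(node[1])
        f2, l2, n2 = visit(node[2])
        if kind == "alt":
            return f1 | f2, l1 | l2, n1 or n2
        for p in l1:                       # "cat": ends of a lead into starts of b
            follow[p] |= f2
        return (f1 | f2 if n1 else f1), (l1 | l2 if n2 else l2), n1 and n2

    first, last, nullable = visit(ast)
    return symbols, first, last, follow, nullable

def determinize(symbols, first, last, follow, nullable):
    """Subset construction: each DFA state is a frozenset of NFA states."""
    START = 0
    nfa_accepting = set(last) | ({START} if nullable else set())
    start = frozenset({START})
    table, todo = {}, [start]
    while todo:
        state = todo.pop()
        if state in table:
            continue
        moves = {}
        for s in state:
            for q in (first if s == START else follow[s]):
                moves.setdefault(symbols[q], set()).add(q)
        table[state] = {label: frozenset(t) for label, t in moves.items()}
        todo.extend(table[state].values())
    accepting = {s for s in table if s & nfa_accepting}
    return start, table, accepting

def accepts(dfa, path):
    """Run a sequence of path labels through the DFA, one lookup per label."""
    start, table, accepting = dfa
    state = start
    for label in path:
        state = table[state].get(label)
        if state is None:
            return False
    return state in accepting

# Hypothetical query: users.(profile | settings)*.id
query = ("cat",
         ("cat", ("sym", "users"),
                 ("star", ("alt", ("sym", "profile"), ("sym", "settings")))),
         ("sym", "id"))
dfa = determinize(*glushkov(query))
```

With this table built once, `accepts(dfa, ["users", "profile", "settings", "id"])` is a single dictionary lookup per path segment. Subset construction can blow up exponentially in the worst case, which fits the description of this step as the expensive one.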
The key insight: steps 2-4 happen once. Step 5 is where time is actually spent on big files. And step 5 is fast because the DFA never backtracks, never maintains a worklist, never revisits subtrees.
jq, jmespath, and jsonpath-rust all do some form of step-by-step interpretation — which means they’re doing more work per node, every single time.
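To make step 5 concrete, here is a small Python sketch of a depth-first walk driven by a transition table. The DFA is hand-written for the query users[*].email, and its encoding (`KEY_EDGES`, `INDEX_EDGES`, `ACCEPTING`) is an assumption made for illustration, not jsongrep's internal representation.

```python
import json

# Hand-written DFA for users[*].email (hypothetical encoding):
KEY_EDGES = {(0, "users"): 1, (2, "email"): 3}  # (state, object key) -> next state
INDEX_EDGES = {1: 2}                             # state -> next state for any array index
ACCEPTING = {3}

def step(state, edge):
    """One table lookup per edge: constant work regardless of query size."""
    if isinstance(edge, int):
        return INDEX_EDGES.get(state)
    return KEY_EDGES.get((state, edge))

def search(value, state=0, path=()):
    """Depth-first walk that yields (path, value) for every accepting node."""
    if state in ACCEPTING:
        yield path, value
    if isinstance(value, dict):
        children = value.items()
    elif isinstance(value, list):
        children = enumerate(value)
    else:
        return
    for edge, child in children:
        nxt = step(state, edge)
        if nxt is not None:  # no transition means no match below: skip the subtree
            yield from search(child, nxt, path + (edge,))

doc = json.loads("""
{"users": [{"name": "Ada", "email": "ada@example.com"},
           {"name": "Bob", "email": "bob@example.com"}],
 "meta": {"email": "ops@example.com"}}
""")
matches = list(search(doc))
```

Here `matches` contains the two emails under users, while meta.email is never visited because state 0 has no transition on "meta". Each node is reached at most once and carries exactly one DFA state, which is where the constant per-node cost comes from.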
📊 Benchmark Numbers
Tested on a 190MB GeoJSON file (because nothing stress-tests JSON tools like geography):
| Metric | jsongrep | jq | jmespath | jsonpath-rust | jql |
|---|---|---|---|---|---|
| Parse time | ~50ms (zero-copy) | Moderate | Slowest | Moderate | Moderate |
| Compile time | Highest (DFA build) | Low | Low | Low | Low |
| Search time | ~20-30ms | Much higher | Much higher | Much higher | Higher |
| End-to-end | Fastest | 2-10x slower | 10x+ slower | 3-5x slower | 2-5x slower |
The tradeoff is visible: jsongrep spends more time compiling, but the search phase is so fast that it wins end-to-end anyway. On smaller files the advantage shrinks. On huge files, it’s embarrassing for the competition.
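A rough linear cost model shows why the gap grows with file size. The model is an illustrative assumption, not something taken from the benchmarks: $N$ is the number of JSON nodes and $c$ is the per-node search cost.

$$
T_{\text{total}} = T_{\text{parse}} + T_{\text{compile}} + N \cdot c
$$

If two tools parsed equally fast, a DFA engine (subscript $d$) would beat an interpreter (subscript $i$) as soon as

$$
N > \frac{T_{\text{compile},d} - T_{\text{compile},i}}{c_i - c_d}, \qquad c_i > c_d .
$$

The extra compile cost is paid once, while the per-node saving is multiplied by $N$, so small files narrow the difference and large ones widen it.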
All benchmarks use Criterion (the Rust benchmarking framework). Four isolated categories: parse, compile, search, and end-to-end.
🔧 The Query Language
jsongrep’s syntax borrows from regex but applies it to JSON paths:
```shell
# Dot paths
cat data.json | jg 'users[0].name'
# Wildcards
cat data.json | jg 'users[*].email'
# Alternation (OR)
cat data.json | jg 'name | title'
# Recursive descent (find "id" at any depth)
cat data.json | jg '(* | [*])*.id'
# Shorthand for recursive descent
cat data.json | jg -F id
# Array ranges
cat data.json | jg 'items[1:3]'
```
Install with cargo install jsongrep, or grab a cross-platform binary from GitHub. The binary is called jg.
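For a sense of what the recursive-descent query should return, here is a plain Python reference that finds a key at any depth. It mirrors the description "find id at any depth" and is handy for cross-checking results on a small file. It is not how jsongrep works internally, and the tool's output order and format may differ; `find_key` is a hypothetical helper name.

```python
def find_key(value, key, path=()):
    """Yield (path, value) for every occurrence of `key` at any depth."""
    if isinstance(value, dict):
        for k, v in value.items():
            if k == key:
                yield path + (k,), v
            yield from find_key(v, key, path + (k,))
    elif isinstance(value, list):
        for i, v in enumerate(value):
            yield from find_key(v, key, path + (i,))

doc = {"id": 1, "items": [{"id": 2, "tags": ["x"]}, {"meta": {"id": 3}}]}
hits = list(find_key(doc, "id"))
```

Here `hits` holds three entries: the top-level id, the one inside the first array element, and the one nested under meta.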
💬 What People Are Saying
The Hacker News thread landed on the front page. Some notable takes:
- The “jq is fine” camp: “I’ve been using jq for 10 years and never waited more than a second.” Fair — on small files, jq is plenty fast.
- The “ripgrep for JSON” crowd: Multiple commenters made the same comparison. If ripgrep proved that compiling regex beats interpreting it, the same logic applies to JSON paths.
- The “but no transforms” skeptics: jsongrep deliberately can’t filter by value or reshape output. Several commenters noted this makes it complementary to jq, not a replacement.
- The Rust enthusiasts: Honestly, this is catnip for the “rewrite it in Rust” crowd. And this time the benchmarks actually back it up.
233 stars in a few weeks for a CLI JSON tool is… not nothing. That’s more than most people’s entire GitHub profiles.
🧠 Why This Matters Beyond Speed
The real story here isn’t “tool fast.” It’s the pattern.
ripgrep proved you could beat grep by compiling regex into automata instead of interpreting them. jsongrep applies the exact same insight to structured data. The question is: what’s next?
- YAML? (Please someone do this.)
- TOML? (Less urgent but still.)
- Protocol buffers? (Now we’re talking.)
The Glushkov + DFA approach works on any tree-structured data where you’re doing path matching. This could become a template for an entire family of tools. Or it could stay a cool side project with 233 stars. The internet is fickle like that.
Cool. A Rust kid built a faster jq. Now What the Hell Do We Do? (ง •̀_•́)ง

🔍 Build a Log Analysis Microservice
Big JSON log files from APIs, cloud services, and monitoring tools are everywhere. jsongrep’s speed on 190MB+ files means you can build a log query service that returns results before Elasticsearch finishes indexing.
Wrap jg in a small HTTP API, point it at rotating log files, and sell it as a lightweight alternative to full log aggregation stacks.
Example: A DevOps freelancer in Lisbon, Portugal built a CloudWatch log analyzer using ripgrep and jq. Replaced the jq step with jsongrep, cut query time from 8 seconds to under 1 second on 200MB daily log dumps. Now sells the tool as a self-hosted SaaS to three mid-size startups at €200/month each.
Timeline: 2-3 weeks to build the wrapper API. First paying customer within a month if you target DevOps Slack communities.
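A wrapper like that could start as small as the sketch below, using only the Python standard library. The only thing taken from jsongrep is the stdin usage shown earlier (cat file | jg 'query'). The log path, the `q` parameter, the port, and the mapping from exit code to HTTP status are all assumptions for illustration.

```python
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

LOG_PATH = "app.log.json"  # hypothetical log file location

def run_query(query, log_path, cmd=("jg",), timeout=30):
    """Feed log_path to `jg <query>` on stdin; return (exit code, stdout bytes)."""
    with open(log_path, "rb") as log:
        proc = subprocess.run([*cmd, query], stdin=log,
                              capture_output=True, timeout=timeout)
    return proc.returncode, proc.stdout

class QueryHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        params = parse_qs(urlparse(self.path).query)
        query = params.get("q", [""])[0]
        # Refuse empty queries and anything that could be read as a CLI flag.
        if not query or query.startswith("-"):
            self.send_error(400, "expected ?q=<path query>")
            return
        code, body = run_query(query, LOG_PATH)
        # Assumption: a non-zero exit code means the query failed.
        self.send_response(200 if code == 0 else 422)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def make_server(port=8080):
    """Build the server; call .serve_forever() on the result to start it."""
    return HTTPServer(("127.0.0.1", port), QueryHandler)
```

Calling `make_server().serve_forever()` would then answer requests such as /?q=users[*].email (URL-encoded). The command is passed as an argument list rather than through a shell, so the query string is never interpreted by a shell. A real service would still need authentication, rate limiting, and log rotation handling.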
📊 Create a JSON Benchmark Suite Product
Every new JSON tool needs benchmarks, and most developers don’t know how to set up Criterion properly. Build a standardized benchmark-as-a-service that tests JSON tools against realistic datasets (GeoJSON, API responses, nested configs).
Charge tool authors and companies to get their JSON library certified as “fast” with reproducible benchmarks.
Example: A computer science student in Seoul, South Korea created a benchmark comparison site for Python web frameworks. Started as a blog post, turned into a consulting gig when three framework maintainers paid $500 each for detailed profiling reports. Same model works for the JSON tooling space.
Timeline: 1-2 weeks for the benchmark suite. Monetize through Gumroad or sponsor slots within a month.
🛠️ Contribute Upstream and Build Your Rust Portfolio
jsongrep is MIT licensed, has only 2 contributors, and 233 stars. This is the sweet spot for open-source contribution — small enough that PRs get reviewed quickly, popular enough that it looks good on a resume.
The author explicitly lists missing features: no value filtering, no output transformation, no streaming support. Each of those is a meaningful PR.
Example: A junior developer in Nairobi, Kenya contributed streaming support to a 300-star Rust CLI tool. Got hired by a fintech startup within two months specifically because the CTO saw the PR during the interview process. Salary: $45K remote (2.5x local market rate).
Timeline: First PR in a weekend. Portfolio impact is immediate. Job-search leverage builds over 1-3 months.
💡 Build a VS Code Extension for JSON Path Queries
VS Code has 15+ million users. There’s no good extension that lets you query large JSON files with regex-style path syntax in the editor. jsongrep’s jg binary could power one.
Highlight matches in-editor, show path breadcrumbs, let users click through nested results. Sell it on the VS Code marketplace or keep it free and monetize with a Pro tier (batch queries, saved patterns, export).
Example: An indie dev in Krakow, Poland built a VS Code extension for SQL formatting. Free tier at 40K installs, Pro tier at $4.99/year. Makes ~$800/month passively. A jsongrep-powered JSON extension hits the same niche but for the API-heavy crowd.
Timeline: 2-4 weeks for MVP. Marketplace visibility grows organically after 1K installs (usually takes 2-3 months).
📝 Write the 'ripgrep Pattern' Technical Blog Series
The pattern of “compile queries into DFA instead of interpreting them” is a repeatable insight that most developers don’t understand. Write a blog series (or paid course) explaining how ripgrep, jsongrep, and similar tools apply compiler techniques to everyday CLI problems.
Target: mid-level developers who want to build fast tools but don’t have a compilers background.
Example: A systems programmer in Tallinn, Estonia wrote a 5-part blog series on “zero-copy parsing in Rust.” Got 80K total views, converted 200 readers into a paid Rust mentorship Discord at $15/month. That’s $3,000/month recurring from blog posts about memory allocation.
Timeline: One post per week for 5 weeks. Monetize via newsletter/course launch in month 2-3.
🛠️ Follow-Up Actions
| Step | Action | Tool/Link |
|---|---|---|
| 1 | Install jsongrep | cargo install jsongrep |
| 2 | Read the full blog post | micahkepe.com/blog/jsongrep |
| 3 | Browse the source | github.com/micahkepe/jsongrep |
| 4 | Check open issues for contribution targets | GitHub Issues tab |
| 5 | Compare against your current jq workflows | Run both on your largest JSON file and time them |
| 6 | Join the HN discussion | Hacker News thread |
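For step 5, a small timing script is enough to get a first impression. The sketch below measures whole-process wall-clock time, including startup, so it is much cruder than the isolated Criterion benchmarks. The jg query comes from the examples above, the jq query `.users[].email` is its usual jq equivalent, and the file name and helper names are placeholders.

```python
import shutil
import statistics
import subprocess
import time

def time_command(argv, input_path, repeats=5):
    """Median wall-clock seconds for running argv with input_path on stdin."""
    samples = []
    for _ in range(repeats):
        with open(input_path, "rb") as data:
            start = time.perf_counter()
            subprocess.run(argv, stdin=data, stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL, check=True)
            samples.append(time.perf_counter() - start)
    return statistics.median(samples)

def compare(input_path, candidates, repeats=5):
    """Time every candidate whose binary is on PATH; returns {name: seconds}."""
    results = {}
    for name, argv in candidates.items():
        if shutil.which(argv[0]) is None:
            continue  # tool not installed, skip it
        results[name] = time_command(argv, input_path, repeats)
    return results

# Adjust the queries to match a path that exists in your own file.
CANDIDATES = {
    "jsongrep": ["jg", "users[*].email"],
    "jq": ["jq", ".users[].email"],
}
```

Running `print(compare("big.json", CANDIDATES))` prints one median per installed tool. Taking the median of several runs keeps a single cold-cache run from skewing the comparison.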
Quick Hits
- cargo install jsongrep and use jg instead of jq for path searches
- Read the blog post’s “Glushkov’s Algorithm” section; it’s actually readable
- Check jsongrep’s GitHub issues; streaming and value filtering are open targets
- Steal jsongrep’s Criterion setup as a template; it’s MIT licensed
- Wrap jg in an API for log analysis; big JSON files are everywhere and nobody wants to wait
jq walked so jsongrep could compile, determinize, and absolutely sprint.