Jsongrep Smokes jq on a 190MB File — Rust DFA Engine Does O(1) Per Node

NoBody · March 27, 2026, 2:03pm


Honestly, someone finally pointed a compiler at the “grep for JSON” problem and the benchmarks are rude.

233 GitHub stars in a few weeks. 10 releases. 190MB GeoJSON file parsed and queried while jq is still warming up. One dev, 99.8% Rust, MIT licensed.

Micah Kepe built jsongrep because he wanted ripgrep-style speed for JSON path queries. Instead of interpreting queries at every tree node like jq does, jsongrep compiles your query into a deterministic finite automaton (DFA) first — then walks the JSON tree doing O(1) work per node. The tradeoff: compilation is expensive. The payoff: search is basically free.


🧩 Dumb Mode Dictionary

Term	Translation
DFA (Deterministic Finite Automaton)	A state machine that reads input one piece at a time and never backtracks. Like a GPS that never recalculates.
Glushkov’s Algorithm	A way to turn a pattern into a state machine (an NFA, see below) with one state per symbol in the pattern plus a start state, and no “empty” (epsilon) transitions. Soviet-era math, still slaps.
NFA (Nondeterministic Finite Automaton)	The messy draft version of a DFA — can be in multiple states at once. Gets cleaned up into a DFA later.
O(1) per node	Each piece of JSON gets looked at exactly once with constant work. No re-scanning, no backtracking.
jq	The current king of CLI JSON processing. Been around since 2012. Does transformations, filtering, everything — but interprets queries on the fly.
serde_json_borrow	A Rust library that parses JSON without copying strings into new memory (zero-copy). Fast because the parsed values borrow straight from the input buffer.
Kleene star	The `*` operator in regex that means “zero or more of this thing.” Named after a mathematician, not a cleaning product.
GeoJSON	JSON but for maps. Usually enormous files because geography is detailed.
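To make the dictionary concrete, here is a minimal, hand-built DFA for the recursive-descent pattern `(* | [*])*.id` from the query section, run over the path of steps from the JSON root to one node. The `Step` type and both function names are made up for this sketch; it only illustrates how a DFA reads one piece of input at a time and how a Kleene star keeps a match alive. It is not jsongrep's code.

```rust
/// One step on the way from the JSON root to a node.
#[derive(Clone, Copy)]
pub enum Step<'a> {
    Key(&'a str),
    Index(usize),
}

/// Hand-built DFA for `(* | [*])*.id`: "the key `id`, at any depth".
/// State 0: the last step was not `id`. State 1 (accepting): the last step was `id`.
/// The Kleene star means every step keeps the match alive (there is no dead state), and
/// for this pattern both states happen to share the same transitions, so `_state` is unused.
pub fn transition(_state: u8, step: Step<'_>) -> u8 {
    match step {
        Step::Key("id") => 1,
        _ => 0,
    }
}

/// Reads the path left to right with exactly one transition per step and no backtracking.
pub fn matches_id_anywhere(path: &[Step<'_>]) -> bool {
    path.iter().fold(0, |state, &step| transition(state, step)) == 1
}
```

For the path `a`, `[3]`, `id` the states go 0, 0, 0, 1, so it matches; for `id`, `name` they go 0, 1, 0, so it does not.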

📖 Backstory — Why Does This Exist?

Micah Kepe was directly inspired by Andrew Gallant’s ripgrep (the tool that embarrassed grep). Same thesis: take a search problem that everyone solves with interpretation, and throw a compiler at it instead.

jq has been the default CLI JSON tool since 2012
It does everything — filtering, transformation, reformatting
But for the common case of “find me this path in a huge file,” it’s dragging along a full interpreter
jsongrep intentionally does less (search only, no transforms) so it can do that one thing much faster
The author openly calls this the “anti-pitch” — if you need jq’s full language, jsongrep isn’t a replacement

Okay but seriously: most real-world JSON querying is path matching. You’re grepping logs, checking API responses, finding keys in config dumps. You don’t need a Turing-complete language for that.

⚙️ How the Engine Actually Works

The pipeline has five stages. This is where it gets interesting (for a certain definition of interesting):

Parse JSON — Uses serde_json_borrow for zero-copy deserialization. ~50ms for 190MB.
Parse the query — Your path expression becomes an AST (abstract syntax tree).
Build an NFA — Glushkov’s construction turns the AST into a nondeterministic automaton. No epsilon transitions.
Compile to DFA — Subset construction determinizes it. This is the expensive step.
Walk the tree — DFS traversal with DFA state transitions. O(1) per node. Done.
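Steps 3 and 4 can be sketched end to end in Rust. This is a minimal sketch under stated assumptions, not jsongrep's implementation: the `Query` AST, the `Step` input type and the `Class` alphabet are hypothetical. JSON keys form an unbounded set, so the sketch assumes a finite alphabet made of each literal key in the query plus "any other key" and "any index", which is one common way to make subset construction possible. `glushkov` computes the textbook nullable/first/last/follow sets, and `compile` runs subset construction over the resulting positions.

```rust
use std::collections::{BTreeSet, HashMap};

type Set = BTreeSet<usize>;
const START: usize = usize::MAX; // stands in for the NFA's initial state

/// Hypothetical query AST: `Key` is a literal key, `AnyKey` is `*`, `AnyIndex` is `[*]`.
pub enum Query {
    Key(String), AnyKey, AnyIndex,
    Seq(Box<Query>, Box<Query>), Alt(Box<Query>, Box<Query>), Star(Box<Query>),
}
/// One step down the JSON tree: the automaton's input symbols.
pub enum Step<'a> { Key(&'a str), Index(usize) }
/// Assumed finite alphabet: each literal key in the query, then "other key", then "any index".
enum Class { Key(String), OtherKey, Index }

/// Does the query leaf sitting at a Glushkov position accept inputs of class `c`?
fn leaf_accepts(leaf: &Query, c: &Class) -> bool {
    match (leaf, c) {
        (Query::Key(k), Class::Key(x)) => k == x,
        (Query::AnyKey, Class::Key(_) | Class::OtherKey) | (Query::AnyIndex, Class::Index) => true,
        _ => false,
    }
}
/// Glushkov: every leaf becomes one NFA position. Returns (nullable, first, last) and fills
/// `follow`; edges always land on a position, so no epsilon transitions are created.
fn glushkov<'q>(q: &'q Query, leaves: &mut Vec<&'q Query>, follow: &mut Vec<Set>) -> (bool, Set, Set) {
    match q {
        Query::Key(_) | Query::AnyKey | Query::AnyIndex => {
            leaves.push(q);
            follow.push(Set::new());
            (false, Set::from([leaves.len() - 1]), Set::from([leaves.len() - 1]))
        }
        Query::Seq(a, b) => {
            let ((na, fa, la), (nb, fb, lb)) = (glushkov(a, leaves, follow), glushkov(b, leaves, follow));
            for &p in &la { follow[p].extend(&fb); } // the end of `a` may continue into `b`
            (na && nb, if na { &fa | &fb } else { fa }, if nb { &la | &lb } else { lb })
        }
        Query::Alt(a, b) => {
            let ((na, fa, la), (nb, fb, lb)) = (glushkov(a, leaves, follow), glushkov(b, leaves, follow));
            (na || nb, &fa | &fb, &la | &lb)
        }
        Query::Star(a) => {
            let (_, fa, la) = glushkov(a, leaves, follow);
            for &p in &la { follow[p].extend(&fa); } // Kleene star: loop from last back to first
            (true, fa, la)
        }
    }
}

/// `trans[state][class]` is the next state, or `None` for the dead state (prune the subtree).
pub struct Dfa { classes: Vec<Class>, trans: Vec<Vec<Option<usize>>>, accepting: Vec<bool> }
/// Subset construction over the Glushkov NFA (this sketch does not minimize the result).
pub fn compile(q: &Query) -> Dfa {
    let (mut leaves, mut follow) = (Vec::new(), Vec::new());
    let (nullable, first, last) = glushkov(q, &mut leaves, &mut follow);
    let mut classes: Vec<Class> = leaves.iter() // duplicate keys only add an unused column
        .filter_map(|l| match l { Query::Key(k) => Some(Class::Key(k.clone())), _ => None })
        .collect();
    classes.extend([Class::OtherKey, Class::Index]);
    let mut sets = vec![Set::from([START])]; // DFA state 0 = {NFA start}
    let mut ids = HashMap::from([(sets[0].clone(), 0)]);
    let (mut trans, mut accepting) = (Vec::new(), Vec::new());
    while trans.len() < sets.len() {
        let set = sets[trans.len()].clone();
        accepting.push(set.iter().any(|&p| if p == START { nullable } else { last.contains(&p) }));
        let mut row = Vec::new();
        for c in &classes {
            let next: Set = set.iter()
                .flat_map(|&p| if p == START { first.iter() } else { follow[p].iter() })
                .copied().filter(|&n| leaf_accepts(leaves[n], c)).collect();
            if next.is_empty() { row.push(None); continue; }
            let id = match ids.get(&next) {
                Some(&id) => id,
                None => { sets.push(next.clone()); ids.insert(next, sets.len() - 1); sets.len() - 1 }
            };
            row.push(Some(id));
        }
        trans.push(row);
    }
    Dfa { classes, trans, accepting }
}

impl Dfa {
    /// One table lookup per step; class lookup scales with the query's keys, not the JSON.
    pub fn accepts(&self, path: &[Step]) -> bool {
        let (n, mut state) = (self.classes.len(), 0);
        for s in path {
            let class = match s {
                Step::Index(_) => n - 1,
                Step::Key(k) => self.classes.iter()
                    .position(|c| matches!(c, Class::Key(x) if x.as_str() == *k))
                    .unwrap_or(n - 2),
            };
            match self.trans[state][class] { Some(next) => state = next, None => return false }
        }
        self.accepting[state]
    }
}
```

For `(* | [*])*.id` this produces four DFA states where a minimal DFA needs only two; a minimization pass (Hopcroft's algorithm, for example) would merge the equivalent ones. Whether jsongrep minimizes is not covered in the post.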

The key insight: steps 2-4 depend only on the query, so they happen once no matter how big the file is. Step 5 is where time is actually spent on big files. And step 5 is fast because the DFA never backtracks, never maintains a worklist, never revisits subtrees.

jq, jmespath, and jsonpath-rust all do some form of step-by-step interpretation — which means they’re doing more work per node, every single time.

📊 Benchmark Numbers

Tested on a 190MB GeoJSON file (because nothing stress-tests JSON tools like geography):

Metric	jsongrep	jq	jmespath	jsonpath-rust	jql
Parse time	~50ms (zero-copy)	Moderate	Slowest	Moderate	Moderate
Compile time	Highest (DFA build)	Low	Low	Low	Low
Search time	~20-30ms	Much higher	Much higher	Much higher	Higher
End-to-end	Fastest	2-10x slower	10x+ slower	3-5x slower	2-5x slower

The tradeoff is visible: jsongrep spends more time compiling, but the search phase is so fast that it wins end-to-end anyway. On smaller files the advantage shrinks. On huge files, it’s embarrassing for the competition.

All benchmarks use Criterion (the Rust benchmarking framework). Four isolated categories: parse, compile, search, and end-to-end.

🔧 The Query Language

jsongrep’s syntax borrows from regex but applies it to JSON paths:

```shell
# Dot paths
cat data.json | jg 'users[0].name'

# Wildcards
cat data.json | jg 'users[*].email'

# Alternation (OR)
cat data.json | jg 'name | title'

# Recursive descent (find "id" at any depth)
cat data.json | jg '(* | [*])*.id'

# Shorthand for recursive descent
cat data.json | jg -F id

# Array ranges
cat data.json | jg 'items[1:3]'
```
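The query forms above behave like a small regular-expression language over path steps. A hypothetical Rust AST for them might look like the following; the `Q` enum and its printer are illustrative only (not jsongrep's parser), and the printer simply reproduces the example syntax. `anywhere` builds the recursive-descent pattern that the comments above pair with `jg -F id`. `Range` keeps the two numbers exactly as written; this sketch says nothing about whether the upper bound is inclusive.

```rust
use std::fmt;

/// Hypothetical AST for the query forms above (illustrative; not jsongrep's parser types).
pub enum Q {
    Key(String),         // `name`
    AnyKey,              // `*`
    Index(usize),        // `[0]`
    AnyIndex,            // `[*]`
    Range(usize, usize), // `[1:3]`, numbers kept exactly as written
    Seq(Vec<Q>),         // steps in order: `users[0].name`
    Alt(Vec<Q>),         // alternation: `name | title`
    Star(Box<Q>),        // Kleene star: `(...)*`
}

impl fmt::Display for Q {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Q::Key(k) => write!(f, "{k}"),
            Q::AnyKey => write!(f, "*"),
            Q::Index(i) => write!(f, "[{i}]"),
            Q::AnyIndex => write!(f, "[*]"),
            Q::Range(lo, hi) => write!(f, "[{lo}:{hi}]"),
            Q::Alt(alts) => {
                for (i, a) in alts.iter().enumerate() {
                    if i > 0 {
                        write!(f, " | ")?;
                    }
                    write!(f, "{a}")?;
                }
                Ok(())
            }
            // Compound bodies need parentheses before the star.
            Q::Star(body) if matches!(**body, Q::Alt(_) | Q::Seq(_)) => write!(f, "({body})*"),
            Q::Star(body) => write!(f, "{body}*"),
            Q::Seq(steps) => {
                for (i, s) in steps.iter().enumerate() {
                    // Bracket steps attach directly; every other step is joined with `.`.
                    let bracket = matches!(s, Q::Index(_) | Q::AnyIndex | Q::Range(..));
                    if i > 0 && !bracket {
                        write!(f, ".")?;
                    }
                    match s {
                        Q::Alt(_) => write!(f, "({s})")?,
                        _ => write!(f, "{s}")?,
                    }
                }
                Ok(())
            }
        }
    }
}

/// The recursive-descent pattern `(* | [*])*.<key>`: `key` at any depth.
pub fn anywhere(key: &str) -> Q {
    let any_step = Q::Alt(vec![Q::AnyKey, Q::AnyIndex]);
    Q::Seq(vec![Q::Star(Box::new(any_step)), Q::Key(key.to_string())])
}
```

Seen this way, `(* | [*])*` is just a Kleene star over "any key or any index", which is why it compiles into the same kind of automaton as every other query.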

Install it with `cargo install jsongrep`; cross-platform binaries are also available on GitHub. The binary is called `jg`.

💬 What People Are Saying

The Hacker News thread landed on the front page. Some notable takes:

The “jq is fine” camp: “I’ve been using jq for 10 years and never waited more than a second.” Fair — on small files, jq is plenty fast.
The “ripgrep for JSON” crowd: Multiple commenters made the same comparison. If ripgrep proved that compiling regex beats interpreting it, the same logic applies to JSON paths.
The “but no transforms” skeptics: jsongrep deliberately can’t filter by value or reshape output. Several commenters noted this makes it complementary to jq, not a replacement.
The Rust enthusiasts: Honestly, this is catnip for the “rewrite it in Rust” crowd. And this time the benchmarks actually back it up.

233 stars in a few weeks for a CLI JSON tool is… not nothing. That’s more than most people’s entire GitHub profiles.

🧠 Why This Matters Beyond Speed

The real story here isn’t “tool fast.” It’s the pattern.

ripgrep proved you could beat grep by compiling regex into automata instead of interpreting them. jsongrep applies the exact same insight to structured data. The question is: what’s next?

YAML? (Please someone do this.)
TOML? (Less urgent but still.)
Protocol buffers? (Now we’re talking.)

The Glushkov + DFA approach works on any tree-structured data where you’re doing path matching. This could become a template for an entire family of tools. Or it could stay a cool side project with 233 stars. The internet is fickle like that.

Cool. A Rust kid built a faster jq. Now What the Hell Do We Do? (ง •̀_•́)ง


🔍 Build a Log Analysis Microservice

Big JSON log files from APIs, cloud services, and monitoring tools are everywhere. jsongrep’s speed on 190MB+ files means you can build a log query service that returns results before Elasticsearch finishes indexing.

Wrap jg in a small HTTP API, point it at rotating log files, and sell it as a lightweight alternative to full log aggregation stacks.
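A minimal sketch of the core of such a wrapper, assuming `jg` is on `PATH` and, as in the `cat data.json | jg '...'` examples above, takes the query as its argument and reads JSON from stdin. `run_query` is a hypothetical helper name, not a jsongrep API, and the HTTP layer is left out because it depends on which web framework you pick.

```rust
use std::io::{self, Write};
use std::process::{Command, Stdio};

/// Pipe `input` into `<bin> <query>` and return its stdout: the programmatic
/// equivalent of `cat data.json | jg '<query>'`. `bin` is a parameter so the
/// helper can be tested without jg installed.
pub fn run_query(bin: &str, query: &str, input: &[u8]) -> io::Result<String> {
    let mut child = Command::new(bin)
        .arg(query)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()?;
    let mut stdin = child.stdin.take().expect("stdin was configured as piped");

    // Write stdin on a scoped thread while this thread drains stdout and stderr,
    // so a large result cannot fill a pipe and deadlock both processes.
    let (written, output) = std::thread::scope(|s| {
        // `stdin` is dropped when the closure returns, which closes the pipe (EOF).
        let writer = s.spawn(move || stdin.write_all(input));
        let output = child.wait_with_output();
        (writer.join().expect("stdin writer panicked"), output)
    });
    let output = output?;
    if let Err(e) = written {
        // BrokenPipe just means the tool exited before reading all of its input.
        if e.kind() != io::ErrorKind::BrokenPipe {
            return Err(e);
        }
    }
    if !output.status.success() {
        let msg = String::from_utf8_lossy(&output.stderr).into_owned();
        return Err(io::Error::new(io::ErrorKind::Other, msg));
    }
    String::from_utf8(output.stdout).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

Usage would be something like `run_query("jg", "users[*].email", &log_bytes)`. Writing stdin from a separate thread is the main design choice: writing all of a 200MB log before reading any output is only safe if the tool never writes until it has read everything, and the helper should not depend on that.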

Example: A DevOps freelancer in Lisbon, Portugal built a CloudWatch log analyzer using ripgrep and jq. Replaced the jq step with jsongrep, cut query time from 8 seconds to under 1 second on 200MB daily log dumps. Now sells the tool as a self-hosted SaaS to three mid-size startups at €200/month each.

Timeline: 2-3 weeks to build the wrapper API. First paying customer within a month if you target DevOps Slack communities.

📊 Create a JSON Benchmark Suite Product

Every new JSON tool needs benchmarks, and most developers don’t know how to set up Criterion properly. Build a standardized benchmark-as-a-service that tests JSON tools against realistic datasets (GeoJSON, API responses, nested configs).

Charge tool authors and companies to get their JSON library certified as “fast” with reproducible benchmarks.

Example: A computer science student in Seoul, South Korea created a benchmark comparison site for Python web frameworks. Started as a blog post, turned into a consulting gig when three framework maintainers paid $500 each for detailed profiling reports. Same model works for the JSON tooling space.

Timeline: 1-2 weeks for the benchmark suite. Monetize through Gumroad or sponsor slots within a month.

🛠️ Contribute Upstream and Build Your Rust Portfolio

jsongrep is MIT licensed, has only 2 contributors, and 233 stars. This is the sweet spot for open-source contribution — small enough that PRs get reviewed quickly, popular enough that it looks good on a resume.

The author explicitly lists missing features: no value filtering, no output transformation, no streaming support. Each of those is a meaningful PR.

Example: A junior developer in Nairobi, Kenya contributed streaming support to a 300-star Rust CLI tool. Got hired by a fintech startup within two months specifically because the CTO saw the PR during the interview process. Salary: $45K remote (2.5x local market rate).

Timeline: First PR in a weekend. Portfolio impact is immediate. Job-search leverage builds over 1-3 months.

💡 Build a VS Code Extension for JSON Path Queries

VS Code has 15+ million users. There’s no good extension that lets you query large JSON files with regex-style path syntax in the editor. jsongrep’s jg binary could power one.

Highlight matches in-editor, show path breadcrumbs, let users click through nested results. Sell it on the VS Code marketplace or keep it free and monetize with a Pro tier (batch queries, saved patterns, export).

Example: An indie dev in Krakow, Poland built a VS Code extension for SQL formatting. Free tier at 40K installs, Pro tier at $4.99/year. Makes ~$800/month passively. A jsongrep-powered JSON extension hits the same niche but for the API-heavy crowd.

Timeline: 2-4 weeks for MVP. Marketplace visibility grows organically after 1K installs (usually takes 2-3 months).

📝 Write the 'ripgrep Pattern' Technical Blog Series

The pattern of “compile queries into DFA instead of interpreting them” is a repeatable insight that most developers don’t understand. Write a blog series (or paid course) explaining how ripgrep, jsongrep, and similar tools apply compiler techniques to everyday CLI problems.

Target: mid-level developers who want to build fast tools but don’t have a compilers background.

Example: A systems programmer in Tallinn, Estonia wrote a 5-part blog series on “zero-copy parsing in Rust.” Got 80K total views, converted 200 readers into a paid Rust mentorship Discord at $15/month. That’s $3,000/month recurring from blog posts about memory allocation.

Timeline: One post per week for 5 weeks. Monetize via newsletter/course launch in month 2-3.

🛠️ Follow-Up Actions

Step	Action	Tool/Link
1	Install jsongrep	`cargo install jsongrep`
2	Read the full blog post	micahkepe.com/blog/jsongrep
3	Browse the source	github.com/micahkepe/jsongrep
4	Check open issues for contribution targets	GitHub Issues tab
5	Compare against your current jq workflows	Run both on your largest JSON file and time them
6	Join the HN discussion	Hacker News thread

Quick Hits

Want to…	Do this
Query huge JSON fast	`cargo install jsongrep` and use `jg` instead of `jq` for path searches
Understand the DFA trick	Read the blog post’s “Glushkov’s Algorithm” section — it’s actually readable
Contribute to a rising Rust project	Check jsongrep’s GitHub issues — streaming and value filtering are open targets
Benchmark your own JSON tools	Steal jsongrep’s Criterion setup as a template — it’s MIT licensed
Build something commercial	Wrap `jg` in an API for log analysis — big JSON files are everywhere and nobody wants to wait

jq walked so jsongrep could compile, determinize, and absolutely sprint.
