I’ve been running a little RSS aggregator called Dev Signal for a while now. It pulls from 17+ developer leadership sources — newsletters, blogs, podcasts show notes — and dumps everything into a Next.js 14 app backed by SQLite. Personal use, shared with a small team.
At some point the article count hit 50k and searching with LIKE '%keyword%' started feeling noticeably slow. More importantly, ranking by relevance was basically nonexistent. So I finally sat down to add real full-text search.
My first instinct was “just use Algolia” or spin up Elasticsearch. But both felt like massive overkill for what is genuinely a personal tool. I was already on SQLite (via better-sqlite3). Turns out SQLite ships with FTS5, a full-text search extension, and it’s actually pretty capable.
What FTS5 Gets You
FTS5 is a virtual table module in SQLite. You create a shadow table that indexes your text columns, and it gives you a MATCH operator plus BM25 relevance ranking out of the box. No external process, no network round-trips, no infra to maintain.
For my use case:
- Sub-10ms query times on 50k rows
- BM25 ranking that actually surfaces relevant results
- Zero new dependencies (already on
better-sqlite3)
Setting It Up
First, create the FTS5 virtual table. The content=articles part tells FTS5 to use articles as the backing table — it stores only the index, not the raw text, so you don’t double your storage.
CREATE VIRTUAL TABLE IF NOT EXISTS articles_fts
USING fts5(
title,
content,
title_zh,
content_zh,
content=articles,
content_rowid=id
);
Then seed it from the existing data:
INSERT INTO articles_fts(rowid, title, content, title_zh, content_zh)
SELECT id, title, content, title_zh, content_zh FROM articles;
Keeping It In Sync With Triggers
The annoying part is the FTS index doesn’t update automatically when you insert or update rows in articles. You need triggers.
CREATE TRIGGER articles_ai AFTER INSERT ON articles BEGIN
INSERT INTO articles_fts(rowid, title, content, title_zh, content_zh)
VALUES (new.id, new.title, new.content, new.title_zh, new.content_zh);
END;
CREATE TRIGGER articles_ad AFTER DELETE ON articles BEGIN
DELETE FROM articles_fts WHERE rowid = old.id;
END;
CREATE TRIGGER articles_au AFTER UPDATE ON articles BEGIN
UPDATE articles_fts
SET title = new.title,
content = new.content,
title_zh = new.title_zh,
content_zh = new.content_zh
WHERE rowid = old.id;
END;
It’s a bit verbose but this is the standard FTS5 pattern for content tables. Write it once, forget it.
Querying With BM25 Ranking
In Node.js with better-sqlite3, the query looks like this:
const search = db.prepare(`
SELECT
a.id,
a.title,
a.published_at,
a.source,
bm25(articles_fts) AS rank
FROM articles_fts
JOIN articles a ON a.id = articles_fts.rowid
WHERE articles_fts MATCH ?
ORDER BY rank
LIMIT 20
`);
const results = search.all(query);
BM25 scores are negative in SQLite FTS5 (more negative = more relevant), so ORDER BY rank without DESC gives you the best results first. Tripped me up the first time.
Where It Falls Short
A few real limitations I ran into:
No fuzzy matching. If you search “indexing” and an article says “indexes”, FTS5 won’t connect those unless you use the * prefix operator (index*). Elasticsearch handles stemming much better.
Chinese tokenization is rough. FTS5’s default tokenizer splits on whitespace, which breaks completely for CJK text. I ended up pre-processing Chinese content with a jieba-style segmenter before inserting it, adding spaces between words so the tokenizer has something to work with. It’s a hack but it works well enough for my traffic volume.
No distributed search. But I’m on a VPS with one SQLite file, so this wasn’t even a consideration.
WAL Mode Matters
One thing worth calling out: if you’re doing concurrent reads (Next.js API routes hit the DB in parallel), make sure you’re in WAL mode:
PRAGMA journal_mode=WAL;
Without it, a write locks the whole database and readers block. With WAL, readers and writers don’t block each other. Made a noticeable difference during page loads that fire multiple queries.
The whole FTS5 setup took me an afternoon. For a personal project at this scale, it’s genuinely all I needed — no new infra, no API keys, just SQL.