[CODE]

SQLite FTS5 Is All You Need for Small-Scale Full-Text Search

I built a developer news aggregator and added full-text search with SQLite FTS5. No Elasticsearch, no Algolia, just the database I was already using.

4 min read
sqlite full-text-search nodejs database

I’ve been running a little RSS aggregator called Dev Signal for a while now. It pulls from 17+ developer leadership sources — newsletters, blogs, podcasts show notes — and dumps everything into a Next.js 14 app backed by SQLite. Personal use, shared with a small team.

At some point the article count hit 50k and searching with LIKE '%keyword%' started feeling noticeably slow. More importantly, ranking by relevance was basically nonexistent. So I finally sat down to add real full-text search.

My first instinct was “just use Algolia” or spin up Elasticsearch. But both felt like massive overkill for what is genuinely a personal tool. I was already on SQLite (via better-sqlite3). Turns out SQLite ships with FTS5, a full-text search extension, and it’s actually pretty capable.

What FTS5 Gets You

FTS5 is a virtual table module in SQLite. You create a shadow table that indexes your text columns, and it gives you a MATCH operator plus BM25 relevance ranking out of the box. No external process, no network round-trips, no infra to maintain.

For my use case:

  • Sub-10ms query times on 50k rows
  • BM25 ranking that actually surfaces relevant results
  • Zero new dependencies (already on better-sqlite3)

Setting It Up

First, create the FTS5 virtual table. The content=articles part tells FTS5 to use articles as the backing table — it stores only the index, not the raw text, so you don’t double your storage.

CREATE VIRTUAL TABLE IF NOT EXISTS articles_fts
USING fts5(
  title,
  content,
  title_zh,
  content_zh,
  content=articles,
  content_rowid=id
);

Then seed it from the existing data:

INSERT INTO articles_fts(rowid, title, content, title_zh, content_zh)
SELECT id, title, content, title_zh, content_zh FROM articles;

Keeping It In Sync With Triggers

The annoying part is the FTS index doesn’t update automatically when you insert or update rows in articles. You need triggers.

CREATE TRIGGER articles_ai AFTER INSERT ON articles BEGIN
  INSERT INTO articles_fts(rowid, title, content, title_zh, content_zh)
  VALUES (new.id, new.title, new.content, new.title_zh, new.content_zh);
END;

CREATE TRIGGER articles_ad AFTER DELETE ON articles BEGIN
  DELETE FROM articles_fts WHERE rowid = old.id;
END;

CREATE TRIGGER articles_au AFTER UPDATE ON articles BEGIN
  UPDATE articles_fts
  SET title = new.title,
      content = new.content,
      title_zh = new.title_zh,
      content_zh = new.content_zh
  WHERE rowid = old.id;
END;

It’s a bit verbose but this is the standard FTS5 pattern for content tables. Write it once, forget it.

Querying With BM25 Ranking

In Node.js with better-sqlite3, the query looks like this:

const search = db.prepare(`
  SELECT
    a.id,
    a.title,
    a.published_at,
    a.source,
    bm25(articles_fts) AS rank
  FROM articles_fts
  JOIN articles a ON a.id = articles_fts.rowid
  WHERE articles_fts MATCH ?
  ORDER BY rank
  LIMIT 20
`);

const results = search.all(query);

BM25 scores are negative in SQLite FTS5 (more negative = more relevant), so ORDER BY rank without DESC gives you the best results first. Tripped me up the first time.

Where It Falls Short

A few real limitations I ran into:

No fuzzy matching. If you search “indexing” and an article says “indexes”, FTS5 won’t connect those unless you use the * prefix operator (index*). Elasticsearch handles stemming much better.

Chinese tokenization is rough. FTS5’s default tokenizer splits on whitespace, which breaks completely for CJK text. I ended up pre-processing Chinese content with a jieba-style segmenter before inserting it, adding spaces between words so the tokenizer has something to work with. It’s a hack but it works well enough for my traffic volume.

No distributed search. But I’m on a VPS with one SQLite file, so this wasn’t even a consideration.

WAL Mode Matters

One thing worth calling out: if you’re doing concurrent reads (Next.js API routes hit the DB in parallel), make sure you’re in WAL mode:

PRAGMA journal_mode=WAL;

Without it, a write locks the whole database and readers block. With WAL, readers and writers don’t block each other. Made a noticeable difference during page loads that fire multiple queries.

The whole FTS5 setup took me an afternoon. For a personal project at this scale, it’s genuinely all I needed — no new infra, no API keys, just SQL.