Client-Side Search

I’m currently experimenting with Pagefind instead of the solution described below.

This article explains how this site’s full-text search works, hopefully in enough detail that you could replicate it yourself. Try it out by clicking the “Search” button in the header above or pressing /. The UX is a rough clone of the standard DocSearch dialog, but behind the scenes there’s an important difference: the search results do not come from a third-party dynamic search service.

When you search, it all happens in your browser with JavaScript. Aside from a small search index JSON file that is downloaded the first time you open search, there’s no network traffic as you refine your query.

Motivation

A static, client-side search index is not the only way to add search to a site, and maybe not the easiest. If easier alternatives exist, why do this?

Performance. An in-browser search index can be lightning fast.
Customization. You have more control over what is indexed, how search behaves, and the user experience.
Security. To use a third-party search engine, you have to let them crawl your site. That may not be an option if the site is not publicly available.
Privacy. When your readers do searches against a hosted service, they leak precious information to the search provider.
Cost. If you outgrow or do not qualify for the free tier of a hosted solution, a static search could be much cheaper.

The advantages of a client-side search engine are laid out nicely in this blog post by Luca Ongaro, author of MiniSearch, the library showcased here.

How To

Regardless of the specific technical decisions you make, there are three main components to any site search: (1) the documents to search, (2) the search index, and (3) a UI to query the index and display results. Here’s how I did each of these with this site.

Gather Documents

Our goal for this stage is to somehow get a list of all the URLs and their corresponding content. We’ll use that list to compile the search index next.

Traditional search engines build that list by using a web crawler to download all the content from your site. We don’t need a crawler for our own static sites that have a build step, though. For example, suppose the documents you want to index are all either classic Markdown (.md) or MDX (.mdx) files. Astro already maps those files to URLs and renders them to HTML during development and deployment.
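To make “a list of URLs and their content” concrete, here is a minimal sketch of the file-to-URL mapping Astro does for us. It assumes paths relative to src/pages and directory-style URLs with a trailing slash; `pagePathToUrl`, `SourceDocument`, and `toSourceDocuments` are hypothetical names for illustration, not Astro APIs.

```typescript
// Hypothetical: maps a page file path (relative to src/pages) to its URL,
// assuming directory-style URLs with a trailing slash.
function pagePathToUrl(relativePath: string): string {
  // Drop the .md or .mdx extension.
  const withoutExt = relativePath.replace(/\.mdx?$/, "");
  // Split into segments, ignoring empty and "." segments.
  const segments = withoutExt.split("/").filter((s) => s !== "" && s !== ".");
  // An index page is served at its folder's URL.
  if (segments[segments.length - 1] === "index") segments.pop();
  return segments.length === 0 ? "/" : `/${segments.join("/")}/`;
}

// One entry in the list we want to index: a URL plus its content.
interface SourceDocument {
  url: string;
  content: string;
}

// Turns { "blog/post.md": "...content..." } into URL/content pairs.
function toSourceDocuments(files: Record<string, string>): SourceDocument[] {
  return Object.entries(files).map(([path, content]) => ({
    url: pagePathToUrl(path),
    content,
  }));
}
```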

Theoretically, we could get a list of all documents using Astro.glob’s cool uncle import.meta.glob. This is how other plugins like @astrojs/rss work, so it seems promising. If all our pages were traditional .md files, we might even be able to loop through that collection, call the compiledContent() method, and then strip out the HTML. Not only is that very awkward,¹ but unfortunately it won’t work for .mdx files.²

Instead, we can create a custom plugin that extracts the plain text and injects it into the frontmatter object. Later, when we build the index, we can pull the plain text back out of the frontmatter. As convoluted as that sounds, it appears to be the recommended approach.
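Here is a rough sketch of what such a rehype plugin could look like, following the pattern in Astro’s docs of writing to `file.data.astro.frontmatter`. It is not the source of @barnabask/astro-minisearch: `HastNode` and `VFileLike` are minimal stand-ins for the real hast and VFile types, and the `plainTextSketch` name and `plainText` field are assumptions.

```typescript
// Minimal stand-in for the hast nodes a rehype plugin receives.
interface HastNode {
  type: string; // "root", "element", "text", "comment", ...
  tagName?: string;
  value?: string;
  children?: HastNode[];
}

// Minimal stand-in for the VFile; Astro reads frontmatter from data.astro.frontmatter.
interface VFileLike {
  data: { astro?: { frontmatter?: Record<string, unknown> } };
}

const SKIP_TAGS = new Set(["script", "style"]);
const BLOCK_TAGS = new Set(["p", "h1", "h2", "h3", "h4", "h5", "h6", "li", "pre", "blockquote", "div", "td", "th"]);

// Recursively collect text, padding block elements with spaces so words don't merge.
function toPlainText(node: HastNode): string {
  if (node.type === "text") return node.value ?? "";
  if (node.type === "element" && SKIP_TAGS.has(node.tagName ?? "")) return "";
  const inner = (node.children ?? []).map(toPlainText).join("");
  return node.type === "element" && BLOCK_TAGS.has(node.tagName ?? "") ? ` ${inner} ` : inner;
}

// Hypothetical plugin factory: stores the page's plain text in its frontmatter.
function plainTextSketch(field = "plainText") {
  return (tree: HastNode, file: VFileLike): void => {
    const text = toPlainText(tree).replace(/\s+/g, " ").trim();
    const astro = (file.data.astro ??= {});
    const frontmatter = (astro.frontmatter ??= {});
    frontmatter[field] = text;
  };
}
```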

For your convenience, I included a plaintext plugin in @barnabask/astro-minisearch. If you’re curious how the plaintext plugin works, the source is available here. Once you’ve installed the plugin, you can add it to your astro.config.mjs file like so:

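import vue from "@astrojs/vue"; // or whichever UI framework integration you use
import { plainTextPlugin } from "@barnabask/astro-minisearch";

export default {
  site: "https://example.com",
  integrations: [vue()],
  markdown: {
    // Astro v1 option: keep Astro's default Markdown plugins alongside ours.
    // (Astro v2 dropped this option and keeps the defaults automatically.)
    extendDefaultPlugins: true,
    // Adds each page's plain text to its frontmatter at build time.
    rehypePlugins: [plainTextPlugin()],
  },
};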

To check that it worked, you could output the contents of your frontmatter (which now includes all of the content in plain text) on the current page. Astro’s <Debug /> component is an easy way to do that. You might expect that to create an endless loop, but it doesn’t.³

Compile the Index

The next step is to compile the search index. To be fair, if you have a tiny number of documents then this part may not even be necessary. For example, Astro’s client-side integrations search doesn’t use a pre-built index and it works Just Fine™.⁴
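For a handful of documents, a linear scan over every page on each keystroke is often enough. This sketch assumes each page has already been reduced to a `url`, `title`, and plain `text` (the `PageSummary` shape and `naiveSearch` name are hypothetical); it shows only the general no-index idea, not how Astro’s integrations page is built.

```typescript
interface PageSummary {
  url: string;
  title: string;
  text: string;
}

// No index: lowercase substring match on every query word, for every page.
function naiveSearch(pages: PageSummary[], query: string): PageSummary[] {
  const words = query.toLowerCase().split(/\s+/).filter(Boolean);
  if (words.length === 0) return [];
  return pages.filter((page) => {
    const haystack = `${page.title} ${page.text}`.toLowerCase();
    // A page matches only if it contains every query word.
    return words.every((word) => haystack.includes(word));
  });
}
```

The cost grows with the total amount of text on every query, which is exactly what a prebuilt index avoids.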

On the other hand, if you have between twenty and 10,000 documents, then a search index might be just what you need. If you installed the NPM package, then you already have the excellent MiniSearch library. Create a new file at src/pages/search.json.mjs like this:

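import {
  getSearchIndex,
  pagesGlobToDocuments,
} from "@barnabask/astro-minisearch";

// Astro endpoint: this file is served (and built) as /search.json.
export async function get() {
  // Collect the .md and .mdx pages under src/pages as search documents.
  const pagesDocs = await pagesGlobToDocuments(import.meta.glob(`./**/*.md*`));
  // Build the MiniSearch index and return it as the endpoint's response.
  return getSearchIndex(pagesDocs);
}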

Once you make these changes and visit http://localhost:3000/search.json, you should see the serialized representation of your search index. You can look at mine at https://barnabas.me/search.json.⁵
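If “serialized representation” sounds abstract, here is a deliberately simplified stand-in: an inverted index mapping each term to the ids of the documents that contain it, serialized to JSON at build time and parsed back in the browser. MiniSearch’s actual JSON format is different and also stores what it needs for relevance scoring; all names below are hypothetical.

```typescript
interface IndexableDoc {
  id: string;
  text: string;
}

// Lowercase and split on anything that isn't an ASCII letter or digit (kept simple on purpose).
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// term -> ids of the documents containing it (each id listed once per term)
function buildIndex(docs: IndexableDoc[]): Map<string, string[]> {
  const index = new Map<string, string[]>();
  for (const doc of docs) {
    for (const term of new Set(tokenize(doc.text))) {
      const ids = index.get(term);
      if (ids) ids.push(doc.id);
      else index.set(term, [doc.id]);
    }
  }
  return index;
}

// Build time: this string is what a /search.json endpoint would send.
function serializeIndex(index: Map<string, string[]>): string {
  return JSON.stringify([...index]);
}

// Run time: parse once in the browser; after that every query is a cheap lookup.
function loadSerializedIndex(json: string): Map<string, string[]> {
  return new Map(JSON.parse(json) as [string, string[]][]);
}

function lookup(index: Map<string, string[]>, term: string): string[] {
  return index.get(term.toLowerCase()) ?? [];
}
```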

If you are using Astro v2’s content collections feature, then the code above is not a complete solution. In March of 2023 I updated @barnabask/astro-minisearch to support content collections, so check out the full documentation there.

While your local Astro dev server is running, the search index is rebuilt each time you request the /search.json endpoint. It’s quick to generate and a reasonable size for client-side scripts. As a single data point: as of March 2023, this site’s search index contains 88 search documents (each heading counts as a separate search document). It takes anywhere from 60 to 300 ms to generate on my laptop, and it’s 140 KB uncompressed.
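Treating each heading as its own search document lets a result link straight to a section. A minimal sketch, assuming Markdown input with #-style headings and anchors produced by the hypothetical `slugify` below; it is not necessarily how @barnabask/astro-minisearch splits pages, and it would misread # lines inside code blocks.

```typescript
interface SearchDocument {
  id: string; // page URL, plus a #fragment for sections
  title: string; // page title
  heading?: string; // section heading, if any
  text: string; // plain text of the intro or the section
}

// Hypothetical slug rule; reuse whatever generates your real heading ids.
function slugify(heading: string): string {
  return heading
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

function splitByHeadings(url: string, title: string, markdown: string): SearchDocument[] {
  // The first document holds any text before the first heading.
  const docs: SearchDocument[] = [{ id: url, title, text: "" }];
  for (const rawLine of markdown.split("\n")) {
    const line = rawLine.trim();
    if (line === "") continue;
    const match = /^#{1,6}\s+(.+)$/.exec(line);
    if (match) {
      const heading = match[1].trim();
      docs.push({ id: `${url}#${slugify(heading)}`, title, heading, text: "" });
    } else {
      // Append the line to whichever section we are currently in.
      const current = docs[docs.length - 1];
      current.text = current.text ? `${current.text} ${line}` : line;
    }
  }
  // Drop the intro document if the page starts straight with a heading.
  return docs.filter((doc, i) => i > 0 || doc.text !== "");
}
```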

The Astro documentation template uses a separate directory for each language. If you’re using that or a similar setup, I recommend putting a separate copy of search.json.* at the root of each language folder, for two reasons. First, I think it’s more user-friendly to separate each search index by language. Even bilingual people will probably want to search in one language at a time, so why make them download a larger search index? Second, it’s practical: your build may be quicker thanks to parallelization, and it’s just a few lines of code. Bear in mind that the glob functions can’t take variables, so a dynamic search path won’t work.
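Because the glob pattern has to be a literal, any per-language logic must happen after the glob runs. One way to share code between the per-language endpoints is to glob everything once and partition the documents by URL prefix. A sketch, assuming the language code is the first path segment (e.g. /en/guide/ or /fr/guide/) and each document carries its URL in `id`; `groupByLanguage` is a hypothetical helper.

```typescript
function groupByLanguage<T extends { id: string }>(
  docs: readonly T[],
  languages: readonly string[],
): Map<string, T[]> {
  const groups = new Map<string, T[]>();
  for (const doc of docs) {
    // "/en/guide/" splits into ["", "en", "guide", ""].
    const lang = doc.id.split("/")[1] ?? "";
    // Skip pages outside the language folders, e.g. a root-level /about/.
    if (!languages.includes(lang)) continue;
    const group = groups.get(lang);
    if (group) group.push(doc);
    else groups.set(lang, [doc]);
  }
  return groups;
}
```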

Create a UI

The search UI I implemented here is fairly specific to this site, but I’ll give you some starter markup and code anyway. The design goal is to mimic the typical DocSearch search form. To spell it out:

The visitor opens the search modal dialog either with a global hotkey or by clicking a button in the header.
The query input field should have focus while the dialog is open.
As the visitor types, search results appear instantly under the query input, most relevant first.
Search results are clickable links, with the title as the link text.
The Down and Up arrow keys move the active selection through the search results; Enter goes to the selected one.
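The arrow-key behavior in the last point boils down to a small pure function. A sketch, assuming “cycle” means wrapping from the last result back to the first and vice versa, with -1 meaning nothing is selected yet; `nextActiveIndex` is a hypothetical name, not code from this site.

```typescript
// Returns the new active result index after a keypress.
function nextActiveIndex(current: number, key: string, resultCount: number): number {
  if (resultCount === 0) return -1;
  // The result list may have shrunk since the last keypress.
  const from = current < resultCount ? current : -1;
  switch (key) {
    case "ArrowDown":
      return (from + 1) % resultCount; // past the last result wraps to the first
    case "ArrowUp":
      return from <= 0 ? resultCount - 1 : from - 1; // before the first wraps to the last
    default:
      return from;
  }
}
```

Keeping it pure makes it easy to test; Enter can then navigate to the `id` of the result at the returned index.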

I still like Vue, and the Vue integration was already enabled, so I created two new SFCs in the src/components directory. SearchDialog.vue is the component that gets loaded asynchronously. Here’s some of the main code, minus the parts that manage the dialog and selected-item state:

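<script setup lang="ts">
import { computed, ref } from "vue";
import { DialogPanel } from "@headlessui/vue"; // assumed source of DialogPanel
import { mande } from "mande";
import type { AsPlainObject } from "minisearch";
import { loadIndex } from "@barnabask/astro-minisearch";

const query = ref("");
// Top-level await makes this an async component, so it must render inside <Suspense>.
const searchJson = await mande("/").get<AsPlainObject>("search.json");
const searchIndex = loadIndex(searchJson);

const searchResults = computed(() => {
  return searchIndex.search(query.value, {
    boost: { title: 2, headings: 1.5 }, // weight title and heading matches more heavily
    prefix: (term) => term.length > 2, // "sea" also matches "search"
    fuzzy: (term) => (term.length > 2 ? 0.2 : false), // tolerate small typos in longer terms
  });
});
</script>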
<template>
  <DialogPanel>
    <div class="mt-2">
      <input placeholder="Search" type="search" v-model="query" />
    </div>
    <ul>
      <li v-for="(result, index) in searchResults" :key="result.id">
        <a :href="result.id">
          <span class="font-bold">{{ result.title }}</span>
          <span v-if="result.heading" class="text-sm">{{
            result.heading
          }}</span>
        </a>
      </li>
    </ul>
  </DialogPanel>
</template>

This demonstrates how to instantiate the search index, get search results and bind them to UI elements. This also shows that you can get pretty fancy with search options.
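To get a feel for what the `prefix` and `fuzzy` options above do, here is a standalone sketch. It assumes MiniSearch treats a fuzzy value below 1 as a fraction of the term length, rounded to the nearest whole edit distance, which is how I read its documentation. The Levenshtein function and the `prefixMatches`/`fuzzyMatches` helpers are illustrations, not MiniSearch internals (which search a radix tree rather than comparing terms one by one).

```typescript
// Classic Levenshtein distance: insertions, deletions and substitutions each cost 1.
function levenshtein(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Same rules as the dialog's options.
const prefixOption = (term: string): boolean => term.length > 2;
const fuzzyOption = (term: string): number | false => (term.length > 2 ? 0.2 : false);

// Assumption: a fraction below 1 becomes round(fraction * term length) allowed edits.
function maxEditDistance(term: string): number {
  const fuzzy = fuzzyOption(term);
  return fuzzy === false ? 0 : Math.round(fuzzy * term.length);
}

function prefixMatches(term: string, indexedTerm: string): boolean {
  return prefixOption(term) && indexedTerm.startsWith(term);
}

function fuzzyMatches(term: string, indexedTerm: string): boolean {
  return levenshtein(term, indexedTerm) <= maxEditDistance(term);
}
```

With these rules, a two-letter term must match exactly, while a five- or six-letter term tolerates one typo.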

The other component isn’t really search-specific, but it was an opportunity to learn something about Vue and Astro. SearchButton.vue looks something like this:

<script setup lang="ts">
import { defineAsyncComponent, ref } from "vue";

// Split SearchDialog.vue (and its search.json fetch) into a separate chunk.
const SearchDialog = defineAsyncComponent(() => import("./SearchDialog.vue"));
// Once true, stays true: the dialog isn't mounted (or downloaded) until the first open.
const hasOpened = ref(false);
const isOpen = ref(false);

function openSearch() {
  hasOpened.value = true;
  isOpen.value = true;
}
</script>
<template>
  <button @click="openSearch()">Search</button>
  <kbd>/</kbd>
  <template v-if="hasOpened">
    <Suspense>
      <SearchDialog v-model="isOpen" />
    </Suspense>
  </template>
</template>

This component gets embedded in the header with <SearchButton client:idle />. By using <Suspense> together with defineAsyncComponent, we can defer loading <SearchDialog /> and search.json until absolutely necessary. In production the delay is hardly noticeable, and my precious Lighthouse score remains intact.⁶
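Outside of Vue, the same “load on first use” idea is a memoized promise. A framework-agnostic sketch; `once` is a hypothetical helper, and in a browser the loader would typically be something like `() => fetch("/search.json").then((r) => r.json())`.

```typescript
// Starts the work on the first call; later calls share the same promise.
function once<T>(load: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined;
  return () => {
    pending ??= load().catch((error: unknown) => {
      pending = undefined; // allow a retry after a failed request
      throw error;
    });
    return pending;
  };
}
```

Sharing the promise (rather than the resolved value) also covers the case where the dialog is opened twice before the first download finishes.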

Issues and Enhancements

There are a few minor known bugs and missing nice-to-haves. For example, in Firefox the slash (/) key conflicts with “quick find”. I’d like to make Ctrl + K or Cmd + K work eventually. MiniSearch has an autoSuggest method, and I’d like to make it work the way you’d expect. As of v6, MiniSearch also supports incremental indexing, which could help development reload performance in extreme situations. There are cool search options that might be nice to let the user turn on and off, but that could be confusing too. There should also be a message when no results are found.
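For the Ctrl + K / Cmd + K wish, one possible approach is a small predicate over the keydown event. A sketch: `KeyLike` stands in for the KeyboardEvent fields it reads, `isSearchShortcut` is a hypothetical name, and it ignores / while the visitor is typing in a form field.

```typescript
// The subset of KeyboardEvent this check reads (a real KeyboardEvent fits this shape).
interface KeyLike {
  key: string;
  ctrlKey: boolean;
  metaKey: boolean;
  altKey: boolean;
  target: unknown;
}

// True when focus is in a text field, where "/" should just type a slash.
function isTypingTarget(target: unknown): boolean {
  if (typeof target !== "object" || target === null) return false;
  const el = target as { tagName?: unknown; isContentEditable?: unknown };
  const tag = typeof el.tagName === "string" ? el.tagName.toUpperCase() : "";
  return tag === "INPUT" || tag === "TEXTAREA" || tag === "SELECT" || el.isContentEditable === true;
}

// "/" outside text fields, or Ctrl+K / Cmd+K anywhere.
function isSearchShortcut(event: KeyLike): boolean {
  if (event.altKey) return false;
  if ((event.ctrlKey || event.metaKey) && event.key.toLowerCase() === "k") return true;
  return event.key === "/" && !event.ctrlKey && !event.metaKey && !isTypingTarget(event.target);
}
```

In the browser, you would call it from a `keydown` listener and call `preventDefault()` when it returns true, which should also keep Firefox’s quick find from grabbing the slash.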

Search results could use some dressing up too. It might be good to show more detail about each result, such as score, description, document type (article or blog post) or pub date.

Alternatives

If you decide not to do the above (understandable), here are some other options that might fit your needs better:

Algolia: the company behind DocSearch is happy to sell you something more powerful.
DocSearch: the search engine that powers Astro’s documentation.
Elastic Search UI: this also exists.
Pagefind: a static search library that works with any SSG.
Stork: a command-line tool and JS library that also builds a static search index; not Astro-specific either.
Typesense: an open source Algolia alternative you can host yourself or have them do it for you.
If the content all comes from a CMS then the CMS might have a search function.
If the search index is too large to download but you still like MiniSearch, consider an SSR API route.

Changelog

2020-09-20: Initial version
2020-10-10: Updated with NPM package
2023-03-11: Updated for Astro v2 with content collections
2023-04-02: Added Pagefind and disclaimer at the top.

¹ I mean, think about it. You’d be writing in one markup language (Markdown), then converting it to HTML, then stripping the HTML tags out again. This is the programming equivalent of mailing someone a printed photograph of your computer screen playing a video.
² Specifically, if you try this on an MDX file you’ll see an error that says: “MDX does not support compiledContent()! If you need to read the HTML contents to calculate values (ex. reading time), we suggest injecting frontmatter via rehype plugins. Learn more on our docs: https://docs.astro.build/en/guides/integrations-guide/mdx/#inject-frontmatter-via-remark-or-rehype-plugins”. So be it.
³ I tried it and it was fine. The galaxy was not converted into paper clips.
⁴ It is fine, but it could be a little better. For example, I wish the text search covered all of the “collections”, not just the currently selected one. That is kind of the whole point of search.
⁵ “Now hang on a second,” you’re thinking, “couldn’t you reverse-engineer the original content from the search index?” To which I say, (1) if it shouldn’t be in the search index, then you should filter it out in the previous step, and (2) technically the content is there, but in more of a blended-up word-cloud form. Not great reading.
⁶ Apparently that’s very important.