Client-Side Search

This article explains how this site’s full-text search works, hopefully in enough detail that you can replicate it yourself. Try it out by clicking the “Search” button in the header above or by pressing /. The UX is a rough clone of the standard DocSearch dialog, but behind the scenes there’s an important difference: the search results do not come from a third-party dynamic search service.

When you do a search, it’s all happening in your browser with JavaScript. Aside from a small search index JSON file that is downloaded at the beginning, there’s no network traffic as you refine your query.

Motivation

A static, client-side search index is not the only way to add search to a site, and maybe not the easiest. If easier alternatives exist, why do this?

The advantages of a client-side search engine are laid out nicely in this blog post by Luca Ongaro, author of MiniSearch, the library showcased here.

How To

Regardless of the specific technical decisions you make, there are three main components to any site search: (1) gathering documents to search, (2) compiling the search index, and (3) creating a UI to query the index and display results. Here’s how I did each of these with this site.

Gather Documents

Our goal for this stage is to somehow get a list of all the URLs and their corresponding content. We’ll use that list to compile the search index next.

Traditional search engines create that list by using a web crawler to download all the content from your site. We don’t need a crawler for our own static sites that have a build step, though. In my case, the documents I want to index are all either classic Markdown .md files or .mdx files. The file you’re reading right now is …/src/pages/articles/client-side-search.mdx. Astro handles mapping those files to URLs and rendering them to HTML during development and deployment.

Theoretically, we could get a list of all documents using Astro.glob’s cool uncle import.meta.glob. This is how other plugins like @astrojs/rss work, so it seems promising. If all our pages were traditional .md files then we might even be able to loop through that collection and call the compiledContent() method, then maybe strip out the HTML. Not only is that very awkward,1 but unfortunately it won’t work for .mdx files.2

Instead, we can create a custom rehype plugin to extract the plain text and inject it into the frontmatter object. Later, when we’re building the index, we can pull the plain text back out of the frontmatter. As convoluted as all that sounds, this appears to be the recommended way.

First, install the hast-util-to-text package as a dev dependency since it’s only used during the build. Then create the file src/plain-text-plugin.mjs:

import { toText } from 'hast-util-to-text';

export function plainTextPlugin() {
  return (tree, { data }) => {
    data.astro.frontmatter.plainText = toText(tree);
  };
}
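
For intuition about what ends up in plainText, here is a rough sketch of pulling text out of a hast-style tree. This is not how hast-util-to-text is implemented (the real package also deals with whitespace, line breaks, and more); the collectText helper and the sample tree are made up for illustration.

```javascript
// Simplified, hypothetical stand-in for toText(): concatenate every text node.
function collectText(node) {
  if (node.type === "text") return node.value;
  // Elements and the root keep their content in `children`; other nodes have none.
  return (node.children ?? []).map(collectText).join("");
}

// A tiny hand-written tree, roughly what "<p>Hello <em>world</em></p>" parses to.
const sampleTree = {
  type: "root",
  children: [
    {
      type: "element",
      tagName: "p",
      properties: {},
      children: [
        { type: "text", value: "Hello " },
        {
          type: "element",
          tagName: "em",
          properties: {},
          children: [{ type: "text", value: "world" }],
        },
      ],
    },
  ],
};

console.log(collectText(sampleTree)); // "Hello world"
```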

Once you have your custom plugin, add it to the rehype plugins in your astro.config.mjs file like so:

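// astro.config.mjs sits at the project root, so the path includes src/.
import { plainTextPlugin } from './src/plain-text-plugin.mjs';

export default {
  site: "https://example.com",
  // frameworkDuJour() is a placeholder for whichever integrations you already use.
  integrations: [frameworkDuJour()],
  markdown: {
    extendDefaultPlugins: true,
    rehypePlugins: [plainTextPlugin]
  },
};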

To see if it worked, it’s tempting to output the contents of your frontmatter (which now includes all of the content in plain text) to the current page. That’s a bad idea because it will create an endless loop.3 Instead, create an endpoint called src/pages/search.json.mjs (or .ts) that will eventually become the search index. For now, it will just show us what we’ve accomplished.

function markdownToDocument(markdown) {
  const { url, frontmatter } = markdown;
  const headings = markdown.getHeadings().map((h) => h.text).join(" ");

  return { url, headings, ...frontmatter };
}

export const get = () => {
  const pagesObject = import.meta.glob("./**/*.md*", { eager: true });
  const documents = Object.values(pagesObject).map(markdownToDocument);

  return {
    body: JSON.stringify(documents),
  };
};

If there were some pages we wanted to hide from the search, we’d filter them out here. Similarly, if there were additional URLs with other content to include (such as from a CMS) then we could add it here too. Astute readers will notice we’ve taken the opportunity to flatten the document headings into a separate field. The function markdownToDocument makes an object with url, headings, and whatever is in frontmatter. With your Astro dev server running, open http://localhost:3000/search.json to preview your document list. Depending on your setup, you should see something like this:

[
  {
    "url": "/articles/client-side-search",
    "headings": "Motivation How To Gather Documents Compile the Index Alternatives Footnotes",
    "plainText": "This article will explain how this site’s full-text search works, ...",
    "title": "Client-Side Search",
    "description": "How to add custom client-side full-text search to an Astro-powered website.",
    "layout": "@/layouts/Article.astro"
  },
  { ...

If the plainText property is a blank string, double check that you’ve added the plugin to your rehype plugins, not remark plugins. If everything looks good, hey nice attention to detail, You. Onward!
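
The filtering mentioned above could be sketched like this. The searchable frontmatter flag is a convention invented for this example, not something Astro defines, and the sample objects only mimic the shape produced by markdownToDocument.

```javascript
// Hypothetical convention: a page opts out with `searchable: false` in its frontmatter.
function isSearchable(doc) {
  return doc.searchable !== false;
}

// Made-up documents in the shape returned by markdownToDocument.
const documents = [
  { url: "/articles/client-side-search", title: "Client-Side Search" },
  { url: "/drafts/unfinished", title: "Unfinished", searchable: false },
];

const visibleDocuments = documents.filter(isSearchable);
console.log(visibleDocuments.map((doc) => doc.url));
```

In the endpoint, this would slot in right after the map call, as in .map(markdownToDocument).filter(isSearchable).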

Compile the Index

As mentioned, we’re going to use the excellent MiniSearch library. This step requires installing a package and two minor changes to the search JSON endpoint you just made. First install the minisearch package and restart your dev server. You could put it in your main dependencies because we’ll use it on both the server and the browser later.

Next, add the following line to the top of src/pages/search.json.mjs:

import MiniSearch from "minisearch";

Then change the get function at the bottom of the file to something like this:

export const get = () => {
  const pagesObject = import.meta.glob("./**/*.md*", { eager: true });
  const documents = Object.values(pagesObject).map(markdownToDocument);

  const miniSearch = new MiniSearch({
    idField: "url",
    fields: ["title", "plainText", "headings", "description"],
    storeFields: ["title"]
  });
  miniSearch.addAll(documents);

  return {
    body: JSON.stringify(miniSearch),
  };
};

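Read all about the constructor options in the MiniSearch API documentation. This sample uses each document’s url as its unique ID (idField), indexes the title, plainText, headings, and description fields for searching (fields), and stores the title alongside the index so it comes back with each search result (storeFields).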

You can change those options if you like, but be aware that you’ll have to keep them in sync with your client-side code later. Once you make these changes and visit http://localhost:3000/search.json again, you should see the serialized representation of your search index. If you’re not actually following along, you can look at my search index here: https://barnabas.me/search.json. 4
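
One way to avoid the two sets of options drifting apart is to define them once and use the same object on both sides. This is only a sketch: the module name src/search-options.mjs and the searchOptions constant are assumptions, not part of this site’s code.

```javascript
// Hypothetical contents of src/search-options.mjs.
// In a real module you would put `export` in front of this declaration.
const searchOptions = {
  idField: "url",
  fields: ["title", "plainText", "headings", "description"],
  storeFields: ["title"],
};

// Build side:   new MiniSearch(searchOptions)
// Client side:  MiniSearch.loadJS(searchJson, searchOptions)
console.log(searchOptions.fields.join(", "));
```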

While your local Astro dev server is running, you can refresh your /search.json endpoint and the search index should be rebuilt each time. At the time of this writing I don’t have a huge number of pages on my site, so I’m not sure whether rebuilding the index on every request will become a problem. I suspect it won’t unless/until you’re dealing with many thousands of pages. In any case, if you do a test build you should see your search index saved to the file system, so you can get an idea of how long that takes and how large the file is.
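
If you’d rather measure than guess, a rough helper like the one below times a build function and reports the size of what it returns. The measureIndexBuild name and the stand-in builder are made up for this sketch; in the endpoint you would pass a function that creates the index and returns JSON.stringify(miniSearch).

```javascript
// Hypothetical helper: time a function that returns the serialized index string.
function measureIndexBuild(buildJson) {
  const start = performance.now();
  const json = buildJson();
  const milliseconds = performance.now() - start;
  // Count UTF-8 bytes rather than characters, since bytes are what get downloaded
  // (before any compression the server applies).
  const bytes = new TextEncoder().encode(json).length;
  return { milliseconds, bytes };
}

// Stand-in builder so the sketch runs on its own.
const stats = measureIndexBuild(() => JSON.stringify({ hello: "wörld" }));
console.log(`${stats.bytes} bytes in ${stats.milliseconds.toFixed(1)} ms`);
```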

Create a UI

The search UI I have implemented here is fairly specific to this site, but I’ll give you a starter bit of markup and code anyway. The current design goal is to mimic the typical DocSearch search form, but to spell it out:

  1. The visitor opens the search modal dialog either with a global hot key or by clicking a button in the header.
  2. The query input field should have the focus while the form is open.
  3. As the visitor types, search results appear instantly under the query input, most relevant first.
  4. Search results are clickable links, and the title is the link text.
  5. The down arrow and up arrow keys move the active search result, and pressing Enter goes to it.
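
The arrow-key behavior in step 5 comes down to a bit of index arithmetic. Here’s a minimal sketch; moveSelection is a made-up helper rather than the code this site runs, and wrapping around at either end is a design choice you may not want.

```javascript
// Hypothetical helper: compute the next highlighted result for a key press.
// An index of -1 means "nothing is highlighted yet".
function moveSelection(currentIndex, key, resultCount) {
  if (resultCount === 0) return -1; // nothing to select
  if (key === "ArrowDown") return (currentIndex + 1) % resultCount;
  if (key === "ArrowUp") return currentIndex <= 0 ? resultCount - 1 : currentIndex - 1;
  return currentIndex; // any other key leaves the selection alone
}

console.log(moveSelection(0, "ArrowDown", 3)); // 1
console.log(moveSelection(2, "ArrowDown", 3)); // 0 (wraps to the top)
console.log(moveSelection(0, "ArrowUp", 3)); // 2 (wraps to the bottom)
```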

I still like Vue and the Vue integration is already enabled, so I created two new SFCs in the src/components directory. The SearchDialog.vue component is the one that gets loaded asynchronously. Here’s some of the main code, minus the parts that manage the dialog and selected-item state:

<script setup lang="ts">
import { computed, ref } from "vue";
import { mande } from "mande";
import MiniSearch, { AsPlainObject } from "minisearch";

const query = ref("");
const searchJson = await mande("/").get<AsPlainObject>("search.json");
const miniSearch = MiniSearch.loadJS(searchJson, {
  idField: "url",
  fields: ["title", "plainText", "headings", "description"],
  storeFields: ["title"],
});

const searchResults = computed(() => {
  return miniSearch.search(query.value, {
    boost: { title: 2, headings: 1.5 },
    prefix: (term) => term.length > 2,
    fuzzy: (term) => (term.length > 2 ? 0.2 : null),
  });
});
</script>
<template>
  <DialogPanel>
    <div class="mt-2">
      <input placeholder="Search" type="search" v-model="query" />
    </div>
    <ul>
      <li v-for="(result, index) in searchResults" :key="result.id">
        <a :href="result.id">
          {{ result.title }}
        </a>
      </li>
    </ul>
  </DialogPanel>
</template>

This demonstrates how to instantiate the search index, get search results and bind them to UI elements. Notice that the options for the loadJS function mirror the options we used to create the index. This also shows that you can get pretty fancy with search options. This is why we indexed the headings as a separate field, so we could boost them here. Honestly I’m not sure if that makes a difference.

The other component isn’t really search-specific, but it was an opportunity to learn something about Vue and Astro. SearchButton.vue looks something like this:

<script setup lang="ts">
import { defineAsyncComponent, ref } from "vue";

// Loaded lazily: the dialog's code is only fetched once it is first rendered.
const SearchDialog = defineAsyncComponent(() => import("./SearchDialog.vue"));
const hasOpened = ref(false); // becomes true on first open and stays true
const isOpen = ref(false); // tracks whether the dialog is currently visible

function openSearch() {
  hasOpened.value = true;
  isOpen.value = true;
}
</script>
<template>
  <button @click="openSearch()">
    Search
  </button>
  <kbd>/</kbd>
  <template v-if="hasOpened">
    <Suspense>
      <SearchDialog v-model="isOpen" />
    </Suspense>
  </template>
</template>

This component gets embedded in the header with <SearchButton client:idle />. By using <Suspense> together with defineAsyncComponent, we can defer loading <SearchDialog /> and search.json until absolutely necessary. In production the delay is hardly noticeable, and my precious Lighthouse score remains intact.5
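
Framework aside, the deferral pattern boils down to “do nothing until first asked, then do it only once”. Here’s a minimal sketch of that idea in plain JavaScript. The lazyOnce helper and the fake loader are invented for illustration, and Vue’s defineAsyncComponent does considerably more than this.

```javascript
// Hypothetical helper: wrap a loader so it runs at most once, on first use.
function lazyOnce(load) {
  let pending = null;
  return () => {
    if (pending === null) pending = Promise.resolve(load());
    return pending;
  };
}

// Fake loader standing in for "fetch search.json and build the index".
let loadCount = 0;
const getIndex = lazyOnce(() => {
  loadCount += 1;
  return { ready: true };
});

console.log(loadCount); // 0: nothing has been loaded yet
const first = getIndex();
const second = getIndex();
console.log(loadCount, first === second); // 1 true
```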

Issues and Enhancements

There are a few minor known bugs and missing nice-to-haves currently. For example, in Firefox the slash (/) key conflicts with the browser’s “quick find” shortcut. I’d like to figure out how to make Ctrl + K or Cmd + K work eventually. MiniSearch has an autoSuggest method, and I’d like to figure out how to make it work the way you’d expect. There are cool search options that it might be nice to let the user turn on and off, but that might be confusing too. There should be a message when no results are found.
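
For the Ctrl + K / Cmd + K idea, one possible starting point is to check the modifier flags on the keydown event. This is a sketch, not what the site currently does: isSearchHotkey is a made-up name, while key, ctrlKey, and metaKey are standard KeyboardEvent properties.

```javascript
// Hypothetical check for Ctrl+K (Windows/Linux) or Cmd+K (macOS).
function isSearchHotkey(event) {
  const hasModifier = Boolean(event.ctrlKey || event.metaKey);
  return hasModifier && event.key.toLowerCase() === "k";
}

// Plain objects stand in for KeyboardEvent so this runs outside a browser.
console.log(isSearchHotkey({ key: "k", ctrlKey: true, metaKey: false })); // true
console.log(isSearchHotkey({ key: "k", ctrlKey: false, metaKey: false })); // false
```

In the browser you would call this from a keydown listener on window and call event.preventDefault() before opening the dialog, since the browser may already use that shortcut for something else.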

Search results could use some dressing up too. It might also be good to index the documents separately by heading. For example, this is the only place with the word “aardvark”, so it would be nice if the search result went directly here rather than to the top of the page. It might be good to show more detail about each result, such as score, description, document type (article or blog post) or pub date.
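
Indexing by heading could start from the same kind of tree the rehype plugin already receives. The sketch below groups top-level nodes under the most recent heading and gives each group an ID with a URL fragment. Everything in it (splitByHeading, textOf, the sample tree) is hypothetical, and it assumes each heading element already carries an id property by the time the code runs, which may take extra setup.

```javascript
// Hypothetical helper: concatenate all text inside a hast-style node.
function textOf(node) {
  if (node.type === "text") return node.value;
  return (node.children ?? []).map(textOf).join("");
}

// Hypothetical helper: produce one search document per heading section.
function splitByHeading(tree, url, title) {
  // Content before the first heading belongs to the page itself.
  const sections = [{ id: url, title, plainText: "" }];
  for (const node of tree.children) {
    const isHeading = node.type === "element" && /^h[1-6]$/.test(node.tagName);
    if (isHeading && node.properties?.id) {
      // Start a new section that links straight to this heading.
      sections.push({
        id: `${url}#${node.properties.id}`,
        title: `${title}: ${textOf(node)}`,
        plainText: "",
      });
    } else {
      sections[sections.length - 1].plainText += textOf(node) + " ";
    }
  }
  return sections;
}

// Small factory for made-up sample elements.
const el = (tagName, properties, text) => ({
  type: "element",
  tagName,
  properties,
  children: [{ type: "text", value: text }],
});
const sampleTree = {
  type: "root",
  children: [
    el("p", {}, "Intro paragraph."),
    el("h2", { id: "issues" }, "Issues"),
    el("p", {}, "The only aardvark."),
  ],
};

console.log(splitByHeading(sampleTree, "/articles/client-side-search", "Client-Side Search"));
```

With documents shaped like this, a search for “aardvark” would match the second section, whose ID already points at the heading.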

Alternatives

If you decide not to do the above (understandable), here are some other options that might fit your needs better:

Footnotes

  1. I mean, think about it. You’d be writing in one markup language (Markdown), then converting it to HTML, then stripping the HTML tags out again. This is the programming equivalent of mailing someone a printed photograph of your computer screen playing a video.

  2. Specifically, if you try this on an MDX file you’ll see an error that says: “MDX does not support compiledContent()! If you need to read the HTML contents to calculate values (ex. reading time), we suggest injecting frontmatter via rehype plugins. Learn more on our docs: https://docs.astro.build/en/guides/integrations-guide/mdx/#inject-frontmatter-via-remark-or-rehype-plugins”.
    So be it.

  3. Either your development server will crash, your hard drive will be filled with text, or the galaxy will be converted into paper clips.

  4. “Now hang on a second,” you’re thinking, “couldn’t you reverse-engineer the original content from the search index?” To which I say, (1) if it shouldn’t be in the search index, then you should filter it out in the previous step and (2) technically the content is there but it’s more in a blended up word cloud form. Not great reading.

  5. Apparently that’s very important.
