Recently I learned about Typesense from the GitHub trending repositories. It is a search engine written mostly in C++ that bills itself as fast, typo-tolerant (configurable), highly available (it runs in clusters), and simple to use. Alternatives include ElasticSearch, which is heavier and runs on the JVM, and Algolia, which is commercial.
The Typesense project offers its own hosted service as a commercial product, but the engine itself is open source and easy to run locally via Docker or by installing a binary. It is licensed under GPL v3, which is fine if you are not modifying the actual server code. It is a server with a REST API, but they also provide official client libraries for a few languages, including a JavaScript one.
This was a perfect match for my personal blog tech-wise, and a useful feature to have. After seeing the demos they provide, with a surprisingly high number of items indexed, and after an initial proof of concept, I was positively impressed. It was fairly easy to get running, and it doesn't consume too much RAM on the production server. I checked smaps to estimate the RAM used (by summing the Pss values from the output):
#!/usr/bin/env bash
set -e

# The [t] trick keeps grep from matching its own process.
TYPESENSE_PID="$(ps aux \
  | grep '[t]ypesense' \
  | head -n 1 \
  | awk '{ print $2 }')"

# Sum only the Pss lines: a case-insensitive "pss" match would also
# count SwapPss and the Pss_* breakdown lines on newer kernels,
# inflating the total.
grep '^Pss:' /proc/"$TYPESENSE_PID"/smaps \
  | awk '{ Total += $2 } END { print Total/1024" MB" }'
# 36.0986 MB
This is consistent with the percentage value shown by htop. The server keeps the indexed data in memory for faster delivery, so the more data you index, the longer it takes to initialize and the more memory it consumes. For my case, though, these values were totally acceptable. The best part was how fast the results were returned.
The official guide and API documentation are both excellent. They provide multiple demos, and I was personally amazed by this 32 Million Songs browser. Compared to that amount of data, indexing my blog content would be a piece of cake.
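To give a taste of the API, a search is a single GET request against the server. A minimal sketch, assuming a collection named `posts` with `title` and `body` fields (illustrative names, not my actual schema):

```javascript
// Hedged sketch of a Typesense search request. The collection name
// ("posts") and the fields queried are assumptions for illustration.
function buildSearchQuery(params) {
  // Typesense's search endpoint takes its options as URL parameters.
  return new URLSearchParams(params).toString()
}

const qs = buildSearchQuery({ q: "docker", query_by: "title,body" })

// Against a local server (default port 8108) this would be used as:
// fetch(`http://localhost:8108/collections/posts/documents/search?${qs}`, {
//   headers: { "X-TYPESENSE-API-KEY": "my-search-only-key" },
// }).then((r) => r.json())
```

The official JavaScript client wraps exactly this kind of call, so in practice you rarely build the query string by hand.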
Even though the Typesense server supports querying it directly from the browser, and it has an option to enable CORS, one thing that bugged me for a while was that there was no way to filter out fields from the returned documents.
So I decided to create a simple Express server in front of it to trim the response before sending it back to the browser. This was easy, as the Typesense JavaScript client also works on Node.js. It also allowed me to keep the API key hidden, although that was not a big issue, since Typesense supports read-only keys when querying from the browser directly.
For the front-end integration, it was possible to write a custom client to query the Express server, but I also used the suggested UI library, as it was easy to extend and customize: react-instantsearch-dom from instantsearch.js, which is built by the Algolia team. Because it can be customized, it is possible to add a custom debounce to limit the requests sent to the server.
All components can be customized or overridden. For example, in this case I replaced the provided SearchBox with a new one, which looks something like this:
import React, { useState } from "react"
import { InstantSearch, Hits } from "react-instantsearch-dom"
import { connectSearchBox } from "react-instantsearch-core"

// <InstantSearch /> and <Hits /> are used too, just not in the SearchBox

const SearchBox = connectSearchBox(
  ({ refine, onChange: onChangeProp, defaultRefinement, placeholder }) => {
    const [value, updateValue] = useState(defaultRefinement || "")

    const onChange = (e) => {
      const newValue = e.target.value
      updateValue(newValue)
      refine(newValue)
      onChangeProp(e)
    }

    return (
      <input
        className="search-input"
        onChange={onChange}
        placeholder={placeholder}
        value={value}
      />
    )
  }
)
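The debounce mentioned earlier slots into that component. A minimal sketch (the 250 ms wait is an arbitrary choice):

```javascript
// Generic debounce: collapses a burst of calls into a single call that
// fires after `wait` ms of silence.
function debounce(fn, wait) {
  let timer
  return (...args) => {
    clearTimeout(timer)
    timer = setTimeout(() => fn(...args), wait)
  }
}

// Inside the SearchBox, refine would then be wrapped once, e.g.:
//   const debouncedRefine = useMemo(() => debounce(refine, 250), [refine])
// and onChange would call debouncedRefine(newValue) instead of refine,
// so fast typing results in one request instead of one per keystroke.
```

The local input state still updates on every keystroke, so the field stays responsive; only the search requests are throttled.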
The last bit was extracting the data to be indexed by the Typesense server. In my case, the data lives in Markdown files with a body and some metadata. A simple Node.js script using the Typesense client was enough to populate the index. For this process I used front-matter together with marked.
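The shape of that script can be sketched as follows. front-matter parses the metadata and marked renders the body; here the already-parsed pieces are passed in directly so the sketch runs without those dependencies, and the schema field names are assumptions:

```javascript
// Build the Typesense document for one post from already-parsed pieces
// (front-matter yields `attributes`; `body` is the Markdown content).
// The field names (id, title, tags, body) are assumptions.
function toDocument(attributes, body) {
  return {
    id: attributes.slug,
    title: attributes.title,
    tags: attributes.tags || [],
    body,
  }
}

const doc = toDocument({ slug: "hello-world", title: "Hello" }, "Some content")

// With the JavaScript client, the documents are then imported in bulk:
// await client.collections("posts").documents()
//   .import(docs, { action: "upsert" })
```

Re-running the script with an upsert action keeps the index in sync with the Markdown files without recreating the collection.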
All in all, I was satisfied with the result and the time to market this tool provided. It may not fit your use case; for example, it doesn't return multiple results per document, only the best match (or none). However, it has many options, including filtering, sorting, and faceting the results. If you are curious, just give it a go, and I hope you learn something while doing it.
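For reference, the filtering, sorting, and faceting just mentioned are plain search parameters; the field names below are assumptions for illustration:

```javascript
// Extra Typesense search parameters; the field names are assumptions.
const params = {
  q: "docker",
  query_by: "title,body",
  filter_by: "tags:=linux",     // keep only posts tagged "linux"
  sort_by: "published_at:desc", // newest first
  facet_by: "tags",             // also return per-tag match counts
}
```

These go in the same search call as `q` and `query_by`, whether through the REST API or the JavaScript client.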