Generates Awesome Indexes - tiny search engines built from curated sources:
- Awesome Lists (hence the name Awesome Indexes!)
- Zotero libraries and collections
- Zenodo Communities
- Any other source that can be used to create suitable JSON objects formatted as JSONL files.
The awindex tool gathers links and metadata from these sources, and uses them to build a static web page that provides a Pagefind faceted search interface. It can also package the index data as a downloadable database, to allow deeper analysis or custom visualisations to be created.
You can see a demonstration here.
To install awindex locally, you need Python 3.11 or later.
pip install git+https://github.com/digipres/awesome-indexer@main
awindex -c config -o ./index
After this, you will be able to run the awindex command.
Or, if uv is installed, the awindex tool can be run directly using:
uvx --from git+https://github.com/digipres/awesome-indexer@main awindex -c config.yaml -o ./index
By default, the awindex command reads its configuration from a file called ./config.yaml (this can be overridden at the command line; run awindex -h for help).
The tool reads the config.yaml file, downloads and caches the information sources, and generates an Awesome Index in the ./index folder.
The configuration starts with a set of fields that provide some basic information about the site, followed by a list of sources to read in order to build the index. For example:
title: "My Awesome Index Title"
homepage: https://my.website/page-about-this-index
description: "A brief description about this index and what's in it."
sources:
  - name: "Awesome Digital Preservation"
    homepage: "https://github.com/digipres/awesome-digital-preservation/"
    type: awesome-list
    url: "https://raw.githubusercontent.com/digipres/awesome-digital-preservation/refs/heads/main/README.md"
An example config.yaml is provided that shows how it works in more detail.
Each type of source should have a name and a homepage so people can find out more about the source that has been included in the index. Each source can also have a description, to be shown in the Awesome Index source summary.
The additional parameters for each source are...
- type: awesome-list (required)
- url: A URL to download the Markdown source content of the Awesome List. (required)
- view_url: A URL pointing to a web version of the source content that allows linking and highlighting of lines using a #L10 fragment on the end of the URL.
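As a rough illustration of how a view_url and a line number combine, here is a minimal sketch. The line_link helper and the example.org URL are hypothetical and are not part of awindex; the only thing taken from the text above is the #L10 fragment convention.

```python
def line_link(view_url: str, line_number: int) -> str:
    """Return view_url with a #L<n> fragment pointing at a single line.

    Hypothetical helper for illustration only; awindex may build
    these links differently.
    """
    if line_number < 1:
        raise ValueError("line numbers start at 1")
    # Drop any fragment already present so the result has exactly one.
    base = view_url.split("#", 1)[0]
    return f"{base}#L{line_number}"


# Example with a made-up URL: link to line 10 of the web view.
print(line_link("https://example.org/lists/README.md", 10))
```

Any query string on the view_url is left untouched, so a web view that needs a query parameter to show raw line numbers should still work with this approach.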
Note that awindex only supports public Zotero collections at present.
- type: zotero (required)
- library_type: Either user or group (required).
- library_id: The identification number for this library, e.g. 8195999 (required).
- collection_id: The key of a specific collection within this library, e.g. ERZIYJ3T (optional). If this is specified, the index will only include records that are included in that hierarchy of collections.
The pyzotero documentation has more information about these fields and how to find them.
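To show how the three Zotero fields relate to each other, the sketch below maps them onto the item-listing URLs of the public Zotero Web API. This is only an illustration: zotero_items_url is a hypothetical helper, and awindex itself may fetch records differently (for example through pyzotero). The example IDs are the ones quoted above, and treating 8195999 as a group library is an assumption made purely for the example.

```python
from urllib.parse import quote

API_ROOT = "https://api.zotero.org"


def zotero_items_url(library_type, library_id, collection_id=None):
    """Build a Zotero Web API URL listing items in a library or collection.

    Hypothetical helper, not part of awindex.
    """
    if library_type not in ("user", "group"):
        raise ValueError("library_type must be 'user' or 'group'")
    # The API path uses the plural form: /users/<id> or /groups/<id>.
    prefix = f"{API_ROOT}/{library_type}s/{library_id}"
    if collection_id:
        # Lists items directly in this collection; sub-collections
        # would need to be walked separately.
        return f"{prefix}/collections/{quote(str(collection_id))}/items"
    return f"{prefix}/items"


print(zotero_items_url("group", 8195999, "ERZIYJ3T"))
```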
- type: zenodo (required)
- community: The unique identifier for this community, e.g. digital-preservation (required).
- type: jsonl (required)
- file: A local file path for a set of records in JSONL format, e.g. ./test/ipres-awindex-test.jsonl (required).
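The per-type parameters above can be summarised as a small lookup table. The following sketch checks one source entry (as a plain dict, for example after loading config.yaml with a YAML parser) against the fields marked as required in this document. The check_source function is hypothetical and is not how awindex validates its configuration; name and homepage are treated as recommended rather than required, which is an assumption.

```python
# Required fields per source type, as listed in this README.
REQUIRED_BY_TYPE = {
    "awesome-list": ["url"],
    "zotero": ["library_type", "library_id"],
    "zenodo": ["community"],
    "jsonl": ["file"],
}


def check_source(source: dict) -> list[str]:
    """Return a list of problems found in one source entry (empty if none)."""
    source_type = source.get("type")
    if source_type not in REQUIRED_BY_TYPE:
        return [f"unknown or missing type: {source_type!r}"]
    problems = []
    for field in ("name", "homepage"):
        if field not in source:
            problems.append(f"recommended field missing: {field}")
    for field in REQUIRED_BY_TYPE[source_type]:
        if field not in source:
            problems.append(f"required field missing: {field}")
    # library_type, when given, must be one of the two documented values.
    if source_type == "zotero" and source.get("library_type") not in (None, "user", "group"):
        problems.append("library_type must be 'user' or 'group'")
    return problems


example = {
    "name": "Awesome Digital Preservation",
    "homepage": "https://github.com/digipres/awesome-digital-preservation/",
    "type": "awesome-list",
    "url": "https://raw.githubusercontent.com/digipres/awesome-digital-preservation/refs/heads/main/README.md",
}
print(check_source(example))
```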
Unfortunately, the index itself won't work without a web server. If you've got Python 3+ installed, you can run:
cd index
python -m http.server 8080
and then the index will be accessible at http://localhost:8080.
To share your Awesome Index, you can upload your files to a static web host like GitHub Pages, Netlify (e.g. using Netlify Drop) or these EU alternatives.
You can look at the SQLite database that the indexer generates using, for example, Datasette, like this:
uvx datasette serve index/records.db --metadata datasette-metadata.json
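If you would rather explore the database from a script, Python's built-in sqlite3 module is enough. The sketch below makes no assumptions about the table names inside records.db: it reads them from SQLite's own catalogue and counts the rows in each. The list_tables function is a hypothetical helper, not part of awindex.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return {table_name: row_count} for every table in a SQLite file."""
    # Open read-only so a mistyped path raises an error instead of
    # silently creating an empty database.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        # Table names come from the catalogue itself; quote them to be safe.
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names
        }
    finally:
        conn.close()
```

For the index built above, this could be called as list_tables("index/records.db") to see which tables are available before writing any queries.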
Building an index can be integrated into a GitHub Actions workflow like this:
- name: Install uv
  uses: astral-sh/setup-uv@v6
  with:
    python-version: 3.11
- name: Build the Awesome Index
  run: uvx --from git+https://github.com/digipres/awesome-indexer@main awindex -c _awindex/config.yaml -o ./awesome-index
There is an example here.
As well as needing Python 3.11+, the development environment needs NodeJS installed (because Pagefind is written in JavaScript).
The search page template uses the Jinja2 templating library and the interface is built using Bootstrap (v5).
sudo apt install python3.11
sudo apt install python3.11-venv
python3.11 -m venv .venv
source .venv/bin/activate
pip install -e .
Having installed in development mode (pip install -e .), to run from source:
python -m awindex.cli
TBA: JSONL or extend thusly