digipres/awesome-indexer

Awesome Indexer

Generates Awesome Indexes - tiny search engines built from curated sources, such as Awesome Lists, Zotero collections, Zenodo communities and JSONL files.

The awindex tool gathers links and metadata from these sources, and uses them to build a static web page that provides a Pagefind faceted search interface. It can also package the index data as a downloadable database, to allow deeper analysis or custom visualisations to be created.

You can see a demonstration here.

Usage

Local installation

To install awindex locally, you need Python 3.11 or later.

pip install git+https://github.com/digipres/awesome-indexer@main

Once installed, you can run the awindex command:

awindex -c config.yaml -o ./index

Or, if uv is installed, the awindex tool can be run directly using:

uvx --from git+https://github.com/digipres/awesome-indexer@main awindex -c config.yaml -o ./index

Building an Awesome Index

By default, the awindex command reads its configuration from a file called ./config.yaml. This can be overridden at the command line; run awindex -h for help.

The tool reads the config.yaml file, downloads and caches the information sources, and generates an Awesome Index in the ./index folder.
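For scripted builds, the command can be wrapped in a few lines of Python. This is a minimal sketch, assuming awindex is already installed and available on the PATH. It uses only the -c and -o flags shown in this README; the helper names build_command and build_index are hypothetical and not part of awindex.

```python
import subprocess
from pathlib import Path


def build_command(config="config.yaml", output="./index"):
    """Assemble the awindex command line using the -c and -o flags."""
    return ["awindex", "-c", str(config), "-o", str(output)]


def build_index(config="config.yaml", output="./index"):
    """Run awindex, raising CalledProcessError if the build fails."""
    # Fail early with a clear message if the configuration file is missing.
    if not Path(config).is_file():
        raise FileNotFoundError(f"Configuration file not found: {config}")
    subprocess.run(build_command(config, output), check=True)
```

Calling build_index() with no arguments corresponds to running awindex -c config.yaml -o ./index.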

Configuration

The configuration starts with a set of fields that provide basic information about the site, followed by a list of sources to read in order to build the index. For example:

title: "My Awesome Index Title"
homepage: https://my.website/page-about-this-index
description: "A brief description about this index and what's in it."
sources:
- name: "Awesome Digital Preservation"
  homepage: "https://github.com/digipres/awesome-digital-preservation/"
  type: awesome-list
  url: "https://raw.githubusercontent.com/digipres/awesome-digital-preservation/refs/heads/main/README.md"

An example config.yaml is provided that shows how it works in more detail.

Each type of source should have a name and a homepage so people can find out more about the source that has been included in the index. Each source can also have a description, to be shown in the Awesome Index source summary.

The additional parameters for each type of source are listed below.

Source: Awesome Lists

  • type: awesome-list (required)
  • url: A URL to download the Markdown source content of the Awesome List. (required)
  • view_url: A URL pointing to a web version of the source content that allows linking and highlighting of lines using a #L10 fragment on the end of the URL.
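To illustrate how a view_url and a #L10-style fragment combine, here is a small sketch. The function name line_link is hypothetical, and how awindex itself builds these links is not shown in this README.

```python
def line_link(view_url, line_number):
    """Return view_url with a #L<n> fragment pointing at one line."""
    if line_number < 1:
        raise ValueError("line numbers start at 1")
    # Drop any existing fragment so the result has exactly one '#'.
    base = view_url.split("#", 1)[0]
    return f"{base}#L{line_number}"
```

For example, line_link("https://example.org/README.md", 10) gives a link that ends in #L10; the example.org address is a placeholder, not a real source.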

Source: Zotero Collection

Note that awindex only supports public Zotero collections at present.

  • type: zotero (required)
  • library_type: Either user or group (required).
  • library_id: The identification number for this library, e.g. 8195999 (required).
  • collection_id: The key of a specific collection within this library, e.g. ERZIYJ3T (optional). If this is specified, the index will only include records that are included in that hierarchy of collections.

The pyzotero documentation has more information about these fields and how to find them.
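As a rough guide to how these three fields relate, the sketch below maps them onto the URL layout of the public Zotero Web API. It is an illustration only: awindex may fetch records differently (for example through pyzotero), the function name zotero_items_url is hypothetical, and the sketch does not attempt to walk sub-collections.

```python
def zotero_items_url(library_type, library_id, collection_id=None):
    """Build a Zotero Web API items URL from the source parameters."""
    if library_type not in ("user", "group"):
        raise ValueError("library_type must be 'user' or 'group'")
    # The API path uses the plural form: /users/<id> or /groups/<id>.
    base = f"https://api.zotero.org/{library_type}s/{library_id}"
    if collection_id:
        # Restrict the listing to a single collection.
        return f"{base}/collections/{collection_id}/items"
    return f"{base}/items"
```

Opening the resulting URL in a browser is a quick way to confirm that a library is public and that the identifiers are correct before adding them to config.yaml.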

Source: Zenodo Community

  • type: zenodo (required)
  • community: The unique identifier for this community, e.g. digital-preservation (required).

Source: JSONL File

  • type: jsonl (required)
  • file: A local file path for a set of records in JSONL format, e.g. ./test/ipres-awindex-test.jsonl (required).
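The required parameters listed above can be checked before running a build. The following is a minimal sketch that works on source entries already loaded as Python dictionaries. It encodes only the rules stated in this README; the names REQUIRED and check_source are hypothetical, and awindex's own validation may differ.

```python
# Required parameters per source type, as listed in this README.
REQUIRED = {
    "awesome-list": {"url"},
    "zotero": {"library_type", "library_id"},
    "zenodo": {"community"},
    "jsonl": {"file"},
}


def check_source(source):
    """Return a list of problems found in one source entry (empty if none)."""
    problems = []
    # Every source should have a name and a homepage.
    for field in ("name", "homepage"):
        if field not in source:
            problems.append(f"missing field: {field}")
    kind = source.get("type")
    if kind not in REQUIRED:
        problems.append(f"unknown or missing type: {kind!r}")
        return problems
    for field in sorted(REQUIRED[kind] - set(source)):
        problems.append(f"missing required field for {kind}: {field}")
    # library_type only accepts two values.
    if kind == "zotero" and source.get("library_type") not in (None, "user", "group"):
        problems.append("library_type must be 'user' or 'group'")
    return problems
```

An entry such as the Awesome Digital Preservation example in the configuration section passes this check with an empty list of problems.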

Using an Awesome Index

Unfortunately, the index itself won't work without a web server. If you've got Python 3+ installed, you can run:

cd index
python -m http.server 8080

and then the index will be accessible at http://localhost:8080.
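The same thing can be done from a Python script using the standard library. This is a sketch rather than part of awindex; the function names make_server and serve are hypothetical. Unlike the command above, it binds to 127.0.0.1 only, so the index is not exposed to other machines on the network.

```python
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_server(directory="index", port=8080):
    """Create (but do not start) a static file server for the index folder."""
    # 'directory' makes the handler serve that folder instead of the CWD.
    handler = partial(SimpleHTTPRequestHandler, directory=directory)
    return ThreadingHTTPServer(("127.0.0.1", port), handler)


def serve(directory="index", port=8080):
    """Serve the index until interrupted with Ctrl+C."""
    with make_server(directory, port) as server:
        print(f"Serving {directory} at http://localhost:{server.server_address[1]}")
        try:
            server.serve_forever()
        except KeyboardInterrupt:
            pass
```

Calling serve() from the folder that contains ./index makes the index available at http://localhost:8080.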

To share your Awesome Index, you can upload your files to a static web host like GitHub Pages, Netlify (e.g. using Netlify Drop) or these EU alternatives.

Inspect the data using Datasette

You can explore the SQLite database that the indexer generates using a tool such as Datasette, like this:

uvx datasette serve index/records.db --metadata datasette-metadata.json
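Because the database is plain SQLite, it can also be inspected with Python's built-in sqlite3 module. The sketch below makes no assumptions about the table names awindex creates: it discovers them from sqlite_master and counts the rows in each. The function name table_row_counts is hypothetical.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def table_row_counts(db_path):
    """Return {table_name: row_count} for every table in the database."""
    # Open read-only so that inspecting the file can never modify it.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        # Quote each table name, since it comes from the database itself.
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names
        }
```

For example, table_row_counts("index/records.db") gives a quick overview of what a build produced before opening it in Datasette.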

As a GitHub Action

Building an index can be integrated into a GitHub Actions workflow like this:

      - name: Install uv
        uses: astral-sh/setup-uv@v6
        with:
          python-version: "3.11"
      - name: Build the Awesome Index
        run: uvx --from git+https://github.com/digipres/awesome-indexer@main awindex -c _awindex/config.yaml -o ./awesome-index

There is an example here.

Development setup

As well as Python 3.11+, the development environment needs Node.js installed, which is required to run Pagefind.

The search page template uses the Jinja2 templating library and the interface is built using Bootstrap (v5).

Linux/WSL2

sudo apt install python3.11
sudo apt install python3.11-venv
python3.11 -m venv .venv
source .venv/bin/activate
pip install -e .

After installing in development mode (pip install -e .), you can run the tool from source:

python -m awindex.cli

Adding a new source

TBA: JSONL or extend thusly

About

Builds tiny awesome search engines for awesome lists and other awesome resources.
