Stract
Stract is an open source web search engine written in Rust.
This copy contains some tweaks to the parser so that it can ingest WARC files from the Common Crawl project that were collected using cdx_toolkit.
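For readers unfamiliar with WARC, here is a rough sketch of the record layout an ingesting parser has to handle. It is not Stract's parser (which is written in Rust) and not the tweak described above. It is a standalone Python illustration, and the names `iter_warc_records` and `make_record` are made up for this example. It assumes an uncompressed stream in which every record carries a `Content-Length` header.

```python
import io


def iter_warc_records(stream):
    """Yield (headers, payload) for each record in an uncompressed WARC stream.

    Hypothetical helper for illustration only; not part of Stract or cdx_toolkit.
    """
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if line in (b"\r\n", b"\n"):
            continue  # blank separator lines that follow each record
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected WARC version line, got {line!r}")

        # Header block: "Name: value" lines terminated by an empty line.
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()

        # The payload is exactly Content-Length bytes; read by length,
        # never by line, because the body may contain arbitrary bytes.
        length = int(headers["Content-Length"])
        payload = stream.read(length)
        yield headers, payload


def make_record(target_uri, body):
    """Build a tiny synthetic response record (test data, not real crawl data)."""
    head = (
        "WARC/1.0\r\n"
        "WARC-Type: response\r\n"
        f"WARC-Target-URI: {target_uri}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("utf-8")
    return head + body + b"\r\n\r\n"


sample = (
    make_record("https://example.com/", b"<html>hello</html>")
    + make_record("https://example.org/", b"<html>world</html>")
)
records = list(iter_warc_records(io.BytesIO(sample)))
for headers, payload in records:
    print(headers["WARC-Target-URI"], len(payload))
```

Real Common Crawl files are normally gzip-compressed (`.warc.gz`), so in practice the stream would probably come from something like `gzip.open(path, "rb")` rather than an in-memory buffer.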
Status
- can build a search index from collected WARC files
- currently works only on Jim's laptop; the next step is to deploy to vichex.ca
- the upstream S3 bucket containing the starter files is missing; it has been replaced with publicly available files: Download data here
Source document
Publish-to: 6kgruqaeaaaa.vichex.ca/stract/