
mp_scrape_source_template

Template for building a source with Playwright.

Arguments

  • (Optional) display_browser: when enabled, a browser window is opened that displays the actions being performed.
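
As a rough illustration of what display_browser could control, the sketch below maps it onto Playwright's headless launch option. The helper names launch_options and fetch_title are hypothetical, and how the template itself wires the argument is an assumption; only the Playwright calls are real API.

```python
def launch_options(display_browser: bool = False) -> dict:
    """Translate display_browser into Playwright launch options (hypothetical helper)."""
    # Playwright runs headless by default; showing the window means headless=False.
    return {"headless": not display_browser}


def fetch_title(url: str, display_browser: bool = False) -> str:
    """Open url in Chromium and return the page title (illustrative only)."""
    # Imported lazily so launch_options() can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(**launch_options(display_browser))
        try:
            page = browser.new_page()
            page.goto(url)
            return page.title()
        finally:
            # Always close the browser, even if navigation fails.
            browser.close()
```

Calling fetch_title("https://example.org", display_browser=True) would open a visible browser window, which can help when debugging selectors.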

Template

Structure

All of the files contain comments like # FIXME: ... that mark what needs to be updated in this template. Do not release the package until you have addressed all of those comments.
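
One way to confirm that no such comments remain is a small scan before release. The following is a minimal sketch, not part of the template: the function name find_fixmes and the choice to look only at .py and .toml files are assumptions.

```python
from pathlib import Path


def find_fixmes(root: str) -> list[tuple[str, int, str]]:
    """Return (path, line number, text) for every line containing '# FIXME'."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        # Assumption: only Python and TOML files carry the template's markers.
        if not path.is_file() or path.suffix not in {".py", ".toml"}:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if "# FIXME" in line:
                hits.append((str(path), number, line.strip()))
    return hits
```

Running find_fixmes(".") from the repository root and checking that the result is empty gives a quick pre-release check. Note that the scan also descends into any virtual environment stored inside the repository, and would match this script itself if it were saved there.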

Aside from those, you should change the name of the mp_scrape_source_template folder to something that makes sense for your source. The convention is to name your package mp_scrape_source_IDENTIFIER.

The files to look out for are the following:

  • pyproject.toml defines the Python project. This allows you to build and publish the package.
  • mp_scrape_source_template/__init__.py contains the logic of your source.
    • The fetch_data function contains the scraping logic.
    • The metadata function contains basic metadata about the source.
  • tests/test_pipeline.py contains a simple pipeline that you can use to check the output of your source.
    • See Testing for more information.
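
To make the division of labour between the two functions concrete, here is a simplified stand-in. It is a sketch under assumptions: the real signatures, return types and field names are defined by the template and the MP Scrape libraries, none of which are reproduced here, and every value below is made up for illustration.

```python
def metadata() -> dict:
    """Basic, static information about the source (keys are illustrative)."""
    return {
        "name": "Example Parliament",  # hypothetical display name
        "identifier": "example",       # would pair with mp_scrape_source_example
    }


def fetch_data(display_browser: bool = False) -> list[dict]:
    """Collect one record per scraped entry.

    The template does this by driving a browser with Playwright; this
    placeholder returns fixed rows so the sketch runs without a browser.
    """
    return [
        {"full_name": "Jane Doe", "party": "Example Party"},
        {"full_name": "John Roe", "party": "Sample Alliance"},
    ]


if __name__ == "__main__":
    print(metadata()["name"], len(fetch_data()))
```

The point of the split is that metadata can be read without running a scrape, while fetch_data does the slow, network-bound work.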

Testing

There is a testing pipeline you can use to test your source. It allows you to check two things:

  1. That your source runs correctly, and
  2. That the data output is what you expect.

To run the pipeline, use the harness script. It sets up a clean virtual environment and installs all of the necessary dependencies. The environment is reused between runs, but you can reset it at any time with the --reset flag.

python3 ./tests/harness.py [--reset]
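
For readers curious about the reuse-or-reset behaviour, the sketch below shows one way a harness of this kind could manage its environment with the standard library. It is not the contents of tests/harness.py: the function names and the with_pip parameter are assumptions, and the dependency installation step is left out.

```python
import argparse
import shutil
import venv
from pathlib import Path


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the command line; --reset asks for a fresh environment."""
    parser = argparse.ArgumentParser(description="Run the test pipeline in an isolated environment.")
    parser.add_argument("--reset", action="store_true", help="delete and recreate the environment")
    return parser.parse_args(argv)


def prepare_env(env_dir: Path, reset: bool = False, with_pip: bool = True) -> bool:
    """Create env_dir if needed; return True when a fresh environment was built."""
    if reset and env_dir.exists():
        shutil.rmtree(env_dir)  # throw away the old environment
    if env_dir.exists():
        return False  # reuse the environment from a previous run
    venv.create(env_dir, with_pip=with_pip)
    return True
```

A harness built this way would install dependencies only when prepare_env returns True, which is what makes repeated runs fast.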

Publishing

Before publishing your package, send an e-mail to contact@fsfe.org with the following subject line: [MP-Scrape] Package <YOUR_PACKAGE_NAME> for <SOURCE_OF_DATA>.
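
If you prefer to prepare the message programmatically, the subject line can be filled in as sketched below. The function name build_notification is hypothetical, and the sketch assumes the angle brackets are placeholders to be replaced rather than literal characters. It only builds the headers; writing the body and sending the message are left to you.

```python
from email.message import EmailMessage


def build_notification(package_name: str, data_source: str) -> EmailMessage:
    """Build the headers of the pre-publication e-mail (body left to the author)."""
    message = EmailMessage()
    message["To"] = "contact@fsfe.org"
    # Subject format taken from the instructions above, placeholders filled in.
    message["Subject"] = f"[MP-Scrape] Package {package_name} for {data_source}"
    return message


if __name__ == "__main__":
    # Both argument values are made-up examples.
    print(build_notification("mp_scrape_source_example", "Example Parliament")["Subject"])
```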

It's helpful for us to stay informed about all published packages so that we can maintain a good overview of the ecosystem. We may also consider integrating your package into the official MP Scrape packages.

Follow the Python Packaging User Guide instructions to build and publish your package. For this you will need a PyPI account.
