
mp_explore_source_template

Template with Playwright to build a source.

Check out Pre-requisites for more general information on what you need to know and check before starting.

Arguments

  • (Optional) display_browser When enabled, a browser window is opened displaying the actions being performed.
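
As a rough illustration of what this argument might control, a source could map display_browser onto Playwright's headless launch option. The helper name launch_options is hypothetical, and the template may wire the argument differently.

```python
def launch_options(display_browser: bool = False) -> dict:
    """Build keyword arguments for Playwright's browser launch call.

    Hypothetical helper: Playwright runs headless (no window) by default,
    so showing the browser means turning headless off.
    """
    return {"headless": not display_browser}


# Intended use inside a source (requires Playwright to be installed):
#   with sync_playwright() as p:
#       browser = p.chromium.launch(**launch_options(display_browser=True))

print(launch_options(display_browser=True))
```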

Template

Structure

All of the files contain comments like # FIXME: ... with what needs to be updated in this template. Do not release the package unless you have addressed all of those comments.

Aside from those, you should change the name of the mp_explore_source_template folder to something that makes sense for your source. The convention is to name your package mp_explore_source_IDENTIFIER.

The files to look out for are the following:

  • pyproject.toml defines the Python project. This allows you to build and publish the package.
  • mp_explore_source_template/__init__.py contains the logic of your source.
    • The fetch_data function contains the scraping logic.
    • The metadata function contains basic metadata about the source.
  • tests/test_pipeline.py contains a simple pipeline that you can use to check the output of your source.
    • See Testing for more information.
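
To make the roles of the two functions more concrete, here is a minimal sketch of what a source module could look like. It is not the template's actual code: the parameter, the metadata keys, and the column names are all assumptions chosen for illustration, and the template's own __init__.py remains the reference for the real signatures.

```python
import pandas as pd


def metadata() -> dict:
    """Basic metadata about the source (keys here are illustrative only)."""
    return {
        "name": "Example Parliament",  # hypothetical value
        "identifier": "example",       # hypothetical value
    }


def fetch_data(display_browser: bool = False) -> pd.DataFrame:
    """Return the scraped information as a DataFrame.

    A real source would scrape the official pages here; this stub
    returns one hard-coded placeholder row so the shape is visible.
    """
    rows = [
        {"name": "Jane Doe", "group": "Example Group", "email": "jane.doe@example.org"},
    ]
    return pd.DataFrame(rows)
```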

Testing

There is a testing pipeline you can use to test your source. It allows you to check two things:

  1. That your source runs correctly, and
  2. The data output is what you expect.

To run the pipeline, use the harness script. It sets up a clean virtual environment and installs all of the necessary dependencies. The environment is reused between runs, but you can reset it at any time with the --reset flag.

python3 ./tests/harness.py [--reset]
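
While inspecting the pipeline output, a few automated sanity checks on the DataFrame can help with the second point. The sketch below is not what tests/test_pipeline.py does; check_output and the column names in REQUIRED_COLUMNS are assumptions made for this example.

```python
import pandas as pd

REQUIRED_COLUMNS = ["name", "group", "email"]  # assumed column names


def check_output(df: pd.DataFrame) -> list[str]:
    """Return a list of human-readable problems found in the output."""
    problems = []
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    if df.empty:
        problems.append("no rows were returned")
    elif "email" in df.columns and df["email"].isna().any():
        problems.append("some rows have no e-mail address")
    return problems
```

An empty list means none of these particular checks failed; it does not prove the data is correct, so a manual look at a few rows is still worthwhile.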

Publishing

Before publishing your package, send an e-mail to contact@fsfe.org with the following subject line: [MP-Explore] Package <YOUR_PACKAGE_NAME> for <SOURCE_OF_DATA>.

It's helpful for us to stay informed about all published packages to maintain a good overview of the ecosystem. We may also consider integrating it into the official MP Explore packages.


Pre-requisites

MP Explore and its ecosystem are based on Python and pandas. This template integrates Playwright to simplify the scaffolding process. If you do not know Python, you might be interested in The Python Tutorial.

At their core, MP Explore Sources are Python modules that obtain data from official sources programmatically, however they see fit. The fetch_data function is only expected to return a DataFrame with the fetched information.

First of all, you need to choose which parliament you want to create a source for. In our experience, the websites and official sources of national and regional parliaments tend to be the easiest to work with, but your mileage may vary.

After choosing a parliament, you need to look for official and up-to-date pages that contain, at least, the following information:

  • Members of Parliament (MPs) with their respective names, group affiliation, and e-mail address.
  • Committees and which MPs are part of which committee and in which capacity.

Without both, the core filtering capabilities of MP Explore are diminished.
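
The following sketch illustrates why both pieces of information matter: with MPs and committee memberships in hand, filtering by committee becomes a simple join and selection. The column names and the sample rows are invented for this example and are not a schema prescribed by MP Explore.

```python
import pandas as pd

# Placeholder data standing in for what would be scraped from official pages.
mps = pd.DataFrame([
    {"name": "Jane Doe", "group": "Group A", "email": "jane.doe@example.org"},
    {"name": "John Roe", "group": "Group B", "email": "john.roe@example.org"},
])
memberships = pd.DataFrame([
    {"name": "Jane Doe", "committee": "Budget", "role": "Chair"},
    {"name": "Jane Doe", "committee": "Digital Affairs", "role": "Member"},
    {"name": "John Roe", "committee": "Budget", "role": "Member"},
])

# Join the two tables so that each row is one MP in one committee.
combined = mps.merge(memberships, on="name", how="left")

# Filter: everyone on the Budget committee, with contact details.
budget = combined[combined["committee"] == "Budget"]
print(budget[["name", "group", "email", "role"]].to_string(index=False))
```

Without the memberships table, the last filter would be impossible; without the MPs table, the result would lack group affiliation and e-mail addresses.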

When looking for pages that contain such information, you might stumble upon machine-readable versions of a subset of that information. If you consider using such sources, make sure that the information is up-to-date, correct, and maintained.

Most likely, you will not find this information on a single page, so you will need to navigate several pages and experiment with how to extract information from them. You can look into existing MP Explore Sources (mp_explore_source_*) to get an idea of how they are currently implemented.

Normally, a basic knowledge of HTML/XML and web technologies is required to work out how to extract such information. If you do not know much about either, you might be interested in the Core Learning Modules of the MDN Web Development Course.

Once you learn how to extract this information from the various pages you found, preferably landing on a few JavaScript functions that can be easily abstracted, you can start editing the fetch_data function!
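
As one possible way to carry such a JavaScript function over into Python, the sketch below passes it to Playwright's page.evaluate and converts the result into a DataFrame. The "table.members" selector, the column order, and the function names scrape_members and rows_to_frame are hypothetical and need to be adapted to the pages you actually found.

```python
import pandas as pd

# JavaScript evaluated in the page. The "table.members" selector and the
# column order are hypothetical; adapt them to the real page structure.
EXTRACT_MEMBERS_JS = """
() => Array.from(document.querySelectorAll("table.members tbody tr")).map(row => {
    const cells = row.querySelectorAll("td");
    const mail = row.querySelector("a[href^='mailto:']");
    return {
        name: cells[0] ? cells[0].textContent.trim() : "",
        group: cells[1] ? cells[1].textContent.trim() : "",
        email: mail ? mail.getAttribute("href").replace("mailto:", "") : "",
    };
})
"""


def rows_to_frame(rows: list[dict]) -> pd.DataFrame:
    """Turn the list of dicts returned by the page into a tidy DataFrame."""
    frame = pd.DataFrame(rows, columns=["name", "group", "email"])
    # Drop rows without a name (for example spacer or header rows).
    return frame[frame["name"].str.len() > 0].reset_index(drop=True)


def scrape_members(url: str, display_browser: bool = False) -> pd.DataFrame:
    """Open the page, run the extraction function, and return a DataFrame."""
    # Imported here so the helpers above can be used without a browser.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=not display_browser)
        page = browser.new_page()
        page.goto(url)
        rows = page.evaluate(EXTRACT_MEMBERS_JS)
        browser.close()
    return rows_to_frame(rows)
```

A fetch_data implementation could then call scrape_members with the URL of the members page. Keeping the extraction logic in a JavaScript string lets you test it first in the browser's developer console, and keeping rows_to_frame separate lets you test the clean-up step without launching a browser.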
