This repository has been archived on 2022-04-01. You can view files and clone it, but cannot push or open issues or pull requests.

Project Description

Target audience of app

Journalists; NGOs, public institutions and regulators with a focus on market access, antitrust and transparency; (private) investigators.

Project Description

The aim is to give investigators, journalists, regulators and the public an accessible, high-performance search and analysis frontend for uncovering and proactively monitoring suspicious market activity in the fields that matter to them. By linking disparate and complex data and opening the results to public scrutiny, we hope to improve insight into EU procurement processes, combat corruption and stimulate fair and open competition.

At the core of this application, which is developed and released under a free software license, lies the TED (Tenders Electronic Daily) dataset. In the backend, the TED data is transformed into a performant database. In the first release of the app, the TED data will be linked with data from the following sources:

  • OpenCorporates Database
  • other, more detailed public data of companies (e.g. crawled financial reports of German companies) when available
  • EU Transparency Register
  • ICIJ Offshore Leaks Database
  • The OpenSanctions Database (incl. the Consolidated Financial Sanctions File)
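Linking TED data with the sources above requires some shared key between records. The following is a minimal sketch, not the repository's actual method: it assumes that records are matched on a normalised company name. The field names name and winner_name, the list of legal-form suffixes and both function names are hypothetical.

```python
import re
import unicodedata
from collections import defaultdict

# Hypothetical list of legal-form suffixes to ignore when comparing names.
LEGAL_FORMS = {"gmbh", "ag", "kg", "ug", "se", "ltd", "inc"}


def normalise_name(name: str) -> str:
    """Lower-case, strip accents and punctuation, drop legal-form tokens."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = "".join(c for c in decomposed if not unicodedata.combining(c))
    tokens = re.findall(r"[a-z0-9]+", ascii_only.lower())
    return " ".join(t for t in tokens if t not in LEGAL_FORMS)


def link_tenders_to_companies(tenders, companies):
    """Return {tender id: [company ids]} for records whose names match."""
    by_name = defaultdict(list)
    for company in companies:
        by_name[normalise_name(company["name"])].append(company["id"])
    # .get() avoids creating empty entries for unmatched tender winners.
    return {
        t["id"]: by_name.get(normalise_name(t["winner_name"]), [])
        for t in tenders
    }
```

Exact matching on a normalised name is deliberately simple. It misses spelling variants and can merge distinct companies that share a name, so a real pipeline would likely combine it with further attributes such as postcodes or registry numbers.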

Once the central database unifies these data sources, incoming tenders could be analysed and ranked by a transparent and fair risk prediction indicator, developed in the open by a diverse community of stakeholders. The indicator lets regulators and watchdogs focus their resources on the transactions most likely to carry an increased risk of corruption and corporate wrongdoing. Because the indicator is transparent, anyone can check at any time how a given ranking was reached, which supports fairness, prompt adaptation and wide adoption.
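To illustrate what a traceable ranking could look like, here is a minimal sketch of an additive score that keeps a per-flag breakdown. It is an assumption for illustration only: the flag names, the point values and the functions score_tender and rank_tenders are hypothetical and do not come from the project.

```python
from dataclasses import dataclass

# Hypothetical flags and point values; real ones would be agreed on openly.
WEIGHTS = {
    "single_bidder": 40,
    "winner_on_sanctions_list": 50,
    "winner_in_offshore_leaks": 30,
}


@dataclass
class RiskResult:
    score: int
    contributions: dict  # flag name -> points it added to the score


def score_tender(flags: dict) -> RiskResult:
    """Sum the points of all raised flags and keep the breakdown."""
    contributions = {
        name: points for name, points in WEIGHTS.items() if flags.get(name)
    }
    return RiskResult(score=sum(contributions.values()), contributions=contributions)


def rank_tenders(tenders: dict) -> list:
    """Return tender ids ordered from highest to lowest score."""
    return sorted(
        tenders, key=lambda tid: score_tender(tenders[tid]).score, reverse=True
    )
```

Because every point in the score is attributed to a named flag, the contributions field answers the question of how a certain ranking was reached without any further tooling.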

EU Datasets used (URLs)

Development

Requirements:

  • Docker
  • Docker Compose (the steps below call docker-compose)

# Preparing the data
git clone https://git.fsfe.org/fsfe-system-hackers/datathon-2022.git
cd datathon-2022
mkdir -p ./data/postcodes
mkdir -p ./data/open_coroporates
mkdir -p ./data/ted
mkdir -p ./data/db
sudo apt install wget bzip2
./download.sh

# Preparing the database
docker build -t datathon_base .
docker-compose -f docker-compose.ingest.yml up
docker exec -it datathon_back /bin/bash
flask db migrate
flask db upgrade
flask import_postcodes data/postcodes/de.csv
flask import_open_corporates data/open_coroporates/de_companies_ocdata.jsonl
flask import_ted data/ted
exit
docker-compose -f docker-compose.ingest.yml down

# Starting everything
docker-compose up

Example requests

curl http://localhost:5000/companies?name=Forschungszentrum
curl http://localhost:5000/companies/1220137

curl http://localhost:5000/persons?name=Dorothee%20Dzwonnek
curl http://localhost:5000/persons/805196

curl http://localhost:5000/queries/persons_most_companies
curl http://localhost:5000/queries/companies_highest_tender_value_sum
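The same requests can be issued from Python using only the standard library. This is a sketch under the assumption that the backend is running locally on port 5000 as in the curl examples; build_url and fetch are hypothetical helper names, and the response is returned as raw bytes because the response format is not documented here.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:5000"  # as in the curl examples above


def build_url(path: str, **params: str) -> str:
    """Join BASE_URL and path, percent-encoding any query parameters."""
    url = BASE_URL + path
    if params:
        # quote_via=quote encodes spaces as %20, matching the curl examples.
        url += "?" + urlencode(params, quote_via=quote)
    return url


def fetch(path: str, **params: str) -> bytes:
    """GET the URL and return the raw body (needs the backend running)."""
    with urlopen(build_url(path, **params)) as response:
        return response.read()
```

For example, fetch("/persons", name="Dorothee Dzwonnek") requests the same URL as the corresponding curl call above.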

Deployment

Initial

For now, the database and the backend are deployed manually on meitner.fsfeurope.org and not accessible from the outside. This is how it was done:

# Preparing the data
cd /srv
git clone https://git.fsfe.org/fsfe-system-hackers/datathon-2022.git
cd datathon-2022
mkdir -p ./data/postcodes
mkdir -p ./data/open_coroporates
mkdir -p ./data/ted
mkdir -p ./data/db
sudo apt install wget bzip2
./download.sh
chown -R dockeruser:dockeruser data

# Preparing the database
docker build -t datathon_base .
docker-compose -f docker-compose.ingest.yml up
docker exec -it datathon_back /bin/bash
flask db migrate
flask db upgrade
flask import_postcodes data/postcodes/de.csv
flask import_open_corporates data/open_coroporates/de_companies_ocdata.jsonl
flask import_ted data/ted
exit
docker-compose -f docker-compose.ingest.yml down

# Starting everything
docker-compose up -d

On updates

cd /srv/datathon-2022
git pull origin main
Description
The FSFE's application entry for the EU Datathon 2022