Setup and document continuous deployment / orchestration #3

Open
opened 2024-02-05 07:46:15 +00:00 by linus · 3 comments
Owner

The goal is to automatically run the ETL pipeline on each push to a yet-to-be-created `prod` branch. Once a commit is pushed to the `prod` branch, all the necessary steps should be executed on our main workload server `holmes`.

To achieve this, AFAICU we can either:

- Adopt a full-blown solution like [prefect](https://www.prefect.io/) (this would probably require a fair bit of restructuring of the code, but maybe that's a good thing)

or

- Give the deployer user on Gitea Actions rights to also run the ingestion/deduplication scripts, and then set up a Gitea Actions workflow that runs them on every push to the `prod` branch. This is probably harder to maintain in the long run.
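For the Gitea Actions route, the job on `holmes` boils down to two things. It ignores pushes to anything but `prod`, and it runs the pipeline steps in order, stopping at the first failure. Below is a minimal Python sketch of such an entry point, which a workflow step could call. It rests on several assumptions, none of which come from the repository's actual scripts: the `refs/heads/prod` ref format, the workflow passing the pushed ref to `main()`, and the placeholder step commands.

```python
import subprocess
import sys
from typing import Sequence

# Assumption: the ref that a push to the (not yet created) prod branch produces.
PROD_REF = "refs/heads/prod"

# Hypothetical placeholder steps that only print; the real ingestion and
# deduplication entry points would go here instead.
PIPELINE_STEPS: list[list[str]] = [
    [sys.executable, "-c", "print('ingest')"],
    [sys.executable, "-c", "print('deduplicate')"],
]


def should_run(ref: str) -> bool:
    """Only pushes to the prod branch trigger an ETL run."""
    return ref == PROD_REF


def run_pipeline(steps: Sequence[Sequence[str]]) -> int:
    """Run each step in order and return how many completed.

    check=True raises CalledProcessError on a non-zero exit code, so a
    failed step stops the run before later steps see its output.
    """
    completed = 0
    for cmd in steps:
        subprocess.run(list(cmd), check=True)
        completed += 1
    return completed


def main(ref: str) -> int:
    """Return a process exit code for the CI job."""
    if not should_run(ref):
        print(f"skipping: {ref!r} is not {PROD_REF}")
        return 0
    try:
        run_pipeline(PIPELINE_STEPS)
    except subprocess.CalledProcessError as exc:
        print(f"step failed: {exc.cmd}", file=sys.stderr)
        return exc.returncode
    return 0
```

Failing fast keeps a broken ingestion run from feeding partial data into deduplication. The exit code from `main()` lets the Actions run show up as failed in Gitea.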
linus added the Kind/Feature and Kind/Testing labels 2024-02-05 17:25:58 +00:00
linus self-assigned this 2024-02-06 10:04:43 +00:00
linus changed title from Setup continuous deployment to Setup and document continuous deployment 2024-02-06 10:17:27 +00:00
linus added the Kind/Documentation label and removed the Kind/Testing label 2024-02-06 10:17:35 +00:00
Author
Owner


When switching over to prefect and using `holmes` as our [worker](https://docs.prefect.io/latest/concepts/work-pools/), we'd have to pay particular attention to how the concurrency [here](https://git.fsfe.org/TEDective/etl/src/commit/7629fbb117d506fc691d21674ba0c291284fb71c/tedective_parser/parser.py#L277-L280) is handled, probably with something like [this](https://docs.prefect.io/latest/api-ref/prefect/task-runners/#prefect.task_runners.ConcurrentTaskRunner).
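Whatever ends up orchestrating the parser, the key point is capping how many parse jobs run on `holmes` at once. Prefect's `ConcurrentTaskRunner` lets tasks run concurrently. As an orchestrator-independent illustration of bounding that concurrency, here is a stdlib sketch using `concurrent.futures.ThreadPoolExecutor` with an explicit `max_workers`. The names `parse_notice`, `MAX_WORKERS` and `ConcurrencyTracker` are hypothetical stand-ins, not code from `parser.py`.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical cap; the right value depends on holmes' cores, memory and
# whatever else is scheduled there.
MAX_WORKERS = 4


class ConcurrencyTracker:
    """Records the peak number of jobs running at the same time."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def __enter__(self) -> "ConcurrencyTracker":
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        return self

    def __exit__(self, *exc_info: object) -> None:
        with self._lock:
            self.active -= 1


def parse_notice(notice_id: int, tracker: ConcurrencyTracker) -> int:
    """Stand-in for parsing a single notice."""
    with tracker:
        time.sleep(0.01)  # simulate I/O-bound work
        return notice_id * 2


def parse_all(notice_ids: list[int], max_workers: int = MAX_WORKERS) -> tuple[list[int], int]:
    """Parse all notices with at most max_workers running concurrently.

    Returns the results (in input order, as map() preserves it) and the
    observed peak concurrency.
    """
    tracker = ConcurrencyTracker()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda n: parse_notice(n, tracker), notice_ids))
    return results, tracker.peak
```

If the parsing turns out to be CPU-bound rather than I/O-bound, `ProcessPoolExecutor` accepts the same `max_workers` argument. In that case the work function must be picklable, so the lambda would need replacing.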
Author
Owner


[This](https://github.com/investigativedata/investigraph-etl) might be good inspiration and [here](https://www.youtube.com/watch?v=D5DhwVNHWeU) is an intro video for prefect.
micgor32 was assigned by linus 2024-02-06 14:55:55 +00:00
linus changed title from Setup and document continuous deployment to Setup and document continuous deployment / orchestration 2024-02-06 15:13:52 +00:00
Author
Owner

I ditched prefect and started using luigi instead. Much simpler and better geared for our use case IMHO.
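In luigi's model, each task declares three things: its requirements (`requires()`), the target it produces (`output()`) and how to produce it (`run()`). A task whose output already exists counts as complete and is skipped. The stdlib-only sketch below imitates that model to show how an ingestion → deduplication chain could be expressed. It is not luigi's API: in luigi itself these would be `luigi.Task` subclasses with `luigi.LocalTarget` outputs, run via `luigi.build`. The `Ingest`/`Deduplicate` tasks, their file names and the sample data are hypothetical.

```python
import tempfile
from pathlib import Path


class Task:
    """Minimal imitation of luigi's task model (not luigi's API)."""

    def requires(self) -> list["Task"]:
        return []

    def output(self) -> Path:
        raise NotImplementedError

    def run(self) -> None:
        raise NotImplementedError

    def complete(self) -> bool:
        # As in luigi, a task is done once its output target exists.
        return self.output().exists()


def build(task: Task, ran: list[str]) -> None:
    """Depth-first: build missing requirements, then the task itself."""
    if task.complete():
        return
    for dep in task.requires():
        build(dep, ran)
    task.run()
    ran.append(type(task).__name__)


class Ingest(Task):
    def __init__(self, workdir: Path) -> None:
        self.workdir = workdir

    def output(self) -> Path:
        return self.workdir / "ingested.txt"

    def run(self) -> None:
        # Hypothetical sample data standing in for fetched notices.
        self.output().write_text("notice-1\nnotice-1\nnotice-2\n")


class Deduplicate(Task):
    def __init__(self, workdir: Path) -> None:
        self.workdir = workdir

    def requires(self) -> list[Task]:
        return [Ingest(self.workdir)]

    def output(self) -> Path:
        return self.workdir / "deduplicated.txt"

    def run(self) -> None:
        lines = self.requires()[0].output().read_text().splitlines()
        unique = list(dict.fromkeys(lines))  # drop repeats, keep first-seen order
        self.output().write_text("\n".join(unique) + "\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        executed: list[str] = []
        build(Deduplicate(Path(tmp)), executed)
        print(executed)
```

Once the final target exists, re-running the build is a no-op. Only missing outputs get rebuilt, which suits re-triggering the pipeline on every push. The flip side is that an existing but stale output is skipped as well.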

Reference: TEDective/etl#3