feat/python-rewrite #4762
Rewrite the whole build process in Python, for superior speed and maintainability
So, as of now this is, as far as I can tell, a fully functional website build replacement that correctly builds all pages, menus, etc. There may be some small issues somewhere.
Phase 1 is significantly faster than before (7s worst case vs 120s worst case).
I have not benchmarked phase 2 yet. I believe there are some optimisations I can make to phase 2, mainly because at the moment the same xhtml/xml files are parsed multiple times with lxml. I should be able to reduce this, but passing lxml objects between functions can be tricky.
Additionally, I have enhanced the Dockerfile and Docker Compose file, and introduced entrypoint.sh, such that the docker build process no longer interferes with the user's repo. It also means that the docker builds properly use caching, so I believe it will be viable to use drone to deploy the website, as we use drone for all other deployments.
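One way to avoid parsing the same file repeatedly is to memoise the parse call. The following is only a minimal sketch under stated assumptions: `parse_cached` is a hypothetical helper, not something in this branch, and it uses the standard library's `xml.etree.ElementTree` as a stand-in so the example runs without lxml installed.

```python
from functools import lru_cache
from pathlib import Path
import xml.etree.ElementTree as ET  # stand-in for lxml in this sketch


@lru_cache(maxsize=None)
def parse_cached(path: Path) -> ET.ElementTree:
    """Parse an XML/XHTML file once per process and return the same tree afterwards.

    Hypothetical helper: callers share one tree object per path, so they
    must treat it as read-only or copy it before modifying it.
    """
    return ET.parse(path)
```

The cache lives in a single process, so if the build uses worker processes each worker would still parse a file once for itself. The shared tree is also mutable, which is one reason passing parsed objects around needs care.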
Unresolved issues as of now:
How do we get files from the docker volume to the several servers they need to be on? I think it would be unwise/impossible to allow the container access to the host SSH keys for rsync. The easiest thing is probably a cron job on build-server that rsyncs the result from the volume to the targets. I will try to discuss this with other syshackers.
While there are some features I want to add, I think it's best to first move to this enhanced builder, which behaves very similarly to the original, and then add features and enhancements. This is a significant change, so it is probably wisest to deploy on the test branch first, then on master.
@@ -1,23 +1,22 @@
FROM debian:bookworm-slim
RUN apt update
FROM fedora:latest
Great work! I've never seen Fedora used in a Dockerfile :)
Why Fedora and not something like Alpine?
Under Alpine the compilation of `tdewolff-minify` fails with error `__vfprintf_chk: symbol not found`. It would seem that this can be solved by building the Go binary with different compile flags, e.g. here, but I did not feel I knew enough about Go to investigate it.
And the Go version in the latest Debian is too old to compile `tdewolff-minify`.
I get that it seems the problem here is `tdewolff-minify`, but it does seem to be the best minifier around for speed and for supporting multiple filetypes.
At the moment it is only used for the CSS generated from Less. But in the future, probably in a follow-up PR, I would like to use it to minify all `css`, `js` and `html` files.
Some issues came up with Fedora (broken node) and I managed to port it to Debian successfully.
Things this branch does not do that the original script does
I need to fix the staging bit. At the moment it will not actually copy the result from the stage dir to the target dir.
It does not generate the manifest and then remove files from the target. So only a full rebuild using `--stage` will actually remove unused files from the target. I would kinda prefer to avoid implementing this removal logic, as I don't really want to treat the generated website dir as something that is carefully managed; I would prefer that if there are any questions we just do a full rebuild. But the manifest and removal may be necessary.
Some code style to work on:
There are a bunch of pretty common pathlib operations I do, like chaining `with_suffix("")`, and some other stuff that should probably become lib functions.
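As an illustration of what such a lib function might look like, here is a minimal sketch. The name `strip_suffixes` and the example file name are assumptions made for this example and are not taken from the branch.

```python
from pathlib import Path
from typing import Optional


def strip_suffixes(path: Path, count: Optional[int] = None) -> Path:
    """Remove the last `count` suffixes from `path`, or all of them if `count` is None.

    Hypothetical helper replacing chains such as
    path.with_suffix("").with_suffix("").
    """
    available = len(path.suffixes)
    remaining = available if count is None else min(count, available)
    for _ in range(remaining):
        # with_suffix("") drops only the final suffix, so repeat as needed.
        path = path.with_suffix("")
    return path
```

For example, `strip_suffixes(Path("about/index.en.xhtml"))` gives `Path("about/index")`, while passing `count=1` keeps the language part and gives `Path("about/index.en")`.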
Oh, of course. We can use drone secrets to pass things to the container. So we can pass the secret keys as env vars, add them to files, load them and deploy.
Or the solution described in this comment seems quite elegant, if less neatly encapsulated in drone. https://stackoverflow.com/a/36648428
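The drone-secrets route (key passed as an env var, then written to a file) could look roughly like the sketch below. The function name, the variable name and the destination path are all hypothetical; the only point shown is writing the key with owner-only permissions, since ssh refuses private key files that other users can read.

```python
import os
from pathlib import Path


def write_secret_from_env(var_name: str, destination: Path) -> Path:
    """Write the secret held in env var `var_name` to `destination` with mode 0600.

    Hypothetical helper: `var_name` would be whatever name the CI secret
    is exposed under in the container.
    """
    secret = os.environ.get(var_name)
    if not secret:
        raise RuntimeError(f"environment variable {var_name} is not set")
    destination.parent.mkdir(parents=True, exist_ok=True)
    # Request mode 0600 at creation so a new file is never readable by others.
    fd = os.open(destination, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        # Key files are expected to end with a newline.
        handle.write(secret if secret.endswith("\n") else secret + "\n")
    # Tighten permissions in case the file already existed with a laxer mode.
    os.chmod(destination, 0o600)
    return destination
```

The resulting file could then be handed to `ssh -i` or `ssh-add` before running rsync.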
Fixed in latest commit, it now stages correctly.
Oh, and another issue: the fsfe status cgi script no longer works meaningfully.
I can patch the build process and the cgi script to show the new logs, but if we are using drone that is kinda just unnecessary? Probably better to remove it.
I also plan to add a build-time flag for generating the translation status pages, so we should get to replace that stuff.
Did some basic benchmarking on my laptop. It should not be taken as solid proof of anything, but it shows general trends. Both runs were full rebuilds with no caching. The current bash build script was run first, so as not to make improvements seem better than they really are.
Old bash build: 24 mins
New build: 8 mins
The worst offenders for this have now been functionised.
Have now rewritten the translation-status script and added it to the build process.
New one takes ~30s for all languages, old took ~33 mins for all languages.
So that's a nice improvement.
As far as I can tell at this point, the build process works properly using the script locally, using docker and using drone exec to run the pipeline locally.
The next step is to disable the pull-based CI for the test branch. I have a PR to do this: fsfe-system-hackers/build-server#15. After it's merged, someone with access will have to redeploy the webserver.
Then merge this PR to the test branch. Alter the config so it aims at the right dirs on the webservers.
This will need SSH keys to access the webservers to deploy the test branch builds too. Load them into ssh as per https://stackoverflow.com/a/72654766
And to sign the drone file, rerun it, etc. I need more drone permissions for this repo. I have commit access for the repo, but for some reason that Tobias and I can't figure out, that does not give me drone access.
This branch is stuck waiting on the above requirements.
Pinging @tobiasd as maintainer.
So, I appreciate that it seems like this is still under heavy development, but frankly I am just grabbing some small optimizations and code style improvements.
This is now deployed on the test branch.
So, progress report:
Loading of secrets works (I think), and the docker compose file enables ipv6 properly. Ran it locally, so a build from this branch is currently live on test.fsfe.org
But it does not work when run in drone, I think because the parent docker container, the official docker image, does not enable ipv6.
So we can enable ipv6 by default for all images in the docker daemon conf, but that's not ideal.
The other problem we have is that the ephemeral docker image used by drone means that layers don't get cached between runs, massively increasing the build time.
I think we could maybe use a drone exec pipeline to run with one less layer of docker, which I think would fix our ipv6 and caching issues.
But it is less secure, and different to how most of our services work.
Ipv6 is now not needed, as we are using ipv4 proxies to the ipv6 servers.
Caching still an issue, final blocker I am aware of.
Caching is now fixed.
The docker in docker container is pruned nightly, so the first build every day will do the whole dep reinstall. But I have optimized that a little, so it's not too slow.
And after that cached docker layers will be used till the next day.
And fixed the incremental build caching, so now it works. There are a few bits that don't seem to cache properly, but I think that hunting them down is a future problem.
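For readers unfamiliar with incremental builds, one common way to decide whether an output needs rebuilding is to compare modification times, as make does. The sketch below illustrates that general technique only; it is an assumption for illustration and not a description of how this branch implements its caching, and `needs_rebuild` is a hypothetical name.

```python
from pathlib import Path
from typing import Iterable


def needs_rebuild(output: Path, sources: Iterable[Path]) -> bool:
    """Return True if `output` is missing or older than any of its `sources`."""
    if not output.exists():
        return True
    output_mtime = output.stat().st_mtime
    # Any source modified after the output was written makes the output stale.
    return any(source.stat().st_mtime > output_mtime for source in sources)
```

A check like this only stays correct if every input that influences a page is listed in `sources`, which is typically where the bits that do not cache properly come from.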
So.
This now largely works, and is deployed on `test.fsfe.org`. There are a few things to iron out.
I will be very busy the next while, so there will likely be no further development for a bit.
So,
What is left for this PR?
Done
It would seem that the site builds correctly, with all links working etc. The CI setup for deploying it to `test.fsfe.org` works correctly.
The build itself is much faster than before, and seems to cache properly.
Todo
In python performance
There are several places where some speed in the Python itself can be gained, all marked with TODOs. These would give slightly better performance, but I think it's best to leave that to a follow-up PR. This one is already huge, and reworking some of the stuff has significant breakage risks.
TLDR: Some optimization is possible; leave it to a future PR.
CI
The check step uses a different image, and takes a while to install and run its tests. Before, builds did not actually need this to pass, or even start, before building and deploying, so they did not have the delay it adds. Probably just a price to pay for not building and deploying commits that fail CI. Nothing to do really.
We delete our drone runner image/clean our caches every night. This means that the first run the following day takes another 2 minutes or so. This should be okay as we will presumably just set a nightly cron to build the webpages anyway. Just an FYI really.
TLDR: Some small delays that are slightly annoying, but no real mitigation available. And only small delays.
Deployment
If one inspects the drone yml file they will see a step for sending to test.fsfe.org. A similar one would need to be introduced to deploy it to `fsfe.org` proper pre merge.
Additionally, before this could be merged, one must disable the existing build process for the website, as was done in fsfe-system-hackers/build-server#15 for the test branch.
Oh, and we should probably be using a `pyproject.toml`, but again, that can wait for a future PR.