#1218 WIP: benchmark build system and improve performance

Open
jzarl wants to merge 4 commits from jzarl/fsfe-website:l-benchmark into master
jzarl commented 1 year ago
Collaborator

Going home from FOSDEM I took a look at the build system to see if there are any low-hanging fruit to reduce build times.

Since it is executed on every build, `tools/update_xmllists.sh` seems like an obvious place to start. And indeed, parallelizing the generation of tag maps (using GNU parallel) cuts the time this takes considerably:
On my laptop (NVMe SSD) this saves about 7 of 27 seconds in total, and on my (admittedly dated) PC with a regular hard drive it saves 20 of 65 seconds.

So this relatively trivial change could benefit most users of the build system.
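A minimal sketch of the fan-out/join idea behind this change, assuming each language's tag map can be generated independently of the others. `generate_tagmap`, the `languages` list and the `tagmap.*` output names are hypothetical placeholders, not the actual code in `tools/update_xmllists.sh`, and the sketch uses plain bash background jobs instead of GNU parallel so that it needs no extra dependency.

```shell
#!/usr/bin/env bash
# Sketch: generate one tag map per language concurrently.
# HYPOTHETICAL: generate_tagmap, $languages and the tagmap.* names are
# placeholders, not the real code in tools/update_xmllists.sh.

outdir=$(mktemp -d)
languages="en de fr it"

generate_tagmap() {
  local lang="$1"
  # Stand-in for the real (slow) per-language work.
  printf 'tag map for %s\n' "$lang" > "$outdir/tagmap.$lang"
}

# Fan out: start one background job per language and remember its PID.
pids=()
for lang in $languages; do
  generate_tagmap "$lang" &
  pids+=("$!")
done

# Join: wait for each job individually so a failing job is noticed
# (a bare `wait` without arguments always returns 0).
tagmap_status=0
for pid in "${pids[@]}"; do
  wait "$pid" || tagmap_status=1
done
echo "tag maps done, status $tagmap_status"
```

With GNU parallel, as in this PR, the fan-out/join part collapses into a single `parallel` invocation over the language list (a shell function can be made callable via `export -f`), and parallel additionally limits the number of concurrent jobs, by default to one per CPU core.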

But maybe I'm heading in the wrong direction? That script in particular neither has well-defined file dependencies nor produces clearly defined outputs. Maybe we should split it into 4 independent scripts that the Makefile only calls when a dependency changes?

@reinhard: I think you know the build system best. What's your opinion on this?

jzarl added the build label 1 year ago
reinhard commented 1 year ago
Collaborator

Thank you for looking into this!

About the profiling, did you see that the existing standard output of the build script (https://status.fsfe.org/fsfe.org/) already shows the time spent in each step?

About parallelizing stuff, I think it should work. Let's check how much of a difference it makes on the actual build machine.

About defining dependencies: the point is that these scripts also have to run when files have been *removed*, which cannot easily be expressed in a Makefile. In the past there were some hacks in the build script trying to achieve this (like parsing the "git pull" output), but they all turned out to be very unreliable, so I think it's better to trade a few seconds of build time for reliable behaviour.

jzarl commented 1 year ago
Poster
Collaborator

Thanks for the fast response!

I did see the timings in the console log, but I didn't know about the overview page you linked. Anyway, the profiling code is not really intended as a permanent addition, but as a tool for finding hotspots, so we know which pieces of code to focus on.
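For this kind of throwaway profiling, a minimal sketch of a per-step timer built on bash's `time` keyword and `TIMEFORMAT` variable. The `profile_step` helper and the step labels are hypothetical illustrations, not part of the existing build scripts.

```shell
#!/usr/bin/env bash
# HYPOTHETICAL helper: run a command and report its wall-clock time on stderr.
# Uses bash's `time` keyword; TIMEFORMAT controls the report format
# (%3R = elapsed real time in seconds with three decimals).
profile_step() {
  local label="$1"
  shift
  local TIMEFORMAT="$label: %3R s"
  time "$@"
}

# Example: wrap individual steps to see where the time goes.
profile_step "list files" ls > /dev/null
profile_step "noop" true
```

Because the report goes to stderr, wrapping a step does not change what the step writes to stdout, so it can be added and removed again without touching the rest of the script.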

Good point about removing files. I think it should be possible to achieve that without relying on git, though. How about something like the following pseudo-code?

```
for language in $languages; do
  find * -name "*.$language.xml" > $language.filelist.new
  copy_if_different $language.filelist.new $language.filelist
done
find * -name '*.sources' > sources.filelist.new
copy_if_different sources.filelist.new sources.filelist
```
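A runnable sketch of the same file-list approach, assuming `copy_if_different` is implemented with `cmp -s` and that the list is sorted so its content depends only on which files exist. The helper body, `update_filelists` and the demo layout are illustrative assumptions; the point is that a `.filelist` keeps its old timestamp unless a file was added or *removed*.

```shell
#!/usr/bin/env bash
# Sketch of the file-list "stamp" approach.
# ASSUMPTION: copy_if_different is implemented with cmp; the comment above
# only names the helper. update_filelists is a hypothetical wrapper.
copy_if_different() {
  # Replace $2 with $1 only if the contents differ (or $2 does not exist),
  # so the modification time of $2 changes only on a real change.
  if cmp -s "$1" "$2"; then
    rm -f "$1"
  else
    mv -f "$1" "$2"
  fi
}

update_filelists() {
  local language
  for language in $languages; do
    # sort: find's output order is not guaranteed to be stable
    find * -name "*.$language.xml" | sort > "$language.filelist.new"
    copy_if_different "$language.filelist.new" "$language.filelist"
  done
}

# --- Demo in a scratch directory (hypothetical layout) ---
languages="en de"
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir news
touch news/a.en.xml news/b.en.xml news/a.de.xml

update_filelists                    # first run: creates en/de.filelist
touch -t 200001010000 en.filelist   # backdate so a rewrite is visible
touch -t 200101010000 marker        # reference timestamp
update_filelists                    # nothing changed: en.filelist untouched
unchanged_ok=no; [ marker -nt en.filelist ] && unchanged_ok=yes
rm news/b.en.xml                    # a removed file...
update_filelists                    # ...does rewrite the list
removed_ok=no; [ en.filelist -nt marker ] && removed_ok=yes
echo "unchanged: $unchanged_ok, after removal: $removed_ok"
```

In a Makefile, per-language rules could then list the corresponding `.filelist` as a prerequisite, so that `make -j` would only regenerate the outputs whose file list actually changed.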

But even if it turns out that we have to execute this script unconditionally, there is still a benefit to moving parts of the logic into the Makefile: per-language rules in the Makefile would give us parallel execution and job scheduling for free, without introducing GNU parallel as an (optional) dependency.

reinhard commented 1 year ago
Collaborator

I think your approach is generally correct and worth trying. On the other hand, there's no complaint I hear more often than "the build system is too complex" (in fact, I think I hear it orders of magnitude more often than "the build system is too slow"). So considering that speed is not the *only* quality we want to optimize, I would like us to find a good compromise between performance and understandability of the code.

Having said that, I am all for giving it a try and seeing how much we gain on the production machine.

All checks were successful
continuous-integration/drone/pr Build is passing
This pull request has changes conflicting with the target branch:
- build/misc.sh
- tools/update_xmllists.sh