Translation warning box sometimes wrong #64
Reference: FSFE/fsfe-website#64
On some pages the red "translation outdated" box is a false positive. For example, index.de.xhtml is newer than its English counterpart: DE vs EN.
On some other pages there seems to be a false negative, but I don't have an example right now. Fixing the logic may uncover that error as well.
This is odd, because on the server both files were updated on November 21, even though according to git log the English version should be 14 days older.
We are relying on filesystem time stamps here. Is there any chance that git touches files during merges, even when those files themselves did not change?
On a side note: is it possible to enable proper time display in Gitea? It is no fun to browse the repository for time stamp discrepancies when everything says "approx. 1 month ago" or "2 days ago".
Also: I do not think this has anything to do with xsl either.
Yes, git touches files after a merge which kind of makes sense in a git mindset. This by the way also seems to happen after a commit, at least my text editor sometimes tells me that a file has been changed after such an operation.
Not ideal but if you hover over this indicator you should see a more detailed time. No idea how to make this the default view.
Changed that, too
The status log confirms that the file was modified during the merge, despite not being part of a related commit (I believe?): https://status.fsfe.org/fsfe.org/status_1511267050.html
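The false positive can be reproduced without git. The following is a minimal sketch, assuming GNU `touch`; the helper names `outdated_by_mtime` and `report`, the file names, and the dates are made up for illustration and are not taken from the build scripts.

```shell
# Hypothetical helper: treat a translation as outdated when the English file
# has a newer filesystem modification time.
outdated_by_mtime() {
  [ "$1" -nt "$2" ]
}

report() {
  if outdated_by_mtime "$2" "$3"; then
    echo "$1: outdated"
  else
    echo "$1: up to date"
  fi
}

demo_dir="$(mktemp -d)"
en="$demo_dir/index.en.xhtml"
de="$demo_dir/index.de.xhtml"

# Illustrative dates: English edited first, German 14 days later.
touch -d '2017-11-07 12:00:00' "$en"
touch -d '2017-11-21 12:00:00' "$de"
report "before merge" "$en" "$de"

# A merge rewrites the English file on disk without changing its content.
touch -d '2017-11-21 12:00:05' "$en"
report "after merge" "$en" "$de"

rm -rf "$demo_dir"
```

The first report prints "up to date" and the second "outdated", although the content of neither file changed in between.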
Good, so this means we should definitely find a way to make the box dependent on commit time stamps. However, looking those up during page build is a lot slower than checking fs timestamps.
It might be faster to update fs time stamps at the start of the build to reflect commit times. Because this is also slow to do for the entire repo, maybe it should be implemented as part of the VCS updater (git_build_into() in buildrun.sh). This way we could limit lookups of the commit log to files from the merge.
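A rough sketch of that idea, assuming GNU `touch` and a checkout that is updated from a known old revision to a new one. The function name `sync_mtimes_to_commits` is hypothetical and not part of buildrun.sh; only files that differ between the two revisions are looked up in the commit log.

```shell
# Hypothetical helper: reset the modification time of every file that differs
# between two revisions to the time of the last commit that touched it.
# Run from the top level of the working tree.
sync_mtimes_to_commits() {
  local old_rev new_rev file epoch
  old_rev="$1"
  new_rev="$2"
  git -c core.quotePath=false diff --name-only "$old_rev" "$new_rev" |
    while IFS= read -r file; do
      # Skip files that were deleted between the two revisions.
      [ -f "$file" ] || continue
      epoch="$(git log -1 --format=%ct "$new_rev" -- "$file")"
      if [ -n "$epoch" ]; then
        # GNU touch accepts "@<seconds since epoch>" as a date.
        touch -d "@$epoch" "$file"
      fi
    done
}

# Possible use around the VCS update step:
#   old="$(git rev-parse HEAD)"
#   git pull
#   sync_mtimes_to_commits "$old" HEAD
```

With this in place the existing modification time comparison could stay unchanged, at the cost of one `git log` call per file changed by the update.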
Maybe with some git magic we can automatically add a timestamp="..." attribute with the timestamp of the commit in the root node of all XML files?
This might also be related to #837.
Personally, I am a bit hesitant to automatically add/edit something in all XML files. Preferably, any lookup should be done on the server side.
@reinhard, do you think you could estimate the speed loss if we looked up the last edit time and author via Git? If it's significant, could we do it during the midnight run?
I haven't really tested, but I guess the speed loss would be enormous. Maybe you can do a test during the webathon?
I am attaching a test script that does 30*30 file comparisons. On my rather old machine comparison of file timestamps takes less than one second, while comparison of git commit times takes about 44 seconds.
That is a huge difference, but as the build process as a whole is quite time-consuming, the overall effect of the modified time comparison may not be that dramatic and may even be acceptable.
PR #952 by @ulf
Oh wow, that's quite a difference, and probably opposite to what we want to achieve with the latest improvements of the build script.
I wonder whether we could cache at least the change time of the EN file somehow to reduce the time for all the translations by almost 50%.
@reinhard, would you see any other potential methods to reduce the check time? Perhaps a file containing all timestamps which is only updated incrementally if one file is being changed?
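One way such a timestamp file could look is sketched below. The cache format (one tab-separated "epoch, path" line per file) and the function names `cache_get`, `cache_put` and `commit_epoch` are assumptions for illustration; nothing like this exists in the build scripts as far as this thread shows.

```shell
# Hypothetical cache format: one "<epoch><TAB><path>" line per file.

# cache_get CACHE PATH: print the cached epoch, fail if there is no entry.
cache_get() {
  [ -f "$1" ] || return 1
  awk -F '\t' -v p="$2" '$2 == p { print $1; found = 1 } END { exit !found }' "$1"
}

# cache_put CACHE PATH EPOCH: replace or append the entry for PATH.
cache_put() {
  local tmp
  tmp="$(mktemp)"
  if [ -f "$1" ]; then
    awk -F '\t' -v p="$2" '$2 != p' "$1" > "$tmp"
  fi
  printf '%s\t%s\n' "$3" "$2" >> "$tmp"
  mv "$tmp" "$1"
}

# commit_epoch CACHE PATH: use the cache, fall back to git on a miss.
commit_epoch() {
  local epoch
  if ! epoch="$(cache_get "$1" "$2")"; then
    epoch="$(git log -1 --format=%ct -- "$2")"
    cache_put "$1" "$2" "$epoch"
  fi
  printf '%s\n' "$epoch"
}
```

With such a cache the commit time of an EN file would be looked up once and reused for all of its translations. Entries for files changed by an update would still have to be refreshed with `cache_put`, otherwise the cache goes stale.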
If I am not mistaken, we need the date of the last commit for two purposes:
- To check whether a translation is outdated (see #2). There might be better indicators for that than the commit time anyway.
- To display the date of the last change on the webpage itself (see #837). Actually, most websites out there don't display that information at all, and in the past years, when this information was in fact not present, nobody missed it. So we might want to think again about whether we really want that.
If we actually do want the date of the last commit, I still think a commit hook would be the best solution, since it is essentially the same as what we had in SVN times.
Could you please explain which kind of hook and what it shall do? I don't remember the server-side hooks we had with SVN.
SVN filled in the $Author: and $Date: information upon commit, that was not a server-side hook, but rather a built-in SVN function.
Essentially, we would just need some git magic which writes the date of the last commit into a predefined space within the file.
OK, we could do that, but it would mean that we effectively would have to add this information to all files initially. That would mean that we also touch outdated files...
@max.mehl you have a point there :-/
I am going to do some test builds to find out how large the actual impact of the modified timestamp comparison is. Please hold the line ...
During a full build more than 50,000 checks are done in order to find
outdated translations.
Currently the checks are done by comparing file modification times.
This takes almost no time (about 10 seconds on my box). A full build
takes approx. 110 minutes on my box.
In PR #952 the checks are done by comparing git commit times. For
each comparison "git log" is called for each file and the commit times
are then compared. There is no optimisation. With these changes a
full build takes approx. 135 minutes on my box.
PR #974 contains an alternative approach. During phase 1 of the build
a "sidecar file" is created or updated for each "*.en.xhtml" file that
contains its outdated translations. (This is implicitly assuming that
"en" is the original language and all others are translations. This
should be adjusted/generalised/fixed.) These sidecar files are later
used to identify outdated files. With these changes a full build
takes approx. 115 minutes on my box.
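The sidecar idea can be sketched roughly as follows. This is not the code from PR #974: the sidecar name `<base>.outdated-translations`, its one-path-per-line format, and the function names are assumptions, and like the PR it treats "en" as the original language.

```shell
# Stand-in for the timestamp lookup; assumed here to be the last commit time.
last_change_epoch() {
  git log -1 --format=%ct -- "$1"
}

# write_sidecar EN_FILE: list every translation whose last change is older
# than that of the English file in "<base>.outdated-translations".
write_sidecar() {
  local en base en_time tr tr_time
  en="$1"
  base="${en%.en.xhtml}"
  en_time="$(last_change_epoch "$en")"
  : > "$base.outdated-translations"
  for tr in "$base".*.xhtml; do
    [ -f "$tr" ] || continue
    [ "$tr" != "$en" ] || continue
    tr_time="$(last_change_epoch "$tr")"
    if [ "${tr_time:-0}" -lt "${en_time:-0}" ]; then
      printf '%s\n' "$tr" >> "$base.outdated-translations"
    fi
  done
}

# is_outdated TRANSLATION: succeed if the sidecar lists the translation.
is_outdated() {
  local base
  base="${1%.*.xhtml}"
  grep -qxF -- "$1" "$base.outdated-translations" 2>/dev/null
}
```

The expensive lookups happen once per file while the sidecar is written; the later check per page is a single `grep` on a small file.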
@reinhard what do you think, is either of these two approaches doable for our setup, especially if we do partial builds? Time-wise, the difference is much smaller than I thought, but perhaps we could run a test on our build server?
I just had the following idea and would like to hear your feedback:
touch -d "$(git log -1 --format='%ci' -- "${file}")" "$(dirname "${file}")/.$(basename "${file}").date"
for each file which was updated since the last build run.
BUT I would also suggest that we first decide whether we actually want to use commit dates for anything, see my other comment of 27 May 9:33.
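The `touch` idea above could be wrapped as follows. This is only a sketch, assuming GNU `touch` and hidden companion files placed next to each source file; the function names are hypothetical, and how the list of files updated since the last build run is obtained is left open.

```shell
# Commit date of the last change to a file, in git's "%ci" format.
commit_date() {
  git log -1 --format=%ci -- "$1"
}

# Path of the hidden companion file, e.g. news/.index.de.xhtml.date
date_file_for() {
  printf '%s/.%s.date\n' "$(dirname "$1")" "$(basename "$1")"
}

# Create or refresh the companion file so its mtime equals the commit date.
update_date_file() {
  local stamp
  stamp="$(commit_date "$1")"
  [ -n "$stamp" ] || return 0
  touch -d "$stamp" "$(date_file_for "$1")"
}

# outdated_by_date_file EN_FILE TRANSLATION: compare companion files only.
outdated_by_date_file() {
  [ "$(date_file_for "$1")" -nt "$(date_file_for "$2")" ]
}
```

The build would call `update_date_file` for each file changed since the last run and then compare the `.date` files instead of the sources, so a merge that merely rewrites a source file no longer affects the result.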
Sounds good.
I am wondering whether it is faster to call "git log" for each language to create ".date" files or to call "git log" once for the en.xhtml and once for the whole set of translations to create ".outdated-translations" files.
One should do some tests.
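For such a test, the single-call variant could look like the sketch below. It assumes that `git log` lists commits newest first (its default) and that a `@` prefix is put in front of the commit time to tell timestamp lines from file names; the function name `latest_commit_times` and the example pathspec are made up.

```shell
# Reads the output of `git log --format=@%ct --name-only` on stdin and prints
# "<epoch><TAB><path>" once per path, using the first (newest) commit seen.
latest_commit_times() {
  awk '
    /^@[0-9]+$/ { epoch = substr($0, 2); next }   # commit time line
    NF == 0 { next }                              # separator line
    !($0 in seen) { seen[$0] = 1; print epoch "\t" $0 }
  '
}

# Possible use for one page and all of its translations:
#   git -c core.quotePath=false log --format=@%ct --name-only \
#     -- 'news/index.*.xhtml' | latest_commit_times
```

This replaces one `git log` call per language with one call per page; whether that is actually faster on the full repository would still have to be measured.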
I am impartial on the format and strategy to create the companion files, but it sounds like a good plan to me.
That's a complicated one.
The actual issue (translation warning sometimes wrong) was solved with c4cb7aeef4, so I'm closing this issue and propose that we continue the discussion about the other use of git commit time (displaying it in the footer of the HTML output) in #837.