
Translation warning box sometimes wrong #64

Closed
opened by max.mehl 2017-12-14 11:06:24 +00:00 · 20 comments
Owner

On some pages the red "translation outdated" box is a false positive. For example, index.de.xhtml is newer than its English counterpart: [DE](https://git.fsfe.org/FSFE/fsfe-website/commits/branch/master/index.de.xhtml) vs [EN](https://git.fsfe.org/FSFE/fsfe-website/commits/branch/master/index.en.xhtml).

On some other pages it seems to me that there is a false negative, but I don't have an example right now. Maybe this error will also be found when fixing the logic.
max.mehl changed the title from "Translation warning box on wrong" to "Translation warning box sometimes wrong" 2017-12-14 11:06:42 +00:00
max.mehl added the labels xsl, bug 2017-12-14 11:08:58 +00:00
max.mehl added the label build and removed xsl 2017-12-14 13:41:55 +00:00

This is odd, because on the server both files got updated on November 21, even though according to git log the English version should be 14 days older.

We are relying on filesystem time stamps here. Is there any chance that Git touches files during merges, even when those files themselves did not change?

On a side note: is it possible to enable proper time display in Gitea? It is no fun to browse the repository for time stamp discrepancies when everything says "approx. 1 month ago" or "2 days ago".

Also: I do not think this has anything to do with xsl either.
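
To check a suspected false positive, it can help to print both time stamps for a file side by side. A minimal sketch, assuming GNU `stat` and that it is run inside the repository; `show_times` is a hypothetical helper name and not part of the build scripts:

```shell
# Print the filesystem mtime and the time of the last commit (both as Unix
# epochs) for one file. Assumes GNU stat ("-c %Y") and a file tracked by Git.
show_times() {
  local file mtime commit
  file=$1
  mtime=$(stat -c %Y -- "$file")                  # what the build compares today
  commit=$(git log -1 --format=%ct -- "$file")    # committer time of last change
  printf '%s mtime=%s commit=%s\n' "$file" "$mtime" "$commit"
}
```

Calling `show_times index.de.xhtml` and `show_times index.en.xhtml` on the server would show whether the two orderings disagree.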

Author
Owner

Yes, Git touches files after a merge, which kind of makes sense in a Git mindset. By the way, this also seems to happen after a commit; at least my text editor sometimes tells me that a file has been changed after such an operation.

> On a side note: is it possible to enable proper time display in Gitea? It is no fun to browse the repository for time stamp discrepancies when everything says "approx. 1 month ago" or "2 days ago".

Not ideal, but if you hover over this indicator you should see a more detailed time. No idea how to make this the default view.

> Also: I do not think this has anything to do with xsl either.

Changed that, too


The status log confirms that the file was modified during the merge, despite not being part of a related commit (I believe?): https://status.fsfe.org/fsfe.org/status_1511267050.html

Good, so this means we should definitely find a way to make the box dependent on commit time stamps. Looking those up during page build is a lot slower than checking fs timestamps.

It might be faster to update fs time stamps at the start of the build to reflect commit times. Because this is also slow to do for the entire repo, maybe it should be implemented as part of the VCS updater (git_build_into() in buildrun.sh). This way we could limit lookups of the commit log to files from the merge.
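
A rough sketch of that idea, assuming GNU `touch` (for `-d @epoch`), that it runs from the repository root, and that the updater knows the revisions before and after the pull. `sync_mtimes` is a hypothetical helper, not existing code in buildrun.sh, and it does not handle file names containing newlines:

```shell
# Reset the mtime of every file changed between two revisions to the time of
# its last commit, so later mtime comparisons reflect commit order.
sync_mtimes() {
  local old_rev new_rev file epoch
  old_rev=$1
  new_rev=$2
  git diff --name-only "$old_rev" "$new_rev" |
    while IFS= read -r file; do
      [ -f "$file" ] || continue            # skip files deleted by the merge
      epoch=$(git log -1 --format=%ct "$new_rev" -- "$file" </dev/null)
      if [ -n "$epoch" ]; then
        touch -d "@$epoch" -- "$file"       # GNU touch: "@N" = Unix epoch
      fi
    done
}
```

Because only the files listed by `git diff` are looked up, the number of `git log` calls scales with the size of the merge rather than with the size of the repository.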

Member

Maybe with some git magic we can automatically add a timestamp="..." attribute with the timestamp of the commit in the root node of all XML files?

This might also be related to #837.

Author
Owner

Personally, I am a bit hesitant to automatically add/edit something in all XML files. Preferably, any lookup should be done on the server side.

@reinhard, do you think you could estimate the speed loss if we looked up the last edit time and author via Git? If it's significant, could we do it during the midnight run?

Member

I haven't really tested, but I guess the speed loss would be enormous. Maybe you can do a test during the webathon?

max.mehl added this issue to the Hackathon1905 milestone 2019-05-22 15:26:19 +00:00

I am attaching a test script that does 30*30 file comparisons. On my rather old machine, comparing file timestamps takes less than one second, while comparing git commit times takes about 44 seconds.

That is a huge difference, but as the build process as a whole is quite time-consuming, the overall effect of the modified time comparison may not be that dramatic and may even be acceptable.
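
For readers without the attachment, the two kinds of comparison can be sketched roughly as follows. This is not the attached test script; the function names are hypothetical, and the second variant spawns two `git log` processes per comparison, which is presumably where most of the extra time goes:

```shell
# Usage for both: is_outdated_* TRANSLATION ORIGINAL
# Returns success (0) if ORIGINAL is newer than TRANSLATION.

# Variant 1: compare filesystem modification times (no extra process needed).
is_outdated_mtime() {
  [ "$2" -nt "$1" ]
}

# Variant 2: compare the committer times of the last commit touching each file.
is_outdated_git() {
  local t_trans t_orig
  t_trans=$(git log -1 --format=%ct -- "$1")
  t_orig=$(git log -1 --format=%ct -- "$2")
  [ "${t_orig:-0}" -gt "${t_trans:-0}" ]   # untracked files count as epoch 0
}
```

With these, `is_outdated_mtime index.de.xhtml index.en.xhtml` can report an outdated translation after a merge has touched both files, while `is_outdated_git` with the same arguments does not.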

Author
Owner

PR #952 by @ulf

Oh wow, that's quite a difference, and probably opposite to what we want to achieve with the latest improvements of the build script.

I wonder whether we could cache at least the change time of the EN file somehow to reduce the time for all the translations by almost 50%.

@reinhard, would you see any other potential methods to reduce the check time? Perhaps a file containing all timestamps which is only updated incrementally if one file is being changed?

Member

If I am not mistaken, we need the date of the last commit for two purposes:

  1. To check whether a translation is outdated (see #2). There might be better indicators for that than the commit time anyway.

  2. To display the date of the last change on the webpage itself (see #837). Actually, most websites out there don't display that information at all, and in the past years, where this information in fact was not present, nobody missed it. So we might want to think again whether we really want that.

If we actually do want the date of the last commit, I still think a commit hook would be the best solution, since it is essentially the same as what we had in SVN times.

Author
Owner

> If we actually do want the date of the last commit, I still think a commit hook would be the best solution, since it is essentially the same as what we had in SVN times.

Could you please explain which kind of hook you mean and what it should do? I don't remember the server-side hooks we had with SVN.

Member

SVN filled in the $Author: and $Date: information upon commit. That was not a server-side hook, but rather a built-in SVN function.

Essentially, we would just need some git magic which writes the date of the last commit into a predefined space within the file.

Author
Owner

OK, we could do that, but it would mean that we effectively would have to add this information to all files initially. That would mean that we also touch outdated files...

Member

@max.mehl you have a point there :-/


I am going to do some test builds to find out how large the actual impact of the modified timestamp comparison is. Please hold the line ...


During a full build more than 50,000 checks are done in order to find
outdated translations.

Currently the checks are done by comparing file modification times.
This takes almost no time (about 10 seconds on my box). A full build
takes approx. 110 minutes on my box.

In PR #952 the checks are done by comparing git commit times. For
each comparison "git log" is called for each file and the commit times
are then compared. There is no optimisation. With these changes a
full build takes approx. 135 minutes on my box.

PR #974 contains an alternative approach. During phase 1 of the build
a "sidecar file" is created or updated for each "*.en.xhtml" file that
contains its outdated translations. (This is implicitly assuming that
"en" is the original language and all others are translations. This
should be adjusted/generalised/fixed.) These sidecar files are later
used to identify outdated files. With these changes a full build
takes approx. 115 minutes on my box.
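
The sidecar idea might look roughly like the sketch below. It is not the code from PR #974: the function name, the sidecar file name and the two-letter language glob are assumptions for illustration, and it shares the "en is the original" assumption mentioned above.

```shell
# For an original such as "news/page.en.xhtml", write the names of all
# translations whose last commit is older than the original's last commit
# into "news/page.outdated-translations" (hypothetical sidecar name).
write_outdated_list() {
  local original base sidecar translation t_orig t_trans
  original=$1
  base=${original%.en.xhtml}
  sidecar=$base.outdated-translations
  t_orig=$(git log -1 --format=%ct -- "$original")
  : > "$sidecar"                                  # start with an empty list
  for translation in "$base".??.xhtml; do
    [ -e "$translation" ] || continue             # glob matched nothing
    [ "$translation" = "$original" ] && continue  # skip the original itself
    t_trans=$(git log -1 --format=%ct -- "$translation")
    if [ "${t_trans:-0}" -lt "${t_orig:-0}" ]; then
      printf '%s\n' "$translation" >> "$sidecar"
    fi
  done
}
```

Later build phases would then only need to check whether a translation's name appears in the sidecar file, without calling Git again.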

Author
Owner

@reinhard what do you think, is either of these two approaches doable for our setup, especially if we do partial builds? Time-wise, the difference is much smaller than I thought, but perhaps we could run a test on our build server?

Member

I just had the following idea and would like to hear your feedback:

  • In the phase 1 Makefile, run `touch -d "$(git log -1 --format="%ci" ${file})" .${file}.date` for each file which was updated since the last build run.
  • So at the end of phase 1 Makefile, each file has a hidden companion whose filetime is the actual commit date of the real file.
  • This filetime can easily and cheaply be queried for all purposes, like determination of outdated translations, or inclusion of the commit date in the HTML output.
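
As a sketch of the steps above, wrapped in a function so that it also works for files in subdirectories (where a plain `.${file}.date` would put the dot in front of the directory name). This assumes GNU `touch`, which accepts the `%ci` date format; `stamp_commit_date` is a hypothetical name:

```shell
# Create or update a hidden companion ".<name>.date" next to FILE whose
# mtime equals the committer date of the last commit that touched FILE.
stamp_commit_date() {
  local file dir name date
  file=$1
  dir=$(dirname -- "$file")
  name=$(basename -- "$file")
  date=$(git log -1 --format=%ci -- "$file")   # e.g. 2017-11-21 12:00:00 +0000
  [ -n "$date" ] || return 1                   # untracked: no commit date
  touch -d "$date" -- "$dir/.$name.date"
}
```

Two companions can then be compared cheaply, for example with `[ .index.en.xhtml.date -nt .index.de.xhtml.date ]`.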

BUT I would also suggest that we really first decide about whether we actually want to use commit dates for anything, see my other comment of 27 May 9:33.


Sounds good.

I am wondering whether it is faster to call "git log" for each language to create ".date" files or to call "git log" once for the en.xhtml and once for the whole set of translations to create ".outdated-translations" files.

One should do some tests.

Author
Owner

I am impartial on the format and strategy to create the companion files, but it sounds like a good plan to me.

> BUT I would also suggest that we really first decide about whether we actually want to use commit dates for anything, see my other comment of 27 May 9:33.

That's a complicated one.

  • Regarding showing the time stamps, it's quite useful for debugging (and also a future sitemap file), but I would be fine if we hid that in an HTML comment.
  • Regarding outdated translations, we would have to find a good strategy which is suitable for webmasters, translators and editors alike. And even if we had this, I doubt that we could apply it to all old files; we would rather start using it incrementally. Until then, I think we have to rely on git commit times.
max.mehl removed this issue from the Hackathon1905 milestone 2020-03-20 11:51:10 +00:00
Member

The actual issue (translation warning sometimes wrong) was solved with c4cb7aeef44ed9d2c955268b9b20946f024e09b4, so I'm closing this issue and propose that we continue the discussion about the other use of the git commit time (displaying it in the footer of the HTML output) in #837.
Reference: FSFE/fsfe-website#64