Mass-convert files to UTF-8 #918
Reference: FSFE/fsfe-website#918
This PR adds some new/improved tools to convert a lot of files to UTF-8, and fixes #641.
It makes a few basic checks though:
The script tools/encoding-convert.sh makes use of check-translation-status.sh.
Here is the complete log of the initial run. It shows successful conversions and the files which have been ignored due to their "outdatedness":
I noticed there are also files which claim to be non-UTF-8 but actually are UTF-8 (or us-ascii, which is a subset of UTF-8 and therefore equivalent here). I made a check by running:
That results in:
So in the next commit I will just change the declared XML encoding to UTF-8 for those marked as "actually-utf" (sorry for the stupid name)
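The "actually-utf" check can be sketched roughly as follows. This is not the command that was run for this PR, just a minimal Python illustration of the idea: read the encoding named in the XML declaration, and if it is not UTF-8, test whether the bytes decode as UTF-8 anyway. The function names and the regular expression are assumptions made for this example, and files starting with a byte order mark are not handled.

```python
import re

# Matches the encoding named in an XML declaration at the very start of a file,
# e.g. <?xml version="1.0" encoding="ISO-8859-1"?>
DECL_RE = re.compile(rb'^<\?xml[^>]*?encoding=["\']([A-Za-z0-9._-]+)["\']')

UTF8_NAMES = {"utf-8", "utf8"}


def declared_encoding(data: bytes):
    """Return the lower-cased encoding from the XML declaration, or None."""
    match = DECL_RE.match(data)
    return match.group(1).decode("ascii").lower() if match else None


def classify(data: bytes) -> str:
    """Classify file content as 'utf-8', 'actually-utf' or 'non-utf'.

    'actually-utf' means: the header declares something other than UTF-8,
    but the bytes are valid UTF-8 (pure ASCII content falls in here too).
    """
    declared = declared_encoding(data)
    if declared is None or declared in UTF8_NAMES:
        # No declaration is treated as UTF-8 here (a simplification).
        return "utf-8"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "non-utf"
    return "actually-utf"
```

For files classified as "actually-utf", only the declared encoding in the header needs to change; the content bytes can stay as they are.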
This is the list of files which are non-UTF-8 but not outdated. That happens because some files do not have an EN base version, so we cannot easily check which one is the original and thereby avoid changing an outdated file.
One way to continue here could be to check whether there actually IS another language version. If not, we could safely convert the encoding.
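A rough sketch of that idea in Python, assuming file names follow a `<base>.<lang>.<ext>` pattern such as `page.es.xhtml` (this naming pattern and the helper name `other_language_versions` are assumptions for illustration, not something the existing tools provide):

```python
from pathlib import Path


def other_language_versions(path: Path) -> list:
    """List sibling files with the same base name and extension but another language code.

    Assumes names of the form <base>.<lang>.<ext>, e.g. page.es.xhtml.
    """
    parts = path.name.split(".")
    if len(parts) < 3:
        return []  # name does not follow the assumed pattern
    base, ext = ".".join(parts[:-2]), parts[-1]
    siblings = []
    for candidate in sorted(path.parent.glob(f"{base}.*.{ext}")):
        cand_parts = candidate.name.split(".")
        same_shape = len(cand_parts) == len(parts)
        same_base = ".".join(cand_parts[:-2]) == base
        if candidate != path and same_shape and same_base:
            siblings.append(candidate)
    return siblings
```

If the returned list is empty, the file has no other language version and its encoding could be converted without risking a mismatch against an original.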
There was one file which was present in more than one language:
For that it was obvious that the ES version has been the original and that there is no discrepancy between both versions. So I've changed the encoding for ES, and fake-updated the CA version.
The next commit was about updating the rest of the list above.
What I forgot: fake-updating all up-to-date translations whose EN original changed due to the encoding changes
Solved by adding the -o flag to check-translation-status.sh and doing some semi-automatic comparisons, and by that a fake-update to
The following 67 files are still not UTF-8 because they are outdated against their EN original. For some, the English version just fixed a typo, some others definitely lag behind content-wise.
We'll have to go through them. Useful tools would be git log and tools/check-translation-status.sh -a -f <file> to check the change dates of all correlated files.
The latest commits also solve the leftover files mentioned above as well as a few other edge cases like files that are declared as non-UTF in their XML header but actually are UTF-8.
I have merged to the test branch to test it.
I tested ~20 sites on test.fsfe.org and everything looks fine. Will merge therefore :)