#954 Tags for languages and countries are not separated

Closed
opened 2 years ago by jzarl · 11 comments
jzarl commented 2 years ago
Collaborator

Looking over the tags it seems that there is no clear distinction between ISO 639-1 language codes and ISO 3166-1 country codes.
My understanding is that two-letter codes in tags are supposed to be country codes (language codes are embedded in the file name and subject to translation).

E.g. there are several Austrian events tagged as "de".

P.S.: As a side note, the tag for Great Britain/United Kingdom is "uk" instead of the correct "gb" (which is my fault).
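
To illustrate the overlap, here is a minimal Python sketch that classifies a two-letter tag. The two code sets are small hand-picked subsets for illustration, not the full ISO lists, and `classify_tag` is a hypothetical helper, not part of any existing tool.

```python
# Illustrative subsets only; a real check would load the complete ISO lists.
COUNTRY_CODES = {"at", "de", "fr", "gb", "it"}   # ISO 3166-1 alpha-2, lowercased
LANGUAGE_CODES = {"de", "en", "fr", "it", "uk"}  # ISO 639-1


def classify_tag(tag: str) -> str:
    """Classify a two-letter tag as country, language-only, ambiguous, or unknown."""
    key = tag.lower()
    is_country = key in COUNTRY_CODES
    is_language = key in LANGUAGE_CODES
    if is_country and is_language:
        return "ambiguous"      # e.g. "de": Germany and German
    if is_country:
        return "country"        # e.g. "at", "gb"
    if is_language:
        return "language-only"  # e.g. "uk": Ukrainian, not an assigned country code
    return "unknown"


for tag in ("de", "at", "gb", "uk"):
    print(tag, classify_tag(tag))
```

Tags reported as "language-only" are certainly wrong under a country-only policy, while "ambiguous" ones such as "de" need a look at the event itself (an Austrian event should carry "at").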

reinhard commented 2 years ago
Collaborator

I agree that it makes more sense to tag by country than by language.

max.mehl commented 2 years ago
Owner

Full ack, and that's how we automatically do it for the events registered with the tool (https://fsfe.org/community/tools/eventregistration.html). Please have a look at the tool's country drop-down to see the country codes.

The wrongly tagged events probably stem from manual edits or earlier policies I am not aware of.
max.mehl added the tagging label 2 years ago
max.mehl commented 2 years ago
Owner

Partially fixed with @jzarl's #966

Collaborator

Is it realistic to remove all language-related tags and, if needed, change them into country-specific tags? Because I guess we all agree that this is what we want, but who can do it?

Owner

I think it's realistic in the sense that we *must* do it some day. The tags are quite useless in their current state.

I wonder about the *how* though. In the past I did some smaller unification attempts, but they have been quite manual. Is there a way to create a file where we can define tags to be deleted or changed to something else, and then just run the whole thing? Not sure whether @jzarl's tool is exactly doing that already...
jzarl commented 11 months ago
Poster
Collaborator

My script started with exactly that. Alas, I assumed that nobody wants to tediously define a .csv file in order to run a batch job and removed this mode when updating the tool to the new tag syntax.

IMO, though, we didn't really lose anything when I removed the bulk mode. You can still do something like the following:

```
# Remove the deprecated tags. $deprecated_tags is unquoted, so the shell
# splits it into one argument per tag.
tools/tagtool/tagtool.sh --remove-tags $deprecated_tags

# Rename tags: tags_to_rename.txt holds one "oldTag newTag" pair per line.
while read -r oldTag newTag
do
  tools/tagtool/tagtool.sh --rename-tag "$oldTag" "$newTag"
done < tags_to_rename.txt
```
jzarl commented 11 months ago
Poster
Collaborator

The bigger problem to me seems to be /finding/ these tags, though.
Of course one can manually search for them, but that seems overly tedious.

Ideally, it would be nice to have a script to find candidate tags. Maybe we could identify suspicious tags by comparing the tags of each document to the other translations and issue a warning when there's a discrepancy regarding the tags?

In the past, this would probably have led to way too many warnings, but I guess the situation has improved, and especially with the new tag syntax it has become so much easier to write and translate tags correctly.

Maybe if such a script is written, we could even add it to the git commit hooks and warn authors/translators when they add/remove tags...
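
A rough Python sketch of such a candidate finder follows. It assumes the tags of each file have already been extracted into sets, and that files are named `<base>.<lang>.<ext>` (the language code being embedded in the file name); `split_translation` and `find_tag_discrepancies` are hypothetical names, and the sample paths are made up.

```python
from collections import defaultdict


def split_translation(filename: str) -> tuple[str, str]:
    """Split 'events/event-01.de.xhtml' into ('events/event-01', 'de')."""
    stem, _ext = filename.rsplit(".", 1)
    base, lang = stem.rsplit(".", 1)
    return base, lang


def find_tag_discrepancies(
    tags_by_file: dict[str, set[str]],
) -> dict[str, dict[str, set[str]]]:
    """Report, per document, the tags that are not shared by all translations."""
    groups: dict[str, dict[str, set[str]]] = defaultdict(dict)
    for filename, tags in tags_by_file.items():
        base, lang = split_translation(filename)
        groups[base][lang] = tags

    report = {}
    for base, per_lang in groups.items():
        common = set.intersection(*per_lang.values())
        # Tags used by some, but not all, translations are suspicious.
        odd = {lang: tags - common for lang, tags in per_lang.items() if tags - common}
        if odd:
            report[base] = odd
    return report


sample = {
    "events/event-01.en.xhtml": {"at", "community"},
    "events/event-01.de.xhtml": {"de", "community"},
    "news/nl-01.en.xhtml": {"newsletter"},
}
print(find_tag_discrepancies(sample))
```

In the sample, the German translation of the event carries "de" where the English version carries "at", which is exactly the kind of language/country mix-up this issue is about.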

Collaborator

> Ideally, it would be nice to have a script to find candidate tags. Maybe we could identify suspicious tags by comparing the tags of each document to the other translations and issue a warning when there's a discrepancy regarding the tags?

Actually I do think that this is an excellent idea.

> In the past, this would probably have led to way too many warnings, but I guess the situation has improved, and especially with the new tag syntax it has become so much easier to write and translate tags correctly.

I agree that for past news and events, this script would lead to many warnings, but actually I think that all of these warnings would be legitimate and should be fixed.

Let's face it: we have a mess, and we won't get rid of that mess without investing some work :-/

It's a lot of work, BUT: it doesn't need deep knowledge of the build or tagging system, so we might find some people to help with this. I'd be ready to cover my share of the cleanup.

> Maybe if such a script is written, we could even add it to the git commit hooks and warn authors/translators when they add/remove tags...

Another excellent idea.

Owner

> Maybe if such a script is written, we could even add it to the git commit hooks and warn authors/translators when they add/remove tags...

Actually, we already have a warning in the pre-commit hook if someone introduces a completely new tag: https://git.fsfe.org/FSFE/fsfe-website/src/branch/master/tools/githooks/pre-commit#L84-L109

For everything else, I agree with Reinhard here.
Collaborator

I think the idea was that the pre-commit hook could warn if the tags in the translation differ from the tags in the original.
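
As a sketch of what that check might look like in Python: the code below assumes tags are written as `<tag key="...">` elements without an XML namespace, which is an assumption about the file format and not taken from the existing hook. `tag_keys` and `translation_warnings` are hypothetical names.

```python
import xml.etree.ElementTree as ET


def tag_keys(xml_text: str) -> set[str]:
    """Collect the key attribute of every <tag> element in a document."""
    root = ET.fromstring(xml_text)
    return {el.get("key") for el in root.iter("tag") if el.get("key")}


def translation_warnings(original_xml: str, translation_xml: str) -> list[str]:
    """Return one warning per tag that differs between original and translation."""
    original = tag_keys(original_xml)
    translation = tag_keys(translation_xml)
    warnings = []
    for key in sorted(original - translation):
        warnings.append(f"tag '{key}' is in the original but missing from the translation")
    for key in sorted(translation - original):
        warnings.append(f"tag '{key}' is in the translation but not in the original")
    return warnings


original = '<html><tags><tag key="at">Austria</tag><tag key="community"/></tags></html>'
translation = '<html><tags><tag key="de">Deutschland</tag><tag key="community"/></tags></html>'
for line in translation_warnings(original, translation):
    print("WARNING:", line)
```

A hook could run this for every staged translation against its original and print the warnings without blocking the commit.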

Owner

It seems that we've fixed the wrong tags with the major cleanups in recent months.

I will work on the pre-commit hook and see how we can avoid a mismatch.
max.mehl self-assigned this 3 months ago
max.mehl closed this issue 3 months ago