Git repos usually have multiple URLs, e.g.
```
https://github.com/fsfe/reuse-tool
https://github.com/fsfe/reuse-tool.git
git://github.com/fsfe/reuse-tool
git@github.com/fsfe/reuse-tool.git
```
Can we make sure that we actually check only one instance of this project, and therefore only one badge? And this for all kinds of source forges?
max.mehl
changed title from "How to deal with duplicated repos" to "How to deal with duplicated repos?" 4 years ago
> git@github.com/fsfe/reuse-tool.git
This syntax does not work.
The rest, I agree, probably needs some fixing. I would suggest rewriting every incoming request to the format "git://github.com/fsfe/reuse-tool", but I am not sure whether all platforms support that syntax. This is the kind of thing that, once implemented, can cost someone an hour of their time when it doesn't work and they can't figure out why, only to discover that their URLs are being rewritten.
How about a dropdown of supported schemes, e.g. only http, https, and git, so people know what they can enter? The rest of the URL is probably always the same (except for the .git suffix) and could be checked for duplicates in the backend.
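To make the backend duplicate check concrete, here is a minimal sketch of how the URL variants from the issue description could be reduced to one comparable key. The names `repo_key` and `SUPPORTED_SCHEMES` are hypothetical and not taken from the existing code base, and the sketch assumes that only the scheme, a trailing slash, the `.git` suffix, and the case of the host name may differ between duplicates.

```python
import re

# Hypothetical list of schemes offered in the dropdown (assumption).
SUPPORTED_SCHEMES = ("git", "https", "http")


def repo_key(url: str) -> str:
    """Return a scheme-less key that is identical for all URL variants of a repo."""
    match = re.match(r"^([a-z][a-z0-9+.-]*)://(.*)$", url.strip(), re.IGNORECASE)
    if not match:
        # Covers inputs such as "git@github.com/fsfe/reuse-tool.git".
        raise ValueError("URL must start with a scheme such as https://")
    scheme, rest = match.group(1).lower(), match.group(2)
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported scheme: {scheme}")
    # Ignore a trailing slash and the optional ".git" suffix.
    rest = rest.rstrip("/")
    if rest.endswith(".git"):
        rest = rest[: -len(".git")]
    # Host names are case-insensitive; the path is left untouched because
    # not every forge treats paths case-insensitively.
    host, _, path = rest.partition("/")
    return f"{host.lower()}/{path}"
```

Two registrations would then count as duplicates when their keys are equal, whichever scheme was picked in the dropdown.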
max.mehl
added this to the 0.1 milestone 4 years ago
max.mehl
changed title from "How to deal with duplicated repos?" to "Deal with duplicated repos" 4 years ago
Are we actually sure we want to forbid re-registering with a different scheme? What if, for example, somebody registers http://git.acme.com/foo/bar and later decides to completely switch the server from http to https?
What's the damage for us when multiple URLs are registered, when only one of them will actually be queried?
I just had another idea: we could store just the URL without the scheme, and when it comes to checking, we try "git", "https" and "http" (in a fixed, TBD order) and take the first one that works. This would even implicitly solve the issue of repositories changing the supported access scheme.
Maybe that costs us a few seconds when linting the repositories not supporting our first choice, but that runs in an asynchronous queue anyway.
> What’s the damage for us when multiple URLs are registered, when only one of them will actually be queried?
Resources, I am afraid. I would rather prefer linting the Linux kernel just once per commit (at least the primary repo)...
> I just had another idea: we could store just the URL without the scheme, and when it comes to checking, we try “git”, “https” and “http” (in a fixed, TBD order) and take the first one that works. This would even implicitly solve the issue of repositories changing the supported access scheme.
Yes, that could be a viable solution!
@carmenbianca what do you think about the proposal to just try git, https, and http and take the first that works? What do you think would be the best order to try?
> what do you think about the proposal to just try git, https, and http and take the first that works? What do you think would be the best order to try?
@reinhard This seems to work for me, in that order. There is probably some weird server out there that behaves differently based on protocol, but it's probably fine. The order `git -> https -> http` seems fine.
@carmenbianca instead of opening 3 ssh connections to the reuse lint server for the 3 tries, would it be smarter to improve the reuse-lint-repo script and make it accept the URL without protocol and try all 3 variants within a single run of the script?
@carmenbianca please forget the above question. The API does a `git ls-remote` on the repository and can remember which of the protocols worked before starting the remote lint.
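For illustration, the probing step could look roughly like the sketch below. It assumes the repository is stored without a scheme (e.g. `github.com/fsfe/reuse-tool`) and that the order `git -> https -> http` discussed above is used. The function names `ls_remote_works` and `first_working_url` are made up for this example and are not the API's actual code.

```python
import os
import subprocess
from typing import Callable, Optional, Sequence

# Order agreed on in this thread.
SCHEME_ORDER = ("git", "https", "http")


def ls_remote_works(url: str, timeout: float = 30.0) -> bool:
    """Return True if `git ls-remote` can read the repository at `url`."""
    # Disable interactive credential prompts so the check cannot hang on them.
    env = {**os.environ, "GIT_TERMINAL_PROMPT": "0"}
    try:
        result = subprocess.run(
            ["git", "ls-remote", url],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            env=env,
            timeout=timeout,
            check=False,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def first_working_url(
    key: str,
    probe: Callable[[str], bool] = ls_remote_works,
    schemes: Sequence[str] = SCHEME_ORDER,
) -> Optional[str]:
    """Try each scheme in order and return the first URL the probe accepts."""
    for scheme in schemes:
        url = f"{scheme}://{key}"
        if probe(url):
            return url
    return None
```

The returned URL is what would be remembered and handed to the remote lint, so the lint server only ever sees one connection attempt. The `probe` parameter exists only so the fallback order can be tested without network access.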