Check for broken links #1478
Reference: FSFE/fsfe-website#1478
I found this broken link checker npm package. It seems very capable and well suited to checking for broken links.
I ran it:
It provides output like:
The 429 error on the my.fsfe.org link is caused by the high number of requests. With this tool you're basically doing a DoS attack 😛

I intend to run it against the entire website and find mistakes that can be fixed (like the comma (`,`) in the above PDF link). Perhaps this is something worth noting for future use? On the wiki, perhaps?
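Trailing-punctuation mistakes like that comma can be found mechanically. Below is a minimal sketch. It assumes the hrefs have already been extracted from the pages. `suggestUrlFix` and the set of stripped characters are hypothetical choices, and any suggested fix should still be reviewed by hand.

```javascript
// Hypothetical helper: detect hrefs that picked up punctuation from the
// surrounding sentence, such as a trailing comma after a PDF link.
// Assumption: none of these characters legitimately ends a URL on the site.
// ")" is deliberately not stripped, since some URLs (e.g. Wikipedia
// articles) really end in a parenthesis.
const TRAILING_PUNCTUATION = /[.,;:!?'"]+$/;

function suggestUrlFix(href) {
  const fixed = href.replace(TRAILING_PUNCTUATION, "");
  return fixed === href ? null : fixed; // null means "nothing to fix"
}

for (const href of ["https://fsfe.org/some/file.pdf,", "https://fsfe.org/about/"]) {
  const fix = suggestUrlFix(href);
  console.log(fix ? `${href} -> ${fix}` : `${href} looks fine`);
}
```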
It seems I hit a rate limit. I either have to build or download a local copy of the website, or write a custom JavaScript script that uses the rate-limiting option.
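A custom script could also space out requests per host and back off on HTTP 429 without relying on a particular tool's options. Below is a minimal sketch. It assumes Node.js 18+ (for the global `fetch`) and absolute http(s) URLs. The function names, the use of `HEAD` requests and the default delays are assumptions and not part of broken-link-checker. Some servers reject `HEAD`, so a `405` result may need a follow-up `GET`.

```javascript
// Minimal sketch of a "polite" link checker (hypothetical names and defaults).

// Pure helper: assign each URL a start time (ms after the run begins) so that
// requests to the same host are at least `minIntervalMs` apart, while
// different hosts are checked in parallel.
function scheduleRequests(urls, minIntervalMs) {
  const nextFreeSlot = new Map(); // host -> earliest allowed start time
  return urls.map((url) => {
    const host = new URL(url).host;
    const startMs = nextFreeSlot.get(host) ?? 0;
    nextFreeSlot.set(host, startMs + minIntervalMs);
    return { url, startMs };
  });
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Check one URL; on HTTP 429 wait (Retry-After seconds if given, otherwise
// the fallback delay) and retry once. The retry is not re-scheduled.
async function checkUrl(url, retryDelayMs) {
  let res = await fetch(url, { method: "HEAD" });
  if (res.status === 429) {
    const retryAfter = Number(res.headers.get("retry-after")); // NaN or 0 if absent
    await sleep(retryAfter > 0 ? retryAfter * 1000 : retryDelayMs);
    res = await fetch(url, { method: "HEAD" });
  }
  return { url, status: res.status };
}

async function checkAll(urls, { minIntervalMs = 1000, retryDelayMs = 30000 } = {}) {
  const t0 = Date.now();
  const jobs = scheduleRequests(urls, minIntervalMs).map(async ({ url, startMs }) => {
    await sleep(Math.max(0, startMs - (Date.now() - t0)));
    try {
      return await checkUrl(url, retryDelayMs);
    } catch (err) {
      // Network-level failures (DNS errors, refused connections, ...).
      return { url, status: null, error: String(err) };
    }
  });
  return Promise.all(jobs);
}

// Usage: checkAll(["https://fsfe.org/", "https://my.fsfe.org/"]).then(console.table);
```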
That's interesting indeed!
Doing it locally has the benefit of faster requests and spares our web server. However, you should make sure to respect the rewrites in the .htaccess file, as they redirect a lot of deleted or moved files and their links.
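To respect the .htaccess redirects during a local run, each path could be passed through the rewrite rules before the checker looks for the corresponding file. This is a deliberately small, hypothetical sketch. It understands only simple `RewriteRule` lines and treats the first match as final, which simplifies mod_rewrite's real behaviour considerably.

```javascript
// Hypothetical sketch: resolve a URL path through .htaccess RewriteRule lines.
// Only "RewriteRule <pattern> <target> [flags]" is supported; RewriteCond,
// negated patterns ("!..."), flags and server variables are ignored, and
// Apache's PCRE syntax is assumed to be compatible with JavaScript regexes.
function parseRewriteRules(htaccessText) {
  const rules = [];
  for (const line of htaccessText.split("\n")) {
    const m = line.trim().match(/^RewriteRule\s+(\S+)\s+(\S+)/);
    if (m && !m[1].startsWith("!")) {
      rules.push({ pattern: new RegExp(m[1]), target: m[2] });
    }
  }
  return rules;
}

// In .htaccess context Apache matches against the path without its leading
// slash; "$1", "$2", ... in the target are replaced by capture groups.
// The first matching rule is treated as final (as with [R,L] redirects).
function resolvePath(path, rules) {
  const relative = path.replace(/^\//, "");
  for (const { pattern, target } of rules) {
    const match = relative.match(pattern);
    if (match) {
      return target.replace(/\$(\d)/g, (_, n) => match[Number(n)] ?? "");
    }
  }
  return path; // no rule matched: the file should exist as-is
}

const rules = parseRewriteRules("RewriteRule ^old/(.*)\\.html$ /new/$1.html [R=301,L]");
console.log(resolvePath("/old/page.html", rules)); // /new/page.html
```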
Here is an initial dump of broken links. The broken link checker didn't work reliably for me and would often crash (related bug report). I couldn't get the new version to work; perhaps I need to update my Node.js version. So at the moment it is a very iterative process: run it, watch it crash, find the link on which it crashed, point that domain to `127.0.0.1` in `/etc/hosts`, and try again.

This intermediate report should give an idea of the current situation:

`ttps://` or `hhttp://`
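A pre-filter that runs before the checker could shorten the crash-and-retry loop described above. The sketch is hypothetical: the function name, the verdict strings and the list of allowed schemes are assumptions. It skips hosts listed in an exclusion list instead of redirecting them through `/etc/hosts`. It also reports mistyped schemes such as `ttps://` or `hhttp://` directly, since no request could succeed for them.

```javascript
// Hypothetical pre-filter that runs before the link checker.
// Assumption: absolute links on the site should only use these schemes.
const EXPECTED_SCHEMES = ["http:", "https:", "mailto:"];

// Returns "check", "skip" or "bad-url" for a single href.
function triageLink(href, excludedHosts) {
  const scheme = href.match(/^([a-z][a-z0-9+.-]*:)/i);
  if (!scheme) return "check"; // relative link such as "/about/"
  const name = scheme[1].toLowerCase();
  if (!EXPECTED_SCHEMES.includes(name)) return "bad-url"; // e.g. "ttps:"
  if (name === "mailto:") return "skip"; // nothing to fetch
  let hostname;
  try {
    hostname = new URL(href).hostname;
  } catch {
    return "bad-url"; // e.g. "http://" with no host
  }
  // Skip listed hosts and their subdomains instead of editing /etc/hosts.
  const excluded = excludedHosts.some(
    (host) => hostname === host || hostname.endsWith("." + host),
  );
  return excluded ? "skip" : "check";
}

for (const href of ["ttps://fsfe.org/", "https://www.example.org/", "/about/"]) {
  console.log(href, triageLink(href, ["example.org"]));
}
```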
I'm not sure whether and how we should follow up. Perhaps I should try to restrict it to the more static pages, ignoring news and events.
I think that's reasonable. I remember that we had discussions about broken links in the past and decided not to change news entries or events, mainly because once an event is over, its page is preserved for historical purposes only.
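Restricting the run to the more static pages could be done with a simple path filter applied to the list of pages before crawling. This is a sketch only. It assumes news and events live under `/news/` and `/events/` on fsfe.org; `isStaticPage` is a hypothetical name.

```javascript
// Hypothetical filter: skip pages in the news and events sections.
const SKIPPED_SECTIONS = ["/news/", "/events/"]; // assumption: section paths

function isStaticPage(pageUrl) {
  const { pathname } = new URL(pageUrl);
  return !SKIPPED_SECTIONS.some((prefix) => pathname.startsWith(prefix));
}

const pages = [
  "https://fsfe.org/about/about.en.html",
  "https://fsfe.org/news/2019/news-20190101-01.en.html",
];
console.log(pages.filter(isStaticPage));
```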
For links that have actually succumbed to bitrot (unlike https://bcnfs.org/ and https://openinventionnetwork.com/), I would recommend marking them as broken on the relevant page (a new class `broken` or similar, with modern CSS features such as `:after` and `content` to convey the information). When possible, I would also include an alternative link to a contemporaneous archived version of the page on the Internet Archive's Wayback Machine.

Where the content still exists, typos in the link should be fixed to point to the intended URL.
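Finding a contemporaneous snapshot could be automated with the Internet Archive's Wayback Availability API (`https://archive.org/wayback/available`). It returns the capture closest to a given timestamp. Below is a minimal sketch. It assumes Node.js 18+ and that the timestamp passed in (YYYYMMDD) is roughly when the link was added to the page. The helper names are hypothetical.

```javascript
// Sketch: ask the Wayback Availability API for the snapshot closest to a date.

function availabilityQuery(url, timestamp) {
  const params = new URLSearchParams({ url, timestamp });
  return `https://archive.org/wayback/available?${params}`;
}

// Pull the closest snapshot URL out of the API response, or null if none.
function closestSnapshot(response) {
  const closest = response?.archived_snapshots?.closest;
  return closest && closest.available ? closest.url : null;
}

async function findArchivedCopy(url, timestamp) {
  const res = await fetch(availabilityQuery(url, timestamp));
  if (!res.ok) throw new Error(`Availability API returned ${res.status}`);
  return closestSnapshot(await res.json());
}

// Usage: findArchivedCopy("https://example.org/", "20150101").then(console.log);
```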
I scraped my local installation and found 29k 'broken' links there. When I try to verify them, most of them work fine on https://fsfe.org.
One of the actual errors is:
https://fsfe.org/about/people/interviews/bernhard.en.html
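With tens of thousands of reported links, one quick way to separate local-setup problems from real breakage is to re-check the same paths on the live site. Below is a hypothetical sketch. It assumes the local build was served at `http://localhost:8000` (adjust `LOCAL_BASE`). It only rewrites and deduplicates URLs, and the result could then be fed to a rate-limited checker.

```javascript
// Hypothetical triage helper: map URLs reported on the local build to the
// same paths on the live site and drop duplicates.
const LOCAL_BASE = "http://localhost:8000"; // assumption: local server address
const LIVE_BASE = "https://fsfe.org";

function toLiveUrls(reportedUrls) {
  const live = new Set(); // a Set keeps the first occurrence of each URL
  for (const url of reportedUrls) {
    live.add(url.startsWith(LOCAL_BASE + "/") ? LIVE_BASE + url.slice(LOCAL_BASE.length) : url);
  }
  return [...live];
}
```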
@max.mehl, do you know a quick git command to check whether the file 'bernhard.en.html' existed in git at some point and was later deleted?
Yes, there is this neat command:
Apparently, it never existed.
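The command itself isn't preserved here. One possible approach, not necessarily the command referred to above, is `git log --all --full-history -- <pathspec>`. It lists commits that touched a path, including one that deleted it. Below is a Node.js sketch wrapping it. It assumes `git` is on the `PATH` and that the script runs inside a checkout; the helper names are hypothetical. The published `.html` file may be generated from a source file with a different extension, so the example pathspec matches on the basename only.

```javascript
// Hypothetical sketch: list commits (on all branches) that touched files
// matching a pathspec, including a commit that deleted them.
function gitHistoryArgs(pathspec) {
  return [
    "log",
    "--all", // search every branch, not just the current one
    "--full-history", // don't prune commits via history simplification
    "--date=short",
    "--format=%h %ad %s", // abbreviated hash, date, subject
    "--",
    pathspec,
  ];
}

function findFileHistory(pathspec, repoDir = ".") {
  // Loaded lazily so the pure helper above can be used without git.
  const { execFileSync } = require("node:child_process");
  return execFileSync("git", gitHistoryArgs(pathspec), {
    cwd: repoDir,
    encoding: "utf8",
  });
}

// Usage, inside a checkout of the website repository. Without pathspec
// magic, "*" also matches "/", so this finds the file in any directory:
// console.log(findFileHistory("*bernhard.en.*") || "no matching commits");
```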
Hi all! I wrote an email about this to the team@ mailing list a while ago, and @nico.rikken was kind enough to point me to this issue. I have been using lychee, which I believe is a more modern, complete, and performant tool for checking broken links.
I don’t know how to tackle @repentinus’ proposal of marking broken links with CSS and automatically providing a fallback to a saved snapshot on the Wayback Machine.
Nevertheless, in the coming weeks I will slowly go through the broken links and try to fix them.
As my internship is ending, I will gladly do this as a volunteer.
Regarding the CSS, I have now looked into it. CSS can insert generated content (even content loaded from a URL), but it seems it cannot change links (see https://developer.mozilla.org/en-US/docs/Web/CSS/content). We could, however, use this feature to add a suffix to the link text:
This would result in: original link text (broken)
We might then need a translation of 'broken' too.
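If the suffix comes from CSS, one possible way to translate it is one rule per language using the `:lang()` selector. Below is a hypothetical sketch that generates those rules. It assumes each page declares its language (e.g. `<html lang="de">`). The function name and the German word are placeholders for what translators would provide.

```javascript
// Hypothetical build step: one CSS rule per language for the "(broken)" suffix.
// Assumption: pages set their language, which :lang() matches.
function brokenLinkCss(translations) {
  return Object.entries(translations)
    .map(([lang, word]) => {
      // Escape quotes and backslashes so the CSS string stays valid.
      const safe = word.replace(/["\\]/g, "\\$&");
      return `a.broken:lang(${lang})::after { content: " (${safe})"; }`;
    })
    .join("\n");
}

console.log(brokenLinkCss({ en: "broken", de: "defekt" }));
// a.broken:lang(en)::after { content: " (broken)"; }
// a.broken:lang(de)::after { content: " (defekt)"; }
```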
It would be helpful to come up with a generic method for dealing with broken links that cannot be restored, though.
We could create a `broken` class that links to the original URL and also offers an `(archive.org)` suffix linking to `https://web.archive.org/web/http://www.original-url.org/`, so people have both URLs available if needed. Something like:

Code:
`<a href="https://fsfe.org/">original link text</a> (<a href="https://web.archive.org/web/https://fsfe.org/">archive.org</a>)`

Preview: original link text (archive.org)
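This markup could be generated rather than written by hand. Below is a hypothetical helper (names assumed) that escapes its inputs and can optionally pin the archive link to a date. With a timestamp such as `20150601`, the Wayback Machine typically redirects to the capture closest to that date.

```javascript
// Hypothetical helper: build the "original link (archive.org)" markup.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// `timestamp` is optional ("" = no date, let the Wayback Machine choose).
function archivedLinkHtml(url, text, timestamp = "") {
  const archiveUrl = `https://web.archive.org/web/${timestamp ? `${timestamp}/` : ""}${url}`;
  return (
    `<a href="${escapeHtml(url)}">${escapeHtml(text)}</a> ` +
    `(<a href="${escapeHtml(archiveUrl)}">archive.org</a>)`
  );
}

console.log(archivedLinkHtml("https://fsfe.org/", "original link text"));
```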
Or maybe it doesn't make sense to keep the old URL around, and we might as well replace these links with archive.org links or mark them as broken.
Thinking about priorities:
I don't think we can get 'broken' translated if it comes from the CSS file. A manually included link to an archive could have 'broken' translated, though.