#377 Create an automatic sitemap

Open
opened 1 year ago by max.mehl · 0 comments
max.mehl commented 1 year ago

For SEO, it could be beneficial to generate a live sitemap of the pages on fsfe.org. The only downside I can think of is that it would expose some pages we don’t want indexed yet, e.g. unreleased news, or pages that are only archived.

The principle would be rather easy:

  1. Find all English XHTML files, with some exceptions: find . -type f -iname "*\.en\.xhtml" | grep -v '^./internal\|^./news\|^./error' | sort. This could be extended to exclude news items and newsletters that are not yet released according to their releasedate. We could also check for files that exist only in another language and not in English.
  2. Find the translations of all these files.
  3. Get the last modification date, e.g. with git log --pretty="%cd" --date=short -1 "$file"
  4. Create sitemap.xml using the sitemap protocol. This should also rename xhtml to html. For translations, an entry could look like:
<url>
    <loc>https://fsfe.org/work.en.html</loc>
    <lastmod>2018-01-01</lastmod>
    <xhtml:link
                rel="alternate"
                hreflang="nl"
                href="https://fsfe.org/work.nl.html"
                />
</url>

I’m not sure how to give each alternate link its own lastmod date, or whether language-specific sitemaps would make more sense.
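One possible answer to the lastmod question, as a rough sketch only: instead of attaching a date to the alternate link, every language version gets its own url entry with its own lastmod, and each entry lists all language versions (itself included) as alternates. The names render_sitemap and page_url and the shape of the pages mapping are made up for illustration, and the base URL is simply taken from the example above.

```python
from xml.sax.saxutils import escape, quoteattr

BASE_URL = "https://fsfe.org"  # assumption: taken from the example entry above


def page_url(stem, lang):
    """Public URL of one language version; the file is served as .html."""
    return f"{BASE_URL}/{stem}.{lang}.html"


def render_sitemap(pages):
    """Render a sitemap from a hypothetical mapping
    {page stem: {language code: last modification date}}."""
    out = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"',
        '        xmlns:xhtml="http://www.w3.org/1999/xhtml">',
    ]
    for stem in sorted(pages):
        langs = pages[stem]
        for lang in sorted(langs):
            # One <url> per language version, so each carries its own lastmod.
            out.append("  <url>")
            out.append(f"    <loc>{escape(page_url(stem, lang))}</loc>")
            out.append(f"    <lastmod>{escape(langs[lang])}</lastmod>")
            # Every entry lists all language versions, itself included.
            for other in sorted(langs):
                href = quoteattr(page_url(stem, other))
                out.append(
                    f"    <xhtml:link rel=\"alternate\" "
                    f"hreflang={quoteattr(other)} href={href}/>"
                )
            out.append("  </url>")
    out.append("</urlset>")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(render_sitemap({"work": {"en": "2018-01-01", "nl": "2018-02-03"}}))
```

With this layout a single sitemap.xml still covers all languages, so separate language-specific sitemaps would not be needed for the lastmod dates alone.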

If someone has time to write such a script, I can try to integrate it into the actual build system.
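As a starting point, steps 1 to 3 could be sketched roughly like this. It is a minimal sketch, not a tested build script: group_pages, last_modified, EXCLUDED_PREFIXES and PAGE_RE are hypothetical names, and the two-letter language code in the file name pattern is an assumption about the naming scheme.

```python
import re
import subprocess
from collections import defaultdict

# Same exceptions as the grep in step 1. This is a prefix match, so "news"
# also covers "newsletter", just like '^./news' does.
EXCLUDED_PREFIXES = ("internal", "news", "error")

# Assumption: pages are named <stem>.<two-letter language>.xhtml
PAGE_RE = re.compile(r"^(?P<stem>.+)\.(?P<lang>[a-z]{2})\.xhtml$")


def group_pages(paths):
    """Steps 1 and 2: map each page stem to {language: path},
    keeping only pages that have an English version."""
    pages = defaultdict(dict)
    for path in paths:
        if path.startswith("./"):
            path = path[2:]
        if path.startswith(EXCLUDED_PREFIXES):
            continue
        match = PAGE_RE.match(path)
        if match:
            pages[match["stem"]][match["lang"]] = path
    return {stem: langs for stem, langs in pages.items() if "en" in langs}


def last_modified(path):
    """Step 3: date of the last commit that touched the file."""
    result = subprocess.run(
        ["git", "log", "--pretty=%cd", "--date=short", "-1", "--", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

The list of paths could come from the find command in step 1 without the English-only filter, or from pathlib. Filtering unreleased news by releasedate is left out here because it depends on how that date is stored in the files.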

max.mehl added the feature-request label 1 year ago
max.mehl added the build label 1 year ago