Create an automatic sitemap #377

Open
opened 2018-06-27 08:03:23 +00:00 by max.mehl · 0 comments
Owner

For SEO, it could be beneficial to generate an automatically updated sitemap of the pages on fsfe.org. The only downside I can think of is that the sitemap would expose some pages that we do not want indexed (yet), e.g. unreleased news or pages that are merely archived.

The principle would be rather easy:

  1. Find all English XHTML files and make some exceptions: `find . -type f -iname "*\.en\.xhtml" | grep -v '^./internal\|^./news\|^./error' | sort`. This could be extended by excluding news items and newsletters that are not yet released according to their release date. We could also check for files that exist only in another language and not in English.
  2. Find the translations of all these files.
  3. Get the last modification date, e.g. with `git log --pretty="%cd" --date=short -1 $file`.
  4. Create sitemap.xml using the [sitemap protocol](https://www.sitemaps.org/protocol.html). This should also rename `xhtml` to `html`. For translations, this could look like:
<url>
    <loc>https://fsfe.org/work.en.html</loc>
    <lastmod>2018-01-01</lastmod>
    <xhtml:link
        rel="alternate"
        hreflang="nl"
        href="https://fsfe.org/work.nl.html"
    />
</url>
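
The four steps above could be wired together roughly as follows. This is only a sketch under assumptions: `BASE_URL` and all function names are made up for illustration and are not part of the existing build system, it assumes that translations sit next to the English file and differ only in the language code, and it does not XML-escape URLs.

```shell
#!/usr/bin/env bash
# Sketch only: BASE_URL and all function names are illustrative assumptions.
BASE_URL="https://fsfe.org"

# Step 1: English source pages, minus internal, news and error pages.
list_english_pages() {
  find . -type f -iname '*.en.xhtml' \
    | grep -v -e '^\./internal' -e '^\./news' -e '^\./error' \
    | sort
}

# Step 2: translations of one English page (same stem, other language code).
list_translations() {
  local en_file="$1" candidate
  local stem="${en_file%.en.xhtml}"
  for candidate in "$stem".*.xhtml; do
    [ -e "$candidate" ] || continue            # glob matched nothing
    [ "$candidate" = "$en_file" ] && continue  # skip the English page itself
    printf '%s\n' "$candidate"
  done
}

# Step 3: date of the last commit touching a file, as YYYY-MM-DD.
last_modified() {
  git log --pretty='%cd' --date=short -1 -- "$1"
}

# ./work.nl.xhtml -> https://fsfe.org/work.nl.html (renames xhtml to html)
page_url() {
  local path="${1#./}"
  printf '%s/%s.html\n' "$BASE_URL" "${path%.xhtml}"
}

# ./work.nl.xhtml -> nl
page_lang() {
  local name="${1%.xhtml}"
  printf '%s\n' "${name##*.}"
}

# Step 4: one <url> entry for an English page plus its translations.
url_entry() {
  local en_file="$1" translation
  printf '<url>\n'
  printf '    <loc>%s</loc>\n' "$(page_url "$en_file")"
  printf '    <lastmod>%s</lastmod>\n' "$(last_modified "$en_file")"
  list_translations "$en_file" | while IFS= read -r translation; do
    printf '    <xhtml:link rel="alternate" hreflang="%s" href="%s"/>\n' \
      "$(page_lang "$translation")" "$(page_url "$translation")"
  done
  printf '</url>\n'
}

# Whole sitemap on stdout; the xhtml namespace is required for xhtml:link.
build_sitemap() {
  local en_file
  printf '<?xml version="1.0" encoding="UTF-8"?>\n'
  printf '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"\n'
  printf '        xmlns:xhtml="http://www.w3.org/1999/xhtml">\n'
  list_english_pages | while IFS= read -r en_file; do
    url_entry "$en_file"
  done
  printf '</urlset>\n'
}
```

Run from the root of the website checkout, `build_sitemap > sitemap.xml` would write the file. Filtering unreleased news by release date is left out here.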

I'm not sure how to attach a last modification date (lastmod) to the alternate link, or whether language-specific sitemaps would make more sense.
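
One possible way around this: in the sitemap protocol, `lastmod` is a child element of `<url>`, so the date cannot be attached to an `xhtml:link`. Giving every language version its own `<url>` entry, each with its own `lastmod` and the full list of alternates (including itself), would keep everything in a single sitemap. As far as I know, this is also how search engines such as Google describe localized pages in sitemaps, but that should be verified. A minimal sketch, again with made-up function names and `BASE_URL`, and assuming translations differ only in the language code:

```shell
#!/usr/bin/env bash
# Sketch only: BASE_URL and all function names are illustrative assumptions.
BASE_URL="https://fsfe.org"

page_url()  { local path="${1#./}";     printf '%s/%s.html\n' "$BASE_URL" "${path%.xhtml}"; }
page_lang() { local name="${1%.xhtml}"; printf '%s\n' "${name##*.}"; }
last_modified() { git log --pretty='%cd' --date=short -1 -- "$1"; }

# All language versions of a page, including the one passed in.
language_versions() {
  local name="${1%.xhtml}" version
  for version in "${name%.*}".*.xhtml; do
    if [ -e "$version" ]; then
      printf '%s\n' "$version"
    fi
  done
}

# One <url> entry for a single language version, with its own <lastmod>
# and an alternate link for every language version of the same page.
url_entry_per_language() {
  local file="$1" version
  printf '<url>\n'
  printf '    <loc>%s</loc>\n' "$(page_url "$file")"
  printf '    <lastmod>%s</lastmod>\n' "$(last_modified "$file")"
  language_versions "$file" | while IFS= read -r version; do
    printf '    <xhtml:link rel="alternate" hreflang="%s" href="%s"/>\n' \
      "$(page_lang "$version")" "$(page_url "$version")"
  done
  printf '</url>\n'
}
```

Calling `url_entry_per_language` once for `./work.en.xhtml` and once for `./work.nl.xhtml` would then produce two entries, each carrying the modification date of its own file.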

If someone has time to write such a script, I can try to integrate it into the actual build system.

max.mehl added the feature-request and build labels 2018-06-27 08:03:23 +00:00
Reference: FSFE/fsfe-website#377