Sitemap.xml is a special file on a website that helps search engine crawlers index the site.
It serves two key functions:
- It gives crawlers a table of contents (ToC) of all the pages on your site, so that they don't have to slowly discover them by navigating page by page through links.
- It helps focus a crawler's attention on what has changed, using the last-modified date.
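To make those two functions concrete, here is a minimal Python sketch of how a crawler-like script might read a sitemap: it lists each page URL (`<loc>`) along with its optional last-modified date (`<lastmod>`). The sample XML, the `example.com` domain, and the `list_entries` helper are illustrative assumptions, not taken from any real site.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sample; example.com is a placeholder domain.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2024-01-15</lastmod>
  </url>
  <url>
    <loc>https://www.example.com/about</loc>
  </url>
</urlset>"""


def list_entries(xml_text):
    """Return (loc, lastmod) pairs; lastmod is None when the tag is absent."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = url.findtext("sm:lastmod", default=None, namespaces=NS)
        entries.append((loc, lastmod))
    return entries


for loc, lastmod in list_entries(SAMPLE):
    print(loc, lastmod or "(no lastmod)")
```

The second entry has no `<lastmod>`, which is valid under the protocol: only `<loc>` is required for each URL.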
Here's an example sitemap.xml for this site.
Notable Webflow Points
What Webflow includes:
- All static pages
- All individual Collection Pages
What Webflow does not include:
- Pages with Collection List pagination
Although these are legitimate, discrete links, they are low value compared to the collection pages themselves. As a further complication, the number of possible URLs multiplies quickly due to the way Collection List pagination works.
Webflow uses the default location of /sitemap.xml.
Webflow does not include the last-modified date, possibly due to complications with updating CMS items. However, this is probably not needed; see the Last-modified notes under References.
The Hostname Bug
No matter how many domain names you have on your site, Webflow will only generate the sitemap.xml with ONE of them.
If you have a default domain set and you republish your site, Webflow should always use that default domain in the sitemap.xml.
For example, on Sygnal's site, I have the www.sygnal.com domain specified as the default, and that results in a sitemap.xml containing:
<loc>https://www.sygnal.com/</loc>
However, if you DO NOT have a default domain name specified, the sitemap.xml gets an effectively arbitrary domain name in it.
For example, if I have the domain names sygnal1.com and sygnal2.com, and I have not set a default domain in Webflow, then Webflow will unpredictably choose one of them for the sitemap.xml, and it will then deliver that same sitemap.xml for both sites.
That means Google might request the sitemap for https://sygnal1.com, but the sitemap it receives contains URLs pointing to https://sygnal2.com.
To Google, that's an invalid sitemap, since it's not pointing to the current site.
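One way to catch this mismatch is to compare the hostname of every `<loc>` entry against the domain the sitemap was requested from. The following Python sketch does that on an in-memory sample. The `mismatched_locs` helper and the `/about` path are hypothetical, and the sample XML only imitates what Webflow might serve in the scenario above.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Namespace used by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap content, as it might be served for sygnal1.com.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://sygnal2.com/</loc></url>
  <url><loc>https://sygnal2.com/about</loc></url>
</urlset>"""


def mismatched_locs(xml_text, expected_host):
    """Return every <loc> URL whose hostname differs from expected_host."""
    root = ET.fromstring(xml_text)
    bad = []
    for loc in root.findall("sm:url/sm:loc", NS):
        url = (loc.text or "").strip()
        # urlsplit().hostname is already lowercased (or None if missing).
        if urlsplit(url).hostname != expected_host.lower():
            bad.append(url)
    return bad


# Checked against sygnal1.com, every entry is flagged.
print(mismatched_locs(SAMPLE, "sygnal1.com"))
```

In practice you would fetch /sitemap.xml from each connected domain and run the same check with that domain as `expected_host`. An empty result means the hostnames are consistent.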
FAQs
There are some pages I don't want Google to index. How do I remove them from my sitemap.xml?
There is no way to exclude a page from Webflow's automatic sitemap.xml.
The only option Webflow offers is to replace your entire sitemap with a literal, custom-edited one, copy-paste style.
However, there are very few situations where that's actually useful.
Sitemaps only have an "assist" role; they don't determine what should be indexed, or indicate what you want to appear in search results.
If you're trying to exclude a page, use a robots META tag with noindex instead.
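To confirm that a page actually carries the noindex directive, you can scan its HTML for a robots META tag. The sketch below uses only Python's standard library. The `RobotsMetaParser` class and `has_noindex` helper are illustrative names, and the check covers only the generic `robots` META tag, not crawler-specific tags or HTTP headers.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Records whether a <meta name="robots"> tag contains noindex."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        content = (attr.get("content") or "").lower()
        if name == "robots":
            directives = [d.strip() for d in content.split(",")]
            if "noindex" in directives:
                self.noindex = True


def has_noindex(html_text):
    """Return True if the HTML contains a robots META tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return parser.noindex


page = '<html><head><meta name="robots" content="noindex"></head></html>'
print(has_noindex(page))
```

A page that passes this check can still be listed in the sitemap.xml. The sitemap only helps crawlers discover the page, while the noindex directive is what asks search engines to keep it out of results.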
References
Last-modified
Definition of last-modified
Sources conflict on whether Google's use of the last-modified date benefits ranking.
https://www.seroundtable.com/google-last-modified-date-xml-sitemaps-30026.html
https://webmasters.stackexchange.com/questions/117090/when-should-i-update-lastmod-value-in-the-sitemap
https://stackoverflow.com/questions/31349345/how-to-properly-format-last-modified-lastmod-time-for-xml-sitemaps?stw=2
Sitemap validation notes
https://support.google.com/webmasters/thread/148829982?hl=en&msgid=149001137