Webflow site search is built on Elasticsearch, an open source search and analytics engine built on Apache Lucene.
Here are some of the configuration details shared by Webflow Support:
This article, while at times a bit technical and developer-oriented, provides a lot more detail if you're eager to read more: "A Practical Guide on Elasticsearch Scoring and Relevancy" (Qbox HES)
How does indexing work?
Many search platforms index HTML externally: they crawl the published site and use markup in the HTML to determine which pages, and which parts of each page, should be indexed.
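As a rough illustration of that markup-driven approach, here is a minimal Python sketch of how an external indexer might decide what to keep. The `robots` meta tag with `noindex` is a widely used convention; the `data-search-exclude` attribute is a made-up marker for this example (each platform defines its own), and the class name is hypothetical too.

```python
from html.parser import HTMLParser

# Hypothetical marker attribute, invented for this sketch.
EXCLUDE_ATTR = "data-search-exclude"
# Void elements never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class IndexableTextExtractor(HTMLParser):
    """Collect page text, skipping anything inside an excluded element."""

    def __init__(self):
        super().__init__()
        self.noindex = False   # set when a robots meta tag says "noindex"
        self.chunks = []       # text fragments that may be indexed
        self._skip_depth = 0   # > 0 while inside an excluded element

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "meta" and (attr_map.get("name") or "").lower() == "robots":
            if "noindex" in (attr_map.get("content") or "").lower():
                self.noindex = True
        if tag in VOID_TAGS:
            return
        # Entering an excluded element, or nesting deeper inside one.
        if self._skip_depth or EXCLUDE_ATTR in attr_map:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


html_page = """
<html><head><title>Pricing</title></head>
<body>
  <nav data-search-exclude><a href="/">Home</a><br><a href="/blog">Blog</a></nav>
  <main><h1>Pricing</h1><p>Plans start at $10.</p></main>
</body></html>
"""
parser = IndexableTextExtractor()
parser.feed(html_page)
indexable_text = " ".join(parser.chunks)
print(parser.noindex, indexable_text)
```

The navigation links are dropped while the heading and paragraph are kept. A real crawler would also need to handle scripts, styles and malformed markup, which this sketch ignores.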
Elasticsearch itself does not crawl websites and index HTML pages the way a search engine such as Google does. Instead, data must be sent to Elasticsearch as documents in a format it understands (usually JSON) before it can index and search them.
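To make "sending data to Elasticsearch" concrete, here is a small Python sketch that builds a request for Elasticsearch's document index endpoint (`PUT /<index>/_doc/<id>`). The host, the index name `site-pages` and the document fields are assumptions chosen for illustration; nothing here reflects how Webflow names or structures its own index.

```python
import json
import urllib.request

# Assumptions for illustration only: a local node and a made-up index name.
ES_URL = "http://localhost:9200"
INDEX = "site-pages"


def build_index_request(doc_id, page):
    """Build (but do not send) a request that indexes one page as a JSON document."""
    body = json.dumps(page).encode("utf-8")
    return urllib.request.Request(
        url=f"{ES_URL}/{INDEX}/_doc/{doc_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical document shape: the fields are whatever the sender chooses.
page = {
    "url": "/blog/hello-world",
    "title": "Hello world",
    "body": "Plain text extracted from the page.",
}
request = build_index_request("blog-hello-world", page)
print(request.get_method(), request.full_url)
# To actually send it to a running node: urllib.request.urlopen(request)
```

The point is that Elasticsearch only ever sees the JSON body. Whatever is left out of that body cannot be found by a search.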
Therefore, if you want to exclude certain pages or parts of pages from being indexed by Elasticsearch, you would need to do this at the point where you are collecting and formatting the data for Elasticsearch.
My guess is that this is probably done by Webflow's publishing engine. It likely publishes the visible HTML, CSS and JS content, and at the same time publishes a JSON document for Elasticsearch to index later. I suspect this in part because, when you mark an element as "do not include in search", no custom attributes or other markup appear in the published HTML.
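The guess above can be sketched in Python as follows. This is purely an illustration of the idea, not Webflow's actual pipeline: the `Element` class, its `searchable` flag and the `publish` function are all hypothetical names invented for this example.

```python
import json
from dataclasses import dataclass
from html import escape


@dataclass
class Element:
    tag: str
    text: str
    searchable: bool = True  # hypothetical "include in search" flag


def publish(url, elements):
    """Return (markup, search_document_json) for one page."""
    # The HTML gets every element; the flag is never written into the markup.
    markup = "".join(f"<{e.tag}>{escape(e.text)}</{e.tag}>" for e in elements)
    # The search document only gets text from elements left searchable.
    doc = {
        "url": url,
        "body": " ".join(e.text for e in elements if e.searchable),
    }
    return markup, json.dumps(doc)


elements = [
    Element("h1", "Pricing"),
    Element("p", "Plans start at $10."),
    Element("footer", "Internal legal boilerplate", searchable=False),
]
markup, search_doc = publish("/pricing", elements)
print(markup)
print(search_doc)
```

The footer still appears in the published markup, but its text is missing from the JSON document, and nothing in the markup reveals that it was excluded. That matches what can be observed from the outside.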
Resources
https://www.elastic.co/
https://www.elastic.co/guide/index.html
https://www.elastic.co/guide/en/workplace-search/8.8/workplace-search-customizing-indexing-rules.html
Webflow Site search
https://university.webflow.com/lesson/site-search#indexing-and-controlling-what-is-%E2%80%9Csearchable%E2%80%9D
Democratizing search technology — by bringing it to Webflow
https://webflow.com/blog/democratizing-search-technology-by-bringing-it-to-webflow