Custom URL and sitemap scanning


Overview

Osano's CMP currently provides two scanning features: URL scanning and sitemap scanning. The URL scanning feature periodically (currently monthly, subject to change in future releases) visits a customer-entered URL and scans the cookies, scripts, and iframes implemented by the page found at that URL. This is used to give a more complete picture of the trackers in use at that location.

Some customers have many web application URLs to scan, and entering this data and keeping it current can be difficult. To assist with this, Osano provides an optional sitemap scanning feature. The customer can enter the URL of a standard sitemap XML file (often named sitemap.xml) that references their various web properties. The sitemap scanning process periodically (currently monthly, subject to change in future releases) visits the entered sitemap URLs and automatically adjusts the stored roster of scan URLs to match those listed in the sitemap document.
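To make the roster adjustment concrete, the following is a minimal sketch of how a sitemap could be reconciled against a stored list of scan URLs. It is not Osano's implementation: the function names, the sample sitemap, and the stored roster are hypothetical and used for illustration only. It assumes a standard sitemap document that uses the sitemaps.org namespace.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap XML files (sitemaps.org protocol).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def urls_from_sitemap(xml_text: str) -> set[str]:
    """Return the set of page URLs found in the <loc> elements of a sitemap."""
    root = ET.fromstring(xml_text)
    return {
        loc.text.strip()
        for loc in root.iter(f"{SITEMAP_NS}loc")
        if loc.text and loc.text.strip()
    }


def reconcile_roster(stored: set[str], sitemap_urls: set[str]) -> tuple[list[str], list[str]]:
    """Compute which URLs to add to and remove from the stored roster
    so that it matches the sitemap (hypothetical helper)."""
    to_add = sorted(sitemap_urls - stored)
    to_remove = sorted(stored - sitemap_urls)
    return to_add, to_remove


# Hypothetical sample data, for illustration only.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/pricing</loc></url>
</urlset>"""

stored_roster = {"https://example.com/", "https://example.com/old-page"}
to_add, to_remove = reconcile_roster(stored_roster, urls_from_sitemap(SAMPLE_SITEMAP))
```

With this sample data, `to_add` contains `https://example.com/pricing` and `to_remove` contains `https://example.com/old-page`. Using set differences keeps the comparison independent of the order in which URLs appear in the sitemap.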

The URL Scan Process

The URL scanning operation runs in a containerized (Docker) process that uses headless browser technology (Puppeteer) to scan the customer's specified website. The process notes any unmanaged cookies, scripts, and iframes that it finds and reports them as discoveries, exactly as the osano.js script does within the end user's browser. Unlike the osano.js JavaScript code, the headless browser has full access to all cookies that are being stored by the web application, even so-called "server" or "http only" cookies. These cookies are off limits to JavaScript executing in the browser and will not be sent back as new discoveries in your Consent Manager configuration.
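The distinction between cookies that in-page JavaScript can read and "http only" cookies comes down to the HttpOnly attribute on the Set-Cookie response header. The sketch below illustrates that distinction using Python's standard library. It is an illustration only, not part of the Osano scanner; the function name and the sample headers are hypothetical.

```python
from http.cookies import SimpleCookie


def split_by_script_visibility(set_cookie_headers: list[str]) -> tuple[list[str], list[str]]:
    """Split cookie names into (visible to page JavaScript, HttpOnly).

    Each item in set_cookie_headers is the value of one Set-Cookie header.
    """
    script_visible: list[str] = []
    http_only: list[str] = []
    for header in set_cookie_headers:
        jar = SimpleCookie()
        jar.load(header)
        for name, morsel in jar.items():
            # The "httponly" attribute is truthy only when the flag was present.
            if morsel["httponly"]:
                http_only.append(name)
            else:
                script_visible.append(name)
    return script_visible, http_only


# Hypothetical response headers, for illustration only.
sample_headers = [
    "session=abc123; Path=/; HttpOnly",
    "theme=dark; Path=/",
]
visible, hidden = split_by_script_visibility(sample_headers)
```

Here `visible` holds `theme` and `hidden` holds `session`. A script running in the page could read only the first group through `document.cookie`, while a tool that controls the browser itself can see both.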

The diagram below describes, at a high level, the URL scanning process.

Figure: URL Scan Process