Why does S&P Global operate crawlers?

Many of S&P Global's information products and services rely on information published on Internet sites by governments, agencies, and companies. This data is used for:

  • Direct insertion/updates into our databases
  • Normalization to our internal standards before insertion/updates
  • Source material for downstream machine and/or human extraction
  • Source material for background research or models

What information does S&P Global collect?

S&P Global crawlers collect information commonly published on Internet websites.

For instance, coverage of company fundamentals requires S&P Global to access investor relations materials for the companies it covers. Similarly, forecasts of future energy supply rely, in part, on press releases and annual reports discussing new projects by market participants.

Identifying our crawlers

Crawling activity by S&P Global is identified by User-Agent strings in the following form:

  • SPGlobalWebBot/<Major>.<Minor> (+https://www.spglobal.com/bot; id=<RegistryID>)
  • Example: SPGlobalWebBot/1.0 (+https://www.spglobal.com/bot; id=core10a)

Components

  • SPGlobalWebBot - The unified identifier for all S&P Global automated crawlers
  • <Major>.<Minor> - Crawler framework version following semantic versioning (e.g., 1.0)
  • https://www.spglobal.com/bot - URL linking to this documentation page
  • id=<RegistryID> - Unique internal identifier used by S&P Global to track its crawler instances (e.g., core10a)
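Site operators who log or analyze crawler traffic can extract these components from the User-Agent string with a short script. The following Python sketch is illustrative only: it assumes the format documented above, and the function name parse_spglobal_ua is ours, not part of any S&P Global tooling.

import re

# Pattern for the documented form:
# SPGlobalWebBot/<Major>.<Minor> (+https://www.spglobal.com/bot; id=<RegistryID>)
UA_PATTERN = re.compile(
    r"SPGlobalWebBot/(?P<major>\d+)\.(?P<minor>\d+) "
    r"\(\+https://www\.spglobal\.com/bot; id=(?P<registry_id>[\w-]+)\)"
)

def parse_spglobal_ua(user_agent):
    """Return version and registry id if the UA matches the documented form, else None."""
    match = UA_PATTERN.search(user_agent)
    if match is None:
        return None
    return {
        "version": match.group("major") + "." + match.group("minor"),
        "registry_id": match.group("registry_id"),
    }

# parse_spglobal_ua("SPGlobalWebBot/1.0 (+https://www.spglobal.com/bot; id=core10a)")
# -> {"version": "1.0", "registry_id": "core10a"}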

Blocking S&P Global crawlers

S&P Global crawlers respect published robots.txt files. To block S&P Global from accessing any of your site content, add the following two lines to your robots.txt file at your site's root directory:

User-agent: SPGlobalWebBot
Disallow: /

Note that doing so may cause data about your organization to become stale in our products.
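After updating robots.txt, you can confirm the rule behaves as intended. A minimal check using Python's standard urllib.robotparser, with https://example.com standing in for your own domain:

from urllib.robotparser import RobotFileParser

rp = RobotFileParser("https://example.com/robots.txt")
rp.read()  # fetch and parse the live robots.txt

# Should print False once the two lines above are in place
print(rp.can_fetch("SPGlobalWebBot", "https://example.com/any/page"))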