Static assets scraping

In the context of Zoning and Session Replays, Contentsquare fetches static assets on your website.

To allow Contentsquare to fetch these assets, select one of the following options:

  1. Allowlist Contentsquare IP addresses
  2. Use a static header to validate requests
  3. Use a dynamic signature header to validate requests

If you choose to allowlist IP addresses, allow ports 80 (HTTP) and 443 (HTTPS) and the following IP addresses so that your proxy, firewall, or server configuration does not block the scraper:

  • 52.18.162.157 (APP & DT scraper)
  • 100.24.76.90 (microproxy-production.us-east-1.csq)
  • 34.192.98.148 (fallback microproxy-production.us-east-1.csq)
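If the filtering happens in application code rather than at the firewall, the allowlist above can be expressed as a simple set lookup. This is a minimal sketch: the `isContentsquareScraperIp` helper and the handling of IPv4-mapped addresses are illustrative assumptions, not part of the Contentsquare setup.

```javascript
// IP addresses listed above; keep this list in sync with the documentation.
const CONTENTSQUARE_SCRAPER_IPS = new Set([
  '52.18.162.157',
  '100.24.76.90',
  '34.192.98.148',
]);

// Hypothetical helper: returns true when the remote address is allowlisted.
function isContentsquareScraperIp(remoteAddress) {
  if (typeof remoteAddress !== 'string') return false;
  // Node may report IPv4 clients as IPv4-mapped IPv6 addresses (::ffff:a.b.c.d).
  const ip = remoteAddress.replace(/^::ffff:/i, '');
  return CONTENTSQUARE_SCRAPER_IPS.has(ip);
}
```

Behind a reverse proxy or load balancer, the address seen by the application is the proxy's own, so a check like this would need to run where the original client address is available.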

When you select the static header option, Contentsquare adds a custom header to the project settings.

{
  "headers": {
    "my-new-header-key": "myKeyValue"
  }
}

You can then validate that scraper requests contain the specific header and value.

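const receivedHeaderExample = 'myKeyValue'; // Value of the custom header on the incoming request
const CONTENTSQUARE_CUSTOM_HEADER = 'myKeyValue'; // Value configured in the project settings
if (receivedHeaderExample === CONTENTSQUARE_CUSTOM_HEADER) {
  // The request carries the expected header value: allow it
}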
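In a Node.js server, the same check could be applied to the incoming request headers. The sketch below assumes the header name and value from the example above (`my-new-header-key` and `myKeyValue`); the `hasValidStaticHeader` helper is hypothetical. Node exposes incoming header names in lowercase, and the comparison uses `crypto.timingSafeEqual` so that its duration does not depend on where the two values differ.

```javascript
const crypto = require('crypto');

// Assumed values, taken from the example project settings above.
const HEADER_NAME = 'my-new-header-key'; // lowercase, as exposed by req.headers
const HEADER_VALUE = 'myKeyValue';

// Hypothetical helper: checks the headers object of an incoming request.
function hasValidStaticHeader(headers) {
  const received = headers[HEADER_NAME];
  // Reject missing headers and anything that is not a single string value.
  if (typeof received !== 'string') return false;
  const a = Buffer.from(received);
  const b = Buffer.from(HEADER_VALUE);
  // timingSafeEqual requires equal lengths, so compare lengths first.
  return a.length === b.length && crypto.timingSafeEqual(a, b);
}
```

With Node's built-in `http` module, this could be called as `hasValidStaticHeader(req.headers)` at the top of the request handler.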

Using a custom dynamic signature header


When you select this option, Contentsquare adds the X-CONTENTSQUARE-SIGNATURE header to the requests your server receives from the scraper.

The X-CONTENTSQUARE-SIGNATURE header is a string generated in this format:

<TIMESTAMP>-base64(hmac('sha256', <SECRET>, <RESOURCE_DOMAIN>-<TIMESTAMP>))

with:

  • <TIMESTAMP>: the time at which the request was sent, in milliseconds since the Unix epoch, as returned by Date.now(),
  • <RESOURCE_DOMAIN>: the complete domain hosting the resource on your website,
  • <SECRET>: the secret shared between you and Contentsquare for the project, generated at project creation.
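To make the format concrete, the following sketch builds a signature the way the format string describes it. It is an illustration only, under the assumption that the HMAC key is the secret and the signed message is `<RESOURCE_DOMAIN>-<TIMESTAMP>`, as in the verification code in this section; `buildSignature` is a hypothetical helper name, not a Contentsquare API.

```javascript
const crypto = require('crypto');

// Hypothetical helper: returns '<TIMESTAMP>-<base64 HMAC-SHA256 digest>'.
function buildSignature(secret, resourceDomain, timestamp = Date.now()) {
  const digest = crypto
    .createHmac('sha256', secret)              // key: the shared secret
    .update(`${resourceDomain}-${timestamp}`)  // message: domain and timestamp
    .digest('base64');
  return `${timestamp}-${digest}`;
}

// Example values reused from this section.
const signature = buildSignature('abcde', 'contentsquare.com', 1596706743675);
console.log(signature);
```

An HMAC-SHA256 digest is 32 bytes long, so its base64 form is always 44 characters and ends with a single `=` padding character.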

With a secret of abcde, the Contentsquare scraper service emitted the request below to contentsquare.com on the 6th of August 2020 at 05:39 am, to fetch the official Contentsquare logo.

curl \
  -H 'accept-language: en-GB,en-US;q=0.9,en;q=0.8,fr;q=0.7' \
  -H 'accept: application/json, text/plain, */*' \
  -H 'user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36' \
  -H 'X-CONTENTSQUARE-SIGNATURE: 1596706743675-BxmHtG6vu4CfFlzpHxc0qYOmR0iMajlIvA2B4404qk4=' \
  -X GET 'https://contentsquare.com/wp-content/themes/kps3-contentsquare/public/assets/images/contentsquare-logo.svg?tv=1.3.0'

You can compute the signature and verify it against the value of the X-CONTENTSQUARE-SIGNATURE header by providing:

  • The timestamp from the incoming request (1596706743675),
  • The resource domain (contentsquare.com),
  • The secret provided by Contentsquare (abcde).

const crypto = require('crypto');

const secret = 'abcde'; // Shared secret provided by Contentsquare
const resourceDomain = 'contentsquare.com'; // Domain hosting the requested resource
const receivedHeaderExample = '1596706743675-BxmHtG6vu4CfFlzpHxc0qYOmR0iMajlIvA2B4404qk4=';

// Extract the timestamp and digest from the received header.
// Standard base64 never contains '-', so the header splits into exactly two parts.
const [timestamp, receivedDigest] = receivedHeaderExample.split('-');

// Extra security step to make sure the timestamp is not too old.
// The example timestamp dates from 2020, so this check throws if the snippet is run unchanged.
const currentTimestamp = Date.now();
if (currentTimestamp - Number(timestamp) > 5 * 60 * 1000) {
  throw new Error('Validation failed. Timestamp signature is older than 5 minutes');
}

// Recreate the string that was used to generate the digest
const dataToSign = `${resourceDomain}-${timestamp}`; // e.g. 'contentsquare.com-1596706743675'

// Create a new digest using the same secret and algorithm
const hmac = crypto.createHmac('sha256', secret);
hmac.update(dataToSign);
const generatedDigest = hmac.digest('base64');

// Compare the newly generated digest with the one received in the header
if (receivedDigest === generatedDigest) {
  console.log('Header is valid');
  // Code to validate request
}
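For use inside a request handler, the same steps could be wrapped in a function that returns a boolean instead of throwing. The sketch below is one possible shape, not an official implementation: `verifySignature`, `MAX_AGE_MS`, and the injectable `now` parameter are hypothetical names introduced for illustration. It also rejects malformed headers and compares digests with `crypto.timingSafeEqual` rather than `===`.

```javascript
const crypto = require('crypto');

const MAX_AGE_MS = 5 * 60 * 1000; // same 5-minute window as the snippet above

// Hypothetical helper: returns true only for a fresh, well-formed, matching signature.
function verifySignature(header, resourceDomain, secret, now = Date.now()) {
  if (typeof header !== 'string') return false;

  // Split on the first '-' into timestamp and digest.
  const separator = header.indexOf('-');
  if (separator <= 0) return false;
  const timestamp = header.slice(0, separator);
  const receivedDigest = header.slice(separator + 1);

  // The timestamp must be numeric and not older than the allowed window.
  if (!/^\d+$/.test(timestamp)) return false;
  if (now - Number(timestamp) > MAX_AGE_MS) return false;

  // Recompute the digest with the shared secret.
  const expectedDigest = crypto
    .createHmac('sha256', secret)
    .update(`${resourceDomain}-${timestamp}`)
    .digest('base64');

  // Constant-time comparison; lengths must match before calling timingSafeEqual.
  const received = Buffer.from(receivedDigest);
  const expected = Buffer.from(expectedDigest);
  return received.length === expected.length && crypto.timingSafeEqual(received, expected);
}
```

Passing `now` explicitly makes the freshness check testable against recorded requests; in production the default `Date.now()` would be used.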