Chapter 5: Salesforce Commerce Cloud Robots.txt



1. What is a Robots.txt?

A website uses a robots.txt file to tell search engines and bots which parts of the site they may visit and which areas they should avoid when crawling.

Additionally, a robots.txt file helps control a site's crawl budget: the amount of resources a crawler will allocate to a website before moving on. Because Commerce Cloud powers e-commerce sites, which can generate a large number of URL variations (sorting and filtering pages, for example) as well as customer-sensitive pages, there is a real risk of burning through the crawl budget unnecessarily.
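
To make that concrete, here's a rough sketch of how quickly faceted navigation multiplies URLs. The category path and facet values below are invented, but the parameter names follow the prefn/prefv/pmin/pmax patterns we'll block later in this chapter.

    from itertools import product

    # Illustrative only: one category page with a handful of refinement values
    # and price bands. The parameter names mirror the filtering and sorting
    # parameters discussed in this chapter; the values are made up.
    colors = ["black", "blue", "red", "green", "white"]
    sizes = ["xs", "s", "m", "l", "xl"]
    price_bands = [(0, 25), (25, 50), (50, 100), (100, 250)]

    urls = set()
    for color, size, (pmin, pmax) in product(colors, sizes, price_bands):
        urls.add(
            f"/womens/dresses?prefn1=color&prefv1={color}"
            f"&prefn2=size&prefv2={size}&pmin={pmin}&pmax={pmax}"
        )

    # 5 colors x 5 sizes x 4 price bands = 100 crawlable variations of a single
    # page, before sort orders or additional facets are even considered.
    print(len(urls))  # 100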

I won't walk through the syntax here because it can get extensive, and there are great articles written about how to optimize a robots.txt file. See Ahrefs' detailed guide on robots.txt.

The pages you want to block will vary from site to site. However, for Commerce Cloud sites there are a few directives you may want to consider. We'll talk through some sample URLs at the end of the section, and if you're interested, I have an SFCC robots.txt template found here.

Let's get started by understanding the portal.


2. Switching Between Instances

As we navigate our way to the robots.txt portal, it's important to note that the robots.txt we're currently viewing only targets the Staging site. As a reminder, an instance can be isolated for SEO development work or used for work that will be pushed to the live site. In our case, we're using the Staging environment. See below.

The "Instance Type" denotes the environment in which the robots.txt will be activated. That is to say, if we left the Instance Type pointing to "Staging", the robots.txt would only be pushed to the URL below.

https://stg.example.com/robots.txt

That said, we need to push to the Production environment. Remember, changes on the Staging site replicate to the Production site, so we have to target the Production instance through the drop-down. Follow the steps below.

You can add a robots.txt to the Staging site as a means to test; however, when I'm working with a crawler like Screaming Frog, I prefer to test in the crawler itself, since I can crawl as Googlebot using an optimized robots.txt to better understand Google's behavior.
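
If you'd like to script a similar check, the sketch below runs a draft set of rules against URLs exported from a crawl. The rules, the urls.txt file name, and the idea of feeding it a Screaming Frog export are illustrative assumptions, not SFCC or Screaming Frog features.

    from urllib.robotparser import RobotFileParser

    # Minimal pre-launch check: parse a draft set of rules, then report which
    # exported URLs (one per line in urls.txt) they would block.
    draft_rules = [
        "User-agent: *",
        "Disallow: /Cart-Show",
        "Disallow: /Order-History",
    ]

    parser = RobotFileParser()
    parser.parse(draft_rules)

    with open("urls.txt") as handle:
        for url in (line.strip() for line in handle if line.strip()):
            verdict = "crawlable" if parser.can_fetch("*", url) else "blocked"
            print(f"{verdict:10} {url}")

    # Caveat: the standard-library parser treats Disallow values as simple path
    # prefixes, so wildcard rules such as /*prefn* should be verified in a
    # crawler that supports Googlebot-style matching.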

I also wouldn't worry about search engines or users reaching your site because of a custom robots.txt on Staging; there's typically some form of password protection in front of the Staging instance.

3. Configuring and Building a Custom Robots.txt

Now you should be able to work in a robots.txt file targeted at the Production environment, i.e. the live site.

You have a couple of options to choose from: you can use a robots.txt that comes with SFCC or define your own custom one. Let's go with the latter.

We'll want to select a "Custom robots.txt Definition" because each site is unique and there may be something specific we'd want to block or allow.

As called out earlier, you can review and copy my SFCC Robots.txt Template. Please keep in mind that yours may look different based on how you've built your Salesforce Commerce Cloud site.

What URLs are we typically going to want to block?

Because it is an e-commerce site, there's the potential for inefficient crawling. Commerce Cloud can generate parameter URLs based on filtering and sorting options, and these can be innumerable.

With that in mind, I've typically added the disallows below for filtering and sorting parameters.

    Disallow: /*prefv*
    Disallow: /*prefn*
    Disallow: /*pmax
    Disallow: /*pmin
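
If you want to sanity-check which URLs those patterns would catch, the sketch below translates them into regular expressions, roughly the way major search engines interpret * (any sequence of characters) and $ (end of the URL). The sample paths are invented, and this only approximates Disallow matching; it is not a full robots.txt evaluator.

    import re

    def rule_to_regex(rule: str) -> "re.Pattern[str]":
        # '*' matches any run of characters and '$' anchors the end of the URL;
        # everything else is literal. Rules are prefix matches, so no end
        # anchor is added unless the rule itself ends in '$'.
        parts = [".*" if ch == "*" else "$" if ch == "$" else re.escape(ch)
                 for ch in rule]
        return re.compile("".join(parts))

    rules = ["/*prefv*", "/*prefn*", "/*pmax", "/*pmin"]
    samples = [
        "/womens/dresses?prefn1=color&prefv1=blue",  # invented faceted URL
        "/womens/dresses?pmin=25&pmax=50",           # invented price filter
        "/womens/dresses",                           # plain category page
    ]

    # Rules are matched against the path plus query string.
    for path in samples:
        hit = next((rule for rule in rules if rule_to_regex(rule).match(path)), None)
        print(f"{path} -> {'blocked by ' + hit if hit else 'crawlable'}")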
    

A robots.txt is going to vary by site, and I've heard of others allowing crawlers to access the above; that's fine too. It really depends on the site and the strategy.

Another area where I've disallowed crawlers is Pipeline pages. Not all, but several. Pipeline pages can be used to relay customer information behind a login, so we don't want to push out pages that aren't accessible without a password.

Some popular SFCC Pipelines that I've blocked or disallowed include the below:

    Disallow: /Account-PasswordReset
    Disallow: /Account-SetNewPassword
    Disallow: /Cart-Show
    Disallow: /Order-History
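
Because these Pipeline rules are plain path prefixes with no wildcards, Python's standard-library robots.txt parser evaluates them much the way a crawler would, which makes a quick spot-check easy. The order number and product URL below are invented for illustration.

    from urllib.robotparser import RobotFileParser

    pipeline_rules = [
        "User-agent: *",
        "Disallow: /Account-PasswordReset",
        "Disallow: /Account-SetNewPassword",
        "Disallow: /Cart-Show",
        "Disallow: /Order-History",
    ]

    parser = RobotFileParser()
    parser.parse(pipeline_rules)

    print(parser.can_fetch("*", "/Cart-Show"))                       # False
    # Prefix matching also covers parameterised variants of a Pipeline URL.
    print(parser.can_fetch("*", "/Order-History?orderNo=00001234"))  # False
    # Product and category URLs remain crawlable.
    print(parser.can_fetch("*", "/classic-dress-shirt.html"))        # True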
    

There are several more Pipelines you can disallow, but it will vary depending on which Pipelines you actually use on your site.

For any SEO project or website migration, we'll want to ensure a thorough robots.txt is built out.

See the Salesforce Commerce Cloud documentation for configuring a robots.txt here.

4. Salesforce Commerce Cloud Robots.txt Template

To give you an idea of which directives to include in a Salesforce Commerce Cloud robots.txt, I've provided the template below. Keep in mind that it will vary based on how you've structured your site.

    User-agent: *
    Disallow: /Account-EditProfile
    Disallow: /Account-PasswordReset
    Disallow: /Account-SetNewPassword
    Disallow: /Account-SetNewPasswordConfirm
    Disallow: /Account-Show
    Disallow: /Account-StartRegister
    Disallow: /Address-List
    Disallow: /CC_Login-Form
    Disallow: /COBilling-Start
    Disallow: /COCustomer-Start
    Disallow: /COShipping-Start
    Disallow: /COSummary-Start
    Disallow: /COSummary-Submit
    Disallow: /Cart-Show
    Disallow: /Compare-Show
    Disallow: /CustomerService-ContactUs
    Disallow: /CustomerService-Show
    Disallow: /Default-Offline
    Disallow: /Default-Start
    Disallow: /Error-Start
    Disallow: /GiftCert-Edit
    Disallow: /GiftCert-Purchase
    Disallow: /GiftRegistry-Search
    Disallow: /GiftRegistry-ShowRegistryByID
    Disallow: /GiftRegistry-Start
    Disallow: /GiftRegistryCustomer-Show
    Disallow: /Home-ErrorNotFound
    Disallow: /Order-History
    Disallow: /PaymentInstruments-List
    Disallow: /ReferAFriend-Start
    Disallow: /Search-Show
    Disallow: /Search-ShowContent
    Disallow: /Stores-Find
    Disallow: /Wishlist-Search
    Disallow: /Wishlist-Show
    Disallow: /Wishlist-ShowOther
    Disallow: /*prefv*
    Disallow: /*prefn*
    Disallow: /*pmax
    Disallow: /*pmin
    
    Sitemap: https://www.example.com/sitemap_index.xml
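
Once the custom definition has replicated to Production, it's worth a quick spot-check against the live file. The sketch below fetches the production robots.txt and confirms a few of the prefix rules; example.com and the sample paths stand in for your own site, and the wildcard parameter rules are still best verified in a crawler.

    from urllib.robotparser import RobotFileParser

    # Post-deployment spot-check of the live robots.txt (prefix rules only;
    # the standard-library parser does not evaluate Googlebot-style wildcards).
    parser = RobotFileParser("https://www.example.com/robots.txt")
    parser.read()

    for path in ("/Cart-Show", "/Order-History", "/Search-Show", "/new-arrivals"):
        allowed = parser.can_fetch("Googlebot", f"https://www.example.com{path}")
        print(f"{'crawlable' if allowed else 'blocked':10} {path}")

If anything looks off, re-check the Instance Type and confirm the change actually replicated to Production before digging further.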
