Chapter 5: Salesforce Commerce Cloud Robots.txt

June 2, 2022

Configuring the robots.txt


1. What is a Robots.txt?

A website uses a robots.txt file to tell search engines and bots which parts of the site they may visit and which they should avoid when crawling.

Additionally, a robots.txt file helps control a site’s crawl budget: the amount of resources a crawler will allocate to a website before moving on. Because Commerce Cloud drives e-commerce sites, which can generate a huge variety of URLs, including sorting and filtering pages as well as customer-sensitive pages, there is a real risk of burning through the crawl budget unnecessarily.
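To illustrate, a single category page can spawn endless crawlable variations once filters and sorting come into play. The paths below are hypothetical, but the prefn/prefv/pmin/pmax parameters are the refinement parameters we’ll block later in this section:

https://www.example.com/womens/clothing
https://www.example.com/womens/clothing?prefn1=color&prefv1=Blue
https://www.example.com/womens/clothing?prefn1=color&prefv1=Blue&pmin=25&pmax=100
https://www.example.com/womens/clothing?prefn1=color&prefv1=Blue&prefn2=size&prefv2=M&pmin=25&pmax=100

Every filter combination is another URL for a crawler to chew through.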

I won’t walk through the syntax in depth, because there are already great articles on how to optimize a robots.txt file; see Ahrefs’ detailed guide on robots.txt.
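For quick reference, though, the basic shape looks like this (the paths are illustrative):

# Rules below apply to all crawlers
User-agent: *
# Block anything under this path
Disallow: /private/
# Carve out an exception to a Disallow
Allow: /private/open-page
# Point crawlers at your sitemap
Sitemap: https://www.example.com/sitemap_index.xml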

The pages you want to block will vary from site to site. However, for Commerce Cloud sites, there are a few directives you may want to consider. We’ll talk through some sample URLs at the end of the section, and if you’re interested, I have an SFCC robots.txt template found here.

Let’s get started by understanding the portal.

2. Switching Between Instances

As we navigate our way to the robots.txt portal, it’s important to note that the robots.txt we’re currently viewing targets only the Staging site. As a reminder, an environment can be isolated for SEO development work or used for work that will be pushed to the live site. In our case, we’re using the Staging environment. See below.

The “Instance Type” denotes the environment in which the robots.txt will be activated. That is to say, if we left the Instance Type pointing to “Staging”, the robots.txt would be published at the URL below.

https://stg.example.com/robots.txt

That said, we need to push to the Production environment. Remember, changes on the Staging site replicate to the Production site, so we have to target the Production instance through the drop-down. Follow the steps below.

You can add a robots.txt to the Staging site as a means to test; however, when I’m working with a crawler like Screaming Frog, I prefer to test in the crawler itself, as I can crawl as Googlebot using an optimized robots.txt to better understand Google’s behavior.

I also wouldn’t worry about search engines or users accessing your site because of a custom robots.txt; there’s typically some password protection in front of Staging.

3. Configuring and Building a Custom Robots.txt

Now you should be able to work in a robots.txt file targeted at the Production environment, that is, the live site.

You have a couple of options to choose from: you can use the default robots.txt that comes with SFCC or define a specific robots.txt. Let’s go with the latter.

We’ll want to select a “Custom robots.txt Definition” because each site is unique and there may be something specific we’d want to block or allow.

As called out earlier, you can review and copy my SFCC Robots.txt Template. Please keep in mind that it may look different based on how you build your Salesforce Commerce Cloud site.

What URLs are we typically going to want to block?

Because it’s an e-commerce site, there’s potential for inefficient crawling. Commerce Cloud can generate parameter URLs based on filtering and sorting options, and these combinations can be innumerable.

With that in mind, I’ve typically provided the disallows below for filtering and sorting parameters.

Disallow: /*prefv*
Disallow: /*prefn*
Disallow: /*pmax
Disallow: /*pmin

A robots.txt will vary by site; I’ve heard of others allowing crawlers to access the above, and that’s fine too. It really depends on the site and the strategy.
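If you want a quick sanity check outside of a crawler, here’s a minimal Python sketch that approximates Google’s wildcard matching (“*” matches any run of characters, “$” anchors the end of the URL). It’s an illustration, not a full robots.txt parser, and the sample URLs are hypothetical:

import re

def robots_pattern_to_regex(pattern):
    # A trailing "$" in robots.txt anchors the match to the end of the URL.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape regex metacharacters, then restore "*" as a wildcard.
    body = re.escape(pattern).replace(r"\*", ".*")
    return re.compile("^" + body + ("$" if anchored else ""))

disallows = ["/*prefv*", "/*prefn*", "/*pmax", "/*pmin"]
rules = [robots_pattern_to_regex(d) for d in disallows]

# Hypothetical SFCC-style URLs (path plus query string) to test.
urls = [
    "/womens/clothing",
    "/womens/clothing?prefn1=color&prefv1=Blue",
    "/womens/clothing?pmin=25&pmax=100",
]

for url in urls:
    blocked = any(rule.match(url) for rule in rules)
    print(("BLOCKED " if blocked else "allowed ") + url)

Running this prints “allowed” for the clean category path and “BLOCKED” for the two filtered variants, which is exactly what we want from the disallows above.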

Another area where I’ve disallowed crawlers is Pipeline pages, not all, but several. Pipeline pages can be used to relay customer information behind a login, so we don’t want to push out pages that aren’t accessible without a password.

Some popular SFCC Pipelines that I’ve blocked or disallowed include the following:

Disallow: /Account-PasswordReset
Disallow: /Account-SetNewPassword
Disallow: /Cart-Show
Disallow: /Order-History

There are several more Pipelines you can disallow, but it will vary depending on which Pipelines you actually use on your site.

For an SEO project or a website migration, we’ll want to ensure a thorough robots.txt is built out.

See the Salesforce Commerce Cloud documentation for configuring a robots.txt here.

4. Salesforce Commerce Cloud Robots.txt Template

To give you an idea of which directives to provide in a Salesforce Commerce Cloud robots.txt, I’ve provided the template below. Keep in mind that this will vary based on how you’ve structured your site.

User-agent: *
Disallow: /Account-EditProfile
Disallow: /Account-PasswordReset
Disallow: /Account-SetNewPassword
Disallow: /Account-SetNewPasswordConfirm
Disallow: /Account-Show
Disallow: /Account-StartRegister
Disallow: /Address-List
Disallow: /CC_Login-Form
Disallow: /COBilling-Start
Disallow: /COCustomer-Start
Disallow: /COShipping-Start
Disallow: /COSummary-Start
Disallow: /COSummary-Submit
Disallow: /Cart-Show
Disallow: /Compare-Show
Disallow: /CustomerService-ContactUs
Disallow: /CustomerService-Show
Disallow: /Default-Offline
Disallow: /Default-Start
Disallow: /Error-Start
Disallow: /GiftCert-Edit
Disallow: /GiftCert-Purchase
Disallow: /GiftRegistry-Search
Disallow: /GiftRegistry-ShowRegistryByID
Disallow: /GiftRegistry-Start
Disallow: /GiftRegistryCustomer-Show
Disallow: /Home-ErrorNotFound
Disallow: /Order-History
Disallow: /PaymentInstruments-List
Disallow: /ReferAFriend-Start
Disallow: /Search-Show
Disallow: /Search-ShowContent
Disallow: /Stores-Find
Disallow: /Wishlist-Search
Disallow: /Wishlist-Show
Disallow: /Wishlist-ShowOther
Disallow: /*prefv*
Disallow: /*prefn*
Disallow: /*pmax
Disallow: /*pmin

Sitemap: https://www.example.com/sitemap_index.xml
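Once the custom robots.txt is live on Production, it’s worth a quick smoke test to confirm the expected rules actually deployed. Here’s a minimal sketch using only the Python standard library; the domain and the directives in the list are placeholders to swap for your own:

import urllib.request

# Placeholder domain; point this at your production hostname.
url = "https://www.example.com/robots.txt"

with urllib.request.urlopen(url, timeout=10) as response:
    body = response.read().decode("utf-8")

# Directives we expect to find after deployment (adjust to your template).
expected = ["Disallow: /Cart-Show", "Disallow: /*prefn*", "Sitemap:"]

for directive in expected:
    print(("OK      " if directive in body else "MISSING ") + directive)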
