should I block robots

Started by abcdf, 10-16-2011, 19:29:22


abcdf (Topic starter)

Hi

I update my website regularly with new changes, and every time I work on the backend I block robots from visiting the site so they don't index wrong or incomplete pages in their databases.

My question is: should I block robots while I am working on my website's scripts, or let them do their work while I am working on the source?

Can anyone help me with the correct answer?

Thanks in advance!


leonardw

Yeah! You can allow or disallow robots on your site by using a robots.txt file. This is very helpful when the site is under maintenance or undergoing changes.
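For example, a minimal robots.txt in the site's root might look like this while you're working (assuming you want to keep all well-behaved crawlers out temporarily):

    # robots.txt - keep crawlers out during maintenance
    User-agent: *
    Disallow: /

Once the work is done, change the rule to "Disallow:" with nothing after the colon (or remove it), which allows crawling again.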








Hogward

#2
It can depend on the scale and nature of your website updates. If you're making large-scale changes to the structure of your site that might mislead web crawlers, temporarily blocking them can be a workable strategy. This is particularly true if you're refactoring your URLs, implementing redirects, changing your site's architecture, or making considerable adjustments to your content. The goal is to prevent search engines (like Google) from indexing your site while it isn't in a state that accurately represents its function or content.

On the other hand, for minor updates or changes that don't drastically affect your website's overall structure or content, it might not be necessary to block web crawlers. During these updates, you would want to maintain your presence in search engines' indexes.

Take note, however, that there are risks in blocking robots for an extended period:

Search engines will honor your robots.txt rules and stop crawling your site, and if the block stays in place long enough your pages can start dropping out of their indexes, which hurts your SEO visibility. You risk losing your rankings in search results, and it might take some time to regain them once the site can be crawled again.

It's also important to remember to unblock your site once you're done with your updates. Failing to remove the block could mean that search engines continue to ignore your site, which would have long-term consequences for your visibility.

As mentioned earlier, blocking or allowing robots while updating your website scripts largely depends on the kind of work you're doing. If the site changes could lead to confusion for search engine crawlers, or inaccurately portray your site, then it makes sense to use the robots.txt file to disallow crawling during that period.

Here's more detail about other factors you could consider:

Staging Environment: Do you work directly on your live site, or do you have a development or staging environment? If it's the latter, you can make all your changes in the staging environment (where crawlers are blocked), then upload the final, polished version to the live environment (where crawlers are allowed).
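If you do use a separate staging host, it's usually safer to put it behind a password than to rely on robots.txt alone, since robots.txt only asks crawlers not to crawl. As a rough sketch for an Apache-based staging site (the path and realm name below are just placeholders):

    # .htaccess on the staging host only
    AuthType Basic
    AuthName "Staging - authorized users only"
    AuthUserFile /path/to/.htpasswd
    Require valid-user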

Server Load: Web crawlers can put load on your server, which may slow down your system. If your server can manage both your development work and the crawler load, you may not need to block crawlers. However, if the load affects the speed or performance of your development work, you might want to block them temporarily.
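If crawler traffic itself is the problem, some crawlers (Bing and Yandex, for instance, but not Googlebot) respect a Crawl-delay directive in robots.txt; treat it as a polite request rather than a guarantee:

    # Ask supporting crawlers to wait 10 seconds between requests
    User-agent: *
    Crawl-delay: 10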

Partial Site Blocking: If you are updating only a subdirectory or section of your website, you could block robots for just that section, rather than the whole website.
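For instance, if only one section is being rebuilt, a rule like this keeps crawlers out of just that directory (the /under-construction/ path is purely an example):

    User-agent: *
    Disallow: /under-construction/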

Use of Meta Tags: Another approach to consider is using meta tags on individual pages. For example, if you are still working on a specific page, you could include a 'noindex' meta tag to tell search engines not to index that page yet.
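A page-level block is just a regular robots meta tag in the page's <head>, for example:

    <head>
      <!-- Tell crawlers not to index this page while it's unfinished -->
      <meta name="robots" content="noindex">
    </head>

Keep in mind that crawlers can only see this tag if the page is not also blocked in robots.txt, so don't combine the two on the same URL.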

Speed of Reindexing: Search engines differ in how quickly they revisit your site and notice changes in your robots.txt file or indexing status. While Googlebot is quite responsive, other crawlers may not be as quick.

A few more things to consider when deciding whether or not to block robots while working on your website:

Scheduled Downtime: Minor updates may not require blocking robots unless your site will be down for a while. However, for major overhauls that require significant downtime, you may want to consider returning a 503 HTTP status code, which tells search engines and other services that the site is temporarily down for maintenance. The 503 status should be sent together with a Retry-After header indicating when you expect the site to be back up. Use this only when necessary, since prolonged or repeated downtime flagged with a 503 status can eventually hurt your SEO.
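How you send that response depends on your server or framework, but what crawlers should see during the maintenance window looks roughly like this (the 3600-second Retry-After value is only an example):

    HTTP/1.1 503 Service Unavailable
    Retry-After: 3600
    Content-Type: text/html

    <html><body>Down for maintenance - back shortly.</body></html>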

Avoid partial content crawl: If you are rolling updates out slowly across your site, you might want to prevent search engines from indexing part of it in its new state while the rest of the site is still in the old state. In this scenario, it can make sense to block the pages under transition until the work is finished.

Combining Techniques: Lastly, you could consider combining multiple methods to manage robots during your site's development. For instance, you could use a meta tag such as "noindex,follow" to tell search engines not to index the page but still follow (and therefore give weight to) links on the page. This could be useful if your updates primarily impact the visual aspects of your site, and you still want crawlers to parse your links and other textual content.
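That combination is again just a robots meta tag, for example:

    <meta name="robots" content="noindex, follow">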

Remember, the general rule of thumb when deciding on whether to block robots is based on how the changes you're implementing would affect the value of the crawled and indexed content. If your changes significantly distort the value, relevance, and context of your website content or structure, then it's pragmatic to disallow robots until the update is completed and wholly integrated.

In the end, SEO is also about ensuring a positive user experience, so it's best to avoid showing incomplete or improperly functioning pages in search results. However, if the updates are minor and not affecting the relevant site structure, blocking robots isn't necessary.