How do you control access of the web crawlers?

Started by pixhngeh, 12-29-2011, 02:30:43

Previous topic - Next topic

pixhngehTopic starter

Hello,
There are numerous reasons why, or when, you should control the access of web robots or web crawlers to your site. As much as you want Googlebot to visit your site, you don't want spam bots to come and collect private information from it.
How do you control access of the web crawlers?


Hogward

By the effective use of robots.txt you can control the access of web crawlers. Whenever a bot crawls the site, it checks the robots.txt file before crawling the pages. With it, you can allow or disallow bots for the whole site or for specific paths.
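As a sketch of the per-bot allow/disallow idea above, a robots.txt could let Googlebot crawl everything while shutting out an unwanted crawler entirely. The name "BadBot" here is a placeholder for whatever user-agent you want to block, not a real crawler:

```
User-agent: Googlebot
Disallow:

User-agent: BadBot
Disallow: /
```

An empty Disallow line means nothing is blocked for that user-agent, while "Disallow: /" blocks the whole site. Note that robots.txt is advisory: well-behaved crawlers honor it, but malicious bots can simply ignore it.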


crowdfinch

You can restrict web crawlers with the robots.txt file.

macjonshonm

As per my suggestion, you can use robots.txt to stop crawlers from accessing any page of your site. Thank you.



cpaoutsourcing

By the effective use of robots.txt you can control the access of web crawlers. Whenever a bot crawls the website, it checks the robots.txt file before examining the pages. You can allow or disallow a bot for the website with it.


Rattan11

There are numerous reasons why, or when, you should control the access of web robots or web crawlers to your site. As much as you want Googlebot to visit your site, you don't want spam bots to come and collect private information from it. Not to mention that when a robot crawls your site it uses the website's bandwidth too!

Why use 'robots.txt' file?
Googlebot may be crawling your site to provide better search results, but at the same time other spam bots may be collecting personal information such as email addresses for spamming purposes. If you want to control the access of the web crawlers on your site, you can do so by using the "robots.txt" file.

How do I create 'robots.txt' file?
'robots.txt' is a plain text file. Use any text editor to create it, and place it in the root directory of your site so crawlers can find it at /robots.txt.

Examples
The following will stop all robots from crawling your site ('*' means all and '/' is the root directory.)

User-agent: *
Disallow: /

The following will stop all robots from crawling the '/private' directory.

User-agent: *
Disallow: /private
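If you want to check what rules like the ones above actually block before publishing them, Python's standard-library robots.txt parser can evaluate them locally. This is just an illustration of the second example rule set; example.com and the paths are placeholders:

```python
from urllib import robotparser

# The same rules as the second example: block every bot from /private.
rules = [
    "User-agent: *",
    "Disallow: /private",
]

rp = robotparser.RobotFileParser()
rp.parse(rules)

# Anything under /private is disallowed for all user-agents...
print(rp.can_fetch("Googlebot", "https://example.com/private/data.html"))  # False

# ...while the rest of the site stays crawlable.
print(rp.can_fetch("Googlebot", "https://example.com/index.html"))  # True
```

In a real crawler you would call set_url() with the site's /robots.txt address and read() it instead of parsing a hard-coded list.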

ifixservices

As much as you want Googlebot to visit your site, you don't want spam bots to come and collect private information from it. Not to mention that when a robot crawls your site it uses the website's bandwidth too! In this post I have explained how you can control the access of web robots to your site through the use of a simple 'robots.txt' file.

f1dark

As per my suggestion, you can use robots.txt to block crawler access to any page of your site.

Post Merge: 08-13-2022, 07:25:37


By the effective use of robots.txt you can control the access of web crawlers. Whenever a bot crawls the website, it checks the robots.txt file before examining the pages. You can allow or disallow a bot for the website with it.


anilkh7058

By using robots.txt you can control the access of web crawlers.
:)

