How do you control access of the web crawlers?

Author Topic: How do you control access of the web crawlers?  (Read 5588 times)

Offline pixhngeh (Topic starter)

  • Trade Count: (0)
  • Semi-Newbie
  • *
  • Thank You 0
  • Posts: 24
  • Karma: 1
How do you control access of the web crawlers?
« on: 12-29-2011, 02:30:43 »
Hello,
There are numerous reasons why you might want to control the access of web robots, or web crawlers, to your site. As much as you want Googlebot to come to your site, you don’t want spam bots to come and collect private information from it.
How do you control access of the web crawlers?


Offline Hogward

  • Trade Count: (0)
  • Sr. Member
  • ****
  • Thank You 20
  • Posts: 420
  • Karma: 5
Re: How do you control access of the web crawlers?
« Reply #1 on: 12-29-2011, 06:29:55 »
With effective use of robots.txt you can control web crawlers' access. Whenever a well-behaved bot crawls the site, it first checks the robots.txt file to see which pages it may crawl. You can use the file to allow or disallow the bot on the site.
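To illustrate the check described above, here is a minimal sketch of what a polite crawler might do before fetching a page, using Python's standard `urllib.robotparser`. The rules, the bot name `HypotheticalSpamBot`, and the `example.com` URL are assumptions made up for this example. Remember that the check runs on the crawler's side, so it only restrains bots that choose to perform it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: let Googlebot crawl everything, keep all other bots out.
RULES = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/page.html"  # placeholder URL
    for agent in ("Googlebot", "HypotheticalSpamBot"):
        print(agent, may_crawl(RULES, agent, page))
```

A real crawler would typically load the live file with `set_url()` and `read()` instead of passing the text in directly.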

Offline crowdfinch

  • Trade Count: (0)
  • Semi-Newbie
  • *
  • Thank You 3
  • Posts: 40
  • Karma: 1
  • Gender: Female
    • CrowdFinch Technologies
Re: How do you control access of the web crawlers?
« Reply #2 on: 12-31-2011, 01:13:55 »
You can restrict web crawlers with a robots.txt file.

Offline macjonshonm

  • Trade Count: (0)
  • Novice
  • *
  • Thank You 2
  • Posts: 0
  • Karma: 0
Re: How do you control access of the web crawlers?
« Reply #3 on: 01-12-2012, 02:58:07 »
My suggestion is to use robots.txt to keep crawlers from accessing any page of your site. Thank you.



Offline cpaoutsourcing

  • Trade Count: (0)
  • Newbie
  • *
  • Thank You 4
  • Posts: 7
  • Karma: 1
Re: How do you control access of the web crawlers?
« Reply #4 on: 01-19-2012, 22:54:39 »
With effective use of robots.txt you can manage web crawlers' access. Whenever a bot crawls the website, it checks the robots.txt file to decide which pages to examine. You can use the file to allow or disallow the bot on the website.


Offline Rattan11

  • Trade Count: (0)
  • Jr. Member
  • **
  • Thank You 9
  • Posts: 96
  • Karma: 0
  • Gender: Male
    • GOVERNMENT JOBS IN INDIA
Re: How do you control access of the web crawlers?
« Reply #5 on: 03-03-2019, 03:44:50 »
There are numerous reasons why you might want to control the access of web robots, or web crawlers, to your site. As much as you want Googlebot to come to your site, you don’t want spam bots to come and collect private information from it. Not to mention that when a robot crawls your site, it uses the website’s bandwidth too!

Why use ‘robots.txt’ file?
Googlebot may be crawling your site to provide better search results, but at the same time spam bots may be collecting personal information such as email addresses for spamming purposes. If you want to control the access of web crawlers to your site, you can do so with the ‘robots.txt’ file. Keep in mind that compliance is voluntary: well-behaved crawlers honor the file, but a spam bot can simply ignore it.

How do I create ‘robots.txt’ file?
‘robots.txt’ is a plain text file. Use any text editor to create it, and place it in the root directory of your site so that crawlers can find it at ‘/robots.txt’.
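If you would rather generate the file than type it by hand, a short script can do it. The following is only a sketch: the helper names `build_robots_txt` and `write_robots_txt` are invented for this example, and a temporary directory stands in for your site's real root directory.

```python
import tempfile
from pathlib import Path


def build_robots_txt(rules: dict[str, list[str]]) -> str:
    """Render {user agent: [disallowed paths]} as robots.txt text."""
    blocks = []
    for agent, paths in rules.items():
        lines = [f"User-agent: {agent}"]
        lines += [f"Disallow: {path}" for path in paths]
        blocks.append("\n".join(lines))
    # Separate each user-agent group with a blank line.
    return "\n\n".join(blocks) + "\n"


def write_robots_txt(site_root: Path, text: str) -> Path:
    """Write the text to <site_root>/robots.txt and return that path."""
    target = site_root / "robots.txt"
    target.write_text(text, encoding="utf-8")
    return target


if __name__ == "__main__":
    # A temporary directory is used here in place of a real web root.
    with tempfile.TemporaryDirectory() as tmp:
        path = write_robots_txt(Path(tmp), build_robots_txt({"*": ["/private"]}))
        print(path.read_text(encoding="utf-8"))
```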

Examples
The following will stop all robots from crawling your site (‘*’ means all and ‘/’ is the root directory.)

User-agent: *
Disallow: /

The following will stop all robots from crawling any path that begins with ‘/private’, which covers the ‘/private’ directory and everything inside it.

User-agent: *
Disallow: /private
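One way to check rules like the two above before uploading them is to run them through Python's standard `urllib.robotparser`. This is a rough sketch: the test paths and the agent name `AnyBot` are made up for illustration, and other crawlers may interpret edge cases differently from Python's parser.

```python
from urllib.robotparser import RobotFileParser

# The two rule sets from the examples above.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
BLOCK_PRIVATE = "User-agent: *\nDisallow: /private\n"


def allowed(rules: str, path: str, agent: str = "AnyBot") -> bool:
    """Return True if the rules let the (hypothetical) agent fetch path."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # Made-up paths, chosen to show how prefix matching behaves.
    for path in ("/index.html", "/private/report.html", "/private-notes.html"):
        print(path, allowed(BLOCK_ALL, path), allowed(BLOCK_PRIVATE, path))
```

Because the rule is matched as a path prefix, ‘Disallow: /private’ also blocks a page such as ‘/private-notes.html’. Writing ‘Disallow: /private/’ with a trailing slash limits the rule to the directory itself.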

Offline lishmaliny

  • Trade Count: (0)
  • Jr. Member
  • **
  • Thank You 1
  • Posts: 85
  • Karma: 0
Re: How do you control access of the web crawlers?
« Reply #6 on: 10-18-2019, 04:47:50 »
As much as you want Googlebot to come to your site, you don’t want spam bots to come and collect private information from it. Not to mention that when a robot crawls your site, it uses the website’s bandwidth too! You can control the access of web robots to your site with a simple ‘robots.txt’ file.

 
