Nutch

Lucene Nutch
Developer(s): Apache Software Foundation
Stable release: 1.0.0 / March 23, 2009
Written in: Java
Operating system: Cross-platform
Development status: Active
Type: Search engine
License: Apache License 2.0
Website: http://lucene.apache.org/nutch/

Nutch is an effort to build an open source search engine that uses Lucene Java for its search and index components.

Features

Nutch is written entirely in the Java programming language, but it writes its data in language-independent formats.

Nutch has a highly modular architecture allowing developers to create plugins for the following activities: media-type parsing, data retrieval, querying and clustering.
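The plugin idea described above can be illustrated with a minimal sketch for the media-type parsing case. Everything here is an assumption made for illustration: the names ContentParser, ParserRegistry and PluginSketch are hypothetical and are not Nutch's actual plugin API, and the HTML handling is a deliberately naive tag stripper.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of a media-type parsing extension point; not Nutch's real plugin API. */
final class PluginSketch {

    /** Hypothetical extension point: one implementation per media type. */
    interface ContentParser {
        String extractText(String rawContent);
    }

    /** Maps a media type to the parser plugin registered for it. */
    static final class ParserRegistry {
        private final Map<String, ContentParser> parsers = new LinkedHashMap<>();

        void register(String mediaType, ContentParser parser) {
            parsers.put(mediaType, parser);
        }

        Optional<ContentParser> lookup(String mediaType) {
            return Optional.ofNullable(parsers.get(mediaType));
        }
    }

    /** Builds a registry containing two toy plugins. */
    static ParserRegistry defaultRegistry() {
        ParserRegistry registry = new ParserRegistry();
        registry.register("text/plain", raw -> raw.trim());
        // Naive tag stripping for illustration only; real HTML needs a proper parser.
        registry.register("text/html",
                raw -> raw.replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim());
        return registry;
    }

    /** Dispatches to the plugin registered for the media type, or fails if there is none. */
    static String parse(ParserRegistry registry, String mediaType, String raw) {
        return registry.lookup(mediaType)
                .map(parser -> parser.extractText(raw))
                .orElseThrow(() -> new IllegalArgumentException("No parser plugin for " + mediaType));
    }

    public static void main(String[] args) {
        ParserRegistry registry = defaultRegistry();
        // Prints: Hello Nutch
        System.out.println(parse(registry, "text/html", "<p>Hello <b>Nutch</b></p>"));
    }
}
```

The point of the design is that the core only knows the extension-point interface, so support for a new media type can be added by registering another implementation without touching the dispatch code.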

The fetcher ("robot" or "web crawler") has been written from scratch solely for this project.

History

Nutch originated with Doug Cutting (creator of both Lucene and Hadoop) and Mike Cafarella.

In June 2003, a 100-million-page demo system was run successfully. To meet the multi-machine processing needs of the crawl and index tasks, the Nutch project also implemented a MapReduce facility and a distributed file system. These two facilities have since been spun out into their own subproject, called Hadoop.
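To give a feel for why a MapReduce facility suits the index task, here is a small in-memory sketch that builds an inverted index in two phases. It is an illustration only: it runs in a single process, does not use Hadoop's API, and the class and method names (MapReduceSketch, map, reduce, build) are hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Single-process illustration of the map/reduce pattern; not Hadoop's API. */
final class MapReduceSketch {

    /** Map phase: emit one (term, docId) pair per whitespace-separated token. */
    static List<Map.Entry<String, String>> map(String docId, String text) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(new SimpleEntry<>(token, docId));
            }
        }
        return pairs;
    }

    /** Group pairs by term, then reduce each group to a sorted set of document ids. */
    static Map<String, Set<String>> reduce(List<Map.Entry<String, String>> pairs) {
        Map<String, Set<String>> index = new TreeMap<>();
        for (Map.Entry<String, String> pair : pairs) {
            index.computeIfAbsent(pair.getKey(), term -> new TreeSet<>()).add(pair.getValue());
        }
        return index;
    }

    /** Runs the map phase over every document, then a single reduce over all emitted pairs. */
    static Map<String, Set<String>> build(Map<String, String> docs) {
        List<Map.Entry<String, String>> emitted = new ArrayList<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            emitted.addAll(map(doc.getKey(), doc.getValue()));
        }
        return reduce(emitted);
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("doc1", "Nutch uses Lucene");
        docs.put("doc2", "Lucene indexes pages");
        // Prints: {indexes=[doc2], lucene=[doc1, doc2], nutch=[doc1], pages=[doc2], uses=[doc1]}
        System.out.println(build(docs));
    }
}
```

Because each call to map depends only on one document, the map phase can be spread across many machines; in a distributed setting the grouping step is what moves pairs with the same term to the same reducer.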

In June 2005, Nutch graduated from the Apache Incubator and became a subproject of Lucene.

Scalability

IBM Research studied the performance[1] of Nutch/Lucene as part of its Commercial Scale Out (CSO) project[2]. It found that a scale-out system such as Nutch/Lucene could achieve a level of performance on a cluster of blades that was not achievable on any scale-up computer such as the Power5.

Related projects

  • Hadoop - Java framework that supports distributed applications running on large clusters
  • nutchWAX - Uses Nutch to search a web archive
  • Sixearch - An unstructured peer network application, which provides a complementary way for users to actively and collaboratively share their own document collections.

References

  1. Scalability of the Nutch search engine
  2. Base Operating System Provisioning and Bringup for a Commercial Supercomputer
  3. http://creativecommons.org/press-releases/entry/5064
