|Developer(s)||Apache Software Foundation|
|Stable release||1.0.0 / March 23, 2009|
|License||Apache License 2.0|
Nutch is coded entirely in the Java programming language, but its data is written in language-independent formats.
Nutch has a highly modular architecture that lets developers create plugins for media-type parsing, data retrieval, querying and clustering.
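The plugin idea can be illustrated with a minimal sketch for the media-type parsing case. Everything named here (`ContentParser`, `ParserRegistry`, `defaultRegistry`) is hypothetical and chosen for illustration only; it is not Nutch's actual plugin API, and the tag-stripping logic is deliberately naive.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of media-type parser plugins; not Nutch's real plugin API. */
public class ParserPluginSketch {

    /** Assumed extension point: one implementation per media type. */
    interface ContentParser {
        String parse(String rawContent);
    }

    /** Removes anything that looks like a tag; illustrative, not a real HTML parser. */
    static class HtmlParser implements ContentParser {
        @Override
        public String parse(String rawContent) {
            return rawContent.replaceAll("<[^>]*>", "").trim();
        }
    }

    /** Plain text only needs surrounding whitespace removed. */
    static class PlainTextParser implements ContentParser {
        @Override
        public String parse(String rawContent) {
            return rawContent.trim();
        }
    }

    /** Maps a media type to the plugin registered for it. */
    static class ParserRegistry {
        private final Map<String, ContentParser> parsers = new HashMap<>();

        void register(String mediaType, ContentParser parser) {
            parsers.put(mediaType, parser);
        }

        /** Returns the parsed text, or empty if no plugin handles the media type. */
        Optional<String> parse(String mediaType, String rawContent) {
            return Optional.ofNullable(parsers.get(mediaType))
                           .map(p -> p.parse(rawContent));
        }
    }

    /** Builds a registry with the two example plugins installed. */
    static ParserRegistry defaultRegistry() {
        ParserRegistry registry = new ParserRegistry();
        registry.register("text/html", new HtmlParser());
        registry.register("text/plain", new PlainTextParser());
        return registry;
    }

    public static void main(String[] args) {
        ParserRegistry registry = defaultRegistry();
        System.out.println(
            registry.parse("text/html", "<p>Hello, <b>Nutch</b></p>").orElse("(no parser)"));
        System.out.println(
            registry.parse("application/pdf", "binary data").orElse("(no parser)"));
    }
}
```

The point of the design is that supporting a new media type means registering one more `ContentParser`, without touching the code that calls the registry.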
The fetcher ("robot" or "web crawler") has been written from scratch solely for this project.
In June 2003, a 100-million-page demo system was run successfully. To meet the multi-machine processing needs of the crawl and index tasks, the Nutch project has also implemented a MapReduce facility and a distributed file system. These two facilities have been spun out into their own subproject called Hadoop.
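To make the MapReduce idea concrete, here is a small single-process sketch in plain Java that builds an inverted index (term to document IDs) in a map step and a reduce step. It does not use Nutch or Hadoop classes; the class and method names (`MapReduceSketch`, `map`, `reduce`, `buildIndex`) and the sample documents are assumptions made for illustration. A real deployment would distribute both phases across machines.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Single-process illustration of the map and reduce phases; not Hadoop code. */
public class MapReduceSketch {

    /** Map phase: emit one (term, docId) pair for every word in the document. */
    static List<Map.Entry<String, String>> map(String docId, String text) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(term, docId));
            }
        }
        return pairs;
    }

    /** Shuffle and reduce: group pairs by term and keep the sorted, unique docIds. */
    static Map<String, TreeSet<String>> reduce(List<Map.Entry<String, String>> pairs) {
        Map<String, TreeSet<String>> index = new TreeMap<>();
        for (Map.Entry<String, String> pair : pairs) {
            index.computeIfAbsent(pair.getKey(), k -> new TreeSet<>())
                 .add(pair.getValue());
        }
        return index;
    }

    /** Runs the map phase over every document, then reduces all emitted pairs. */
    static Map<String, TreeSet<String>> buildIndex(Map<String, String> docs) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            pairs.addAll(map(doc.getKey(), doc.getValue()));
        }
        return reduce(pairs);
    }

    public static void main(String[] args) {
        Map<String, String> docs = new TreeMap<>();
        docs.put("doc1", "Nutch crawls the web");
        docs.put("doc2", "Hadoop grew out of Nutch");
        // Print each term with the documents that contain it.
        buildIndex(docs).forEach((term, ids) -> System.out.println(term + " -> " + ids));
    }
}
```

Because each `map` call looks at one document only and each term is reduced independently, both phases can in principle be spread over many machines, which is the property the crawl and index tasks rely on.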
IBM Research studied the performance of Nutch/Lucene as part of its Commercial Scale Out (CSO) project. It found that a scale-out system such as Nutch/Lucene could reach a performance level on a cluster of blades that was not achievable on any scale-up computer such as the Power5.
Search engines built with Nutch