Google requires large computational resources to provide its services. This article describes the technological infrastructure behind Google's websites, as presented in the company's public announcements.
When a client attempts to connect to Google, DNS servers resolve www.google.com to multiple IP addresses, which acts as a first level of load balancing by directing clients to different Google clusters. (When a domain name resolves to multiple IP addresses, clients typically use the first address for communication; DNS servers typically rotate the order of the addresses they return using a round-robin policy.) Each Google cluster has thousands of servers, and once a client connects to a cluster, hardware in the cluster performs further load balancing to send each query to the least-loaded web server. This makes Google one of the biggest and most complex content delivery networks.
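The first level of load balancing described above can be illustrated with a small simulation. This is only a sketch: the `RoundRobinDNS` class, the `pick_cluster` helper and the addresses (taken from the 192.0.2.0/24 documentation range) are hypothetical and do not describe Google's actual DNS setup.

```python
from collections import deque


class RoundRobinDNS:
    """Toy name server that rotates its address list on every query."""

    def __init__(self, addresses):
        self._addresses = deque(addresses)

    def resolve(self):
        # Return the current ordering, then rotate so the next query
        # sees a different address in first position.
        answer = list(self._addresses)
        self._addresses.rotate(-1)
        return answer


def pick_cluster(dns):
    # Typical client behaviour: use the first address in the answer.
    return dns.resolve()[0]


# Hypothetical cluster addresses (documentation range, not real ones).
dns = RoundRobinDNS(["192.0.2.1", "192.0.2.2", "192.0.2.3"])
choices = [pick_cluster(dns) for _ in range(6)]
print(choices)
```

Because each answer starts with a different address, six consecutive clients are spread evenly over the three hypothetical clusters.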
Racks are custom-made and contain 40 to 80 servers (20 to 40 1U servers on either side), while new servers are 2U rackmount systems. Each rack has a switch. Servers are connected to the local switch via a 100 Mbit/s Ethernet link. Each rack switch is connected to a core gigabit switch using one or two gigabit uplinks.
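The figures above imply that a rack's uplinks carry less bandwidth than its servers can generate in aggregate. The following sketch works out that ratio; the function name `oversubscription` is illustrative, and the calculation assumes every server could saturate its link at the same time, which is a worst case rather than a measured load.

```python
SERVER_LINK_MBIT = 100   # per-server Ethernet link, from the text
UPLINK_MBIT = 1000       # one gigabit uplink, from the text


def oversubscription(servers, uplinks):
    """Ratio of aggregate server bandwidth to aggregate uplink bandwidth."""
    return (servers * SERVER_LINK_MBIT) / (uplinks * UPLINK_MBIT)


for servers in (40, 80):
    for uplinks in (1, 2):
        ratio = oversubscription(servers, uplinks)
        print(f"{servers} servers, {uplinks} uplink(s): {ratio:.0f}:1")
```

Under these assumptions the ratio ranges from 2:1 (40 servers, two uplinks) to 8:1 (80 servers, one uplink).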
Since queries are composed of words, an inverted index of documents is required. Such an index maps each query word to the list of documents that contain it. The index is very large because of the number of documents stored on the servers.
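A minimal inverted index can be sketched in a few lines. The three-document corpus and the function name `build_inverted_index` are made up for illustration, and whitespace tokenization is an assumption; a production index would also handle punctuation, stemming and compression.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each word to the sorted list of docids whose text contains it."""
    index = defaultdict(set)
    for docid, text in documents.items():
        for word in text.lower().split():
            index[word].add(docid)
    return {word: sorted(docids) for word, docids in index.items()}


# Hypothetical three-document corpus keyed by docid.
documents = {
    1: "google cluster architecture",
    2: "web search for a planet",
    3: "google web search",
}
index = build_inverted_index(documents)
print(index["google"])  # [1, 3]
print(index["search"])  # [2, 3]
```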
- Google load balancers take the client request and forward it to one of the Google Web Servers via Squid proxy servers.
- Squid proxy servers take the client request from the load balancers and return the result, if present in local cache; otherwise, they forward the request to a Google Web Server.
- Google web servers coordinate the execution of queries sent by users, then format the result into an HTML page. The execution consists of sending queries to index servers, merging the results, computing their rank, retrieving a summary for each hit (using the document server), asking for suggestions from the spelling servers, and finally getting a list of advertisements from the ad server.
- Data-gathering servers are permanently dedicated to spidering the Web. Google's web crawler is known as GoogleBot. These servers update the index and document databases and apply Google's algorithms to assign ranks to pages.
- Each index server contains a set of index shards. They return a list of document IDs ("docids"), such that the document corresponding to each docid contains the query word. These servers need less disk space, but bear the greatest CPU workload.
- Document servers store documents. Each document is stored on dozens of document servers. When performing a search, a document server returns a summary for the document based on query words. They can also fetch the complete document when asked. These servers need more disk space.
- Ad servers manage advertisements offered by services like AdWords and AdSense.
- Spelling servers make suggestions about the spelling of queries.
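The division of labour among the server types listed above can be sketched as follows. This is a simplified illustration, not Google's implementation: all function names are hypothetical, each shard is modelled as a plain dictionary from word to docids, sorting by docid stands in for ranking, and the cache, spelling and ad servers are left out.

```python
def search_shard(shard, words):
    """Index-server role: docids in one shard that contain every query word."""
    postings = [set(shard.get(word, ())) for word in words]
    return set.intersection(*postings) if postings else set()


def snippet(doc_text, words, width=40):
    """Document-server role: return text around the first matching word."""
    lowered = doc_text.lower()
    for word in words:
        pos = lowered.find(word)
        if pos != -1:
            return doc_text[max(0, pos - width // 2): pos + width // 2]
    return doc_text[:width]


def handle_query(query, shards, doc_store):
    """Web-server role: fan out to shards, merge hits, fetch summaries."""
    words = query.lower().split()
    hits = set()
    for shard in shards:
        hits |= search_shard(shard, words)
    # Sorting by docid is a placeholder for real ranking.
    return [(docid, snippet(doc_store[docid], words)) for docid in sorted(hits)]


# Hypothetical data: two index shards and a small document store.
shards = [
    {"google": [1], "cluster": [1]},
    {"google": [3], "web": [2, 3], "search": [2, 3]},
]
doc_store = {
    1: "Google cluster architecture",
    2: "Web search for a planet",
    3: "Google web search",
}
results = handle_query("google search", shards, doc_store)
print(results)
```

Only document 3 contains both query words, so it is the single hit returned together with its summary.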
Server hardware and software
- Sun Ultra II with dual 200 MHz processors, and 256 MB of RAM. This was the main machine for the original Backrub system.
- 2 × 300 MHz dual Pentium II servers donated by Intel, which included 512 MB of RAM and 9 × 9 GB hard drives between the two. The main search ran on these.
- F50 IBM RS/6000 donated by IBM, included 4 processors, 512 MB of memory and 8 × 9 GB hard drives.
- Two additional boxes included 3 × 9 GB hard drives and 6 × 4 GB hard drives respectively (the original storage for Backrub). These were attached to the Sun Ultra II.
- IBM disk expansion box with another 8 × 9 GB hard drives donated by IBM.
- Homemade disk box which contained 10 × 9 GB SCSI hard drives.
Servers are commodity-class x86 PCs running customized versions of Linux. The goal is to purchase CPU generations that offer the best performance per dollar, not absolute performance. Estimates of the power required for over 450,000 servers range upwards of 20 megawatts, which costs on the order of US$2 million per month in electricity charges. The combined processing power of these servers might range from 20 to 100 petaflops.
- Upwards of 15,000 servers ranging from 533 MHz Intel Celeron to dual 1.4 GHz Intel Pentium III (as of 2003). A 2005 estimate by Paul Strassmann has 200,000 servers, while unspecified sources claimed this number to be upwards of 450,000 in 2006.
- One or more 80 GB hard disks per server (2003)
- 2–4 GB of memory per machine (2004)
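The power and cost estimates above can be cross-checked with simple arithmetic. The sketch below uses only the figures quoted in the text plus one assumption, a 30-day month; the resulting per-server wattage and electricity price are implied values, not reported ones, and since the power estimate is a lower bound the per-server figure is a floor.

```python
SERVERS = 450_000              # server count estimate from the text
POWER_MW = 20                  # lower bound of the power estimate
MONTHLY_COST_USD = 2_000_000   # electricity cost estimate from the text
HOURS_PER_MONTH = 24 * 30      # assumption: a 30-day month

watts_per_server = POWER_MW * 1_000_000 / SERVERS
energy_kwh = POWER_MW * 1_000 * HOURS_PER_MONTH
implied_price = MONTHLY_COST_USD / energy_kwh

print(f"{watts_per_server:.1f} W per server")
print(f"{energy_kwh:,} kWh per month")
print(f"${implied_price:.3f} per kWh")
```

With these inputs the estimates correspond to about 44 W per server, 14.4 million kWh per month, and roughly US$0.14 per kWh.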
The exact size and whereabouts of the data centers Google uses are unknown, and official figures remain intentionally vague. In a 2000 estimate, Google's server farm consisted of 6,000 processors and 12,000 common IDE disks (two disks and one processor per machine) at four sites: two in Silicon Valley, California and two in Virginia. Each site had an OC-48 (2488 Mbit/s) internet connection and an OC-12 (622 Mbit/s) connection to other Google sites. The connections were eventually routed down to 4 × 1 Gbit/s lines connecting up to 64 racks, each rack holding 80 machines and two Ethernet switches. The servers run custom server software called Google Web Server.
Hardware details considered sensitive
In a 2008 book, the reporter Randall Stross wrote: "Google's executives have gone to extraordinary lengths to keep the company's hardware hidden from view. The facilities are not open to tours, not even to members of the press." He wrote this based on interviews with staff members and his experience of visiting the company.
Google has numerous data centers scattered around the world. At least 12 significant Google data center installations are located in the United States. The largest known centers are located in The Dalles, Oregon; Atlanta, Georgia; Reston, Virginia; Lenoir, North Carolina; and Goose Creek, South Carolina. In Europe, the largest known centers are in Eemshaven and Groningen in the Netherlands and Mons, Belgium.
One of the largest Google data centers is located in the town of The Dalles, Oregon, on the Columbia River, approximately 80 miles from Portland. Codenamed "Project 02", the new complex is approximately the size of two football fields, with cooling towers four stories high. The site was chosen to take advantage of inexpensive hydroelectric power, and to tap into the region's large surplus of fiber optic cable, a remnant of the dot-com boom. A blueprint of the site has appeared in print.
In February 2009, Stora Enso announced that it had sold the Summa paper mill in Hamina, Finland to Google for 40 million euros. Google plans to invest 200 million euros in the site to build a data center.
Most of the software stack that Google uses on its servers was developed in-house. It is believed that C++, Java and Python are favored over other programming languages. Google has acknowledged that Python has played an important role from the beginning, and that it continues to do so as the system grows and evolves.
The software that runs the Google infrastructure includes:
- Google Web Server
- Google File System
- Chubby lock service
- MapReduce and Sawzall programming language
- Protocol buffers
Most operations are read-only. When an update is required, queries are redirected to other servers, which simplifies consistency issues. Queries are divided into sub-queries, which may be sent to different servers in parallel, thus reducing latency.
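The latency benefit of parallel sub-queries can be demonstrated with a small timing experiment. This is a sketch under stated assumptions: `query_shard` is a hypothetical stand-in that sleeps for 50 ms to simulate one server's response time, and a thread pool plays the role of the parallel fan-out.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def query_shard(shard_id, delay=0.05):
    """Stand-in for one sub-query; the sleep simulates server latency."""
    time.sleep(delay)
    return [f"shard{shard_id}-hit"]


def run_sequential(shard_ids):
    results = []
    for shard_id in shard_ids:
        results.extend(query_shard(shard_id))
    return results


def run_parallel(shard_ids):
    # One worker per sub-query; map() yields results in input order.
    with ThreadPoolExecutor(max_workers=len(shard_ids)) as pool:
        results = []
        for partial in pool.map(query_shard, shard_ids):
            results.extend(partial)
    return results


shard_ids = [0, 1, 2, 3]

start = time.perf_counter()
sequential = run_sequential(shard_ids)
sequential_s = time.perf_counter() - start

start = time.perf_counter()
parallel = run_parallel(shard_ids)
parallel_s = time.perf_counter() - start

print(f"sequential: {sequential_s:.2f}s, parallel: {parallel_s:.2f}s")
```

Both approaches return the same merged results, but the parallel version takes roughly as long as the slowest sub-query instead of the sum of all of them.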
To lessen the effects of unavoidable hardware failure, software is designed to be fault tolerant. Thus, when a system goes down, data is still available on other servers, which increases reliability.
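One common way to achieve this kind of availability is to keep copies of the data on several servers and fall back to another copy when one is unreachable. The sketch below illustrates that idea only; the `Replica` class, the `read_with_failover` helper and the server names are hypothetical, and the text does not specify which failover mechanism Google actually uses.

```python
class ReplicaUnavailable(Exception):
    """Raised when a replica (or every replica) cannot serve a read."""


class Replica:
    def __init__(self, name, data, up=True):
        self.name = name
        self.data = data
        self.up = up

    def read(self, key):
        if not self.up:
            raise ReplicaUnavailable(self.name)
        return self.data[key]


def read_with_failover(replicas, key):
    """Try each replica in turn; return the first successful read."""
    for replica in replicas:
        try:
            return replica.read(key), replica.name
        except ReplicaUnavailable:
            continue
    raise ReplicaUnavailable("all replicas are down")


# Hypothetical cluster: the first replica has failed.
data = {"doc42": "cached page"}
replicas = [
    Replica("server-a", data, up=False),
    Replica("server-b", data),
    Replica("server-c", data),
]
value, served_by = read_with_failover(replicas, "doc42")
print(value, served_by)  # cached page server-b
```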
- Fiach Reid (2004). "Case Study: The Google search engine". Network Programming in .NET. Digital Press. pp. 251–253. ISBN 1555583156.
- Web Search for a Planet: The Google Cluster Architecture (Luiz André Barroso, Jeffrey Dean, Urs Hölzle)
- Chandler Evans (2008). "Google Platform". Future of Google Earth. Madison Publishing Company. p. 299. ISBN 1419689037.
- Chris Sherman (2005). "How Google Works". Google Power. McGraw-Hill Professional. pp. 10–11. ISBN 0072257873.
- Michael Miller (2007). "How Google Works". Googlepedia. Pearson Technology Group. pp. 17–18. ISBN 078973639X.
- "Google Stanford Hardware." Stanford University (provided by Internet Archive). Retrieved on July 10, 2006.
- Tawfik Jelassi and Albrecht Enders (2004). "Case study 16: Google". Strategies for E-business. Pearson Education. p. 424. ISBN 0273688405.
- Google Surpasses Supercomputer Community, Unnoticed?, May 20, 2008.
- Strassmann, Paul A. "A Model for the Systems Architecture of the Future." December 5, 2005. Retrieved on March 18, 2008.
- Carr, David F. "How Google Works." Baseline Magazine. July 6, 2006. Retrieved on July 10, 2006.
- Hennessy, John; Patterson, David (2002), Computer Architecture: A Quantitative Approach (Third ed.), Morgan Kaufmann, ISBN 1558605967.
- Randall Stross (2008). Planet Google. New York: Free Press. p. 61. ISBN 1-4165-4691-X.
- Rich Miller (March 27, 2008). "Google Data Center FAQ". Data Center Knowledge. http://www.datacenterknowledge.com/archives/2008/03/27/google-data-center-faq/. Retrieved 2009-03-15.
- Markoff, John; Hansell, Saul. "Hiding in Plain Sight, Google Seeks More Power." New York Times. June 14, 2006. Retrieved on October 15, 2008.
- Strand, Ginger. "Google Data Center." Harper's Magazine. March 2008. Retrieved on October 15, 2008.
- "Stora Enso divests Summa Mill premises in Finland for EUR 40 million". Stora Enso. 2009-02-12. http://www.storaenso.com/media-centre/press-releases/2009/02/Pages/stora-enso-divests-summa-mill.aspx. Retrieved 2009-02-12.
- "Stooora yllätys: Google ostaa Summan tehtaan" (in Finnish). Kauppalehti (Helsinki). 2009-02-12. http://www.kauppalehti.fi/5/i/talous/uutiset/etusivu/uutinen.jsp?oid=2009/02/18987. Retrieved 2009-02-12.
- "Google investoi 200 miljoonaa euroa Haminaan" (in Finnish). Taloussanomat (Helsinki). 2009-02-04. http://www.taloussanomat.fi/talous/2009/03/04/google-investoi-200-miljoonaa-euroa-haminaan/20095951/133. Retrieved 2009-03-15.
- Mark Levene (2005). An Introduction to Search Engines and Web Navigation. Pearson Education. p. 73. ISBN 0321306775.
- http://www.artima.com/weblogs/viewpost.jsp?thread=143947
- http://python.org/about/quotes/
- http://highscalability.com/google-architecture
- L.A. Barroso, J. Dean, and U. Hölzle (March/April 2003). "Web search for a planet: The Google cluster architecture" (PDF). IEEE Micro 23: 22–28. doi:10.1109/MM.2003.1196112. http://dcagency.netfirms.com./m2022.pdf.
- Shankland, Stephen. "Google uncloaks once-secret server." CNET News. April 2, 2009.
- Google Research Publications
- The Google Linux Cluster — Video about Google's Linux cluster
- Underneath the Covers at Google: Current Systems and Future Directions (talk given by Jeff Dean at the Google I/O conference in May 2008)
- Original Google Hardware Pictures