We use software known as “web crawlers” to discover publicly available web pages. The best-known crawler is called “Googlebot.” Crawlers look at web pages and follow links on those pages, much as you would if you were browsing content on the web. They go from link to link and bring data about those web pages back to Google’s servers.
The crawl process begins with a list of web addresses from past crawls and sitemaps provided by website owners. As our crawlers visit these websites, they look for links to other pages to visit. The software pays special attention to new sites, changes to existing sites, and dead links.
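To make the link-following idea concrete, here is a minimal sketch of a breadth-first crawler in Python. It is purely illustrative: the function names, the page limit, and the bare-bones fetching are assumptions chosen for brevity, not a description of how Googlebot actually works.

```python
# A minimal sketch of a breadth-first web crawler (illustrative only;
# Googlebot's real implementation is far more sophisticated).
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed_urls, max_pages=50):
    """Visit pages starting from seed URLs, following links breadth-first."""
    queue = deque(seed_urls)   # seeded from past crawls / sitemap URLs
    seen = set(seed_urls)
    pages = {}                 # url -> raw HTML brought back to the server

    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", "replace")
        except OSError:
            continue           # dead link: skip (a real crawler would record it)
        pages[url] = html

        # Follow the links found on this page, just as a reader would.
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)
            if absolute.startswith("http") and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages
```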
The web is like an ever-growing public library with billions of books and no central filing system. Google essentially gathers the pages during the crawl process and then creates an index, so we know exactly how to look things up. Much like the index in the back of a book, the Google index includes information about words and their locations. When you search, at the most basic level, our algorithms look up your search terms in the index to find the appropriate pages.
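The back-of-book analogy maps naturally onto what is usually called an inverted index: a structure that records, for every word, which pages contain it and at which positions. The sketch below is a toy version built on that assumption; the function names and layout are invented for the example, and the real Google index stores far richer information.

```python
# A toy inverted index: for each word, record which pages it appears on
# and at which positions -- loosely analogous to a book's back-of-book index.
import re
from collections import defaultdict


def build_index(pages):
    """pages: dict of url -> page text. Returns word -> {url: [positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for url, text in pages.items():
        words = re.findall(r"[a-z0-9]+", text.lower())
        for position, word in enumerate(words):
            index[word][url].append(position)
    return index


def lookup(index, query):
    """Return the set of URLs that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = set(index.get(terms[0], {}))
    for term in terms[1:]:
        results &= set(index.get(term, {}))
    return results
```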
The search process gets much more complex from there. When you search for “dogs” you don’t want a page with the word “dogs” on it hundreds of times. You probably want pictures, videos or a list of breeds. Google’s indexing systems note many different aspects of pages, such as when they were published, whether they contain pictures and videos, and much more. With the Knowledge Graph, we’re continuing to go beyond keyword matching to better understand the people, places and things you care about.
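To show why raw keyword counts alone aren’t enough, here is a purely hypothetical page record and scoring function. The fields, weights, and cap are invented for illustration and do not describe Google’s actual ranking systems; they only demonstrate the idea of combining keyword matches with other page signals such as freshness and media.

```python
# Hypothetical illustration of indexing more than keywords: each page record
# carries metadata, and a toy scorer combines term frequency with those
# signals. The fields and weights are invented; this is NOT Google's ranking.
from dataclasses import dataclass
from datetime import date


@dataclass
class PageRecord:
    url: str
    term_counts: dict          # word -> number of occurrences on the page
    published: date
    has_images: bool = False
    has_videos: bool = False


def score(record: PageRecord, query_terms: list, today: date) -> float:
    # Raw keyword matching: count how often the query terms appear...
    keyword_score = sum(record.term_counts.get(t, 0) for t in query_terms)
    # ...but cap it, so repeating "dogs" hundreds of times is not rewarded.
    keyword_score = min(keyword_score, 10)

    # Newer pages and pages with pictures or videos get a small boost.
    freshness = 1.0 / (1 + (today - record.published).days / 365)
    media_bonus = 2.0 if (record.has_images or record.has_videos) else 0.0
    return keyword_score + freshness + media_bonus
```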