The addition of view-based KML search in Google Earth a few days ago (and soon Maps) was typically understated, though the implications for our online search habits are huge. I sent Google Earth CTO Michael Jones some questions about the thinking behind Google’s place-search technology, and he was happy to answer them:
- Ogle Earth: What makes Google Search so good is that the most relevant hits crowd at the top of the results. Do you apply a ranking algorithm to KML files?
Michael Jones: Yes, the basis of place searching (Earth Search) is a textual, spatial, and contextual ranking of results with the intent that the first set of results returned contains the information that a user is searching for. Ranking in this context is different from that in former efforts. Unlike web searching, cross-links and popularity estimates may not be the key notion; unlike local searching, the professional and casual reviews are not so helpful; and, unique to exploratory Earth browsing, where you look is a big part of understanding what you are looking for: a search for ‘railroad’ when all of France is in view clearly asks a different question from the same text when the user’s view is zoomed to show part of the viaduct in Morlaix.
One aspect of our web-of-places ranking technique leverages ideas from Google Local Search. Other components are unique to the place-ranking systems we developed to select the GeoWeb layer’s “Golden I” placemarks from the much larger set of Google Earth Community placemarks. Finally, like cherished recipes, there are a few secret ingredients as well.
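To make the "where you look is part of the question" idea concrete, here is a minimal sketch of view-dependent ranking. It is not Google's algorithm, whose details were not shared. Everything in it is an assumption made for illustration: the `Placemark` and `View` types, the word-overlap `text_score`, the centre-weighted `spatial_score`, the choice to multiply the two, and the approximate coordinates of the sample placemarks.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Placemark:  # hypothetical stand-in for a KML placemark
    name: str
    text: str
    lat: float
    lon: float


@dataclass(frozen=True)
class View:  # hypothetical bounding box of what the user currently sees
    south: float
    west: float
    north: float
    east: float


def text_score(query: str, pm: Placemark) -> float:
    """Fraction of query terms that appear as whole words in the placemark."""
    terms = query.lower().split()
    if not terms:
        return 0.0
    words = set((pm.name + " " + pm.text).lower().split())
    return sum(term in words for term in terms) / len(terms)


def spatial_score(view: View, pm: Placemark) -> float:
    """0 outside the view; inside it, from 1.0 at the centre down to 0.5 at a corner."""
    inside = (view.south <= pm.lat <= view.north
              and view.west <= pm.lon <= view.east)
    if not inside:
        return 0.0
    centre_lat = (view.south + view.north) / 2
    centre_lon = (view.west + view.east) / 2
    half_diag = math.hypot(view.north - view.south, view.east - view.west) / 2
    if half_diag == 0:
        return 1.0
    # Crude distance in degrees; good enough for an illustration.
    dist = math.hypot(pm.lat - centre_lat, pm.lon - centre_lon)
    return 1.0 - 0.5 * dist / half_diag


def rank(query: str, view: View, placemarks: list[Placemark]) -> list[Placemark]:
    """Order placemarks by text match times view relevance, dropping zero scores."""
    scored = [(text_score(query, pm) * spatial_score(view, pm), pm)
              for pm in placemarks]
    scored = [(score, pm) for score, pm in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [pm for _, pm in scored]


# Sample data with approximate coordinates (assumed for the example).
PLACEMARKS = [
    Placemark("Morlaix viaduct", "railroad viaduct over the town", 48.5776, -3.8290),
    Placemark("Gare de Lyon", "railroad station in Paris", 48.8443, 2.3743),
    Placemark("Cité du Train", "railroad museum in Mulhouse", 47.7500, 7.2940),
]
FRANCE = View(south=41.0, west=-5.5, north=51.5, east=10.0)
MORLAIX = View(south=48.570, west=-3.840, north=48.585, east=-3.820)

france_results = rank("railroad", FRANCE, PLACEMARKS)
morlaix_results = rank("railroad", MORLAIX, PLACEMARKS)
```

With all of France in view, the same query returns every matching placemark, ordered by how central each one is to the view. Zoomed in on Morlaix, it returns only the viaduct. A real system would presumably combine many more signals; the point here is only that the view changes the answer even though the query text does not.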
- OE: In the absence of a lot of direct links to KML files, do you use the pagerank of the enclosing URL as a proxy? Do you measure clicks on search results inside Google Earth for feedback?
MJ: The Google pagerank value is a proven indicator of search relevance. Using this value (and related values) is an important component of our placerank system. However, the search contexts are so different that the choice of which KML file to show is really a new search result optimization question.
For example, government GIS data — such as tax values for each property in a country — may make their way to searchability. When they do, your search for the tax records for your house may well be the ONLY Earth-search for that particular record ever performed. In this case, it would be true that the “popularity” of that particular KML is zero before you ask for it and only slightly better afterwards, but still it is the absolute authority on the query in question. Even though others would have also trusted their own tax searches, none of them would be likely to have searched for the same item as you and overall, probably only a small percentage of the possible results at a tax site would be searched. Having this site and these results automatically acquire great prominence is an example of the unique aspects of ranking for Earth-based visual searching.
- OE: What if people start wrapping spam inside KML? What if people start using KML files to cybersquat on the competition — literally? :-)
MJ: First, remember that it is not the goal to prevent spam and worthless content. The lesson of Larry & Sergey’s work in web search is not that Google prevented bad web pages, but that users are consistently shown good pages with relevant content on the first page of search results. The same is true of other filters. Bookstores don’t prevent bad books from being written but they do specialize in having good books in stock. Likewise, we’re primarily concerned about delivering high-quality Earth-search results irrespective of the existence of low-quality results in terms of ranking or spam content.
Despite that, spam-like content will happen and since it may get through our defenses, we must work to fight it. The evolution of content in email and the web of pages will be replayed here in the web of places. It is logical that remedies for these problems in the web of pages will help us fight the good fight in this new context.