Flat World Strategies: Google and Search Wikia, Search Technology Explained [23:10]

Published: Jan. 7, 2007, 5:42 p.m.

Intro: Right before the 2006 holidays, Jimmy Wales, creator of the online encyclopedia Wikipedia, announced the Search Wikia project. The project will rely on search results shaped by the future site's community of users. In this podcast we take a look at popular search engine technologies and discuss the Search Wikia project concept.

Question: I know this project was really just announced. Before we get into the technology involved, can you tell us what phase the project is in?
According to the BBC, Jimmy Wales is currently recruiting people to work for the company and buying hardware to get the site up and running.

Question: What makes this concept fundamentally different from what Google or Yahoo! are doing?
When Wales announced the project, he came right out and said it was needed because the existing search systems for the net were "broken". They were broken, he said, because they lacked freedom, community, accountability and transparency.

Question: This sounds a lot like digg - am I on the right track?
Yes, you are - what you end up with is a digg-like application, or what Wales is calling a "people-powered" search site.

Question: Can you provide a bit more detail on how Google works?
Googlebot is Google's web crawling robot. Googlebot finds pages in two ways: through an add URL form, www.google.com/addurl.html, and through finding links by crawling the web.

Source: www.google.com
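The link-following half of that description can be sketched as a breadth-first traversal. This is only a toy illustration and not Googlebot's actual code: the `TOY_WEB` dictionary stands in for the real network, the seed list plays the role of URLs submitted through the add URL form, and the names `LinkExtractor` and `crawl` are made up for this example.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory "web": URL -> HTML. A real crawler would fetch over HTTP.
TOY_WEB = {
    "http://a.example/": '<a href="http://b.example/">B</a> <a href="http://c.example/">C</a>',
    "http://b.example/": '<a href="http://c.example/">C</a>',
    "http://c.example/": '<a href="http://a.example/">A</a>',
    "http://d.example/": "No page links here, so only a submitted URL can reach it.",
}


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed_urls, web=TOY_WEB):
    """Breadth-first crawl; returns URLs in the order they were visited."""
    frontier = deque(seed_urls)
    seen = set(seed_urls)
    visited = []
    while frontier:
        url = frontier.popleft()
        html = web.get(url)
        if html is None:
            # Page is not in the toy web: skip it.
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                # Never queue the same URL twice.
                seen.add(link)
                frontier.append(link)
    return visited
```

Starting from `http://a.example/` alone, the crawl reaches pages a, b and c by following links. Page d has no inbound links, so it is only visited when it is included in the seed list, which is the gap a URL submission form fills.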

Question: That's Googlebot. How does the indexer work?
Googlebot gives the indexer the full text of the pages it finds. These pages are stored in Google's index database. This index is sorted alphabetically by search term, with each index entry storing a list of documents in which the term appears and the location within the text where it occurs. This data structure allows rapid access to documents that contain user query terms.

Source: www.google.com
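The structure described in that answer is commonly called an inverted index. Here is a minimal sketch of one, assuming a simple lowercase word tokenizer and three made-up documents. The function names `build_index` and `search` are illustrative only, and nothing here reflects how Google actually stores its index.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Map each term to {doc_id: [word positions where the term occurs]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, term in enumerate(re.findall(r"[a-z0-9]+", text.lower())):
            index[term].setdefault(doc_id, []).append(position)
    # Sort by term, mirroring the alphabetical ordering described above.
    return dict(sorted(index.items()))


def search(index, query):
    """Return the ids of documents that contain every query term."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    if not terms:
        return set()
    postings = [set(index.get(term, {})) for term in terms]
    return set.intersection(*postings)


# Made-up sample documents for illustration.
docs = {
    "doc1": "Wikipedia is a free online encyclopedia",
    "doc2": "Search Wikia is a community driven search project",
    "doc3": "Lucene is a free search library",
}
index = build_index(docs)
```

With this data, `index["search"]` holds the two positions of the word in doc2 and its single position in doc3, and `search(index, "free search")` only has to intersect two short lists rather than scan every document.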

Question: So now that everything is indexed, can you describe the search query?
The query processor has several parts, including the user interface (search box), the "engine" that evaluates queries and matches them to relevant documents, and the results formatter.

PageRank is Google's system for ranking web pages. A page with a higher PageRank is deemed more important and is more likely to be listed above a page with a lower PageRank.

Source: www.google.com
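To give a feel for how a link-based score like this can be computed, here is the simplified textbook form of PageRank, calculated by repeated passes over a tiny link graph. Treat it as a sketch under stated assumptions: the damping factor of 0.85, the fixed iteration count, the handling of pages with no outgoing links, and the four-page graph are all choices made for this example, and Google's production ranking combines many more signals.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each pass with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank


# Made-up link graph: a, b and d all link to c, and c links back to a.
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]})
```

In this graph page c ends up with the highest score because three pages link to it, and page a scores well above b and d because its single inbound link comes from the highly ranked c.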

Question: Can you run us through, step by step, a Google search query?
Sure - this is also from Google's site. Here are the steps in a typical query process:

1. The user accesses a Google server at google.com and makes a query.
2. The web server sends the query to the index servers. The content inside the index servers is similar to the index in the back of a book; it tells which pages contain the words that match any particular query term.
3. The query travels to the doc servers, which actually retrieve the stored documents. Snippets are generated to describe each search result.
4. The search results are returned to the user in a fraction of a second.

Source: www.google.com
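The flow of those four steps can be imitated in a few lines. In this sketch two plain dictionaries stand in for the index servers and the doc servers; `INDEX_SERVER`, `DOC_SERVER`, `make_snippet` and `handle_query` are names invented for the example, the three documents are made up, and the real system is spread across many machines.

```python
import re

# Hypothetical stand-in for the doc servers: document id -> stored text.
DOC_SERVER = {
    1: "Googlebot crawls the web and hands full page text to the indexer.",
    2: "The indexer stores each term with the documents that contain it.",
    3: "Doc servers retrieve stored documents and generate snippets.",
}


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


# Hypothetical stand-in for the index servers: term -> set of document ids,
# like the index at the back of a book.
INDEX_SERVER = {}
for doc_id, text in DOC_SERVER.items():
    for term in tokenize(text):
        INDEX_SERVER.setdefault(term, set()).add(doc_id)


def make_snippet(text, term, width=30):
    """Return a short window of text around the first occurrence of term."""
    pos = text.lower().find(term)
    start = max(0, pos - width)
    end = min(len(text), pos + len(term) + width)
    return text[start:end].strip()


def handle_query(query):
    """Steps 2 to 4: look up terms, fetch documents, build snippets, return."""
    terms = tokenize(query)
    if not terms:
        return []
    # Step 2: ask the index which documents contain every query term.
    matches = set.intersection(*(INDEX_SERVER.get(t, set()) for t in terms))
    # Step 3: fetch each matching document and generate a snippet for it.
    results = []
    for doc_id in sorted(matches):
        results.append((doc_id, make_snippet(DOC_SERVER[doc_id], terms[0])))
    # Step 4: hand the formatted results back to the caller.
    return results
```

Calling `handle_query("documents")` returns documents 2 and 3, each paired with a short snippet built around the matched word. Note that results come back in document id order here; a real engine would order them by relevance.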

Question: OK, so now we know how Google and Yahoo! work. How will this new Search Wikia type of search engine work?
I can give some details based on what I've looked at. As we've said, the Search Wikia project will not rely on computer algorithms to determine how relevant webpages are to keywords. Instead, the results generated by the search engine will be decided and edited by the users.
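No design details for Search Wikia have been published at this point, so the following is purely a hypothetical illustration of what "decided by the users" could look like in a digg-style system: each user can vote a result up or down for a given query, and candidates are ordered by net votes. The class name `CommunityRanker` and everything inside it are assumptions made for this sketch, not the project's actual design.

```python
from collections import defaultdict


class CommunityRanker:
    """Hypothetical digg-style ranking: users vote results up or down per query."""

    def __init__(self):
        # query -> url -> {user: +1 or -1}; each user gets one vote per result.
        self._votes = defaultdict(lambda: defaultdict(dict))

    def vote(self, user, query, url, up=True):
        """Record a vote; voting again replaces the user's earlier vote."""
        self._votes[query.lower()][url][user] = 1 if up else -1

    def score(self, query, url):
        """Net votes (up minus down) for this result under this query."""
        return sum(self._votes[query.lower()][url].values())

    def rank(self, query, candidate_urls):
        """Order candidates by net votes; ties keep their original order."""
        return sorted(candidate_urls, key=lambda url: -self.score(query, url))


# Made-up usage: three users vote on results for the query "jaguar".
ranker = CommunityRanker()
ranker.vote("ann", "jaguar", "http://cars.example/")
ranker.vote("bob", "jaguar", "http://cats.example/")
ranker.vote("cho", "jaguar", "http://cats.example/")
ranker.vote("ann", "jaguar", "http://spam.example/", up=False)
ordered = ranker.rank(
    "jaguar",
    ["http://spam.example/", "http://cars.example/", "http://cats.example/"],
)
```

Here the cats page rises to the top with two votes and the down-voted page sinks to the bottom. A real community system would also need to deal with vote manipulation, which this sketch ignores.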

There are a couple of projects called Nutch and Lucene, along with some others, that can now provide the background infrastructure needed to build a new kind of search engine, one that relies on human intelligence to do what algorithms cannot. Let's take a quick look at these projects.

Lucene: Lucene is a free and open source information retrieval API, originally implemented in Java by Doug Cutting. It is supported by the Apache Software Foundation and is released under the Apache Software License.

We mentioned Nutch earlier. Nutch is a project to develop an open source search engine. It is supported by the Apache Software Foundation and has been a subproject of Lucene since 2005.

With Search Wikia, Jimmy Wales hopes to build on Lucene and Nutch by adding the social component. What we'll end up with is more intelligent, socially based search tools. Now, don't think Google, Yahoo!, Microsoft and all the rest are not working on these kinds of technologies. It will be interesting to watch how these new technologies and methods are implemented.

Sources: http://search.wikia.com
http://search.wikia.com/wiki/Nutch
http://lucene.apache.org/java/docs/
http://wikipedia.org/

References:

Wikipedia creator turns to search: http://news.bbc.co.uk/2/hi/technology/6216619.stm
How Google Works: http://www.googleguide.com/google_works.html
Search Wikia website: http://search.wikia.com
Search Wikia Nutch website: http://search.wikia.com/wiki/Nutch
Lucene website: http://lucene.apache.org/java/docs/
Wikipedia website: http://wikipedia.org/