Niche Search [20:52]

Published: Aug. 6, 2007, 10:36 p.m.

Intro: You may think Google and Yahoo have a lock on search, but it may be time to start thinking a little differently. In this podcast we take a look at some niche search sites.

Mike: Gordon, we love Google products and services - is there a problem?

It may be that Google does too good a job! Have you ever tried a Google search on a person's name? A simple Google search on my first and last name gives over 1.9 million results!

Today, three companies control almost 90% of online search:

- over 50% of all searches are done using Google
- over 25% on Yahoo
- and over 13% using Microsoft

There are some problems though - these search engines primarily rank results based on the number of sites linking to a page and the prominence of search terms on the page. Because they work this way, there is room for niche search.
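
To make that concrete, here is a toy sketch of a score that combines the two signals just mentioned: how many sites link to a page and how prominent the search terms are on it. The function name `toy_score`, the formula and the sample pages are all made up for illustration; this is not how Google, Yahoo or Microsoft actually rank results.

```python
import math

def toy_score(query, page_text, inbound_links):
    """Toy relevance score: link popularity times search-term prominence.

    Hypothetical formula for illustration only, not any real engine's ranking.
    """
    words = page_text.lower().split()
    terms = set(query.lower().split())
    if not words:
        return 0.0
    # Prominence: fraction of the page's words that are query terms.
    prominence = sum(1 for w in words if w in terms) / len(words)
    # Popularity: dampened count of sites linking to the page.
    popularity = math.log1p(inbound_links)
    return prominence * popularity

# Made-up pages: (page text, number of inbound links).
pages = {
    "fan-site": ("michael jackson king of pop fan news", 5000),
    "local-club": ("michael jackson coaches the town chess club", 3),
}
ranked = sorted(
    pages,
    key=lambda name: toy_score("michael jackson", *pages[name]),
    reverse=True,
)
print(ranked)
```

Both pages mention the name equally often, so the heavily linked page wins. That is roughly why a search on an ordinary person's name gets buried under results about a famous namesake.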

Mike: With this kind of lock on search it would be almost impossible for a startup to launch a successful general search product - right?

Yes - it would be almost impossible, but we are seeing some activity in the niche areas. Areas like travel and finance are niches that have already been filled, but today there seems to be some room in the people search area.

Mike: Are there companies in this market we should be looking at?

One of the startups to watch is Spock at www.spock.com. Spock is scheduled for their public launch the first week of August. Among other places on the web, Spock scans social networking websites like Facebook and LinkedIn. Search results give summary information (age, address, etc.) about the person along with a list of website links that refer to the person.

According to Spock, 30% of the 7 billion searches done on the web every month are related to individuals. Spock says about half of those searches concern celebrities, with the other half including business and personal lookups. According to Spock, a common problem that we face is that there are many people with the same name. Given that, how do we distinguish a document about Michael Jackson the singer from Michael Jackson the football player?

With billions of documents and people on the web, we need to identify and cluster web documents accurately to the people they are related to. Mapping these named entities from documents to the correct person is what Spock is all about, and they're coming at the problem in an interesting way.

Mike: I've looked at Spock - what is the Spock Challenge?

They've launched what they call the Spock Challenge, more formally referred to as the SPOCK Entity Resolution Problem, linked here: http://challenge.spock.com/pages/learn_more

If you go to the site you can download a couple of data sets - one called a training set (approx. 25,000 documents) and the other called a test set (approx. 75,000 documents).

Along with the document sets they include a set of target names. You assume that each document contains only one of the target names (even though most documents contain many names). The challenge is to partition all the documents relevant to a target name by their referent, that is, by the actual person each document is about.

Mike: When does the contest begin and end?

It already began on 4/16/07 and will end on 11/16/07. On 11/16/07, Spock will run the final round of the competition and announce the winner.
Here are the dates off the website:

- 4/16: Registration started
- 5/1 - 8/15: Proposal submissions accepted
- 7/1: Leader board live
- 11/1: Finalists announced
- 11/16: Final round at Spock, winner announced

Mike: What languages and tools can be used?

You can use any language and any non-commercial libraries, tools and data to develop the solution. There is one catch - the winner grants Spock a non-exclusive right to use the software and data. As an FYI, much of Google is actually written in Python, with the search engine core written in C++. Python provides scripting support for the search engine, and some apps like Google Code are done in Python.

Mike: Can you give us an example of how this works?

From their website: Consider the following two documents with the target name "Michael Jackson":

Michael Jackson - The King of Pop or Wacko Jacko?

Michael Jackson statistics - pro-football-reference.com

The referents of these articles are the pop star and football player, respectively. They've also included the ground truth for the training set so you have something to compare against.
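
Here is a minimal sketch of what "partitioning by referent" could look like in Python. It groups documents that share enough words once the target name itself is removed. The word-overlap (Jaccard) approach, the function names and the 0.2 threshold are assumptions for illustration, not Spock's method, and the last two documents are made up to give each referent a second document.

```python
def tokens(text, target_name):
    """Lowercased words of a document, minus the target name itself."""
    name_words = set(target_name.lower().split())
    words = {w.strip(".,?!-") for w in text.lower().split()}
    return {w for w in words if w and w not in name_words}

def jaccard(a, b):
    """Share of words two sets have in common (0.0 to 1.0)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def partition_by_referent(docs, target_name, threshold=0.2):
    """Greedy clustering: join the most similar cluster or start a new one."""
    clusters = []  # each cluster: {"words": set of words, "docs": [indices]}
    for i, text in enumerate(docs):
        words = tokens(text, target_name)
        best = max(clusters, key=lambda c: jaccard(words, c["words"]), default=None)
        if best is not None and jaccard(words, best["words"]) >= threshold:
            best["docs"].append(i)
            best["words"] |= words
        else:
            clusters.append({"words": set(words), "docs": [i]})
    return [c["docs"] for c in clusters]

docs = [
    "Michael Jackson - The King of Pop or Wacko Jacko?",
    "Michael Jackson statistics - pro-football-reference.com",
    "Michael Jackson, King of Pop, announces new album",  # made-up document
    "Michael Jackson football statistics",                # made-up document
]
groups = partition_by_referent(docs, "Michael Jackson")
print(groups)  # one list of document indices per guessed person
```

A real entry would need far better features than raw word overlap, but the shape of the problem is the same: documents in, one group per person out.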

Once you're done training, you can run your algorithm on the test set and submit your results on this site. Spock will provide instant feedback in the form of a percentage rank score. This way you can see how you stack up against the other teams.

So they provide you with a lot of well-constructed data, and the ground truth about that data. "Ground truth" data is the real, known-correct results, and you use this information to validate your search algorithm's results.

This data is documents about people, and the challenge is to determine all the unique people described in the data set. This data can be your training set. Once you have got your basic algorithm working against the training set, they let you further tune your code by running it against a second test data set and give you instant accuracy feedback in the form of a score. The score depends on how many correct unique people you can identify in the data. This way you can continue to refine your work, and see how you are doing, and how well others are doing.
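
The notes don't spell out Spock's exact scoring formula, so the sketch below uses one common way to grade a clustering against ground truth: a pairwise F1 score. It counts the document pairs your algorithm put together and checks how many the ground truth also puts together. The function names and the sample clusterings are hypothetical.

```python
from itertools import combinations

def same_cluster_pairs(clusters):
    """All unordered document pairs that a clustering puts together."""
    pairs = set()
    for cluster in clusters:
        pairs.update(combinations(sorted(cluster), 2))
    return pairs

def pairwise_f1(predicted, truth):
    """Pairwise F1 between two clusterings (an assumed metric, not Spock's)."""
    pred_pairs = same_cluster_pairs(predicted)
    true_pairs = same_cluster_pairs(truth)
    correct = len(pred_pairs & true_pairs)
    if correct == 0:
        # No shared pairs (or no pairs at all): score it as zero.
        return 0.0
    precision = correct / len(pred_pairs)  # how many of our pairings were right
    recall = correct / len(true_pairs)     # how many true pairings we found
    return 2 * precision * recall / (precision + recall)

# Made-up example: five documents, two real people.
truth = [[0, 2, 4], [1, 3]]
predicted = [[0, 2], [1, 3, 4]]  # document 4 was assigned to the wrong person
score = pairwise_f1(predicted, truth)
print(score)
```

Running something like this against the training set's ground truth gives you a number to watch while you tune, before you submit against the test set.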

This looks like a great academic challenge. At the end of the contest time, you submit your code, a 3-page description of your approach, pre-built binary executables that can run in isolation on Spock servers, and your results (the "Software Entry"). Spock will select the finalists based upon submissions, and fly the finalists to visit the judges. The winner will win $50,000; 2nd place wins $5,000 and 3rd place wins $2,000.

Mike: How do people enter?

You may enter the Contest by registering online at www.spock.com/contestregistration. You may register as an individual or as a team. During the registration process, you must provide your name, your age, your email address, and the country you are from. If you are entering on behalf of an organization, a school or a company, you must identify its name. If you are registering as a team, you must provide the same information for each member of your team as well as the identity of a team leader. You will also provide a name for your team or for yourself by which you or your team will be known to other participants in the Contest. Spock may change the name if it feels the name you select is not appropriate for any reason.

Mike: What are the differences between the Spock Challenge and the Netflix Challenge?

From the Netflix website: The Netflix Prize (http://www.netflixprize.com) seeks to substantially improve the accuracy of predictions about how much someone is going to love a movie based on their movie preferences. Improve it enough and you win one (or more) Prizes. Winning the Netflix Prize improves Netflix's ability to connect people to the movies they love.

Netflix provides you with a lot of anonymous rating data, and a prediction accuracy bar that is 10% better than what Cinematch can do on the same training data set. (Accuracy is a measurement of how closely predicted ratings of movies match subsequent actual ratings.) If you develop a system that Netflix judges beats that bar on the qualifying test set they provide, you get serious money and the bragging rights. But (and you knew there would be a catch, right?) only if you share your method with Netflix and describe to the world how you did it and why it works.

In addition to the Grand Prize, we're also offering a $50,000 Progress Prize each year the contest runs. It goes to the team whose system we judge shows the most improvement over the previous year's best accuracy bar on the same qualifying test set. No improvement, no prize. And like the Grand Prize, to win you'll need to share your method with us and describe it for the world.

The Netflix contest started October 2, 2006 and continues through at least October 2, 2011.
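
For the curious, the Netflix Prize measures that accuracy as root mean squared error (RMSE) between predicted and actual ratings, where lower is better. Here is a small sketch of checking a 10% improvement. The ratings and both sets of predictions are made up, and `baseline` merely stands in for Cinematch.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty rating lists of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(actual))

# Made-up data: actual star ratings and two sets of predictions.
actual = [4, 3, 5, 1]
baseline = [3.5, 3.5, 3.5, 3.5]   # stand-in for the existing system
candidate = [4.0, 3.0, 4.0, 2.0]  # stand-in for a contest entry

# Fractional reduction in error relative to the baseline.
improvement = 1 - rmse(candidate, actual) / rmse(baseline, actual)
beats_bar = improvement >= 0.10
print(round(improvement, 3), beats_bar)
```

The real bar is computed on Netflix's own qualifying test set, so a toy check like this only shows the arithmetic behind "10% better".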

So, back to your question: the Netflix Challenge will run another 4 years, while the Spock Challenge has every intention of giving out the grand prize to a team with a reasonable solution at the end of the 6 months. The Netflix Challenge sets an absolute standard for winning the grand prize; the Spock Challenge intends to award it to the best reasonable solution.

Mike: How about some other companies?

Wink - www.wink.com: Similar to Spock, launched a few months ago. They claim that Wink People Search now searches over two hundred million people profiles. It searches for people across numerous social networks including MySpace, LinkedIn, Friendster, Bebo, Live Spaces, Yahoo! 360, Xanga, Twitter and more. Also included in the results are Web sources such as Wikipedia and IMDB, with more coming all the time.

Zoominfo - www.zoominfo.com: Specializes in executive searches. They claim 37,131,140 people and 3,518,329 companies indexed. You can currently search on three categories - people, jobs and companies.

Search Wikia - http://search.wikia.com: Jimmy Wales's open-source search protocol and human collaboration project. From the press release:

"Last week Wikia acquired Grub, the original visionary distributed search project, from LookSmart and released it under an open source license for the first time in four years. Grub operates under a model of users donating their personal computing resources towards a common goal, and is available today for download and testing at: http://www.grub.org/.

Grub, now open source, is designed with modularity so that developers can quickly and easily extend and add functionality, improving the quality and performance of the entire system. By combining Grub, which is building a massive, distributed user-contributed processing network, with the power of a wiki to form social consensus, the open source Search Wikia project has taken the next major step towards a future where search is open and transparent."
