Intro:
You may think Google and Yahoo have a lock on search, but it may be time to start thinking a little differently. In this podcast we take a look at some niche search sites.
Mike: Gordon, we love Google products and services. Is there a problem?
It may be that Google does too good a job! Have you ever tried searching Google for a person's name? A simple Google search on my first and last name gives over 1.9 million results!
Today, three companies control almost 90% of online search:
- over 50% of all searches are done using Google
- over 25% using Yahoo
- and over 13% using Microsoft
There are some problems, though: these search engines primarily rank results by the number of sites linking to a page and the prominence of the search terms on that page. Because they work this way, there is room for niche players.
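To make that ranking idea concrete, here is a toy Python sketch that scores each page by its inbound links multiplied by how prominent the query words are on it. The scoring formula, the functions rank_pages and term_prominence, and the example URLs are all hypothetical illustrations, not how Google, Yahoo or Microsoft actually rank; real engines combine far more signals.

```python
import re
from collections import Counter


def term_prominence(text, query_terms):
    """Share of a page's words that are query terms (a crude 'prominence' measure)."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    counts = Counter(words)
    return sum(counts[t] for t in query_terms) / len(words)


def rank_pages(pages, links, query):
    """Order page URLs by (1 + inbound links) * query-term prominence.

    Hypothetical toy scoring. pages: dict of url -> page text;
    links: list of (from_url, to_url) pairs.
    """
    query_terms = query.lower().split()
    inbound = Counter(to_url for _, to_url in links)
    scores = {
        url: (1 + inbound[url]) * term_prominence(text, query_terms)
        for url, text in pages.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical pages: both mention "Michael Jackson", but they are about different people.
pages = {
    "pop.example/mj": "Michael Jackson pop star Michael Jackson album",
    "nfl.example/mj": "Michael Jackson football statistics",
}
links = [
    ("a.example", "pop.example/mj"),
    ("b.example", "pop.example/mj"),
    ("c.example", "pop.example/mj"),
    ("d.example", "nfl.example/mj"),
]
print(rank_pages(pages, links, "Michael Jackson"))  # ['pop.example/mj', 'nfl.example/mj']
```

The better-linked pop-star page wins, and nothing in the score asks which Michael Jackson the searcher meant. That gap is exactly what people-search sites are trying to fill.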
Mike: With this kind of lock on search, it would be almost impossible for a startup to launch a successful general search product, right?
Yes, it would be almost impossible, but we are seeing some activity in niche areas. Travel and finance are niches that have already been filled, but today there seems to be some room in people search.
Mike: Are there companies in this market we should be looking at?
One of the startups to watch is Spock at www.spock.com. Spock is scheduled to launch publicly in the first week of August. Among other places on the web, Spock scans social networking websites like Facebook and LinkedIn. Search results give summary information about the person (age, address, etc.) along with a list of links to websites that refer to the person.
According to Spock, 30% of the 7 billion searches done on the web every month, roughly 2.1 billion, are related to individuals. Spock says about half of those searches concern celebrities, with the other half covering business and personal lookups. According to Spock, a common problem is that many people share the same name. Given that, how do we distinguish a document about Michael Jackson the singer from one about Michael Jackson the football player?
With billions of documents and people on the web, we need to accurately identify web documents and cluster them by the people they relate to. Mapping the named entities in documents to the correct person is what Spock is all about, and they’re coming at the problem in an interesting way.
Mike: I've looked at Spock. What is the Spock Challenge?
They’ve launched what they call the Spock Challenge, more formally referred to as the SPOCK Entity Resolution Problem, linked here: http://challenge.spock.com/pages/learn_more
If you go to the site, you can download two data sets: a training set (approx. 25,000 documents) and a test set (approx. 75,000 documents).
Along with the document sets, they include a set of target names. You assume that each document contains only one of the target names (even though most documents contain many names). The challenge is to partition all the documents relevant to a target name by their referent, that is, by the actual person each document is about.
For example, consider two documents with these titles:
- Michael Jackson - The King of Pop or Wacko Jacko?
- Michael Jackson statistics - pro-football-reference.com
The referents of these articles are the pop star and the football player, respectively. They’ve also included the ground truth for the training set, so you have something to compare against.
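One simple way to attack this partitioning problem is to compare the other words each document uses around the shared name. The Python sketch below is a hypothetical baseline, not Spock's method or any contestant's: it assumes a small stop-word list, measures word overlap with Jaccard similarity, and greedily assigns each document to the most similar existing cluster, starting a new cluster (a new person) when the overlap falls below an assumed threshold of 0.2.

```python
import re

# Assumed list of filler words to ignore; a real system would use a fuller list.
STOP_WORDS = {"the", "a", "an", "of", "and", "or", "for", "in", "on", "to", "is"}


def context_terms(text, target_name):
    """Words in a document other than the target name and stop words."""
    name_tokens = set(target_name.lower().split())
    words = re.findall(r"[a-z]+", text.lower())
    return {w for w in words if w not in STOP_WORDS and w not in name_tokens}


def jaccard(a, b):
    """Overlap of two word sets: |a & b| / |a | b| (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_referent(docs, target_name, threshold=0.2):
    """Greedy single pass: a document joins the most similar existing cluster
    if the similarity reaches the threshold, otherwise it starts a new cluster
    (i.e. it is treated as a new person). Returns a list of sets of doc ids."""
    clusters = []  # each entry: (set of doc ids, merged context terms)
    for doc_id, text in docs.items():
        terms = context_terms(text, target_name)
        best, best_score = None, 0.0
        for i, (_, cluster_terms) in enumerate(clusters):
            score = jaccard(terms, cluster_terms)
            if score > best_score:
                best, best_score = i, score
        if best is not None and best_score >= threshold:
            ids, cluster_terms = clusters[best]
            ids.add(doc_id)
            cluster_terms |= terms  # grow the cluster's vocabulary in place
        else:
            clusters.append(({doc_id}, set(terms)))
    return [ids for ids, _ in clusters]


# Hypothetical mini data set in the spirit of the Michael Jackson example.
docs = {
    "d1": "Michael Jackson the King of Pop released a new album and tour",
    "d2": "Michael Jackson pop singer album Thriller tour dates",
    "d3": "Michael Jackson football statistics receiving yards season",
    "d4": "Michael Jackson football player career receiving touchdowns",
}
print(cluster_by_referent(docs, "Michael Jackson"))
```

On this toy data the sketch groups d1 with d2 (the singer) and d3 with d4 (the football player). A single pass like this depends on document order and on the threshold, which is exactly the kind of thing the training set's ground truth lets you tune.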
Once you're done training, you can run your algorithm on the test set and submit your results on the challenge site. Spock will provide instant feedback in the form of a percentage rank score, so you can see how you stack up against the other teams.
So they provide you with a lot of well-constructed data, and the ground truth about that data. “Ground truth” data is the real results, and you use this information to validate your search algorithm's results.
The data consists of documents about people, and the challenge is to determine all the unique people described in the data set. This is your training set. Once your basic algorithm works against the training set, you can tune your code further by running it against a second, test data set, and Spock gives you instant accuracy feedback in the form of a score. The score depends on how many unique people you identify correctly. This way you can keep refining your work and see how you, and everyone else, are doing.
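The podcast doesn't spell out Spock's scoring formula, so the Python sketch below uses a common, assumed stand-in for entity-resolution work: pairwise precision, recall and F1. It compares every pair of documents your clustering says refer to the same person against the pairs the ground truth puts together. The function names are hypothetical.

```python
from itertools import combinations


def same_person_pairs(clusters):
    """Every unordered pair of doc ids that a clustering puts in the same group."""
    pairs = set()
    for cluster in clusters:
        pairs.update(combinations(sorted(cluster), 2))
    return pairs


def pairwise_scores(predicted, truth):
    """Pairwise precision, recall and F1 of a predicted clustering vs. ground truth."""
    pred_pairs = same_person_pairs(predicted)
    true_pairs = same_person_pairs(truth)
    correct = len(pred_pairs & true_pairs)
    precision = correct / len(pred_pairs) if pred_pairs else 0.0
    recall = correct / len(true_pairs) if true_pairs else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


truth = [{"d1", "d2"}, {"d3", "d4"}]
predicted = [{"d1", "d2", "d3"}, {"d4"}]  # d3 wrongly merged with the singer
print(pairwise_scores(predicted, truth))
```

Here the prediction makes three same-person claims, of which only (d1, d2) is right, so precision is 1/3, recall is 1/2 and F1 is 0.4. Merging too eagerly hurts precision, while splitting one person into many clusters hurts recall.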
This looks like a great academic challenge. At the end of the contest, you submit your code, a 3-page description of your approach, pre-built binary executables that can run in isolation on Spock servers, and your results (the “Software Entry”). Spock will select the finalists based on the submissions and fly the finalists out to meet the judges. The winner will receive $50,000, 2nd place wins $5,000 and 3rd place wins $2,000.
You may enter the Contest by registering online at www.spock.com/contestregistration. You may register as an individual or as a team. During the registration process, you must provide your name, your age, your email address, and the country you are from. If you are entering on behalf of an organization, a school or a company, you must identify its name. If you are registering as a team, you must provide the same information for each member of your team as well as the identity of a team leader. You will also provide a name for your team or for yourself by which you or your team will be known to other participants in the Contest. Spock may change the name if it feels the name you select is not appropriate for any reason.
Mike: How about some other companies?
Wink (www.wink.com): Similar to Spock, launched a few months ago. Wink claims that Wink People Search now searches over two hundred million people profiles. It searches for people across numerous social networks, including MySpace, LinkedIn, Friendster, Bebo, Live Spaces, Yahoo!360, Xanga, Twitter and more. The results also include Web sources such as Wikipedia and IMDB, with more coming all the time.
Zoominfo (www.zoominfo.com): Specializes in executive searches. Claims 37,131,140 people and 3,518,329 companies indexed. You can currently search in three categories: people, jobs and companies.
Searchwikia (http://search.wikia.com): Jimmy Wales's open-source search protocol and human collaboration project. From the press release:
"Last week Wikia acquired Grub, the original visionary distributed search project, from LookSmart and released it under an open source license for the first time in four years. Grub operates under a model of users donating their personal computing resources towards a common goal, and is available today for download and testing at: http://www.grub.org/."