Friday, January 30, 2009

Taking a DeepDyve into the Deep Web

If the latest figures from Internet research companies can be believed, Google serves the needs of most web searchers most of the time. But serious researchers know that the web boasts nooks and crannies that even Google’s spiders can’t reach. DeepDyve thinks it can help solve this problem.

These secret caverns, nicknamed the “deep web,” often hide their content behind subscription paywalls. Mainly of interest to academics and researchers in specialized fields (such as law or medicine), these databases of information are expensive to build and expensive to maintain. Fortunately, their main clientele includes those with the money to pay for access: lawyers, large libraries, medical schools, major universities and the like.

That’s not quite so helpful for the individual, independent researcher. There are parts of the deep web that don’t require a subscription, but a lone researcher may face other obstacles. Many important scientific papers don’t receive a lot of links, regardless of their scholarly citations. This stymies the search engines’ usual approach to finding documents. So what can you do when you know the information is out there, but you just can’t get to it by the usual means?

This is where DeepDyve comes in. Two bioinformatics scientists who worked on the Human Genome Project founded the company as Infovell in 2005. Their genetics background shows in the algorithm DeepDyve uses. Called “KeyPhrases,” it indexes passages up to 20 keywords in length rather than single keywords. Indeed, rather than focusing on keywords, KeyPhrases matches patterns and symbols. DeepDyve CEO William Park told Wired that the algorithm “is really doing pattern matching; it’s not at all language dependent. In fact it’s actually language agnostic.”
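To get a feel for what indexing passages instead of single keywords might look like, here is a minimal Python sketch. It is not DeepDyve's KeyPhrases algorithm, which is proprietary and not described in any detail here; it is a plain word n-gram index built on assumptions of my own. The function names (`phrases`, `build_index`, `search`), the whitespace tokenizer, the sample documents and the "longest shared phrase wins" scoring are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def phrases(text, max_len=20):
    """Yield every run of 1..max_len consecutive tokens in text.

    Tokens are just whitespace-separated strings, so nothing here
    depends on the language the text is written in.
    """
    tokens = text.lower().split()
    for start in range(len(tokens)):
        for length in range(1, max_len + 1):
            if start + length > len(tokens):
                break
            yield " ".join(tokens[start:start + length])


def build_index(docs, max_len=20):
    """Map each phrase to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for phrase in phrases(text, max_len):
            index[phrase].add(doc_id)
    return index


def search(index, query, max_len=20):
    """Rank documents by the longest query phrase they share (illustrative scoring)."""
    scores = {}
    for phrase in phrases(query, max_len):
        n_tokens = len(phrase.split())
        for doc_id in index.get(phrase, ()):
            scores[doc_id] = max(scores.get(doc_id, 0), n_tokens)
    # Longest shared phrase first; ties broken by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical sample documents, invented for this sketch.
docs = {
    "a": "mutations in the BRCA1 gene raise breast cancer risk",
    "b": "the gene pool of island birds",
    "c": "breast cancer screening guidelines",
}
index = build_index(docs, max_len=5)
results = search(index, "BRCA1 gene raise breast cancer", max_len=5)
print(results)
```

A single-keyword index would treat all three documents as hits for "gene" or "cancer" and then lean on link counts to rank them. Here the document that shares the longest run of words with the query rises to the top without any links at all. The obvious cost is index size, since every document contributes many overlapping phrases.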

This reflects a genetics background in at least two ways. First, think of the length of the human genome; it’s huge, roughly three billion letters long, and its protein-coding stretches are read in three-letter “words,” the codons that specify the amino acids chained together into the proteins that keep our bodies functioning. DeepDyve’s KeyPhrases algorithm uses indexing techniques from the field of genomics; if they work on the human genome, what can they do for the Internet?