Scale-Free Networks and The Deep Web
Theoretically, that is what makes the Web so different from a typical index system: you can follow hyperlinks from one page to another. In the "small world" theory of the Web, every Web page is thought to be separated from any other Web page by an average of about 19 clicks. In the late 1960s, sociologist Stanley Milgram introduced small-world theory for social networks by observing that any one person was separated from any other person by only about six degrees of separation.
On the Web, the small-world theory was supported by early research on a small sample of Web sites. But research conducted jointly by scientists at IBM, Compaq, and AltaVista found something entirely different. These scientists used a Web crawler to identify 200 million Web pages and follow 1.5 billion links on those pages.
The researchers discovered that the Web was not like a spider web at all, but rather like a bow tie. The bow-tie Web had a "strongly connected component" (SCC) made up of about 56 million Web pages. On the right side of the bow tie was a set of 44 million OUT pages that you could reach from the center, but from which you could not return to the center. OUT pages tended to be corporate intranet and other Web site pages that are designed to keep you on the site once you land. On the left side of the bow tie was a set of 44 million IN pages from which you could reach the center, but which you could not reach from the center. These were recently created pages that had not yet been linked to by many center pages. In addition, 43 million pages were classified as "tendrils": pages that did not link to the center and could not be reached from the center. However, the tendril pages were sometimes linked to IN and/or OUT pages. Occasionally, tendrils linked to one another without passing through the center (these are called "tubes"). Finally, there were 16 million pages totally disconnected from everything.
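
The bow-tie regions described above can be recovered from a link graph with two reachability searches: one following links forward from a core page, and one following them backward. The sketch below is only an illustration under stated assumptions, not the method the researchers used. The page names in `toy_web` are hypothetical, and the helper names (`reachable`, `reverse`, `bow_tie`) are invented for this example. It also lumps tendrils, tubes, and disconnected pages into a single OTHER group.

```python
from collections import deque

def reachable(graph, start):
    """Return the set of pages reachable from start by following links (BFS)."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in graph.get(page, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

def reverse(graph):
    """Return a copy of the graph with every link direction flipped."""
    flipped = {page: [] for page in graph}
    for source, targets in graph.items():
        for target in targets:
            flipped.setdefault(target, []).append(source)
    return flipped

def bow_tie(graph, core_page):
    """Classify pages relative to the strongly connected component of core_page."""
    pages = set(graph) | {t for targets in graph.values() for t in targets}
    forward = reachable(graph, core_page)             # pages the core can reach
    backward = reachable(reverse(graph), core_page)   # pages that can reach the core
    scc = forward & backward
    return {
        "SCC": scc,
        "OUT": forward - scc,
        "IN": backward - scc,
        "OTHER": pages - forward - backward,  # tendrils, tubes, disconnected pages
    }

# Hypothetical six-page web: each key links to the pages in its list.
toy_web = {
    "new_blog": ["portal"],          # links into the core, nothing links back
    "portal": ["news"],
    "news": ["portal", "intranet"],
    "intranet": [],                  # reachable from the core, no way back
    "fan_page": ["intranet"],        # tendril hanging off an OUT page
    "island": [],                    # disconnected from everything
}

parts = bow_tie(toy_web, "portal")
for region, members in parts.items():
    print(region, sorted(members))
```

On this toy graph, `portal` and `news` form the SCC, `new_blog` is an IN page, `intranet` is an OUT page, and `fan_page` and `island` fall into OTHER. Separating tendrils from fully disconnected pages would take an additional pass that checks for links to or from the IN and OUT sets.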
Further evidence for the non-random and structured nature of the Web is provided by research performed by Albert-László Barabási at the University of Notre Dame. Barabási's team found that, far from being a random, exponentially exploding network of 50 billion Web pages, activity on the Web was actually highly concentrated in "very-connected supernodes" that provided the connectivity to less well-connected nodes. Barabási dubbed this type of network a "scale-free" network and found parallels in the growth of cancers, disease transmission, and computer viruses. As it turns out, scale-free networks are highly vulnerable to destruction: destroy their supernodes and the transmission of messages breaks down rapidly. On the upside, if you are a marketer trying to "spread the message" about your products, place your products on one of the supernodes and watch the news spread. Or build supernodes and attract a huge audience.
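
The vulnerability of supernodes can be seen on a very small example. The sketch below is a toy illustration, not Barabási's model: it assumes a hypothetical network of two hubs (`hub_a`, `hub_b`) with five leaf pages each, treats links as undirected, and measures the size of the largest connected group of pages after removing a leaf versus removing a hub. The function name `largest_component` is invented for this example.

```python
from collections import deque

def largest_component(edges, removed=frozenset()):
    """Size of the largest connected group of nodes once `removed` nodes are deleted."""
    nodes = {n for edge in edges for n in edge} - set(removed)
    neighbors = {n: set() for n in nodes}
    for a, b in edges:
        if a in nodes and b in nodes:
            neighbors[a].add(b)
            neighbors[b].add(a)

    best, seen = 0, set()
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:                       # BFS over one connected group
            node = queue.popleft()
            size += 1
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        best = max(best, size)
    return best

# Hypothetical network: two supernodes linked to each other, five leaves each.
edges = [("hub_a", "hub_b")]
edges += [("hub_a", f"a{i}") for i in range(5)]
edges += [("hub_b", f"b{i}") for i in range(5)]

intact = largest_component(edges)
without_leaf = largest_component(edges, removed={"a0"})
without_hub = largest_component(edges, removed={"hub_a"})
without_both_hubs = largest_component(edges, removed={"hub_a", "hub_b"})
print(intact, without_leaf, without_hub, without_both_hubs)
```

Removing a single leaf shrinks the connected group from 12 nodes to 11, while removing one hub cuts it to 6 and removing both hubs leaves every remaining page isolated. Real scale-free networks are far larger and less regular, but the same asymmetry between losing an ordinary node and losing a supernode is what the paragraph above describes.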
Thus the picture of the Web that emerges from this research is quite different from earlier reports. The notion that most pairs of Web pages are separated by a handful of links, almost always fewer than 20, and that the number of connections would grow exponentially with the size of the Web, is not supported. In fact, there is a 75% chance that there is no path from one randomly chosen page to another. With this knowledge, it becomes clear why even the most advanced Web search engines index only a very small percentage of all Web pages, and only about 2% of the overall population of Internet hosts (about 400 million). Search engines cannot find most Web sites because their pages are not well connected or linked to the central core of the Web.
Another important finding is the identification of a "deep Web" composed of over 900 billion Web pages that are not easily accessible to the Web crawlers that most search engine companies use. Instead, these pages are either proprietary (not available to crawlers and non-subscribers), like the pages of the Wall Street Journal, or are not easily reachable from other Web pages. In the last few years, newer search engines (such as the medical search engine Mammahealth) and older ones such as Yahoo have been revised to search the deep Web. Because e-commerce revenues depend in part on customers being able to find a Web site using search engines, Web site managers need to take steps to ensure their Web pages are part of the connected central core, or "supernodes," of the Web.
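
A rough way to see where a figure near 75% could come from is to use the bow-tie page counts quoted earlier. The back-of-the-envelope sketch below assumes that a path from one page to another exists only when it passes through the core, that is, when the starting page is in IN or the SCC and the destination is in the SCC or OUT. This simplification is an assumption made for illustration: it ignores short paths that stay inside IN or OUT or run through tendrils and tubes, and the text does not say the researchers computed the figure this way.

```python
# Page counts in millions, as given in the bow-tie description.
scc, in_pages, out_pages, tendrils, disconnected = 56, 44, 44, 43, 16
total = scc + in_pages + out_pages + tendrils + disconnected

# Assumption: a path exists only if the source can reach the core
# (IN or SCC) and the destination is reachable from it (SCC or OUT).
p_source_reaches_core = (in_pages + scc) / total
p_core_reaches_target = (scc + out_pages) / total

p_path = p_source_reaches_core * p_core_reaches_target
p_no_path = 1 - p_path
print(f"path: {p_path:.1%}, no path: {p_no_path:.1%}")
```

Under these assumptions roughly a quarter of random page pairs are connected, which is in line with the 75% chance of no path mentioned above.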