1) Search and Mining as a Disruptive Technology
Search, combined with the increasing availability of online text, is already a disruptive technology: it allows direct access to information that had previously been organized by libraries, bookstores, consultants and other experts, and news media. For many users, keyword search and page-ranking algorithms have displaced the Dewey Decimal system, book categories, consultant reports, newspaper sections, and catalog categories as a way to zero in quickly on where to find relevant information, and electronic access has replaced visits to physical institutions. Search is being embedded into knowledge management and enterprise applications and is beginning to replace traditional enterprise systems as a way for employees to locate information. These innovations matter because they make instantly available information that once took considerable time to find.
While search is a tremendous automation tool that makes us more efficient, text mining goes beyond efficiency-oriented applications to enable radical improvements in key business processes, such as scientific discovery, risk identification, customer relationship management, and market research. These new applications are disruptive in a less direct way than search, but with enterprise-level impact. For example, text mining makes it possible to find hidden linkages between people that can be important for risk assessment, and between companies for competitor assessment. By "reading" the flow of electronic discussion on newsgroups, blogs, and other internet content, companies can continuously monitor their reputation and what consumers are thinking about when making a purchase decision, and can track how both are changing. The ability to "sense" market changes almost instantly is putting pressure on companies to respond equally quickly. As time cycles shrink from months to weeks to overnight, many of the tried-and-true business processes for supplying refined research to decision makers are being challenged by less-digested but more current information.
1A. Do Opinions on the Web Predict Real-World Actions?
Or, Does Web Chatter Really Matter?
Technical questions:
(i) There has been recent work around "transmission graphs", which model the propagation of information from one individual to another, typically following the well-developed field of study around the spread of infectious disease (see, for example, the work of Pedro Domingos at the University of Washington, or David Kempe, Jon Kleinberg and Eva Tardos at Cornell). However, these techniques address isolated point problems, and do not represent an overall understanding of the space of topic propagation.
(ii) Researchers at IBM and the MIT Media Lab have claimed that online buzz around musicians and blog traffic about musicians, respectively, are leading indicators of Billboard chart movement.
(iii) A fundamental operation required to track the propagation of ideas from one online venue to another is the ability to identify a topic when it occurs.
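To make the "transmission graph" idea in question (i) concrete, the following is a minimal sketch of one epidemic-style propagation model, the independent cascade, in which each newly informed individual gets a single chance to pass a topic on to each contact. The graph layout, the uniform transmission probability p, and the function names are illustrative assumptions, not the method of any of the researchers cited above.

```python
import random
from collections import deque


def independent_cascade(graph, seeds, p, rng):
    """Simulate one cascade over a transmission graph.

    graph: dict mapping each individual to the list of individuals they can
           pass information to (hypothetical layout, chosen for illustration).
    seeds: individuals who mention the topic first.
    p:     assumed uniform probability that one transmission succeeds.
    rng:   a random.Random instance, passed in so runs are reproducible.
    """
    active = set(seeds)
    frontier = deque(seeds)
    while frontier:
        node = frontier.popleft()
        # Each newly active node gets exactly one attempt per inactive contact.
        for neighbor in graph.get(node, []):
            if neighbor not in active and rng.random() < p:
                active.add(neighbor)
                frontier.append(neighbor)
    return active


def expected_spread(graph, seeds, p, trials=1000, seed=0):
    """Estimate the average cascade size by Monte Carlo simulation."""
    rng = random.Random(seed)
    total = sum(len(independent_cascade(graph, seeds, p, rng))
                for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    # Toy graph: "a" can inform "b" and "c", who can each inform "d".
    toy_graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": [], "e": ["a"]}
    print(expected_spread(toy_graph, ["a"], p=0.5))
```

A sketch like this answers only a point question (how far one topic spreads from a given set of seeds), which is the limitation raised in (i); it says nothing about the overall space of topic propagation.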
1B. Internet Data and Business Processes: (or, May the Web Be With You)
Technical questions: (these broaden the topics described above to the general topic of how online information can be made actionable)
(i) The motivation of iterative schemes for hyperlinked search like PageRank and HITS has been to recover a weak notion of source authoritativeness from a large number of low-quality “judgments” in the form of hyperlinks. More broadly, however, there is a cornucopia of online information to help us make and combine authoritativeness judgments about sources.
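As an illustration of how an iterative scheme turns many low-quality hyperlink "judgments" into an authority score, here is a minimal power-iteration sketch in the spirit of PageRank. The damping factor of 0.85, the fixed iteration count, and the even redistribution of rank from pages with no outgoing links are common simplifying assumptions, not details taken from this text.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank scores by power iteration.

    links: dict mapping each page to the list of pages it links to
           (hypothetical input layout, chosen for illustration).
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread evenly.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        # Every hyperlink passes on an equal share of its source's rank.
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    toy_links = {"a": ["hub"], "b": ["hub"], "c": ["hub"], "hub": ["a"]}
    for page, score in sorted(pagerank(toy_links).items()):
        print(page, round(score, 3))
```

In the toy graph, the page that most others link to ends up with the highest score, and the page it links to in turn inherits part of that authority. Combining such link-based scores with other online evidence about a source is the broader question posed above.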
1C. Search and Mining in Support of Disruptive Business Models
Technical questions:
(i) Online product search is already having an impact on traditional corporate channel strategies, as consumers have the ability to find products from lower-cost vendors who ordinarily would never have access to a global customer base.
(ii) Online sites have access to increasingly granular individual information, but social factors will determine how much of this information eventually becomes available during typical browsing sessions.