In this project, we will develop and deploy an intelligent computer system called the Tool for Intelligent Knowledge Management and Discovery (TIKManD). The system can mine Internet homepages, emails, chat lines, and/or authorized wire tapping information (which may include multi-lingual information) to recognize, conceptually match, and rank potential terrorist activities (both common and unusual) by their type and seriousness. This will be done automatically or semi-automatically, based either on predefined linguistic formulations and rules defined by experts or on a set of known terrorist activities, given the information provided through law enforcement databases (text and voice) and the huge number of "tips" received immediately after the attack.
To use the wire-tapped and/or chat information, we will use state-of-the-art technology to convert voice to text. Because the information to be gathered will include both Arabic and Farsi material, we will use state-of-the-art Arabic-English and Farsi-English dictionaries and thesauruses. We intend to develop a multi-lingual terrorism ontology and dictionary, built both from existing documents related to terrorism activities and from expert definitions, and linked to an English ontology that will be built for terrorism recognition. An intelligent technique based on integrated Latent Semantic Indexing and Self-Organizing Map (LSI-SOM), developed at LBNL and UC Berkeley, will be used for this aspect of the project.
In addition, a Conceptual Fuzzy Set (CFS) model will be used for intelligent information and knowledge retrieval through conceptual matching of text and voice (here defined as "concept"). The CFS can be used to construct a fuzzy ontology, or a set of terms relating to the context of the investigation (terrorism), to resolve ambiguity. The model can be used to calculate the conceptual degree of match to an object or query. In addition, the resulting ranking can be used to allocate resources intelligently, given the degree of match between objectives and the resources available.
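The idea of a degree of match followed by ranking can be illustrated with a minimal sketch. It is not the CFS model itself: it assumes a concept is simply a fuzzy set of terms with membership degrees in [0, 1], and it combines the degrees of the terms found in a text with the algebraic sum, a standard fuzzy OR. The concept terms, their degrees, and the function names are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical fuzzy concept: each term belongs to the concept to some degree.
CONCEPT: Dict[str, float] = {
    "shipment": 0.9,
    "delivery": 0.7,
    "package": 0.6,
    "meeting": 0.3,
}


def match_degree(text: str, concept: Dict[str, float]) -> float:
    """Degree in [0, 1] to which a text matches a fuzzy concept.

    Uses the algebraic sum 1 - prod(1 - mu) over the distinct concept
    terms that occur in the text (an assumed combination rule).
    """
    tokens = set(text.lower().split())
    miss = 1.0
    for term, mu in concept.items():
        if term in tokens:
            miss *= 1.0 - mu
    return 1.0 - miss


def rank(texts: List[str], concept: Dict[str, float]) -> List[Tuple[float, str]]:
    """Return (degree, text) pairs, best match first."""
    scored = [(match_degree(t, concept), t) for t in texts]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    messages = [
        "the package delivery is on friday",
        "lunch at noon",
        "meeting moved to monday",
    ]
    for degree, message in rank(messages, CONCEPT):
        print(f"{degree:.2f}  {message}")
```

A fuller model would also propagate activation between related terms, so that a text can match a concept without containing any of its listed words; this sketch matches on literal terms only.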
We will also use our state-of-the-art technology as a tool for effective data fusion and data mining of textual and voice information. Data fusion would attempt to collect seemingly related data to recognize: (1) increased activities in a given geographic area, (2) unusual movements and travel patterns, (3) intelligence gathered through authorized wire tapping of suspected subgroups, (4) unexplained increases or decreases in routine communication, and (5) integration of reports and voice (text/audio) from different sources.
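Item (4), an unexplained change in routine communication volume, can be sketched as a simple deviation test against a trailing baseline. This is only an illustration under stated assumptions: the input is a list of daily message counts, and the window length, the threshold, and the function name are hypothetical choices, not part of the proposed system.

```python
from statistics import mean, pstdev
from typing import List


def volume_anomalies(counts: List[int], baseline_days: int = 7,
                     threshold: float = 3.0) -> List[int]:
    """Return indices of days whose count deviates strongly from the
    preceding `baseline_days` days (a z-score test, in either direction)."""
    flagged = []
    for i in range(baseline_days, len(counts)):
        base = counts[i - baseline_days:i]
        mu, sigma = mean(base), pstdev(base)
        if sigma == 0:
            # Perfectly flat baseline: any change at all is a deviation.
            if counts[i] != mu:
                flagged.append(i)
            continue
        if abs(counts[i] - mu) / sigma >= threshold:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    daily_counts = [10, 11, 9, 10, 12, 10, 11, 10, 30, 11]
    print(volume_anomalies(daily_counts))
```

Because the test is two-sided, it flags sudden drops as well as spikes. Note that an outlier stays in the baseline of the following days and inflates their standard deviation, which a production detector would need to handle.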
Data mining would use the above data to pinpoint and extract useful information from the rest. This would also apply to the "response" part of the initiative on how to fuse the huge number of "tips" received immediately after the attack, filter out the irrelevant parts, and act on the useful subset of such data in a speedy manner.
For example, given data about the movement of people of interest, we can use the proposed data analysis technology to identify suspicious patterns of behavior, raise alarms, and perform interactive data analysis on geographical maps of the world or of specific countries. For email and telephone monitoring, standard search technology would not be suitable, because such messages are often short and non-standard, so the system will need significant prior knowledge about what to look for. Our recent work on fuzzy query, search engines, and ranking can be leveraged here.
Finally, in phase two we intend to design a secure and distributed IT infrastructure to provide a means for secure communication between resources and to build textual, voice, and image databases online. Given the distributed nature of the information sources, a federated database framework is required to distribute storage and information access across multiple locations.
Tracks the Terrorist (TrasT): phase one of this project includes (1) text data mining, (2) an ontology related to terrorism activities, (3) a tracking system, and (4) a federated data warehouse.
Text data mining and ontology: we intend to develop an ontology and a dictionary, built both from existing documents related to terrorism activities and from expert definitions, and linked to English thesauruses that will be used for terrorism recognition. An intelligent technique based on integrated Latent Semantic Indexing and Self-Organizing Map (LSI-SOM) will be used in this project.
Tracking system: given information about suspicious activities such as phone calls, emails, meetings, credit card information, and hotel and airline reservations, stored in a database containing the originator, recipient, locations, times, etc., we can use visual data mining software developed at Btexact Technologies (UC Berkeley sponsor) to find suspicious patterns in the data using geographical maps. The technology developed can detect unusual patterns, raise alarms based on classification of activities, and offer explanations, based on automatic learning techniques, for why a certain activity is placed in a particular class such as "safe," "suspicious," "dangerous," etc. The underlying techniques can combine expert knowledge and data-driven rules to continually improve their classification and adapt to dynamic changes in data and expert knowledge.
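The combination of classification and explanation described above can be sketched with a small weighted rule set. This is a minimal illustration, not the Btexact software: the record fields follow the ones named in the text, while the rules, weights, thresholds, and placeholder identifiers are hypothetical. In the system described, such rules would come from experts or be learned from data.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Activity:
    originator: str
    recipient: str
    location: str
    hour: int  # hour of day, 0-23


# A rule is (explanation text, predicate, weight). All values are hypothetical.
Rule = Tuple[str, Callable[[Activity], bool], float]

WATCH_LIST = {"person_a"}  # placeholder identifiers

RULES: List[Rule] = [
    ("party on watch list",
     lambda a: a.originator in WATCH_LIST or a.recipient in WATCH_LIST, 0.6),
    ("unusual hour", lambda a: a.hour < 5, 0.3),
    ("flagged location", lambda a: a.location == "site_x", 0.3),
]


def classify(activity: Activity,
             rules: List[Rule] = RULES) -> Tuple[str, List[str]]:
    """Return a class label plus the names of the rules that fired,
    which serve as the explanation for the label."""
    fired = [(name, weight) for name, pred, weight in rules if pred(activity)]
    score = min(1.0, sum(weight for _, weight in fired))
    if score >= 0.8:
        label = "dangerous"
    elif score >= 0.3:
        label = "suspicious"
    else:
        label = "safe"
    return label, [name for name, _ in fired]


if __name__ == "__main__":
    label, reasons = classify(Activity("person_a", "person_b", "site_x", 3))
    print(label, reasons)
```

Keeping each rule's name alongside its predicate is what makes the output explainable: the label is always accompanied by the specific conditions that produced it. Adapting to new data or expert input then amounts to editing the rule list or its weights.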
The system is Java-based and web-enabled, and uses MapInfo (commercial mapping software) and an Oracle database to display and access the data. It consists of two main subsystems: modeling and visualization. The modeling subsystem deals with data analysis and explanation generation; the visualization subsystem represents extracted patterns on geographical maps and enables visual interaction with the modeling back end.
For example, US authorities have announced that five of the 19 suspected September 11 hijackers met in Las Vegas several days before the attack. Some of the terrorists attended the same training school, had communication tracks, and flew during the same time period to the same location on the same flight. In addition, some of the suspects were on the FBI watch list. The new model is intended to capture and recognize such activities and to raise alarms by profiling, e.g., through spatial clustering.
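As a rough illustration of the spatial clustering mentioned above, the sketch below bins location records into a coarse latitude/longitude grid and a time window, then reports cells that contain several distinct individuals. Grid binning is a simple stand-in chosen for brevity, not the clustering method of the system; the record layout, cell size, window length, and function name are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (person_id, latitude, longitude, day_index); a hypothetical record layout.
Sighting = Tuple[str, float, float, int]


def co_located_groups(sightings: List[Sighting], cell_deg: float = 0.5,
                      window_days: int = 3,
                      min_people: int = 3) -> List[Set[str]]:
    """Group sightings by grid cell and time window, and return the sets
    of distinct people that share a cell within the same window."""
    buckets: Dict[Tuple[int, int, int], Set[str]] = defaultdict(set)
    for person, lat, lon, day in sightings:
        key = (int(lat // cell_deg), int(lon // cell_deg), day // window_days)
        buckets[key].add(person)
    return [people for people in buckets.values() if len(people) >= min_people]


if __name__ == "__main__":
    records = [
        ("p1", 36.1, -115.1, 0),
        ("p2", 36.2, -115.2, 1),
        ("p3", 36.3, -115.3, 2),
        ("p4", 40.7, -74.0, 1),
    ]
    print(co_located_groups(records))
```

Fixed grids miss groups that straddle a cell or window boundary, so a density-based method or overlapping cells would be the more robust choice in practice.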
Data warehouse architecture: we will leverage DOE security (funded through National Collaboratory) to provide a means for secure communication between resources and to build textual databases. Given the nature of the information sources, a data warehouse architecture that supports information access across multiple locations is required.