Current Stuff
Counterterrorism, together with the closely related problem of fraud, is a major application focus of my work. Strong data-mining techniques can help to detect terrorist attacks before they happen, or to ensure that the culprits pay a price afterwards. All of my work is unclassified. Working on this problem outside the intelligence community has three benefits: it provides a different viewpoint, so that useful ideas are not missed; it trains people who can go on to work in the intelligence sector; and it means that I can discuss the issues raised by counterterrorism (e.g. privacy) in public and with the media.
Fraud is a major problem in developed countries, accounting for perhaps 8-12% of GDP, especially in sectors such as insurance, health, taxation, and customs. Detecting it pays off: each dollar spent on fraud detection has an expected return of $25-29. Data mining for fraud detection therefore has the potential to become as large a market as customer relationship management. At present I am working on the following topics:
Enron. We have analysed emails collected in the three years before the collapse of Enron to try to understand how email is used to communicate, and how various kinds of deception show up in such data. We have also been able to validate models of deception against this dataset.
At present, treatment strategies for most diseases are determined only by the diagnosis, and perhaps by some obvious properties of the patient. We are extending this so that treatment decisions can be based on the subtype of the disease (determined from microarray data), the patient's genetic makeup (determined, for example, from SNPs), other properties of the patient (for example, age), and the constraints of the health system. We intend to build a system that gives physicians guidance about the treatment-outcome landscape. Such a system should improve patient recovery rates and reduce costs.
Mineral exploration typically involves deep drilling, which is expensive and so can only be done at widely spaced intervals. We are exploring the use of data mining on geochemical data from surface or near-surface samples to predict the presence of underlying mineralization.