Current Stuff

What I'm working on

Signs of adversarial presence and activity in datasets

Systems of all kinds are subject to attack and disruption by adversaries. Examples include transportation, infrastructure, and banking attacked by terrorists; businesses attacked by those committing fraud or denial of service; governments attacked by insurgents or covert forces from other countries; and other kinds of systems, such as taxation and border control.

In the data collected about such systems, there are usually traces of what the adversaries are doing or planning to do, but these traces can be hard to find: the volume of data is typically very large while the adversaries' actions are small.

In some settings, for example financial auditing, there are known adversarial activities whose traces can be looked for explicitly. However, even for those settings, adversaries are always looking for new ways to accomplish their ends, so the list of bad things to look for is never complete.

How then can data analysis be used to find the traces of adversaries when, by definition, these traces will look different as adversaries modify their activities to avoid detection? One basic answer is that normality tends to look largely the same: most parts of the system exhibit the same small range of variation over long periods of time. So anything anomalous automatically looks at least a little bit suspicious. Of course, some of the anomalies are just eccentricity; and some (more than you would expect in systems that collect data automatically) are artifacts of failures in collection and processing. However, correlated anomaly is often a strong hint of adversarial activity (since eccentrics tend to differ from one another as well as from the mainstream).
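As a rough illustration of the difference between eccentric and correlated anomalies, here is a minimal Python sketch. It is not the method used in this research: the z-score test, the thresholds `z_cut` and `radius`, and the toy records are all assumptions made for the example. It flags records that are far from the norm, then keeps only those that also lie close to another anomalous record.

```python
import math
import statistics

def zscores(records):
    """Per-feature z-scores, computed column by column."""
    cols = list(zip(*records))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]  # avoid dividing by zero
    return [[(x - m) / s for x, m, s in zip(r, means, sds)] for r in records]

def correlated_anomalies(records, z_cut=2.0, radius=1.0):
    """Return (anomalous, correlated) lists of record indices.

    A record is anomalous if any feature has |z| > z_cut (assumed threshold).
    It is correlated if another anomalous record lies within `radius`
    of it in z-score space (also an assumed threshold).
    """
    z = zscores(records)
    anomalous = [i for i, row in enumerate(z) if max(abs(v) for v in row) > z_cut]
    correlated = [
        i for i in anomalous
        if any(j != i and math.dist(z[i], z[j]) <= radius for j in anomalous)
    ]
    return anomalous, correlated

# Toy data: 40 ordinary records, two that deviate in the same way,
# and two eccentrics that deviate in different directions.
normal = [(10 + (i % 5) * 0.1, 10 + (i % 3) * 0.1) for i in range(40)]
records = normal + [(30.0, 30.0), (30.5, 30.2), (40.0, 10.0), (10.0, -20.0)]

anomalous, correlated = correlated_anomalies(records)
print(anomalous)   # indices of all records that look unusual
print(correlated)  # the subset whose unusualness is shared
```

In this toy data all four unusual records are flagged as anomalous, but only the two that resemble each other survive the second step. A real system would need a more robust notion of normality than a single mean and standard deviation per feature.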

Another place to look for adversarial activity, though, is in those parts of the data that describe unconscious or social activity. Adversaries are inherently doing things that are not socially acceptable, and this produces a kind of self-consciousness that leaves visible signals. Adversaries also connect to others in unusual ways, which leaves visible signals in their social media and communication activities.

That's why much of my recent work has been focused on language and social networks.

Understanding how language is produced in an operational way makes it possible to reverse engineer documents to get at the mental states that existed in their creators when they were produced. For standalone documents (or speeches) we have made considerable headway in understanding how to measure deception, jihadist intensity, and fraud in a set of documents.

Unfortunately, in most real-world settings, documents (utterances) don't exist in a vacuum, but are part of an ongoing conversation. In this case, verbal mimicry plays a strong role: the signals in any particular document or utterance are an amalgam of what the author is creating at that moment and mimicry of the preceding turns of the conversation. Untangling this interaction is challenging, although we have made some progress with deception. Some small-scale experiments by others have elicited some of the ways in which mimicry and fresh creation interact, but there haven't been any large-scale (corpus-based) experiments.

Social networks also reflect characteristics of their participants in subtle, largely unconscious ways. For example, Christakis has shown how influence flows in social networks over surprisingly long distances. The structure of such networks is also strongly constrained. For example, Dunbar's number, the rough limit (around 150) on the number of stable relationships a person can maintain, seems to carry over to social media quite well.

Given rich ways of modelling social networks, it becomes possible to look for adversarial traces in terms of small, unusually tightly bound subgroups; subgroups that are not connected to the rest of a network in 'normal' ways; or even individuals whose local environment is unusual.
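To make the idea of a tightly bound, poorly connected subgroup concrete, here is a small Python sketch. The two measures (`internal_density` and `outside_fraction`), the helper `build_graph`, and the toy network are assumptions chosen for illustration, not the models used in this research. A candidate group looks suspicious when almost every pair inside it is linked while very few of its links lead anywhere else.

```python
from collections import defaultdict
from itertools import combinations

def build_graph(edges):
    """Undirected adjacency sets from a list of (a, b) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def internal_density(graph, group):
    """Fraction of possible pairs inside `group` that are actually linked."""
    members = list(group)
    possible = len(members) * (len(members) - 1) // 2
    if possible == 0:
        return 0.0
    linked = sum(1 for a, b in combinations(members, 2) if b in graph[a])
    return linked / possible

def outside_fraction(graph, group):
    """Fraction of the group's edge endpoints that lead outside the group."""
    group = set(group)
    total = sum(len(graph[m]) for m in group)
    if total == 0:
        return 0.0
    outside = sum(1 for m in group for n in graph[m] if n not in group)
    return outside / total

# Toy network: {a, b, c, d} is fully interlinked and touches the rest
# of the network through a single edge; {x, y, z} is an ordinary group.
graph = build_graph([
    ("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d"),
    ("d", "x"), ("x", "y"), ("y", "z"), ("x", "w"), ("y", "w"),
    ("z", "v"), ("w", "v"),
])
cell = {"a", "b", "c", "d"}
ordinary = {"x", "y", "z"}

print(internal_density(graph, cell), outside_fraction(graph, cell))
print(internal_density(graph, ordinary), outside_fraction(graph, ordinary))
```

The first group is fully interlinked with only one of its thirteen edge endpoints leading outward, whereas half of the second group's endpoints lead outward. Scoring every candidate subgroup this way is expensive on large networks, so in practice the candidates would have to come from some cheaper first pass.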

Standard social media analytics isn't rich enough to provide the level of detail and resolution that's needed for these problems. So we are working on modelling social networks in which the edges are directed; the edges have types (so 'friend' is different from 'colleague'); and the connection pattern changes over time.
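One simple way to represent such a network is as a list of timestamped, typed, directed edges from which views can be cut as needed. The Python sketch below is only an assumed data layout for illustration: the `Edge` record, the `snapshot` function, and the integer time steps are hypothetical and do not describe the models under development.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Edge:
    src: str    # who initiates the connection
    dst: str    # who receives it
    kind: str   # edge type, e.g. "friend" or "colleague"
    time: int   # time step at which the connection was observed

def snapshot(edges, kind, start, end):
    """Directed adjacency for one edge type within the window [start, end)."""
    out = defaultdict(set)
    for e in edges:
        if e.kind == kind and start <= e.time < end:
            out[e.src].add(e.dst)
    return dict(out)

edges = [
    Edge("ann", "bob", "friend", 1),
    Edge("bob", "ann", "colleague", 1),
    Edge("ann", "cat", "friend", 5),
    Edge("cat", "ann", "friend", 6),
]

early_friends = snapshot(edges, "friend", 0, 5)
late_friends = snapshot(edges, "friend", 5, 10)
colleagues = snapshot(edges, "colleague", 0, 10)
print(early_friends, late_friends, colleagues)
```

Keeping direction, type, and time on every edge means that the same data can answer different questions: who reaches out to whom, in what capacity, and how that pattern shifts from one window to the next.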

All of my research work is unclassified. Working on these problems outside the intelligence community provides a different viewpoint so that useful ideas are not missed; provides trained people who can work in the intelligence sector; and means that I can discuss the issues raised by adversarial knowledge discovery (e.g. privacy) in public and with the media.

I also dabble in:

Anomaly detection, broadly understood, is helpful in many of these areas, and I also work at extending our understanding of what 'anomalous' means computationally.

Back to David Skillicorn's home page.