Spinning the U.S. Presidential Election
David Skillicorn and Ayron Little, School of Computing, Queen's University, skill at cs dot queensu dot ca, 613 533 6065.
Update: Obama's spin level continues to increase whenever things don't go well for him. Clinton's spin is increasing, presumably for the same reason. See the updated plot at the end.
Barack Obama has changed his word usage in a major way, starting over the weekend of February 23rd-24th, and very obviously in the Ohio debate of February 26th. It is unlikely that he has done this consciously; rather, he has reframed the situation to himself in a way that allows him to be much more open than he has been up to now. It seems likely that he has concluded, or been told, that his position is unassailable and that he will be the nominee. So his level of spin has dropped precipitously. We will have to wait and see whether this affects how he comes across in this new persona.
People leak their mental state whenever they communicate, via changes in the way they use small words: prepositions, pronouns, and auxiliary verbs. One facet of communication that can be detected this way is deception, which itself ranges from outright lying to socially-acceptable negotiation. Here we consider the form of deception that occurs in politics: spin, the way in which politicians subtly change what they say to appeal to the widest possible audience, while maintaining a level of deniability. We look at the speeches by the candidates for the U.S. presidential election from the beginning of 2008 to the middle of February.
Here are some results for speeches by the current contenders for president in the U.S. election: Hillary Clinton, Barack Obama, and John McCain. The table is ordered from least spinful to most spinful (so positive scores are good).
| Candidate | Speech number | Score | Speech details |
|---|---|---|---|
| McCain | 29 | 4.7 | Feb 7th, CPAC |
| Clinton | 1 | 2.9 | Jan 3rd, Iowa |
| Obama | 15 | 0.3 | Jan 22nd, Economic Speech |
| Obama | 19 | 0.1 | Jan 29th, Reclaiming the American Dream, Kansas |
| Clinton | 10 | -0.1 | Feb 9th, Virginia Jefferson-Jackson Dinner |
| Clinton | 6 | -0.3 | Jan 24th, Solutions for the American Economy |
| Clinton | 9 | -0.9 | Feb 5th, Super Tuesday |
| Clinton | 5 | -1.8 | Jan 14th, SEIU 32BJ Event Honoring the Legacy of Martin Luther King Jr |
| McCain | 27 | -2.2 | Jan 12th, Americans For Prosperity Michigan Summit |
There are some obvious results here: John McCain has the lowest level of spin (justifying the straight talk reputation he claims); followed by Hillary Clinton; while Barack Obama shows a definite tendency to spin his message.
Some of the speeches with unusual ranks are labelled with the occasion on which they were delivered. Not surprisingly, the tendency to spin increases when the candidate is in difficulties.
This analysis is based on work by James Pennebaker, from the University of Texas at Austin, who developed a model for deception in text. The model is based on the relative frequency of certain kinds of words.
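The Pennebaker model predicts that deceptive text will be marked by: a decreased frequency of first-person singular pronouns; a decreased frequency of exclusive words (such as "but"); an increased frequency of negative-emotion words; and an increased frequency of action words.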
The Pennebaker model scores each individual text based on these four different kinds of markers. However, deciding when the usage frequency of, for example, first-person pronouns is decreased requires some assessment of how frequent they are in the first place; and this varies in different contexts: freeform speech, formal business writing, and political speeches. Correlation among the different kinds of signature words is also potentially important. We have extended Pennebaker's model to incorporate context and correlation information. This turns out to be important. It also turns out to be useful to collect related words together and consider the frequency of classes of words. We use: first-person singular pronouns; the word "but"; the word "or"; exclusive words; negative emotion words; and action words. As in the model above, deception is signalled by decreases in words of the first four kinds; and increases in words of the last two kinds.
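As a rough illustration of counting the six word classes named above, the sketch below computes, for one text, the fraction of tokens that fall into each class. The word lists are tiny placeholders invented for this example (the lists actually used are not given here), and the function name `class_rates` and the simple tokenizer are likewise assumptions.

```python
import re

# Placeholder word lists, invented for illustration only.
# The real lists behind the model are much longer and are not reproduced here.
WORD_CLASSES = {
    "first_person_singular": {"i", "me", "my", "mine", "myself"},
    "but": {"but"},
    "or": {"or"},
    "exclusive": {"except", "without", "unless", "however"},
    "negative_emotion": {"hate", "afraid", "angry", "worried"},
    "action": {"go", "take", "run", "move", "lead"},
}


def class_rates(text, classes=WORD_CLASSES):
    """Return, for each word class, the fraction of tokens in `text` that belong to it."""
    # Very simple tokenizer: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {name: 0.0 for name in classes}
    return {
        name: sum(token in words for token in tokens) / len(tokens)
        for name, words in classes.items()
    }


if __name__ == "__main__":
    print(class_rates("I will go, but I am not afraid."))
```

Dividing by the token count keeps long and short texts comparable. Judging whether a rate is high or low still needs a baseline from comparable texts, which is the context issue raised above.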
The model was applied to the speeches of party leaders in the 2006 Canadian Federal election with some success (results here). For what it's worth, the leader with the least amount of spin won that election.
McCain tends to score well because he uses the pronoun "I" heavily. In contrast, Obama tends to use "we", and this pronoun plays a much more complicated role in communication. For example, in two-person interactions, the lower-status person tends to use more first-person singular pronouns, so using "I" creates an impression of humility (which is not what people intuitively think). Moreover, while women's use of "we" does seem to signal inclusiveness, men often use "we" to soften an imperative, so it tends not to come across as inclusive.
Even knowing this model, it is hard for a politician to adjust to reduce his/her spin score. Language production is an unconscious process, and the "small" words are not easily controllable. A prepared speech can, to some extent, be polished; but performance in a debate cannot, and so is more revealing.
More Technical Stuff
To produce these results, we counted the frequency of the six word classes derived from Pennebaker's deception model, and divided the counted frequencies by the length of the speech in which they appeared (so that long speeches did not appear more spinful just because they were longer). This gives a matrix with one row per speech and one column per word class. The columns of the matrix were zero-centred, and those columns corresponding to words where a reduced frequency is significant were negated (reversed around the origin). A singular value decomposition was used to create a perceptual space for both speeches and words.
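The steps in this paragraph can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions, not the code used for the results: the function name `perceptual_space`, the choice to keep `k` dimensions, the scaling of both sets of coordinates by the singular values, and the numbers in the usage example are all invented for illustration.

```python
import numpy as np


def perceptual_space(rates, negate_cols, k=3):
    """Zero-centre, flip selected columns, and decompose with an SVD.

    rates       : speeches x word-classes matrix of length-normalised frequencies
    negate_cols : indices of columns where a *reduced* frequency signals spin
    k           : number of dimensions to keep (an assumption of this sketch)
    """
    A = np.asarray(rates, dtype=float)
    A = A - A.mean(axis=0)        # zero-centre each column (makes a new array)
    A[:, negate_cols] *= -1.0     # reverse the chosen columns around the origin
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))
    speech_coords = U[:, :k] * s[:k]     # one point per speech
    word_coords = Vt[:k, :].T * s[:k]    # one point per word class
    return speech_coords, word_coords, s


if __name__ == "__main__":
    # Hypothetical rates for four speeches and six word classes.
    rates = [
        [0.050, 0.010, 0.004, 0.020, 0.010, 0.030],
        [0.020, 0.006, 0.002, 0.010, 0.020, 0.050],
        [0.030, 0.008, 0.003, 0.015, 0.015, 0.040],
        [0.010, 0.004, 0.001, 0.005, 0.030, 0.060],
    ]
    speeches, words, s = perceptual_space(rates, negate_cols=[0, 1, 2, 3], k=2)
    print(speeches)
    print(words)
```

After the sign flip, every column points the same way (larger means more spin-like), so speeches and word classes can be read against common axes.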
This figure shows the perceptual space for speeches. McCain's speeches are shown as red dots; Clinton's as blue dots; and Obama's as blue stars. Notice that there is significant clustering for all three. The straight line is the axis that defines spin, the green end indicating low spin and the red end high spin. Although both McCain and Clinton are towards the green end of the spin axis, their speeches are quite different from each other.
The scores above were calculated by projecting the points onto the spin axis. The number associated with each point is the speech number from the table above.
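Projecting points onto an axis amounts to taking a dot product with a unit vector along that axis. The sketch below assumes the spin axis passes through the origin and is given as a direction pointing towards its low-spin end, so that larger scores mean less spin, matching the table's convention. How the axis itself was chosen is not described here, and `spin_scores` is a hypothetical name.

```python
import math


def spin_scores(points, low_spin_direction):
    """Project each point onto the axis defined by `low_spin_direction`.

    points             : iterable of coordinate sequences (one per speech)
    low_spin_direction : vector pointing towards the low-spin end of the axis
                         (assumed to pass through the origin)
    """
    norm = math.sqrt(sum(c * c for c in low_spin_direction))
    if norm == 0:
        raise ValueError("axis direction must be non-zero")
    unit = [c / norm for c in low_spin_direction]
    return [sum(p_i * u_i for p_i, u_i in zip(point, unit)) for point in points]


if __name__ == "__main__":
    # Hypothetical 2-D speech coordinates and axis direction.
    print(spin_scores([[2, 0], [0, 3], [-1, -1]], [2, 0]))
```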
We can learn about the role of classes of words in the deception model by considering a perceptual space of the six word classes, like this:
The further a point corresponding to a word is from the origin, the more significant it is as a marker for spin. We can see that first-person singular pronouns, action verbs, and the exclusive word "but" dominate. The points corresponding to the other word classes are very close to the origin and play very little role.
The magenta lines indicate which direction corresponds to increased signal for each word class (e.g. increased action verbs but decreased first-person singular pronouns).
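The reading of the word-class plot, in which distance from the origin measures significance, can be expressed as a simple ranking. The coordinates in the usage example are made up for illustration and are not the values behind the figure; `rank_by_distance` is a hypothetical helper.

```python
import math


def rank_by_distance(coords):
    """Rank named points by Euclidean distance from the origin, farthest first."""
    distances = {
        name: math.sqrt(sum(c * c for c in point))
        for name, point in coords.items()
    }
    return sorted(distances.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical word-class coordinates, not taken from the figure.
    example = {
        "first_person_singular": (0.9, -0.2),
        "action": (-0.5, 0.6),
        "but": (0.3, 0.4),
        "or": (0.02, 0.01),
    }
    for name, dist in rank_by_distance(example):
        print(f"{name}: {dist:.2f}")
```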
The following plot shows both speeches and words in the same plot. Although the plot is very busy and cannot be rotated to see another angle, some interesting structure is visible.
The plot helps to understand what makes each candidate different, because the word-usage patterns and the position of the speech points must agree with each other. For example, we can see that McCain's speeches are rated so low in spin mostly because he uses large numbers of first-person pronouns: he says "I" a lot. Clinton's speeches are rated fairly low in spin because they contain lots of action verbs.
This work is an offshoot of my work in extracting mental state from text, primarily focused on counterterrorism, fraud, and other kinds of crime. More details can be found on my home page and ongoing discussion on my blog.
More details can also be found on James Pennebaker's home page.
Also see Pennebaker's blog about language in the U.S. election.
Updated March 18th: Note the increasing spin from Clinton, and the oscillation between low and high levels of spin from Obama. His spin levels have been highest recently when (a) he lost the Ohio and Texas primaries, and (b) he made his speech about race to try to settle the issue of his pastor.