Thursday, January 27, 2011

Word clouds and sputnik moments

The US President's State of the Union address always attracts a fair bit of media analysis, but in recent years that analysis has taken on an apparently more linguistic angle. Thanks to cool tools like Wordle and Concordle, which allow you to paste in text and create word clouds based on lexical frequency (see the graphic above, for a word cloud of this blog post), many commentators have started to argue that key themes can be discerned in a speech, and that judgements about the president's concerns can be made from these patterns.

This link from the BBC news site gives us a chance to examine the relative frequency of 10 words over the last 220 years. Meanwhile, this piece from the BBC last year takes a look at the frequency of particular words in British political parties' election manifestos from 1945 to 2010.

On the surface this sort of analysis makes sense, but as linguists point out, using a word many times doesn't necessarily mean that the word reveals a great deal about you. For example, I really dislike Michael Gove, the Education Secretary. The reasons are numerous and varied, ranging from a knee-jerk dislike of Gove's posh Tory background, through to Gove's dogmatic belief that all schools should become "free" schools or academies, Gove's scrapping of the EMA and his bizarre insistence that state schools' problems can be solved by frisking teenagers for porn and putting ex-soldiers in classrooms. But it doesn't stop there: I also have an (admittedly childish) dislike of his resemblance to Dobby from Harry Potter. Now, if you were to create a word cloud of this post, I'm sure the words Michael and Gove would crop up quite frequently, leading some to suggest that this post is all about Michael Gove. It's not. It's about language, not about Gove at all. But presumably, you can see my point by now...
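
For anyone curious about what sits underneath a word cloud, here is a minimal sketch in Python of the kind of frequency count such tools rely on. It is not how Wordle or Concordle actually work internally; the function name `word_frequencies`, the tiny stop-word list and the sample text are all made up for illustration.

```python
import re
from collections import Counter

# Hypothetical, very short stop-word list (assumption: real tools use far longer ones).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "is", "it", "in",
              "that", "i", "not", "about", "but", "at", "all"}

def word_frequencies(text, stop_words=STOP_WORDS):
    """Count how often each word occurs, ignoring case and stop words."""
    counts = Counter()
    for word in re.findall(r"[a-z']+", text.lower()):
        # Fold possessives such as "gove's" into "gove".
        if word.endswith("'s"):
            word = word[:-2]
        word = word.strip("'")
        if word and word not in stop_words:
            counts[word] += 1
    return counts

# Made-up sample that loosely echoes the paragraph above.
sample = (
    "I really dislike Michael Gove. Gove's background, Gove's belief in "
    "free schools and Gove's scrapping of the EMA all annoy me. "
    "But this post is about language, not about Gove at all."
)

freqs = word_frequencies(sample)
top_word, top_count = freqs.most_common(1)[0]
print(top_word, top_count)
print(freqs["language"])
```

On this sample the most frequent word is "gove" and "language" appears only once, even though the sample says outright that it is about language. The count is accurate; it just can't tell you what the text is about.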

This blog entry by the Lousy Linguist puts it more analytically (and more sensibly), so it is well worth a read.

Arguably, one of the most telling moments in President Obama's speech this week was his use of the phrase Sputnik moment. It draws a parallel between the USA's economic and technological status in the world now (and the need to reinvent its research and development programmes) and the moment in the last century when the Soviet Union gave the USA a nasty shock by successfully launching the world's first satellite. However, he used the word sputnik only twice in his address, so such an interesting image is unlikely to be picked up by a basic crunching of the data.

So, in short, word clouds and simple crunching tools such as those mentioned above can be really helpful for carrying out what corpus linguists like to call "quick and dirty" breakdowns of word frequencies in texts, but we always need to be aware of context and meaning if we're to avoid drawing unhelpful conclusions from the data. In many ways, a quick corpus analysis using something like Wordle can be a brilliant way into a text, be it a speech, a poem or an extract from a novel, but it's not a substitute for detailed analysis.
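
One simple way of putting context back in is a keyword-in-context (KWIC) listing, which shows each occurrence of a word along with its neighbours. The Python sketch below is only an illustration: `keyword_in_context` is a hypothetical helper, and the sample sentences are invented, not quoted from the President's address.

```python
import re

def keyword_in_context(text, keyword, window=2):
    """Return each occurrence of keyword with `window` words either side."""
    tokens = re.findall(r"[A-Za-z']+", text)
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines

# Invented sample text (assumption: not a quotation from any real speech).
sample = (
    "Some commentators called the launch a sputnik moment for research "
    "funding. A second sputnik reference came later."
)

kwic_lines = keyword_in_context(sample, "sputnik")
for line in kwic_lines:
    print(line)
```

A word that turns up only twice would barely register in a word cloud, but a listing like this lets you read both occurrences in context and decide for yourself whether they matter.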

