Wednesday, May 19, 2010

In ur fridge eatin ur foodz

OK, this may or may not help you with your ENGA3 language change topic, but it's a great article nonetheless. In it, David Bamman writes about what Twitter's data tells us about language usage in the USA, and about some of the pros and pitfalls of analysing it. As he points out, while there's lots of information stored about each tweet and each Twitter user, some of it isn't easy to pin down:

The user-defined geographic information is noisy data: while "Boston, MA" can be automatically disambiguated relatively easily to a physical location on the earth (corresponding to coordinates 42.35843, -71.05977), others ("Springfield") are more difficult (there are many Springfields); others still are nearly impossible ("home of that boy Biggie," a reference to New York City quoted from Jay-Z's "Empire State of Mind"), and some ("in ur fridge eatin ur foodz") don't map to any space in physical reality. The sheer volume of data, however, gives us the flexibility to focus more on precision than on overall accuracy - we can throw away all tweets where we aren't over 99% sure of the physical location.
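If you're curious what "throwing away" the uncertain tweets might look like in practice, here's a rough Python sketch. To be clear, this is my own illustration and not the method the researchers actually used: the `GAZETTEER` lookup table, the `Guess` type, the `locate` and `keep_confident` functions and all of the confidence scores are made up for the example. Only the Boston coordinates and the 99% cut-off come from the quoted passage.

```python
from typing import NamedTuple, Optional


class Guess(NamedTuple):
    lat: float
    lon: float
    confidence: float  # invented score between 0 and 1, for illustration only


# Hypothetical lookup table standing in for a real geocoder.
# The Boston coordinates are the ones quoted in the article; the
# Springfield entry is roughly Springfield, Illinois, with a low score
# because there are many Springfields it could be.
GAZETTEER = {
    "boston, ma": Guess(42.35843, -71.05977, 0.999),
    "springfield": Guess(39.78, -89.65, 0.20),
}


def locate(profile_location: str) -> Optional[Guess]:
    """Return a best guess for a profile location, or None if it is unknown."""
    return GAZETTEER.get(profile_location.strip().lower())


def keep_confident(tweets, threshold=0.99):
    """Keep only tweets whose location guess is above the threshold."""
    kept = []
    for text, profile_location in tweets:
        guess = locate(profile_location)
        # Precision over coverage: anything unknown or doubtful is dropped.
        if guess is not None and guess.confidence > threshold:
            kept.append((text, guess.lat, guess.lon))
    return kept


tweets = [
    ("tweet one", "Boston, MA"),
    ("tweet two", "Springfield"),
    ("tweet three", "home of that boy Biggie"),
    ("tweet four", "in ur fridge eatin ur foodz"),
]

kept = keep_confident(tweets)
print(kept)  # only the Boston tweet survives
```

Three of the four tweets get thrown away here, which would be a disaster with a small sample but matters much less when there are millions of tweets to choose from.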

What I like about the article and the research connected to it is that instead of talking about the technology (Is it bad for our language? Is it destroying our ability to talk to each other?) it just gets on and uses it to find out interesting things about the language around us.

It's linked to this site, The Lexicalist, which calls itself "a demographic dictionary of modern American English".
