There’s a lot of language out there on the Internet.
But how does the volume of language on Twitter, web pages, and the rest of the Internet compare with the amount of face-to-face conversation?
Computational linguist Robert Munro has taken on the mammoth task of working out how we communicate today, and he tells linguist Daniel Midgley all about it on this episode of Talk the Talk.
You can listen to all the episodes of Talk the Talk by pasting this URL into your podcast app.
http://danielmidgley.com/talkthetalk/talk_classic.xml
Now’s your chance to hear what happens when two computational linguists get together and nerd out. I talked to Robert Munro, who has put together a chart showing how we communicate today. Don’t worry: Ben Ainslie keeps us on track.
I find it very encouraging that despite the tendency of English to dominate and kill other languages, mobile phones (and other forms of language tech) are having the opposite effect. They’re reducing the barriers to communication, keeping minority languages alive. It used to be that if speakers of a language were dispersed, the language would die. Not anymore — they’re talking on their indestructible Nokias.
The exciting part is that if we have access to SMS corpora in many minority languages, computers could automatically build dictionaries and grammars for them, at least once we get good at that stuff. But we’re working on it.
Show notes
A fascinating graphic visualising how the world communicates, across all the world’s languages.
https://web.archive.org/web/20140219075950/http://strata.oreilly.com/2013/02/how-the-world-communicates-in-2013.html
(originally http://strata.oreilly.com/2013/02/how-the-world-communicates-in-2013.html)
Robert Munro is a computational linguist
http://www.robertmunro.com/
and CEO of Idibon, a language tech company.
http://www.idibon.com/
Two hundred million tweets per day seems like a lot, but it’s just a drop in the ocean compared with all the world’s conversation.
https://dev.twitter.com/discussions/3914
You can use aggregated human behaviour to track flu epidemics.
http://www.google.org/flutrends/
More about Google Flu Trends, in cartoon form.
http://www.google.com/trends/correlate/comic?p=9
A scholarly paper by Munro et al. on tracking epidemics.
http://www.robertmunro.com/research/munro12epidemics.pdf
But could there be a limit to this approach?
http://slashdot.org/topic/bi/google-flu-trends-suggests-limits-of-crowdsourcing/