There’s a lot of language out there on the Internet.

But how does the volume of language on Twitter, web pages, and the rest of the Internet compare with the amount of face-to-face conversation?

Computational linguist Robert Munro has taken on the mammoth task of working out how we communicate today, and he tells linguist Daniel Midgley all about it on this episode of Talk the Talk.


Listen to this episode

111: All the Words in the World (featuring Robert Munro)

You can listen to all the episodes of Talk the Talk by pasting this URL into your podlistener.

http://danielmidgley.com/talkthetalk/talk_classic.xml

Now’s your chance to hear what happens when two computational linguists get together and nerd out. I talked to Robert Munro, who’s put together a chart showing how we communicate today. Don’t worry: Ben Ainslie keeps us on track.

I find it very encouraging that despite the tendency of English to dominate and kill other languages, mobile phones (and other forms of language tech) are having the opposite effect. They’re reducing the barriers to communication, keeping minority languages alive. It used to be that if speakers of a language were dispersed, the language would die. Not anymore — they’re talking on their indestructible Nokias.

The exciting part is that with access to SMS corpora for many minority languages, computers could automatically build dictionaries and grammars for them. At least, they could once we get good at that stuff. But we’re working on it.


Show notes

A fascinating graphic visualising how the world communicates.
https://web.archive.org/web/20140219075950/http://strata.oreilly.com/2013/02/how-the-world-communicates-in-2013.html
(originally http://strata.oreilly.com/2013/02/how-the-world-communicates-in-2013.html)

Robert Munro is a computational linguist
http://www.robertmunro.com/

and CEO of Idibon, a language tech company.
http://www.idibon.com/

Two hundred million tweets per day seems like a lot, but it’s just a drop in the ocean.
https://dev.twitter.com/discussions/3914

You can use aggregated human behaviour to track flu epidemics.
http://www.google.org/flutrends/

More about Google Flu, in cartoon form.
http://www.google.com/trends/correlate/comic?p=9

A scholarly paper on tracking epidemics by Munro et al.
http://www.robertmunro.com/research/munro12epidemics.pdf

But could there be a limit to this approach?
http://slashdot.org/topic/bi/google-flu-trends-suggests-limits-of-crowdsourcing/