People are biased. And computers learn from people.
That means our data is biased, and in a big data world, that can cause big problems.
But researchers are finding ways to turn down the bias in a dataset. We’re talking to two of them on this episode of Talk the Talk.
Listen to this episode
You can listen to all the episodes of Talk the Talk by pasting this URL into your podlistener.
http://danielmidgley.com/talkthetalk/talk_classic.xml
Promo
Full interview
A lot of attention has been focused lately on bias in big data. The short version: People are biased, data comes from people, so the data is biased. And that means that our computational tools may come up with answers that exclude people, marginalise people, or might be just plain wrong. So can we fix this?
Daniel caught up with Robyn Speer (ConceptNet) and Kai-Wei Chang (UCLA) at the 2018 conference of the Association for Computational Linguistics in Melbourne. They work on reducing bias in data, and they explain how it all works.
Thanks to Kai-Wei and Robyn for the chat, and thanks to the ACL for making this interview possible.
Also at https://www.patreon.com/posts/21301041
Cutting Room Floor
Hedvig gives us a brain teaser: In Swedish, adding “your” to anything makes it an insult: “your idiot” or even “your linguist”. But why?
There’s a bonus quiz for Kylie and Hedvig about which way certain words are biased. Who will prevail?
And Hedvig reveals a secret technique she uses for ferreting out her bias.
Also at https://www.patreon.com/posts/21463476
Patreon supporters
Our Patreon patrons are helping us make the show better — and keeping it ad-free and on the airwaves. They include:
- Jerry
- Nicki
- Termy
- Ann
- Helen
- Jack
- Matt
- Sabrina
Thanks to all our patrons! Your support means a lot.
We’re Because Language now, and you can become a Patreon supporter!
Depending on your level, you can get bonus episodes, mailouts, and shoutouts, come to live episodes, and of course join our Discord community.
Show notes
AI sucks at stopping online trolls spewing toxic comments
https://www.theregister.co.uk/2018/08/31/ai_toxic_comments/
Gröndahl et al.: All You Need is “Love”: Evading Hate Speech Detection (PDF)
https://arxiv.org/pdf/1808.09115.pdf
Lee, Cho, and Hofmann: Fully Character-Level Neural Machine Translation without Explicit Segmentation
https://arxiv.org/abs/1610.03017
Mehl et al.: Are Women Really More Talkative Than Men?
http://science.sciencemag.org/content/317/5834/82
Computational linguistics reveals pervasive gender bias in modern English novels
https://www.technologyreview.com/s/611820/computational-linguistics-reveals-pervasive-gender-bias-in-modern-english-novels/
ConceptNet Numberbatch 17.04: better, less-stereotyped word vectors
http://blog.conceptnet.io/posts/2017/conceptnet-numberbatch-17-04-better-less-stereotyped-word-vectors/
I Am Part of the Resistance Inside the Trump Administration
https://www.nytimes.com/2018/09/05/opinion/trump-white-house-anonymous-resistance.html
What is a lodestar, the word from The New York Times Op-Ed people can’t stop talking about?
https://www.usatoday.com/story/news/nation-now/2018/09/06/new-york-times-editorial-lodestar-defined/1210402002/
The @nytimes just published an anonymous op-ed from a “senior administration official.” I’d like to posit a guess as to who wrote it. Getting my @ashleyfeinberg on began with a single word that jumped out at me… https://t.co/ajS2JI8WH2
— Dan Bloom (@danbl00m) 5 September 2018
Language Log: Lodestar
http://languagelog.ldc.upenn.edu/nll/?p=39910
Etymonline: lodestar (n.)
https://www.etymonline.com/word/lodestar
Counsellors dismissed as ‘gender whisperers’ deny teachers have been trained to spot transgender children
https://www.smh.com.au/politics/federal/counsellors-dismissed-as-gender-whisperers-deny-teachers-have-been-trained-to-spot-transgender-children-20180905-p501zd.html
We Fact-Checked The Daily Telegraph’s Rubbish About “Gender Whisperers” And Trans Kids
http://junkee.com/scott-morrison-gender-whisperer/174136
Prime Minister’s ‘gender whisperer’ comments deeply offensive and divisive
https://www.news.com.au/lifestyle/parenting/school-life/prime-ministers-gender-whisperer-comments-deeply-offensive-and-divisive/news-story/ca6cfafce5a713d0e3deac56897b922a
Scott Morrison confronted by transgender child on The Project
https://www.news.com.au/entertainment/tv/current-affairs/scott-morrison-confronted-by-transgender-child-on-the-project/news-story/b896352bda24f8147934d5ecc906e3f0
The oleaginous Mike Pence, with his talent for toadyism and appetite for obsequiousness, could, Trump knew, become America’s most repulsive public figure.
George Will, Trump is no longer the worst person in government, Washington Post
George Will really doesn’t like ‘oleaginous’ Mike Pence, but he loves big words
https://www.marketwatch.com/story/george-will-really-doesnt-like-oleaginous-mike-pence-but-he-loves-big-words-2018-05-10
Fire Devastates Brazil’s Oldest Science Museum
https://www.nationalgeographic.com/science/2018/09/news-museu-nacional-fire-rio-de-janeiro-natural-history/
The irreplaceable scientific treasures lost in Brazil’s National Museum blaze
https://elpais.com/elpais/2018/09/07/inenglish/1536314750_865530.html
Brazil’s Museum Fire Proves Cultural Memory Needs A Digital Backup
https://www.wired.com/story/brazil-museum-fire-digital-archives/
Think the museum fire in Brazil can’t happen here? Think again
http://www.latimes.com/opinion/op-ed/la-oe-mccormack-brazil-museum-fire-funding-20180909-story.html
Transcript
We’re working our way back through the archives. If you think we should prioritise a transcript of this episode, let us know!