Do you know where your data is?
Language researchers can learn a lot from publicly available internet data. But what are the ethical issues surrounding the collection and use of this information? What about data that comes from home assistants? Does it matter if the data is anonymised?
We’re talking to two researchers who know their way around these issues on this episode of Talk the Talk.
Listen to this episode
You can listen to all the episodes of Talk the Talk by pasting this URL into your podlistener.
http://danielmidgley.com/talkthetalk/talk_classic.xml
Full interview
Hannah Rashkin and Maarten Sap are students of Natural Language Processing from the University of Washington. They’ve been working on big projects we all use, and they come with a unique insight into the ethical responsibilities that come with crowdsourced data.
What conversations are researchers having about ethics in big data? Is there anything we should be doing? Daniel Midgley asks all this and more.
Thanks to Hannah, Maarten, Emily Bender, and the Association for Computational Linguistics for making this interview possible.
Also at https://www.patreon.com/posts/21735806
Cutting Room Floor
some NSFW language
Daniel and Hedvig are talking about babies and child language. And how do people wave in Italy and Greece?
Belief in Santa Claus has an elaborate support network in the Netherlands. And we nerd out on Washingtonian speaking patterns.
Most importantly, will Hedvig ever get to program a song for the show? Maybe one day, but she can definitely slip a few Swedish words into our Words of the Week.
Also at https://www.patreon.com/posts/21816232
Patreon supporters
When we see our list of patrons, it feels like “There are now 87 people who believe in you, and are interested in what you’re doing.” It’s the kind of thing that keeps you going.
We’d like to thank these patrons, especially.
- Jerry
- Nicki
- Termy
- Ann
- Helen
- Jack
- Matt
- Sabrina
Thanks, all.
We’re Because Language now, and you can become a Patreon supporter!
Depending on your level, you can get bonus episodes, mailouts, and shoutouts, come to live episodes, and of course have membership in our Discord community.
Show notes
A little labeling goes a long way: Infants can use a few labeled examples to spark the acquisition of object categories — ScienceDaily
https://www.sciencedaily.com/releases/2018/09/180919133006.htm
A little labeling goes a long way: Semi‐supervised learning in infancy [$$$]
https://onlinelibrary.wiley.com/doi/abs/10.1111/desc.12736
Study: Infants Use Same Gestures as Chimpanzees | Biology | Sci-News.com
http://www.sci-news.com/biology/infants-gestures-06405.html
Wave (gesture) – Wikipedia
https://en.wikipedia.org/wiki/Wave_(gesture)#African_culture
Neural Coreference – Hugging Face
https://huggingface.co/coref/
State-of-the-art neural coreference resolution for chatbots
https://medium.com/huggingface/state-of-the-art-neural-coreference-resolution-for-chatbots-3302365dcf30
Mitkov: Anaphora Resolution: The State Of The Art (PDF)
https://pdfs.semanticscholar.org/e782/00b1e3ba2a72de1ca9b9b2c5efa775151bfa.pdf
I will not die, not yet, my friend – TranslationParty
http://www.translationparty.com/i-will-not-die-not-yet-my-friend-13404900
Analysis of billions of Twitter words reveals how American English develops
https://phys.org/news/2018-09-analysis-billions-twitter-words-reveals.html
New Approaches to Ethno-Linguistic Maps
http://humans-who-read-grammars.blogspot.com/2017/06/new-approaches-to-ethno-linguistic-maps.html
Siri’s Dead Body Joke Is Evidence in a Murder Trial
https://mashable.com/2014/08/13/siri-hide-body-murder-trial/
Siri Won’t Help You Hide Your Victims’ Bodies Anymore – ANIMAL
http://animalnewyork.com/2014/siri-will-longer-help-hide-bodies-murdered-roommates/
Your Tweets Are Somehow Worthy Of Scientific Study | FiveThirtyEight
https://fivethirtyeight.com/features/your-tweets-are-somehow-worthy-of-scientific-study/
“Participant” Perceptions of Twitter Research Ethics – Casey Fiesler, Nicholas Proferes, 2018
http://journals.sagepub.com/doi/10.1177/2056305118763366
Many social media users unaware researchers study their data
https://phys.org/news/2018-04-social-media-users-unaware.html
Ethics in machine-learning, natural language processing, and AI
https://medium.com/@TSchnoebelen/ethics-in-machine-learning-natural-language-processing-and-ai-609277a66c01
The Ethical Challenges of Publishing Twitter Data for Research Dissemination
https://dl.acm.org/citation.cfm?id=3091489
Ahmed, W., Bath, P. and Demartini, G. (2017) Chapter 4 Using Twitter as a Data Source: An Overview of Ethical, Legal, and Methodological Challenges. (PDF)
http://eprints.whiterose.ac.uk/126729/8/Normal_-Ethics_Book_Chapter_WA_PB_GD_Peer_Review_comments_implemented__1.pdf
Data journalism and the ethics of publishing Twitter data | News & Analysis | Data Driven Journalism
http://datadrivenjournalism.net/news_and_analysis/data_journalism_and_the_ethics_of_publishing_twitter_data
Proposed guidelines for the ethical use of Twitter for research. | Download Scientific Diagram
https://www.researchgate.net/figure/Proposed-guidelines-for-the-ethical-use-of-Twitter-for-research_fig1_266813208
Once Again With Feeling: ‘Anonymized’ Data Isn’t Really Anonymous | Techdirt
https://www.techdirt.com/articles/20170803/14480037916/once-again-with-feeling-anonymized-data-isnt-really-anonymous.shtml
Your ‘anonmyized’ web browsing history may not be anonymous — ScienceDaily
https://www.sciencedaily.com/releases/2017/01/170119134540.htm
Kate Manne: Brett Kavanaugh and America’s ‘Himpathy’ Reckoning
https://www.nytimes.com/2018/09/26/opinion/brett-kavanaugh-hearing-himpathy.html
Ford v Kavanaugh: an American horror story on live TV | Suzanne Moore | Opinion | The Guardian
https://www.theguardian.com/commentisfree/2018/sep/28/brett-kavanaugh-christine-blasey-ford-senate-supreme-court
The Women Who Confronted Jeff Flake In An Elevator Spoke Up About Why They Did It
https://www.bustle.com/p/the-women-who-confronted-jeff-flake-in-elevator-spoke-up-about-why-they-did-it-12098355
The resistance to Donald Trump is not what you think – The Globe and Mail
https://www.theglobeandmail.com/opinion/article-the-resistance-to-donald-trump-is-not-what-you-think/
MINERVA-II1: Images from the surface of Ryugu | Topics | JAXA Hayabusa2 project
http://www.hayabusa2.jaxa.jp/en/topics/20180927e_MNRV/
Hayabusa Sends Back Photos and Video From Surface of Asteroid Ryugu – ExtremeTech
https://www.extremetech.com/extreme/277854-hayabusa-sends-back-photos-and-video-from-the-surface-of-asteroid-ryugu
Touchdown: Hayabusa 2 Deploys Rovers to Explore Ryugu – Sky & Telescope
https://www.skyandtelescope.com/astronomy-news/touchdown-hayabusa-2-deploys-rovers-explore-ryugu/
Transcript
We’re working our way back through the archives. If you think we should prioritise a transcript of this episode, let us know!