Do you know where your data is?

Language researchers can learn a lot from publicly available internet data. But what are the ethical issues surrounding the collection and use of this information? What about data that comes from home assistants? Does it matter if the data is anonymised?

We’re talking to two researchers who know their way around these issues on this episode of Talk the Talk.


Listen to this episode

Download this episode

You can listen to all the episodes of Talk the Talk by pasting this URL into your podlistener.

http://danielmidgley.com/talkthetalk/talk_classic.xml

Full interview

Interview with Hannah Rashkin and Maarten Sap (complete)

Hannah Rashkin and Maarten Sap are students of Natural Language Processing from the University of Washington. They’ve been working on big projects we all use, and they bring unique insight into the ethical responsibilities that come with crowdsourced data.

What conversations are researchers having about ethics in big data? Is there anything we should be doing? Daniel Midgley asks all this and more.

Thanks to Hannah, Maarten, Emily Bender, and the Association for Computational Linguistics for making this interview possible.

Also at https://www.patreon.com/posts/21735806


Cutting Room Floor

Cutting Room Floor 340: Ethics

some NSFW language

Daniel and Hedvig are talking about babies and child language. And how do people wave in Italy and Greece?

Belief in Santa Claus has an elaborate support network in the Netherlands. And we nerd out on Washingtonian speaking patterns.

Most importantly, will Hedvig ever get to program a song for the show? Maybe one day, but she can definitely slip a few Swedish words into our Words of the Week.

Also at https://www.patreon.com/posts/21816232


Patreon supporters

When we see our list of patrons, it feels like “There are now 87 people who believe in you, and are interested in what you’re doing.” It’s the kind of thing that keeps you going.

We’d like to thank these patrons, especially.

  • Jerry
  • Nicki
  • Termy
  • Ann
  • Helen
  • Jack
  • Matt
  • Sabrina

Thanks, all.

We’re Because Language now, and you can become a Patreon supporter!
Depending on your level, you can get bonus episodes, mailouts, and shoutouts, come to live episodes, and of course join our Discord community.

Become a Patron!

Show notes

A little labeling goes a long way: Infants can use a few labeled examples to spark the acquisition of object categories — ScienceDaily
https://www.sciencedaily.com/releases/2018/09/180919133006.htm

A little labeling goes a long way: Semi‐supervised learning in infancy [$$$]
https://onlinelibrary.wiley.com/doi/abs/10.1111/desc.12736

Study: Infants Use Same Gestures as Chimpanzees | Biology | Sci-News.com
http://www.sci-news.com/biology/infants-gestures-06405.html

Wave (gesture) – Wikipedia
https://en.wikipedia.org/wiki/Wave_(gesture)#African_culture

Neural Coreference – Hugging Face
https://huggingface.co/coref/

State-of-the-art neural coreference resolution for chatbots
https://medium.com/huggingface/state-of-the-art-neural-coreference-resolution-for-chatbots-3302365dcf30

Mitkov: Anaphora Resolution: The State Of The Art (PDF)
https://pdfs.semanticscholar.org/e782/00b1e3ba2a72de1ca9b9b2c5efa775151bfa.pdf

I will not die, not yet, my friend – TranslationParty
http://www.translationparty.com/i-will-not-die-not-yet-my-friend-13404900

Analysis of billions of Twitter words reveals how American English develops
https://phys.org/news/2018-09-analysis-billions-twitter-words-reveals.html

New Approaches to Ethno-Linguistic Maps
http://humans-who-read-grammars.blogspot.com/2017/06/new-approaches-to-ethno-linguistic-maps.html

Siri’s Dead Body Joke Is Evidence in a Murder Trial
https://mashable.com/2014/08/13/siri-hide-body-murder-trial/

Siri Won’t Help You Hide Your Victims’ Bodies Anymore – ANIMAL
http://animalnewyork.com/2014/siri-will-longer-help-hide-bodies-murdered-roommates/

Your Tweets Are Somehow Worthy Of Scientific Study | FiveThirtyEight
https://fivethirtyeight.com/features/your-tweets-are-somehow-worthy-of-scientific-study/

“Participant” Perceptions of Twitter Research Ethics – Casey Fiesler, Nicholas Proferes, 2018
http://journals.sagepub.com/doi/10.1177/2056305118763366

Many social media users unaware researchers study their data
https://phys.org/news/2018-04-social-media-users-unaware.html

Ethics in machine-learning, natural language processing, and AI
https://medium.com/@TSchnoebelen/ethics-in-machine-learning-natural-language-processing-and-ai-609277a66c01

The Ethical Challenges of Publishing Twitter Data for Research Dissemination
https://dl.acm.org/citation.cfm?id=3091489

Ahmed, W., Bath, P. and Demartini, G. (2017) Chapter 4 Using Twitter as a Data Source: An Overview of Ethical, Legal, and Methodological Challenges. (PDF)
http://eprints.whiterose.ac.uk/126729/8/Normal_-Ethics_Book_Chapter_WA_PB_GD_Peer_Review_comments_implemented__1.pdf

Data journalism and the ethics of publishing Twitter data | News & Analysis | Data Driven Journalism
http://datadrivenjournalism.net/news_and_analysis/data_journalism_and_the_ethics_of_publishing_twitter_data

Proposed guidelines for the ethical use of Twitter for research. | Download Scientific Diagram
https://www.researchgate.net/figure/Proposed-guidelines-for-the-ethical-use-of-Twitter-for-research_fig1_266813208

Once Again With Feeling: ‘Anonymized’ Data Isn’t Really Anonymous | Techdirt
https://www.techdirt.com/articles/20170803/14480037916/once-again-with-feeling-anonymized-data-isnt-really-anonymous.shtml

Your ‘anonmyized’ web browsing history may not be anonymous — ScienceDaily
https://www.sciencedaily.com/releases/2017/01/170119134540.htm

Kate Manne: Brett Kavanaugh and America’s ‘Himpathy’ Reckoning
https://www.nytimes.com/2018/09/26/opinion/brett-kavanaugh-hearing-himpathy.html

Ford v Kavanaugh: an American horror story on live TV | Suzanne Moore | Opinion | The Guardian
https://www.theguardian.com/commentisfree/2018/sep/28/brett-kavanaugh-christine-blasey-ford-senate-supreme-court

The Women Who Confronted Jeff Flake In An Elevator Spoke Up About Why They Did It
https://www.bustle.com/p/the-women-who-confronted-jeff-flake-in-elevator-spoke-up-about-why-they-did-it-12098355

https://twitter.com/mgallagher822/status/1045742922291970052

The resistance to Donald Trump is not what you think – The Globe and Mail
https://www.theglobeandmail.com/opinion/article-the-resistance-to-donald-trump-is-not-what-you-think/

MINERVA-II1: Images from the surface of Ryugu | Topics | JAXA Hayabusa2 project
http://www.hayabusa2.jaxa.jp/en/topics/20180927e_MNRV/

Hayabusa Sends Back Photos and Video From Surface of Asteroid Ryugu – ExtremeTech
https://www.extremetech.com/extreme/277854-hayabusa-sends-back-photos-and-video-from-the-surface-of-asteroid-ryugu

Touchdown: Hayabusa 2 Deploys Rovers to Explore Ryugu – Sky & Telescope
https://www.skyandtelescope.com/astronomy-news/touchdown-hayabusa-2-deploys-rovers-explore-ryugu/


Transcript

We’re working our way back through the archives. If you think we should prioritise a transcript of this episode, let us know!