Do you know where your data is?

Language researchers can learn a lot from publicly available internet data. But what are the ethical issues surrounding the collection and use of this information? What about data that comes from home assistants? Does it …