Selling user data to artificial intelligence (AI) companies is simply mass surveillance under a new guise. People are rightfully worried about governmental mass surveillance. Yet most people are blithely unaware of the surveillance they sign up to when opening an account with a Web 2.0 company.
Harry Halpin is the CEO and co-founder of Nym Technologies.
Recently, we have all been forced to accept new “terms of service.” What most people don’t know is that these contracts allow their raw data to be sold to train AI models. The latest of these data heists is the deal between Reddit and Google, in which Reddit gives Google real-time access to its data for a reported $60 million.
This hits me personally. The late Aaron Swartz, Reddit’s co-founder, would be spinning in his grave if he knew of this deal.
Just as Soylent Green ended up being made of people, AI models are actually made of data created by humans. Every time you contribute data to a platform like Reddit or Instagram, the company captures and owns it. They can then sell it all under the conditions to which you have “consented.” Of course, no one reads these terms: they are long, tedious and often purposefully inscrutable.
Generative AI models compete on training data, and the more data the better. Yet some of this data may be copyrighted or even personal. No wonder companies like The New York Times are suing OpenAI. While it’s true that AI models retain only statistical representations of the data, the right prompt can elicit the actual underlying data itself, which can in turn reveal potentially private information.
A safer situation for everyone would be if AI companies trained only on publicly available data whose creators had given consent, and that consent can only be meaningful if users control their own data.
The real problem is that when you put data on social media sites like Reddit, your data becomes the product. So even though you create the data, you have no control over or ownership of it. By using the app, you have already legally “consented” to your own surveillance in order to enjoy the “free” privilege of using the platform.
The entire idea of Web3 was that users – not platforms – would own and control their data, even if, like a Reddit post, it is meant to be public. Ownership could be cryptographically inscribed in a decentralized blockchain so that no single platform could sell your data without your permission.
Sure, AI is exciting, yet we seem to have forgotten this vision in which users are remunerated for their own data. Reddit killed its tokenized community points program last October, but do we really want to throw this vision out the window just to welcome our new AI overlords?
Aaron Swartz, the co-founder of Reddit via a twisted history with Infogami and Y Combinator, was the greatest child prodigy of the internet generation. I knew him through his standards work on decentralizing social media with RSS and the Semantic Web at the World Wide Web Consortium at MIT, where I worked on WebCrypto and related standards.
Aaron was an incredibly kind and thoughtful programmer. He is best known for his push to open up government and research data to the public. Yet Aaron was also a staunch defender of personal privacy. He was interested in decentralizing WikiLeaks via his work on DeadDrop (later SecureDrop), and even in using a blockchain to decentralize domain names.
After selling Reddit, Aaron became convinced the future would require political change from inside the U.S. government. However, the very political system he hoped to reform drove him to suicide when the government brought charges carrying up to 50 years of imprisonment for using MIT computers to access and download a massive number of paywalled academic articles to share freely.
I suspect that, like me, Aaron would be personally excited by AI. I equally believe he would support a world where zero-knowledge proofs and mixnets defend citizens against government corruption and corporate overreach. He would want a world where publicly funded data is free to access and use, but where ordinary people can choose to protect and control their own data.
As the cypherpunks say: “Transparency for the powerful, privacy for the weak.”