www.theguardian.com/commentisfree/2023/aug/19/the-world-has-a-big-appetite-for-ai-but-we-really-need-to-know-the-ingredients
We’ve unleashed powerful new technologies, and we don’t know their limits or how accurately they use our data for our benefit. We do know the downside of corporate control of our data: misinformation, lies and cultural divides. Social media is not neutral. European countries regulate the Corporations over violence, suicide, hate crimes and abuse of data. We can copy their Regulations. We have a mechanism: Section 230 of the Communications Decency Act.
Why would you want Corporations to control your data? Why would you want Corporations to control whether you get a loan, get school loans, get a job, see your health data? Shouldn’t you control your own data? That’s one argument.
The second argument: if Algorithms aren’t transparent, and the aggregators of the digital and written word aren’t transparent, how do we know when they’ve trespassed on property rights or whether they’re giving you correct information? The simple answer: we don’t. All algorithms must be transparent for Regulation and understanding.
Without Regulation, our elections are in jeopardy because we won’t be able to distinguish what’s real. The same is true of video reproductions. See Mission: Impossible. ChatGPT makes it hard to believe what you see.
That’s the whole point: do we want a Corporation to continue controlling our world and our choices, or do we take control?
#FightForDemocracy
#PoliticsAffectsUs
#CorporateAccountability
One of the oldest principles in computing is GIGO – garbage in, garbage out. It applies in spades to LLMs, in that they are only as good as the data on which they have been trained. But the AI companies are extremely tight-lipped about the nature of that training data. Much of it is obtained by web crawlers – internet bots that systematically browse the web. Up to now, ChatGPT and co have used the services of Common Crawl, a digital spider that traverses the web every month, collecting petabytes of data in the process and freely providing its archives and datasets to the public. But this training data inevitably includes large numbers of copyrighted works that are being hoovered up under “fair use” claims that may not be valid. So: to what extent have LLMs been trained on pirated material? We don’t know, and maybe the companies don’t either.
The same applies to the carbon footprint of these systems. At the moment we know three things about this. First, it’s big: in 2019 training an early LLM was estimated to emit 300,000 kg of CO2 – the equivalent of 125 round-trip flights between New York and Beijing; today’s models are much bigger. Second, companies rationalise these emissions by buying “offsets”, which are the contemporary equivalent of the medieval indulgences that annoyed Martin Luther. And third, the companies are pathologically secretive about the environmental costs of all this – as the distinguished AI researcher Timnit Gebru discovered.
There’s lots more where that came from, but the moral of the story is stark. We’re at a pivotal point in the human journey, having invented a potentially transformative technology. At its core are inscrutable machines owned by corporations that abhor transparency. We may be able to do little about the machines, but we can certainly do something about their owners. As the tech publisher Tim O’Reilly puts it: “Regulators should start by formalising and requiring detailed disclosure about the measurement and control methods already used by those developing and operating advanced AI systems.” They should. We need to know how these sausages are made.