AI & Data Incest - What is the Solution?
PLUS How Kenyan journalists are moving forward with AI in a HUGE way
My brother is a biologist, and early on in the recent AI revolution he said that AI models are in danger of going the way of a small cult whose members only sleep with their cousins: total incestuous breakdown. He predicted that the Internet would become so clogged with AI-generated content that eventually the models would be fed only their own “inhuman” content and seize up. Just over a year later, this is rumoured to be why ChatGPT 5 has been postponed: the model is scraping mostly AI content, and that is no good for progressing it.
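The collapse my brother predicted can be sketched with a toy simulation: treat a “model” as a simple Gaussian fitted to data, then train each new generation only on samples drawn from the previous generation’s model. This is an illustrative sketch, not how language models are actually trained, and the numbers are arbitrary assumptions; but it shows how refitting on your own output steadily destroys variety.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "model": a Gaussian fitted to data. Generation 0 is "trained"
# on human data; every later generation is trained only on samples
# produced by the generation before it.
mu, sigma = 0.0, 1.0   # generation 0
n = 100                # synthetic samples per generation

history = [sigma]
for generation in range(500):
    samples = rng.normal(mu, sigma, n)          # model-generated "content"
    mu, sigma = samples.mean(), samples.std()   # refit on its own output
    history.append(sigma)

# The fitted spread (diversity of the "content") shrinks generation
# after generation, rather than holding steady.
print(f"std after 0 generations:   {history[0]:.3f}")
print(f"std after 500 generations: {history[-1]:.3f}")
```

Each refit slightly underestimates the spread of the data, and those small losses compound, so the model’s “diversity” drifts toward zero, which is the statistical version of the incestuous breakdown described above.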
I envision a special kind of dystopia where we have farms of people on terminals not connected to the Internet, creatively writing for the models. They could write whatever they wanted, as the content would only be used in the abstract, to train the new models. It is like in high school when you were asked to write so many pages or words on anything you liked as a punishment. And working in the farm would actually be a great job: the more creative your writing (judged succinctly by the AI model), the more you would get paid per hour. This would stop people just phoning it in and writing the same sentence again and again.
The other consequence of this “data incest” problem is that OpenAI will need to invent a tool that properly detects AI content (a range of such tools is already available and featured below). Currently, these tools don’t work very well, but how else is the new ChatGPT going to sniff out fresh content unless OpenAI creates one that works? And that magic tool could be licensed out to academics and high schools in their pursuit of policing their students.
On that issue, academia is going to need to meet its students halfway. It can’t ban the use of AI completely, but it might be able to introduce a GPT powered by the course materials, one that is accurate and possible to monitor behind the scenes, which students would be allowed to use. A score could be given for how intelligently a student interacts with the bot, and if they used AI outside of it, they would be punished. Students would be motivated to stick with the in-house bot because it would actually give better answers on what they needed to know (if any academics would like help building a model like this, get in touch).
The larger issue of how we guide these models to help us learn and become better people is complicated. When I was 11 years old, we had Encarta Encyclopedia on CD-ROM. We had recently moved to South Africa, and no teacher at our primary school had heard of it. For every project I copied and pasted from Encarta and got incredible marks (shout-out to Mr Jakins, who was completely clueless). A year later the teachers became savvy; I was never caught, but general warnings were issued and I went back to writing the projects myself. My point is: we are in an Encarta bubble, and it isn’t on the kids to figure out this problem… and, let’s face it, the skills they are learning in how to work the system are probably more valuable than the facts they were meant to be learning anyway.
This week’s AI tools for people to use… detecting AI with AI
Here is a selection of AI detection tools that I have been testing. I wish I had better news: they don’t yet work well enough for us to “trust” them, but it is good to know what is out there.
DEEPFAKE-O-METER. A platform that claims to be able to detect deepfakes.
GPTZero. The guys who claim to have invented AI detection. They still have a long way to go.
Deepware Scanner. Claims to detect AI-generated images, videos, and deepfakes. It uses advanced algorithms to identify manipulated media.
Sensity. Focuses on detecting deepfakes and other AI-generated media that could be used for malicious purposes. It is also very hit and miss.
3 Lessons on AI from journalists in Kenya
I recently conducted two AI workshops for The Nation Media Group and Royal Media Services in Kenya with the incredible Camilla Bath. Special thanks to The U.S. Agency for Global Media for hosting us. I was there to teach, but I learnt a great deal. Here are my top 3 takeaways:
There is a concern that if media houses disclose that they are using AI, their content will be associated with being “fake”. “AI” has already become interchangeable with “untrue” in the culture, and though we want the media to be transparent about how they are using AI, it is reasonable to worry about how audiences will take it.
There is a huge need to use AI for African language translation and to collaborate as much as possible to make these models a reality.
The phenomenon of cloned voices, particularly in the context of the recent protests in Kenya, needs to be kept in check. The best way to do this is to educate audiences about how far this tech has come and how it is being used to trick people. Families are also adopting “safe words” to use in emergencies; if you don’t hear that word, you should assume that the voice note you received is synthetic.
What AI was used in creating this newsletter?
I have finally migrated to Midjourney. I used it to create the picture above, and though it is fiddly to run through Discord (and there is no free tier), I am never going back.
For $5 a month you can help support this newsletter (and keep it free for everyone).
What is happening at Develop AI?
Develop AI will be at Radiodays Asia in September to give a comprehensive workshop on AI and podcasting. See you there!
See you next week. All the best,
Develop AI is an innovative company that builds AI-focused projects, reports on AI and provides training on how to use AI responsibly.
Check out Develop AI’s press and conference appearances.
Look at our training workshops (and see how your team could benefit from being trained in using AI).
This newsletter is syndicated to millions of people on the Daily Maverick.
Email me directly on paul@developai.co.za. Or find me on our WhatsApp Community.
Follow us on TikTok / LinkedIn / Facebook. Or visit the website.
Physically we are based in Cape Town, South Africa.
Our sister company Develop Audio creates investigative podcasts like Asylum.