Use AI to build a complete podcast episode - no mics required
We break down how you can build a podcast episode entirely with AI tools. Get ready for everything to change.
I was in Athens earlier this year. Once I arrived I wasn’t entirely sure why I was there. Ostensibly I was meant to be covering an assassination that had happened two years previously, but quickly I was told that there was a reason this story hadn’t been covered. It’s the only time I have been explicitly told not to cover a story for my safety and actually taken that advice.
So, for the next couple of months I walked around the city, hung out with great journalists and ate a crazy amount of feta. But I was also shaken by the encroaching world of AI and slightly panicked about where media was going. I hunkered down in my Airbnb and started talking to ChatGPT - the world’s new best friend.
This was back in April 2023. Since then much has been written and said about AI and what it will mean for every industry, including media. But I had a dream of building an application that could produce a podcast episode from scratch without even needing to record a human voice. I imagined a version of this podcast for Athens, one for Johannesburg, another for London… the day’s news produced by robots and voiced synthetically.
I spoke to ChatGPT more than anyone else that month. I used Google’s Colab to code (in Python) and had a repetitive, occasionally frustrating and generally exhilarating time.
I managed to get a working script together that would spit out a complete ten-minute podcast episode recounting the daily news from Johannesburg, in a discussion format between three trained imaginary voices.
I had the idea that one of the presenters would always be predicting what would come next in the news - so after the two main hosts recounted the details of a story they would pass it to the third host for his predictions of where the story would go. This was because ChatGPT was stronger at creating imaginary narratives than pumping out facts.
The program spoke to Google News, found the top stories of the day and scraped prominent sites for material. ChatGPT couldn’t produce a script from straight website text, so I first had to ask it to convert the news articles into lists of facts and then build a script from the facts. The difficulty was then pairing each host with a different synthetic voice. So, I needed to break the script up into its different lines of dialogue and then send each line to its appropriate synthetic voice emulator. You then pull all the lines back as dozens of small MP3s, stitch them all back together and spit it out as a complete MP3.
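For the coders reading, here is a minimal Python sketch of that splitting-and-routing step. It assumes the script comes back with one line of dialogue per row in a `Name: text` format. The host names, the voice IDs and the `synthesize` stub are all hypothetical stand-ins, not the code or the voice service behind the actual tool.

```python
import re

# Hypothetical host names and voice IDs, for illustration only.
VOICES = {"Thandi": "voice_a", "Sipho": "voice_b", "Max": "voice_c"}

# Matches lines shaped like "Name: something they say".
LINE_PATTERN = re.compile(r"^\s*([A-Za-z]+)\s*:\s*(.+?)\s*$")


def split_script(script):
    """Break a script into (host, text) pairs, skipping non-dialogue lines."""
    lines = []
    for raw in script.splitlines():
        match = LINE_PATTERN.match(raw)
        if match and match.group(1) in VOICES:
            lines.append((match.group(1), match.group(2)))
    return lines


def synthesize(text, voice_id):
    """Stub standing in for a real text-to-speech call; returns fake audio bytes."""
    return f"[{voice_id}] {text}\n".encode("utf-8")


def build_episode(script):
    """Send each line to its host's voice and join the clips in script order."""
    clips = [synthesize(text, VOICES[host]) for host, text in split_script(script)]
    return b"".join(clips)


SAMPLE = """Thandi: Welcome to the show.
Sipho: Here is today's top story.
(music)
Max: And here is what I think happens next."""

episode = build_episode(SAMPLE)
```

In a real pipeline the stub would call the voice service, and the returned MP3 clips would be joined with an audio library rather than by gluing bytes together.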
There were a few snags. The first was that the cost of synthetic voices (charged per character) was too high. I had envisioned a dystopian factory of a hundred podcast episodes a day - no one even listening to the content before they were published. This was not going to be possible. The second was speed: producing an episode was never just a few clicks. You sometimes had to send the prompts back to ChatGPT multiple times to get a coherent script. I had budgeted two minutes per episode, but it was closer to ten (which is still impressive for ten minutes of scripted content). The third snag was that the podcast episode was unforgivingly boring… this last one, for someone who creates podcasts, was almost a relief. However, the “chatter” among the three hosts was unnervingly authentic and the audio sounded incredibly real. I knew there was huge potential, but it wasn’t a fix-all solution.
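To see why per-character pricing sinks the hundred-episodes-a-day idea, here is a rough back-of-the-envelope sketch in Python. The script length and the rate are placeholder assumptions chosen only to show the arithmetic. They are not the real prices or measurements from my experiment, so swap in your own numbers.

```python
def daily_voice_cost(chars_per_episode, price_per_1000_chars, episodes_per_day):
    """Estimate daily text-to-speech spend under simple per-character pricing."""
    cost_per_episode = chars_per_episode / 1000 * price_per_1000_chars
    return cost_per_episode * episodes_per_day


# Placeholder inputs, not real prices or measured script lengths.
CHARS_PER_EPISODE = 9_000      # assumed length of a ten-minute script
PRICE_PER_1000_CHARS = 0.25    # assumed rate, in whatever currency you are billed in

one_a_day = daily_voice_cost(CHARS_PER_EPISODE, PRICE_PER_1000_CHARS, 1)
hundred_a_day = daily_voice_cost(CHARS_PER_EPISODE, PRICE_PER_1000_CHARS, 100)

print(f"One episode a day: {one_a_day:.2f}")
print(f"A hundred episodes a day: {hundred_a_day:.2f}")
```

The cost scales linearly with both script length and episode count, so whatever one episode costs, the factory version costs a hundred times that, every day.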
I eventually left Athens. I booked a flight to Cape Town (where I went to High School and University) and have settled back here. It’s been five months and since then AI has exploded. I've started Develop AI and you can already listen to a podcast episode developed with the AI tool described in this letter using this newsletter as a basis. We will be releasing this app to the public soon so you can produce your own automated podcast episode.
We already have a WhatsApp disinformation bot called Uni (that I talked about in the last newsletter). Try it out and then join our WhatsApp Community and tell us what you think.
A version of this article also appeared in The Podcast Sessions.
This week’s AI tool for journalists to use…
Eleven Labs is by far the best AI text-to-voice service on the market (I used it in my podcast AI tool described above). And you need surprisingly little audio to clone your own voice and use it in your daily life. However, Microsoft and OpenAI have swiftly entered the fray by giving ChatGPT ears and a mouth this week. Listen to samples here.
Coding Corner (the gradual process of a journalist learning how to code)
This is a little route one, but Google’s Colab is the place to start when coding.
The advantages:
1) No setup required: You can run your Python code directly in your browser.
2) Easy sharing: It's integrated with Google Drive so you can share your work with others and access your notebook from any device.
3) Support for various libraries and frameworks: Colab supports TensorFlow, PyTorch, Keras, and OpenCV. This makes it perfect for Machine Learning and Data Analysis tasks.
To read about Colab’s disadvantages visit our coding blog.
What AI was used in creating this newsletter?
This week is a moderate win for the humans. AI was used for the photo (created with a free service from Bing) and to summarise the Coding Corner section.
In the news…
Spotify is starting a service that will translate podcast episodes and spit them out in the language of your choice BUT keep them in the original voice of the host. I’m a huge fan of The Ringer Podcast Network and Bill Simmons (who sold to Spotify in 2020 and consulted on this process). Here are the examples done so far. It is absolutely bananas.
A questionably optimistic report was published this week by Boston Consulting Group and Microsoft (who own a chunk of OpenAI) which concluded that AI “can address some of South Africa’s most pressing challenges” and “transform the lives of South African citizens”. Expect more of this type of biased pandering from the people profiting from this tech in the months to come.
What’s new at Develop AI? We are speaking on a panel at Jamfest in Johannesburg
I am going to Jamfest (an African media and information festival). I was part of the Jamlab accelerator program back in 2017 so I know these folk well (and really respect what they do). I’ll be on a panel talking about best practices of AI in Journalism on the 17th of October at Tshimologong in Braamfontein.
Get in touch (by replying to this mail) if you would like to chat to us about training workshops or building you a chatbot or AI solution for your newsroom.
See you next week. All the best,
Join our WhatsApp community here, visit our website or contact us on X, Threads, LinkedIn or Instagram. Physically we are based in Cape Town, South Africa.
Howzit from Copenhagen @Paul!
I was wondering about the audio in the podcast - it seems to sound distorted most of the time and the levels go up and down both between hosts and during clips. Do you know why that is? It seems weird that the audio would not be perfectly levelled when it is generated?