AI TTS in 2024 = Drum Machines in 1980


I was just thinking about how AI-generated TTS engines seem to have replaced voice-over artists. I can imagine a very similar discussion happening in 1980 when Roger Linn introduced his first drum machine.

Roger intended the Linn LM-1 to be a rhythm backing machine that sounded better than the available Wurlitzers for session guitarists to rehearse with, but people started recording it and putting it on what would turn out to be multi-platinum records. At first, there was a span of about two years where people went absolutely mad for this new piece of technology. It sounded like a live drummer, but it cost about half as much as a drummer would expect for their work. So studios loved it, because they could replace expensive instrumentalists with a computer. Drummers complained that they couldn't get any work because they'd been replaced by the LM-1, and so they started offering their services as LM-1 programmers. Obviously, the machine's sound had some glaring flaws, such as a lack of any crash cymbals, something of a machine-gun repetition effect on rolls, an inability to play ghost rolls and snare diddles, and various other esoteric technical things. It took almost as much work to get an LM-1 to sound like a real drummer as it would have taken to just use a real drummer.

This is exactly the same as what's currently happening with AI-generated voices. The first TTS engines were meant as screen readers and vocal replacements for people who couldn't speak, but as the technology improved, people started using them to record commercial spots and things for broadcast. Announcers can't get any work because they're getting replaced by somewhat natural-sounding TTS programs, so a lot of them are simply selling samples of their vocal patterns to AI startups. The programs have glaring flaws of their own, like mispronouncing words, an inability to rapidly shift tone of voice, no capacity for performance effects like whispering or producing character voices, and other esoteric technical things. It actually takes more work to get an AI TTS to sound like a human announcer than it would to just hire a human announcer.

Like I said, for about two years, everyone went mad for the Linn LM-1, and loads of imitators showed up over the course of the '80s: company after company putting out acoustically-sampled drum machines, each possessing its own set of flaws and even compounding the ones inherent in the LM-1. But, after a period of experimentation, a clear distinction formed between musical genres that would use drum machines and those that would not. Ultimately, the drum machine did not wholly obsolete human drummers.

This is already happening with AI TTS. Divisions are forming between studios that will only use AI voices and those that prefer humans. Corporate media is never going to hire another voice-over artist to announce their commercials; that's patently obvious. However, independent producers who don't have the money for a subscription to whatever online AI TTS service tend to just announce their stuff themselves, or, in the case of voice acting, they have friends who can talk and read from a script, and they pay them in pizza and Pepsi. AI TTS will not wholly obsolete human announcers.

--7 June 2024--
