Providence: The voice Alexis “Lexi” Bogan had before last summer was exuberant. She loved singing Taylor Swift and Zach Bryan ballads in the car. She laughed all the time, even while she corralled misbehaving preschoolers or debated politics with friends around a backyard campfire. In high school, she was a soprano in the choir.
Then that voice disappeared.
In August, doctors removed a potentially deadly tumor lodged near the back of her brain. When the breathing tube came out a month later, Bogan had trouble swallowing and struggled to greet her parents. Months of rehabilitation helped her recovery, but her speech is still affected. Friends, strangers, and her own family struggle to understand what she is trying to tell them.
In April, the 21-year-old regained her old voice. It is not the real one, but a voice clone generated by artificial intelligence that she can summon from a phone app. Trained on a 15-second time capsule of her teenage voice, sourced from a cooking demonstration video she recorded for a high school project, her synthetic but surprisingly real-sounding AI voice can now say almost anything she wants.
She types a few words or sentences into her phone and the app instantly reads them aloud.
“Hi, could I have an espresso shaken with oat milk and iced brown sugar?” Bogan’s AI voice said as she held her phone out her car window at a Starbucks drive-thru.
Experts have warned that rapidly improving AI voice cloning technology can amplify phone scams, disrupt democratic elections and violate the dignity of people, living or dead, who never consented to having their voice recreated to say things they never said.
It has been used to produce fake robocalls to New Hampshire voters imitating President Joe Biden. In Maryland, authorities recently charged a high school athletic director with using artificial intelligence to generate a fake audio clip of the school principal making racist comments.
But Bogan and a team of doctors at Lifespan Hospital Group in Rhode Island believe they have found a use that justifies the risks. Bogan is one of the first people, and the only one with her condition, to have recreated a lost voice with OpenAI’s new Voice Engine. Some other AI providers, such as startup ElevenLabs, have tested similar technology for people with speech impairments and loss, including a lawyer who now uses her voice clone in the courtroom.
“We expect Lexi to be a pioneer as the technology develops,” said Dr. Rohaid Ali, a neurosurgery resident at Brown University School of Medicine and Rhode Island Hospital. Millions of people with debilitating strokes, throat cancer or neurodegenerative diseases could benefit, he said.
“We must be aware of the risks, but we cannot forget about the patient and the social good,” said Dr. Fatima Mirza, another resident working on the pilot project. “We can help Lexi regain her true voice and she can speak in terms truer to herself.”
Mirza and Ali, who are married, caught the attention of OpenAI, creator of ChatGPT, due to their previous research project at Lifespan that used the AI chatbot to simplify medical consent forms for patients. The San Francisco company reached out earlier this year while searching for promising medical applications for its new AI speech generator.
Bogan was still slowly recovering from surgery. The illness began last summer with headaches, blurred vision and a droopy face, alarming doctors at Hasbro Children’s Hospital in Providence. They discovered a vascular tumor the size of a golf ball pressing on her brainstem and entangling blood vessels and cranial nerves.
“It was a battle to control the bleeding and remove the tumor,” said pediatric neurosurgeon Dr. Konstantina Svokos.
The 10-hour length of the surgery, along with the location and severity of the tumor, damaged Bogan’s tongue muscles and vocal cords, impeding her ability to eat and speak, Svokos said.
“It’s almost like a part of my identity was taken away from me when I lost my voice,” Bogan said.
The feeding tube came out this year. Speech therapy continues, allowing her to speak intelligibly in a quiet room, but there is no sign she will regain the full lucidity of her natural voice.
“At some point, I started to forget what it sounded like,” Bogan said. “I’ve been getting really used to how I sound now.”
Every time the phone rang at the family home in North Smithfield, a suburb of Providence, she would hand it to her mother to take her calls. She felt like she was burdening her friends every time they went to a loud restaurant. Her father, who has hearing loss, struggled to understand her.
Back at the hospital, doctors were looking for a pilot patient to experiment with the OpenAI technology.
“The first person that came to mind for Dr. Svokos was Lexi,” Ali said. “We reached out to Lexi to see if she would be interested, not knowing what her response would be. She was willing to try it and see how it worked for her.”
Bogan had to go back a few years to find a suitable recording of her voice to “train” the artificial intelligence system how to speak. It was a video in which she explained how to make a pasta salad.
Her doctors intentionally fed the AI system just a 15-second clip. Kitchen sounds made other parts of the video imperfect. It was also all that OpenAI needed: an improvement over previous technology that required much larger samples.
They also knew that getting something useful in 15 seconds could be vital for any future patients who have no trace of their voice on the Internet. A brief voice message left for a family member may be sufficient.
When they tried it for the first time, everyone was stunned by the quality of the voice clone. The occasional glitches (a mispronounced word, a missing intonation) were mostly imperceptible. In April, doctors equipped Bogan with a personalized phone app that only she can use.
“I get so emotional every time I hear her voice,” said her mother, Pamela Bogan, with tears in her eyes.
“I think it’s great to be able to have that sound again,” added Lexi Bogan, saying that it helped “boost my confidence to where I was before all this happened.”
She now uses the app about 40 times a day and provides feedback that she hopes will help future patients. One of her first experiments was talking to the children at the preschool where she works as an assistant teacher. She typed “ha ha ha,” expecting a robotic response. To her surprise, it sounded like her old laugh.
She has used it at Target and Marshall’s to ask where to find items. It has helped her reconnect with her father. And she has found it easier to order fast food.
Bogan’s doctors have begun cloning the voices of other willing patients in Rhode Island and hope to bring the technology to hospitals around the world. OpenAI said it is moving forward cautiously in expanding use of Voice Engine, which is not yet publicly available.
Several smaller AI startups already sell voice cloning services to entertainment studios or make them more widely available. Most speech generation providers say they prohibit spoofing or abuse, but they vary in how they enforce their terms of use.
“We want to make sure that everyone whose voice is used in the service continually gives consent,” said Jeff Harris, product lead at OpenAI. “We want to make sure it’s not used in political contexts. That’s why we’ve adopted a strategy of being very limited in who we give the technology to.”
Harris said OpenAI’s next step involves developing a secure “voice authentication” tool so users can replicate only their own voice. That could be “limiting for a patient like Lexi, who had a sudden loss of her speech ability,” he said. “So we think we’re going to need to have high-trust relationships, especially with medical providers, to give a little more unlimited access to the technology.”
Bogan has impressed her doctors with her focus on thinking about how the technology could help other people with similar or more severe speech impairments.
“Part of what she’s done throughout this whole process is think about ways to modify and change this,” Mirza said. “She has been a great inspiration to us.”
Although for now she has to manipulate her phone to get the voice engine to speak, Bogan envisions an AI voice engine that improves on older remedies for speech recovery, such as the robotic-sounding electrolarynx or a voice prosthesis, by merging with the body or translating words in real time.
She’s less sure what will happen as she gets older and her AI voice continues to sound as it did when she was a teenager. Perhaps the technology could “age” her AI voice, she said.
For now, “even though I haven’t fully recovered my voice, I have something to help me find it again,” she said. (AP) SPG