ChatGPT can now ‘speak,’ listen and process images, OpenAI says

OpenAI’s ChatGPT has received its most significant update since the introduction of GPT-4, giving the chatbot the ability to “see, hear, and speak.” Users can now hold voice conversations in ChatGPT’s mobile app, choosing from five synthetic voices for the bot’s responses, and can share images with ChatGPT, highlighting specific areas for the model to analyze.

The new features are set to roll out to paying users over the following two weeks. Voice functionality will be limited to the iOS and Android apps, while image processing will be available on all platforms.

The update reflects the intensifying competition among major chatbot developers, including OpenAI, Microsoft, Google, and Anthropic. Tech giants are racing to embed generative AI into users’ daily lives by launching new chatbot apps and adding features like voice and image understanding.

Concerns have been raised about AI-generated synthetic voices, particularly their potential for creating convincing deepfakes. OpenAI addressed these concerns by stating that the synthetic voices were created with voice actors the company worked with directly, rather than collected from strangers. However, the announcement did not elaborate on how OpenAI would use consumer voice inputs or what security measures would protect them. The company’s terms of service state that consumers own their inputs “to the extent permitted by applicable law.”