Home » Tips & Tricks » Can ChatGPT Transcribe Audio?

Can ChatGPT Transcribe Audio?

What exactly are ChatGPT's transcription capabilities?

Updated: Jul 19, 2023 10:43 am
Can ChatGPT Transcribe Audio?

WePC is reader-supported. When you buy through links on our site, we may earn an affiliate commission. Prices subject to change. Learn more

ChatGPT, powered by the advanced multimodal models GPT-3.5 and GPT-4, has captivated users as a leading language model chatbot. But can it transcribe audio? The answer is yes. Using OpenAI‘s Whisper API, ChatGPT incorporates a Speech-to-Text function. Users can leverage the speech recognition algorithm to obtain corresponding text output by uploading audio files.

Transcription accuracy relies on factors such as audio quality and language complexity. With the ChatGPT API granting access to cutting-edge Whisper models, developers can harness state-of-the-art language and speech-to-text capabilities.

Early Users

ChatGPT and Whisper APIs have garnered attention from various companies and platforms.

Snap Inc., the creator of Snapchat, has introduced “My AI for Snapchat+” utilizing the ChatGPT API. This feature offers Snap chatters a customizable chatbot experience, providing recommendations and generating haikus.

Quizlet, a global learning platform, has collaborated with OpenAI for three years and now introduces “Q-Chat,” an adaptive AI tutor using the ChatGPT API to engage students with tailored questions based on study materials.

Instacart plans to launch “Ask Instacart” using ChatGPT and its AI and product data to provide inspirational and shoppable answers.

Shopify’s consumer app, Shop, employs ChatGPT API for its shopping assistant, offering personalized recommendations to users.

The Whisper API is also used by the AI-powered language learning app Speak to improve spoken fluency and deliver precise feedback.

ChatGPT API

The get-3.5-turbo model, the same model used in the ChatGPT product, is accessible through the ChatGPT API. Compared to earlier GPT models, it offers a more affordable alternative for $0.002 per 1,000 tokens. While GPT models traditionally process unstructured text, ChatGPT models work with sequences of messages and metadata. Chat Markup Language (ChatML) input is rendered as tokens for the model to consume. OpenAI has introduced a new endpoint for interacting with ChatGPT models, allowing developers to make requests and receive responses. To explore the capabilities of the ChatGPT API, detailed information can be found in the Chat guide.

OpenAI is dedicated to improving ChatGPT models for developers. The get-3.5-turbo model offers stability, and developers can choose specific versions. Dedicated instances provide more profound control over system performance, allowing developers to optimize their workload and reduce costs.

Whisper API

The Whisper API, with its speech-to-text capabilities, is now accessible and offers convenient on-demand access to the large-v2 model. It supports transcriptions and translations with faster performance compared to other services. Developers can use the provided endpoints and Python bindings to make requests. For dedicated instances or more information about the Whisper API, developers can head to the OpenAI website.

Developers Focus

Customers of OpenAI’s API have provided feedback that the company values and modifications have been made. Data provided over the API is no longer used without the user’s permission to improve the service. A 30-day minimum data retention period by default has been set up, with the option for a more extended period if necessary. The developer documentation has been revised, and the pre-launch review has been dropped.

The Terms of Service and Usage Policies make clear that users own the input and output of the models. Ensuring stability for use cases in production is the engineering team’s main objective.


Trusted Source

WePC’s mission is to be the most trusted site in tech. Our editorial content is 100% independent and we put every product we review through a rigorous testing process before telling you exactly what we think. We won’t recommend anything we wouldn’t use ourselves. Read more