
How is Chat GPT trained?

What exactly is Chat GPT learning from?

Updated: Feb 20, 2023 11:34 am


If you are familiar with ChatGPT, you may have heard that it is trained on a vast corpus of data. But what exactly does this mean? In this article, we will delve into the intricacies of how ChatGPT is trained.

ChatGPT is a pre-trained language model that has been fine-tuned through a combination of supervised and reinforcement learning techniques. Its training involved feeding a large amount of text data into the model and adjusting its parameters so that it generates text similar to the text in the training corpus.

An unsupervised learning approach was used for this stage, meaning the model was not given explicit feedback on whether its generated text was correct or incorrect. Instead, the model adjusted its parameters to maximize the likelihood of the text in the training corpus.

GPT-3, the parent model of ChatGPT, is one of the largest language models ever created, with 175 billion parameters and a 2,048-token context window. It was trained on hundreds of billions of words drawn from Common Crawl, WebText2, Books1, Books2, and English Wikipedia, as well as examples of code in CSS, JSX, Python, and other programming languages.
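To get a sense of what a 2,048-token context window means in practice, here is a minimal Python sketch using OpenAI's open-source tiktoken tokenizer (r50k_base is the encoding used by the original GPT-3 models). The truncation helper is our own illustration, not part of any OpenAI API.

```python
# A minimal sketch: counting tokens and truncating text to fit
# GPT-3's 2,048-token context window, using OpenAI's tiktoken library.
import tiktoken

CONTEXT_LENGTH = 2048  # GPT-3's maximum context, in tokens

enc = tiktoken.get_encoding("r50k_base")  # encoding used by GPT-3 models

def fit_to_context(text: str) -> str:
    """Truncate text so it fits inside the model's context window."""
    ids = enc.encode(text)
    if len(ids) <= CONTEXT_LENGTH:
        return text
    return enc.decode(ids[:CONTEXT_LENGTH])

sample = "ChatGPT is a pre-trained language model."
print(len(enc.encode(sample)), "tokens")
print(fit_to_context(sample))
```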

The training method used for GPT-3 is generative pretraining, meaning it is trained to predict the next token in the input sequence.
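Here is a minimal sketch of that objective: the input sequence is shifted by one position to form the targets, and the model's parameters are updated to raise the likelihood of each real next token. The toy embedding model below is purely illustrative and nothing like GPT-3's actual transformer architecture.

```python
# A minimal sketch of generative pretraining: predict token t+1
# from tokens 1..t, minimizing cross-entropy (i.e. maximizing the
# likelihood of the real training text).
import torch
import torch.nn as nn

vocab_size, embed_dim = 100, 32
model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),
    nn.Linear(embed_dim, vocab_size),  # stand-in for transformer layers
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

tokens = torch.randint(0, vocab_size, (1, 16))   # a toy training sequence
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # shift by one position

logits = model(inputs)                            # (batch, seq, vocab)
loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()                                   # raise likelihood of real text
optimizer.step()
```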


Supervised learning

The ChatGPT model was fine-tuned through a process of supervised learning by human trainers. These trainers engaged in conversations, playing both the user and the AI assistant.

They were given model-written suggestions to help them compose their responses, and the resulting dialogues were then mixed with the InstructGPT dataset, which had been converted into a dialogue format.
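A rough sketch of what that data preparation might look like is below. The role tags and formatting are our own assumptions for illustration, not OpenAI's actual internal format.

```python
# A hedged sketch of the supervised fine-tuning step: human-written
# demonstration dialogues are flattened into text, then used for
# training just like the pretraining corpus.
demonstration = [
    {"role": "user", "content": "What is supervised learning?"},
    {"role": "assistant", "content": "Learning from labeled examples..."},
]

def to_training_text(dialogue):
    """Flatten one trainer-written dialogue into a single training string."""
    lines = [f"{turn['role'].capitalize()}: {turn['content']}" for turn in dialogue]
    return "\n".join(lines)

print(to_training_text(demonstration))
# The resulting text is tokenized and trained on like pretraining data,
# with the loss typically computed only on the assistant's tokens.
```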

Reinforcement learning

The model was further improved through reinforcement learning using Proximal Policy Optimization (PPO). Human trainers ranked responses the model had generated in earlier conversations, and these rankings were used to build reward models. The model was then fine-tuned against those reward models.
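The reward models are typically trained on comparisons: given two responses to the same prompt, the reward model should score the trainer-preferred one higher. The sketch below shows a pairwise ranking loss of the kind described in the InstructGPT paper; the tiny linear scorer is a stand-in for a full transformer-based reward model.

```python
# A hedged sketch of reward-model training from human rankings:
# fit a scorer so that preferred responses receive higher rewards.
import torch
import torch.nn as nn

reward_model = nn.Linear(32, 1)  # stand-in for a transformer scoring head
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-4)

# Toy feature vectors standing in for encoded (prompt, response) pairs.
preferred = torch.randn(8, 32)   # responses the trainer ranked higher
rejected = torch.randn(8, 32)    # responses the trainer ranked lower

margin = reward_model(preferred) - reward_model(rejected)
loss = -torch.nn.functional.logsigmoid(margin).mean()  # pairwise ranking loss
loss.backward()
optimizer.step()
```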

This fine-tuning was repeated several times to achieve better performance. PPO is more cost-effective than comparable reinforcement learning algorithms and trains faster, which makes it well suited to this process.
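The heart of PPO is its clipped surrogate objective, which prevents each update from moving the policy too far from the previous one; this is a large part of why it is cheap and stable. A minimal sketch, using toy tensors in place of real rollout data:

```python
# A hedged sketch of PPO's clipped surrogate objective
# (Schulman et al., 2017).
import torch

def ppo_loss(new_logprobs, old_logprobs, advantages, clip_eps=0.2):
    """Clipped surrogate objective: limit how far the policy can move."""
    ratio = torch.exp(new_logprobs - old_logprobs)      # pi_new / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()        # maximize the surrogate

# Toy tensors standing in for per-token values from a rollout.
new_lp = torch.randn(10, requires_grad=True)
old_lp = new_lp.detach() + 0.1 * torch.randn(10)
adv = torch.randn(10)  # advantages derived from the reward model's scores
print(ppo_loss(new_lp, old_lp, adv))
```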

OpenAI continues to collect data from users who interact with ChatGPT, which can then be used to refine the model further.

Users can upvote or downvote ChatGPT's responses and offer additional written feedback. This data is used to further improve the model and make it better at generating human-like text.
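One plausible way such votes could be turned into a training signal is to reduce them to a scalar preference score per response. The schema below is purely an assumption for illustration, not OpenAI's actual pipeline.

```python
# A hedged sketch: map raw user votes to a preference score that
# could feed downstream reward modeling. Hypothetical schema.
from dataclasses import dataclass

@dataclass
class FeedbackRecord:
    prompt: str
    response: str
    upvotes: int
    downvotes: int
    comment: str = ""  # optional extra feedback from the user

def preference_label(record: FeedbackRecord) -> float:
    """Map raw votes to a score in [-1, 1]."""
    total = record.upvotes + record.downvotes
    if total == 0:
        return 0.0
    return (record.upvotes - record.downvotes) / total

rec = FeedbackRecord("Explain PPO.", "PPO clips policy updates...", 42, 3)
print(preference_label(rec))  # 0.8666...
```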

Data used to train the model

ChatGPT is a language model fine-tuned from the GPT-3.5 series, which was trained using an Azure AI supercomputing infrastructure. It was trained on a massive amount of text scraped from the internet, including books, chat forums, articles, websites, academic papers, code, and other sources.

The corpus of text data used for training was over 45 terabytes in size, which is extremely large and contributes to the model's ability to generate text similar to what a journalist or author might produce.

