Study claims ChatGPT is losing capability, but some experts aren't convinced

Benj Edwards/Getty Images

On Tuesday, researchers from Stanford University and the University of California, Berkeley published a research paper that purports to show changes in GPT-4's outputs over time. The paper fuels a common but unproven belief that the AI language model has grown worse at coding and compositional tasks over the past few months. Some experts aren't convinced by the results, but they say the lack of certainty points to a larger problem with how OpenAI handles its model releases.

In the study, titled "How Is ChatGPT's Behavior Changing Over Time?" and published on arXiv, Lingjiao Chen, Matei Zaharia, and James Zou cast doubt on the consistent performance of OpenAI's large language models (LLMs), specifically GPT-3.5 and GPT-4. Using API access, they tested the March and June 2023 versions of these models on tasks such as solving math problems, answering sensitive questions, generating code, and visual reasoning. Most notably, GPT-4's ability to identify prime numbers reportedly plunged from 97.6 percent accuracy in March to just 2.4 percent in June. Strangely, GPT-3.5 showed improved performance over the same period.

Performance of the March 2023 and June 2023 versions of GPT-4 and GPT-3.5 on four tasks, taken from "How Is ChatGPT's Behavior Changing Over Time?"


The study comes on the heels of frequent complaints that GPT-4's performance has subjectively declined over the past few months. Popular theories about why include OpenAI "distilling" its models to reduce their computational overhead in a quest to speed up output and save GPU resources, fine-tuning (additional training) to reduce harmful outputs in ways that may have unintended side effects, and a smattering of unsupported conspiracy theories, such as OpenAI reducing GPT-4's coding capabilities so that more people will pay for GitHub Copilot.

So far, OpenAI has consistently denied claims that GPT-4 has decreased in capability. As recently as last Thursday, OpenAI VP of Product Peter Welinder tweeted, "No, we haven't made GPT-4 dumber. Quite the opposite: we make each new version smarter than the previous one. Current hypothesis: When you use it more heavily, you start noticing issues you didn't see before."

While this new study may appear like a smoking gun that confirms the hunches of GPT-4's critics, others say not so fast. Princeton computer science professor Arvind Narayanan thinks its findings don't conclusively prove a decline in GPT-4's performance and are potentially consistent with fine-tuning adjustments made by OpenAI. For example, in terms of measuring code-generation capabilities, he criticized the study for evaluating whether the generated code runs immediately rather than whether it is correct.

"The change they report is that the newer GPT-4 adds non-code text to its output. They don't evaluate the correctness of the code (strange)," he tweeted. "They merely check if the code is directly executable. So the newer model's attempt to be more helpful counted against it."
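Narayanan's objection can be illustrated with a toy check. The sketch below is hypothetical, not the study's actual evaluation harness: it shows how a naive "is it directly executable?" test penalizes a response that wraps perfectly correct code in Markdown fences, while the extracted code itself runs fine.

```python
def is_directly_executable(response: str) -> bool:
    """Naive check: does the raw response text run as Python without error?"""
    try:
        exec(compile(response, "<response>", "exec"), {})
        return True
    except Exception:
        return False

def strip_markdown_fences(response: str) -> str:
    """Drop ``` fence lines so only the code between them remains."""
    lines = [ln for ln in response.splitlines()
             if not ln.strip().startswith("```")]
    return "\n".join(lines)

# A correct answer wrapped in Markdown, the way a chat model often formats code
response = """```python
def is_prime(n):
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))
```"""

print(is_directly_executable(response))                         # raw response fails
print(is_directly_executable(strip_markdown_fences(response)))  # extracted code runs
```

Under this toy metric, the Markdown-wrapped answer scores zero even though the code it contains is correct, which is the distinction Narayanan argues the study's evaluation blurs.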

