Monday, August 18, 2025

GPT-5's Mixed Reception: What's Behind the Divided Opinions?

The Gap Between Expectations and Reality

Since GPT-5's release, the AI community has been completely split down the middle. On one side, people are cheering "Wow, this is absolutely incredible!" while on the other, they're expressing disappointment with "This is supposed to be AGI? Not impressed." To understand why we're seeing such polarized reactions, we need to look at OpenAI's service strategy and model architecture.

Let's start with the objective performance metrics. In Artificial Analysis's latest benchmarks, GPT-5 High and GPT-5 Medium claimed the top two spots. Grok had previously dominated the upper ranks, but GPT-5 turned the tables. What's particularly interesting, though, is the wide performance spread within the GPT-5 family itself: while GPT-5 High sits at the very top, GPT-5 Minimal ranks below even GPT-4o.

OpenAI's Strategy of Removing User Choice

The biggest issue is that OpenAI didn't give users the ability to choose their model. Previously, users could directly select from various models like GPT-4, GPT-4 Turbo, and others. But with GPT-5, they switched to a system that analyzes your prompt and automatically selects the appropriate model.

This is similar to the automatic suction control feature on cordless vacuum cleaners. While it's convenient for the vacuum to adjust its power automatically, there are times when you want to keep it running at maximum power, right? GPT-5 works the same way. The system might determine that GPT-5 Minimal is sufficient for your question and select the lower-performing model, leaving users thinking "Something feels off – the responses aren't as good as before."
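The routing idea can be sketched in a few lines of Python. To be clear, this is a hypothetical illustration, not OpenAI's actual router: the tier names follow the article, but the heuristics, keywords, and thresholds are invented for illustration.

```python
# Toy illustration of prompt-based model routing.
# The heuristics and thresholds here are invented for illustration;
# OpenAI has not published how GPT-5's real router works.

def route_model(prompt: str) -> str:
    """Pick a GPT-5 tier based on rough proxies for task difficulty."""
    hard_keywords = ("prove", "refactor", "debug", "optimize", "derive")
    words = prompt.lower().split()

    score = 0
    score += sum(2 for kw in hard_keywords if kw in words)  # task type
    score += len(words) // 50                               # prompt length
    score += prompt.count("```")                            # embedded code

    if score >= 4:
        return "gpt-5-high"
    if score >= 2:
        return "gpt-5-medium"
    return "gpt-5-minimal"

print(route_model("What's the capital of France?"))       # -> gpt-5-minimal
print(route_model("Debug and refactor this parser"))      # -> gpt-5-high
```

The point of the sketch is the failure mode the vacuum analogy describes: a short, casually worded prompt about a genuinely hard problem scores low and gets sent to the weakest tier, with no override available to the user.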

GPT-5: Designed with Developers in Mind

OpenAI officially announced that they targeted developers as the primary audience when developing GPT-5. They specifically focused on optimizing API calls for implementing agent AI and enhanced the ability to generate large-scale code with a single prompt. Instead of the previous iterative revision process, they evolved toward providing high-quality results from the get-go.

However, not all users are welcoming these changes. You can see this in OpenAI's official release of the 'GPT-5 Prompt Guide.' They said that in the age of agent AI, context engineering would become more important than prompt engineering, yet here we are again with users needing to write complex prompts.

According to the guide, GPT-5 is trained to explain its plan in advance when using tools and continuously report progress. Users need to clearly present their goals, request step-by-step explanations, and even specify reporting styles. You can also control how deeply it thinks through the 'Reasoning Effort' setting.
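The guide's advice (state the goal, ask for a step-by-step plan, specify a reporting style, and set the reasoning effort) can be packaged into a single request. A minimal sketch follows, assuming the Responses-style API shape where effort is passed as `reasoning={"effort": ...}`; the prompt text and migration task are invented examples, so check OpenAI's current documentation before relying on the exact parameter names.

```python
# Sketch of a GPT-5 request following the prompt-guide advice.
# The API shape (reasoning={"effort": ...}) is an assumption based on
# OpenAI's Responses API; verify against current docs before use.

prompt = (
    "Goal: migrate this module from requests to httpx.\n"       # clear goal
    "Before editing, explain your plan step by step.\n"         # plan first
    "Report progress after each file, in a short bullet list."  # reporting style
)

payload = {
    "model": "gpt-5",
    "input": prompt,
    "reasoning": {"effort": "high"},  # minimal | low | medium | high
}

# With the official client this would be sent roughly as:
#   client.responses.create(**payload)
print(payload["reasoning"]["effort"])
```

Ironically, this is exactly the kind of structured prompt work the "context engineering over prompt engineering" message suggested users could leave behind.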

Technical Achievements Are Undeniable

Despite the criticism, GPT-5's technical advancement is undeniable. It scored 92 on the HumanEval benchmark, meaning it solves nearly all of that suite's programming problems. We can expect all future frontier models to perform at this level or higher.

What's particularly noteworthy is the sharp drop in the hallucination rate: from the previous 4.5% down to 0.7%. That's a reduction from roughly one error in every 22 responses to roughly one in 143, a major improvement in reliability for real work applications.
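The rates above are easier to feel as "one error in N responses"; the conversion is just the reciprocal:

```python
# Convert an error rate to "one error in N responses".
for rate in (0.045, 0.007):
    print(f"{rate:.1%} is about 1 error in {round(1 / rate)} responses")
```

This prints that 4.5% is about 1 in 22 and 0.7% is about 1 in 143, roughly a 6x improvement in reliability.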

It's also interesting that all training data consisted of synthetic data. This represents important progress in data quality management and copyright issue resolution.

OpenAI's Dilemma and Strategic Choices

OpenAI's choice stems from its constraints as a startup. Unlike Meta or Google, OpenAI has no stable revenue stream such as advertising income. Needing continuous rounds of investment, it had to show innovative changes that would appeal to investors.

By consolidating its models, OpenAI seems to be trying to use computing resources more efficiently and improve profitability, while simultaneously delivering the message of "a new starting point for the agent AI era." In terms of user experience, however, this strategy appears to be backfiring.

The AGI Debate and Future Outlook

Sam Altman's claim of achieving AGI remains highly controversial. Critics ask whether a model that can't reliably answer which of 9.1 and 9.2 is larger can really be called AGI. Even so, given the steady performance gains that scaling laws have delivered, AGI may not be that far off.

What's important is that this isn't the end. AI competition is still ongoing, and OpenAI reaching this level means other companies will likely achieve similar levels soon. Indeed, all major AI companies including Google, Meta, and Anthropic are pouring enormous investments into this space.

Conclusion: Technology Advanced, but Service Falls Short

In conclusion, while GPT-5 has clearly made technical progress, its service delivery leaves much to be desired. Removing user choice and letting the system decide automatically may be meaningful as a step toward AGI, but at the current level of technology it has actually reduced user satisfaction.

If OpenAI continues to address these issues through ongoing updates, they can maintain their position as a leader in the AI field. However, with other companies rapidly catching up, technical superiority alone isn't enough. It's time to pay more attention to user experience and service quality.

Looking at the pace of AI technology development, models that truly approach AGI will likely emerge within the next few years. How well OpenAI balances technological innovation with user satisfaction until then will determine its position in the future AI market.