The highly anticipated GPT-5 has finally made its debut. But the reaction has been quite different from what we expected. While some people are amazed by its coding capabilities and others are satisfied with its pricing and speed, there are also loud voices calling it "the worst AI model ever." What exactly is going on here?
A New Approach: Model Ensemble
GPT-5's biggest feature is that it's not a single model but an ensemble of multiple models. A built-in router analyzes each user query and automatically hands it to the most suitable model. For example, if you ask it to "think this through," it runs a reasoning-specialized model, and if you make a coding request, it uses a coding-optimized model.
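OpenAI hasn't published how the router works internally, but conceptually you can picture something like the minimal sketch below. The model names and the keyword heuristics are my own illustrative assumptions, not the real mechanism:

```python
# Hypothetical sketch of GPT-5-style query routing; not OpenAI's actual implementation.
# Model names and keyword heuristics are illustrative assumptions.

MODEL_REGISTRY: dict[str, str] = {
    "reasoning": "gpt-5-thinking",  # assumed name for the reasoning-tuned model
    "coding": "gpt-5-coding",       # assumed name for the coding-tuned model
    "default": "gpt-5-main",        # assumed name for the general chat model
}

def classify_intent(query: str) -> str:
    """Toy intent classifier: keyword heuristics stand in for the real router."""
    lowered = query.lower()
    if any(kw in lowered for kw in ("think", "prove", "step by step")):
        return "reasoning"
    if any(kw in lowered for kw in ("code", "function", "bug", "refactor")):
        return "coding"
    return "default"

def route(query: str) -> str:
    """Pick the model that should handle this query."""
    return MODEL_REGISTRY[classify_intent(query)]

print(route("Think about this logic puzzle step by step"))  # -> gpt-5-thinking
print(route("Write a function that parses a CSV file"))     # -> gpt-5-coding
```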
This approach is definitely innovative. Moving away from the traditional method of trying to handle everything with one massive model to combining multiple specialized models has significant advantages in terms of efficiency. However, it also makes it harder to provide a consistent user experience.
How's the Performance? Benchmark Analysis
Interestingly, OpenAI didn't release performance comparison charts against other companies' models this time, which is quite unusual. Here's what I found when I dug up the benchmark results myself:
Math Performance: It scored quite high, but only when using 'thinking mode'. Ask it a simple sign-flip calculation like "-1 - (-9)", the kind AIs typically struggle with, in normal mode and it fires back a wrong answer in 0.1 seconds. In thinking mode, though, it carefully sets up the expression and derives the correct answer.
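For the record, the correct answer thinking mode arrives at is simply:

$$-1 - (-9) = -1 + 9 = 8$$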
Medical Knowledge: Benchmark scores improved significantly, and many reviewers report that hallucinations (generating incorrect information) have noticeably decreased.
Coding Ability: It's on par with other companies' models overall, but it particularly excels at frontend development. It seems to have a built-in design sense and can generate a trendy, modern-looking UI in a single pass.
Price Revolution, But Hidden Pitfalls
One of GPT-5's biggest advantages is its overwhelmingly low price. Offering similar performance to the latest models at a fraction of the cost is welcome news for developers using APIs. This price competitiveness is especially valuable for people developing agents, since they consume a lot of tokens.
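To see why this matters so much for agent developers, here's a rough back-of-the-envelope calculation in Python. The per-token rates are placeholder values for illustration, not GPT-5's actual price list:

```python
# Back-of-the-envelope API cost estimate for an agent workload.
# The rates below are placeholders for illustration, NOT actual GPT-5 pricing.
INPUT_RATE_PER_M = 1.25    # placeholder: dollars per 1M input tokens
OUTPUT_RATE_PER_M = 10.00  # placeholder: dollars per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of a run from token counts."""
    return (input_tokens / 1_000_000) * INPUT_RATE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_RATE_PER_M

# An agent that loops 50 times, re-sending ~8k tokens of context per step
# and generating ~1k tokens per step, adds up fast:
steps = 50
total = estimate_cost(input_tokens=8_000 * steps, output_tokens=1_000 * steps)
print(f"~${total:.2f} for one agent session")  # ~$1.00 at these placeholder rates
```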
But this is where the problems begin. Look closely at the official report and you notice oddities, such as bar charts whose lengths don't match the numbers they represent. It looks as though the charts themselves were generated by GPT, and small mistakes like these raise questions about credibility.
Dramatic Changes in User Experience
The real problem emerged in the user experience. People who've tried GPT-5 commonly point out these issues:
- Responses have become noticeably shorter
- The friendly tone has disappeared
- It barely uses emojis
- Overall, it's blunt and businesslike
To use a restaurant analogy, it feels like someone saying "Here's your food" while roughly dropping the plate in front of you. This might actually be better for people who only use it for coding or work purposes, but it's a major inconvenience for users who want conversation or use it for writing.
The disappointment is particularly strong among users who sought emotional connection through AI conversations. Even AI dating communities are flooded with complaints about the change in tone.
OpenAI's Response and Solutions
In response to the strong user backlash, OpenAI gave Plus users the option to switch back to previous models. But this isn't a complete solution either: many users say the restored model still doesn't feel quite like the old GPT-4o, and free users can't switch at all, so the complaints continue.
In an AMA, OpenAI staff revealed that these changes were intentional. They explained that you can still get friendly conversations by adjusting the tone in settings or by asking in the prompt, for example "please add emojis and give me compliments." However, that also means users have to do extra work every single time.
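In practice, that workaround amounts to something like the snippet below, using the official openai Python SDK. The model identifier and the wording of the system message are my own assumptions, not an official recipe:

```python
# Sketch of restoring a warmer tone via a system message, per OpenAI's suggested workaround.
# The model name and the instruction wording are assumptions, not an official recipe.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-5",  # assumed model identifier
    messages=[
        {
            "role": "system",
            "content": (
                "Be warm and encouraging, use emojis where they fit, "
                "and compliment the user's ideas when appropriate."
            ),
        },
        {"role": "user", "content": "I finally finished my first side project!"},
    ],
)

print(response.choices[0].message.content)
```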
Technical Limitations and Future Prospects
Looking at GPT-5 raises a fundamental question: Can we really create true AGI (Artificial General Intelligence) with current language model architectures?
Most current AI models are built around next-word prediction plus reinforcement learning from human feedback (RLHF). Research is showing that even CoT (Chain of Thought) models, created to mimic human reasoning, break down easily once they're pushed beyond their training distribution.
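At its core, that "predict the next word" loop is nothing more exotic than the following toy greedy-decoding sketch (shown with a small open model, gpt2, purely for illustration; it says nothing about GPT-5's internals):

```python
# Minimal illustration of "predict the next token, append it, repeat" -
# the core loop behind most current language models.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tokenizer("The capital of France is", return_tensors="pt").input_ids
for _ in range(5):                                 # generate 5 tokens greedily
    logits = model(ids).logits[:, -1, :]           # scores for the next token only
    next_id = logits.argmax(dim=-1, keepdim=True)  # pick the most likely token
    ids = torch.cat([ids, next_id], dim=-1)        # append it and feed back in

print(tokenizer.decode(ids[0]))
```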
That's why some argue that the future lies in neuro-symbolic AI, with tool-using systems like GPT-5 pointing in that direction. The idea is to acknowledge the limits of purely neural approaches and move toward combining them with logical reasoning and symbolic manipulation.
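A toy way to picture that neuro-symbolic split: the language model only decides which tool to call and with what arguments, while an exact symbolic tool does the actual work. Everything in this sketch is illustrative; the hard-coded tool call simply stands in for what a real model would emit:

```python
# Toy sketch of the neuro-symbolic idea: the "neural" side chooses what to compute,
# an exact symbolic tool performs the computation. All names here are illustrative.
import ast
import operator

# A tiny, safe symbolic calculator standing in for an external tool.
OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.USub: operator.neg,
}

def calculator(expression: str) -> float:
    """Evaluate an arithmetic expression exactly, instead of letting the LLM guess."""
    def eval_node(node):
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.BinOp):
            return OPS[type(node.op)](eval_node(node.left), eval_node(node.right))
        if isinstance(node, ast.UnaryOp):
            return OPS[type(node.op)](eval_node(node.operand))
        raise ValueError("unsupported expression")
    return eval_node(ast.parse(expression, mode="eval").body)

# The "neural" side would normally be an LLM emitting a structured tool call;
# here it's hard-coded to keep the sketch self-contained.
tool_call = {"tool": "calculator", "arguments": {"expression": "-1 - (-9)"}}
print(calculator(tool_call["arguments"]["expression"]))  # 8: exact symbolic result
```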
Personal Assessment and Implications
To be honest, compared to all the hype about AGI coming soon, GPT-5 is less revolutionary than expected. There are definitely performance improvements, but not at a paradigm-shifting level.
But there are also surprising aspects. It's certainly impressive that large language models have developed this much in just 2-3 years since they emerged. Particularly looking at price competitiveness and specialized features, there's been significant progress in practicality.
However, there's been a clear regression in user experience. It seems like they missed what users actually want while pursuing only technical performance. They overlooked that AI shouldn't just be a tool that gives accurate answers, but should also serve as a partner that interacts with people.
In conclusion, GPT-5 has made technical progress but leaves something to be desired in user experience. It'll be interesting to see how OpenAI reflects this feedback going forward, and what differentiation strategies other AI companies will present. Above all, I hope AI technology develops in a direction that truly helps humans, beyond simple performance competition.