Sunday, July 6, 2025

John Carmack's AI Research Journey: From Games to Reality and Unsolved Challenges

Recently, John Carmack shared detailed insights into his AI research direction and current projects. Let's explore what challenges this id Software co-founder and VR technology pioneer now faces as an AI researcher, and how he views the current limitations of the field.


From Game Developer to AI Researcher

Carmack's career path is quite unique. In the early 1990s, he co-founded id Software and defined the first-person shooter genre; notably, the GLQuake port of Quake became a key driver of early consumer GPU adoption. He later developed vertical takeoff and landing rockets at Armadillo Aerospace and built foundational VR technology at Oculus.

His interest in AI began when OpenAI's founders approached him with a recruitment offer. Though he wasn't an AI expert at the time, that approach prompted Carmack to study the field deeply, and he concluded it was "the most interesting thing anyone can do right now."

Interestingly, Carmack initially started his research in "Victorian gentleman scientist" mode: having sufficient personal wealth, he planned to conduct the research independently. However, following advice from others, he eventually established a company, secured venture funding, and now works with a team of six researchers.

LLM Limitations and Fundamental Problems

Carmack maintains a measured perspective on the current LLM boom. While he uses LLMs daily and acknowledges their remarkable capabilities, he also points out fundamental limitations.

"LLMs cannot be the complete answer. Transformer-based models are not how the human brain works. What they do seems magical, but they can't handle many basic tasks that cats, dogs, or even young children can do."

Carmack particularly emphasizes that training an LLM is like "putting all human knowledge into a giant blender and training on it." In his view, this approach hits fundamental limits as soon as the model has to learn something entirely new.

The Value of Atari Games in AI Research

Carmack's research team has chosen Atari games as their primary research platform. While some might question whether this approach is "outdated," Carmack defends Atari's value for several reasons.

First is the diversity offered by more than 100 different games. Second is the absence of researcher bias: when researchers create their own benchmarks, they unconsciously exclude elements that might disadvantage their algorithms, whereas Atari games were made for human players before AI research began, providing unbiased challenges.

Yet Carmack also points out problems with current Atari research methods. The standard approach involves training a single agent for 200 million frames (about a month of playtime), which he considers unrealistic.
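That frame count is easy to sanity-check, assuming the Atari 2600's 60 Hz output:

```python
# Back-of-the-envelope check: 200M frames at the Atari's 60 Hz refresh rate.
frames = 200_000_000
days = frames / 60 / 86_400  # frames -> seconds -> days
print(f"{days:.1f} days")    # ~38.6 days of continuous real-time play
```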

The Reality Gap: Physical Gameplay Experiments

One of Carmack's team's most intriguing projects involves a system that plays Atari games in actual physical environments. They built a system using cameras, robot servos, and joysticks to play games on real Atari consoles.

This experiment led to several important discoveries:

Latency Effects: Using the robot controller introduces about 180 milliseconds of delay. While this is similar to human reaction times (150-200 milliseconds), they found that many modern RL algorithms are extremely vulnerable to such delays.
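To get a feel for why this matters, here is a minimal sketch of how such latency can be simulated in a standard Gymnasium Atari environment. The wrapper name and delay handling are illustrative (assuming the gymnasium and ale-py packages), not Carmack's actual setup:

```python
from collections import deque
import gymnasium as gym

class ActionDelayWrapper(gym.Wrapper):
    """Queues actions so the environment executes them k steps late,
    roughly simulating ~180 ms of robot/camera latency (k=11 at 60 Hz)."""

    def __init__(self, env, delay_steps=11, noop_action=0):
        super().__init__(env)
        self.delay_steps = delay_steps
        self.noop_action = noop_action
        self.queue = deque()

    def reset(self, **kwargs):
        # Pre-fill the queue with no-ops so the very first actions are delayed too.
        self.queue = deque([self.noop_action] * self.delay_steps)
        return self.env.reset(**kwargs)

    def step(self, action):
        self.queue.append(action)
        delayed_action = self.queue.popleft()
        return self.env.step(delayed_action)

env = ActionDelayWrapper(gym.make("ALE/Atlantis-v5"), delay_steps=11)
```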

Physical Constraint Complexity: When moving the joystick from one diagonal to another, unintended actions occur during intermediate steps. For example, in Atlantis, when trying to move the joystick right while pressing the fire button, the shot fires before the joystick reaches the full right position, causing shots to go in unintended directions.

Score Recognition Difficulties: The most unexpected problem was reading scores from the screen. In simulation, scores can be directly accessed from internal memory, but in real environments, the system must recognize scores visually. This proved much more challenging than anticipated.

Unsolved Core Challenges

Carmack presents several key problems that remain unsolved in current AI research:

Sequential Multi-task Learning

Current RL agents almost completely lose their competence at previously learned games when trained on new ones, a failure known as catastrophic forgetting. This is completely different from human learning: when humans play ten games for a month each and then encounter a new game, they adapt quickly based on their general understanding of how games work.
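A hypothetical sketch of the evaluation loop that exposes this failure; train_on and mean_score are toy stubs standing in for a real RL pipeline, and the numbers are illustrative only:

```python
# Hypothetical sketch: train_on and mean_score stand in for a real RL pipeline.
def train_on(agent: dict, game: str, frames: int) -> None:
    agent["trained_on"] = game  # placeholder for millions of gradient updates

def mean_score(agent: dict, game: str, episodes: int) -> float:
    # Toy behavior mimicking catastrophic forgetting: only the most
    # recently trained game retains a good score.
    return 100.0 if agent.get("trained_on") == game else 5.0

games = ["Breakout", "Pong", "SpaceInvaders", "Seaquest"]
agent: dict = {}
for i, game in enumerate(games):
    train_on(agent, game, frames=10_000_000)
    for seen in games[: i + 1]:  # re-test every game seen so far
        print(f"after {game}: {seen} -> {mean_score(agent, seen, episodes=30)}")
```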

Transfer Learning Failures

In OpenAI's "Gotta Learn Fast" challenge, no entrant demonstrated meaningful transfer learning. DeepMind's Gato agent even showed negative transfer: the multi-game trained model performed worse on new games than training from scratch.

Sparse Reward Problems

In the real world, you rarely receive rewards multiple times per second like in video games. Even among Atari games, titles like Pitfall or Montezuma's Revenge require playing for minutes without any reward signal.

Exploration and Action Space Issues

Most current RL systems use epsilon-greedy exploration (taking a random action roughly once every 100 steps), which is completely different from how humans explore. Additionally, modern game controllers like the Xbox controller admit over a million possible input combinations, making them difficult to model as a discrete action space.
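For reference, the standard epsilon-greedy rule is only a few lines (here with ε = 0.01, matching the once-per-100-steps figure):

```python
import numpy as np

def epsilon_greedy(q_values: np.ndarray, epsilon: float = 0.01) -> int:
    """With probability epsilon take a uniformly random action;
    otherwise take the action with the highest estimated value."""
    if np.random.random() < epsilon:
        return int(np.random.randint(len(q_values)))
    return int(np.argmax(q_values))
```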

New Benchmark Proposals

Carmack points out problems with current ML benchmarking practice and proposes a new approach. Most reported Atari results come from bespoke training frameworks whose important details are not reproducible.

He proposes creating a harness that calls agents through a very simple interface: "Agent, here's an observation and reward. Give me an action." The environment continues regardless of the agent's control.
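A minimal sketch of what such an interface could look like; the names and the synchronous loop are illustrative, and in the proposed harness the environment would keep running in real time rather than waiting for the agent:

```python
from typing import Protocol
import numpy as np

class Agent(Protocol):
    def act(self, observation: np.ndarray, reward: float) -> int:
        """Given the latest frame and reward, return the next action."""
        ...

def run(agent: Agent, env, total_steps: int) -> None:
    obs, _ = env.reset()
    reward = 0.0
    for _ in range(total_steps):
        action = agent.act(obs, reward)
        obs, reward, terminated, truncated, _ = env.step(action)
        if terminated or truncated:  # no separate evaluation phase; just keep going
            obs, _ = env.reset()
            reward = 0.0
```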

This new benchmark would feature:

- Sequential learning through 8 games, cycled 3 times
- Testing in the final cycle
- No explicit evaluation phases
- Full action sets and sticky actions

Technical Optimization and Practical Considerations

Carmack draws on his low-level optimization background for some interesting insights. He initially started with custom CUDA kernels but eventually moved to PyTorch, reflecting that starting that low-level was "a mistake."

Their current system uses CUDA Graphs to execute the entire pipeline, from policy evaluation through training, including early termination, in a single launch, preparing the next training set while the real world keeps running.
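For readers unfamiliar with the technique, here is a generic PyTorch capture-and-replay sketch; the tiny model is a stand-in, and this is not Carmack's actual pipeline:

```python
import torch

model = torch.nn.Linear(512, 18).cuda()           # stand-in for a policy network
static_obs = torch.zeros(64, 512, device="cuda")  # fixed input buffer for capture

# Warm up on a side stream so capture sees steady-state memory allocations.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    for _ in range(3):
        model(static_obs)
torch.cuda.current_stream().wait_stream(s)

# Capture the forward pass once; replay() relaunches every captured kernel
# with a single CPU-side call instead of one launch per kernel.
graph = torch.cuda.CUDAGraph()
with torch.cuda.graph(graph):
    static_out = model(static_obs)

static_obs.copy_(torch.randn(64, 512, device="cuda"))  # write new input in place
graph.replay()                                         # static_out now holds the result
```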

Human vs Machine: Reflections on Intrinsic Rewards

Carmack offers an interesting perspective on the differences between human and machine intrinsic rewards. While humans are an existence proof of the intelligence we want to build, it's questionable whether every human characteristic needs to be copied.

He believes some obvious characteristics are worth emulating. For example, rewards from feeling in control of something, or rewards based on the magnitude of visual effects (screen-wide explosions are more rewarding than small pixel changes).

However, he views exploits of the human reward system, such as online gambling, as reward hacking rather than something to emulate. Ultimately, the final score should be the main driver, and good intrinsic rewards are those that help improve it.

Conclusion: AI Research as Science

The most impressive aspect of Carmack's presentation is his approach to AI research as "science." While all his previous work was engineering, he says he's now doing science—discovering knowledge that nobody knows.

He views the current AI field as "the time when everything is happening" and predicts the next few years will be truly critical. Despite LLMs' remarkable achievements, fundamental problems like continual learning, transfer learning, and learning in sparse reward environments remain unsolved.

Carmack's approach appears simple but has depth. Through the seemingly "outdated" Atari platform, he explores AI's most fundamental problems, directly verifies the gap between simulation and reality through experiments, and proposes new benchmarks the entire community can use.

His message is ultimately clear: don't be dazzled by current AI technology's flashy achievements, but focus on the fundamental problems that remain unsolved. And to solve those problems, we need to approach them scientifically, starting from simple environments and building up systematically.
