Midjourney, the absolute powerhouse of AI image generation, has finally stepped into the video realm. Its video model v1, recently unveiled through the web app, isn't just another feature addition: it signals another seismic shift in the AI content creation ecosystem.
Two Video Creation Approaches: Prompts vs. Existing Images
Midjourney's video creation offers two main approaches. The first starts from the prompt input field: drag your desired image onto the Starting Frame tab, or upload an external image via 'Add Image' → 'Choose File'. The process feels much like ordinary image generation, except that the image you supply becomes the first frame of a moving video rather than a static picture.
The second approach builds on images already created in Midjourney. Clicking the 'Animate' button on an existing image automatically generates four videos at the Low motion setting. For more detailed control, open the image and use the 'Animate Image' option in the bottom-right corner.
How Subtle Motion Settings Create Dramatically Different Results
The most crucial setting in video creation is the motion level. Low motion keeps the camera mostly fixed and adds only subtle movements to the subject. Sometimes it produces videos that are almost static, but this can be effective when creating calm and sophisticated atmospheres.
High motion, on the other hand, moves both the camera and the subject dynamically. It suits vibrant, energetic videos, but it occasionally produces unrealistic or awkward motion. In actual testing with the same cat image, High motion rendered the cat's actions far more vividly than Low motion and used more varied camera movements.
What's fascinating is that even with an identical starting frame and prompt, the motion setting alone can produce videos with a completely different feel. Motion should therefore be chosen strategically, based on the mood and directorial intent of the video.
Surprising Quality at 480p Resolution
Midjourney's video model currently outputs at 480p resolution. That is less than half the vertical resolution of the commonly used Full HD (1080p) standard, and at a 16:9 aspect ratio only around a fifth of the pixel count, yet the actual results don't feel low-resolution at all. This is thanks to Midjourney's vast accumulated image data and exceptional detail rendering. Because the source image quality is so strong, the footage stays sharp and polished even after conversion to video.
This is a great example showing that video quality can't be judged by resolution numbers alone. It demonstrates how significantly the quality of an AI model's training data and algorithm sophistication impact the final output.
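To put the resolution gap in concrete terms, here is a quick back-of-the-envelope comparison in Python. The 854×480 and 1920×1080 frame sizes are assumptions based on a 16:9 aspect ratio, not Midjourney's documented output dimensions, which vary with the chosen aspect ratio:

```python
# Back-of-the-envelope pixel-count comparison, assuming 16:9 frames
# (854x480 for 480p, 1920x1080 for 1080p).
pixels_480p = 854 * 480      # ~0.41 megapixels
pixels_1080p = 1920 * 1080   # ~2.07 megapixels

print(f"480p:  {pixels_480p:,} pixels")
print(f"1080p: {pixels_1080p:,} pixels")
print(f"480p holds ~{pixels_480p / pixels_1080p:.0%} of a 1080p frame's pixels")
```

Under those assumptions, a 480p frame carries only about 20% of the pixels of a 1080p frame, which makes the perceived sharpness of the output all the more striking.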
AI's Imagination: The Ability to Implement Even Invisible Elements
The most impressive aspect is the AI's ability to naturally generate elements that weren't in the original image. In one test, converting a lioness photo to a High motion video added a male lion that wasn't in the original, and even introduced a camera, rendered as an iPhone, that appeared nowhere in the photo.
This goes beyond simple image animation, showing AI's ability to understand scenes and logically expand them. The vast image data that Midjourney has accumulated over time seems to enable this contextual understanding.
Limitations and Possibilities of Prompt Understanding
However, not everything is perfect. Understanding of complex prompts is still limited. Across multiple tests, it did not reach the level at which established professional video AI models execute sophisticated instructions, and complex prompts in particular were often not reflected at all.
This is an understandable limitation considering this is Midjourney's first video model. Even in image generation, initially only simple prompts were well understood, but through continuous updates, it reached today's amazing level. The video model is expected to follow the same development trajectory.
Video Extension Feature: Up to 21 Seconds
The created 5-second videos can be extended through the 'Extend' feature. The 'Auto' option automatically extends the video, while the 'Manual' option allows you to directly specify the next scene with prompts. Each extension adds 4 seconds, and with up to 4 extensions possible, you can create videos up to 21 seconds total.
Each extension takes about 8 minutes of processing time, which is somewhat long currently. However, considering the quality of the results, it's worth the wait.
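The length and waiting-time math is simple enough to sketch. The snippet below only restates the figures quoted above (a 5-second base clip, 4 seconds per extension, up to 4 extensions, and roughly 8 minutes of processing each):

```python
# Video length and rough waiting time per number of extensions, using the
# figures mentioned above: 5-second base clip, +4 seconds per extension,
# up to 4 extensions, ~8 minutes of processing per extension.
BASE_SECONDS = 5
EXTENSION_SECONDS = 4
MAX_EXTENSIONS = 4
MINUTES_PER_EXTENSION = 8  # approximate, observed processing time

for n in range(MAX_EXTENSIONS + 1):
    length = BASE_SECONDS + n * EXTENSION_SECONDS
    wait = n * MINUTES_PER_EXTENSION
    print(f"{n} extension(s): {length}s of video, ~{wait} min of extension processing")
```

Maxing out the length at 21 seconds therefore costs roughly half an hour of extension processing on top of the initial generation.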
Strategic Positioning in the AI Video Market
Midjourney is a latecomer to the AI video creation field. They're entering a market where Runway, Pika Labs, Stable Video Diffusion, and others have already established their presence. However, Midjourney's differentiation points are clear.
First, video generation based on overwhelming image quality. Unlike existing video AIs that generate videos from scratch, Midjourney chose an approach of expanding already-verified high-quality images into videos. This provides a significant advantage in terms of quality.
Second, user-friendly interface. The ability to create high-quality videos with just a few clicks without complex settings is a major attraction for general users.
Future Vision: Toward Real-Time Interactive Virtual Worlds
According to Midjourney's official website, their ultimate goal is implementing virtual worlds with real-time interaction capabilities. This video model can be seen as an intermediate step toward that goal. A roadmap is being drawn from static images to dynamic videos, and ultimately to interactive 3D environments.
This is technology that can bring innovation to various fields including metaverse, gaming, education, and entertainment, beyond simple content creation tools. Particularly as individual creators become able to produce movie-quality content, the democratization of content creation is expected to accelerate further.
Midjourney's video model v1 isn't perfect. It has limitations including prompt understanding constraints, relatively long processing times, and 480p resolution. However, for a first version, it has shown remarkable quality and potential. Considering the continuous improvement and innovation Midjourney has demonstrated in the image field, I'm confident it will soon become a game-changer in the video field as well.
The pace of AI technology development is sometimes amazing and sometimes frightening. However, I hope these tools develop in a direction that amplifies rather than replaces human creativity. Midjourney's video model will be another milestone showing such possibilities.
