I’ve been following AI video models since the early research papers, but I’ve always been a bit put off by how “technical” the prompting has to be. It usually feels like you need a degree in prompt engineering just to get a character to walk properly. So I finally sat down with Pixverse v5.6 to see if the “ease of use” claims were real.

Instead of a 50-word technical script, I tried a much simpler approach: the First/Last Frame method. In transition mode, I uploaded the image I wanted the video to start on and the image I wanted it to end on, then typed a basic description of the action, something like “start the action on image 1 and end it on image 2.”

It felt like the model was finally doing the “thinking” for me. It filled in the motion between the frames with far more physical coherence than I expected from such a simple input. That’s a huge shift for someone who isn’t a pro “prompter” but wants high-end output. It can still get the physics noticeably wrong, like most models, but given how vague my prompt was, the result was perfectly fine.

Are we moving toward a stage where the LLM inside these video models is smart enough to handle the “direction,” so we don’t have to keep hacking the text?
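For anyone who would rather script this workflow than click through the web UI, here is a minimal sketch of the same idea against a generic image-to-video endpoint. The URL, field names (`first_frame`, `last_frame`, `prompt`), and response shape are all assumptions for illustration, not Pixverse’s documented API.

```python
# Hypothetical sketch of a first/last-frame ("transition") request.
# The endpoint URL, field names, and response format are assumptions,
# not Pixverse's actual API.
import requests

API_URL = "https://api.example.com/v1/video/transition"  # placeholder endpoint
API_KEY = "YOUR_API_KEY"

def generate_transition(first_frame_path: str, last_frame_path: str, prompt: str) -> dict:
    """Upload a start frame and an end frame with a short plain-language
    description, and let the model fill in the motion between them."""
    with open(first_frame_path, "rb") as first, open(last_frame_path, "rb") as last:
        response = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            files={"first_frame": first, "last_frame": last},
            data={"prompt": prompt},
            timeout=120,
        )
    response.raise_for_status()
    return response.json()  # assumed to contain a job id or a video URL

if __name__ == "__main__":
    result = generate_transition(
        "start.png",
        "end.png",
        "start the action on image 1 and end it on image 2",
    )
    print(result)
```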
Originally posted by u/Pretty_Eabab_0014 on r/ArtificialInteligence
