The key to creating AI text-to-video or text-to-image content is a clear and concise prompt. That is often easier said than done. I have gone through multiple prompting attempts and still not achieved the desired results. This can become expensive and time-consuming. The problem is compounded when you use multiple platforms, because each one can interpret the same prompt differently and produce bizarre results. A JSON prompt for text-to-video and text-to-image standardizes the prompt language, which can improve your overall results.
JSON Prompt Definition
Claude from Anthropic provides the following definition:
A JSON prompt for video creation is a structured data format that contains instructions and parameters for AI video generation tools.
Common Parameters
The exact parameters vary by platform, but commonly include:
• prompt/text_prompt: Main description of the video content
• duration: Length in seconds
• resolution/dimensions: Video size (width x height)
• fps: Frames per second
• style: Visual style or artistic direction
• camera_movement: How the camera should move
• seed: For reproducible results
• negative_prompt: What to avoid in the generation
Different AI video platforms (such as Runway, Pika Labs, and Stable Video Diffusion) may have their own specific JSON schemas and supported parameters.
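As a rough illustration of how the parameters listed above might fit together, here is a minimal Python sketch that assembles them into a JSON prompt. The key names follow the list above, but the values and the helper build_video_prompt are hypothetical, and no particular platform is guaranteed to accept this exact schema.

```python
import json


def build_video_prompt(description, duration=8, width=1280, height=720, fps=24):
    """Assemble a generic JSON video prompt (hypothetical schema, not any platform's API)."""
    prompt = {
        "prompt": description,
        "duration": duration,  # length in seconds
        "resolution": {"width": width, "height": height},
        "fps": fps,
        "style": "fantasy, cinematic",
        "camera_movement": "slow push-in",
        "seed": 42,  # fixed seed for reproducible results, where supported
        "negative_prompt": "blurry, distorted faces",
    }
    return json.dumps(prompt, indent=2)


print(build_video_prompt("A wizard stands in a castle throne room."))
```

Before using something like this, check the documentation of the platform you are targeting and rename or drop any keys it does not support.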
JSON vs Text Based
The best way to show the difference is by example. I am going to show two videos. One was created from a text prompt and the other was created from a JSON prompt. The JSON prompt was created by having Claude convert the text prompt to a JSON prompt.
This is the original prompt:
“A wizard who resembles Gandalf the white is standing inside the throne room of a castle. The castle is in the style of Gondor of the middle earth. There are Lords and ladies and chairs looking at him. The camera shows a wide angle view. The camera focuses as He points at a large glowing orb in the middle of the room. He says ‘This is the new magic of artificial intelligence. It brings great power into the universe. It is neither good or evil. It is neither white or black magic. The yielder of the power determines the nature of the magic.’”
The prompt generated this video
This is the JSON prompt that Claude created from my original text prompt:
{
  "scene": {
    "setting": {
      "location": "throne room",
      "architecture": "castle in the style of Gondor from Middle Earth"
    },
    "characters": {
      "main_character": {
        "description": "wizard who resembles Gandalf the White",
        "position": "standing inside the throne room"
      },
      "background_characters": [
        "lords and ladies seated in chairs",
        "audience looking at the wizard"
      ]
    },
    "camera": {
      "initial_shot": "wide angle view of the entire throne room",
      "focus_transition": "camera focuses as wizard points at orb"
    },
    "props": {
      "central_object": {
        "type": "large glowing orb",
        "position": "middle of the room"
      }
    },
    "action": {
      "wizard_gesture": "points at the glowing orb"
    },
    "dialogue": {
      "speaker": "wizard",
      "speech": "This is the new magic of artificial intelligence. It brings great power into the universe. It is neither good or evil. It is neither white or black magic. The wielder of the power determines the nature of the magic."
    }
  },
  "visual_style": {
    "genre": "fantasy",
    "inspiration": "Lord of the Rings aesthetic",
    "atmosphere": "mystical, ceremonial"
  }
}
This is the video that this prompt generated
I created both of these videos using Veo 3. I also generated the same videos using Seedance 1.0, which I accessed via openart.ai. Open Art gives you a menu of various platforms for creating videos. There can be significant cost and quality differences between the platforms, so it offers you some decent choices.
Seedance text prompt
Seedance 1.0 JSON prompt
Seedance does not handle audio nearly as well as Veo 3. You have to use workarounds to achieve even limited audio. However, it is significantly less expensive to generate a video.
Analysis
There are several factors that need to be considered. The first is that I am not a professional video creator. There are people who really study this and can create a much more detailed prompt with clearer directions. The other issue is that these videos are created in eight-second clips. You have to be careful not to include so much information that the prompt overruns the time limit. I discarded several clips specifically because they could not fit all of the information in the prompt.
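One way to catch an overlong clip before paying to generate it is to estimate how long the dialogue takes to speak. The sketch below assumes a speaking rate of about 2.5 words per second and an eight-second limit. Both numbers and the function names are assumptions for illustration, not values published by any platform.

```python
def estimate_speech_seconds(speech, words_per_second=2.5):
    """Rough spoken length of a line of dialogue (assumed speaking rate)."""
    return len(speech.split()) / words_per_second


def fits_in_clip(speech, clip_seconds=8.0, words_per_second=2.5):
    """True if the dialogue is estimated to fit within one clip."""
    return estimate_speech_seconds(speech, words_per_second) <= clip_seconds


line = "This is the new magic of artificial intelligence."
print(round(estimate_speech_seconds(line), 1), fits_in_clip(line))
```

This only accounts for dialogue. Camera moves and actions also take time, so treat the estimate as a rough upper bound on how much speech to include.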
The Veo 3 clip I created from my text prompt was acceptable, but the JSON clip was closer to my original intent. The main difference is the camera creating a close-up of the glowing orb. This was to set the stage for the next clip, which would show the orb transforming into a view of a dark AI society.
The Seedance clip from the text prompt was not acceptable. The JSON clip was better, but was not as good as either of the Veo 3 clips. However, the Seedance clips are 80% less expensive to create.
Summary
The AI platform you use is trying to interpret your text to create the vision you desire. The more detailed and clear the text, the better the results. This is complicated by the fact that the AI has no memory of your previous creations. There is also the problem of ambiguous terms. The platform may interpret your intent differently depending on the context and how you wrote the text prompt. This is particularly true for common elements, such as camera angles and lighting.
JSON prompts use a standard terminology for basic film functions. It enables you to communicate your desires in a consistent language. The AI model will recognize the consistent language and create a much cleaner interpretation of your thoughts. This allows you to concentrate on the more creative aspects of your video.
You do not need to understand how to create the prompt yourself. Most large language models can convert your text prompt into a JSON prompt. However, I believe it is important that you understand the terminology so you can interpret the results of the JSON prompt. This is particularly important if you do not obtain the desired result from the original prompt. You can then go back in and fine-tune the prompt to get the results you want.
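Fine-tuning can be as simple as changing one field and regenerating. The following Python sketch loads a trimmed copy of the camera section from the wizard prompt, rewrites the focus_transition value, and prints the updated JSON. The replacement wording is a hypothetical example, and how a given platform interprets it is not guaranteed.

```python
import json

# Trimmed copy of the camera section from the wizard prompt.
original = """
{
  "scene": {
    "camera": {
      "initial_shot": "wide angle view of the entire throne room",
      "focus_transition": "camera focuses as wizard points at orb"
    }
  }
}
"""

prompt = json.loads(original)

# Hypothetical refinement: ask explicitly for a close-up of the orb.
prompt["scene"]["camera"]["focus_transition"] = (
    "camera slowly zooms to a close-up of the glowing orb as the wizard points at it"
)

updated = json.dumps(prompt, indent=2)
print(updated)
```

Because only one field changes, you can compare the new clip against the old one and see what that single change did.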
JSON prompts are a tool to increase video quality. I have used them to upgrade my own videos. Give them a try; you may find they help you as well.