AI creator glossary
Plain-English definitions of the terms creators run into when working with AI video, image, voice, and music tools — from text-to-video to AI lip-sync, voice cloning, and Google AI Overview.
C
- Cinematic prompt: A prompt that explicitly directs the model to produce film-quality framing, lighting, and motion, using terms like 'cinematic 16:9', 'shallow depth of field', 'anamorphic lens', and 'golden hour'.
- Credit-based pricing: A pricing model where you buy a pool of credits and each AI generation deducts a fixed amount based on the model and length.
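To make the credit arithmetic concrete, here is a minimal sketch. The model names and per-second rates are hypothetical and chosen only for illustration; real tools publish their own rate tables, and some charge a flat amount per clip rather than per second.

```python
# Hypothetical rates (credits per second of output); not taken from any real tool.
CREDITS_PER_SECOND = {"standard": 2, "premium": 5}


def generation_cost(model: str, seconds: int) -> int:
    """Credits deducted for one generation, based on model and clip length."""
    return CREDITS_PER_SECOND[model] * seconds


def remaining_credits(balance: int, model: str, seconds: int) -> int:
    """Balance left after one generation; refuses if the pool is too small."""
    cost = generation_cost(model, seconds)
    if cost > balance:
        raise ValueError(f"need {cost} credits, only {balance} available")
    return balance - cost


if __name__ == "__main__":
    # An 8-second clip on the hypothetical "premium" model, from a 100-credit pool.
    print(generation_cost("premium", 8))
    print(remaining_credits(100, "premium", 8))
```

Under these assumed rates, the same 100-credit pool covers more seconds on the cheaper model, which is the trade-off a credit system asks you to manage.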
I
- Image upscaling: Increasing the resolution of an image while reconstructing detail using an AI model, typically going from 1080p or 2K up to 4K or 8K.
- Image-to-video: A generative AI workflow where you upload a still image and an optional motion prompt, and the model animates the image into a short clip while keeping the subject on-model.
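As a rough illustration of what an upscaling jump means in pixels, the sketch below compares standard 16:9 frame sizes. The `RESOLUTIONS` table and the `upscale_ratio` helper are illustrative names, not part of any tool's API, and the table uses the consumer UHD sizes for "4K" and "8K".

```python
# Common 16:9 frame sizes as (width, height) in pixels.
RESOLUTIONS = {
    "1080p": (1920, 1080),
    "4K": (3840, 2160),   # UHD 4K
    "8K": (7680, 4320),   # UHD 8K
}


def upscale_ratio(source: str, target: str) -> tuple[float, float]:
    """Return (scale per side, total pixel multiplier) for an upscale."""
    sw, sh = RESOLUTIONS[source]
    tw, th = RESOLUTIONS[target]
    side_scale = tw / sw
    pixel_multiplier = (tw * th) / (sw * sh)
    return side_scale, pixel_multiplier


if __name__ == "__main__":
    for target in ("4K", "8K"):
        side, pixels = upscale_ratio("1080p", target)
        print(f"1080p -> {target}: {side:g}x per side, {pixels:g}x the pixels")
```

Doubling each side quadruples the pixel count, which is why the model has to reconstruct detail rather than simply stretch the image.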
T
- Text-to-image: A generative AI workflow where you describe an image in plain English and the model renders it as a still picture.
- Text-to-speech: Synthesizing spoken audio from written text using an AI voice model.
- Text-to-video: A generative AI workflow where you describe a scene in plain English and the model renders it as a short video clip.
V
- Google Veo: Google DeepMind's flagship text-to-video model.
- Vertical video: Video shot or rendered in a 9:16 aspect ratio, so the frame is taller than it is wide.
- Voice cloning: A generative AI process that builds a digital model of a real voice from a short audio sample (typically 30–60 seconds), then synthesizes new speech in that voice from text.
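For the 9:16 ratio in the vertical video entry, a small sketch of the arithmetic may help. The helper names are hypothetical, and the crop shown is a plain center crop, which is only one of several ways a tool might reframe landscape footage.

```python
def vertical_height(width: int) -> int:
    """Height in pixels of a 9:16 frame with the given width."""
    return width * 16 // 9


def is_vertical(width: int, height: int) -> bool:
    """True when the frame is taller than it is wide."""
    return height > width


def center_crop_to_vertical(width: int, height: int) -> tuple[int, int, int, int]:
    """Return (x, y, crop_width, crop_height) for a centered 9:16 crop
    that keeps the full height of a landscape frame."""
    crop_width = height * 9 // 16
    x = (width - crop_width) // 2
    return x, 0, crop_width, height


if __name__ == "__main__":
    print(vertical_height(1080))                 # 1080 wide -> 1920 tall
    print(center_crop_to_vertical(1920, 1080))   # crop box inside a 16:9 frame
```

A 1080-pixel-wide vertical frame is 1920 pixels tall, which is the familiar 1080x1920 size used for phone-first video.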
