Three Things AI Creates
Different outputs (text, images, or both) require different technologies.
Text is processed as word pieces, images as dots (pixels).
Text, image, and multimodal models each create content in their own way.
LLM = Next-token Prediction
An LLM is a model that predicts a probability for each possible next token.
A sentence is generated by repeatedly picking which token comes next.
The bar chart shows each candidate token's probability, and one gets selected.
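
A minimal sketch of the selection step, not a real model: the scores below are made-up numbers standing in for what an LLM might output after "The cat sat on the", and `softmax` and `pick_next` are hypothetical helper names.

```python
import math
import random

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores.values())
    exps = {tok: math.exp(s - highest) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def pick_next(probs, rng):
    """Sample one token, weighted by its probability."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

# Assumed scores for illustration only; a real model computes these.
scores = {"mat": 2.0, "sofa": 1.0, "roof": 0.5, "moon": -1.0}
probs = softmax(scores)        # the heights of the bars in the chart
rng = random.Random(0)         # fixed seed so the run is repeatable
next_token = pick_next(probs, rng)
```

Generating a full sentence repeats this: append `next_token` to the text, get new scores, and pick again.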
RAG = Retrieval + Generation
RAG is a system that combines 'search' and 'generation'.
The model fills gaps in its knowledge by searching external documents.
Documents in the library are converted to vectors, and the ones most relevant to the question are fetched.
“Employees receive 15 days of paid vacation annually.”
— HR Policy Handbook [1]
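
A rough sketch of the retrieval step, under simplifying assumptions: word counts stand in for real embedding vectors, and the second and third library entries are invented for illustration. `to_vector`, `cosine`, and `retrieve` are hypothetical names.

```python
import math
from collections import Counter

def to_vector(text):
    """Word counts as a stand-in for a learned embedding (assumption)."""
    cleaned = text.lower().replace(".", " ").replace("?", " ")
    return Counter(cleaned.split())

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

library = [
    "Employees receive 15 days of paid vacation annually.",
    "The office is open from 9 am to 6 pm on weekdays.",   # invented entry
    "Expense reports must be submitted within 30 days.",   # invented entry
]

def retrieve(question, docs):
    """Return the document whose vector is closest to the question's."""
    q = to_vector(question)
    return max(docs, key=lambda d: cosine(q, to_vector(d)))

question = "How many days of paid vacation do employees receive?"
best = retrieve(question, library)

# The retrieved text is placed in the prompt so generation can cite it.
prompt = f"Answer using this source:\n{best}\n\nQuestion: {question}"
```

A production system would likely swap the word counts for an embedding model and the list for a vector database, but the flow is the same: vectorize, compare, fetch, then generate.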
Diffusion = Denoising Process
Diffusion gradually turns random noise into a clear image.
Instead of producing the image in one pass, it refines it over many steps.
The timeline shows how noise is progressively cleaned up step by step.
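
A toy sketch of step-by-step denoising, with one big assumption: a real diffusion model uses a trained network to estimate the clean image at each step, whereas here the known target `clean` plays that role. The 5-pixel "image" and the name `denoise_step` are hypothetical.

```python
import random

def denoise_step(image, predicted_clean, strength):
    """Move each pixel a fraction of the way toward the predicted clean value."""
    return [x + strength * (c - x) for x, c in zip(image, predicted_clean)]

rng = random.Random(0)                         # fixed seed for repeatability
clean = [0.0, 0.5, 1.0, 0.5, 0.0]              # hypothetical 5-pixel target
image = [rng.gauss(0.0, 1.0) for _ in clean]   # start from pure noise

def error(img):
    """Total distance between the current image and the target."""
    return sum(abs(x - c) for x, c in zip(img, clean))

errors = [error(image)]
steps = 10
for _ in range(steps):
    # Assumption: the "prediction" is the true target, not a model output.
    image = denoise_step(image, clean, strength=0.3)
    errors.append(error(image))
```

Each entry in `errors` corresponds to one point on the timeline: the remaining noise shrinks a little at every step rather than vanishing at once.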
