
How does an LLM decide what comes next? A sequence of precise mathematical transformations.
The Prompt
The Goal: The model must predict the next token (roughly, a word or word piece) to complete the sentence. It isn’t magic; it is a calculation from raw numbers to a final word.
The journey begins when the model scans its entire vocabulary (often 50,000+ tokens) and assigns a raw score to every possible next token. These scores are called Logits.
Logits are unnormalized real numbers. They can be positive or negative and are not confined to any fixed range. A higher logit means the model thinks that token is more likely, but these numbers do not represent percentages yet.
Insight: These are raw outputs from the neural network’s final layer.
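Purely as an invented illustration (the prompt, candidate tokens, and scores below are made up, not taken from any real model), the logits for a few candidate next tokens might look like this:

```python
# Hypothetical prompt: "The cat sat on the ..."
# A real model scores every token in its vocabulary; these four values are invented.
logits = {
    "mat":    6.2,   # strong candidate, high raw score
    "floor":  4.8,
    "roof":   2.1,
    "banana": -1.3,  # negative score: very unlikely, but still gets a number
}
```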
Before converting scores to probabilities, we can scale them using the Temperature parameter: we divide all logits by the temperature value. A temperature below 1 widens the gaps between scores, making the output more deterministic; a temperature above 1 shrinks the gaps, making the output more random; a temperature of exactly 1 leaves the logits unchanged.
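A minimal sketch of that scaling step, assuming the logits are held in a NumPy array (the function name and values here are ours, for illustration only):

```python
import numpy as np

def apply_temperature(logits: np.ndarray, temperature: float) -> np.ndarray:
    """Scale logits by dividing every score by the temperature."""
    return logits / temperature

raw = np.array([6.2, 4.8, 2.1, -1.3])
sharper = apply_temperature(raw, 0.5)   # gaps widen  -> more deterministic output
flatter = apply_temperature(raw, 2.0)   # gaps shrink -> more random output
```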
The Physics of Choice:
The Softmax function is the translator. It takes the arbitrary logit values and squashes them into a normalized probability distribution: each (temperature-scaled) logit is exponentiated and then divided by the sum of all those exponentials.
After Softmax, every number is between 0 and 1, and the sum of all numbers is exactly 1 (100%). Now we know the mathematical likelihood of each word.
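A minimal sketch of the Softmax step in NumPy (subtracting the maximum first is a standard numerical-stability trick and does not change the result):

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Turn raw (or temperature-scaled) logits into probabilities that sum to 1."""
    shifted = logits - np.max(logits)   # numerical stability only
    exps = np.exp(shifted)
    return exps / exps.sum()

probs = softmax(np.array([6.2, 4.8, 2.1, -1.3]))
print(probs, probs.sum())   # each value lies in [0, 1]; the total is 1.0
```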
First, we sort the vocabulary from highest probability to lowest. Then, we apply Top-p (Nucleus) Sampling.
This technique sets a cumulative threshold (e.g., P = 0.90). We keep the smallest set of top words whose probabilities add up to at least 90% and aggressively discard the rest. This prevents the model from choosing nonsensical words (the “tail” of the distribution). The surviving probabilities are renormalized, and the final token is sampled from that trimmed set; this sampled token is the word the model writes next.
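A sketch of that final trim-and-sample step, assuming a NumPy probability vector like the one produced by the Softmax above (the function name, threshold, and example numbers are illustrative, not taken from any particular library):

```python
import numpy as np

def top_p_sample(probs: np.ndarray, p: float = 0.90, rng=None) -> int:
    """Nucleus (top-p) sampling: keep the smallest set of highest-probability
    tokens whose cumulative probability reaches p, renormalize, then sample."""
    if rng is None:
        rng = np.random.default_rng()
    order = np.argsort(probs)[::-1]              # token indices, most likely first
    cumulative = np.cumsum(probs[order])
    keep = np.searchsorted(cumulative, p) + 1    # how many tokens it takes to reach p
    nucleus = order[:keep]
    nucleus_probs = probs[nucleus] / probs[nucleus].sum()   # renormalize the survivors
    return int(rng.choice(nucleus, p=nucleus_probs))        # index of the chosen token

# Example: with p = 0.90, only the first two tokens survive the cut here.
next_token_index = top_p_sample(np.array([0.70, 0.20, 0.07, 0.03]))
```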