
ANATOMY OF A CHOICE

How does an LLM decide what comes next? A sequence of precise mathematical transformations.

The Prompt

“The weather today is very…”

The Goal: The model must predict the next token (word) to complete the sentence. It isn’t magic; it is a calculation from raw numbers to a final word.

01

Logits (The Raw Scores)

The journey begins when the model scans its entire vocabulary (often 50,000+ words) and assigns a raw score to every possible next token. These scores are called Logits.

Logits are unnormalized real numbers. They can be positive or negative and have no upper limit. A higher logit means the model thinks the word is more likely, but these numbers do not represent percentages yet.

Hot: 12.5
Cold: 11.0
Cloudy: 6.0
Frog: -2.0

Insight: These are raw outputs from the neural network’s final layer.
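
To make this concrete, here is a minimal Python sketch using the illustrative scores above (in a real model, the logit vector spans the whole vocabulary, not four words):

```python
# Raw logits straight out of the model's final layer (illustrative values).
# They are plain real numbers: unbounded, possibly negative, and they do not sum to 1.
logits = {
    "Hot": 12.5,
    "Cold": 11.0,
    "Cloudy": 6.0,
    "Frog": -2.0,
}
```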

02

Temperature (Controlling Creativity)

Before converting scores to probabilities, we can scale them using the Temperature parameter. We divide all logits by the temperature value.

New_Logit = Old_Logit / Temperature

The Physics of Choice:

  • Conservative (Temp < 1): Differences between numbers are exaggerated. The “winner” becomes much stronger.
  • Creative (Temp > 1): Differences are flattened. Outliers (like “Frog”) get a better fighting chance.
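
In code, temperature scaling is just one division applied before softmax. A minimal sketch continuing the example above (the helper name apply_temperature is illustrative):

```python
def apply_temperature(logits, temperature):
    """Divide every logit by the temperature before softmax is applied."""
    return {token: score / temperature for token, score in logits.items()}

# Temperature < 1 widens the gaps between scores (more conservative);
# Temperature > 1 narrows them, giving outliers like "Frog" a better chance.
conservative = apply_temperature(logits, 0.5)
creative = apply_temperature(logits, 1.5)
```
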
03

Softmax (Enter Probability)

The Softmax function is the translator. It takes the arbitrary logit values and squashes them into a normalized probability distribution.

Softmax(x) = exp(x) / ∑ exp(all_x)

After Softmax, every number is between 0 and 1, and the sum of all numbers is exactly 1 (100%). Now we know the mathematical likelihood of each word:

Hot: 70%
Cold: 20%
Cloudy: 8%
Frog: 2%
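
Here is a minimal softmax sketch in Python, continuing the running example (the percentages above are rounded, illustrative values; the exact numbers depend on the logits and the temperature used):

```python
import math

def softmax(logits):
    """Squash raw logits into probabilities that lie between 0 and 1 and sum to 1."""
    # Subtracting the maximum logit first is a standard trick for numerical stability.
    max_logit = max(logits.values())
    exps = {token: math.exp(score - max_logit) for token, score in logits.items()}
    total = sum(exps.values())
    return {token: value / total for token, value in exps.items()}

# Using the logits from step 01 (optionally after temperature scaling from step 02).
probs = softmax(logits)
```
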
04 & 05

Filtering (Sort & Nucleus Sampling)

First, we Sort the vocabulary from highest probability to lowest. Then, we apply Top-p (Nucleus) Sampling.

This technique sets a cumulative probability threshold (e.g., P = 0.90). Moving down the sorted list, we keep tokens until their cumulative probability reaches 90% and aggressively discard the rest. This prevents the model from choosing nonsensical words from the “tail” of the distribution.

Filtering Logic (Target P ≥ 0.90)

Hot (0.70): cumulative 0.70 → KEEP
Cold (0.20): cumulative 0.90 → KEEP
Cloudy (0.08): threshold already reached → DISCARD
Frog (0.02): threshold already reached → DISCARD
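
Putting the last two steps together, here is a minimal sketch of sorting plus nucleus sampling (the function names top_p_filter and sample_next_token are illustrative, not from any particular library):

```python
import random

def top_p_filter(probs, p=0.90):
    """Keep the smallest set of top-ranked tokens whose cumulative probability reaches p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break  # threshold reached; the remaining "tail" is discarded
    # Renormalize the survivors so they form a valid distribution again.
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}

def sample_next_token(probs, p=0.90):
    """Draw one token at random from the nucleus-filtered distribution."""
    filtered = top_p_filter(probs, p)
    tokens, weights = zip(*filtered.items())
    return random.choices(tokens, weights=weights, k=1)[0]

# With the example distribution, only "Hot" and "Cold" survive (0.70 + 0.20 >= 0.90),
# so the final draw is between those two tokens.
probs = {"Hot": 0.70, "Cold": 0.20, "Cloudy": 0.08, "Frog": 0.02}
print(sample_next_token(probs))
```

The random draw at the end is why the same prompt can produce different completions from one run to the next.
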
Ali Reza Rashidi is a BI analyst with over nine years of experience and the author of three books on data and management.
