Words become tokens.
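As a rough illustration of that idea, here is a toy Python sketch (the vocabulary and IDs are made up; a real LLM tokenizer uses a learned subword vocabulary, but the shape of the operation is the same: text in, list of token IDs out):

```python
# Toy illustration of "words become tokens": each piece of text maps to an integer ID.
vocab = {"words": 0, "become": 1, "tokens": 2, ".": 3}

def tokenize(text: str) -> list[int]:
    # Lowercase, split the period off, and look each piece up in the vocab.
    pieces = text.lower().replace(".", " .").split()
    return [vocab[piece] for piece in pieces]

print(tokenize("Words become tokens."))  # -> [0, 1, 2, 3]
```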
LLMs are trained on billions of words from online text, learning which words appear near each other so they can build context.
As a result of training, we get a massive set of words that appear alongside any given keyword.
When the model processes this set of words, it produces a vector (a numeric list of values).
A word embedding can have hundreds of values, each representing a different aspect of the word.
Although we do not know exactly what characteristic each value represents, we do know that similar words often have similar embeddings.
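As a small illustration, here is a Python sketch with made-up 4-value embeddings (real ones have hundreds of values; the numbers below are purely illustrative) showing how cosine similarity captures "similar words have similar embeddings":

```python
import numpy as np

# Made-up 4-value "embeddings" for three words, for illustration only.
embeddings = {
    "I":    np.array([0.9, 0.1, 0.3, 0.7]),
    "We":   np.array([0.8, 0.2, 0.4, 0.6]),
    "rock": np.array([0.1, 0.9, 0.8, 0.0]),
}

def cosine_similarity(a, b):
    # 1.0 means the vectors point the same way; values near 0 mean unrelated.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(embeddings["I"], embeddings["We"]))    # high, ~0.99
print(cosine_similarity(embeddings["I"], embeddings["rock"]))  # lower, ~0.29
```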
"I" and "We" have similar embeddings. When we reduce the hundreds of values each embedding contains to just two (x and y), we can visualize the embeddings in 2D space.
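One common way to do that reduction is PCA (the original doesn't name a specific method, so this is just one option). Here is a rough sketch using scikit-learn and matplotlib, with random placeholder vectors standing in for real embeddings; with vectors from an actual model, similar words such as "I" and "We" would land close together in the plot:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Placeholder 300-value vectors standing in for real word embeddings.
rng = np.random.default_rng(42)
words = ["I", "We", "You", "rock", "stone"]
vectors = rng.normal(size=(len(words), 300))

# PCA squeezes each 300-value embedding down to just two values (x, y).
coords = PCA(n_components=2).fit_transform(vectors)

plt.scatter(coords[:, 0], coords[:, 1])
for word, (x, y) in zip(words, coords):
    plt.annotate(word, (x, y))
plt.show()
```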