AI
LLM
Basically a "next token predictor".
- Predicts the following tokens in a sequence, one at a time (see the sketch below).
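A minimal sketch of what "next token predictor" means, using NumPy and made-up logits (real LLMs compute the logits with a neural network; the vocabulary and numbers here are purely illustrative):

```python
import numpy as np

# Toy vocabulary and made-up logits (raw scores) for the next token
vocab = ["the", "cat", "sat", "on", "mat"]
logits = np.array([1.2, 0.3, 2.5, 0.1, 0.9])

# Softmax turns logits into a probability distribution over the vocabulary
probs = np.exp(logits) / np.exp(logits).sum()

# Greedy decoding: predict the most probable next token
print(vocab[int(np.argmax(probs))])  # "sat"
```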
Training
The main objective of ML training is to minimize loss over time (hence, making more accurate predictions); a minimal sketch follows below.
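As an illustration of "minimize loss over time", here is gradient descent on a one-weight model fitting toy data (a sketch of the principle, not how LLMs are actually trained at scale):

```python
import numpy as np

# Toy data: the true relationship is y = 2x
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])

w, lr = 0.0, 0.05  # one weight, starting from a bad guess; learning rate

for step in range(100):
    pred = w * x                        # model prediction
    loss = ((pred - y) ** 2).mean()     # mean squared error
    grad = (2 * (pred - y) * x).mean()  # dLoss/dw
    w -= lr * grad                      # nudge the weight to reduce loss

print(round(w, 3))  # ~2.0: loss minimized, predictions now accurate
```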
Value/Loss
Value
= The data's contribution to the model's output.
Loss
= A measure of how wrong the model's outputs are.
Loss Functions (Objective Function)
Loss
quantifies the difference between a model’s predictions and actual outcomes.
- measures how wrong the model’s predictions are.
The loss function directly influences the effectiveness of model predictions.
- accuracy of next token
- guided learning
- loss serves as the feedback signal models use to adjust their internal weights and biases
- performance optimization
- efficient loss minimization enhances model performance, reduces overfitting, and improves generalization to unseen data.
Standard Loss Functions
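For example, two standard loss functions in a minimal NumPy sketch (LLM training typically uses cross-entropy over the next-token distribution):

```python
import numpy as np

def mse(pred, target):
    """Mean squared error: standard for regression."""
    return ((pred - target) ** 2).mean()

def cross_entropy(probs, target_idx):
    """Cross-entropy: standard for classification / next-token prediction."""
    return -np.log(probs[target_idx])

probs = np.array([0.1, 0.7, 0.2])  # model's predicted next-token distribution
print(cross_entropy(probs, 1))     # ~0.36: confident and correct -> low loss
print(cross_entropy(probs, 0))     # ~2.30: wrong -> high loss
```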
Tokens, Vectors/Embeddings
Tokens
Words become tokens.
- basic unit that can be encoded.
- tokens are usually a fraction of a word (see the tokenizer sketch after this list).
LLMs extract the meaning of words by observing their context across massive amounts of data.
- during training, the LLM tracks which words appear near the main word and which do not.
1M tokens are about 750,000 words.
- for scale: how many words do people speak in a day? (avg. women ~5,000; avg. men ~2,000)
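A sketch of tokenization in practice, assuming the tiktoken library is installed (pip install tiktoken); the exact splits depend on the tokenizer used:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # a common OpenAI tokenizer

ids = enc.encode("Tokenization splits words into subword units.")
print(len(ids))                        # token count for this sentence
print([enc.decode([i]) for i in ids])  # the pieces: often fractions of words
```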
Vector/Embedding
As a result of training, we get a vector (a list of values) that adjusts based on each word's proximity to the main word in the training data.
- This vector is known as a word embedding.
A word embedding can have hundreds of values, each representing a different aspect of the main word.
- the values in an embedding quantify a word’s linguistic features.
Although we do not know what each individual value represents, we know that similar words often have similar embeddings.
- e.g. "I" and "We" have similar embeddings.
- Embeddings quantify this closeness (see the similarity sketch below).
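To make "similar embeddings" concrete, a cosine-similarity check on made-up vectors (real embeddings have hundreds of dimensions; these 4-value vectors are purely illustrative):

```python
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Made-up "embeddings" for illustration only
i_vec   = np.array([0.9, 0.1, 0.4, 0.3])
we_vec  = np.array([0.8, 0.2, 0.5, 0.3])  # close to i_vec
mat_vec = np.array([0.1, 0.9, 0.0, 0.7])  # an unrelated word

print(cosine_similarity(i_vec, we_vec))   # ~0.99: similar words, similar vectors
print(cosine_similarity(i_vec, mat_vec))  # ~0.33: dissimilar
```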
When we reduce the hundreds of values in each embedding to just two (x and y), we can visualize the embeddings in 2D space.
- This is called dimensionality reduction.
When we reduce the dimensions, we see clustering of similar words.
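A sketch of that dimensionality reduction using scikit-learn's PCA, with random stand-in vectors (real usage would feed in actual word embeddings and then plot the points):

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in for word embeddings: 6 "words", 300 values each
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(6, 300))

# Reduce 300 dimensions down to 2 (x and y) for visualization
xy = PCA(n_components=2).fit_transform(embeddings)
print(xy.shape)  # (6, 2): each word is now a point in 2D space
```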
Grounding
LLMs hallucinate because they are statistical next-token predictors.
Grounding is a process that constrains an LLM so it answers with fewer hallucinations (a rough sketch follows this list).
- cross-checking an LLM’s outputs against web search results.
- providing citations to users so they can verify.
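A rough sketch of that grounding flow. Both web_search and llm_answer are hypothetical stand-ins for a real search API and a real model call:

```python
def web_search(query):
    """Hypothetical search helper; a real system would call a search API."""
    return [("https://example.com", "A snippet relevant to the query.")]

def llm_answer(prompt):
    """Hypothetical model call; a real system would call an LLM API."""
    return "An answer drawn only from the provided context."

def grounded_answer(question):
    sources = web_search(question)  # 1. retrieve evidence to ground on
    context = "\n".join(snippet for _, snippet in sources)
    answer = llm_answer(            # 2. answer only from the retrieved context
        f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    )
    citations = [url for url, _ in sources]  # 3. citations users can verify
    return answer, citations

print(grounded_answer("What is grounding?"))
```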
RLHF (Reinforcement Learning from Human Feedback)
Human beings also contribute by filling the gap and providing feedback.