Words become tokens.
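As a rough illustration of that idea, here is a toy Python sketch (the vocabulary and IDs are made up; a real LLM tokenizer uses a learned subword vocabulary, but the shape of the operation is the same: text in, list of token IDs out):

```python
# Toy illustration of "words become tokens": each piece of text maps to an integer ID.
vocab = {"words": 0, "become": 1, "tokens": 2, ".": 3}

def tokenize(text: str) -> list[int]:
    # Lowercase, split the period off, and look each piece up in the vocab.
    pieces = text.lower().replace(".", " .").split()
    return [vocab[piece] for piece in pieces]

print(tokenize("Words become tokens."))  # -> [0, 1, 2, 3]
```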
LLMs are trained on billions of words from online text, learning which words appear near each other so they can build context.
As a result of training, we get a massive set of words that appear alongside any given keyword.
When the model processes this set of words, it produces a vector (a numeric list of values).
A word embedding can have hundreds of values, each representing a different aspect of the word.
Although we do not know exactly what characteristic each value represents, we do know that similar words often have similar embeddings.
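As a small illustration, here is a Python sketch with made-up 4-value embeddings (real ones have hundreds of values; the numbers below are purely illustrative) showing how cosine similarity captures "similar words have similar embeddings":

```python
import numpy as np

# Made-up 4-value "embeddings" for three words, for illustration only.
embeddings = {
    "I":    np.array([0.9, 0.1, 0.3, 0.7]),
    "We":   np.array([0.8, 0.2, 0.4, 0.6]),
    "rock": np.array([0.1, 0.9, 0.8, 0.0]),
}

def cosine_similarity(a, b):
    # 1.0 means the vectors point the same way; values near 0 mean unrelated.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(embeddings["I"], embeddings["We"]))    # high, ~0.99
print(cosine_similarity(embeddings["I"], embeddings["rock"]))  # lower, ~0.29
```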
"I" and "We" have similar embeddings. When we reduce the hundreds of values each embedding contains to just two (x and y), we can visualize the embeddings in 2D space.
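One common way to do that reduction is PCA (the original doesn't name a specific method, so this is just one option). Here is a rough sketch using scikit-learn and matplotlib, with random placeholder vectors standing in for real embeddings; with vectors from an actual model, similar words such as "I" and "We" would land close together in the plot:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Placeholder 300-value vectors standing in for real word embeddings.
rng = np.random.default_rng(42)
words = ["I", "We", "You", "rock", "stone"]
vectors = rng.normal(size=(len(words), 300))

# PCA squeezes each 300-value embedding down to just two values (x, y).
coords = PCA(n_components=2).fit_transform(vectors)

plt.scatter(coords[:, 0], coords[:, 1])
for word, (x, y) in zip(words, coords):
    plt.annotate(word, (x, y))
plt.show()
```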