Transformers

What are transformers?

LLMs can parse and write language fluently.

  • Transformers radically sped up and improved how computers understand language.

Transformers process an entire sequence at once.

  • Transformers analyze the whole document instead of individual words.
  • This allows them to capture the full context and patterns within the document.

Training that is faster and sees broader context lets LLMs generate more accurate text.

  • tl;dr - transformers enabled training at scale, which produces higher-quality output.

Self-attention

Self-attention looks at each token in a text and determines which other words matter most for understanding that token's meaning.
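
As a rough illustration, here is a minimal sketch of scaled dot-product self-attention in NumPy. The token embeddings and projection matrices are random stand-ins for what a real model learns, so only the shape of the computation matters here.

```python
# Minimal sketch of scaled dot-product self-attention.
# Embeddings and Q/K/V projections are random placeholders;
# a real transformer learns all of these.
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8                  # 4 tokens, 8-dim embeddings (toy sizes)
x = rng.normal(size=(seq_len, d_model))

W_q = rng.normal(size=(d_model, d_model))
W_k = rng.normal(size=(d_model, d_model))
W_v = rng.normal(size=(d_model, d_model))

Q, K, V = x @ W_q, x @ W_k, x @ W_v

# Every token's query is compared against every token's key at once.
scores = Q @ K.T / np.sqrt(d_model)

# Softmax turns scores into attention weights: how much each token
# "looks at" every other token when building its new representation.
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

output = weights @ V                     # context-aware token representations
print(weights.round(2))                  # row i = how token i attends to all tokens
```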

RNN

Prior to transformers, recurrent neural networks (RNNs) were the primary way of processing language.

  • RNNs scan and process each word in a sentence sequentially.

With self-attention, transformers process all the words in a sentence at the same time.

Capturing this context gives LLMs far more sophisticated capabilities to parse language.
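
To make the sequential-versus-parallel difference concrete, here is a toy sketch (random weights and made-up shapes, not a trained model): the RNN loop cannot start step t until step t-1 finishes, while the transformer-style score matrix covers every token pair in one operation.

```python
import numpy as np

rng = np.random.default_rng(1)
tokens = rng.normal(size=(5, 8))        # 5 token embeddings, 8-dim (toy)

# RNN-style: one step per token; each step depends on the previous one.
W_h, W_x = rng.normal(size=(8, 8)), rng.normal(size=(8, 8))
h = np.zeros(8)
for x_t in tokens:                      # inherently sequential
    h = np.tanh(h @ W_h + x_t @ W_x)    # hidden state carries the context

# Transformer-style: one matrix product scores every token pair at once,
# so the whole sentence can be processed in parallel.
scores = tokens @ tokens.T / np.sqrt(8)
print(h.shape, scores.shape)            # (8,) vs (5, 5) pairwise scores
```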

When an identical word like "Play" is used multiple times in the same sentence:

I like to Play the games I buy from the PlayStore.

The model understands that the "Play" in "PlayStore" is different from the "Play" in "Play the games".

  • The model associates the first "Play" with "games".
  • It associates the second "Play" with "buy", as part of the name "PlayStore" (see the sketch after this list).
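
One way to see this, sketched below under the assumption of a Hugging Face encoder such as bert-base-uncased, is to compare the context-dependent vectors the model produces for the two occurrences of "Play": their cosine similarity is typically well below 1, meaning the model represents them differently.

```python
# Hedged sketch: contextual embeddings for the two "Play" occurrences.
# Assumes the transformers library and bert-base-uncased; any encoder works.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

sentence = "I like to Play the games I buy from the PlayStore."
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state[0]   # (num_tokens, 768)

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
# Find subwords containing "play" (BERT lowercases the input). Exact
# subword splits depend on the tokenizer's vocabulary; we expect one
# match for the verb and one inside "playstore".
idxs = [i for i, t in enumerate(tokens) if "play" in t]

a, b = hidden[idxs[0]], hidden[idxs[1]]
cos = torch.dot(a, b) / (a.norm() * b.norm())
print(tokens, idxs, cos.item())   # similarity well below 1: distinct meanings
```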

This lets the model use words correctly and reduces misuse of the same word across different contexts.

AI can go beyond telling identical words apart: it can also connect different words that refer to the same thing, such as pronouns.

The weather is nice today. It has been really bad lately.

An LLM understands that "It" refers to "the weather" and can treat the two interchangeably.
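
As a rough illustration (again assuming bert-base-uncased via Hugging Face; exactly which layers and heads show this behaviour varies by model), one can inspect the attention weight flowing from "it" to "weather":

```python
# Hedged sketch: attention from the pronoun "it" back to "weather".
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

text = "The weather is nice today. It has been really bad lately."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    attentions = model(**inputs).attentions   # tuple: one tensor per layer

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
it_pos = tokens.index("it")
weather_pos = tokens.index("weather")

# Average over heads in the last layer: how strongly "it" attends to
# each token in the sentence, including "weather".
last = attentions[-1][0].mean(dim=0)          # (num_tokens, num_tokens)
print(f'attention from "it" to "weather": {last[it_pos, weather_pos]:.3f}')
```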