Memory
Memory and History
A Transformer is stateless.
Transformers store nothing about existing chat history.
A Transformer loses all context after outputting a single token.
An LLM is driven by a generator function: the prompt is converted into tokens and stored in a buffer.
- The output token produced from this buffer is appended to the buffer for the next call.
- This cycle repeats until the generator function produces an EndOfText token (a sketch of the loop follows this list).
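A minimal sketch of that loop is below. The real generator function is a model forward pass; here a canned toy reply stands in for it, and the token strings, `generate_next_token`, and `END_OF_TEXT` marker are illustrative assumptions rather than any specific library's API.

```python
END_OF_TEXT = "<|endoftext|>"

# Toy stand-in for the model: a fixed reply, emitted one token per call.
CANNED_REPLY = ["Hi", " there", "!", END_OF_TEXT]

def generate_next_token(buffer: list[str], prompt_len: int) -> str:
    """Toy forward pass: pick the next canned token based on how many
    tokens have been generated beyond the original prompt."""
    return CANNED_REPLY[len(buffer) - prompt_len]

def generate(prompt_tokens: list[str], max_tokens: int = 16) -> list[str]:
    buffer = list(prompt_tokens)                        # buffer starts as the tokenized prompt
    for _ in range(max_tokens):
        token = generate_next_token(buffer, len(prompt_tokens))
        if token == END_OF_TEXT:                        # stop once EndOfText is produced
            break
        buffer.append(token)                            # output is appended for the next call
    return buffer

print(generate(["User", ":", " Hello"]))
# -> ['User', ':', ' Hello', 'Hi', ' there', '!']
```

Because the model only ever sees the buffer it is handed on each call, any apparent memory comes from resending that buffer, not from state inside the Transformer.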
When you hold a long conversation with an LLM chatbot, the previous messages in the conversation are sent as context to the LLM until the context window size is exceeded.
- When the context window size is exceeded, only the last N messages in the conversation are sent (see the sketch below).
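A sketch of that truncation, assuming a simple role/content message format; the `build_context` helper and the limit of 20 messages are illustrative assumptions, not any particular chatbot's API.

```python
def build_context(messages: list[dict], max_messages: int = 20) -> list[dict]:
    """Return the messages to send with the next request."""
    if len(messages) <= max_messages:
        return messages
    # Drop the oldest messages; the model never sees them again.
    return messages[-max_messages:]

history = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello! How can I help?"},
    # ... many more turns ...
]
context = build_context(history, max_messages=20)
```

Keeping only the last N messages is the simplest policy; it keeps requests inside the context window at the cost of silently forgetting everything older than the cutoff.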