Memory

Memory and History

A transformer is stateless.

Transformers store nothing about an existing chat history between calls.

The transformer retains no context of its own after outputting a token; everything it needs must be passed back in as input on the next call.
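To make that concrete, here is a minimal sketch; the chat function is a hypothetical stand-in for any LLM API call, not a specific library. The model only appears to "remember" the first turn in the second call because the caller resends it.

```python
def chat(messages):
    # Hypothetical stand-in for a real LLM API call.
    # Every call is independent: the model sees only what is in `messages`.
    return "(model reply)"

history = [{"role": "user", "content": "My name is Alice."}]
reply = chat(history)

# The model keeps nothing from the first call. To make it appear to
# remember, the caller must resend the full history on the next turn.
history += [
    {"role": "assistant", "content": reply},
    {"role": "user", "content": "What is my name?"},
]
reply = chat(history)
```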

An LLM's generation loop starts by tokenizing the prompt and storing the resulting tokens in a buffer.

  • The token generated from this buffer is appended to the buffer and becomes part of the input for the next call.
  • This cycle repeats until the generator function emits an end-of-text token (see the sketch below).
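A minimal sketch of this loop, assuming a hypothetical tokenizer and a generate_next_token model method (neither is a specific library's API):

```python
def generate(model, tokenizer, prompt, max_new_tokens=256):
    # Tokenize the prompt into a working buffer of token ids.
    buffer = tokenizer.encode(prompt)

    for _ in range(max_new_tokens):
        # The model sees the whole buffer on every call; it keeps no state of its own.
        next_token = model.generate_next_token(buffer)

        # Stop once the generator emits the end-of-text token.
        if next_token == tokenizer.eos_token_id:
            break

        # Append the new token so it becomes part of the input for the next call.
        buffer.append(next_token)

    return tokenizer.decode(buffer)
```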

When you hold a long conversation with an LLM chatbot, the previous messages in the conversation are sent as context to the LLM until the context window size is exceeded.

  • When the context window size is exceeded, only the last N messages of the conversation are sent (see the sketch below).
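A rough sketch of that truncation, assuming messages are dicts with "role" and "content" keys and using a crude word-count stand-in for a real tokenizer:

```python
def count_tokens(text):
    # Crude stand-in for a real tokenizer's token count.
    return len(text.split())

def build_context(messages, max_context_tokens=4096):
    # Walk backwards from the newest message, keeping messages
    # while they still fit in the context window.
    kept, used = [], 0
    for message in reversed(messages):
        cost = count_tokens(message["content"])
        if used + cost > max_context_tokens:
            break  # older messages are dropped
        kept.append(message)
        used += cost
    # Restore chronological order before sending to the LLM.
    return list(reversed(kept))
```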