Swarms agents have disk-backed persistent memory through the Conversation class. When persistent_memory=True (the default), each agent writes its active interaction log to a MEMORY.md file and reloads that file when another process starts an agent with the same agent_name. Set persistent_memory=False for ephemeral agents that should keep no on-disk state.
Use this page to understand:
- How MEMORY.md is created, loaded, and updated
- How context compression keeps memory within the model context window
- How archived transcripts preserve raw chat history
- How to inspect, compact, export, or disable memory in code
- When to use persistent memory versus RAG-based long-term memory
Persistent memory is keyed by agent_name. Reusing the same agent_name (with persistent_memory=True) resumes the same memory across process restarts. Changing the name starts a separate memory folder; setting persistent_memory=False keeps the agent fully ephemeral.
The persistent_memory flag
persistent_memory is the top-level switch that controls whether the agent reads from and writes to MEMORY.md.
persistent_memory: Enables disk-backed persistent memory. When True, the agent creates MEMORY.md on first run, preloads it on subsequent runs, and writes new turns through to disk. When False, the agent runs fully in-process: no MEMORY.md, no archive/, fresh state every run.
Memory stack
An agent can use several memory layers at the same time:
| Layer | Purpose | Persistence |
|---|---|---|
| conversation_history | In-memory messages for the current run | Current process |
| MEMORY.md | Active user, agent, and tool interaction log | Disk-backed |
| archive/history_<timestamp>.md | Raw transcripts saved before compaction | Disk-backed |
| Conversation.compact() | Replaces raw active history with a summary | Disk-backed summary |
| ContextCompressor | Automatically calls compaction near the context limit | Runtime behavior |
| long_term_memory | Optional vector database for external knowledge retrieval | Depends on database |
MEMORY.md is not the same thing as RAG. Persistent memory records the agent’s own interaction history. RAG retrieves knowledge from external documents or a vector database.
Disk layout
Agent memory lives under the workspace directory, in a folder named after the agent.
agent_name: Stable name used to identify the agent’s memory folder.
Key design points
- The folder is keyed by agent_name, not by id.
- MEMORY.md is append-updated during normal operation.
- Every conversation.add(role, content) writes to in-memory history and to disk.
- Compression archives the current MEMORY.md before replacing it with a compact summary.
- The agent’s static system_prompt, rules, and constructor configuration are not repeatedly appended to MEMORY.md.
Lifecycle
1. File creation
On first construction of an agent with a new agent_name, Swarms creates the agent’s memory folder and its MEMORY.md file.
2. Preload on construction
During Conversation.__init__, Swarms reads the existing MEMORY.md and injects it into conversation_history as a single System message.
When return_history_as_string() builds the prompt, the model sees the system prompt, rules, persistent memory, and current task, in that order.
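The preload step can be pictured with a small stand-alone sketch. It does not use the Swarms API: preload_memory and the message dictionaries are hypothetical stand-ins that only mimic the behavior described above, where an existing MEMORY.md becomes a single System message.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def preload_memory(memory_path: Path) -> List[Dict[str, str]]:
    """Return a history list seeded with prior memory, if any exists.

    Hypothetical helper: it imitates the described preload, not Swarms' code.
    """
    history: List[Dict[str, str]] = []
    if memory_path.exists():
        text = memory_path.read_text(encoding="utf-8").strip()
        if text:
            # The whole file becomes ONE System message, not replayed turns.
            history.append({"role": "System", "content": text})
    return history


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "MEMORY.md"
    first_run = preload_memory(path)   # no file yet, so history starts empty
    path.write_text("User: hello\nAgent: hi\n", encoding="utf-8")
    second_run = preload_memory(path)  # file exists, so it is preloaded
```

In this sketch a first run starts with an empty history, while a later run with the same file starts with exactly one System entry holding the earlier log.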
3. Write-through on new messages
Every conversation.add(role, content) call:
- Appends the message to conversation_history
- Appends a timestamped block to MEMORY.md
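A toy model of the write-through behavior is sketched below. WriteThroughLog is a hypothetical class, not the Swarms Conversation class, and the on-disk block format (a heading with role and timestamp) is an assumption chosen only for illustration.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, List, Optional


class WriteThroughLog:
    """Toy conversation log: every add() updates memory and, if set, the file."""

    def __init__(self, memory_path: Optional[Path]):
        self.memory_path = memory_path  # None means "keep nothing on disk"
        self.history: List[Dict[str, str]] = []

    def add(self, role: str, content: str) -> None:
        self.history.append({"role": role, "content": content})
        if self.memory_path is None:
            return
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        # Append mode: earlier blocks are never rewritten.
        with self.memory_path.open("a", encoding="utf-8") as fh:
            fh.write(f"## {role} ({stamp})\n{content}\n\n")


with tempfile.TemporaryDirectory() as tmp:
    log = WriteThroughLog(Path(tmp) / "MEMORY.md")
    log.add("User", "Summarize the report.")
    log.add("Agent", "Here is the summary.")
    on_disk = log.memory_path.read_text(encoding="utf-8")

# An in-process-only log still tracks the current run in memory.
ephemeral = WriteThroughLog(None)
ephemeral.add("User", "One-off question.")
```

Appending rather than rewriting keeps each call cheap and means a crash mid-run loses at most the message being written.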
Context compression
Without compression, a long-running agent could eventually exceed the model’s context window. Swarms can attach a ContextCompressor that summarizes the current transcript and compacts the active memory.
context_compression: Enables automatic compression when memory approaches the configured context limit.
When compression runs
Compression can run when all of these are true:
- context_compression=True
- The token usage of short_memory.return_history_as_string() is greater than or equal to threshold * context_length
- The agent is at the top of a loop iteration
The default threshold is 0.9, so compression starts when the active prompt reaches about 90% of the context window.
Compression works for both max_loops="auto" and integer max_loops runs. The context_compression flag is the gate.
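The trigger condition is simple arithmetic, sketched here as a stand-alone function. should_compress is a hypothetical name, and the token counts are made-up inputs; in Swarms the count would come from the rendered history rather than being passed in by hand.

```python
def should_compress(
    token_count: int,
    context_length: int,
    threshold: float = 0.9,
    context_compression: bool = True,
) -> bool:
    """True when the flag is on and usage >= threshold * context_length.

    The third condition from the list above (being at the top of a loop
    iteration) is modeled by where the caller invokes this check.
    """
    return context_compression and token_count >= threshold * context_length


# With an 8000-token window and the 0.9 threshold, the cutoff is 7200 tokens.
below = should_compress(7000, 8000)
above = should_compress(7500, 8000)
gated_off = should_compress(7500, 8000, context_compression=False)
lower_threshold = should_compress(7000, 8000, threshold=0.8)
```

Lowering the threshold to 0.8 moves the cutoff to 6400 tokens, which leaves more headroom for a large tool output arriving in the next iteration.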
What compression does
When compression fires:
- The current transcript is summarized with an LLM call.
- Conversation.compact(summary=...) is called.
- The current MEMORY.md is copied to archive/history_<timestamp>.md.
- The active MEMORY.md is deleted and recreated with a fresh header.
- conversation_history is rebuilt with the system prompt, rules, and custom rules.
- The summary is appended as one System message to both memory and MEMORY.md.
Later runs preload the compacted MEMORY.md instead of the raw pre-compaction transcript. The archive keeps the full transcript available without filling the active context window.
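The file-level part of compaction (archive, reset, write summary) can be sketched with the standard library alone. This is an illustration under assumptions, not the Conversation.compact implementation: the header text, the timestamp format, and the summary block layout are placeholders, and the summary string is supplied by hand instead of coming from an LLM call.

```python
import shutil
import tempfile
from datetime import datetime, timezone
from pathlib import Path

HEADER = "# MEMORY\n\n"  # placeholder header; the real header text may differ


def compact(memory_path: Path, summary: str) -> Path:
    """Archive the active file, then replace it with header + summary."""
    archive_dir = memory_path.parent / "archive"
    archive_dir.mkdir(exist_ok=True)
    # Assumed timestamp format, used only to build a unique archive name.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    archived = archive_dir / f"history_{stamp}.md"
    shutil.copy2(memory_path, archived)  # raw transcript is preserved first
    # Overwriting has the same effect as deleting and recreating the file.
    memory_path.write_text(HEADER + f"## System\n{summary}\n", encoding="utf-8")
    return archived


with tempfile.TemporaryDirectory() as tmp:
    memory = Path(tmp) / "MEMORY.md"
    raw = HEADER + "## User\nlong transcript...\n"
    memory.write_text(raw, encoding="utf-8")
    archived = compact(memory, "User asked for a report; agent delivered it.")
    archived_text = archived.read_text(encoding="utf-8")
    active_text = memory.read_text(encoding="utf-8")
```

Copying before overwriting matters: if the process dies between the two steps, the raw transcript already exists in archive/ and nothing is lost.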
Configure compression
Compression is enabled by default. Set context_compression=False if you want MEMORY.md to remain un-compacted.
The default compressor is ContextCompressor(threshold=0.9). You can replace it after construction to tune the threshold, summarizer model, temperature, or summary length.
Access memory in code
The Conversation object is available as agent.short_memory.
Manual compaction
You can compact memory yourself at any time with Conversation.compact(summary=...).
Export and load conversations
MEMORY.md is the active persistent memory file. You can also export or load conversation history in other formats.
Search memory
Use built-in search helpers for quick inspection.
Disable disk-backed memory
The clean way to keep an agent fully in-process is persistent_memory=False. Nothing is preloaded, nothing is written to MEMORY.md, and no archive/ directory is created.
Alternatively, set memory_md_path = None to stop writes to MEMORY.md. Neither approach disables conversation_history, which always tracks the current run in memory.
Persistent memory vs RAG
Use MEMORY.md for the agent’s own interaction history. Use RAG when the agent needs to retrieve information from documents, databases, or external knowledge stores.
The RAG-related agent options cover:
- long_term_memory: Vector database used for document retrieval.
- Whether to query long-term memory on every loop iteration instead of only at the beginning.
- The chunk size used when processing memory documents for retrieval.
Best practices
- Use stable, descriptive agent_name values for agents that should remember previous work.
- Keep context_compression=True for autonomous or long-running agents.
- Tune ContextCompressor.threshold lower for agents with large tool outputs or long responses.
- Compact manually after major milestones to preserve the important state and reduce prompt size.
- Use RAG for external knowledge. Do not rely on MEMORY.md as a document database.
- Set memory_md_path = None for privacy-sensitive or one-off agents that should not write a transcript.
Why it works this way
Why key memory by agent_name?
id values can change between process starts. agent_name is user-controlled and stable, so it gives the agent a durable identity.
Why preload memory as one System message?
The model needs to understand that the content is prior memory, not a current user request. A single system-level memory preamble is compact and less ambiguous than replaying old turns as active messages.
Why wipe MEMORY.md during compaction?
If compaction only appended a summary, the next run would load both the summary and the raw transcript it summarizes. Wiping the active file keeps the working context small, while archive/ preserves the raw log.
Next steps
- Agent Configuration: Configure core agent parameters such as agent_name, max_loops, and context limits.
- Conversation API: Explore the underlying Conversation class and its export, load, and search helpers.