Swarms agents get disk-backed persistent memory through the Conversation class. When persistent_memory=True (the default), each agent writes its active interaction log to a MEMORY.md file and reloads that file whenever a later process starts an agent with the same agent_name. Set persistent_memory=False for ephemeral agents that should keep no on-disk state.
Use this page to understand:
  • How MEMORY.md is created, loaded, and updated
  • How context compression keeps memory within the model context window
  • How archived transcripts preserve raw chat history
  • How to inspect, compact, export, or disable memory in code
  • When to use persistent memory versus RAG-based long-term memory
Persistent memory is keyed by agent_name. Reusing the same agent_name (with persistent_memory=True) resumes the same memory across process restarts. Changing the name starts a separate memory folder; setting persistent_memory=False keeps the agent fully ephemeral.

The persistent_memory flag

persistent_memory is the top-level switch that controls whether the agent reads from and writes to MEMORY.md.
persistent_memory (bool, default: True)
Enables disk-backed persistent memory. When True, the agent creates MEMORY.md on first run, preloads it on subsequent runs, and writes new turns through to disk. When False, the agent runs fully in-process: no MEMORY.md, no archive/, and fresh state on every run.
from swarms import Agent

# Persistent agent (default behavior).
# On first run it creates MEMORY.md. On subsequent runs it picks up
# where it left off — the model sees the prior conversation as a
# system preamble.
persistent_agent = Agent(
    agent_name="ResearchAssistant",
    agent_description="Remembers context across sessions",
    model_name="gpt-4.1",
    max_loops=1,
    persistent_memory=True,  # default — state survives restarts
)

# Ephemeral agent — no disk writes, no preload, fresh every run.
ephemeral_agent = Agent(
    agent_name="OneShotAgent",
    model_name="gpt-4.1",
    max_loops=1,
    persistent_memory=False,
)

Memory stack

An agent can use several memory layers at the same time:
| Layer | Purpose | Persistence |
| --- | --- | --- |
| conversation_history | In-memory messages for the current run | Current process |
| MEMORY.md | Active user, agent, and tool interaction log | Disk-backed |
| archive/history_<timestamp>.md | Raw transcripts saved before compaction | Disk-backed |
| Conversation.compact() | Replaces raw active history with a summary | Disk-backed summary |
| ContextCompressor | Automatically calls compaction near the context limit | Runtime behavior |
| long_term_memory | Optional vector database for external knowledge retrieval | Depends on database |
MEMORY.md is not the same thing as RAG. Persistent memory records the agent’s own interaction history. RAG retrieves knowledge from external documents or a vector database.

Disk layout

Agent memory lives under the workspace directory:
$WORKSPACE_DIR/agents/{agent_name}/
|-- MEMORY.md
`-- archive/
    |-- history_2026-04-20_14-30-45.md
    |-- history_2026-04-20_16-12-08.md
    `-- ...
agent_name (str, default: "swarm-worker-01")
Stable name used to identify the agent’s memory folder.
from swarms import Agent

agent = Agent(
    agent_name="ResearchAgent",
    model_name="claude-sonnet-4-6",
)

# Memory path:
# $WORKSPACE_DIR/agents/ResearchAgent/MEMORY.md
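
To check what is on disk for a given agent, a small standard-library helper can resolve the folder described above and list its files. This is a sketch that does not use Swarms itself: the function names are hypothetical, and the fallback directory used when WORKSPACE_DIR is unset is an assumption made only for illustration.

```python
import os
from pathlib import Path
from typing import Optional


def agent_memory_dir(agent_name: str, workspace_dir: Optional[str] = None) -> Path:
    """Return the folder that should hold MEMORY.md and archive/ for an agent."""
    # Assumption: fall back to ./agent_workspace when WORKSPACE_DIR is not set.
    root = Path(workspace_dir or os.environ.get("WORKSPACE_DIR", "agent_workspace"))
    return root / "agents" / agent_name


def list_memory_files(agent_name: str, workspace_dir: Optional[str] = None) -> dict:
    """Report the active memory file and archived transcripts, oldest first."""
    folder = agent_memory_dir(agent_name, workspace_dir)
    memory_md = folder / "MEMORY.md"
    archive_dir = folder / "archive"
    # The timestamp in history_<timestamp>.md sorts chronologically as plain text.
    archives = sorted(archive_dir.glob("history_*.md")) if archive_dir.is_dir() else []
    return {
        "memory_md": memory_md if memory_md.exists() else None,
        "archives": archives,
    }
```

Calling list_memory_files("ResearchAgent") before the first run should report no memory file and no archives; after a few compactions the archives list grows by one entry per compaction.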

Key design points

  • The folder is keyed by agent_name, not by id.
  • MEMORY.md is appended to during normal operation and rewritten only during compaction.
  • After construction, every conversation.add(role, content) call writes to in-memory history and to disk.
  • Compression archives the current MEMORY.md before replacing it with a compact summary.
  • The agent’s static system_prompt, rules, and constructor configuration are not repeatedly appended to MEMORY.md.

Lifecycle

1. File creation

On first construction of an agent with a new agent_name, Swarms creates:
$WORKSPACE_DIR/agents/{agent_name}/MEMORY.md
The file starts with a small header and an interaction log section:
# Agent Memory

**Conversation:** ResearchAgent_id_<uuid>_conversation
**Created:** 2026-04-20T18:33:12

---

## Interaction Log
If the file already exists, Swarms leaves it in place.
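The create-if-missing behavior can be sketched with the standard library alone. The code below is an illustration of the idea, not the Swarms implementation: ensure_memory_file is a hypothetical name, and the header text simply mirrors the example shown above.

```python
from datetime import datetime
from pathlib import Path


def ensure_memory_file(folder: Path, conversation_name: str) -> Path:
    """Create MEMORY.md with a header if it is missing; never touch an existing file."""
    path = folder / "MEMORY.md"
    if path.exists():
        # An existing file is prior memory, so it is left exactly as it is.
        return path
    folder.mkdir(parents=True, exist_ok=True)
    created = datetime.now().isoformat(timespec="seconds")
    header = (
        "# Agent Memory\n\n"
        f"**Conversation:** {conversation_name}\n"
        f"**Created:** {created}\n\n"
        "---\n\n"
        "## Interaction Log\n"
    )
    path.write_text(header, encoding="utf-8")
    return path
```

Because the function returns early when the file exists, calling it on every construction is safe: only the very first call for a given folder writes anything.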

2. Preload on construction

During Conversation.__init__, Swarms reads the existing MEMORY.md and injects it into conversation_history as a single System message. The resulting prompt order is:
[0] System: <system_prompt>
[1] User: <rules>                       # if provided
[2] User: <custom_rules_prompt>         # if provided
[3] System: [Persistent Memory - MEMORY.md]
            ... full MEMORY.md contents ...
The preload is added directly to the in-memory conversation_history, so it is not written back to disk a second time. When return_history_as_string() builds the prompt, the model sees the system prompt, rules, persistent memory, and current task in that order.
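The ordering above can be expressed as a short function. This is a simplified model of the preload step under the assumption that messages are plain role/content dictionaries; build_initial_history is a hypothetical name and not part of the Conversation API.

```python
from typing import Dict, List, Optional


def build_initial_history(
    system_prompt: str,
    rules: Optional[str] = None,
    custom_rules_prompt: Optional[str] = None,
    memory_md_text: Optional[str] = None,
) -> List[Dict[str, str]]:
    """Assemble the construction-time messages in the documented order."""
    history = [{"role": "System", "content": system_prompt}]
    if rules:
        history.append({"role": "User", "content": rules})
    if custom_rules_prompt:
        history.append({"role": "User", "content": custom_rules_prompt})
    if memory_md_text:
        # Prior memory arrives as ONE System message, labelled so the model
        # can tell it apart from the current request.
        history.append(
            {
                "role": "System",
                "content": "[Persistent Memory - MEMORY.md]\n" + memory_md_text,
            }
        )
    return history
```

With no rules and no existing MEMORY.md, the result is just the system prompt; the memory message only appears once there is something on disk to preload.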

3. Write-through on new messages

Every conversation.add(role, content) call:
  1. Appends the message to conversation_history
  2. Appends a timestamped block to MEMORY.md
The on-disk format looks like this:
### User - 2026-04-20T18:35:04
Research cloud database options for low-latency analytics.

---

### ResearchAgent - 2026-04-20T18:35:21
I recommend evaluating BigQuery, ClickHouse Cloud, and AlloyDB...

---
Disk writes are serialized with a per-conversation lock. Construction-time messages such as system prompts and rules are suppressed from disk so static identity does not get duplicated on every restart.
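A minimal model of this write-through behavior is shown below. It is a sketch built on assumptions rather than the actual Conversation code: the WriteThroughLog class and its persist flag are hypothetical, with persist=False standing in for the suppression of construction-time messages.

```python
import threading
from datetime import datetime
from pathlib import Path


class WriteThroughLog:
    """Keep messages in memory and mirror them to MEMORY.md under a lock."""

    def __init__(self, memory_md_path=None):
        self.history = []
        self.memory_md_path = Path(memory_md_path) if memory_md_path else None
        self._lock = threading.Lock()  # one lock per conversation

    def add(self, role: str, content: str, persist: bool = True) -> None:
        stamp = datetime.now().isoformat(timespec="seconds")
        block = f"### {role} - {stamp}\n{content}\n\n---\n\n"
        with self._lock:
            # The in-memory history always records the message.
            self.history.append({"role": role, "content": content})
            # Disk is skipped for suppressed messages or when no path is set.
            if persist and self.memory_md_path is not None:
                with self.memory_md_path.open("a", encoding="utf-8") as handle:
                    handle.write(block)
```

Holding the lock across the append keeps blocks from two threads from interleaving in the file, which is the property the per-conversation lock is there to provide.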

Context compression

Without compression, a long-running agent could eventually exceed the model’s context window. Swarms can attach a ContextCompressor that summarizes the current transcript and compacts the active memory.
context_compression (bool, default: True)
Enables automatic compression when memory approaches the configured context limit.
from swarms import Agent

agent = Agent(
    agent_name="ResearchAgent",
    model_name="claude-sonnet-4-6",
    max_loops=5,
    context_compression=True,
)

When compression runs

Compression can run when all of these are true:
  • context_compression=True
  • The token usage of short_memory.return_history_as_string() is greater than or equal to threshold * context_length
  • The agent is at the top of a loop iteration
The default threshold is 0.9, so compression starts when the active prompt reaches about 90% of the context window. Compression works for both max_loops="auto" and integer max_loops runs: the loop mode does not matter, and the context_compression flag is what enables or disables it.
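The trigger condition is simple arithmetic, sketched here as a standalone check. The function names are hypothetical, and the 200,000-token context length in the usage loop is an assumed figure chosen only to make the numbers concrete.

```python
import math


def compression_trigger_tokens(context_length: int, threshold: float = 0.9) -> int:
    """Smallest whole token count satisfying tokens >= threshold * context_length."""
    return math.ceil(threshold * context_length)


def should_compress(
    token_count: int,
    context_length: int,
    threshold: float = 0.9,
    enabled: bool = True,
) -> bool:
    """Mirror the documented gate: flag on, and usage at or above the threshold."""
    if not enabled:
        return False
    return token_count >= threshold * context_length


if __name__ == "__main__":
    # Assumed context window of 200,000 tokens, for illustration only.
    for threshold in (0.9, 0.75):
        print(threshold, compression_trigger_tokens(200_000, threshold))
```

Lowering the threshold from 0.9 to 0.75 on that assumed window moves the trigger point from roughly 180,000 tokens to 150,000, which leaves more headroom for a large tool output arriving in the next loop.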

What compression does

When compression fires:
  1. The current transcript is summarized with an LLM call.
  2. Conversation.compact(summary=...) is called.
  3. The current MEMORY.md is copied to archive/history_<timestamp>.md.
  4. The active MEMORY.md is deleted and recreated with a fresh header.
  5. conversation_history is rebuilt with the system prompt, rules, and custom rules.
  6. The summary is appended as one System message to both conversation_history and MEMORY.md.
After compaction, active memory is small again:
conversation_history:
  [0] System: <system_prompt>
  [1] User: <rules>                     # if provided
  [2] User: <custom_rules_prompt>       # if provided
  [3] System: [Compressed Memory Summary] ...<summary>

MEMORY.md:
  # Agent Memory
  ...
  ## Interaction Log
  ### System - <timestamp>
  [Compressed Memory Summary] ...<summary>

archive/history_<previous-timestamp>.md:
  Full pre-compaction transcript
On the next process restart, Swarms loads the compact summary from MEMORY.md instead of the raw pre-compaction transcript. The archive keeps the full transcript available without filling the active context window.
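The file-level part of this flow (archive, wipe, re-seed) can be sketched as follows. This is an illustration under stated assumptions, not the body of Conversation.compact(): compact_memory_file is a hypothetical helper, the caller supplies the header text, and the summarization step is assumed to have already produced the summary string.

```python
import shutil
from datetime import datetime
from pathlib import Path


def compact_memory_file(memory_md: Path, summary: str, header: str) -> Path:
    """Archive MEMORY.md, then recreate it with a header and one summary block."""
    archive_dir = memory_md.parent / "archive"
    archive_dir.mkdir(parents=True, exist_ok=True)
    now = datetime.now()

    # 1. Copy the raw transcript aside before anything is destroyed.
    archive_path = archive_dir / f"history_{now.strftime('%Y-%m-%d_%H-%M-%S')}.md"
    shutil.copy2(memory_md, archive_path)

    # 2. Delete the active file and recreate it with a fresh header.
    memory_md.unlink()
    summary_block = (
        f"### System - {now.isoformat(timespec='seconds')}\n"
        f"[Compressed Memory Summary] {summary}\n\n---\n"
    )

    # 3. Seed the new file with the summary as its only log entry.
    memory_md.write_text(header + "\n" + summary_block, encoding="utf-8")
    return archive_path
```

Copying before deleting means a failure partway through leaves either the original file or its archived copy on disk. Note that this sketch names archives with one-second resolution, so two compactions within the same second would collide.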

Configure compression

Compression is enabled by default:
from swarms import Agent

agent = Agent(
    agent_name="ResearchAgent",
    model_name="claude-sonnet-4-6",
    max_loops=5,
    context_compression=True,
)
Disable compression when you want the active MEMORY.md to remain un-compacted:
from swarms import Agent

agent = Agent(
    agent_name="StaticAgent",
    model_name="claude-sonnet-4-6",
    max_loops="auto",
    context_compression=False,
)
When compression is enabled, the agent attaches a ContextCompressor(threshold=0.9). You can replace it after construction to tune the threshold, summarizer model, temperature, or summary length:
from swarms import Agent
from swarms.agents.context_compressor import ContextCompressor

agent = Agent(
    agent_name="ResearchAgent",
    model_name="claude-sonnet-4-6",
    max_loops=5,
    context_compression=True,
)

agent._context_compressor = ContextCompressor(
    threshold=0.75,
    summarizer_model="claude-haiku-4-5",
    summarizer_temperature=0.1,
    summarizer_max_tokens=3000,
)

Access memory in code

The Conversation object is available as agent.short_memory.
from swarms import Agent

agent = Agent(
    agent_name="ResearchAgent",
    model_name="claude-sonnet-4-6",
)

agent.run("Research low-latency data warehouse options.")
agent.run("Narrow the recommendation to GCP.")

# Path to the active on-disk memory file
print(agent.short_memory.memory_md_path)

# Full prompt-ready history
print(agent.short_memory.return_history_as_string())

# Structured message list
messages = agent.short_memory.to_dict()
print(messages)

# Last response content
print(agent.short_memory.get_final_message_content())

Manual compaction

You can compact memory yourself at any time:
agent.short_memory.compact(
    summary=(
        "Researched cloud data warehouses. "
        "The user prefers GCP for latency and operations reasons. "
        "Shortlist: BigQuery, AlloyDB, and ClickHouse Cloud."
    )
)
Manual compaction follows the same archive, wipe, and re-seed flow as automatic compression.

Export and load conversations

MEMORY.md is the active persistent memory file. You can also export or load conversation history in other formats:
# Save conversation snapshots
agent.short_memory.export(force=True)
agent.short_memory.save_as_json(force=True)
agent.short_memory.save_as_yaml(force=True)

# Load a prior exported conversation
agent.short_memory.load("conversation_agent-123.json")

Search memory

Use built-in search helpers for quick inspection:
results = agent.short_memory.search("GCP")
matches = agent.short_memory.search_keyword_in_conversation("latency")
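
If you also want to look through the files on disk, for example an archived transcript, the block format shown in the write-through section is regular enough to parse with the standard library. The parser below is a hypothetical helper, not part of Swarms, and it assumes every entry starts with a "### <role> - <ISO timestamp>" line as in that example.

```python
import re
from typing import Dict, List

# Matches entry headers such as "### User - 2026-04-20T18:35:04".
HEADER_RE = re.compile(
    r"^### (?P<role>.+?) - (?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\s*$"
)


def parse_memory_blocks(text: str) -> List[Dict[str, str]]:
    """Split MEMORY.md-style text into role / timestamp / content entries."""
    blocks = []
    lines = None  # content lines of the entry currently being read
    for line in text.splitlines():
        match = HEADER_RE.match(line)
        if match:
            lines = []
            blocks.append(
                {"role": match["role"], "timestamp": match["timestamp"], "content": lines}
            )
        elif lines is not None and line.strip() != "---":
            # Separator lines are dropped; a literal "---" inside a message is lost too.
            lines.append(line)
    for block in blocks:
        block["content"] = "\n".join(block["content"]).strip()
    return blocks


def search_blocks(text: str, keyword: str) -> List[Dict[str, str]]:
    """Return the entries whose content contains the keyword, ignoring case."""
    needle = keyword.lower()
    return [b for b in parse_memory_blocks(text) if needle in b["content"].lower()]
```

Lines before the first entry header, such as the file header and the "## Interaction Log" title, are skipped, so the same function works on both MEMORY.md and files under archive/.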

Disable disk-backed memory

The clean way to keep an agent fully in-process is persistent_memory=False. Nothing is preloaded, nothing is written to MEMORY.md, and no archive/ directory is created:
from swarms import Agent

agent = Agent(
    agent_name="EphemeralAgent",
    model_name="gpt-4.1",
    persistent_memory=False,
)

agent.run("This updates conversation_history but does not write to MEMORY.md.")
If you have already constructed a persistent agent and want to stop further disk writes for the rest of the run, you can also clear memory_md_path:
agent.short_memory.memory_md_path = None
This stops future writes but does not retroactively delete MEMORY.md. Neither approach disables conversation_history, which always tracks the current run in memory.

Persistent memory vs RAG

Use MEMORY.md for the agent’s own interaction history. Use RAG when the agent needs to retrieve information from documents, databases, or external knowledge stores.
from swarms import Agent
from swarms.memory import ChromaDB

vector_db = ChromaDB(
    output_dir="agent_memory",
    docs_folder="knowledge_base",
)

agent = Agent(
    agent_name="KnowledgeAgent",
    model_name="claude-sonnet-4-6",
    long_term_memory=vector_db,
    rag_every_loop=False,
    max_loops=1,
)

response = agent.run(
    "Summarize what our renewable energy documents say about storage."
)
long_term_memory (BaseVectorDatabase, default: None)
Vector database used for document retrieval.
rag_every_loop (bool, default: False)
Query long-term memory on every loop iteration instead of only at the beginning.
memory_chunk_size (int, default: 2000)
Chunk size used when processing memory documents for retrieval.

Best practices

  • Use stable, descriptive agent_name values for agents that should remember previous work.
  • Keep context_compression=True for autonomous or long-running agents.
  • Tune ContextCompressor.threshold lower for agents with large tool outputs or long responses.
  • Compact manually after major milestones to preserve the important state and reduce prompt size.
  • Use RAG for external knowledge. Do not rely on MEMORY.md as a document database.
  • Set persistent_memory=False for privacy-sensitive or one-off agents that should not write a transcript. Clear memory_md_path only to stop disk writes on an agent that is already running.

Why it works this way

Why key memory by agent_name?

id values can change between process starts. agent_name is user-controlled and stable, so it gives the agent a durable identity.

Why preload memory as one System message?

The model needs to understand that the content is prior memory, not a current user request. A single system-level memory preamble is compact and less ambiguous than replaying old turns as active messages.

Why wipe MEMORY.md during compaction?

If compaction only appended a summary, the next run would load both the summary and the raw transcript it summarizes. Wiping the active file keeps the working context small, while archive/ preserves the raw log.

Next steps

Agent Configuration

Configure core agent parameters such as agent_name, max_loops, and context limits.

Conversation API

Explore the underlying Conversation class and its export, load, and search helpers.