Agent Memory

Swarms agents have disk-backed persistent memory through the Conversation class. When persistent_memory=True (the default), each agent writes its active interaction log to a MEMORY.md file and reloads that file when another process starts an agent with the same agent_name. Set persistent_memory=False for ephemeral agents that should keep no on-disk state. Use this page to understand:

How MEMORY.md is created, loaded, and updated
How context compression keeps memory within the model context window
How archived transcripts preserve raw chat history
How to inspect, compact, export, or disable memory in code
When to use persistent memory versus RAG-based long-term memory

Persistent memory is keyed by agent_name. Reusing the same agent_name (with persistent_memory=True) resumes the same memory across process restarts. Changing the name starts a separate memory folder; setting persistent_memory=False keeps the agent fully ephemeral.

The `persistent_memory` flag

persistent_memory is the top-level switch that controls whether the agent reads from and writes to MEMORY.md.

persistent_memory (bool, default: True)

Enables disk-backed persistent memory. When True, the agent creates MEMORY.md on first run, preloads it on subsequent runs, and writes new turns through to disk. When False, the agent runs fully in-process — no MEMORY.md, no archive/, fresh state every run.

from swarms import Agent

# Persistent agent (default behavior).
# On first run it creates MEMORY.md. On subsequent runs it picks up
# where it left off — the model sees the prior conversation as a
# system preamble.
persistent_agent = Agent(
    agent_name="ResearchAssistant",
    agent_description="Remembers context across sessions",
    model_name="gpt-4.1",
    max_loops=1,
    persistent_memory=True,  # default — state survives restarts
)

# Ephemeral agent — no disk writes, no preload, fresh every run.
ephemeral_agent = Agent(
    agent_name="OneShotAgent",
    model_name="gpt-4.1",
    max_loops=1,
    persistent_memory=False,
)

Memory stack

An agent can use several memory layers at the same time:

Layer	Purpose	Persistence
`conversation_history`	In-memory messages for the current run	Current process
`MEMORY.md`	Active user, agent, and tool interaction log	Disk-backed
`archive/history_<timestamp>.md`	Raw transcripts saved before compaction	Disk-backed
`Conversation.compact()`	Replaces raw active history with a summary	Disk-backed summary
`ContextCompressor`	Automatically calls compaction near the context limit	Runtime behavior
`long_term_memory`	Optional vector database for external knowledge retrieval	Depends on database

MEMORY.md is not the same thing as RAG. Persistent memory records the agent’s own interaction history. RAG retrieves knowledge from external documents or a vector database.

Disk layout

Agent memory lives under the workspace directory:

$WORKSPACE_DIR/agents/{agent_name}/
|-- MEMORY.md
`-- archive/
    |-- history_2026-04-20_14-30-45.md
    |-- history_2026-04-20_16-12-08.md
    `-- ...

agent_name (str, default: "swarm-worker-01")

Stable name used to identify the agent’s memory folder.

from swarms import Agent

agent = Agent(
    agent_name="ResearchAgent",
    model_name="claude-sonnet-4-6",
)

# Memory path:
# $WORKSPACE_DIR/agents/ResearchAgent/MEMORY.md
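
To make the layout concrete, here is a minimal sketch that derives the memory file and archive directory from an agent name. The helper `memory_paths` is hypothetical and not part of the Swarms API. The sketch assumes the workspace root is read from the `WORKSPACE_DIR` environment variable, and the `"."` fallback exists only for illustration.

```python
import os
from pathlib import Path
from typing import Optional, Tuple


def memory_paths(agent_name: str, workspace_dir: Optional[str] = None) -> Tuple[Path, Path]:
    """Return (MEMORY.md path, archive directory) for an agent name.

    Hypothetical helper: it only mirrors the layout shown above.
    """
    # Assumption: the workspace root comes from WORKSPACE_DIR; the "."
    # fallback is only for this sketch.
    root = Path(workspace_dir or os.environ.get("WORKSPACE_DIR", "."))
    agent_dir = root / "agents" / agent_name
    return agent_dir / "MEMORY.md", agent_dir / "archive"


memory_md, archive_dir = memory_paths("ResearchAgent", workspace_dir="workspace")
print(memory_md.as_posix())
print(archive_dir.as_posix())
```

Because the folder is derived only from the name, two agents constructed with the same agent_name resolve to the same MEMORY.md, while a different name resolves to a separate folder.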

Key design points

The folder is keyed by agent_name, not by id.
MEMORY.md is append-only during normal operation; only compaction rewrites it.
Every conversation.add(role, content) writes to in-memory history and to disk.
Compression archives the current MEMORY.md before replacing it with a compact summary.
The agent’s static system_prompt, rules, and constructor configuration are not repeatedly appended to MEMORY.md.

Lifecycle

1. File creation

On first construction of an agent with a new agent_name, Swarms creates:

$WORKSPACE_DIR/agents/{agent_name}/MEMORY.md

The file starts with a small header and an interaction log section:

# Agent Memory

**Conversation:** ResearchAgent_id_<uuid>_conversation
**Created:** 2026-04-20T18:33:12

---

## Interaction Log

If the file already exists, Swarms leaves it in place.
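
The create-if-missing behavior can be sketched with the standard library. This is an illustration, not the Swarms implementation: `ensure_memory_file` is a hypothetical name, and the header is reproduced from the sample above.

```python
import tempfile
from datetime import datetime
from pathlib import Path


def ensure_memory_file(path: Path, conversation_name: str) -> bool:
    """Create MEMORY.md with a header if it is missing. Return True if created."""
    if path.exists():
        return False  # existing memory is left in place
    path.parent.mkdir(parents=True, exist_ok=True)
    created = datetime.now().isoformat(timespec="seconds")
    header = (
        "# Agent Memory\n\n"
        f"**Conversation:** {conversation_name}\n"
        f"**Created:** {created}\n\n"
        "---\n\n"
        "## Interaction Log\n\n"
    )
    path.write_text(header, encoding="utf-8")
    return True


# Use a throwaway directory as the workspace for this sketch.
workspace = Path(tempfile.mkdtemp())
memory_md = workspace / "agents" / "ResearchAgent" / "MEMORY.md"

first = ensure_memory_file(memory_md, "ResearchAgent_conversation")
second = ensure_memory_file(memory_md, "ResearchAgent_conversation")
print(first, second)  # True False
```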

2. Preload on construction

During Conversation.__init__, Swarms reads the existing MEMORY.md and injects it into conversation_history as a single System message. The resulting prompt order is:

[0] System: <system_prompt>
[1] User: <rules>                       # if provided
[2] User: <custom_rules_prompt>         # if provided
[3] System: [Persistent Memory - MEMORY.md]
            ... full MEMORY.md contents ...

The preload is appended directly to the in-memory conversation_history rather than going through the disk write path, so it is not written back to MEMORY.md. When return_history_as_string() builds the prompt, the model sees the system prompt, rules, persistent memory, and current task, in that order.
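
The prompt order above can be sketched as a plain list of messages. The function `build_initial_history` and the dictionary shape are assumptions made for illustration; only the ordering and the `[Persistent Memory - MEMORY.md]` label come from this page.

```python
from typing import Dict, List, Optional

PRELOAD_TAG = "[Persistent Memory - MEMORY.md]"


def build_initial_history(
    system_prompt: str,
    rules: Optional[str] = None,
    custom_rules_prompt: Optional[str] = None,
    memory_md_text: Optional[str] = None,
) -> List[Dict[str, str]]:
    """Assemble construction-time messages in the documented order."""
    history = [{"role": "System", "content": system_prompt}]
    if rules:
        history.append({"role": "User", "content": rules})
    if custom_rules_prompt:
        history.append({"role": "User", "content": custom_rules_prompt})
    if memory_md_text:
        # The whole file becomes one System message, placed last.
        history.append(
            {"role": "System", "content": f"{PRELOAD_TAG}\n{memory_md_text}"}
        )
    return history


history = build_initial_history(
    system_prompt="You are a research assistant.",
    rules="Cite sources.",
    memory_md_text="## Interaction Log\n### User - 2026-04-20T18:35:04\n...",
)
for index, message in enumerate(history):
    print(index, message["role"], message["content"].splitlines()[0])
```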

3. Write-through on new messages

Every conversation.add(role, content) call:

Appends the message to conversation_history
Appends a timestamped block to MEMORY.md

The on-disk format looks like this:

### User - 2026-04-20T18:35:04
Research cloud database options for low-latency analytics.

---

### ResearchAgent - 2026-04-20T18:35:21
I recommend evaluating BigQuery, ClickHouse Cloud, and AlloyDB...

---

Disk writes are serialized with a per-conversation lock. Construction-time messages such as system prompts and rules are suppressed from disk so static identity does not get duplicated on every restart.
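
A minimal sketch of the write-through pattern follows. `WriteThroughLog` and its `persist` argument are hypothetical stand-ins, not the Conversation class; the block format is copied from the sample above, and the lock is a plain `threading.Lock`.

```python
import tempfile
import threading
from datetime import datetime
from pathlib import Path
from typing import Dict, List, Optional


class WriteThroughLog:
    """Toy stand-in for write-through memory (not the Swarms Conversation)."""

    def __init__(self, memory_md_path: Optional[Path]):
        self.memory_md_path = memory_md_path
        self.conversation_history: List[Dict[str, str]] = []
        self._lock = threading.Lock()  # one lock per conversation

    def add(self, role: str, content: str, persist: bool = True) -> None:
        # Always record the message in memory.
        self.conversation_history.append({"role": role, "content": content})
        # Skip disk for suppressed messages or when no path is set.
        if not persist or self.memory_md_path is None:
            return
        stamp = datetime.now().isoformat(timespec="seconds")
        block = f"### {role} - {stamp}\n{content}\n\n---\n\n"
        with self._lock:  # serialize concurrent appends
            with self.memory_md_path.open("a", encoding="utf-8") as handle:
                handle.write(block)


path = Path(tempfile.mkdtemp()) / "MEMORY.md"
log = WriteThroughLog(path)
log.add("System", "You are a research assistant.", persist=False)  # not written
log.add("User", "Research cloud database options for low-latency analytics.")
print(path.read_text(encoding="utf-8"))
```

In this sketch, `persist=False` plays the role of the construction-time suppression: the system prompt lands in conversation_history but never reaches the file.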

Context compression

Without compression, a long-running agent could eventually exceed the model’s context window. Swarms can attach a ContextCompressor that summarizes the current transcript and compacts the active memory.

context_compression (bool, default: True)

Enables automatic compression when memory approaches the configured context limit.

from swarms import Agent

agent = Agent(
    agent_name="ResearchAgent",
    model_name="claude-sonnet-4-6",
    max_loops=5,
    context_compression=True,
)

When compression runs

Compression can run when all of these are true:

context_compression=True
The token usage of short_memory.return_history_as_string() is greater than or equal to threshold * context_length
The agent is at the top of a loop iteration

The default threshold is 0.9, so compression triggers once the active prompt reaches about 90% of the context window. Compression applies to both max_loops="auto" and integer max_loops runs; the context_compression flag alone decides whether it is enabled.
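
The trigger condition can be written as a small predicate. This is a sketch under stated assumptions: `should_compress` is a hypothetical name, and `estimate_tokens` uses a crude four-characters-per-token guess in place of the tokenizer a real agent would use.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate (assumption: about 4 characters per token)."""
    return max(1, len(text) // 4)


def should_compress(
    history_text: str,
    context_length: int,
    threshold: float = 0.9,
    enabled: bool = True,
) -> bool:
    """True when usage is at or above threshold * context_length."""
    if not enabled:  # context_compression=False short-circuits the check
        return False
    return estimate_tokens(history_text) >= threshold * context_length


small_prompt = "x" * 32_000  # about 8,000 estimated tokens
large_prompt = "x" * 38_000  # about 9,500 estimated tokens

print(should_compress(small_prompt, context_length=10_000))  # False
print(should_compress(large_prompt, context_length=10_000))  # True
print(should_compress(large_prompt, context_length=10_000, enabled=False))  # False
```

Lowering the threshold to 0.75 in the same call would make the 8,000-token prompt trigger compression too, which is the effect of tuning the compressor for agents with bulky tool output.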

What compression does

When compression fires:

The current transcript is summarized with an LLM call.
Conversation.compact(summary=...) is called.
The current MEMORY.md is copied to archive/history_<timestamp>.md.
The active MEMORY.md is deleted and recreated with a fresh header.
conversation_history is rebuilt with the system prompt, rules, and custom rules.
The summary is appended as one System message to both memory and MEMORY.md.

After compaction, active memory is small again:

conversation_history:
  [0] System: <system_prompt>
  [1] User: <rules>                     # if provided
  [2] User: <custom_rules_prompt>       # if provided
  [3] System: [Compressed Memory Summary] ...<summary>

MEMORY.md:
  # Agent Memory
  ...
  ## Interaction Log
  ### System - <timestamp>
  [Compressed Memory Summary] ...<summary>

archive/history_<previous-timestamp>.md:
  Full pre-compaction transcript

On the next process restart, Swarms loads the compact summary from MEMORY.md instead of the raw pre-compaction transcript. The archive keeps the full transcript available without filling the active context window.
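
The archive, wipe, and re-seed steps can be sketched on the file level. This is not the body of Conversation.compact(): the function below is a hypothetical illustration, the header is shortened, and the summary text is passed in rather than produced by an LLM call.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path

# Shortened header for this sketch (the real file also records the
# conversation name and creation time).
HEADER = "# Agent Memory\n\n---\n\n## Interaction Log\n\n"


def compact_file(memory_md: Path, summary: str) -> Path:
    """Archive the current MEMORY.md, then replace it with a summary."""
    archive_dir = memory_md.parent / "archive"
    archive_dir.mkdir(exist_ok=True)
    stamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    archived = archive_dir / f"history_{stamp}.md"
    shutil.copy2(memory_md, archived)  # keep the raw transcript

    # Overwriting stands in for "delete and recreate with a fresh header".
    now = datetime.now().isoformat(timespec="seconds")
    seeded = f"### System - {now}\n[Compressed Memory Summary] {summary}\n\n---\n\n"
    memory_md.write_text(HEADER + seeded, encoding="utf-8")
    return archived


agent_dir = Path(tempfile.mkdtemp())
memory_md = agent_dir / "MEMORY.md"
memory_md.write_text(HEADER + "### User - t0\nLong raw transcript...\n\n---\n\n", encoding="utf-8")

archived = compact_file(memory_md, "User prefers GCP. Shortlist: BigQuery, AlloyDB.")
print(archived.name)
print(memory_md.read_text(encoding="utf-8"))
```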

Configure compression

Compression is enabled by default:

from swarms import Agent

agent = Agent(
    agent_name="ResearchAgent",
    model_name="claude-sonnet-4-6",
    max_loops=5,
    context_compression=True,
)

Disable compression when you want the active MEMORY.md to remain un-compacted:

from swarms import Agent

agent = Agent(
    agent_name="StaticAgent",
    model_name="claude-sonnet-4-6",
    max_loops="auto",
    context_compression=False,
)

When compression is enabled, the agent attaches a ContextCompressor(threshold=0.9). You can replace it after construction to tune the threshold, summarizer model, temperature, or summary length:

from swarms import Agent
from swarms.agents.context_compressor import ContextCompressor

agent = Agent(
    agent_name="ResearchAgent",
    model_name="claude-sonnet-4-6",
    max_loops=5,
    context_compression=True,
)

agent._context_compressor = ContextCompressor(
    threshold=0.75,
    summarizer_model="claude-haiku-4-5",
    summarizer_temperature=0.1,
    summarizer_max_tokens=3000,
)

Access memory in code

The Conversation object is available as agent.short_memory.

from swarms import Agent

agent = Agent(
    agent_name="ResearchAgent",
    model_name="claude-sonnet-4-6",
)

agent.run("Research low-latency data warehouse options.")
agent.run("Narrow the recommendation to GCP.")

# Path to the active on-disk memory file
print(agent.short_memory.memory_md_path)

# Full prompt-ready history
print(agent.short_memory.return_history_as_string())

# Structured message list
messages = agent.short_memory.to_dict()
print(messages)

# Last response content
print(agent.short_memory.get_final_message_content())

Manual compaction

You can compact memory yourself at any time:

agent.short_memory.compact(
    summary=(
        "Researched cloud data warehouses. "
        "The user prefers GCP for latency and operations reasons. "
        "Shortlist: BigQuery, AlloyDB, and ClickHouse Cloud."
    )
)

Manual compaction follows the same archive, wipe, and re-seed flow as automatic compression.
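
After a compaction, the raw transcript can be located in the archive folder. The sketch below lists archived files oldest first. `list_archives` is a hypothetical helper, and the timestamp pattern is inferred from the example filenames in the disk layout, so treat it as an assumption.

```python
import tempfile
from datetime import datetime
from pathlib import Path
from typing import List, Tuple

PREFIX = "history_"
STAMP_FORMAT = "%Y-%m-%d_%H-%M-%S"  # assumed from history_2026-04-20_14-30-45.md


def list_archives(archive_dir: Path) -> List[Tuple[datetime, Path]]:
    """Return (timestamp, path) pairs for archived transcripts, oldest first."""
    entries = []
    for path in archive_dir.glob(f"{PREFIX}*.md"):
        try:
            when = datetime.strptime(path.stem[len(PREFIX):], STAMP_FORMAT)
        except ValueError:
            continue  # skip files that do not match the assumed pattern
        entries.append((when, path))
    return sorted(entries, key=lambda entry: entry[0])


# Build a throwaway archive folder to demonstrate the listing.
archive_dir = Path(tempfile.mkdtemp()) / "archive"
archive_dir.mkdir()
for name in ("history_2026-04-20_16-12-08.md", "history_2026-04-20_14-30-45.md", "notes.md"):
    (archive_dir / name).write_text("transcript", encoding="utf-8")

archives = list_archives(archive_dir)
newest_time, newest_path = archives[-1]
print(newest_path.name)  # history_2026-04-20_16-12-08.md
```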

Export and load conversations

MEMORY.md is the active persistent memory file. You can also export or load conversation history in other formats:

# Save conversation snapshots
agent.short_memory.export(force=True)
agent.short_memory.save_as_json(force=True)
agent.short_memory.save_as_yaml(force=True)

# Load a prior exported conversation
agent.short_memory.load("conversation_agent-123.json")

Search memory

Use built-in search helpers for quick inspection:

results = agent.short_memory.search("GCP")
matches = agent.short_memory.search_keyword_in_conversation("latency")

Disable disk-backed memory

The clean way to keep an agent fully in-process is persistent_memory=False. Nothing is preloaded, nothing is written to MEMORY.md, and no archive/ directory is created:

from swarms import Agent

agent = Agent(
    agent_name="EphemeralAgent",
    model_name="gpt-4.1",
    persistent_memory=False,
)

agent.run("This updates conversation_history but does not write to MEMORY.md.")

If you have already constructed a persistent agent and want to stop further disk writes for the rest of the run, you can also clear memory_md_path:

agent.short_memory.memory_md_path = None

This stops future writes but does not retroactively delete MEMORY.md. Neither approach disables conversation_history — that always tracks the current run in memory.

Persistent memory vs RAG

Use MEMORY.md for the agent’s own interaction history. Use RAG when the agent needs to retrieve information from documents, databases, or external knowledge stores.

from swarms import Agent
from swarms.memory import ChromaDB

vector_db = ChromaDB(
    output_dir="agent_memory",
    docs_folder="knowledge_base",
)

agent = Agent(
    agent_name="KnowledgeAgent",
    model_name="claude-sonnet-4-6",
    long_term_memory=vector_db,
    rag_every_loop=False,
    max_loops=1,
)

response = agent.run(
    "Summarize what our renewable energy documents say about storage."
)

long_term_memory (BaseVectorDatabase, default: None)

Vector database used for document retrieval.

rag_every_loop (bool, default: False)

Query long-term memory on every loop iteration instead of only at the beginning.

memory_chunk_size (int, default: 2000)

Chunk size used when processing memory documents for retrieval.

Best practices

Use stable, descriptive agent_name values for agents that should remember previous work.
Keep context_compression=True for autonomous or long-running agents.
Tune ContextCompressor.threshold lower for agents with large tool outputs or long responses.
Compact manually after major milestones to preserve the important state and reduce prompt size.
Use RAG for external knowledge. Do not rely on MEMORY.md as a document database.
Set persistent_memory=False for privacy-sensitive or one-off agents that should not write a transcript. Clearing memory_md_path only stops further writes on an agent that is already running.

Why it works this way

Why key memory by `agent_name`?

id values can change between process starts. agent_name is user-controlled and stable, so it gives the agent a durable identity.

Why preload memory as one `System` message?

The model needs to understand that the content is prior memory, not a current user request. A single system-level memory preamble is compact and less ambiguous than replaying old turns as active messages.

Why wipe `MEMORY.md` during compaction?

If compaction only appended a summary, the next run would load both the summary and the raw transcript it summarizes. Wiping the active file keeps the working context small, while archive/ preserves the raw log.

Next steps

Agent Configuration

Configure core agent parameters such as agent_name, max_loops, and context limits.

Conversation API

Explore the underlying Conversation class and its export, load, and search helpers.
