Learn how to manage short-term and long-term memory in your MCP applications.
Memory is a crucial component of any AI assistant. It allows the assistant to maintain conversation context, recall user preferences, and personalize responses across sessions.
MCP provides a structured approach to memory management, dividing memory into short-term and long-term components.
In MCP, memory is typically structured as follows:
```js
memory: {
  shortTerm: [
    // Recent interactions, conversation context
  ],
  longTerm: {
    // User preferences, history, and persistent information
  }
}
```

Short-term memory holds recent interactions and conversation context. It's typically implemented as an array of messages or interactions.
```js
shortTerm: [
  {
    type: "interaction",
    timestamp: "2025-05-08T14:30:00Z",
    user: "I'm looking for waterproof sneakers.",
    assistant: "What style and price range are you looking for?"
  },
  {
    type: "interaction",
    timestamp: "2025-05-08T14:31:00Z",
    user: "Minimalist style, under €150.",
    assistant: "I'll find some options for you."
  }
]
```

Long-term memory stores persistent information about the user, such as preferences, history, and other data that should be remembered across sessions.
```js
longTerm: {
  preferences: {
    style: ["minimalist", "neutral"],
    priceRange: "100-150",
    size: "EU 43"
  },
  purchaseHistory: [
    {
      product: "Nike Air Max",
      date: "2024-12-15",
      satisfaction: "high"
    }
  ],
  topics: ["running", "hiking", "casual wear"]
}
```

MCP provides several methods for working with memory:
When creating a new MCP context, you can initialize the memory:
```js
import { MCPContext } from '@modl/mcp';

const assistant = new MCPContext({
  systemInstruction: "You are a helpful assistant.",
  userGoal: "Find waterproof sneakers.",
  memory: {
    shortTerm: [],
    longTerm: {
      preferences: {
        style: ["minimalist", "neutral"],
        priceRange: "100-150"
      }
    }
  }
});
```

After each interaction, you can update the memory to reflect new information:
```js
// Add a new interaction to short-term memory
assistant.updateMemory({
  shortTerm: [
    ...assistant.memory.shortTerm,
    {
      type: "interaction",
      timestamp: new Date().toISOString(),
      user: "I prefer shoes with good arch support.",
      assistant: "I'll keep that in mind when recommending options."
    }
  ]
});

// Update long-term preferences
assistant.updateMemory({
  longTerm: {
    ...assistant.memory.longTerm,
    preferences: {
      ...assistant.memory.longTerm.preferences,
      features: ["arch support", "cushioning"]
    }
  }
});
```

To prevent the short-term memory from growing too large, you can implement a windowing strategy:
```js
// Keep only the last 10 interactions
const MAX_INTERACTIONS = 10;

assistant.updateMemory({
  shortTerm: assistant.memory.shortTerm
    .concat([newInteraction])     // Add the new interaction first
    .slice(-MAX_INTERACTIONS)     // Then keep only the most recent interactions
});
```

For longer conversations, you can summarize older interactions to save space while preserving context:
```js
// Summarize older interactions once the history grows past a threshold
async function summarizeOlderInteractions(assistant) {
  if (assistant.memory.shortTerm.length > 20) {
    // Get the older interactions to summarize
    const olderInteractions = assistant.memory.shortTerm.slice(0, 15);

    // Use the LLM to generate a summary
    const summary = await generateSummary(olderInteractions);

    // Replace the older interactions with the summary, keeping recent ones
    assistant.updateMemory({
      shortTerm: [
        { type: "summary", content: summary },
        ...assistant.memory.shortTerm.slice(15)
      ]
    });
  }
}
```

To maintain memory across sessions, you'll need to persist it to a database or other storage:
```js
// Save memory to database
async function saveMemory(userId, memory) {
  await db.collection('users').updateOne(
    { userId },
    { $set: { memory } },
    { upsert: true }
  );
}

// Load memory from database
async function loadMemory(userId) {
  const user = await db.collection('users').findOne({ userId });
  return user?.memory || { shortTerm: [], longTerm: {} };
}

// Example usage
async function handleUserMessage(userId, message) {
  // Load existing memory
  const memory = await loadMemory(userId);

  // Create the assistant with the loaded memory
  const assistant = new MCPContext({
    systemInstruction: "You are a helpful assistant.",
    userGoal: message,
    memory
  });

  // Generate response
  const response = await generateResponse(assistant);

  // Update memory with the new interaction
  assistant.updateMemory({
    shortTerm: [
      ...assistant.memory.shortTerm,
      {
        type: "interaction",
        timestamp: new Date().toISOString(),
        user: message,
        assistant: response
      }
    ]
  });

  // Save updated memory
  await saveMemory(userId, assistant.memory);

  return response;
}
```

Beyond basic memory management, MCP supports several advanced techniques:
You can tag memory items to make them easier to retrieve and filter:
```js
// Add tags to memory items
assistant.updateMemory({
  shortTerm: [
    ...assistant.memory.shortTerm,
    {
      type: "interaction",
      tags: ["product_inquiry", "price_sensitive"],
      user: "Do you have any budget-friendly options?",
      assistant: "Yes, we have several options under €100."
    }
  ]
});

// Filter memory by tags
const priceSensitiveInteractions = assistant.memory.shortTerm
  .filter(item => item.tags?.includes("price_sensitive"));
```

You can assign importance levels to memory items to prioritize what should be included in the context:
```js
// Add importance level to memory items
assistant.updateMemory({
  shortTerm: [
    ...assistant.memory.shortTerm,
    {
      type: "interaction",
      importance: "high",
      user: "I have a latex allergy, so I need shoes without latex.",
      assistant: "I'll make sure to only recommend latex-free options."
    }
  ]
});

// When compiling context, prioritize high-importance items
function getContextMemory(assistant, maxItems = 10) {
  const importanceOrder = { high: 3, medium: 2, low: 1 };

  // Sort by importance first, then by recency
  const sortedMemory = [...assistant.memory.shortTerm]
    .sort((a, b) => {
      // Items without a recognized importance level default to 0
      const importanceDiff =
        (importanceOrder[b.importance] ?? 0) - (importanceOrder[a.importance] ?? 0);
      if (importanceDiff !== 0) return importanceDiff;
      // Then by recency (assuming items have timestamps)
      return new Date(b.timestamp) - new Date(a.timestamp);
    });

  // Return the top items
  return sortedMemory.slice(0, maxItems);
}
```

For more sophisticated retrieval, you can use embeddings to find relevant memory items:
```js
// Store embeddings with memory items
async function addMemoryWithEmbedding(assistant, interaction) {
  // Generate embedding for the interaction
  const text = interaction.user + " " + interaction.assistant;
  const embedding = await generateEmbedding(text);

  // Add to memory with embedding
  assistant.updateMemory({
    shortTerm: [
      ...assistant.memory.shortTerm,
      {
        ...interaction,
        embedding
      }
    ]
  });
}

// Retrieve relevant memory items based on query
async function getRelevantMemory(assistant, query, maxItems = 5) {
  // Generate embedding for the query
  const queryEmbedding = await generateEmbedding(query);

  // Score items that have embeddings by similarity to the query
  const itemsWithScores = assistant.memory.shortTerm
    .filter(item => item.embedding)
    .map(item => ({
      item,
      score: cosineSimilarity(queryEmbedding, item.embedding)
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, maxItems);

  return itemsWithScores.map(({ item }) => item);
}
```

Now that you understand memory management in MCP, you can:
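The embedding-based retrieval above assumes two helpers: `generateEmbedding`, which would call whatever embedding model you use, and `cosineSimilarity`, which can be a plain function over number arrays. A minimal sketch of the latter:

```js
// Cosine similarity between two equal-length embedding vectors:
// dot(a, b) / (|a| * |b|). Returns a value in [-1, 1]; higher means more similar.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Guard against zero vectors to avoid dividing by zero
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Identical vectors score 1 and orthogonal vectors score 0, so sorting by this score, as `getRelevantMemory` does, surfaces the most semantically related items first. For large memories, a vector database will do this ranking more efficiently than a linear scan.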