Learn how to manage short-term and long-term memory in your MCP applications.
Memory is a crucial component of any AI assistant. It allows the assistant to retain recent conversation context, recall user preferences across sessions, and respond in a way that reflects what it has already learned about the user.
MCP provides a structured approach to memory management, dividing memory into short-term and long-term components.
In MCP, memory is typically structured as follows:
```javascript
memory: {
  shortTerm: [
    // Recent interactions, conversation context
  ],
  longTerm: {
    // User preferences, history, and persistent information
  }
}
```
Short-term memory holds recent interactions and conversation context. It's typically implemented as an array of messages or interactions.
```javascript
shortTerm: [
  {
    type: "interaction",
    timestamp: "2025-05-08T14:30:00Z",
    user: "I'm looking for waterproof sneakers.",
    assistant: "What style and price range are you looking for?"
  },
  {
    type: "interaction",
    timestamp: "2025-05-08T14:31:00Z",
    user: "Minimalist style, under €150.",
    assistant: "I'll find some options for you."
  }
]
```
Long-term memory stores persistent information about the user, such as preferences, history, and other data that should be remembered across sessions.
```javascript
longTerm: {
  preferences: {
    style: ["minimalist", "neutral"],
    priceRange: "100-150",
    size: "EU 43"
  },
  purchaseHistory: [
    {
      product: "Nike Air Max",
      date: "2024-12-15",
      satisfaction: "high"
    }
  ],
  topics: ["running", "hiking", "casual wear"]
}
```
MCP provides several methods for working with memory:
When creating a new MCP context, you can initialize the memory:
```javascript
import { MCPContext } from '@modl/mcp';

const assistant = new MCPContext({
  systemInstruction: "You are a helpful assistant.",
  userGoal: "Find waterproof sneakers.",
  memory: {
    shortTerm: [],
    longTerm: {
      preferences: {
        style: ["minimalist", "neutral"],
        priceRange: "100-150"
      }
    }
  }
});
```
After each interaction, you can update the memory to reflect new information:
```javascript
// Add a new interaction to short-term memory
assistant.updateMemory({
  shortTerm: [
    ...assistant.memory.shortTerm,
    {
      type: "interaction",
      timestamp: new Date().toISOString(),
      user: "I prefer shoes with good arch support.",
      assistant: "I'll keep that in mind when recommending options."
    }
  ]
});

// Update long-term preferences
assistant.updateMemory({
  longTerm: {
    ...assistant.memory.longTerm,
    preferences: {
      ...assistant.memory.longTerm.preferences,
      features: ["arch support", "cushioning"]
    }
  }
});
```
To prevent the short-term memory from growing too large, you can implement a windowing strategy:
```javascript
// Keep only the last 10 interactions
const MAX_INTERACTIONS = 10;

assistant.updateMemory({
  shortTerm: assistant.memory.shortTerm
    .concat([newInteraction])      // Add the new interaction first...
    .slice(-MAX_INTERACTIONS)      // ...then keep only the most recent ones
});
```

Note that the new interaction is appended before trimming; trimming first and then appending would let the window grow to `MAX_INTERACTIONS + 1` items.
For longer conversations, you can summarize older interactions to save space while preserving context:
```javascript
// Summarize older interactions
async function summarizeOlderInteractions(assistant) {
  if (assistant.memory.shortTerm.length > 20) {
    // Get the older interactions to summarize
    const olderInteractions = assistant.memory.shortTerm.slice(0, 15);

    // Use the LLM to generate a summary
    const summary = await generateSummary(olderInteractions);

    // Update the memory with the summary and recent interactions
    assistant.updateMemory({
      shortTerm: [
        { type: "summary", content: summary },
        ...assistant.memory.shortTerm.slice(15)
      ]
    });
  }
}
```
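The `generateSummary` helper above is left undefined; in production it would call the LLM with the older interactions and ask for a condensed recap. As a rough stand-in for experimenting locally, a naive extractive version that keeps only the user turns can be sketched (this is an assumption for illustration, not the real implementation):

```javascript
// Naive placeholder for generateSummary: joins the user turns from the
// older interactions into a single recap line. A real implementation
// would send these interactions to the LLM and return its summary.
async function generateSummary(interactions) {
  const userTurns = interactions
    .filter(item => item.type === "interaction")
    .map(item => item.user);
  return `Earlier in the conversation, the user said: ${userTurns.join(" | ")}`;
}
```

This keeps the summarization example runnable without an API key; swap it out for an actual LLM call before relying on the summaries.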
To maintain memory across sessions, you'll need to persist it to a database or other storage:
```javascript
// Save memory to database
async function saveMemory(userId, memory) {
  await db.collection('users').updateOne(
    { userId },
    { $set: { memory } },
    { upsert: true }
  );
}

// Load memory from database
async function loadMemory(userId) {
  const user = await db.collection('users').findOne({ userId });
  return user?.memory || { shortTerm: [], longTerm: {} };
}

// Example usage
async function handleUserMessage(userId, message) {
  // Load existing memory
  const memory = await loadMemory(userId);

  // Create or update the assistant with the loaded memory
  const assistant = new MCPContext({
    systemInstruction: "You are a helpful assistant.",
    userGoal: message,
    memory
  });

  // Generate response
  const response = await generateResponse(assistant);

  // Update memory with the new interaction
  assistant.updateMemory({
    shortTerm: [
      ...assistant.memory.shortTerm,
      {
        type: "interaction",
        timestamp: new Date().toISOString(),
        user: message,
        assistant: response
      }
    ]
  });

  // Save updated memory
  await saveMemory(userId, assistant.memory);

  return response;
}
```
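The persistence helpers above assume `db` is a MongoDB-style client. For local development and tests, a `Map`-based stand-in with the same async interface works (a sketch only; use a real database in production so memory survives process restarts):

```javascript
// In-memory stand-in for the database-backed helpers above.
// Same async shape, so calling code does not need to change.
const memoryStore = new Map();

async function saveMemory(userId, memory) {
  memoryStore.set(userId, memory);
}

async function loadMemory(userId) {
  // Fall back to an empty memory structure for unknown users
  return memoryStore.get(userId) || { shortTerm: [], longTerm: {} };
}
```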
Beyond basic memory management, MCP supports several advanced techniques:
You can tag memory items to make them easier to retrieve and filter:
```javascript
// Add tags to memory items
assistant.updateMemory({
  shortTerm: [
    ...assistant.memory.shortTerm,
    {
      type: "interaction",
      tags: ["product_inquiry", "price_sensitive"],
      user: "Do you have any budget-friendly options?",
      assistant: "Yes, we have several options under €100."
    }
  ]
});

// Filter memory by tags
const priceSensitiveInteractions = assistant.memory.shortTerm
  .filter(item => item.tags?.includes("price_sensitive"));
```
You can assign importance levels to memory items to prioritize what should be included in the context:
```javascript
// Add importance level to memory items
assistant.updateMemory({
  shortTerm: [
    ...assistant.memory.shortTerm,
    {
      type: "interaction",
      importance: "high",
      user: "I have a latex allergy, so I need shoes without latex.",
      assistant: "I'll make sure to only recommend latex-free options."
    }
  ]
});

// When compiling context, prioritize high-importance items
function getContextMemory(assistant, maxItems = 10) {
  // Sort by importance and recency
  const sortedMemory = [...assistant.memory.shortTerm]
    .sort((a, b) => {
      // First by importance
      const importanceOrder = { high: 3, medium: 2, low: 1, undefined: 0 };
      const importanceDiff = importanceOrder[b.importance] - importanceOrder[a.importance];
      if (importanceDiff !== 0) return importanceDiff;

      // Then by recency (assuming items have timestamps)
      return new Date(b.timestamp) - new Date(a.timestamp);
    });

  // Return the top items
  return sortedMemory.slice(0, maxItems);
}
```
For more sophisticated retrieval, you can use embeddings to find relevant memory items:
```javascript
// Store embeddings with memory items
async function addMemoryWithEmbedding(assistant, interaction) {
  // Generate embedding for the interaction
  const text = interaction.user + " " + interaction.assistant;
  const embedding = await generateEmbedding(text);

  // Add to memory with embedding
  assistant.updateMemory({
    shortTerm: [
      ...assistant.memory.shortTerm,
      { ...interaction, embedding }
    ]
  });
}

// Retrieve relevant memory items based on query
async function getRelevantMemory(assistant, query, maxItems = 5) {
  // Generate embedding for the query
  const queryEmbedding = await generateEmbedding(query);

  // Find items with similar embeddings
  const itemsWithScores = assistant.memory.shortTerm
    .filter(item => item.embedding)
    .map(item => ({
      item,
      score: cosineSimilarity(queryEmbedding, item.embedding)
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, maxItems);

  return itemsWithScores.map(({ item }) => item);
}
```
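This retrieval code relies on two helpers that are not shown: `generateEmbedding`, which would call whatever embedding model or API you use, and `cosineSimilarity`, which is a standard computation and can be sketched directly:

```javascript
// Cosine similarity between two equal-length embedding vectors:
// dot(a, b) / (|a| * |b|), in the range [-1, 1].
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Guard against zero vectors to avoid dividing by zero
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Because embeddings are compared pairwise here, this approach is fine for the small arrays held in short-term memory; for large long-term stores you would typically move to a vector index instead.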
Now that you understand memory management in MCP, you can combine these techniques in your own applications: window and summarize short-term memory to keep the context compact, persist long-term memory so it survives across sessions, and use tagging, prioritization, and embedding-based retrieval to surface the most relevant items for each response.