Memory Management

Learn how to manage short-term and long-term memory in your MCP applications.

Why Memory Matters

Memory is a crucial component of any AI assistant. It allows the assistant to:

  • Remember previous interactions in a conversation
  • Maintain user preferences and history over time
  • Provide personalized responses based on past context
  • Stay consistent across sessions, so users do not have to repeat themselves

MCP provides a structured approach to memory management, dividing memory into short-term and long-term components.

Memory Structure in MCP

In MCP, memory is typically structured as follows:

memory: {
  shortTerm: [
    // Recent interactions, conversation context
  ],
  longTerm: {
    // User preferences, history, and persistent information
  }
}
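
As a minimal sketch of that shape, the helpers below create an empty memory object and coerce partial or missing input into the expected structure. The names createEmptyMemory and normalizeMemory are hypothetical and are not part of the MCP package; they only illustrate the two-part layout described above.

```javascript
// Hypothetical helpers (not part of @modl/mcp) for the memory shape shown above.

// Returns a fresh, empty memory object.
function createEmptyMemory() {
  return { shortTerm: [], longTerm: {} };
}

// Coerces arbitrary input into the { shortTerm: [], longTerm: {} } shape.
function normalizeMemory(memory) {
  const source = memory ?? {};
  const longTerm = source.longTerm;
  const isPlainObject =
    longTerm !== null && typeof longTerm === "object" && !Array.isArray(longTerm);
  return {
    // Copy the array so callers cannot mutate the original by accident
    shortTerm: Array.isArray(source.shortTerm) ? [...source.shortTerm] : [],
    longTerm: isPlainObject ? { ...longTerm } : {}
  };
}
```

Normalizing on load can help when stored records predate a change in the memory layout.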

Short-Term Memory

Short-term memory holds recent interactions and conversation context. It's typically implemented as an array of messages or interactions.

shortTerm: [
  {
    type: "interaction",
    timestamp: "2025-05-08T14:30:00Z",
    user: "I'm looking for waterproof sneakers.",
    assistant: "What style and price range are you looking for?"
  },
  {
    type: "interaction",
    timestamp: "2025-05-08T14:31:00Z",
    user: "Minimalist style, under €150.",
    assistant: "I'll find some options for you."
  }
]

Long-Term Memory

Long-term memory stores persistent information about the user, such as preferences, history, and other data that should be remembered across sessions.

longTerm: {
  preferences: {
    style: ["minimalist", "neutral"],
    priceRange: "100-150",
    size: "EU 43"
  },
  purchaseHistory: [
    {
      product: "Nike Air Max",
      date: "2024-12-15",
      satisfaction: "high"
    }
  ],
  topics: ["running", "hiking", "casual wear"]
}

Working with Memory

The following sections show how to initialize, update, window, summarize, and persist memory:

Initializing Memory

When creating a new MCP context, you can initialize the memory:

import { MCPContext } from '@modl/mcp';

const assistant = new MCPContext({
  systemInstruction: "You are a helpful assistant.",
  userGoal: "Find waterproof sneakers.",
  memory: {
    shortTerm: [],
    longTerm: {
      preferences: {
        style: ["minimalist", "neutral"],
        priceRange: "100-150"
      }
    }
  }
});

Updating Memory

After each interaction, you can update the memory to reflect new information:

// Add a new interaction to short-term memory
assistant.updateMemory({
  shortTerm: [
    ...assistant.memory.shortTerm,
    {
      type: "interaction",
      timestamp: new Date().toISOString(),
      user: "I prefer shoes with good arch support.",
      assistant: "I'll keep that in mind when recommending options."
    }
  ]
});

// Update long-term preferences
assistant.updateMemory({
  longTerm: {
    ...assistant.memory.longTerm,
    preferences: {
      ...assistant.memory.longTerm.preferences,
      features: ["arch support", "cushioning"]
    }
  }
});
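
Note that spreading objects as above replaces any key you set, so assigning features would overwrite an existing features array rather than extend it. The sketch below shows one way to merge preference updates so that array values are combined instead. mergePreferences is a hypothetical helper, not an MCP method.

```javascript
// Hypothetical helper: merges preference updates without dropping existing values.
// Array values are unioned (duplicates removed); other values are overwritten.
function mergePreferences(current = {}, updates = {}) {
  const merged = { ...current };
  for (const [key, value] of Object.entries(updates)) {
    if (Array.isArray(value) && Array.isArray(merged[key])) {
      merged[key] = [...new Set([...merged[key], ...value])];
    } else {
      merged[key] = value;
    }
  }
  return merged;
}

const preferences = mergePreferences(
  { style: ["minimalist", "neutral"], priceRange: "100-150" },
  { style: ["neutral", "sporty"], features: ["arch support"] }
);
// preferences.style is ["minimalist", "neutral", "sporty"]
```

The merged object can then be passed as longTerm.preferences in an updateMemory call like the one above.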

Memory Windowing

To prevent the short-term memory from growing too large, you can implement a windowing strategy:

// Keep only the last 10 interactions
const MAX_INTERACTIONS = 10;

// newInteraction is the entry for the latest exchange
assistant.updateMemory({
  shortTerm: assistant.memory.shortTerm
    .concat([newInteraction]) // Add the new interaction first
    .slice(-MAX_INTERACTIONS) // Then keep only the most recent interactions
});
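
A count-based window ignores how long each message is. As a variation, the sketch below trims by an approximate size budget instead. The helper name windowByCharBudget is hypothetical, and character count is used only as a rough stand-in for token count.

```javascript
// Hypothetical helper: keeps the newest items whose combined text length
// fits within maxChars. Character count is a rough proxy for token count.
function windowByCharBudget(shortTerm, maxChars) {
  const kept = [];
  let used = 0;
  // Walk from newest to oldest so the most recent items are kept first
  for (let i = shortTerm.length - 1; i >= 0; i--) {
    const item = shortTerm[i];
    const size =
      (item.user ?? "").length +
      (item.assistant ?? "").length +
      (item.content ?? "").length;
    if (used + size > maxChars) break;
    used += size;
    kept.unshift(item); // Preserve chronological order
  }
  return kept;
}
```

The loop stops at the first item that does not fit, so the result is always a contiguous run of the most recent interactions.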

Summarizing Memory

For longer conversations, you can summarize older interactions to save space while preserving context:

// Summarize older interactions
const SUMMARY_THRESHOLD = 20; // Summarize once short-term memory exceeds this size
const SUMMARY_BATCH = 15;     // Number of oldest items folded into the summary

async function summarizeOlderInteractions(assistant) {
  if (assistant.memory.shortTerm.length > SUMMARY_THRESHOLD) {
    // Get the older interactions to summarize
    const olderInteractions = assistant.memory.shortTerm.slice(0, SUMMARY_BATCH);

    // Use the LLM to generate a summary
    const summary = await generateSummary(olderInteractions);

    // Update the memory with the summary and recent interactions
    assistant.updateMemory({
      shortTerm: [
        { type: "summary", content: summary },
        ...assistant.memory.shortTerm.slice(SUMMARY_BATCH)
      ]
    });
  }
}
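
The same idea can be written as a pure function that takes the summarizer as an argument, which makes it easy to test with a stub. This is a sketch under assumptions: compactShortTerm and its options are hypothetical names, and unlike the example above it keeps a fixed number of recent items rather than summarizing a fixed number of old ones.

```javascript
// Hypothetical helper: folds the oldest items into one summary entry.
// `summarize` is an async function you supply (for example, an LLM call).
async function compactShortTerm(
  shortTerm,
  summarize,
  { threshold = 20, keepRecent = 5 } = {}
) {
  // Nothing to do while memory is still small
  if (shortTerm.length <= threshold) return shortTerm;

  const cutoff = shortTerm.length - keepRecent;
  if (cutoff <= 0) return shortTerm;

  const older = shortTerm.slice(0, cutoff);
  const recent = shortTerm.slice(cutoff);

  const content = await summarize(older);
  return [{ type: "summary", content }, ...recent];
}
```

Because the function returns a new array instead of mutating the assistant, the caller decides when to write the result back with updateMemory.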

Persisting Memory

To maintain memory across sessions, you'll need to persist it to a database or other storage. The example below uses MongoDB-style calls and assumes db is an already connected database handle:

// Save memory to database
async function saveMemory(userId, memory) {
  await db.collection('users').updateOne(
    { userId },
    { $set: { memory } },
    { upsert: true }
  );
}

// Load memory from database
async function loadMemory(userId) {
  const user = await db.collection('users').findOne({ userId });
  return user?.memory || { shortTerm: [], longTerm: {} };
}

// Example usage
async function handleUserMessage(userId, message) {
  // Load existing memory
  const memory = await loadMemory(userId);
  
  // Create the assistant with the loaded memory
  const assistant = new MCPContext({
    systemInstruction: "You are a helpful assistant.",
    userGoal: message,
    memory
  });
  
  // Generate response
  const response = await generateResponse(assistant);
  
  // Update memory with the new interaction
  assistant.updateMemory({
    shortTerm: [
      ...assistant.memory.shortTerm,
      {
        type: "interaction",
        timestamp: new Date().toISOString(),
        user: message,
        assistant: response
      }
    ]
  });
  
  // Save updated memory
  await saveMemory(userId, assistant.memory);
  
  return response;
}
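
For local development or unit tests, it can be convenient to swap the database for an in-memory store with the same two functions. The sketch below is one way to do that; createInMemoryStore is a hypothetical name, and the data is lost when the process exits.

```javascript
// Hypothetical in-memory store exposing the same saveMemory/loadMemory
// functions as the database example above. Intended for tests only.
function createInMemoryStore() {
  const records = new Map();
  return {
    async saveMemory(userId, memory) {
      // Store a JSON copy so later mutations by the caller do not leak in
      records.set(userId, JSON.stringify(memory));
    },
    async loadMemory(userId) {
      const raw = records.get(userId);
      // Unknown users start with empty memory, as in the database version
      return raw === undefined
        ? { shortTerm: [], longTerm: {} }
        : JSON.parse(raw);
    }
  };
}
```

Serializing through JSON also surfaces values that would not survive a real round trip to storage, such as functions or undefined fields.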

Advanced Memory Techniques

Beyond basic memory management, MCP supports several advanced techniques:

Memory Tagging

You can tag memory items to make them easier to retrieve and filter:

// Add tags to memory items
assistant.updateMemory({
  shortTerm: [
    ...assistant.memory.shortTerm,
    {
      type: "interaction",
      tags: ["product_inquiry", "price_sensitive"],
      user: "Do you have any budget-friendly options?",
      assistant: "Yes, we have several options under €100."
    }
  ]
});

// Filter memory by tags
const priceSensitiveInteractions = assistant.memory.shortTerm
  .filter(item => item.tags?.includes("price_sensitive"));
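
When you need to filter on more than one tag, a small helper keeps the matching rule explicit. The sketch below is illustrative only; filterByTags and its mode parameter are hypothetical and not part of MCP.

```javascript
// Hypothetical helper: returns the items that carry the given tags.
// mode "any" matches at least one tag; mode "all" requires every tag.
function filterByTags(items, tags, mode = "any") {
  return items.filter(item => {
    const itemTags = item.tags ?? []; // Untagged items never match a tag
    const hasTag = tag => itemTags.includes(tag);
    return mode === "all" ? tags.every(hasTag) : tags.some(hasTag);
  });
}
```

With an empty tags list, "any" mode returns no items and "all" mode returns every item, which follows from how some and every treat empty arrays.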

Memory Importance

You can assign importance levels to memory items to prioritize what should be included in the context:

// Add importance level to memory items
assistant.updateMemory({
  shortTerm: [
    ...assistant.memory.shortTerm,
    {
      type: "interaction",
      importance: "high",
      timestamp: new Date().toISOString(),
      user: "I have a latex allergy, so I need shoes without latex.",
      assistant: "I'll make sure to only recommend latex-free options."
    }
  ]
});

// When compiling context, prioritize high-importance items
const IMPORTANCE_ORDER = { high: 3, medium: 2, low: 1 };

function getContextMemory(assistant, maxItems = 10) {
  // Items without a recognized importance level rank lowest
  const rank = item => IMPORTANCE_ORDER[item.importance] ?? 0;
  // Items without a valid timestamp are treated as the oldest
  const time = item => Date.parse(item.timestamp) || 0;

  // Sort by importance first, then by recency
  const sortedMemory = [...assistant.memory.shortTerm]
    .sort((a, b) => rank(b) - rank(a) || time(b) - time(a));

  // Return the top items
  return sortedMemory.slice(0, maxItems);
}
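
Sorting by importance changes the order of the conversation, which can make the compiled context harder to follow. One option, sketched below, is to select the top items by importance and then put the selection back into chronological order. selectContext and RANK are hypothetical names, and the sketch assumes timestamps are ISO 8601 strings as in the earlier examples.

```javascript
// Hypothetical ranking table; unknown or missing levels rank as 0.
const RANK = { high: 3, medium: 2, low: 1 };

// Picks the most important (then newest) items, returned oldest-first.
function selectContext(items, maxItems = 10) {
  const rank = item => RANK[item.importance] ?? 0;
  const time = item => Date.parse(item.timestamp) || 0;

  return [...items]
    .sort((a, b) => rank(b) - rank(a) || time(b) - time(a)) // Most important, then newest
    .slice(0, maxItems)
    .sort((a, b) => time(a) - time(b)); // Restore chronological order
}
```

The input array is copied before sorting, so the stored short-term memory keeps its original order.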

Memory Embeddings

For more sophisticated retrieval, you can use embeddings to find relevant memory items:

// Store embeddings with memory items
async function addMemoryWithEmbedding(assistant, interaction) {
  // Generate embedding for the interaction
  const text = interaction.user + " " + interaction.assistant;
  const embedding = await generateEmbedding(text);
  
  // Add to memory with embedding
  assistant.updateMemory({
    shortTerm: [
      ...assistant.memory.shortTerm,
      {
        ...interaction,
        embedding
      }
    ]
  });
}

// Retrieve relevant memory items based on query
async function getRelevantMemory(assistant, query, maxItems = 5) {
  // Generate embedding for the query
  const queryEmbedding = await generateEmbedding(query);
  
  // Find items with similar embeddings
  const itemsWithScores = assistant.memory.shortTerm
    .filter(item => item.embedding)
    .map(item => ({
      item,
      score: cosineSimilarity(queryEmbedding, item.embedding)
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, maxItems);
  
  return itemsWithScores.map(({ item }) => item);
}
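
The retrieval code above calls cosineSimilarity without defining it. A straightforward implementation is sketched below, assuming embeddings are plain arrays of numbers with equal length. Returning 0 for a zero-magnitude vector is a choice made here to avoid dividing by zero, not something the original specifies.

```javascript
// Cosine similarity between two equal-length numeric vectors.
// Returns a value in [-1, 1], or 0 when either vector has zero magnitude.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

For large memories, a linear scan like the one in getRelevantMemory compares the query against every stored item, so a dedicated vector index may be worth considering once the item count grows.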

Next Steps

Now that you understand memory management in MCP, you can: