Multimodal Support
Memory OS goes beyond text - store and retrieve memories from images, audio, documents, and more. Build AI that truly understands the world.
What is Multimodal Memory?
Multimodal memory means your AI can remember and understand information from multiple formats - not just text, but images, audio, documents, and video. This creates richer, more contextual AI experiences.
Just like humans remember faces, voices, and visual scenes, your AI can too.
Text-Only Memory
Limited context - the AI stores only a text description, so it can't actually see what was in a screenshot.
Multimodal Memory
Full context - the AI can see and understand the actual image content.
Supported Formats
Images
Store screenshots, photos, diagrams, and visual content with automatic vision analysis.
Audio
Remember voice conversations, audio notes, and spoken content with transcription.
Documents
Store PDFs, Word docs, and spreadsheets with full text extraction and understanding.
Video
Extract key frames and transcribe audio from video content for searchable memories.
How It Works
Upload & Process
Upload any supported format. Memory OS automatically processes it using AI vision, speech-to-text, or document parsing.
Image → Vision AI → "Blue dashboard with 3 charts showing upward trends"
Extract & Embed
Content is extracted and converted into semantic embeddings, making it searchable by meaning.
Text + Image Description → Embeddings → Searchable Memory
Search & Retrieve
Search across all formats using natural language. Find images by describing what's in them, audio by what was said.
Query: "dashboard screenshots" → Finds all relevant imagesCode Examples
Storing an Image Memory
from memorystack import MemoryStackClient

memory = MemoryStackClient(
    api_key="your_api_key",
    user_id="user_123"
)

# Store image with context
result = memory.create_memory(
    content="Product dashboard showing Q4 analytics",
    memory_type="visual",
    attachments=[{
        "type": "image",
        "url": "https://example.com/dashboard.png",
        "description": "Blue dashboard with 3 charts"
    }],
    metadata={
        "category": "product_design",
        "date": "2024-01-15"
    }
)

print(f"Stored memory: {result['id']}")
# Vision AI automatically analyzes the image
Storing Audio Memory
# Store voice note
result = memory.create_memory(
    content="Meeting notes from product review",
    memory_type="audio",
    attachments=[{
        "type": "audio",
        "url": "https://example.com/meeting.mp3",
        "duration": 1800  # 30 minutes
    }],
    metadata={
        "participants": ["Alice", "Bob"],
        "meeting_type": "product_review"
    }
)

# Audio is automatically transcribed
# Transcription is embedded and searchable
Searching Multimodal Memories
# Search across all formats
results = memory.search_memories(
    query="dashboard designs with charts",
    limit=10
)

# Filter by attachment type
image_memories = memory.search_memories(
    query="product screenshots",
    filters={
        "has_attachment": "image"
    }
)

# Search audio transcriptions
meeting_notes = memory.search_memories(
    query="API discussion",
    filters={
        "has_attachment": "audio",
        "memory_type": "meeting"
    }
)

# Process results
for mem in results['results']:
    print(f"Content: {mem['content']}")
    if mem.get('attachments'):
        for att in mem['attachments']:
            print(f"  - {att['type']}: {att['url']}")
            if att.get('analysis'):
                print(f"    Analysis: {att['analysis']}")
Real-World Use Cases
🎨 Design Assistant
Remember all design iterations, mockups, and feedback. Search by visual similarity or description.
🎙️ Meeting Assistant
Record and transcribe meetings automatically. Search through all past discussions by topic.
📚 Research Assistant
Store PDFs, papers, and documents. Search across all your research materials by concept.
Best Practices
✅ Do
• Add descriptive text context with attachments
• Use appropriate memory types for different formats
• Include relevant metadata (date, category, etc.)
• Optimize image sizes before uploading (a resizing sketch follows this list)
• Use transcription for searchable audio
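One way to keep image uploads small before calling create_memory is to downscale them locally. This is a generic pre-processing sketch using Pillow, not part of the Memory OS SDK; the 2048 px cap and JPEG quality of 85 are arbitrary example values.
# Generic pre-upload resize with Pillow (not part of the Memory OS SDK)
from PIL import Image

def shrink_image(path: str, out_path: str, max_side: int = 2048) -> str:
    img = Image.open(path)
    img.thumbnail((max_side, max_side))  # preserves aspect ratio; only shrinks
    img.convert("RGB").save(out_path, "JPEG", quality=85)
    return out_path

shrink_image("dashboard.png", "dashboard_small.jpg")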
❌ Don't
• Upload without any text description
• Store extremely large files (> 50 MB)
• Forget to add searchable metadata
• Mix unrelated content in one memory
• Skip format validation (a validation sketch follows this list)
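A minimal client-side validation sketch, based on the 50 MB limit mentioned above. The allowed extensions are illustrative assumptions, not an official list from Memory OS.
import os

# Hypothetical pre-upload check; extensions are illustrative, not official
ALLOWED = {".png", ".jpg", ".jpeg", ".mp3", ".wav", ".pdf", ".docx", ".mp4"}
MAX_BYTES = 50 * 1024 * 1024  # 50 MB limit noted above

def validate_attachment(path: str) -> None:
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED:
        raise ValueError(f"Unsupported format: {ext}")
    if os.path.getsize(path) > MAX_BYTES:
        raise ValueError("File exceeds 50 MB limit")

validate_attachment("meeting.mp3")  # raises ValueError if the file is invalid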
