
    Dr. Chen's Research Assistant: AI That Learns & Synthesizes

    Follow Dr. Chen's journey from drowning in 200+ research papers to having an AI assistant that reads, remembers, connects ideas, and builds a living knowledge graph.

    The PhD Student's Nightmare

    Dr. Chen is 6 months into her PhD in machine learning. She has 237 papers in her reading list. Every paper references 20 more papers. Some papers contradict each other. She can't remember which paper said what. Her literature review is due in 3 weeks.

    āŒ
    Information overload: Can't remember which paper contained which finding
    āŒ
    Contradictory findings: Paper A says X, Paper B says NOT X - which is right?
    āŒ
    Missing connections: Can't see how ideas from different papers relate
    āŒ
    Re-reading constantly: Wastes hours finding that one quote she needs

    The Solution: A Research Assistant That Learns

    An AI that reads every paper, extracts key findings, detects contradictions, builds connections, and gets smarter with each paper it reads. It's like having a photographic memory combined with a genius for synthesis.

    What Makes This Assistant Special

    Perfect Recall

    Never forgets a single finding, citation, or methodology from any paper it reads.

    Contradiction Detection

    Automatically spots when new papers contradict previous findings and flags them for review.

    Knowledge Graph

    Builds connections between ideas, showing how concepts from different papers relate.

    🔑 The Secret: Evolving Memory

    Unlike a static database, this assistant's understanding evolves. When it reads Paper 50, it might realize that Papers 12 and 35 are actually talking about the same concept. It updates its beliefs, strengthens connections, and gets smarter over time.

    Dr. Chen's Journey: Week by Week

    Week 1: First 20 Papers

    Building the foundation

    Dr. Chen: "Read this paper on transformer architectures and extract the key findings."

    Assistant: "I've extracted 7 key findings, including the attention mechanism innovation. I've also noted 12 citations that might be relevant to your research."

    🧠 What's Stored in Memory:

    • Paper: "Attention Is All You Need" (Vaswani et al., 2017)
    • Key Finding: Self-attention mechanism eliminates recurrence
    • Methodology: Encoder-decoder architecture
    • Results: SOTA on WMT translation tasks
    • 12 related papers flagged for reading

    Week 2: The First Contradiction

    Memory detects conflicting findings

    Assistant: "⚠️ Contradiction detected! This paper claims that larger batch sizes always improve training, but Paper #15 (Smith et al.) found that batch sizes above 512 hurt generalization. Would you like me to analyze both papers in detail?"

    Dr. Chen: "Yes! This is exactly what I needed to know!"

    🧠 Contradiction Resolution:

    • Detected conflicting claims about batch size
    • Analyzed contexts: Paper A used CNNs, Paper B used Transformers
    • Updated belief: "Optimal batch size depends on architecture"
    • Created connection between both papers
    • Flagged for inclusion in literature review

    Week 3: Synthesis & Insights

    The assistant connects the dots

    Dr. Chen: "What are the main approaches to improving transformer efficiency?"

    Assistant: "Based on 47 papers I've read, there are 4 main approaches: 1) Sparse attention (Papers #23, #31, #44), 2) Linear attention (Papers #18, #29), 3) Low-rank approximations (Papers #12, #35, #41), 4) Mixture of Experts (Papers #8, #27). I've noticed that approaches 1 and 3 are often combined for best results."

    🧠 Knowledge Synthesis:

    • Identified 4 major research directions
    • Grouped 47 papers by approach
    • Discovered that sparse + low-rank is a common pattern
    • Generated insight: "Hybrid approaches show 30% better results"
    • This insight wasn't in any single paper - it emerged from synthesis!

    ✨ The Result

    Dr. Chen completed her literature review in 3 weeks instead of 3 months. More importantly, her review contained insights that emerged from synthesizing 200+ papers - insights that no single paper contained.

    • 237 papers processed
    • 23 contradictions detected
    • 156 connections discovered

    Building Your Research Assistant

    Step 1: The Core Assistant Class

    First, let's create the foundation that handles paper ingestion and memory:

    from memorystack import MemoryStackClient
    from openai import OpenAI
    import os
    from typing import Dict, List
    import json
    
    class ResearchAssistant:
        """An AI research assistant that reads, remembers, and synthesizes"""
        
        def __init__(self, researcher_id: str, research_topic: str):
            self.researcher_id = researcher_id
            self.research_topic = research_topic
            
            # Initialize MemoryStack memory client
            self.memory = MemoryStackClient(
                api_key=os.getenv("MEMORYSTACK_API_KEY"),
                user_id=researcher_id
            )
            
            # Initialize AI
            self.ai = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
            
            print(f"šŸ“š Research Assistant initialized for: {research_topic}")
        
        def ingest_paper(self, paper_text: str, paper_metadata: Dict) -> Dict:
            """Read and process a research paper"""
            print(f"\nšŸ“– Reading: {paper_metadata.get('title', 'Untitled')}")
            
            # Step 1: Extract key findings
            findings = self._extract_findings(paper_text, paper_metadata)
            
            # Step 2: Store findings in memory
            self._store_findings(findings, paper_metadata)
            
            # Step 3: Detect contradictions with existing knowledge
            contradictions = self._detect_contradictions(findings)
            
            # Step 4: Build connections with other papers
            connections = self._build_connections(findings)
            
            return {
                "findings": findings,
                "contradictions": contradictions,
                "connections": connections,
                "papers_read": self._get_paper_count()
            }

    Step 2: Extracting Key Findings

    Use AI to extract structured information from papers:

        def _extract_findings(self, paper_text: str, metadata: Dict) -> Dict:
            """Extract key findings from a paper"""
            
            extraction_prompt = f"""Analyze this research paper and extract:
    1. Main hypothesis/research question
    2. Key findings (3-5 most important)
    3. Methodology used
    4. Results/conclusions
    5. Limitations mentioned
    6. Future work suggested
    
    Paper Title: {metadata.get('title', 'Unknown')}
    Authors: {metadata.get('authors', 'Unknown')}
    Year: {metadata.get('year', 'Unknown')}
    
    Paper Text (first 4,000 characters):
    {paper_text[:4000]}
    
    Return as JSON with structure:
    {{
      "hypothesis": "...",
      "findings": ["finding1", "finding2", ...],
      "methodology": "...",
      "results": "...",
      "limitations": ["limitation1", ...],
      "future_work": ["direction1", ...]
    }}"""
            
            response = self.ai.chat.completions.create(
                model="gpt-4",
                messages=[{
                    "role": "system",
                    "content": "You are a research paper analyzer. Extract structured information."
                }, {
                    "role": "user",
                    "content": extraction_prompt
                }],
                response_format={"type": "json_object"}
            )
            
            try:
                findings = json.loads(response.choices[0].message.content)
                print(f"✓ Extracted {len(findings.get('findings', []))} key findings")
                return findings
            except (json.JSONDecodeError, TypeError):
                return {"findings": [], "error": "Extraction failed"}
        
        def _store_findings(self, findings: Dict, metadata: Dict):
            """Store findings in memory with proper structure"""
            
            paper_id = f"{metadata.get('authors', 'Unknown')}_{metadata.get('year', '0000')}"
            
            # Store the paper itself
            self.memory.create_memory(
                content=f"Paper: {metadata.get('title')} by {metadata.get('authors')} ({metadata.get('year')})",
                memory_type="paper",
                metadata={
                    "paper_id": paper_id,
                    "title": metadata.get('title'),
                    "authors": metadata.get('authors'),
                    "year": metadata.get('year'),
                    "venue": metadata.get('venue', 'Unknown')
                }
            )
            
            # Store each finding separately
            for i, finding in enumerate(findings.get('findings', [])):
                self.memory.create_memory(
                    content=f"Finding from {paper_id}: {finding}",
                    memory_type="finding",
                    metadata={
                        "paper_id": paper_id,
                        "finding_index": i,
                        "confidence": "high"  # Could be extracted from paper
                    }
                )
            
            # Store methodology
            if findings.get('methodology'):
                self.memory.create_memory(
                    content=f"Methodology in {paper_id}: {findings['methodology']}",
                    memory_type="methodology",
                    metadata={"paper_id": paper_id}
                )
            
            print(f"āœ“ Stored {len(findings.get('findings', []))} findings in memory")

    Step 3: Detecting Contradictions

    The magic happens here - automatically detecting conflicting findings:

        def _detect_contradictions(self, new_findings: Dict) -> List[Dict]:
            """Detect if new findings contradict existing knowledge"""
            
            contradictions = []
            
            for finding in new_findings.get('findings', []):
                # Search for related findings in memory
                related = self.memory.search_memories(
                    query=finding,
                    memory_type="finding",
                    limit=10
                )
                
                # Use AI to detect contradictions
                for existing in related.get('results', []):
                    existing_content = existing.get('content', '')
                    
                    analysis = self.ai.chat.completions.create(
                        model="gpt-4",
                        messages=[{
                            "role": "system",
                            "content": """Analyze if these two findings contradict each other.
    Return JSON: {{"contradicts": true/false, "explanation": "...", "severity": "high/medium/low"}}"""
                        }, {
                            "role": "user",
                            "content": f"Finding 1: {finding}\nFinding 2: {existing_content}"
                        }],
                        response_format={"type": "json_object"}
                    )
                    
                    try:
                        result = json.loads(analysis.choices[0].message.content)
                        if result.get('contradicts'):
                            contradiction = {
                                "new_finding": finding,
                                "existing_finding": existing_content,
                                "explanation": result.get('explanation'),
                                "severity": result.get('severity', 'medium')
                            }
                            contradictions.append(contradiction)
                            
                            # Store the contradiction
                            self.memory.create_memory(
                                content=f"Contradiction: {finding} vs {existing_content}",
                                memory_type="contradiction",
                                metadata={
                                    "severity": result.get('severity'),
                                    "requires_review": True
                                }
                            )
                            
                            print(f"āš ļø  Contradiction detected: {result.get('severity')} severity")
                    except:
                        pass
            
            return contradictions

    Step 4: Building Connections

    Connect ideas across papers to build a knowledge graph:

        def _build_connections(self, new_findings: Dict) -> List[Dict]:
            """Build connections between this paper and existing knowledge"""
            
            connections = []
            
            for finding in new_findings.get('findings', []):
                # Find semantically similar findings
                similar = self.memory.search_memories(
                    query=finding,
                    memory_type="finding",
                    limit=5
                )
                
                for related in similar.get('results', []):
                    # Analyze the relationship
                    relationship = self.ai.chat.completions.create(
                        model="gpt-4",
                        messages=[{
                            "role": "system",
                            "content": """Analyze the relationship between these findings.
    Return JSON: {
      "relationship_type": "supports/extends/applies/contradicts/unrelated",
      "strength": "strong/medium/weak",
      "explanation": "..."
    }"""
                        }, {
                            "role": "user",
                            "content": f"Finding 1: {finding}\nFinding 2: {related.get('content')}"
                        }],
                        response_format={"type": "json_object"}
                    )
                    
                    try:
                        rel = json.loads(relationship.choices[0].message.content)
                        
                        if rel.get('relationship_type') != 'unrelated':
                            connection = {
                                "from": finding,
                                "to": related.get('content'),
                                "type": rel.get('relationship_type'),
                                "strength": rel.get('strength'),
                                "explanation": rel.get('explanation')
                            }
                            connections.append(connection)
                            
                            # Store the connection
                            self.memory.create_memory(
                                content=f"Connection: {finding} {rel.get('relationship_type')} {related.get('content')}",
                                memory_type="connection",
                                metadata={
                                    "type": rel.get('relationship_type'),
                                    "strength": rel.get('strength')
                                }
                            )
                            
                            print(f"šŸ”— Connection found: {rel.get('relationship_type')} ({rel.get('strength')})")
                    except:
                        pass
            
            return connections
        
        def _get_paper_count(self) -> int:
            """Get total number of papers read"""
            papers = self.memory.search_memories(
                query="",
                memory_type="paper",
                limit=1000
            )
            return len(papers.get('results', []))
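
    Once a paper has been ingested, the connection dicts returned by _build_connections can be assembled into a small graph structure for inspection or plotting. The helper below is a plain-Python sketch built only on the fields that method returns (from, to, type, strength); build_adjacency is an illustrative name, not part of the MemoryStack API.

    from collections import defaultdict

    def build_adjacency(connections: list) -> dict:
        """Sketch: group connection dicts by their source finding (a simple adjacency list)."""
        graph = defaultdict(list)
        for conn in connections:
            graph[conn["from"]].append(
                {"to": conn["to"], "type": conn["type"], "strength": conn["strength"]}
            )
        return dict(graph)

    # Example: result = assistant.ingest_paper(paper_text, paper_metadata)
    #          graph = build_adjacency(result["connections"])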

    Step 5: Synthesis & Querying

    Answer complex questions by synthesizing across all papers:

        def answer_research_question(self, question: str) -> str:
            """Answer a research question by synthesizing knowledge"""
            
            print(f"\nšŸ¤” Question: {question}")
            
            # Step 1: Retrieve relevant findings
            relevant_findings = self.memory.search_memories(
                query=question,
                memory_type="finding",
                limit=20
            )
            
            # Step 2: Get related papers
            relevant_papers = self.memory.search_memories(
                query=question,
                memory_type="paper",
                limit=10
            )
            
            # Step 3: Check for contradictions on this topic
            contradictions = self.memory.search_memories(
                query=question,
                memory_type="contradiction",
                limit=5
            )
            
            # Step 4: Build context
            context = {
                "findings": [f.get('content') for f in relevant_findings.get('results', [])],
                "papers": [p.get('content') for p in relevant_papers.get('results', [])],
                "contradictions": [c.get('content') for c in contradictions.get('results', [])]
            }
            
            # Step 5: Generate synthesis
            synthesis_prompt = f"""You are a research assistant with deep knowledge of {self.research_topic}.
    
    Question: {question}
    
    Relevant Findings ({len(context['findings'])}):
    {chr(10).join(context['findings'][:15])}
    
    Papers Consulted ({len(context['papers'])}):
    {chr(10).join(context['papers'])}
    
    Known Contradictions:
    {chr(10).join(context['contradictions']) if context['contradictions'] else 'None'}
    
    Provide a comprehensive answer that:
    1. Synthesizes findings from multiple papers
    2. Notes any contradictions or debates
    3. Cites specific papers
    4. Identifies gaps in current knowledge
    5. Suggests future research directions"""
            
            response = self.ai.chat.completions.create(
                model="gpt-4",
                messages=[{
                    "role": "system",
                    "content": "You are a research synthesis expert. Provide detailed, well-cited answers."
                }, {
                    "role": "user",
                    "content": synthesis_prompt
                }]
            )
            
            answer = response.choices[0].message.content
            
            # Store the Q&A for future reference
            self.memory.create_memory(
                content=f"Q: {question}\nA: {answer}",
                memory_type="synthesis",
                metadata={
                    "papers_consulted": len(context['papers']),
                    "findings_used": len(context['findings'])
                }
            )
            
            return answer
        
        def get_research_summary(self) -> Dict:
            """Get a summary of all research knowledge"""
            
            # Get counts
            papers = self.memory.search_memories(query="", memory_type="paper", limit=1000)
            findings = self.memory.search_memories(query="", memory_type="finding", limit=1000)
            contradictions = self.memory.search_memories(query="", memory_type="contradiction", limit=100)
            connections = self.memory.search_memories(query="", memory_type="connection", limit=1000)
            
            return {
                "papers_read": len(papers.get('results', [])),
                "findings_extracted": len(findings.get('results', [])),
                "contradictions_found": len(contradictions.get('results', [])),
                "connections_made": len(connections.get('results', [])),
                "research_topic": self.research_topic
            }
        
        def get_contradictions_report(self) -> List[Dict]:
            """Get all unresolved contradictions"""
            
            contradictions = self.memory.search_memories(
                query="",
                memory_type="contradiction",
                limit=100
            )
            
            report = []
            for c in contradictions.get('results', []):
                if c.get('metadata', {}).get('requires_review'):
                    report.append({
                        "contradiction": c.get('content'),
                        "severity": c.get('metadata', {}).get('severity', 'unknown'),
                        "id": c.get('id')
                    })
            
            return report
        
        def generate_literature_review(self, section: str) -> str:
            """Generate a literature review section"""
            
            print(f"\nšŸ“ Generating literature review for: {section}")
            
            # Get all relevant content
            findings = self.memory.search_memories(
                query=section,
                memory_type="finding",
                limit=50
            )
            
            papers = self.memory.search_memories(
                query=section,
                memory_type="paper",
                limit=30
            )
            
            connections = self.memory.search_memories(
                query=section,
                memory_type="connection",
                limit=20
            )
            
            # Generate review
            review_prompt = f"""Generate a literature review section on: {section}
    
    Available Research:
    - {len(findings.get('results', []))} relevant findings
    - {len(papers.get('results', []))} papers
    - {len(connections.get('results', []))} connections between ideas
    
    Findings:
    {chr(10).join([f.get('content', '') for f in findings.get('results', [])][:30])}
    
    Papers:
    {chr(10).join([p.get('content', '') for p in papers.get('results', [])][:20])}
    
    Write a comprehensive literature review that:
    1. Organizes findings thematically
    2. Shows evolution of ideas
    3. Identifies key debates
    4. Synthesizes across papers
    5. Uses proper academic citations"""
            
            response = self.ai.chat.completions.create(
                model="gpt-4",
                messages=[{
                    "role": "system",
                    "content": "You are an academic writer specializing in literature reviews."
                }, {
                    "role": "user",
                    "content": review_prompt
                }]
            )
            
            return response.choices[0].message.content

    Complete Usage Example

    Dr. Chen's Complete Workflow

    # Initialize the research assistant
    assistant = ResearchAssistant(
        researcher_id="dr_chen_phd",
        research_topic="Efficient Transformer Architectures"
    )
    
    # Ingest papers
    papers = [
        {
            "text": "... full paper text ...",
            "metadata": {
                "title": "Attention Is All You Need",
                "authors": "Vaswani et al.",
                "year": 2017,
                "venue": "NeurIPS"
            }
        },
        # ... 236 more papers
    ]
    
    print("šŸ“š Starting paper ingestion...")
    for i, paper in enumerate(papers):
        result = assistant.ingest_paper(
            paper_text=paper["text"],
            paper_metadata=paper["metadata"]
        )
        
        print(f"\nPaper {i+1}/{len(papers)}")
        print(f"  Findings: {len(result['findings'].get('findings', []))}")
        print(f"  Contradictions: {len(result['contradictions'])}")
        print(f"  Connections: {len(result['connections'])}")
        
        # Alert on contradictions
        if result['contradictions']:
            print(f"  āš ļø  {len(result['contradictions'])} contradictions detected!")
            for c in result['contradictions']:
                print(f"     - {c['explanation']}")
    
    # Get research summary
    summary = assistant.get_research_summary()
    print(f"\nšŸ“Š Research Summary:")
    print(f"  Papers Read: {summary['papers_read']}")
    print(f"  Findings Extracted: {summary['findings_extracted']}")
    print(f"  Contradictions Found: {summary['contradictions_found']}")
    print(f"  Connections Made: {summary['connections_made']}")
    
    # Answer research questions
    questions = [
        "What are the main approaches to improving transformer efficiency?",
        "How does batch size affect training of large language models?",
        "What are the trade-offs between sparse and dense attention?"
    ]
    
    for question in questions:
        answer = assistant.answer_research_question(question)
        print(f"\nQ: {question}")
        print(f"A: {answer}")
    
    # Check contradictions
    contradictions = assistant.get_contradictions_report()
    print(f"\nāš ļø  Unresolved Contradictions: {len(contradictions)}")
    for c in contradictions:
        print(f"  [{c['severity']}] {c['contradiction']}")
    
    # Generate literature review sections
    sections = [
        "Attention Mechanisms",
        "Efficiency Improvements",
        "Training Dynamics"
    ]
    
    for section in sections:
        review = assistant.generate_literature_review(section)
        print(f"\nšŸ“ Literature Review - {section}")
        print(review)
        print("\n" + "="*60)

    Expected Output

    📚 Starting paper ingestion...

    📖 Reading: Attention Is All You Need
    ✓ Extracted 7 key findings
    ✓ Stored 7 findings in memory
    🔗 Connection found: extends (strong)
    ...

    📖 Reading: Efficient Transformers: A Survey
    ⚠️ Contradiction detected: medium severity
    - This paper claims linear attention is always faster, but Paper #23 found quadratic attention faster for sequences < 512 tokens
    ...

    📊 Research Summary:
      Papers Read: 237
      Findings Extracted: 1,456
      Contradictions Found: 23
      Connections Made: 156

    Why Memory Makes This Possible

    1. Evolving Understanding

    The assistant's understanding evolves as it reads more papers. Paper 100 might reveal that Papers 12 and 45 are actually describing the same concept with different terminology. Memory allows it to update and strengthen these connections.
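
    As a rough illustration, recording such a merge needs nothing beyond the create_memory call used in the steps above. The same_concept label and the link_equivalent_findings helper below are illustrative, not a shipped MemoryStack API:

    def link_equivalent_findings(memory, paper_a_id: str, paper_b_id: str, concept: str):
        """Sketch: store a strong connection marking two papers' findings as the same idea."""
        memory.create_memory(
            content=f"Connection: {paper_a_id} same_concept {paper_b_id} ({concept})",
            memory_type="connection",
            metadata={
                "type": "same_concept",
                "strength": "strong",
                "papers": [paper_a_id, paper_b_id],
            },
        )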

    2. Contradiction Detection

    Without memory, each paper would be processed in isolation. With memory, the assistant can compare new findings against everything it's learned, automatically flagging contradictions that would take humans weeks to discover.

    3. Emergent Insights

    The most powerful feature: insights that emerge from synthesis. No single paper says "hybrid approaches combining sparse and low-rank attention show 30% better results" - but by analyzing 47 papers, the assistant discovers this pattern.

    4. Perfect Recall with Context

    When answering "What did Paper X say about Y?", the assistant doesn't just retrieve the text - it retrieves it with full context: related findings, contradictions, connections to other papers, and the evolution of understanding over time.
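
    A minimal sketch of that kind of contextual recall, built only on the search_memories call shown earlier (the recall_with_context helper itself is hypothetical):

    def recall_with_context(memory, topic: str) -> dict:
        """Sketch: fetch findings on a topic together with related connections and contradictions."""
        return {
            "findings": memory.search_memories(
                query=topic, memory_type="finding", limit=10).get("results", []),
            "connections": memory.search_memories(
                query=topic, memory_type="connection", limit=10).get("results", []),
            "contradictions": memory.search_memories(
                query=topic, memory_type="contradiction", limit=5).get("results", []),
        }

    # Example: context = recall_with_context(assistant.memory, "batch size and generalization")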

    Real-World Impact

    Before: Traditional Literature Review

    āŒ
    3-6 months to review 200+ papers
    āŒ
    Miss contradictions across papers
    āŒ
    Forget details from early papers
    āŒ
    Limited synthesis across domains

    After: Memory-Powered Assistant

    ✓ 3 weeks to process 200+ papers
    ✓ Automatically detect all contradictions
    ✓ Perfect recall of every detail
    ✓ Discover emergent insights

    Time Saved: 10x Faster

    • 90% faster literature review
    • 100% contradiction detection
    • ∞ perfect recall
