Case Study · 12 min read

    Leveraging MemoryStack for Autonomous Scientific Discovery

    How persistent AI memory powered the Nikolas agent to autonomously discover a novel CO₂ reduction catalyst, and how you can integrate MemoryStack into your own research workflows.

    Nikolas × MemoryStack

    Scientific discovery has always been a deeply iterative process. A researcher formulates a hypothesis, runs an experiment, examines the results, adjusts their thinking, and tries again. This cycle can take years, sometimes decades, because each step depends on accumulated knowledge from every step before it.

    AI agents today have a fatal flaw for this kind of work: they have no memory.

    A large language model can reason about chemistry, parse papers, and even generate molecular structures. But the moment a new prompt begins, everything it learned from the previous interaction is gone. It can't build intuition. It can't recall that "last time, iron-based ligands with nitrogen donors showed promising binding." It starts over, every single time.

    What if your research agent could remember?

    Imagine an AI that doesn't just process your data — it learns from it. Not in the fine-tuning sense, but in the practical sense: it remembers what worked, what didn't, and why. It builds up domain knowledge over the course of an experiment, much like a postdoc growing into an expert over years of lab work, except it does it in hours.

That's exactly what we built with MemoryStack. And to prove it works in one of the hardest domains imaginable (computational chemistry), we partnered with a research team building Nikolas, a semi-autonomous agent for catalyst discovery.

    Case Study: Nikolas discovers a CO₂ reduction catalyst

The challenge was deceptively simple: find a transition-metal catalyst that binds CO₂ with just the right strength. Too strong, and the catalyst is poisoned; too weak, and no reaction occurs. This is the Sabatier principle, and it makes catalyst discovery a needle-in-a-haystack search across a combinatorially vast chemical space.
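The Sabatier criterion boils down to a one-line screening predicate. Here's a minimal sketch using the −10 to −20 kcal/mol target window that appears in the integration code below; the exact bounds are system-dependent and would be tuned per reaction:

```python
def in_sabatier_window(binding_energy_kcal, lower=-20.0, upper=-10.0):
    """Return True if a CO2 binding energy (kcal/mol) lies in the target window.

    Below `lower`: binding is so strong the catalyst is poisoned.
    Above `upper`: binding is too weak for the reaction to proceed.
    """
    return lower < binding_energy_kcal < upper

in_sabatier_window(-15.45)  # the champion catalyst's energy -> True
in_sabatier_window(-5.0)    # binds too weakly -> False
in_sabatier_window(-25.0)   # binds too strongly (poisoning) -> False
```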

    The PRISM Architecture + MemoryStack

Nikolas uses the PRISM (Persistent Recursive Intelligence with Structured Metacognition) framework, a five-layer metacognitive architecture. MemoryStack serves as its memory backbone, handling:

1. Experimental memory: Every candidate screened and every binding energy computed, stored with full context (structure, method, result).

    2. Strategic memory: High-level insights like "Fe-based complexes with pyridine ligands show strong CO₂ affinity" are automatically extracted and stored.

    3. Failure memory: Candidates that failed validation (unstable structures, energies outside the Sabatier window) are remembered so the agent never repeats mistakes.

    4. Cross-session continuity: The agent can be stopped and resumed without losing any knowledge, even days, weeks, or months later.

The pipeline worked like this: Nikolas, powered by Google's Gemini 2.5 Flash, would propose candidate catalyst structures. Each was validated with RDKit (an open-source cheminformatics toolkit) and, if structurally valid, submitted to GFN2-xTB for semi-empirical quantum-chemical energy calculations.

But the critical difference wasn't the compute; it was the memory. After each batch of candidates, MemoryStack stored the results. When the next batch was generated, the LLM was prompted with its own accumulated knowledge: "In previous screens, Mn-based catalysts showed weak binding (−4 to −6 kcal/mol). Fe-based complexes with N-donor ligands performed better (−12 to −16 kcal/mol)."

    This created a virtuous cycle. The agent developed what the paper calls "emergent domain intuition". It wasn't just screening randomly; it was learning which chemical features mattered.

    The result

Champion catalyst identified: Fe(2-pyridine)₂(PPh₃)₂
    CO₂ binding energy: −15.45 kcal/mol
    Human intervention required: zero

Nikolas autonomously identified Fe(2-pyridine)₂(PPh₃)₂ as a champion catalyst, with an optimal CO₂ binding energy of −15.45 kcal/mol, right in the Sabatier sweet spot. The entire discovery happened without a single human tweak to the screening process.

    Without memory, the same agent would have had to rely on whatever knowledge was baked into the base model. With MemoryStack, it built a compounding advantage with every experiment it ran.

    Why this matters beyond chemistry

    The Nikolas case study is about catalysts, but the pattern is universal. Any multi-step research workflow (drug discovery, materials science, genomics, climate modeling) suffers from the same bottleneck: AI agents that can't accumulate knowledge across runs.

    Drug Discovery

    Screen thousands of compounds, remember which scaffolds showed activity, automatically narrow the search space over iterations.

    Materials Science

    Design alloys or polymers where the agent retains knowledge of prior composition-property relationships across experiment campaigns.

    Genomics

    Annotate gene functions, remember prior annotations and their confidence levels, build up a coherent knowledge base over time.

    Climate Modeling

    Run sensitivity analyses where the agent recalls prior parameter sweeps and focuses on the most impactful variables.

    How to integrate MemoryStack into your research workflow

    You don't need to rebuild your entire pipeline. MemoryStack is designed to drop into existing workflows with minimal code. Here's a practical guide.

    Step 1: Install the SDK

    terminal
    pip install memorystack
    # or
    npm install @memorystack/sdk

    Step 2: Store experiment results as memories

    After each experiment iteration, store the results. MemoryStack automatically extracts key facts and indexes them for semantic search.

    research_agent.py
    from memorystack import MemoryStack
    
    client = MemoryStack(api_key="your-api-key")
    
    # After running a simulation
    result = run_xtb_calculation(candidate_smiles)
    
    # Store the result as a memory
    client.memories.create(
        content=f"""
        Candidate: {candidate_smiles}
        Method: GFN2-xTB optimization + CO2 binding
        Binding Energy: {result.binding_energy} kcal/mol
        Stability: {result.is_stable}
        Verdict: {'Promising' if -20 < result.binding_energy < -10 else 'Rejected'}
        """,
        user_id="nikolas-agent",
        metadata={
            "experiment": "co2-catalyst-screen",
            "iteration": current_iteration,
            "candidate": candidate_smiles,
        }
    )

    Step 3: Recall relevant knowledge before generating new candidates

    Before asking the LLM to propose new candidates, search MemoryStack for relevant prior results. This is the key step that gives your agent context.

    research_agent.py
    # Before generating the next batch of candidates
    prior_knowledge = client.memories.search(
        query="What catalyst structures showed optimal CO2 binding energies?",
        user_id="nikolas-agent",
        limit=20,
    )
    
    # Inject into your LLM prompt
    prompt = f"""
    You are a computational chemist designing CO2 reduction catalysts.
    
    Here is what you have learned from previous experiments:
    {chr(10).join([m.content for m in prior_knowledge])}
    
    Based on these findings, propose 5 new candidate structures
    that are likely to have binding energies in the -10 to -20 kcal/mol range.
    Explain your reasoning for each.
    """
    
    response = llm.generate(prompt)

    Step 4: Let the loop run

    That's it. Steps 2 and 3 repeat in a loop. Each iteration, the agent gets smarter because it has more memories to draw from. Over time, the proposals become increasingly targeted and the hit rate goes up, which is exactly what happened with Nikolas.
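Put together, the loop looks roughly like this. The `InMemoryStore` below is an illustration-only stand-in for the MemoryStack client (naive recency retrieval instead of real semantic search) so the control flow is visible end to end; in a real pipeline you'd call `client.memories.create` and `client.memories.search` as in Steps 2 and 3, and `propose_candidates` / `compute_binding_energy` are hypothetical hooks for your own LLM and xTB calls:

```python
class InMemoryStore:
    """Illustration-only stand-in for the MemoryStack client."""

    def __init__(self):
        self._memories = []

    def create(self, content, **metadata):
        self._memories.append(content)

    def search(self, query, limit=20):
        # Naive: return the most recent memories. MemoryStack would run
        # semantic search over extracted facts instead.
        return self._memories[-limit:]


def screening_loop(store, propose_candidates, compute_binding_energy, iterations=3):
    """Repeat Steps 2 and 3: recall prior knowledge, propose, evaluate, store."""
    for i in range(iterations):
        prior = store.search("optimal CO2 binding", limit=20)    # Step 3: recall
        for candidate in propose_candidates(prior):              # LLM proposal
            energy = compute_binding_energy(candidate)           # e.g. GFN2-xTB
            verdict = "Promising" if -20 < energy < -10 else "Rejected"
            store.create(                                        # Step 2: store
                f"Candidate {candidate}: {energy:.2f} kcal/mol ({verdict})",
                iteration=i,
            )
    return store
```

Because `prior` grows with every iteration, each round of proposals is conditioned on strictly more evidence than the last — the compounding effect described above.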

    Beyond basic storage: what makes MemoryStack different

    You might think "I can just dump results into a vector database." You can. But scientific workflows need more than storage. Here's what MemoryStack handles that a raw vector DB doesn't:

1. Automatic fact extraction: You send raw text ("Fe-pyridine complex showed −15.45 kcal/mol binding"); MemoryStack extracts the structured facts and relationships. No manual parsing.

    2. Knowledge graph: Entities like "Fe-pyridine", "CO₂ binding", and "Sabatier principle" are linked in a graph. Your agent can traverse relationships, not just search by similarity.

    3. Contradiction detection: If a new result contradicts a prior one (e.g., a re-calculated binding energy is different), MemoryStack flags the conflict and updates accordingly.

    4. Memory consolidation: Over time, many granular memories are consolidated into higher-level insights. 50 individual screening results become "Iron complexes with nitrogen donors consistently outperform cobalt-based alternatives."

    5. Multi-agent support: If you're running multiple specialized agents (one for structure generation, one for energy computation, one for literature search), they can share a memory space and hand off context seamlessly.

    A note on reproducibility

    One concern in AI-driven research is reproducibility. Because MemoryStack stores every memory with timestamps, metadata, and provenance, you get a complete audit trail. You can answer: "What did the agent know at step 42 when it proposed this candidate?" and replay the exact same prompt with the exact same context.
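If each memory carries an iteration number in its metadata (as in the Step 2 example above), that replay is a simple filter. Here's a hedged sketch over a plain list of memory dicts; the real MemoryStack query API may expose this filtering differently:

```python
def knowledge_at_step(memories, step):
    """Reconstruct what the agent knew when it proposed at `step`:
    every memory stored in an earlier iteration, in storage order."""
    return [
        m["content"]
        for m in memories
        if m["metadata"]["iteration"] < step
    ]

# Toy audit trail for illustration
audit_log = [
    {"content": "Mn candidate: -5.2 kcal/mol (Rejected)",
     "metadata": {"iteration": 0}},
    {"content": "Fe-pyridine candidate: -15.45 kcal/mol (Promising)",
     "metadata": {"iteration": 1}},
]

knowledge_at_step(audit_log, 1)  # only the iteration-0 memory
```

Feeding the returned list back into the Step 3 prompt template reproduces the exact context the agent saw at that point.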

    This is fundamentally different from fine-tuning (where knowledge is baked into weights and can't be inspected) or raw context-window stuffing (where what's included depends on prompt engineering). MemoryStack gives you transparent, queryable, versioned knowledge.

    Start building today

    Whether you're working on catalyst design, protein folding, or climate simulation, the pattern is the same: give your agent memory, and watch it develop intuition.

    MemoryStack's free tier (500 memories + 2,000 search queries/month) is more than enough to prototype a research agent. No credit card required!

    Ready to give your research agent a memory?

Start with our quickstart guide; you'll have your first memory stored in under 5 minutes.