AI Engineering Part 1 of 3

Two Kinds of Memory: Why Your AI Agent Needs Thread State and Profile State

Baljeet Dogra
8 min read

Most teams building a conversational AI agent start with one memory store. Every fact the agent has ever learned about a person—what they asked last week, what they're asking right now, their account tier, their tone in a complaint six months ago—goes into the same bucket. It works fine in a demo. It degrades fast in production, and it degrades in three specific, predictable ways.

How a single memory bucket fails

It grows without bound

Every turn adds to the context you have to send the model. By turn fifty, you're paying to re-read turn three even though nothing in turn three is relevant anymore.
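
To make the growth concrete, here is a rough sketch in Python. It assumes every turn adds the same number of tokens and that the full transcript is resent on each call. The function names and the numbers are illustrative assumptions, not measurements.

```python
def tokens_sent_full_replay(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens when every call resends the whole transcript.

    Call n carries n turns, so the total is t * (1 + 2 + ... + N).
    """
    return tokens_per_turn * turns * (turns + 1) // 2


def tokens_sent_fixed_context(turns: int, context_tokens: int) -> int:
    """Total input tokens when every call carries a context of fixed size."""
    return context_tokens * turns


# Illustrative numbers only: 200 tokens per turn over 50 turns,
# compared with a fixed context of 800 tokens per call.
full = tokens_sent_full_replay(turns=50, tokens_per_turn=200)
fixed = tokens_sent_fixed_context(turns=50, context_tokens=800)
print(full, fixed)  # 255000 40000
```

Under these assumptions the replayed transcript grows quadratically with the number of turns, while a fixed-size context grows linearly.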

It loses precision over time

If a user tells you their shipping address twice—once correctly, once with a typo—an undifferentiated store has no concept of "current" versus "historical." It just has two facts and no way to rank them.

It leaks state across contexts that shouldn't touch

A user who was frustrated in last month's billing dispute shouldn't have that frustration colour the agent's tone when they ask an unrelated product question today. But if everything lives in one memory, the model has no structural signal telling it which facts belong to this conversation and which are background.

The fix isn't a bigger context window or a smarter retrieval query over the single bucket. It's recognising that "memory" is actually two different jobs, with different lifespans, different update frequencies, and different failure tolerances—and giving each one its own store.

The two layers

Thread state

Ephemeral and scoped to a single conversation. Exists for as long as that conversation is active and can be thrown away (or archived cold) once the thread closes.

  • A rolling summary of what's been discussed in this thread
  • Open action items or commitments made during this conversation
  • The current inferred intent
  • Metadata about the most recent message, such as its timestamp

Profile state

Durable and scoped to the entity—a person, an account, an organisation—not to any one conversation. Updated conservatively and survives across every thread that entity ever has with the agent.

  • Stable facts: name, account tier, verified contact details, preferences
  • Pointers to past threads (IDs, dates, one-line topic labels)
  • Commitments fulfilled or outstanding across threads

The distinction that matters isn't "short vs long" in some vague sense—it's lifespan and confidence. Thread state is allowed to be wrong, incomplete, or thrown away. Profile state is supposed to be the small set of things you're confident enough about to carry forward indefinitely.

What this looks like as data

// thread_state — keyed by thread_id, ephemeral
{
  "thread_id": "t_8f12a3",
  "entity_id": "e_44210",
  "summary": "User reported a billing discrepancy on invoice #4471. Confirmed refund eligibility. Awaiting bank details to process refund.",
  "open_items": ["awaiting bank details from user"],
  "last_message_at": "2026-06-15T09:12:00Z",
  "current_intent": "complete_refund"
}

// profile_state — keyed by entity_id, durable
{
  "entity_id": "e_44210",
  "name": "J. Okafor",
  "account_tier": "premium",
  "preferred_contact_time": "evenings",
  "past_threads": [
    { "thread_id": "t_7a01c2", "topic": "shipping delay", "closed_at": "2026-04-02" },
    { "thread_id": "t_8f12a3", "topic": "billing discrepancy", "closed_at": null }
  ],
  "outstanding_commitments": []
}

Notice what's not in profile state: the full text of the billing conversation, the user's tone, the back-and-forth about which invoice was correct. That detail lived in thread state and either got summarised into a one-line topic label or discarded entirely once the thread closed.
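
One way to express the two records as types, sketched in Python. The field names mirror the JSON above; the class names and the use of dataclasses are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ThreadState:
    """Ephemeral record, keyed by thread_id."""
    thread_id: str
    entity_id: str
    summary: str = ""
    open_items: List[str] = field(default_factory=list)
    last_message_at: Optional[str] = None  # ISO 8601 timestamp
    current_intent: Optional[str] = None


@dataclass
class ThreadPointer:
    """One-line pointer to a thread; carries no conversation text."""
    thread_id: str
    topic: str
    closed_at: Optional[str] = None  # None while the thread is still open


@dataclass
class ProfileState:
    """Durable record, keyed by entity_id."""
    entity_id: str
    name: str
    account_tier: str
    preferred_contact_time: Optional[str] = None
    past_threads: List[ThreadPointer] = field(default_factory=list)
    outstanding_commitments: List[str] = field(default_factory=list)


thread = ThreadState(
    thread_id="t_8f12a3",
    entity_id="e_44210",
    summary="User reported a billing discrepancy on invoice #4471.",
    open_items=["awaiting bank details from user"],
    current_intent="complete_refund",
)
profile = ProfileState(
    entity_id="e_44210",
    name="J. Okafor",
    account_tier="premium",
    past_threads=[ThreadPointer("t_8f12a3", "billing discrepancy")],
)
```

Keeping the two types separate means profile state has no field that could hold a transcript or a tone reading, so that detail cannot leak across the boundary by accident.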

How the two layers combine at inference time

When a new message comes in, the agent doesn't replay everything it has ever stored. It assembles a much smaller context:

  1. Load profile_state for the identified entity (one read; it rarely changes within a session, so it can be cached for the session).
  2. Load thread_state.summary for the active thread (one read, updated every turn).
  3. Append the latest incoming message.

That's the entire context for the model call: not the full transcript, not the full profile history, just the current distillation of each layer plus whatever just arrived. How that summary stays current without ballooning is its own problem (covered in the next post in this series). The point here is architectural: the model never needs to see raw history from either layer, only its current compressed state.
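
The three steps above can be sketched as a single function. This is a minimal illustration, not a prescribed implementation: `build_context`, the section labels, and the choice to serialise the profile as JSON are all assumptions.

```python
import json


def build_context(profile: dict, thread: dict, incoming_message: str) -> str:
    """Assemble the model input from the two layers plus the new message."""
    sections = [
        # Step 1: the durable profile, serialised as-is.
        "PROFILE:\n" + json.dumps(profile, sort_keys=True),
        # Step 2: only the rolling summary from thread state, never the transcript.
        "THREAD SUMMARY:\n" + thread["summary"],
        # Step 3: the message that just arrived.
        "NEW MESSAGE:\n" + incoming_message,
    ]
    return "\n\n".join(sections)


profile = {"entity_id": "e_44210", "name": "J. Okafor", "account_tier": "premium"}
thread = {
    "thread_id": "t_8f12a3",
    "summary": "Confirmed refund eligibility. Awaiting bank details.",
    "open_items": ["awaiting bank details from user"],
}
context = build_context(profile, thread, "Here are my bank details.")
print(context)
```

The size of the result depends on the size of the profile, the summary, and the new message, not on how many turns the thread has had.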

Writes go the other direction. After each turn, thread_state.summary is updated. When a thread closes (or periodically, for long-running threads), a small set of facts is promoted from thread state into profile state. Everything else in that thread state record is discarded or archived, and is never loaded into context again.
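
A minimal sketch of that write path, assuming both records are plain dictionaries. `record_turn` and `close_thread` are hypothetical names; how the new summary is produced and which facts are selected for promotion are decided elsewhere and simply passed in here.

```python
from copy import deepcopy


def record_turn(thread: dict, new_summary: str, message_at: str) -> dict:
    """Return thread state as it should look after one turn."""
    updated = dict(thread)
    updated["summary"] = new_summary
    updated["last_message_at"] = message_at
    return updated


def close_thread(profile: dict, thread: dict, promoted: dict,
                 topic: str, closed_at: str) -> dict:
    """Return profile state after a thread closes.

    `promoted` holds the small set of facts chosen for promotion.
    Nothing else from the thread record is copied into the profile.
    """
    updated = deepcopy(profile)
    updated.update(promoted)
    pointer = {"thread_id": thread["thread_id"], "topic": topic, "closed_at": closed_at}
    # Replace any existing pointer for this thread, otherwise append a new one.
    others = [p for p in updated.get("past_threads", [])
              if p["thread_id"] != thread["thread_id"]]
    updated["past_threads"] = others + [pointer]
    return updated


thread = {"thread_id": "t_8f12a3", "summary": "User reported a billing discrepancy."}
profile = {
    "entity_id": "e_44210",
    "past_threads": [
        {"thread_id": "t_8f12a3", "topic": "billing discrepancy", "closed_at": None}
    ],
}

thread = record_turn(thread, "Refund processed.", "2026-06-15T09:30:00Z")
profile = close_thread(profile, thread, {"preferred_contact_time": "evenings"},
                       topic="billing discrepancy", closed_at="2026-06-15")
```

Both functions return new records rather than mutating their inputs, which keeps the promotion step easy to test in isolation.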

Promotion rules: what crosses the boundary

Not every fact that's true in a thread deserves to live forever in profile state. A useful filter is three questions:

| Question | Example: promote | Example: don't promote |
| --- | --- | --- |
| Is it durable beyond this conversation? | "Prefers evening contact" | "Is currently on hold" |
| Is it low-volatility? | "Account tier: premium" | "Mentioned considering cancelling" |
| Was it stated explicitly, not inferred from one ambiguous remark? | "Confirmed new address: 14 Elm St" | "Seemed annoyed about the wait time" |

Transient emotional state, in-the-moment context, and single ambiguous remarks stay in thread state and die with the thread. Stable, explicit, low-volatility facts get promoted. This is what prevents the cross-contamination failure from the single-bucket approach: a user's frustration in one thread doesn't follow them into the next, because frustration almost never passes the promotion filter.
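
The three-question filter can be written down as an explicit predicate. In this sketch the `CandidateFact` type and `should_promote` function are hypothetical, and the three boolean flags are assumed to be set by an upstream step that is not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateFact:
    text: str
    durable: bool         # still true after this conversation ends?
    low_volatility: bool  # unlikely to change soon?
    explicit: bool        # stated outright, not inferred from one remark?


def should_promote(fact: CandidateFact) -> bool:
    """A fact is promoted only if it passes all three questions."""
    return fact.durable and fact.low_volatility and fact.explicit


candidates = [
    CandidateFact("Prefers evening contact", True, True, True),
    CandidateFact("Is currently on hold", False, False, True),
    CandidateFact("Seemed annoyed about the wait time", False, False, False),
]
promoted = [fact.text for fact in candidates if should_promote(fact)]
print(promoted)  # ['Prefers evening contact']
```

Requiring all three answers to be yes keeps the default conservative: a fact that fails any single question stays in thread state.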

A design checklist

Before building either store, answer four questions:

  1. What's the natural boundary for a thread in your domain? (Part 2 covers this in detail for email-based agents specifically.)
  2. What's the entity key for profile state—a person, an account, an email address, something else—and how do you resolve it reliably across channels?
  3. What's your promotion policy? Write it down as explicit rules, not "whatever seems important." Vague promotion policies are how profile state quietly fills up with stale or low-confidence noise.
  4. How often does each layer actually get read? Profile state should be fetched from its store far less often than it is used: usually once per session, not once per turn. Re-fetching it on every turn is the kind of redundant work this two-layer split exists to avoid.
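
For the fourth question, one possible approach is a small per-session cache in front of the profile store. This is a sketch under stated assumptions: `SessionProfileCache` is a hypothetical name, and `fetch_profile` stands in for whatever function reads from your profile store.

```python
from typing import Callable, Dict


class SessionProfileCache:
    """Fetch profile state once per session and reuse it on later turns."""

    def __init__(self, fetch_profile: Callable[[str], dict]) -> None:
        self._fetch_profile = fetch_profile
        self._cache: Dict[str, dict] = {}

    def get(self, entity_id: str) -> dict:
        # Only the first request for an entity reaches the store.
        if entity_id not in self._cache:
            self._cache[entity_id] = self._fetch_profile(entity_id)
        return self._cache[entity_id]


store_reads = []


def fake_fetch(entity_id: str) -> dict:
    """Stand-in for a real store read; records each call it receives."""
    store_reads.append(entity_id)
    return {"entity_id": entity_id, "account_tier": "premium"}


session = SessionProfileCache(fake_fetch)
for _ in range(3):  # three turns in the same session
    turn_profile = session.get("e_44210")

print(len(store_reads))  # 1
```

Create one cache object per session and discard it when the session ends, so that facts promoted when a thread closes are picked up the next time the entity returns.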

Getting this split right is the foundation everything else depends on. Part 2 covers how to define the thread boundary itself in practice—specifically, why you should use the threading headers email already gives you instead of inventing your own session logic.

Related reading: Prompt injection attacks (memory poisoning is a real vector), Model Context Protocol (tool and context boundaries for agents).
