Don't Invent Your Own Session IDs: Using Email Headers to Scope Agent Memory
Baljeet Dogra
Series: Part 1 — Thread state and profile state · Part 2 (this article) · Part 3 — Running executive summary
Once you've decided that thread state needs its own boundary—a key that says "everything inside this fence belongs to one conversation"—the obvious next move is tempting and wrong: invent your own. Hash the subject line. Start a new thread after a time gap. Group by sender plus day.
All of these break in ways that are hard to notice until production traffic finds the edge case. They are also unnecessary, because email solved this exact problem decades ago. RFC 5322, which descends from the original RFC 822, defines the Message-ID, In-Reply-To, and References headers that mainstream mail clients set when replying and use to group messages into threads. Using those headers instead of reinventing them isn't just less work; it's structurally more correct.
The headers that actually matter
Three headers carry the threading information you need:
Message-ID
A globally unique identifier assigned to every individual email, normally generated by the sending client (or added by the submission server if the client omits it). No two emails should ever share one.
In-Reply-To
The Message-ID of the single email this one is a direct reply to.
References
The full chain of Message-IDs from the root of the thread down to the message being replied to, space-separated, oldest first.
For example, a reply to the third message in a thread typically carries a References header listing all three earlier Message-IDs, plus an In-Reply-To pointing at the most recent of them. That chain is what you walk to recover the thread.
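Pulling these three headers out of a raw message needs only the Python standard library. The sketch below assumes the raw RFC 5322 text is available as a string; extract_thread_headers is a hypothetical helper name, and the regex is a pragmatic approximation of the msg-id grammar rather than a full parser.

```python
import re
from email import message_from_string

# Pragmatic approximation of an RFC 5322 msg-id: anything inside <...>.
MSG_ID_RE = re.compile(r"<[^<>\s]+>")


def extract_thread_headers(raw_email: str) -> dict:
    """Hypothetical helper: return the three threading headers in a normalized form."""
    msg = message_from_string(raw_email)

    def ids(header_name):
        value = msg.get(header_name)
        # Folded headers may contain newlines; the regex skips over them.
        return MSG_ID_RE.findall(str(value)) if value else []

    message_ids = ids("Message-ID")
    in_reply_to = ids("In-Reply-To")
    return {
        "message_id": message_ids[0] if message_ids else None,
        # In-Reply-To may carry stray text or several IDs; keeping the last
        # bracketed ID is an assumption of this sketch.
        "in_reply_to": in_reply_to[-1] if in_reply_to else None,
        # References is whitespace-separated, oldest first; findall keeps that order.
        "references": ids("References"),
    }
```

A message with no threading headers comes back with in_reply_to set to None and an empty references list, which is exactly the "new thread" signal the algorithm below relies on.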
Why the Subject line is a trap
Subject is the field every naive implementation reaches for first, and it fails for reasons that are individually obvious but collectively easy to miss:
- Clients prepend "Re:", "RE:", "Fwd:", or localised equivalents ("Réf:", "回复:") inconsistently, so naive string matching on subject misses threads or merges unrelated ones.
- Users edit the subject mid-thread ("Re: Invoice question" becomes "Re: Invoice question — URGENT") without starting a new conversation.
- Two completely unrelated threads can legitimately share an identical subject line ("Quick question").
Microsoft's ecosystem adds its own signal, Thread-Topic and Thread-Index, which can help as a secondary heuristic inside Outlook-originated mail. But it's not a substitute for the standard headers, and it's proprietary and frequently missing from mail sent by non-Microsoft clients. Treat Subject and Thread-Topic as supporting evidence at best, never as the primary key.
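To make the failure modes concrete, here is the kind of normalized subject key a naive implementation might build. It's a hypothetical sketch: naive_subject_key and its prefix list are illustrative and deliberately not exhaustive. Even with prefix stripping, an edited subject still splits one conversation, and two unrelated "Quick question" threads still collapse into one key.

```python
import re

# Illustrative (not exhaustive) reply/forward prefixes, including localised ones.
PREFIX_RE = re.compile(r"^\s*((re|fwd?|aw|wg|回复)\s*[:：]\s*)+", re.IGNORECASE)


def naive_subject_key(subject: str) -> str:
    """Hypothetical subject-based key: strip prefixes, collapse whitespace, lowercase."""
    return " ".join(PREFIX_RE.sub("", subject).split()).lower()


# Prefix stripping works...
#   naive_subject_key("RE: Fwd: Invoice question") gives "invoice question"
# ...but a mid-thread edit produces a different key for the same conversation:
#   naive_subject_key("Re: Invoice question (URGENT)") gives "invoice question (urgent)"
# ...and unrelated threads with the same subject share a key:
#   naive_subject_key("Quick question") == naive_subject_key("RE: Quick question")
```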
Deriving a canonical thread ID
The algorithm is straightforward once you have the headers:
1. If References is present, the root Message-ID is the first entry in that list. Use a stable hash of the root Message-ID as thread_id.
2. If References is absent but In-Reply-To is present, look up the message that In-Reply-To points to and inherit its thread_id. This requires a small lookup table mapping message_id → thread_id for every message you've ingested.
3. If neither header is present, this is the first message in a new thread. Its own Message-ID becomes the root, and thread_id is derived from it.
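import hashlib


def hash_id(message_id):
    """Stable thread key from a Message-ID (SHA-256 truncation is an assumed choice)."""
    return hashlib.sha256(message_id.strip().encode("utf-8")).hexdigest()[:16]


def resolve_thread_id(message, thread_map):
    """
    message: dict with 'message_id', 'in_reply_to', 'references' (list, oldest first)
    thread_map: persistent dict-like store of message_id -> thread_id
    """
    if message.get("references"):
        # Step 1: the first References entry is the thread root.
        root_id = message["references"][0]
        thread_id = hash_id(root_id)
    elif message.get("in_reply_to"):
        # Step 2: inherit the parent's thread; if the parent was never
        # ingested, fall back to treating the parent as the root.
        parent_id = message["in_reply_to"]
        thread_id = thread_map.get(parent_id) or hash_id(parent_id)
    else:
        # Step 3: no threading headers, so this message starts a new thread.
        thread_id = hash_id(message["message_id"])
    # Record every message so later replies can find their thread.
    thread_map[message["message_id"]] = thread_id
    return thread_id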
The thread_map table is the piece teams most often skip, and it's what makes step 2 reliable. Without it, any message whose References header got stripped or truncated by an intermediate mail server has no way to find its thread—you'd be relying on every message carrying the full chain, which isn't guaranteed.
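The lookup table doesn't need to be elaborate. One possible sketch, assuming SQLite is an acceptable store, is a two-column table keyed on message_id. SqliteThreadMap and the default file name are hypothetical, and the class implements only the get() and item-assignment operations a dict-like lookup needs.

```python
import sqlite3


class SqliteThreadMap:
    """Hypothetical persistent message_id -> thread_id store backed by SQLite."""

    def __init__(self, path: str = "thread_map.db"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS thread_map ("
            " message_id TEXT PRIMARY KEY,"
            " thread_id TEXT NOT NULL)"
        )
        self.conn.commit()

    def get(self, message_id, default=None):
        row = self.conn.execute(
            "SELECT thread_id FROM thread_map WHERE message_id = ?", (message_id,)
        ).fetchone()
        return row[0] if row else default

    def __setitem__(self, message_id, thread_id):
        # Latest assignment wins, matching plain-dict semantics, so
        # re-ingesting the same message is harmless.
        self.conn.execute(
            "INSERT OR REPLACE INTO thread_map (message_id, thread_id) VALUES (?, ?)",
            (message_id, thread_id),
        )
        self.conn.commit()
```

Because message_id is the primary key, the parent lookup in step 2 is a single indexed read per incoming message.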
Edge cases worth deciding up front
Forwarded mail starts a new Message-ID chain
A forwarded email is technically a brand-new message, usually with no In-Reply-To pointing at the original thread, even though a human considers it "the same conversation, continued." Decide your policy explicitly: don't auto-merge on body-content matching (too fragile, too prone to false positives), but consider a participant-plus-subject-similarity heuristic that flags likely continuations for human or secondary-model review rather than silently merging them.
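A participant-plus-subject heuristic can be sketched in a few lines. This is a hypothetical example: flag_possible_continuation, the Jaccard overlap measure, difflib's similarity ratio, and both thresholds are illustrative choices, not tuned values. It returns a flag for review and never changes any thread_id.

```python
import re
from difflib import SequenceMatcher

FWD_PREFIX_RE = re.compile(r"^\s*((fwd?|re)\s*:\s*)+", re.IGNORECASE)


def _clean_subject(subject):
    return FWD_PREFIX_RE.sub("", subject).strip().lower()


def flag_possible_continuation(forward, thread, min_overlap=0.5, min_subject_ratio=0.8):
    """Hypothetical heuristic: True if a forwarded message should be queued for
    review as a likely continuation of an existing thread.

    forward / thread: dicts with 'participants' (set of lowercase addresses)
    and 'subject'. Thresholds are illustrative assumptions.
    """
    a, b = forward["participants"], thread["participants"]
    if not a or not b:
        return False
    # Jaccard overlap of the two participant sets.
    overlap = len(a & b) / len(a | b)
    subject_ratio = SequenceMatcher(
        None, _clean_subject(forward["subject"]), _clean_subject(thread["subject"])
    ).ratio()
    return overlap >= min_overlap and subject_ratio >= min_subject_ratio
```

Requiring both signals keeps the false-positive rate down: a shared subject alone or a shared participant list alone is not enough to raise a flag.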
New participants get CC'd mid-thread
This shouldn't change the thread ID at all—and that's a feature, not a bug. The thread is anchored to the Message-ID chain, not to the set of participants. Who's currently on the thread is information that belongs in profile state for each participant, not in the identity of the thread itself.
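One way to keep that separation explicit is to record participation on each person's profile while treating the thread_id as a fixed input. The sketch below is hypothetical: record_participants and the shape of profile_state are assumptions, while email.utils.getaddresses is the standard-library parser for address headers.

```python
from email.utils import getaddresses


def record_participants(profile_state, thread_id, headers):
    """Hypothetical sketch: note which threads each participant appears on.

    profile_state: dict of address -> {"threads": set()}
    headers: dict of header name -> raw header value (From, To, Cc)
    The thread_id is passed in unchanged; who is on the message never alters it.
    """
    values = [headers[h] for h in ("From", "To", "Cc") if headers.get(h)]
    for _name, addr in getaddresses(values):
        if not addr:
            continue  # skip empty or unparseable entries
        # Lowercasing the whole address is a simplifying assumption.
        profile = profile_state.setdefault(addr.lower(), {"threads": set()})
        profile["threads"].add(thread_id)
    return profile_state
```

When Carol is CC'd on the fourth message, she simply gains the existing thread_id in her profile; nothing about the thread's identity moves.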
Subject line gets hijacked
Someone replies to an old thread but writes about something entirely unrelated, subject untouched. Header-based threading will (correctly, by the letter of the protocol) keep this in the original thread. If this matters for your use case, compare the new message's content against the thread's running summary and flag a divergence—but it's a refinement on top of header-based threading, not a reason to abandon it for subject matching.
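A divergence check can start very simple. In the hypothetical sketch below, bag-of-words cosine similarity is a deliberately cheap stand-in for an embedding or model-based comparison, and the 0.1 threshold is an illustrative assumption. The function only raises a flag; the message stays in its header-derived thread either way.

```python
import math
import re
from collections import Counter

WORD_RE = re.compile(r"[a-z0-9']+")


def _bag_of_words(text):
    return Counter(WORD_RE.findall(text.lower()))


def cosine_similarity(a, b):
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def flag_divergence(message_body, thread_summary, threshold=0.1):
    """Hypothetical check: True if the new message looks unrelated to the
    thread's running summary. Threshold is an illustrative assumption."""
    score = cosine_similarity(_bag_of_words(message_body), _bag_of_words(thread_summary))
    return score < threshold
```

In production you would likely swap the bag-of-words scorer for embeddings, but the contract stays the same: compare against the running summary, emit a flag, leave thread_id alone.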
Broken or stripped headers
Some older or misconfigured mail systems drop References entirely. If In-Reply-To survives, step 2 still recovers the thread through thread_map; if both headers are gone, the fallback (step 3 above, treating the message as thread-starting) is the safe default. You'll occasionally get a "new" thread that a human would have recognised as a continuation, but that's a rare, visible failure mode, unlike the silent, frequent failures of subject-based grouping.
What this buys you
Once thread_id is reliably derived from headers rather than guessed from content, it becomes the partition key for everything described in the rest of this series: the table you key thread_state on, the boundary that decides what gets summarised together, and the signal that tells you when one conversation ends and a new one—possibly with the same person—begins.
Part 3 covers what actually goes into that thread state once the boundary is settled: specifically, the running summary pattern that lets you avoid replaying full conversation history on every single turn.
Building email-based AI agents?
I help teams wire up production-grade thread scoping, memory promotion, and context assembly for agents that process real email traffic—not demo conversations.