โ† Back

Applied Research

ContextEngine

Open-source context management platform for LLM agents: graph-based representation with tiered compression to prevent pre-rot degradation.

[Architecture diagram: packages context-core (Graph + Types), context-compress (Tiered Compression), context-ingest (Source Parsers), context-retrieve (Semantic Search), context-eval (Quality Metrics); compression tiers: Lossless (dedup, normalize, prune whitespace, ~2x), Compaction (merge adjacent, remove stale, ~5x), Summarization (LLM distillation, ~20x)]

context-core

The foundation: a typed graph data structure where context nodes have relationships, types, and metadata. Everything else builds on this representation.

Typed Graph · Node/Edge Types · Serializable
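As a rough illustration, the typed-graph idea can be sketched with plain dataclasses. The class and field names below are hypothetical stand-ins, not the actual context-core API:

```python
from dataclasses import dataclass, field

@dataclass
class ContextNode:
    node_id: str
    node_type: str        # e.g. "message", "tool_call", "artifact", "entity"
    content: str
    metadata: dict = field(default_factory=dict)

@dataclass
class ContextGraph:
    nodes: dict = field(default_factory=dict)   # node_id -> ContextNode
    edges: list = field(default_factory=list)   # (src_id, dst_id, relation)

    def add_node(self, node):
        self.nodes[node.node_id] = node

    def link(self, src, dst, relation):
        self.edges.append((src, dst, relation))

    def neighbors(self, node_id, relation=None):
        # Follow outgoing edges, optionally filtered by relation type.
        return [d for s, d, r in self.edges
                if s == node_id and (relation is None or r == relation)]

g = ContextGraph()
g.add_node(ContextNode("m1", "message", "User asked about billing"))
g.add_node(ContextNode("t1", "tool_call", "lookup_invoice(id=42)"))
g.link("m1", "t1", "triggered")
```

Because nodes carry explicit types and relations, downstream passes can ask structural questions ("which tool calls did this message trigger?") instead of scanning a flat transcript.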

context-compress

Three-tier compression pipeline that prioritizes reversible operations. Lossless first, lossy only when necessary. Maintains information density metrics throughout.

Tiered Strategy · Reversible Ops · Density Tracking
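A minimal sketch of the tiered idea, with toy stand-ins for the real strategies (the function names and character budget are illustrative, not the context-compress API):

```python
def lossless(text):
    # Tier 1 stand-in: collapse whitespace runs (no content lost).
    return " ".join(text.split())

def compaction(text):
    # Tier 2 stand-in: drop exact-duplicate sentences, keeping firsts.
    seen, kept = set(), []
    for sent in text.split(". "):
        if sent not in seen:
            seen.add(sent)
            kept.append(sent)
    return ". ".join(kept)

def compress(text, budget_chars):
    """Apply tiers in order, stopping as soon as the budget is met."""
    applied = []
    for name, tier in [("lossless", lossless), ("compaction", compaction)]:
        if len(text) <= budget_chars:
            break
        text = tier(text)
        applied.append(name)
    return text, applied

out, tiers = compress("a   b.  a   b.  c", budget_chars=8)
```

The key property is the early exit: cheaper, reversible tiers run first, and lossier ones are reached only if the budget is still exceeded.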

context-ingest

Parsers for code, docs, conversations, and tool outputs. Converts raw text into typed graph nodes with automatic relationship detection.

Tree-sitter · Markdown AST · Auto-linking
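A toy ingest pass might look like the following. The real parsers use Tree-sitter and a Markdown AST; this heading-based splitter only illustrates the shape of the output (typed nodes plus automatic links):

```python
def ingest(raw):
    """Split raw text into typed nodes; link prose to its nearest heading."""
    nodes, current_section = [], None
    for line in raw.splitlines():
        if not line.strip():
            continue
        if line.startswith("# "):
            nodes.append({"type": "section", "content": line[2:], "links": []})
            current_section = len(nodes) - 1
        else:
            node = {"type": "prose", "content": line, "links": []}
            if current_section is not None:
                node["links"].append(current_section)  # auto-link to heading
            nodes.append(node)
    return nodes

nodes = ingest("# Setup\nInstall it.\n# Usage\nRun it.")
```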

context-retrieve

Semantic search over the context graph. Combines embedding similarity with graph-aware ranking to surface the most relevant context for a given query.

Embeddings · Graph Ranking · Hybrid Search
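One way to picture hybrid ranking: blend cosine similarity over embeddings with a graph signal (here simply node degree). The `alpha` weight and all names are assumptions for illustration, not the context-retrieve API:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

def hybrid_rank(query_vec, nodes, edges, alpha=0.7):
    """Score = alpha * embedding similarity + (1 - alpha) * degree centrality."""
    degree = {nid: 0 for nid in nodes}
    for src, dst in edges:
        degree[src] += 1
        degree[dst] += 1
    max_deg = max(degree.values()) or 1
    scores = {
        nid: alpha * cosine(query_vec, vec) + (1 - alpha) * degree[nid] / max_deg
        for nid, vec in nodes.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

# Toy 2-d embeddings: "b" is slightly less similar than "a" but more connected.
nodes = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
edges = [("a", "b"), ("b", "c")]
ranking = hybrid_rank([1.0, 0.0], nodes, edges)
```

In this toy example the graph signal promotes the well-connected node past the marginally more similar one, which is the point of going beyond pure embedding search.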

context-eval

Quality metrics for context windows: measures information density, relevance coverage, and "pre-rot" degradation score. Catches context decay before it impacts outputs.

Density Score · Coverage Map · Pre-rot Index
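A hedged sketch of what such metrics might compute: unique-token ratio as a crude density proxy, and a usage threshold for the pre-rot flag. The 0.65 default mirrors the 60-70% decay zone discussed in the problem statement; the real context-eval metrics are richer than this:

```python
def density_score(text):
    """Unique-token ratio: low values suggest repetitive, low-value context."""
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def pre_rot_index(used_tokens, window_tokens, threshold=0.65):
    """Return (usage fraction, flag) where flag marks the decay zone."""
    usage = used_tokens / window_tokens
    return usage, usage >= threshold

usage, flagged = pre_rot_index(70_000, 100_000)
```

The point of flagging at ~65% rather than 100% is that compression can be triggered before quality decays, not after.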

Lossless Tier (~2x compression)

Whitespace normalization, exact deduplication, redundant preamble removal. Fully reversible, with no information loss. Always runs first.

Dedup · Normalize · Zero-loss
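A minimal sketch of the lossless tier, assuming whitespace normalization plus exact dedup that keeps a reference table so duplicates can be restored (names are illustrative):

```python
def lossless_pass(blocks):
    """Normalize whitespace and dedupe exact repeats, keeping backrefs."""
    seen = {}        # normalized text -> index in output
    out, backrefs = [], {}
    for i, block in enumerate(blocks):
        norm = " ".join(block.split())   # collapse whitespace runs
        if norm in seen:
            backrefs[i] = seen[norm]     # duplicate -> original's index
        else:
            seen[norm] = len(out)
            out.append(norm)
    return out, backrefs

blocks = ["hello   world", "hello world", "bye"]
kept, refs = lossless_pass(blocks)
```

Because every dropped block is recorded in `backrefs`, the pass loses no information: the original sequence can be reconstructed from `kept` plus the reference table.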

Compaction Tier (~5x compression)

Merges adjacent related nodes, removes stale/superseded context, collapses repeated patterns. Minimally lossy with tracked provenance.

Node Merge · Staleness Heuristic · Provenance
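An illustrative compaction pass: merge adjacent nodes of the same type, drop nodes marked superseded, and record the provenance of every merge. The staleness heuristic in the real pipeline is richer than this boolean flag:

```python
def compact_nodes(nodes):
    out = []
    for node in nodes:
        if node.get("superseded"):
            continue                      # stale: removed, not merged
        if out and out[-1]["type"] == node["type"]:
            out[-1]["content"] += " " + node["content"]
            out[-1]["provenance"].append(node["id"])   # track merged sources
        else:
            out.append({"type": node["type"], "content": node["content"],
                        "provenance": [node["id"]]})
    return out

nodes = [
    {"id": "n1", "type": "message", "content": "step one"},
    {"id": "n2", "type": "message", "content": "step two"},
    {"id": "n3", "type": "message", "content": "old plan", "superseded": True},
    {"id": "n4", "type": "tool_call", "content": "run()"},
]
merged = compact_nodes(nodes)
```

The provenance list is what makes this "minimally lossy": the merged node still names the source nodes it replaced, so the reduction is auditable.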

Summarization Tier (~20x compression)

LLM-powered distillation for when aggressive compression is needed. Preserves key facts and relationships while drastically reducing token count.

LLM Distill · Fact Extraction · Lossy

Graph Model

Context as typed nodes with explicit relationships instead of a flat message tape.

[Diagram: node types Message (entity refs), Tool Call (arguments), Artifact (payload ptr), Entity (importance)]

Compression Strategy Stack

Reversible strategies first; lossy summarization only when lower tiers are insufficient.

[Diagram: Lossless (2-5x), Compaction (2-4x), Summarization (5-10x); strategies: Externalize Payloads, Deduplicate Semantically, Schema Compression, Entity-Centric]

Memory + Retrieval Loop

Agent queries retriever, retriever fans across hot/warm/cold memory, returns context, then archives new artifacts.

[Diagram: Agent (working set) queries the Retriever (semantic + entity + time), which fans across Hot/Warm/Cold memory, returns context, and writes back / archives new artifacts]

The Problem

Long-running agents fail before they hit hard context limits.

In production, agents often need 50+ tool calls to complete real workflows. By that point, quality starts degrading even when there is still room left in the context window. In ContextEngine, this failure mode is treated as a first-class systems problem called pre-rot: quality decay around 60-70% context usage, not at 100%.

The default pattern in most stacks is still:

  1. Keep appending strings to chat history.
  2. Truncate when things get large.
  3. Summarize after quality has already dropped.

That approach loses structure, burns tokens, and makes recovery difficult.

The Approach

ContextEngine models context as a typed graph, not a flat transcript.

Messages, tool calls, artifacts, entities, and summaries become explicit nodes with relationships. That structure enables selective retrieval, targeted compression, and reversible reductions instead of blanket summarization.

Compression is applied in a strict hierarchy:

  1. Lossless operations first.
  2. Compaction second (partially reversible).
  3. Summarization only as a last resort.

This keeps high-value context available longer while reducing token pressure.
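The strict hierarchy above can be sketched as an escalation loop: each tier runs only if the budget is still exceeded, so lossy summarization is genuinely a last resort. The ratios here are illustrative placeholders:

```python
def apply_hierarchy(tokens, budget):
    """Escalate through tiers until the token budget is met."""
    tiers = [("lossless", 2), ("compaction", 2), ("summarization", 5)]
    applied = []
    for name, ratio in tiers:
        if tokens <= budget:
            break                 # budget met: do not escalate further
        tokens //= ratio          # stand-in for the tier's real effect
        applied.append(name)
    return tokens, applied

# 100k tokens into a 30k budget: lossless (50k) then compaction (25k)
# suffice, so summarization never runs.
final, used = apply_hierarchy(100_000, 30_000)
```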


Architecture

ContextEngine is split into modular packages so teams can adopt only what they need.

  • context-core (complete): graph model, entity tracking, semantic index, token budget
  • context-compression (complete): pipeline orchestration + 9 compression strategies
  • context-memory (complete): backends, tiered storage, retrieval, artifact versioning
  • context-tools (complete): tool caching, pattern detection, result compression, prefetch
  • context-observe (complete): OpenTelemetry traces, Prometheus metrics, event stream
  • context-multiagent (planned): broker/handoff/sync for multi-agent coordination

Current test footprint across completed packages: 1,259 tests.


Compression Model

Tier 1: Lossless (100% recoverable)

  • Externalize large payloads to storage and keep compact references.
  • Deduplicate semantically similar or repeated blocks.
  • Collapse repetitive tool chains into structured summaries of equivalent data.

Typical effect: 2-5x savings with zero information loss.
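A hedged sketch of payload externalization, the first Tier 1 operation: large blobs move to a store and a compact reference stays in context, and resolving the reference recovers the payload exactly, which is what makes the step 100% reversible. All names are illustrative:

```python
store = {}   # stand-in for a real storage backend

def externalize(payload, max_inline=64):
    if len(payload) <= max_inline:
        return payload                        # small enough to keep inline
    key = f"blob:{len(store)}"
    store[key] = payload
    return f"[ref {key}, {len(payload)} chars]"

def resolve(ref):
    key = ref.split()[1].rstrip(",")          # parse the key back out
    return store[key]

big = "x" * 500
ref = externalize(big)
```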

Tier 2: Compaction (80-95% recoverable)

  • Compress repeated schemas and structural boilerplate.
  • Keep entity-relevant context and remove low-value drift.
  • Filter by current task relevance while preserving provenance.

Typical effect: additional 2-4x savings.

Tier 3: Summarization (irreversible)

  • Hierarchical and task-aware LLM distillation.
  • Incremental summaries for long-running sessions.
  • Used only when lower tiers are insufficient.

Typical effect: 5-10x on residual context.

Combined end-to-end compression: 10-20x depending on workload.


Memory and Retrieval

ContextEngine separates working memory from persistent memory:

  • Hot/warm/cold tiering for cost and latency control.
  • Storage backends for local and production deployments (SQLite, Postgres, Redis, filesystem).
  • Retrieval strategies combining semantic, entity, and temporal signals.
  • Artifact management with versioned references for reproducibility.

This allows agents to keep active context lightweight while recalling detail on demand.
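A minimal sketch of the tiered recall loop, assuming a promote-on-access policy (the tier names come from the text above; the policy and class name are illustrative):

```python
class TieredMemory:
    def __init__(self):
        self.hot, self.warm, self.cold = {}, {}, {}

    def put(self, key, value, tier="cold"):
        getattr(self, tier)[key] = value

    def get(self, key):
        # Fan across tiers cheapest-first; promote a hit into hot.
        for tier in (self.hot, self.warm, self.cold):
            if key in tier:
                value = tier.pop(key)
                self.hot[key] = value
                return value
        return None

mem = TieredMemory()
mem.put("invoice:42", "$310 due", tier="cold")
first = mem.get("invoice:42")   # cold hit, promoted to hot
```

Promotion keeps the working set lightweight while making repeated recalls cheap, which is the cost/latency trade the tiering exists for.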


Tool Optimization Layer

Agent quality is tightly coupled to tool-call behavior. ContextEngine includes dedicated tool intelligence:

  • ToolCallCache for exact and semantic cache hits.
  • Pattern detection for repeated tool workflows.
  • Result compression for verbose structured outputs.
  • Predictive prefetch for likely next calls.

The goal is lower latency, lower token usage, and less repeated work.
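An exact-hit cache of the kind described above can be sketched as follows, keyed on the tool name plus canonicalized arguments. The real ToolCallCache also supports semantic hits, which this illustration omits:

```python
import hashlib
import json

class ToolCallCache:
    def __init__(self):
        self._cache = {}
        self.hits = 0

    def _key(self, tool, args):
        canon = json.dumps(args, sort_keys=True)   # arg-order insensitive
        return hashlib.sha256(f"{tool}:{canon}".encode()).hexdigest()

    def call(self, tool, args, fn):
        key = self._key(tool, args)
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        result = fn(**args)                        # only runs on a miss
        self._cache[key] = result
        return result

cache = ToolCallCache()
calls = []
def lookup(x):
    calls.append(x)
    return x * 2

a = cache.call("lookup", {"x": 21}, lookup)
b = cache.call("lookup", {"x": 21}, lookup)   # exact hit: fn not re-run
```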


Observability

ContextEngine is instrumented as a systems component, not a black box:

  • OpenTelemetry spans around context operations.
  • Metrics for compression ratios, budget health, and cache behavior.
  • Event streams for pre-rot triggers and compression decisions.

This makes context quality measurable and debuggable in production.


Why This Design

  1. Graph representation beats flat history for selective retrieval and relationship-aware queries.
  2. Compaction-first preserves information that naive summarization would destroy.
  3. Proactive pre-rot handling prevents quality collapse rather than reacting after failure.
  4. Modular package boundaries make incremental adoption practical.
  5. Recovery manifests + observability make operations auditable.

Status and Roadmap

  • Phase 1-3 complete: foundation, compression, memory, and tool optimization.
  • Phase 4 planned: multi-agent broker, handoff protocol, and sync primitives.

Short-term focus is integration hardening and framework adapters; long-term focus is robust multi-agent context coordination.

Tech Stack

Python 3.12, Pydantic, NetworkX, sentence-transformers, spaCy, tiktoken, ChromaDB, OpenTelemetry, uv