SDK Usage Guide (Python / TypeScript)
This guide covers all SDK integration patterns. We recommend starting with the Interceptor pattern — it adds persistent memory to any OpenAI/Anthropic app with minimal code changes.
1. Installation
Python (PyPI):

```bash
pip install awareness-memory-cloud
```

TypeScript (npm):

```bash
npm install @awareness-sdk/memory-cloud
```
2. Client Setup
Python
```python
import os

from memory_cloud import MemoryCloudClient

client = MemoryCloudClient(
    base_url=os.getenv("AWARENESS_API_BASE_URL", "https://awareness.market/api/v1"),
    api_key="YOUR_API_KEY",
)
```
TypeScript
```typescript
import { MemoryCloudClient } from "@awareness-sdk/memory-cloud";

const client = new MemoryCloudClient({
  baseUrl: process.env.AWARENESS_API_BASE_URL || "https://awareness.market/api/v1",
  apiKey: "YOUR_API_KEY",
});
```
3. Interceptor Pattern (Recommended)
The Interceptor is the fastest way to add persistent memory to any LLM application. It wraps your existing OpenAI/Anthropic client to automatically:
- Pre-call: Retrieve relevant memory context and inject it into the conversation
- Post-call: Store the interaction back to memory for future recall
- Background: Extract structured insights (knowledge cards, decisions, risks) via your LLM
No changes to your business logic: wrap the client once, and the memory flows run automatically.
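The pre-call/post-call order can be sketched with stand-in objects (a minimal illustration of the lifecycle, not the SDK's actual implementation):

```python
# Minimal sketch of the interceptor lifecycle using stub objects.
# The real AwarenessInterceptor wraps the provider client in place; this
# only illustrates the pre-call inject / post-call store ordering.

class StubMemory:
    def __init__(self):
        self.stored = []

    def recall(self, query):
        # Pre-call: fetch context relevant to the query.
        return f"[memory context for: {query}]"

    def record(self, interaction):
        # Post-call: persist the interaction for future recall.
        self.stored.append(interaction)


def intercepted_call(memory, llm_call, messages):
    context = memory.recall(messages[-1]["content"])                # 1. retrieve
    injected = [{"role": "system", "content": context}] + messages  # 2. inject
    reply = llm_call(injected)                                      # 3. model call
    memory.record({"messages": messages, "reply": reply})           # 4. store
    return reply


memory = StubMemory()
reply = intercepted_call(
    memory,
    lambda msgs: "We chose JWT-based auth.",  # stand-in for the LLM call
    [{"role": "user", "content": "What auth approach did we decide on?"}],
)
print(reply)              # "We chose JWT-based auth."
print(len(memory.stored)) # 1
```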
Python — OpenAI Interceptor
```python
from memory_cloud import MemoryCloudClient, AwarenessInterceptor
import openai

client = MemoryCloudClient(base_url="...", api_key="...")

interceptor = AwarenessInterceptor(
    client=client,
    memory_id="mem-xxx",
    min_relevance_score=0.5,  # Filter low-relevance results (default 0.5)
    max_inject_items=5,       # Cap injected context items (default 5)
    query_rewrite="rule",     # Query rewrite mode (default "rule")
)

oai = openai.OpenAI()
interceptor.wrap_openai(oai)

# Now all oai.chat.completions.create() calls get memory injection automatically
response = oai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What auth approach did we decide on?"}],
)
# Memory context was automatically injected before this call
# The conversation was automatically stored after this call
```
Python — Anthropic Interceptor
```python
from memory_cloud import MemoryCloudClient, AwarenessInterceptor
import anthropic

client = MemoryCloudClient(base_url="...", api_key="...")
interceptor = AwarenessInterceptor(client=client, memory_id="mem-xxx")

claude = anthropic.Anthropic()
interceptor.wrap_anthropic(claude)
# All claude.messages.create() calls now have memory injection
```
TypeScript — OpenAI Interceptor
```typescript
import { MemoryCloudClient, AwarenessInterceptor } from "@awareness-sdk/memory-cloud";
import OpenAI from "openai";

const client = new MemoryCloudClient({ baseUrl: "...", apiKey: "..." });

const interceptor = await AwarenessInterceptor.create({
  client,
  memoryId: "mem-xxx",
  minRelevanceScore: 0.5,
  maxInjectItems: 5,
  queryRewrite: "rule",
});

const oai = new OpenAI();
interceptor.wrapOpenAI(oai);
// All oai.chat.completions.create() calls now have memory injection
```
Interceptor Options
| Option | Default | Description |
|---|---|---|
| `retrieve_limit` / `retrieveLimit` | 8 | Max vector results to retrieve |
| `max_context_chars` / `maxContextChars` | 4000 | Max injected context size |
| `min_relevance_score` / `minRelevanceScore` | 0.5 | Filter threshold for recall results |
| `max_inject_items` / `maxInjectItems` | 5 | Cap on injected items |
| `auto_remember` / `autoRemember` | true | Auto-store conversations post-call |
| `enable_extraction` / `enableExtraction` | true | Auto-extract insights from stored events |
| `extraction_model` / `extractionModel` | "gpt-4o-mini" | Model for insight extraction |
| `extraction_max_tokens` / `extractionMaxTokens` | 16384 | Max tokens for extraction |
| `query_rewrite` / `queryRewrite` | "rule" | Query rewrite mode |
Query Rewrite Modes
The interceptor uses context-aware query rewriting to improve recall accuracy:
"rule"(default): Layer 1 (context-aware query from recent conversation turns) + Layer 2 (structural keyword extraction for full-text search). Zero additional latency or token cost."llm": Uses the wrapped LLM to generate optimal semantic + keyword queries. Best for ambiguous queries like "continue yesterday's work". Adds ~200-500ms latency."none": Disables query rewriting. Uses the raw last user message (legacy behavior).
4. Client Auto-Extraction
Pass an OpenAI or Anthropic client as extraction_llm / extractionLlm to enable automatic insight extraction. When record returns an extraction_request, the SDK automatically calls your LLM in the background and submits the results.
Python

```python
client = MemoryCloudClient(base_url="...", api_key="...", extraction_llm=openai.OpenAI())
```

TypeScript

```typescript
const client = new MemoryCloudClient({ baseUrl: "...", apiKey: "...", extractionLlm: new OpenAI() });
```
5. MCP-aligned Helpers (same semantics as MCP tools)
- `recall_for_task` / `recallForTask`
- `record` / `record` — unified write (single event, batch, or insights)
Session management is automatic — no need to call begin_memory_session / beginMemorySession explicitly.
v1.x methods removed in v2.0: remember_step/rememberStep, remember_batch/rememberBatch, backfill_conversation_history/backfillConversationHistory, begin_memory_session/beginMemorySession, submit_insights/submitInsights, ingest_content/ingestContent. Use record() instead.
These helpers mirror MCP tool semantics for programmatic use.
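The unified `record` entry point accepts a single event or a batch through one method. The sketch below uses a hypothetical stub in place of the real client purely to show the call shapes; the argument names mirror the examples elsewhere in this guide.

```python
# Stub standing in for MemoryCloudClient, used only to show the call
# shapes that record() accepts (single event vs. batch of events).
class StubClient:
    def __init__(self):
        self.calls = []

    def record(self, memory_id, content=None, **kwargs):
        self.calls.append({"memory_id": memory_id, "content": content, **kwargs})
        return {"ok": True}

client = StubClient()

# Single event
client.record(memory_id="mid", content="Fixed the N+1 query.")
# Batch of events
client.record(memory_id="mid", content=["Step 1", "Step 2"])
# With the real client, insights payloads also go through record().

print(len(client.calls))  # 2
```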
6. API-parity Methods
Both SDKs expose aligned methods for:
- Memory CRUD: create/list/get/update/delete
- Content: write/list/delete
- Retrieve/Chat/Timeline
- MCP ingest: events + content
- Export package download
- Upload + async job polling
- Insights
- API key create/list/revoke
- Memory wizard
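Async job polling generally follows a fetch-until-terminal loop. The helper below is a generic sketch: the function name, the `fetch_status` callable, and the status strings are illustrative, not the SDK's actual API.

```python
import time

# Generic polling loop for an async job. `fetch_status` stands in for
# whatever SDK call returns the current job state; names are illustrative.
def poll_until_done(fetch_status, interval_s=2.0, timeout_s=300.0, sleep=time.sleep):
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        job = fetch_status()
        if job["status"] in ("completed", "failed"):
            return job
        sleep(interval_s)
    raise TimeoutError("job did not finish in time")

# Demo with a stub that completes on the third poll.
states = iter(["queued", "processing", "completed"])
job = poll_until_done(lambda: {"status": next(states)}, sleep=lambda s: None)
print(job["status"])  # completed
```

Injecting the `sleep` callable keeps the loop testable; in production code you would leave the default and pick an interval that matches your job sizes.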
7. Export + Read Package
Export (safetensors)
```json
{
  "package_type": "vector_only",
  "vector_binary_format": "safetensors",
  "regenerate_vectors_if_missing": true
}
```
Read in Python
```python
from memory_cloud import read_export_package

parsed = read_export_package("memory_export.zip")
print(parsed["manifest"])
print(parsed["chunks"])
print(parsed["safetensors"])
```
Read in TypeScript
```typescript
import { readExportPackage } from "@awareness-sdk/memory-cloud";

const parsed = await readExportPackage(zipBytes);
console.log(parsed.manifest);
console.log(parsed.chunks);
console.log(parsed.safetensors);
```
8. Framework Integrations
See the individual framework guides for deep integrations:
- LangChain Integration — Retriever, tool registration, RAG chains
- CrewAI Integration — Crew agents with persistent cross-session memory
- PraisonAI Integration — Tool-based memory in agent workflows
- AutoGen Integration — Multi-agent conversations with memory
9. Front-end Connect
Each memory's Connect page in the dashboard provides ready-to-use config snippets for MCP, Python SDK, and TypeScript SDK with your memory ID pre-filled. This is the recommended way to get started — it generates the exact configuration you need.
10. Multi-User Memory Mode
A single Memory can hold data for millions of users. Pass user_id to scope reads and writes per user.
Python
```python
# Write scoped to a user
client.record(memory_id="mid", content="Alice fixed auth bug", user_id="alice")
client.record(memory_id="mid", content=["Step 1", "Step 2"], user_id="alice")
client.ingest_events(memory_id="mid", events=[...], user_id="alice")

# Read scoped to a user
ctx = client.get_session_context(memory_id="mid", user_id="alice")
tasks = client.get_pending_tasks(memory_id="mid", user_id="alice")
cards = client.get_knowledge_base(memory_id="mid", user_id="alice")

# Semantic search with metadata filter
results = client.retrieve(
    memory_id="mid",
    query="authentication decisions",
    metadata_filter={"user_id": "alice"},
)
```
TypeScript
```typescript
// Write scoped to a user
await client.record({ memoryId: "mid", content: "Alice fixed auth bug", userId: "alice" });
await client.record({ memoryId: "mid", content: ["Step 1", "Step 2"], userId: "alice" });
await client.ingestEvents({ memoryId: "mid", events: [...], userId: "alice" });

// Read scoped to a user
const ctx = await client.getSessionContext({ memoryId: "mid", userId: "alice" });
const tasks = await client.getPendingTasks({ memoryId: "mid", userId: "alice" });
const cards = await client.getKnowledgeBase({ memoryId: "mid", userId: "alice" });
```
Personal mode (default): Do not pass user_id. All behavior is identical to before — existing setups are not affected.
11. Task Management
Use update_task_status to mark tasks complete after finishing work.
Python
```python
# Get open tasks
result = client.get_pending_tasks(memory_id="mid", priority="high")
task_id = result["tasks"][0]["id"]

# Always record what you did first
client.record(memory_id="mid", content="Fixed the N+1 query by adding .include() in Prisma.")

# Then mark the task done
client.update_task_status(memory_id="mid", task_id=task_id, status="completed")
```
TypeScript
```typescript
const { tasks } = await client.getPendingTasks({ memoryId: "mid", priority: "high" });
const taskId = tasks![0].id!;

await client.record({ memoryId: "mid", content: "Fixed the N+1 query." });
await client.updateTaskStatus({ memoryId: "mid", taskId, status: "completed" });
```
12. Session Context
Load full project state at the start of every session.
Python
```python
ctx = client.get_session_context(memory_id="mid", days=7, max_cards=10, max_tasks=20)

print("Recent days:")
for day in ctx.get("recent_days", []):
    print(f"  {day['date']}: {day['narrative']}")

print("Open tasks:")
for task in ctx.get("open_tasks", []):
    print(f"  [{task['priority']}] {task['title']}")

print("Knowledge cards:")
for card in ctx.get("knowledge_cards", []):
    print(f"  [{card['category']}] {card['title']}")
```
TypeScript
```typescript
const ctx = await client.getSessionContext({ memoryId: "mid", days: 7 });

for (const day of ctx.recent_days ?? []) {
  console.log(`${day.date}: ${day.narrative}`);
}
for (const task of ctx.open_tasks ?? []) {
  console.log(`[${task.priority}] ${task.title}`);
}
13. Advanced Retrieval
Both retrieve and recall_for_task support advanced vector enhancement parameters:
| Parameter (Python) | Parameter (TypeScript) | Type | Default | Description |
|---|---|---|---|---|
| `multi_level` | `multiLevel` | bool | false | Enable multi-level retrieval (session and time-range context) for broader context |
| `cluster_expand` | `clusterExpand` | bool | false | Enable topic-based context expansion for comprehensive hierarchical recall |
Python
```python
results = client.retrieve(
    memory_id="mid",
    query="authentication architecture",
    multi_level=True,
    cluster_expand=True,
)

ctx = client.recall_for_task(
    memory_id="mid",
    task="summarize all auth decisions",
    multi_level=True,
    cluster_expand=True,
)
```
TypeScript
```typescript
const results = await client.retrieve({
  memoryId: "mid",
  query: "authentication architecture",
  multiLevel: true,
  clusterExpand: true,
});

const ctx = await client.recallForTask({
  memoryId: "mid",
  task: "summarize all auth decisions",
  multiLevel: true,
  clusterExpand: true,
});
```
These parameters can be combined with existing options like reconstruct_chunks, use_hybrid_search, and recall_mode.
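A combined request might look like the sketch below. Collecting the keyword arguments in a dict first is just a readability choice; the flag names follow this section, and the commented-out call assumes the same `client` as the examples above.

```python
# Hypothetical combined-retrieval request; kwargs collected separately so
# the request shape is easy to inspect. Flag names follow this guide.
results_request = {
    "memory_id": "mid",
    "query": "authentication architecture",
    "multi_level": True,        # session/time-range context
    "cluster_expand": True,     # topic-based expansion
    "use_hybrid_search": True,  # vector + full-text search
    "reconstruct_chunks": True, # stitch chunks back into full events
}
# results = client.retrieve(**results_request)
print(sorted(results_request))
```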
14. Validation Coverage (Python + TypeScript)
Recent SDK validation includes:
- Conversation compaction before extraction (both SDKs)
- MCP-style recall helpers (`recall_for_task` / `recallForTask`) request-shape checks
- Streaming chat parsing (`chat_stream` / `chatStream`) callback flow checks
- Injected demo full journey (write -> extraction submit -> recall in fresh session)
Reference test paths:
- `sdks/python/tests/test_client_recall_and_compaction.py`
- `sdks/typescript/tests/client.test.cjs`
- `sdks/typescript/tests/interceptor.test.cjs`