SDK Usage Guide (Python / TypeScript)

This guide covers all SDK integration patterns. We recommend starting with the Interceptor pattern — it adds persistent memory to any OpenAI/Anthropic app with minimal code changes.

1. Installation

Python (PyPI):

pip install awareness-memory-cloud

TypeScript (npm):

npm install @awareness-sdk/memory-cloud

2. Client Setup

Python

import os
from memory_cloud import MemoryCloudClient

client = MemoryCloudClient(
    base_url=os.getenv("AWARENESS_API_BASE_URL", "https://awareness.market/api/v1"),
    api_key="YOUR_API_KEY",
)

TypeScript

import { MemoryCloudClient } from "@awareness-sdk/memory-cloud";

const client = new MemoryCloudClient({
  baseUrl: process.env.AWARENESS_API_BASE_URL || "https://awareness.market/api/v1",
  apiKey: "YOUR_API_KEY",
});

3. Interceptor

The Interceptor is the fastest way to add persistent memory to any LLM application. It wraps your existing OpenAI/Anthropic client to automatically:

  • Pre-call: Retrieve relevant memory context and inject it into the conversation
  • Post-call: Store the interaction back to memory for future recall
  • Background: Extract structured insights (knowledge cards, decisions, risks) via your LLM

Zero changes to your business logic — just wrap the client, and memory flows run automatically.
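Conceptually, wrapping replaces the client's completion call with a version that runs the pre-call and post-call hooks around it. The following is a minimal, self-contained sketch of that pattern — not the SDK's actual implementation; `recall` and `store` here are hypothetical stand-ins for its hooks:

```python
# Minimal sketch of the interceptor pattern: wrap a create() call so
# every request gets memory context injected before and stored after.
# `recall` and `store` are hypothetical stand-ins for the SDK's hooks.

def wrap_create(create, recall, store):
    def wrapped(messages, **kwargs):
        # Pre-call: prepend retrieved memory context as a system message
        context = recall(messages[-1]["content"])
        if context:
            messages = [{"role": "system", "content": context}] + messages
        response = create(messages=messages, **kwargs)
        # Post-call: persist the exchange for future recall
        store(messages, response)
        return response
    return wrapped

# Toy usage with in-memory stand-ins
log = []
fake_create = lambda messages, **kw: f"answered: {messages[-1]['content']}"
wrapped = wrap_create(
    fake_create,
    recall=lambda q: "We chose JWT auth.",
    store=lambda m, r: log.append((m, r)),
)
out = wrapped([{"role": "user", "content": "What auth approach?"}])
print(out)
print(len(log))  # the exchange was stored
```

The real interceptor applies the same shape to `oai.chat.completions.create()` and `claude.messages.create()` without you touching call sites.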

Python — OpenAI Interceptor

from memory_cloud import MemoryCloudClient, AwarenessInterceptor
import openai

client = MemoryCloudClient(base_url="...", api_key="...")
interceptor = AwarenessInterceptor(
    client=client,
    memory_id="mem-xxx",
    min_relevance_score=0.5,  # Filter low-relevance results (default 0.5)
    max_inject_items=5,        # Cap injected context items (default 5)
    query_rewrite="rule",      # Query rewrite mode (default "rule")
)

oai = openai.OpenAI()
interceptor.wrap_openai(oai)

# Now all oai.chat.completions.create() calls get memory injection automatically
response = oai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What auth approach did we decide on?"}]
)
# Memory context was automatically injected before this call
# The conversation was automatically stored after this call

Python — Anthropic Interceptor

from memory_cloud import MemoryCloudClient, AwarenessInterceptor
import anthropic

client = MemoryCloudClient(base_url="...", api_key="...")
interceptor = AwarenessInterceptor(client=client, memory_id="mem-xxx")

claude = anthropic.Anthropic()
interceptor.wrap_anthropic(claude)

# All claude.messages.create() calls now have memory injection

TypeScript — OpenAI Interceptor

import { MemoryCloudClient, AwarenessInterceptor } from "@awareness-sdk/memory-cloud";
import OpenAI from "openai";

const client = new MemoryCloudClient({ baseUrl: "...", apiKey: "..." });
const interceptor = await AwarenessInterceptor.create({
  client,
  memoryId: "mem-xxx",
  minRelevanceScore: 0.5,
  maxInjectItems: 5,
  queryRewrite: "rule",
});

const oai = new OpenAI();
interceptor.wrapOpenAI(oai);

// All oai.chat.completions.create() calls now have memory injection

Interceptor Options

Option                                        Default         Description
retrieve_limit / retrieveLimit                8               Max vector results to retrieve
max_context_chars / maxContextChars           4000            Max injected context size
min_relevance_score / minRelevanceScore       0.5             Filter threshold for recall results
max_inject_items / maxInjectItems             5               Cap on injected items
auto_remember / autoRemember                  true            Auto-store conversations post-call
enable_extraction / enableExtraction          true            Auto-extract insights from stored events
extraction_model / extractionModel            "gpt-4o-mini"   Model for insight extraction
extraction_max_tokens / extractionMaxTokens   16384           Max tokens for extraction
query_rewrite / queryRewrite                  "rule"          Query rewrite mode ("rule", "llm", or "none")
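To make the filtering options concrete, here is how recall results could be narrowed before injection under `min_relevance_score`, `max_inject_items`, and `max_context_chars` — a self-contained illustration of the semantics, not the SDK's internal code:

```python
# Illustrative sketch: narrow recall results before injection using
# min_relevance_score, max_inject_items, and max_context_chars.
# Not the SDK's internal implementation.

def select_for_injection(results, min_relevance_score=0.5,
                         max_inject_items=5, max_context_chars=4000):
    # Keep only sufficiently relevant hits, best first, capped in count
    kept = sorted((r for r in results if r["score"] >= min_relevance_score),
                  key=lambda r: r["score"], reverse=True)[:max_inject_items]
    # Enforce the overall context-size budget
    out, used = [], 0
    for r in kept:
        if used + len(r["text"]) > max_context_chars:
            break
        out.append(r["text"])
        used += len(r["text"])
    return out

results = [
    {"score": 0.9, "text": "Decision: use JWT auth."},
    {"score": 0.3, "text": "Lunch order notes."},       # below threshold
    {"score": 0.7, "text": "Auth middleware refactor."},
]
print(select_for_injection(results))
```

Raising `min_relevance_score` trades recall breadth for injection precision; lowering `max_context_chars` bounds prompt growth on long histories.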

Query Rewrite Modes

The interceptor uses context-aware query rewriting to improve recall accuracy:

  • "rule" (default): Layer 1 (context-aware query from recent conversation turns) + Layer 2 (structural keyword extraction for full-text search). Zero additional latency or token cost.
  • "llm": Uses the wrapped LLM to generate optimal semantic + keyword queries. Best for ambiguous queries like "continue yesterday's work". Adds ~200-500ms latency.
  • "none": Disables query rewriting. Uses the raw last user message (legacy behavior).
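As a rough illustration of what the two layers of "rule" mode do (this is a toy sketch, not the SDK's actual rewriter — the stopword list and turn window are invented for the example):

```python
# Toy sketch of two-layer rule-based query rewriting:
# Layer 1 builds a context-aware semantic query from recent user turns;
# Layer 2 extracts distinctive keywords for full-text search.
# Illustrative only; not the SDK's implementation.

STOPWORDS = {"the", "a", "an", "on", "in", "of", "we", "did", "what", "to"}

def rewrite_query(messages, recent_turns=3):
    user_turns = [m["content"] for m in messages if m["role"] == "user"]
    # Layer 1: fold the last few user turns into one semantic query
    semantic_query = " ".join(user_turns[-recent_turns:])
    # Layer 2: keep distinctive tokens for keyword search
    keywords = [w.strip("?.,!").lower() for w in semantic_query.split()]
    keywords = [w for w in keywords if w and w not in STOPWORDS]
    return {"semantic": semantic_query, "keywords": keywords}

q = rewrite_query([
    {"role": "user", "content": "We migrated to Postgres."},
    {"role": "assistant", "content": "Noted."},
    {"role": "user", "content": "What auth approach did we decide on?"},
])
print(q["keywords"])
```

Because both layers are deterministic string work, this style of rewrite adds no extra LLM calls — which is why "rule" mode has zero additional latency or token cost.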

4. Client Auto-Extraction

Pass an OpenAI or Anthropic client as extraction_llm / extractionLlm to enable automatic insight extraction. When record returns an extraction_request, the SDK automatically calls your LLM in the background and submits the results.

Python

client = MemoryCloudClient(base_url="...", api_key="...", extraction_llm=openai.OpenAI())

TypeScript

const client = new MemoryCloudClient({ baseUrl: "...", apiKey: "...", extractionLlm: new OpenAI() });

5. MCP-aligned Helpers (same semantics as MCP tools)

  • recall_for_task / recallForTask
  • record — unified write (single event, batch, or insights); same method name in both SDKs

Session management is automatic — no need to call begin_memory_session / beginMemorySession explicitly.

v1.x methods removed in v2.0: remember_step/rememberStep, remember_batch/rememberBatch, backfill_conversation_history/backfillConversationHistory, begin_memory_session/beginMemorySession, submit_insights/submitInsights, ingest_content/ingestContent. Use record() instead.

These helpers mirror MCP tool semantics for programmatic use.


6. API-parity Methods

Both SDKs expose aligned methods for:

  • Memory CRUD: create/list/get/update/delete
  • Content: write/list/delete
  • Retrieve/Chat/Timeline
  • MCP ingest: events + content
  • Export package download
  • Upload + async job polling
  • Insights
  • API key create/list/revoke
  • Memory wizard

7. Export + Read Package

Export (safetensors)

{
  "package_type": "vector_only",
  "vector_binary_format": "safetensors",
  "regenerate_vectors_if_missing": true
}

Read in Python

from memory_cloud import read_export_package

parsed = read_export_package("memory_export.zip")
print(parsed["manifest"])
print(parsed["chunks"])
print(parsed["safetensors"])

Read in TypeScript

import { readExportPackage } from "@awareness-sdk/memory-cloud";

const parsed = await readExportPackage(zipBytes);
console.log(parsed.manifest);
console.log(parsed.chunks);
console.log(parsed.safetensors);

8. Framework Integrations

See the individual framework guides for deep integrations.

9. Front-end Connect

Each memory's Connect page in the dashboard provides ready-to-use config snippets for MCP, Python SDK, and TypeScript SDK with your memory ID pre-filled. This is the recommended way to get started — it generates the exact configuration you need.


10. Multi-User Memory Mode

A single Memory can hold data for millions of users. Pass user_id to scope reads and writes per user.

Python

# Write scoped to a user
client.record(memory_id="mid", content="Alice fixed auth bug", user_id="alice")
client.record(memory_id="mid", content=["Step 1", "Step 2"], user_id="alice")
client.ingest_events(memory_id="mid", events=[...], user_id="alice")

# Read scoped to a user
ctx = client.get_session_context(memory_id="mid", user_id="alice")
tasks = client.get_pending_tasks(memory_id="mid", user_id="alice")
cards = client.get_knowledge_base(memory_id="mid", user_id="alice")

# Semantic search with metadata filter
results = client.retrieve(
    memory_id="mid",
    query="authentication decisions",
    metadata_filter={"user_id": "alice"},
)

TypeScript

// Write scoped to a user
await client.record({ memoryId: "mid", content: "Alice fixed auth bug", userId: "alice" });
await client.record({ memoryId: "mid", content: ["Step 1", "Step 2"], userId: "alice" });
await client.ingestEvents({ memoryId: "mid", events: [...], userId: "alice" });

// Read scoped to a user
const ctx = await client.getSessionContext({ memoryId: "mid", userId: "alice" });
const tasks = await client.getPendingTasks({ memoryId: "mid", userId: "alice" });
const cards = await client.getKnowledgeBase({ memoryId: "mid", userId: "alice" });

Personal mode (default): Do not pass user_id. All behavior is identical to before — existing setups are not affected.


11. Task Management

Use update_task_status to mark tasks complete after finishing work.

Python

# Get open tasks
result = client.get_pending_tasks(memory_id="mid", priority="high")
task_id = result["tasks"][0]["id"]

# Always record what you did first
client.record(memory_id="mid", content="Fixed the N+1 query by adding .include() in Prisma.")

# Then mark the task done
client.update_task_status(memory_id="mid", task_id=task_id, status="completed")

TypeScript

const { tasks } = await client.getPendingTasks({ memoryId: "mid", priority: "high" });
const taskId = tasks![0].id!;

await client.record({ memoryId: "mid", content: "Fixed the N+1 query." });
await client.updateTaskStatus({ memoryId: "mid", taskId, status: "completed" });

12. Session Context

Load full project state at the start of every session.

Python

ctx = client.get_session_context(memory_id="mid", days=7, max_cards=10, max_tasks=20)

print("Recent days:")
for day in ctx.get("recent_days", []):
    print(f"  {day['date']}: {day['narrative']}")

print("Open tasks:")
for task in ctx.get("open_tasks", []):
    print(f"  [{task['priority']}] {task['title']}")

print("Knowledge cards:")
for card in ctx.get("knowledge_cards", []):
    print(f"  [{card['category']}] {card['title']}")

TypeScript

const ctx = await client.getSessionContext({ memoryId: "mid", days: 7 });

for (const day of ctx.recent_days ?? []) {
  console.log(`${day.date}: ${day.narrative}`);
}
for (const task of ctx.open_tasks ?? []) {
  console.log(`[${task.priority}] ${task.title}`);
}

13. Advanced Retrieval

Both retrieve and recall_for_task support advanced vector enhancement parameters:

Parameter (Python)   Parameter (TypeScript)   Type   Default   Description
multi_level          multiLevel               bool   false     Enable multi-level retrieval (session and time-range context) for broader context
cluster_expand       clusterExpand            bool   false     Enable topic-based context expansion for comprehensive hierarchical recall

Python

results = client.retrieve(
    memory_id="mid",
    query="authentication architecture",
    multi_level=True,
    cluster_expand=True,
)

ctx = client.recall_for_task(
    memory_id="mid",
    task="summarize all auth decisions",
    multi_level=True,
    cluster_expand=True,
)

TypeScript

const results = await client.retrieve({
  memoryId: "mid",
  query: "authentication architecture",
  multiLevel: true,
  clusterExpand: true,
});

const ctx = await client.recallForTask({
  memoryId: "mid",
  task: "summarize all auth decisions",
  multiLevel: true,
  clusterExpand: true,
});

These parameters can be combined with existing options like reconstruct_chunks, use_hybrid_search, and recall_mode.
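To illustrate the idea behind cluster_expand, here is a toy sketch of topic-based expansion: after a direct hit, sibling chunks from the same topic cluster are pulled in for broader context. This is conceptual only — the real clustering and expansion happen server-side, and the data shapes here are invented:

```python
# Toy illustration of topic-based context expansion: after a direct
# vector hit, pull in sibling chunks from the same topic cluster.
# Conceptual only; the SDK performs this server-side.

chunks = {
    "c1": {"topic": "auth", "text": "Decided on JWT with refresh tokens."},
    "c2": {"topic": "auth", "text": "Rotate signing keys quarterly."},
    "c3": {"topic": "billing", "text": "Switched to usage-based pricing."},
}

def expand_by_cluster(hit_ids, chunks):
    topics = {chunks[c]["topic"] for c in hit_ids}
    # Direct hits first, then every other chunk sharing a hit's topic
    siblings = [cid for cid, c in chunks.items()
                if c["topic"] in topics and cid not in hit_ids]
    return list(hit_ids) + siblings

print(expand_by_cluster(["c1"], chunks))
```

A hit on the JWT decision thus also surfaces the related key-rotation note, while unrelated billing chunks stay out — the trade-off is broader context at the cost of larger result sets.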


14. Validation Coverage (Python + TypeScript)

Recent SDK validation includes:

  • Conversation compaction before extraction (both SDKs)
  • MCP-style recall helpers (recall_for_task / recallForTask) request-shape checks
  • Streaming chat parsing (chat_stream / chatStream) callback flow checks
  • Full demo journey (write -> extraction submit -> recall in a fresh session)

Reference test paths:

  • sdks/python/tests/test_client_recall_and_compaction.py
  • sdks/typescript/tests/client.test.cjs
  • sdks/typescript/tests/interceptor.test.cjs