Python SDK

Python SDK for Awareness Memory Cloud APIs and MCP-style memory workflows.

Install

pip install awareness-memory-cloud

Framework extras:

pip install awareness-memory-cloud[langchain]
pip install awareness-memory-cloud[crewai]
pip install awareness-memory-cloud[frameworks]

Quickstart (Interceptor)

The fastest way to add persistent memory to any Python LLM app is to wrap your existing OpenAI/Anthropic client; memory recall and storage then happen automatically:

from memory_cloud import MemoryCloudClient, AwarenessInterceptor
import openai

# Local mode (no API key or memory ID needed)
client = MemoryCloudClient(mode="local")

# Cloud mode (team collaboration, semantic search, multi-device sync)
client = MemoryCloudClient(
    base_url="https://awareness.market/api/v1",
    api_key="aw_xxx",
)
interceptor = AwarenessInterceptor(
    client=client,
    memory_id="mem-xxx",
    min_relevance_score=0.5,  # Filter low-score results (default 0.5)
    max_inject_items=5,        # Cap injected items (default 5)
    query_rewrite="rule",      # Query rewrite mode (default "rule")
)

oai = openai.OpenAI()
interceptor.wrap_openai(oai)
# Now all oai.chat.completions.create() calls get memory injection automatically

Also works with Anthropic:

import anthropic

claude = anthropic.Anthropic()
interceptor.wrap_anthropic(claude)

Interceptor options:

  • retrieve_limit (default 8)
  • max_context_chars (default 4000)
  • min_relevance_score (default 0.5)
  • max_inject_items (default 5)
  • auto_remember (default True)
  • enable_extraction (default True)
  • extraction_model (default "gpt-4o-mini")
  • extraction_max_tokens (default 16384)
  • query_rewrite (default "rule")
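
For example, a conservative configuration that injects fewer, higher-relevance memories and disables automatic storage (parameter names as listed above; the values here are illustrative):

interceptor = AwarenessInterceptor(
    client=client,
    memory_id="mem-xxx",
    retrieve_limit=4,          # fetch fewer candidates per query
    max_context_chars=2000,    # tighter budget for injected context
    min_relevance_score=0.7,   # only inject high-confidence matches
    max_inject_items=3,        # cap the number of injected memories
    auto_remember=False,       # recall only; skip automatic storage
)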

Query Rewrite Modes

The interceptor uses context-aware query rewriting to improve recall accuracy:

  • "rule" (default): Layer 1 (context-aware query from recent conversation turns) + Layer 2 (structural keyword extraction for full-text search). Zero additional latency or token cost.
  • "llm": Uses the wrapped LLM to generate optimal semantic_query + keyword_query. Best for ambiguous queries like "continue yesterday's work" or non-technical domains. Adds ~200-500ms latency per query.
  • "none": Disables query rewriting. Uses the raw last user message as-is (legacy behavior).

Framework adapters (LangChain, CrewAI, PraisonAI, AutoGen) also support query_rewrite via MemoryCloudBaseAdapter.
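For example, to trade a few hundred milliseconds of latency for better recall on ambiguous queries, switch the interceptor to LLM-based rewriting:

interceptor = AwarenessInterceptor(
    client=client,
    memory_id="mem-xxx",
    query_rewrite="llm",  # wrapped LLM generates semantic_query + keyword_query
)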


Quickstart (Explicit API)

For full control, use the client methods directly:

import os
from memory_cloud import MemoryCloudClient

# Local: client = MemoryCloudClient(mode="local")
client = MemoryCloudClient(
    base_url=os.getenv("AWARENESS_API_BASE_URL", "https://awareness.market/api/v1"),
    api_key="aw_xxx",
)

client.write(
    memory_id="memory_123",
    content="Customer asked for SOC2 evidence and retention policy.",
    kwargs={"source": "python-sdk", "session_id": "demo-session"},
)

result = client.retrieve(
    memory_id="memory_123",
    query="What did customer ask for?",
    custom_kwargs={"k": 3},
)
print(result["results"])

Recommended: Pass insights directly to record() to create knowledge cards in one call, avoiding the extraction round-trip.

client.record(
    memory_id="mem-123",
    content="WHAT: Chose PostgreSQL. WHY: Better JSON support.",
    insights={
        "knowledge_cards": [
            {
                "category": "decision",
                "title": "PostgreSQL over MySQL",
                "summary": "Chose PostgreSQL for its superior JSON querying and indexing capabilities.",
                "status": "resolved",
            }
        ],
    },
)

Legacy Auto-Extraction

Pass an OpenAI or Anthropic client as extraction_llm to automatically extract insights when record returns an extraction_request:

import openai
from memory_cloud import MemoryCloudClient

# Local: client = MemoryCloudClient(mode="local")
client = MemoryCloudClient(
    base_url="https://awareness.market/api/v1",
    api_key="aw_xxx",
    extraction_llm=openai.OpenAI(),  # or anthropic.Anthropic()
)

# Now record automatically extracts insights in the background
client.record(memory_id="mem-xxx", content="Fixed auth bug in login.py by adding JWT refresh.")

When extraction_llm is provided, every record call that receives an extraction_request from the server will:

  1. Call the provided LLM with the extraction prompt (background thread, non-blocking)
  2. Parse the JSON response with brace-depth matching + retry
  3. Submit extracted insights via record(insights=...)
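
If you prefer to run extraction yourself rather than pass extraction_llm, you can handle the request manually. The sketch below is an assumption-heavy illustration: it assumes the record response exposes the request under an "extraction_request" key with a "prompt" field, and it uses plain json.loads where the SDK applies brace-depth matching with retry. Check the actual response shape in your deployment.

import json
import openai

resp = client.record(memory_id="mem-xxx", content="Fixed auth bug in login.py.")

req = resp.get("extraction_request")  # assumed response key
if req:
    # Run the server-provided extraction prompt through your own LLM.
    completion = openai.OpenAI().chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": req["prompt"]}],  # assumed field
    )
    insights = json.loads(completion.choices[0].message.content)  # simplified parsing
    # Submit the extracted insights back, as the SDK does automatically.
    client.record(memory_id="mem-xxx", insights=insights)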

Client extraction options:

  • extraction_llm (OpenAI or Anthropic client)
  • extraction_model (default "gpt-4o-mini" for OpenAI, "claude-haiku-4-5-20251001" for Anthropic)
  • extraction_max_tokens (default 16384, env: AWARENESS_EXTRACTION_MAX_TOKENS)
  • user_id
  • agent_role


API Coverage (SDK/API aligned)

MemoryCloudClient now includes:

  • Memory: create_memory, list_memories, get_memory, update_memory, delete_memory
  • Content: write, list_memory_content, delete_memory_content
  • Retrieval/Chat: retrieve, chat, chat_stream, memory_timeline
  • MCP ingest: ingest_events, record (use record(scope='knowledge') instead of ingest_content)
  • Export: export_memory_package, save_export_memory_package
  • Async jobs & upload: get_async_job_status, upload_file, get_upload_job_status
  • Insights/API keys/wizard: insights, create_api_key, list_api_keys, revoke_api_key, memory_wizard
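
A minimal lifecycle sketch using a few of these methods (the create_memory argument and response field are assumed; consult the API reference for exact signatures):

# Create a memory, write to it, query it, then clean up.
mem = client.create_memory(name="support-notes")  # 'name' is an assumed parameter
memory_id = mem["id"]                             # assumed response field

client.write(memory_id=memory_id, content="Customer prefers email follow-ups.")
hits = client.retrieve(memory_id=memory_id, query="How does the customer prefer follow-ups?")
print(hits["results"])

client.delete_memory(memory_id=memory_id)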

MCP-style Helpers (SDK/MCP aligned)

These helpers mirror MCP tool semantics:

  • recall_for_task
  • record — unified write (single event, batch, or insights)

Session management is automatic — no need to call begin_memory_session explicitly.

v1.x methods removed in v2.0: remember_step, remember_batch, backfill_conversation_history, begin_memory_session, submit_insights, ingest_content. Use record() instead.

Example:

client.record(
    memory_id="memory_123",
    content="Refactored auth middleware and added tests.",
)
ctx = client.recall_for_task(
    memory_id="memory_123",
    task="summarize latest auth changes",
    limit=8,
    multi_level=False,
    cluster_expand=False,
)
print(ctx["results"])

Read Exported Packages

SDK includes export readers:

  • read_export_package(path)
  • read_export_package_bytes(bytes)
  • parse_jsonl_bytes(bytes)

from memory_cloud import read_export_package

parsed = read_export_package("memory_export.zip")
print(parsed["manifest"])
print(len(parsed["chunks"]))
print(bool(parsed["safetensors"]))
print(parsed.get("kv_summary"))

Agent Profiles & Sub-Agent Prompts

Retrieve enriched agent profiles with auto-generated activation prompts:

# List all agent profiles (with system_prompt and activation_prompt)
agents = client.list_agents(memory_id="mem-xxx")
for agent in agents["agents"]:
    print(agent["agent_role"], agent["title"])
    print(agent["activation_prompt"])  # Ready-to-use prompt for sub-agent spawning

# Get activation prompt for a specific role
prompt = client.get_agent_prompt(memory_id="mem-xxx", agent_role="backend_engineer")

If a profile has a custom system_prompt (set in the frontend Settings), it is used as-is. Otherwise, a prompt is auto-generated from the profile fields (identity, critical_rules, workflow, etc.).

Injected Demo (Streaming + Recall)

Run the end-to-end injected-mode demo (client LLM extraction, server zero-LLM path):

python3 scripts/run_sdk_injected_conversation_demo.py --full-user-journey --stream

The demo validates:

  • Prompt-only usage via interceptor injection (no manual recall/remember calls in business code)
  • Background extraction request handling (record -> extraction_request -> record(insights=...))
  • Cross-session recall in a fresh simulated follow-up session
  • Streaming token output for runtime observability

Framework Integrations

All integrations share a unified adapter pattern based on MemoryCloudBaseAdapter. Each provides:

  • wrap_llm() / wrap_function() — transparent memory injection
  • awareness_recall() / awareness_record() / memory_insights() — explicit tool methods
  • inject_into_messages() — manual message-level injection
  • get_tool_functions() — tool definitions for manual registration
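
As a sketch of the shared pattern (MemoryCloudLangChainAdapter and its import path are hypothetical names used only for illustration; see your installed integration for the real ones):

# Hypothetical adapter name and import path, shown only to illustrate the pattern.
from memory_cloud.integrations.langchain import MemoryCloudLangChainAdapter

adapter = MemoryCloudLangChainAdapter(client=client, memory_id="mem-xxx", query_rewrite="rule")

llm = adapter.wrap_llm(llm)                             # transparent memory injection
context = adapter.awareness_recall("recent auth work")  # explicit recall tool
tools = adapter.get_tool_functions()                    # tool defs for manual registration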

See the individual framework guides for integration-specific details.

Environment Variables

export AWARENESS_API_BASE_URL="https://awareness.market/api/v1"
export AWARENESS_API_KEY="aw_xxx"

# Optional: configure extraction LLM behavior
export AWARENESS_EXTRACTION_MODEL="gpt-4o-mini"        # Model used for insight extraction (default: gpt-4o-mini)
export AWARENESS_EXTRACTION_MAX_TOKENS="16384"          # Max tokens for extraction output (default: 16384)

All environment variables can also be set via constructor parameters (which take priority over env vars).
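
For example (assuming the client falls back to these variables when the corresponding arguments are omitted):

import os
from memory_cloud import MemoryCloudClient

# Reads AWARENESS_API_BASE_URL / AWARENESS_API_KEY from the environment.
client = MemoryCloudClient()

# Explicit constructor arguments take priority over the environment.
client = MemoryCloudClient(
    base_url="https://awareness.market/api/v1",
    api_key=os.environ["AWARENESS_API_KEY"],
)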