# Awareness Memory Cloud Python SDK

Python SDK for Awareness Memory Cloud APIs and MCP-style memory workflows.
## Install

```bash
pip install awareness-memory-cloud
```

Framework extras:

```bash
pip install awareness-memory-cloud[langchain]
pip install awareness-memory-cloud[crewai]
pip install awareness-memory-cloud[frameworks]
```
## Interceptor (Recommended — Zero-Code Memory)

The fastest way to add persistent memory to any Python LLM app. Wrap your existing OpenAI/Anthropic client — memory recall and storage happen automatically:

```python
from memory_cloud import MemoryCloudClient, AwarenessInterceptor
import openai

# Local mode (no API key or memory ID needed)
client = MemoryCloudClient(mode="local")

# Cloud mode (team collaboration, semantic search, multi-device sync)
client = MemoryCloudClient(
    base_url="https://awareness.market/api/v1",
    api_key="aw_xxx",
)

interceptor = AwarenessInterceptor(
    client=client,
    memory_id="mem-xxx",
    min_relevance_score=0.5,  # Filter low-score results (default 0.5)
    max_inject_items=5,       # Cap injected items (default 5)
    query_rewrite="rule",     # Query rewrite mode (default "rule")
)

oai = openai.OpenAI()
interceptor.wrap_openai(oai)
# Now all oai.chat.completions.create() calls get memory injection automatically
```
Also works with Anthropic:

```python
import anthropic

claude = anthropic.Anthropic()
interceptor.wrap_anthropic(claude)
```
Interceptor options: `retrieve_limit` (default 8), `max_context_chars` (default 4000), `min_relevance_score` (default 0.5), `max_inject_items` (default 5), `auto_remember` (default `True`), `enable_extraction` (default `True`), `extraction_model` (default `"gpt-4o-mini"`), `extraction_max_tokens` (default 16384), `query_rewrite` (default `"rule"`).
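The filtering options form a simple pipeline: results below `min_relevance_score` are dropped, the survivors are capped at `max_inject_items`, and the rendered context is truncated to `max_context_chars`. The sketch below illustrates that pipeline in plain Python; `select_context` is a hypothetical helper for explanation, not the interceptor's actual implementation.

```python
def select_context(results, min_score=0.5, max_items=5, max_chars=4000):
    """Illustrative pipeline for the interceptor's filtering options.

    `results` is a list of {"content": str, "score": float} dicts.
    """
    # Drop low-relevance hits, keep the best-scoring survivors.
    kept = sorted(
        (r for r in results if r["score"] >= min_score),
        key=lambda r: r["score"],
        reverse=True,
    )[:max_items]
    # Render the surviving items and truncate to the character budget.
    context = "\n".join(r["content"] for r in kept)
    return context[:max_chars]

hits = [
    {"content": "Customer wants SOC2 evidence", "score": 0.91},
    {"content": "Unrelated chit-chat", "score": 0.12},
    {"content": "Retention policy is 90 days", "score": 0.77},
]
print(select_context(hits))
```

With the defaults, only the two high-scoring hits survive and are joined into the injected context.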
### Query Rewrite Modes

The interceptor uses context-aware query rewriting to improve recall accuracy:

- `"rule"` (default): Layer 1 (context-aware query from recent conversation turns) + Layer 2 (structural keyword extraction for full-text search). Zero additional latency or token cost.
- `"llm"`: Uses the wrapped LLM to generate optimal `semantic_query` + `keyword_query`. Best for ambiguous queries like "continue yesterday's work" or non-technical domains. Adds ~200-500ms latency per query.
- `"none"`: Disables query rewriting. Uses the raw last user message as-is (legacy behavior).

Framework adapters (LangChain, CrewAI, PraisonAI, AutoGen) also support `query_rewrite` via `MemoryCloudBaseAdapter`.
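To make the Layer 2 idea concrete, here is a toy keyword extraction in plain Python: keep identifier-like, content-bearing tokens and drop filler words. This is only a sketch of the concept; the SDK's real `"rule"`-mode heuristics are internal, and `extract_keywords` and its stopword list are invented for illustration.

```python
import re

STOPWORDS = {"a", "an", "the", "what", "did", "for", "to", "in", "of",
             "is", "was", "how", "do", "we", "about", "ask", "asked"}

def extract_keywords(message: str, limit: int = 6) -> str:
    """Toy Layer-2-style rewrite: build a full-text keyword query from the
    content-bearing tokens of the last user message."""
    # Tokens may contain dots/slashes so file names like login.py survive whole.
    tokens = re.findall(r"[A-Za-z_][\w./-]*", message)
    content = [t for t in tokens if t.lower() not in STOPWORDS]
    # Deduplicate while preserving order, then cap the query length.
    return " ".join(list(dict.fromkeys(content))[:limit])

print(extract_keywords("What did the customer ask about auth middleware in login.py?"))
```

Note how a structural query like this costs no extra LLM call, which is why the `"rule"` mode adds zero latency.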
## Quickstart (Explicit API)

For full control, use the client methods directly:

```python
import os

from memory_cloud import MemoryCloudClient

# Local: client = MemoryCloudClient(mode="local")
client = MemoryCloudClient(
    base_url=os.getenv("AWARENESS_API_BASE_URL", "https://awareness.market/api/v1"),
    api_key="aw_xxx",
)

client.write(
    memory_id="memory_123",
    content="Customer asked for SOC2 evidence and retention policy.",
    kwargs={"source": "python-sdk", "session_id": "demo-session"},
)

result = client.retrieve(
    memory_id="memory_123",
    query="What did customer ask for?",
    custom_kwargs={"k": 3},
)
print(result["results"])
```
## Direct Insights (Recommended)

Pass insights directly to `record()` to create knowledge cards in one call, avoiding the extraction round-trip:

```python
client.record(
    memory_id="mem-123",
    content="WHAT: Chose PostgreSQL. WHY: Better JSON support.",
    insights={
        "knowledge_cards": [
            {
                "category": "decision",
                "title": "PostgreSQL over MySQL",
                "summary": "Chose PostgreSQL for its superior JSON querying and indexing capabilities.",
                "status": "resolved",
            }
        ],
    },
)
```
## Legacy Auto-Extraction

Pass an OpenAI or Anthropic client as `extraction_llm` to automatically extract insights when `record` returns an `extraction_request`:

```python
import openai

from memory_cloud import MemoryCloudClient

# Local: client = MemoryCloudClient(mode="local")
client = MemoryCloudClient(
    base_url="https://awareness.market/api/v1",
    api_key="aw_xxx",
    extraction_llm=openai.OpenAI(),  # or anthropic.Anthropic()
)

# Now record automatically extracts insights in the background
client.record(memory_id="mem-xxx", content="Fixed auth bug in login.py by adding JWT refresh.")
```
When `extraction_llm` is provided, every `record` call that receives an `extraction_request` from the server will:

- Call the provided LLM with the extraction prompt (background thread, non-blocking)
- Parse the JSON response with brace-depth matching + retry
- Submit extracted insights via `record(insights=...)`

Client extraction options: `extraction_llm` (OpenAI/Anthropic client), `extraction_model` (default: `"gpt-4o-mini"` for OpenAI, `"claude-haiku-4-5-20251001"` for Anthropic), `extraction_max_tokens` (default 16384, env: `AWARENESS_EXTRACTION_MAX_TOKENS`), `user_id`, `agent_role`.
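Brace-depth matching is a common way to pull a JSON object out of a chatty LLM reply. The sketch below shows the idea in isolation; it is illustrative only (the SDK's parser also retries on failure, which is omitted here), and `extract_json` is a name invented for this example.

```python
import json

def extract_json(text: str):
    """Return the first balanced top-level JSON object found in `text`,
    or None. Tracks brace depth while skipping braces inside strings."""
    start = text.find("{")
    if start == -1:
        return None
    depth = 0
    in_string = False
    escape = False
    for i, ch in enumerate(text[start:], start):
        if in_string:
            if escape:
                escape = False
            elif ch == "\\":
                escape = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:  # balanced: candidate object ends here
                try:
                    return json.loads(text[start:i + 1])
                except json.JSONDecodeError:
                    return None
    return None

reply = 'Sure, here you go:\n{"knowledge_cards": [{"title": "JWT refresh"}]}\nThanks!'
print(extract_json(reply))
```

This tolerates prose before and after the object, which raw `json.loads` on the whole reply would not.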
## API Coverage (SDK/API aligned)

`MemoryCloudClient` now includes:

- Memory: `create_memory`, `list_memories`, `get_memory`, `update_memory`, `delete_memory`
- Content: `write`, `list_memory_content`, `delete_memory_content`
- Retrieval/Chat: `retrieve`, `chat`, `chat_stream`, `memory_timeline`
- MCP ingest: `ingest_events`, `record` (use `record(scope='knowledge')` instead of `ingest_content`)
- Export: `export_memory_package`, `save_export_memory_package`
- Async jobs & upload: `get_async_job_status`, `upload_file`, `get_upload_job_status`
- Insights/API keys/wizard: `insights`, `create_api_key`, `list_api_keys`, `revoke_api_key`, `memory_wizard`
## MCP-style Helpers (SDK/MCP aligned)

These helpers mirror MCP tool semantics:

- `recall_for_task`
- `record` — unified write (single event, batch, or insights)

Session management is automatic — no need to call `begin_memory_session` explicitly.

v1.x methods removed in v2.0: `remember_step`, `remember_batch`, `backfill_conversation_history`, `begin_memory_session`, `submit_insights`, `ingest_content`. Use `record()` instead.
Example:

```python
client.record(
    memory_id="memory_123",
    content="Refactored auth middleware and added tests.",
)

ctx = client.recall_for_task(
    memory_id="memory_123",
    task="summarize latest auth changes",
    limit=8,
    multi_level=False,
    cluster_expand=False,
)
print(ctx["results"])
```
## Read Exported Packages

The SDK includes export readers:

- `read_export_package(path)`
- `read_export_package_bytes(bytes)`
- `parse_jsonl_bytes(bytes)`

```python
from memory_cloud import read_export_package

parsed = read_export_package("memory_export.zip")
print(parsed["manifest"])
print(len(parsed["chunks"]))
print(bool(parsed["safetensors"]))
print(parsed.get("kv_summary"))
```
## Agent Profiles & Sub-Agent Prompts

Retrieve enriched agent profiles with auto-generated activation prompts:

```python
# List all agent profiles (with system_prompt and activation_prompt)
agents = client.list_agents(memory_id="mem-xxx")
for agent in agents["agents"]:
    print(agent["agent_role"], agent["title"])
    print(agent["activation_prompt"])  # Ready-to-use prompt for sub-agent spawning

# Get activation prompt for a specific role
prompt = client.get_agent_prompt(memory_id="mem-xxx", agent_role="backend_engineer")
```

If a profile has a custom `system_prompt` (set in the frontend Settings), it is used as-is. Otherwise, a prompt is auto-generated from the profile fields (`identity`, `critical_rules`, `workflow`, etc.).
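The fallback behavior can be pictured as a small template function over the profile fields. This is only an illustration of the rule just described (custom `system_prompt` wins, otherwise assemble from fields); `build_activation_prompt` and the exact wording are invented here, not the server's actual template.

```python
def build_activation_prompt(profile: dict) -> str:
    """Illustrative sketch of the fallback rule: a custom system_prompt is
    used as-is; otherwise a prompt is assembled from the profile fields."""
    if profile.get("system_prompt"):
        return profile["system_prompt"]
    parts = [f"You are the {profile['agent_role']}: {profile.get('identity', '')}".strip()]
    if profile.get("critical_rules"):
        parts.append("Critical rules:\n" + "\n".join(f"- {r}" for r in profile["critical_rules"]))
    if profile.get("workflow"):
        parts.append("Workflow: " + profile["workflow"])
    return "\n\n".join(parts)

profile = {
    "agent_role": "backend_engineer",
    "identity": "Owns the API layer and database schema.",
    "critical_rules": ["Never commit secrets", "All migrations are reversible"],
}
print(build_activation_prompt(profile))
```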
## Injected Demo (Streaming + Recall)

Run the end-to-end injected-mode demo (client LLM extraction, server zero-LLM path):

```bash
python3 scripts/run_sdk_injected_conversation_demo.py --full-user-journey --stream
```

The demo validates:

- Prompt-only usage via interceptor injection (no manual recall/remember calls in business code)
- Background extraction request handling (`record` -> `extraction_request` -> `record(insights=...)`)
- Cross-session recall in a fresh simulated follow-up session
- Streaming token output for runtime observability
## Framework Integrations

All integrations share a unified adapter pattern based on `MemoryCloudBaseAdapter`. Each provides:

- `wrap_llm()` / `wrap_function()` — transparent memory injection
- `awareness_recall()` / `awareness_record()` / `memory_insights()` — explicit tool methods
- `inject_into_messages()` — manual message-level injection
- `get_tool_functions()` — tool definitions for manual registration

See the individual framework guides.
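Message-level injection amounts to prepending recalled context to the message list before it reaches the LLM. The sketch below shows that shape with a hypothetical `inject_context` helper; it mirrors what an `inject_into_messages()`-style call does conceptually, not the adapter's actual code.

```python
def inject_context(messages: list, context: str) -> list:
    """Illustrative message-level injection: prepend recalled memory as a
    system message, leaving the caller's messages untouched."""
    if not context:
        return list(messages)
    memory_msg = {"role": "system", "content": f"Relevant memory:\n{context}"}
    return [memory_msg, *messages]

msgs = [{"role": "user", "content": "Continue the auth refactor."}]
injected = inject_context(msgs, "Yesterday: refactored auth middleware, tests pending.")
```

Returning a new list (rather than mutating `msgs`) keeps the caller's conversation history clean across turns.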
## Environment Variables

```bash
export AWARENESS_API_BASE_URL="https://awareness.market/api/v1"
export AWARENESS_API_KEY="aw_xxx"

# Optional: configure extraction LLM behavior
export AWARENESS_EXTRACTION_MODEL="gpt-4o-mini"    # Model used for insight extraction (default: gpt-4o-mini)
export AWARENESS_EXTRACTION_MAX_TOKENS="16384"     # Max tokens for extraction output (default: 16384)
```

All environment variables can also be set via constructor parameters (which take priority over env vars).
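The precedence rule (constructor argument over environment variable over built-in default) can be sketched with a hypothetical `resolve` helper; the SDK's actual resolution code is internal, but the ordering described above is what it amounts to.

```python
import os

def resolve(param, env_var: str, default: str) -> str:
    """Precedence sketch: an explicit constructor argument wins over the
    environment variable, which wins over the built-in default."""
    if param is not None:
        return param
    return os.getenv(env_var, default)

os.environ["AWARENESS_EXTRACTION_MODEL"] = "gpt-4o-mini"
print(resolve(None, "AWARENESS_EXTRACTION_MODEL", "fallback-model"))
print(resolve("claude-haiku-4-5-20251001", "AWARENESS_EXTRACTION_MODEL", "fallback-model"))
```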