What We Capture Per User
Why This Exists
Every AI agent platform makes a choice: how much do you observe in order to improve, and how much do you leave dark? CollectHive leans toward observability. The more we understand about how Amber is being used — which tools fire, which memories get recalled, where costs spike — the better we can tune it.
This article is a full inventory of what Amber captures on production (agent.collecthive.ai), where it lives, and how long it’s kept. It is written for the team, not as a privacy policy — the intent is transparency about our own system’s behaviour.
Identity & Profile
The foundation. Amber knows who you are via Clerk, which provides the clerkId that ties everything together.
| Field | Where stored | Notes |
|---|---|---|
| Email address | users table | Synced from Clerk on first login |
| Display name | users table | Synced from Clerk |
| Avatar URL | users table | Synced from Clerk |
| Admin flag | users table | Set manually |
| IP address | usageEvents table | Captured per HTTP request |
| User agent string | usageEvents table | Captured per HTTP request |
IP and user agent are stored against every request. Retention: 30 days, then auto-deleted by a daily cron.
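The 30-day rule comes down to a cutoff comparison. The following is a minimal sketch, assuming each row carries a millisecond `createdAt` timestamp; the `UsageEvent` shape and the `selectExpired` helper are hypothetical names for illustration, not the actual cron job.

```typescript
// Hypothetical row shape: only the fields needed for the retention check.
interface UsageEvent {
  id: string;
  createdAt: number; // Unix epoch milliseconds (assumed)
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Returns the rows older than the retention window, i.e. the ones a daily
// cleanup job would delete. A row exactly at the cutoff is kept.
function selectExpired(
  rows: UsageEvent[],
  now: number,
  retentionDays: number = 30,
): UsageEvent[] {
  const cutoff = now - retentionDays * DAY_MS;
  return rows.filter((row) => row.createdAt < cutoff);
}
```

Passing `now` in as a parameter, rather than calling `Date.now()` inside, keeps the check deterministic and easy to test.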
Sessions & Messages
The core of what Amber does. Every chat is fully recorded.
Sessions
Each chat session captures:
- Model used — sonnet, opus, haiku, etc.
- Permission mode — default, plan, or auto
- Thinking mode — off, think, or think-hard
- Working directory — the project path the agent ran in
- Total cost — estimated USD from token usage
- Token breakdown — input, output, cache read, cache write
- Duration — wall-clock time in milliseconds
- Context usage — percentage of the context window consumed
- Error — message if the session failed
- Skill names — denormalised list of skills activated during the session
Sessions are retained indefinitely.
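To illustrate how a total cost can be estimated from the token breakdown, here is a rough sketch. The per-million-token prices in `EXAMPLE_PRICES` are placeholders, and `estimateCostUsd` is a hypothetical helper; neither reflects the actual pricing table or code Amber uses.

```typescript
// Token counts as listed in the session breakdown.
interface TokenBreakdown {
  input: number;
  output: number;
  cacheRead: number;
  cacheWrite: number;
}

// USD per one million tokens, one rate per token category.
type PriceTable = TokenBreakdown;

// Placeholder values for illustration only, not real pricing.
const EXAMPLE_PRICES: PriceTable = {
  input: 3,
  output: 15,
  cacheRead: 0.3,
  cacheWrite: 3.75,
};

// Multiplies each token category by its rate and sums the result.
function estimateCostUsd(tokens: TokenBreakdown, prices: PriceTable): number {
  const perMillion = 1e6;
  return (
    (tokens.input * prices.input +
      tokens.output * prices.output +
      tokens.cacheRead * prices.cacheRead +
      tokens.cacheWrite * prices.cacheWrite) /
    perMillion
  );
}
```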
Messages
Every message in every session is stored in full, including:
- Role (user, assistant, tool)
- Full content text
- Tool name, input parameters, and output
- Thinking tokens (Claude’s internal reasoning, when enabled)
- Images attached
- Injected memories — which memories were pulled into context for this message
- Extracted memories — what the extraction pipeline pulled out, with confidence scores
Messages are retained indefinitely.
Technical Detail — Memory Injection Tracking
Each message record stores the full set of memories that were injected into Claude’s context at that point. This means you can replay exactly what context the model had for any given response. The extractedMemories field records what the extraction pipeline tried to learn from the message — including cases where it decided not to store (with a skip reason).
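As a sketch of what replaying context could look like, the types below model a message with its injected and extracted memories. The top-level field names `injectedMemories` and `extractedMemories` follow the description above; the nested fields (`memoryId`, `stored`, `skipReason`) and both helper functions are assumptions made for illustration.

```typescript
// Hypothetical nested shapes; only the two top-level field names come from the text.
interface InjectedMemory {
  memoryId: string;
  title: string;
}

interface ExtractedMemory {
  title: string;
  confidence: number; // 0-100
  stored: boolean;
  skipReason?: string; // expected only when stored is false
}

interface MessageRecord {
  id: string;
  role: "user" | "assistant" | "tool";
  content: string;
  injectedMemories: InjectedMemory[];
  extractedMemories: ExtractedMemory[];
}

// Lists the memory titles the model had in context for one message.
function contextFor(messages: MessageRecord[], messageId: string): string[] {
  const message = messages.find((m) => m.id === messageId);
  return message ? message.injectedMemories.map((m) => m.title) : [];
}

// Collects extraction attempts that were not stored, with their reasons.
function skippedExtractions(
  message: MessageRecord,
): { title: string; reason: string }[] {
  return message.extractedMemories
    .filter((e) => !e.stored)
    .map((e) => ({ title: e.title, reason: e.skipReason ?? "unknown" }));
}
```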
Memory System
The memory system is the part of Amber that learns from usage. Every session feeds it.
| What | Details |
|---|---|
| Memory content | Title, body, type classification |
| Embedding | 1,024-dimensional vector (mxbai-embed-large) for semantic search |
| Confidence score | 0–100, adjusted by user feedback |
| Tier | core / active / archive / merged |
| Knowledge graph links | relatedIds connecting related memories |
| Feedback | Per-memory ratings (helpful, not helpful, wrong, outdated) with optional comment |
| Recall log | Which chat triggered a recall, whether the user marked it helpful |
Memories are retained indefinitely. User feedback directly adjusts confidence scores and can trigger auto-archival after repeated negative ratings.
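The feedback loop can be pictured roughly as follows. This is a sketch under stated assumptions: the score deltas, the `negativeCount` field, and the three-strike archive threshold are all hypothetical, since the real tuning values are not listed here.

```typescript
type Rating = "helpful" | "not_helpful" | "wrong" | "outdated";
type Tier = "core" | "active" | "archive" | "merged";

interface Memory {
  confidence: number; // 0-100
  tier: Tier;
  negativeCount: number; // hypothetical running count of negative ratings
}

// Hypothetical tuning values, chosen only to make the example concrete.
const DELTAS: Record<Rating, number> = {
  helpful: 5,
  not_helpful: -5,
  wrong: -15,
  outdated: -10,
};
const ARCHIVE_AFTER_NEGATIVES = 3;

// Returns an updated copy: confidence is clamped to 0-100, and the memory is
// moved to the archive tier once enough negative ratings have accumulated.
function applyFeedback(memory: Memory, rating: Rating): Memory {
  const confidence = Math.min(100, Math.max(0, memory.confidence + DELTAS[rating]));
  const negativeCount = memory.negativeCount + (rating === "helpful" ? 0 : 1);
  const tier: Tier =
    negativeCount >= ARCHIVE_AFTER_NEGATIVES ? "archive" : memory.tier;
  return { confidence, tier, negativeCount };
}
```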
Real-Time Presence
Two lightweight tables track whether you’re actively using Amber, primarily to avoid sending push notifications when you’re already looking at the screen.
userPresence — updated every 10 seconds:
- Which chat you’re currently viewing
- Timestamp of last heartbeat
userActivity — updated on interaction:
- Last activity timestamp
- Last keyboard/mouse/touch interaction
- Platform (web, android, ios)
- Whether the browser tab is visible
Both are ephemeral — they reflect current state, not history.
Event Logs
Three audit tables record discrete actions. All retain data for 30 days.
HTTP Requests (usageEvents)
Every request to the memory-mcp server is logged:
- Path, HTTP method, status code
- IP address, user agent
- Event type (page view, API call, login, WebSocket connect)
Skill Activations (skillEvents)
When skills are used:
- Skill name
- Event type:
  - activation (injected into context)
  - read (agent fetched content)
  - gated_tool_call (a skill-gated tool was executed)
- Tool name (for gated tool calls)
- Session ID
Credential Proxy (proxyAuditLogs)
When Amber uses a stored credential on your behalf:
- Service name (GitHub, Slack, Jira, etc.)
- HTTP method and path (truncated to 500 characters)
- Response status code
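A proxy audit row with the 500-character path limit might be assembled like this. The `ProxyAuditEntry` shape and `buildAuditEntry` helper are hypothetical; only the logged fields and the truncation length come from the list above.

```typescript
const MAX_PATH_LENGTH = 500;

// Hypothetical record shape mirroring the fields listed for proxyAuditLogs.
interface ProxyAuditEntry {
  service: string;
  method: string;
  path: string;
  status: number;
}

// Builds an audit entry, cutting the path to at most 500 characters so that
// very long query strings are not stored in full.
function buildAuditEntry(
  service: string,
  method: string,
  path: string,
  status: number,
): ProxyAuditEntry {
  return { service, method, path: path.slice(0, MAX_PATH_LENGTH), status };
}
```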
Agent Activity Logs
Amber's autonomous agents (email triage, Moltbook, the admin agent) and agent teams each have their own activity tables. These are retained indefinitely.
| Agent | What’s logged |
|---|---|
| Email agent | Browse inbox, read email, flag, move, draft response, search, organise — with folder, content, and metadata |
| Moltbook agent | Post, comment, vote, follow, DM send/approve, subscribe — with target and content |
| Admin agent | Fix attempts, escalations, heartbeats, push operations |
| Agent teams | Every message between agents: content, type, sender, recipient, whether read |
System Execution Logs
Beyond agent-specific activity, several system operations are logged per user.
SSH commands — full command text, remote host, exit code, access classification. Retained indefinitely.
Database queries — SQL previews (first 500 characters), row count, duration, status. Retained indefinitely.
Pipeline job logs — every log line from background pipeline jobs (research, consolidation, export), with level (info, warn, error, debug).
Workflow runs — step-by-step results from outer loop and wizard runs, including cost per run.
Push Notifications
To send notifications to mobile and desktop, Amber stores:
- FCM device tokens — one per registered device
- Platform — android, ios, or web
- Device ID (optional)
Invalid tokens are cleaned up automatically. Notification delivery is suppressed when userActivity shows you’re actively using Amber on any device.
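The suppression check can be sketched as below. What counts as "actively using" is an assumption here: the example treats a device as active when it is visible and had an interaction within the last 30 seconds. The `ACTIVE_WINDOW_MS` value, the field names, and `shouldSendPush` are hypothetical.

```typescript
// Hypothetical view of one userActivity row per device.
interface UserActivity {
  platform: "web" | "android" | "ios";
  lastInteractionAt: number; // Unix epoch milliseconds
  visible: boolean; // tab visible on web; assumed to mean foregrounded on mobile
}

// Assumed threshold for "actively using"; the real value is not documented here.
const ACTIVE_WINDOW_MS = 30000;

// Returns false when any device looks active, so the push is suppressed.
function shouldSendPush(activities: UserActivity[], now: number): boolean {
  const activeSomewhere = activities.some(
    (a) => a.visible && now - a.lastInteractionAt <= ACTIVE_WINDOW_MS,
  );
  return !activeSomewhere;
}
```

Because the check uses `some`, a single active device is enough to suppress delivery to all registered devices, which matches the "on any device" behaviour described above.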
Analytics Aggregations
Two Convex modules compute derived analytics from the raw tables.
Per-User Stats (stats.ts)
Aggregated on demand for the dashboard:
- Session trends (8-week lookback) — count, cost, token usage, error rate
- Engagement metrics — messages per session, active days, streak, model distribution
- Memory analytics — count by type and tier, recall ratings
- Tool usage breakdown
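As an illustration of the session-trend aggregation, the sketch below buckets sessions into rolling 7-day windows over an 8-week lookback. It is not the code in stats.ts: the `SessionSummary` shape, the rolling-window bucketing, and the `weeklyTrends` name are assumptions.

```typescript
// Hypothetical minimal session shape for the aggregation.
interface SessionSummary {
  startedAt: number; // Unix epoch milliseconds
  costUsd: number;
  error?: string; // set when the session failed
}

interface WeekBucket {
  weeksAgo: number; // 0 = the most recent 7 days
  count: number;
  costUsd: number;
  errorRate: number; // failed sessions / all sessions, 0 when empty
}

const WEEK_MS = 7 * 24 * 60 * 60 * 1000;

function weeklyTrends(
  sessions: SessionSummary[],
  now: number,
  weeks: number = 8,
): WeekBucket[] {
  const buckets = Array.from({ length: weeks }, (_, weeksAgo) => ({
    weeksAgo,
    count: 0,
    costUsd: 0,
    errors: 0,
  }));
  for (const session of sessions) {
    const age = now - session.startedAt;
    if (age < 0) continue; // ignore sessions stamped in the future
    const index = Math.floor(age / WEEK_MS);
    if (index >= weeks) continue; // outside the lookback window
    const bucket = buckets[index];
    bucket.count += 1;
    bucket.costUsd += session.costUsd;
    if (session.error !== undefined) bucket.errors += 1;
  }
  return buckets.map(({ weeksAgo, count, costUsd, errors }) => ({
    weeksAgo,
    count,
    costUsd,
    errorRate: count === 0 ? 0 : errors / count,
  }));
}
```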
Team-Level Analytics (usageAnalytics.ts)
Admin-only, sampled from recent activity:
- Daily and weekly active user counts
- Feature area adoption this week (Projects, Settings, Skills, Images, etc.)
- Error hotspots by feature area
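Daily and weekly active user counts reduce to counting distinct users inside a time window. A minimal sketch, assuming events carry a `userId` and a millisecond `timestamp`; the `ActivityEvent` shape and `activeUsers` helper are hypothetical and do not model the sampling that usageAnalytics.ts applies.

```typescript
// Hypothetical event shape: any logged action attributed to a user.
interface ActivityEvent {
  userId: string;
  timestamp: number; // Unix epoch milliseconds
}

const ONE_DAY_MS = 24 * 60 * 60 * 1000;

// Counts distinct users with at least one event in the window ending at `now`.
function activeUsers(
  events: ActivityEvent[],
  now: number,
  windowMs: number,
): number {
  const seen = new Set<string>();
  for (const event of events) {
    const age = now - event.timestamp;
    if (age >= 0 && age < windowMs) seen.add(event.userId);
  }
  return seen.size;
}
```

Calling it with `ONE_DAY_MS` gives a daily count and with `7 * ONE_DAY_MS` a weekly count over the same event list.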
Error Tracking
Sentry is configured on both the frontend (admin-ui) and backend (memory-mcp). It captures:
- Unhandled exceptions with full stack traces
- HTTP 5xx errors with request context
- React error boundaries with component stack
- Breadcrumbs (console logs, network requests leading up to errors)
Sentry is only active when SENTRY_DSN is set in the environment. It is active on production.
What We Are Not Capturing
For completeness:
- No third-party product analytics — no PostHog, Mixpanel, Amplitude, or Segment
- No session replay or click heatmaps
- No page load timing or Core Web Vitals
- No A/B test assignments
- No outbound email open tracking
Retention Summary
| Category | Retention |
|---|---|
| HTTP requests, skill events, proxy audit | 30 days |
| Push tokens, user presence, user activity | Current state only (overwritten) |
| Chat sessions and messages | Indefinite |
| Memories and feedback | Indefinite |
| Agent activity logs | Indefinite |
| SSH and database query logs | Indefinite |
| User profile and settings | Indefinite |
| Sentry errors | Sentry’s default (90 days on free tier) |