What We Capture Per User

Why This Exists

Every AI agent platform makes a choice: how much do you observe in order to improve, and how much do you leave dark? CollectHive leans toward observability. The more we understand about how Amber is being used — which tools fire, which memories get recalled, where costs spike — the better we can tune it.

This article is a full inventory of what Amber captures on production (agent.collecthive.ai), where it lives, and how long it’s kept. It is written for the team, not as a privacy policy — the intent is transparency about our own system’s behaviour.

Identity & Profile

This is the foundation. Amber identifies you through Clerk, which provides the `clerkId` that links all of your records together.

Field	Where stored	Notes
Email address	`users` table	Synced from Clerk on first login
Display name	`users` table	Synced from Clerk
Avatar URL	`users` table	Synced from Clerk
Admin flag	`users` table	Set manually
IP address	`usageEvents` table	Captured per HTTP request
User agent string	`usageEvents` table	Captured per HTTP request

The IP address and user agent string are stored with every HTTP request. Both are retained for 30 days, then deleted automatically by a daily cron job.
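The 30-day retention rule can be sketched as a pure selection step that a daily job might run before deleting rows. This is a minimal illustration, not the actual cron: the `UsageEvent` shape, the `createdAt` field name, and the `expiredEventIds` helper are assumptions made for the example.

```typescript
// Hypothetical shape of a `usageEvents` row; only the fields needed here.
interface UsageEvent {
  id: string;
  createdAt: number; // Unix epoch milliseconds (assumed field name)
}

const DAY_MS = 24 * 60 * 60 * 1000;
const RETENTION_DAYS = 30; // the retention period stated in this article

// Returns the ids of events strictly older than the retention window.
// A daily job could pass these ids to whatever delete call the store offers.
function expiredEventIds(events: UsageEvent[], now: number): string[] {
  const cutoff = now - RETENTION_DAYS * DAY_MS;
  return events.filter((e) => e.createdAt < cutoff).map((e) => e.id);
}
```

Keeping the selection separate from the delete makes the cutoff easy to test without touching the database.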

Sessions & Messages

The core of what Amber does. Every chat is fully recorded.

Sessions

Each chat session captures:

Model used — sonnet, opus, haiku, etc.
Permission mode — default, plan, or auto
Thinking mode — off, think, or think-hard
Working directory — the project path the agent ran in
Total cost — estimated USD from token usage
Token breakdown — input, output, cache read, cache write
Duration — wall-clock time in milliseconds
Context usage — percentage of the context window consumed
Error — message if the session failed
Skill names — denormalised list of skills activated during the session

Sessions are retained indefinitely.
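As an illustration of how the cost and context-usage fields could be derived from the token breakdown, here is a minimal sketch. The rate values used with it are placeholders rather than real model prices, and the type and function names are hypothetical.

```typescript
// Token breakdown as listed above: input, output, cache read, cache write.
interface TokenUsage {
  input: number;
  output: number;
  cacheRead: number;
  cacheWrite: number;
}

// USD per million tokens for each category. Values are supplied by the
// caller; this sketch does not assume any real price list.
interface ModelRates {
  input: number;
  output: number;
  cacheRead: number;
  cacheWrite: number;
}

// Estimated session cost in USD: tokens times rate, summed per category.
function estimateCostUsd(usage: TokenUsage, rates: ModelRates): number {
  const total =
    usage.input * rates.input +
    usage.output * rates.output +
    usage.cacheRead * rates.cacheRead +
    usage.cacheWrite * rates.cacheWrite;
  return total / 1e6;
}

// Share of the context window consumed, as a percentage capped at 100.
function contextUsagePercent(tokensInContext: number, contextWindow: number): number {
  return Math.min(100, (tokensInContext / contextWindow) * 100);
}
```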

Messages

Every message in every session is stored in full, including:

Role (user, assistant, tool)
Full content text
Tool name, input parameters, and output
Thinking tokens (Claude’s internal reasoning, when enabled)
Images attached
Injected memories — which memories were pulled into context for this message
Extracted memories — what the extraction pipeline pulled out, with confidence scores

Messages are retained indefinitely.

Technical Detail — Memory Injection Tracking

Each message record stores the full set of memories that were injected into Claude’s context at that point, so you can reconstruct exactly which memories the model had available for any given response. The `extractedMemories` field records what the extraction pipeline tried to learn from the message, including cases where it decided not to store anything (with a skip reason).
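A rough sketch of what reading these fields back might look like. Only `extractedMemories` is named in this article; `injectedMemories`, the nested field names, and both helper functions are assumptions made for illustration.

```typescript
// Hypothetical record shapes; not the actual schema.
interface InjectedMemory {
  memoryId: string;
  title: string;
}

interface ExtractedMemory {
  title: string;
  confidence: number; // 0-100
  stored: boolean;
  skipReason?: string; // set when the pipeline chose not to store
}

interface MessageRecord {
  role: "user" | "assistant" | "tool";
  content: string;
  injectedMemories: InjectedMemory[];
  extractedMemories: ExtractedMemory[];
}

// Titles of the memories the model had in context for this message.
function injectedTitles(message: MessageRecord): string[] {
  return message.injectedMemories.map((m) => m.title);
}

// Extraction attempts that were not stored, with the recorded reason.
function skippedExtractions(message: MessageRecord): string[] {
  return message.extractedMemories
    .filter((m) => !m.stored)
    .map((m) => m.title + ": " + (m.skipReason || "no reason recorded"));
}
```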

Memory System

The memory system is the part of Amber that learns from usage. Every session feeds it.

What	Details
Memory content	Title, body, type classification
Embedding	1,024-dimensional vector (mxbai-embed-large) for semantic search
Confidence score	0–100, adjusted by user feedback
Tier	core / active / archive / merged
Knowledge graph links	`relatedIds` connecting related memories
Feedback	Per-memory ratings (helpful, not helpful, wrong, outdated) with optional comment
Recall log	Which chat triggered a recall, whether the user marked it helpful

Memories are retained indefinitely. User feedback directly adjusts confidence scores and can trigger auto-archival after repeated negative ratings.
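The feedback loop described above could be modelled roughly as follows. The per-rating deltas and the archive threshold of three negative ratings are invented for this sketch; the article does not state the real values, and `applyFeedback` is a hypothetical name.

```typescript
type Rating = "helpful" | "not_helpful" | "wrong" | "outdated";
type Tier = "core" | "active" | "archive" | "merged";

interface MemoryState {
  confidence: number; // 0-100
  tier: Tier;
  negativeCount: number; // running count of non-helpful ratings
}

// Assumed adjustments per rating; placeholders, not production values.
const DELTAS: Record<Rating, number> = {
  helpful: 5,
  not_helpful: -5,
  wrong: -15,
  outdated: -10,
};

// Assumed threshold for auto-archival after repeated negative ratings.
const ARCHIVE_AFTER_NEGATIVES = 3;

// Applies one rating: adjusts confidence (clamped to 0-100) and archives
// the memory once enough negative ratings have accumulated.
function applyFeedback(memory: MemoryState, rating: Rating): MemoryState {
  const confidence = Math.max(0, Math.min(100, memory.confidence + DELTAS[rating]));
  const negativeCount = rating === "helpful" ? memory.negativeCount : memory.negativeCount + 1;
  const shouldArchive = negativeCount >= ARCHIVE_AFTER_NEGATIVES && memory.tier !== "merged";
  const tier: Tier = shouldArchive ? "archive" : memory.tier;
  return { confidence, tier, negativeCount };
}
```

Returning a new state object rather than mutating the input keeps each feedback event easy to audit against the recall log.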

Real-Time Presence

Two lightweight tables track whether you’re actively using Amber, primarily to avoid sending push notifications when you’re already looking at the screen.

`userPresence` (updated every 10 seconds):

Which chat you’re currently viewing
Timestamp of last heartbeat

`userActivity` (updated on interaction):

Last activity timestamp
Last keyboard/mouse/touch interaction
Platform (web, android, ios)
Whether the browser tab is visible

Both are ephemeral — they reflect current state, not history.

Event Logs

Three audit tables record discrete actions. All retain data for 30 days.

HTTP Requests (`usageEvents`)

Every request to the memory-mcp server is logged:

Path, HTTP method, status code
IP address, user agent
Event type (page view, API call, login, WebSocket connect)

Skill Activations (`skillEvents`)

When skills are used:

Skill name
Event type: activation (injected into context), read (agent fetched content), gated_tool_call (a skill-gated tool was executed)
Tool name (for gated tool calls)
Session ID
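One way to model the three event types is a discriminated union in which the tool name exists only on gated tool calls. The field names below (`skillName`, `sessionId`, `toolName`) are assumptions chosen to mirror the list above, not the actual `skillEvents` schema.

```typescript
// Hypothetical row shapes for `skillEvents`, discriminated by `type`.
type SkillEvent =
  | { type: "activation"; skillName: string; sessionId: string }
  | { type: "read"; skillName: string; sessionId: string }
  | { type: "gated_tool_call"; skillName: string; sessionId: string; toolName: string };

// Produces a one-line description; the compiler checks that every
// event type is handled and that toolName is only read where it exists.
function describeSkillEvent(event: SkillEvent): string {
  switch (event.type) {
    case "activation":
      return event.skillName + " injected into context";
    case "read":
      return event.skillName + " content fetched by agent";
    case "gated_tool_call":
      return event.skillName + " gated tool " + event.toolName + " executed";
  }
}
```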

Credential Proxy (`proxyAuditLogs`)

When Amber uses a stored credential on your behalf:

Service name (GitHub, Slack, Jira, etc.)
HTTP method and path (truncated to 500 characters)
Response status code
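A minimal sketch of building such an audit entry, including the 500-character path truncation. The `ProxyAuditEntry` shape and the `buildProxyAuditEntry` helper are hypothetical; only the logged fields and the truncation limit come from the list above.

```typescript
const MAX_PATH_LENGTH = 500; // truncation limit stated above

// Hypothetical shape of a `proxyAuditLogs` row.
interface ProxyAuditEntry {
  service: string;
  method: string;
  path: string; // at most MAX_PATH_LENGTH characters
  status: number;
}

// Builds the entry, cutting the path at the limit so long query strings
// do not bloat the audit table.
function buildProxyAuditEntry(
  service: string,
  method: string,
  path: string,
  status: number
): ProxyAuditEntry {
  return { service, method, path: path.slice(0, MAX_PATH_LENGTH), status };
}
```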

Agent Activity Logs

Amber’s autonomous agents — email triage, Moltbook, the admin agent — each have their own activity tables. These are retained indefinitely.

Agent	What’s logged
Email agent	Browse inbox, read email, flag, move, draft response, search, organise — with folder, content, and metadata
Moltbook agent	Post, comment, vote, follow, DM send/approve, subscribe — with target and content
Admin agent	Fix attempts, escalations, heartbeats, push operations
Agent teams	Every message between agents: content, type, sender, recipient, whether read

System Execution Logs

Beyond agent-specific activity, several system operations are logged per user.

SSH commands — full command text, remote host, exit code, access classification. Retained indefinitely.

Database queries — SQL previews (first 500 characters), row count, duration, status. Retained indefinitely.

Pipeline job logs — every log line from background pipeline jobs (research, consolidation, export), with level (info, warn, error, debug).

Workflow runs — step-by-step results from outer loop and wizard runs, including cost per run.

Push Notifications

To send notifications to mobile and desktop, Amber stores:

FCM device tokens — one per registered device
Platform — android, ios, or web
Device ID (optional)

Invalid tokens are cleaned up automatically. Notification delivery is suppressed when `userActivity` shows you’re actively using Amber on any device.
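The suppression check might look something like the sketch below. It assumes one activity record per device, treats the visibility flag as applying to every platform, and uses a 30-second activity window; none of these details are confirmed by this article, and all names are illustrative.

```typescript
// Hypothetical per-device view of `userActivity`.
interface DeviceActivity {
  platform: "web" | "android" | "ios";
  lastInteractionAt: number; // Unix epoch milliseconds
  visible: boolean; // tab or app currently in the foreground (assumed)
}

// Assumed window within which an interaction counts as "actively using".
const ACTIVE_WINDOW_MS = 30 * 1000;

// True when at least one device is visible and was interacted with
// inside the window, in which case a push would be redundant.
function shouldSuppressPush(devices: DeviceActivity[], now: number): boolean {
  return devices.some(
    (d) => d.visible && now - d.lastInteractionAt <= ACTIVE_WINDOW_MS
  );
}
```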

Analytics Aggregations

Two Convex modules compute derived analytics from the raw tables.

Per-User Stats (`stats.ts`)

Aggregated on demand for the dashboard:

Session trends (8-week lookback) — count, cost, token usage, error rate
Engagement metrics — messages per session, active days, streak, model distribution
Memory analytics — count by type and tier, recall ratings
Tool usage breakdown
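To make the engagement metrics concrete, here is one possible way to derive active days and the current streak from session start times. The definitions used (UTC calendar days, a streak that must include today) are assumptions, as are the function names; `stats.ts` may compute these differently.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Maps a timestamp to a UTC day number (days since the Unix epoch).
function utcDay(timestamp: number): number {
  return Math.floor(timestamp / MS_PER_DAY);
}

// Collects the distinct UTC days on which at least one session started.
function activeDaySet(sessionStarts: number[]): { [day: number]: boolean } {
  const days: { [day: number]: boolean } = {};
  sessionStarts.forEach((ts) => {
    days[utcDay(ts)] = true;
  });
  return days;
}

// Number of distinct active days.
function activeDayCount(sessionStarts: number[]): number {
  return Object.keys(activeDaySet(sessionStarts)).length;
}

// Consecutive active days counting back from today; 0 if today is inactive.
function currentStreak(sessionStarts: number[], now: number): number {
  const days = activeDaySet(sessionStarts);
  let streak = 0;
  for (let day = utcDay(now); days[day]; day--) {
    streak++;
  }
  return streak;
}
```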

Team-Level Analytics (`usageAnalytics.ts`)

Admin-only, sampled from recent activity:

Daily and weekly active user counts
Feature area adoption this week (Projects, Settings, Skills, Images, etc.)
Error hotspots by feature area
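Daily and weekly active user counts reduce to counting distinct users inside a time window. The sketch below assumes trailing 24-hour and 7-day windows and a hypothetical `ActivitySample` shape; the actual sampling in `usageAnalytics.ts` is not described here.

```typescript
// Hypothetical activity sample: one row per recorded user action.
interface ActivitySample {
  userId: string;
  timestamp: number; // Unix epoch milliseconds
}

const ONE_DAY_MS = 24 * 60 * 60 * 1000;

// Counts distinct users with at least one sample in (now - windowMs, now].
function activeUserCount(samples: ActivitySample[], now: number, windowMs: number): number {
  const users: { [id: string]: boolean } = {};
  samples.forEach((s) => {
    if (s.timestamp > now - windowMs && s.timestamp <= now) {
      users[s.userId] = true;
    }
  });
  return Object.keys(users).length;
}

// Convenience wrappers for the two windows mentioned above.
function dailyActiveUsers(samples: ActivitySample[], now: number): number {
  return activeUserCount(samples, now, ONE_DAY_MS);
}

function weeklyActiveUsers(samples: ActivitySample[], now: number): number {
  return activeUserCount(samples, now, 7 * ONE_DAY_MS);
}
```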

Error Tracking

Sentry is configured on both the frontend (admin-ui) and backend (memory-mcp). It captures:

Unhandled exceptions with full stack traces
HTTP 5xx errors with request context
React error boundaries with component stack
Breadcrumbs (console logs, network requests leading up to errors)

Sentry is only active when `SENTRY_DSN` is set in the environment. It is active on production.

What We Are Not Capturing

For completeness:

No third-party product analytics — no PostHog, Mixpanel, Amplitude, or Segment
No session replay or click heatmaps
No page load timing or Core Web Vitals
No A/B test assignments
No outbound email open tracking

Retention Summary

Category	Retention
HTTP requests, skill events, proxy audit	30 days
Push tokens, user presence, user activity	Current state only (overwritten)
Chat sessions and messages	Indefinite
Memories and feedback	Indefinite
Agent activity logs	Indefinite
SSH and database query logs	Indefinite
User profile and settings	Indefinite
Sentry errors	Sentry’s default (90 days on free tier)