[Figure: Data captured per user, by retention period. Real-time: presence, activity state. 30-day logs: HTTP requests, skill activations, proxy audit. Indefinite: chat sessions, memories, agent activity, SSH & DB logs. Store: Convex Database.]

What We Capture Per User

7 Apr 2026 · Tristan Wright

Why This Exists

Every AI agent platform makes a choice: how much do you observe in order to improve, and how much do you leave dark? CollectHive leans toward observability. The more we understand about how Amber is being used — which tools fire, which memories get recalled, where costs spike — the better we can tune it.

This article is a full inventory of what Amber captures on production (agent.collecthive.ai), where it lives, and how long it’s kept. It is written for the team, not as a privacy policy — the intent is transparency about our own system’s behaviour.


Identity & Profile

The foundation. Amber knows who you are via Clerk, which provides the clerkId that ties everything together.

Field | Where stored | Notes
Email address | users table | Synced from Clerk on first login
Display name | users table | Synced from Clerk
Avatar URL | users table | Synced from Clerk
Admin flag | users table | Set manually
IP address | usageEvents table | Captured per HTTP request
User agent string | usageEvents table | Captured per HTTP request

IP and user agent are stored against every request. Retention: 30 days, then auto-deleted by a daily cron.
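
As a rough illustration of the 30-day rule, the daily cleanup can be thought of as a cutoff comparison. This is a minimal sketch, not the actual cron: the `UsageEvent` shape, the `createdAt` field, and the `selectExpired` helper are all hypothetical names chosen for the example.

```typescript
// Hypothetical sketch of a 30-day retention sweep; names are illustrative,
// not taken from the real Convex schema.
const RETENTION_DAYS = 30;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

interface UsageEvent {
  id: string;
  ip: string;
  userAgent: string;
  createdAt: number; // epoch milliseconds (assumed field name)
}

// Returns the events a daily job would delete: anything older than the cutoff.
function selectExpired(events: UsageEvent[], now: number): UsageEvent[] {
  const cutoff = now - RETENTION_DAYS * MS_PER_DAY;
  return events.filter((event) => event.createdAt < cutoff);
}
```

Because the job runs once a day, a row could in practice survive up to one extra day past the 30-day mark before the next sweep picks it up.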


Sessions & Messages

The core of what Amber does. Every chat is fully recorded.

Sessions

Each chat session captures:

  • Model used — sonnet, opus, haiku, etc.
  • Permission mode — default, plan, or auto
  • Thinking mode — off, think, or think-hard
  • Working directory — the project path the agent ran in
  • Total cost — estimated USD from token usage
  • Token breakdown — input, output, cache read, cache write
  • Duration — wall-clock time in milliseconds
  • Context usage — percentage of the context window consumed
  • Error — message if the session failed
  • Skill names — denormalised list of skills activated during the session

Sessions are retained indefinitely.

Messages

Every message in every session is stored in full, including:

  • Role (user, assistant, tool)
  • Full content text
  • Tool name, input parameters, and output
  • Thinking tokens (Claude’s internal reasoning, when enabled)
  • Images attached
  • Injected memories — which memories were pulled into context for this message
  • Extracted memories — what the extraction pipeline pulled out, with confidence scores

Messages are retained indefinitely.

Technical Detail — Memory Injection Tracking

Each message record stores the full set of memories that were injected into Claude’s context at that point. This means you can replay exactly what context the model had for any given response. The extractedMemories field records what the extraction pipeline tried to learn from the message — including cases where it decided not to store (with a skip reason).
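
To make the replay idea concrete, here is a sketch of what such a message record might look like and how it could be read back. Only `extractedMemories`, the confidence score, and the skip reason are named in this article; `injectedMemories` and every other field and helper name below are assumptions made for illustration.

```typescript
// Hypothetical record shapes; the real schema may differ.
interface InjectedMemory {
  memoryId: string;
  title: string;
}

interface ExtractedMemory {
  title: string;
  confidence: number; // 0-100
  stored: boolean;
  skipReason?: string; // set when the pipeline decided not to store
}

interface MessageRecord {
  role: "user" | "assistant" | "tool";
  content: string;
  injectedMemories: InjectedMemory[]; // assumed field name
  extractedMemories: ExtractedMemory[];
}

// Lists the memory titles the model had in context for one message.
function contextFor(message: MessageRecord): string[] {
  return message.injectedMemories.map((memory) => memory.title);
}

// Lists extraction attempts that were skipped, with their reasons.
function skippedExtractions(message: MessageRecord): string[] {
  return message.extractedMemories
    .filter((memory) => !memory.stored)
    .map((memory) => `${memory.title}: ${memory.skipReason ?? "no reason recorded"}`);
}
```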


Memory System

The memory system is the part of Amber that learns from usage. Every session feeds it.

What | Details
Memory content | Title, body, type classification
Embedding | 1,024-dimensional vector (mxbai-embed-large) for semantic search
Confidence score | 0–100, adjusted by user feedback
Tier | core / active / archive / merged
Knowledge graph links | relatedIds connecting related memories
Feedback | Per-memory ratings (helpful, not helpful, wrong, outdated) with optional comment
Recall log | Which chat triggered a recall, whether the user marked it helpful

Memories are retained indefinitely. User feedback directly adjusts confidence scores and can trigger auto-archival after repeated negative ratings.
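
The feedback loop can be sketched as a small pure function. The numbers here are invented for illustration: the per-rating deltas and the three-strikes archive threshold are assumptions, as are the `negativeCount` field and the `applyFeedback` name. Only the 0–100 range, the rating labels, and the tier names come from this article.

```typescript
type Rating = "helpful" | "not_helpful" | "wrong" | "outdated";
type Tier = "core" | "active" | "archive" | "merged";

interface Memory {
  confidence: number; // 0-100
  tier: Tier;
  negativeCount: number; // assumed counter of negative ratings
}

// Invented values, for illustration only.
const CONFIDENCE_DELTA: Record<Rating, number> = {
  helpful: 5,
  not_helpful: -5,
  wrong: -15,
  outdated: -10,
};
const ARCHIVE_AFTER_NEGATIVES = 3;

// Applies one rating: adjusts confidence (clamped to 0-100) and archives
// the memory once enough negative ratings have accumulated.
function applyFeedback(memory: Memory, rating: Rating): Memory {
  const raw = memory.confidence + CONFIDENCE_DELTA[rating];
  const confidence = Math.min(100, Math.max(0, raw));
  const negativeCount = memory.negativeCount + (rating === "helpful" ? 0 : 1);
  const tier: Tier = negativeCount >= ARCHIVE_AFTER_NEGATIVES ? "archive" : memory.tier;
  return { confidence, tier, negativeCount };
}
```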


Real-Time Presence

Two lightweight tables track whether you’re actively using Amber, primarily to avoid sending push notifications when you’re already looking at the screen.

userPresence — updated every 10 seconds:

  • Which chat you’re currently viewing
  • Timestamp of last heartbeat

userActivity — updated on interaction:

  • Last activity timestamp
  • Last keyboard/mouse/touch interaction
  • Platform (web, android, ios)
  • Whether the browser tab is visible

Both are ephemeral — they reflect current state, not history.


Event Logs

Three audit tables record discrete actions. All retain data for 30 days.

HTTP Requests (usageEvents)

Every request to the memory-mcp server is logged:

  • Path, HTTP method, status code
  • IP address, user agent
  • Event type (page view, API call, login, WebSocket connect)

Skill Activations (skillEvents)

When skills are used:

  • Skill name
  • Event type: activation (injected into context), read (agent fetched content), gated_tool_call (a skill-gated tool was executed)
  • Tool name (for gated tool calls)
  • Session ID

Credential Proxy (proxyAuditLogs)

When Amber uses a stored credential on your behalf:

  • Service name (GitHub, Slack, Jira, etc.)
  • HTTP method and path (truncated to 500 characters)
  • Response status code
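
For illustration, a proxy audit entry with the 500-character path cap might be built like this. The `ProxyAuditEntry` shape and the `toAuditEntry` helper are hypothetical; only the logged fields and the 500-character limit come from the list above.

```typescript
const MAX_PATH_LENGTH = 500;

// Hypothetical entry shape mirroring the fields listed above.
interface ProxyAuditEntry {
  service: string;
  method: string;
  path: string;
  status: number;
}

// Builds an entry, keeping only the first 500 characters of the path.
function toAuditEntry(
  service: string,
  method: string,
  path: string,
  status: number,
): ProxyAuditEntry {
  return { service, method, path: path.slice(0, MAX_PATH_LENGTH), status };
}
```

Note that `slice` counts UTF-16 code units, so this sketch treats "characters" in that sense.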

Agent Activity Logs

Amber’s autonomous agents — email triage, Moltbook, the admin agent — each have their own activity tables. These are retained indefinitely.

Agent | What’s logged
Email agent | Browse inbox, read email, flag, move, draft response, search, organise (with folder, content, and metadata)
Moltbook agent | Post, comment, vote, follow, DM send/approve, subscribe (with target and content)
Admin agent | Fix attempts, escalations, heartbeats, push operations
Agent teams | Every message between agents: content, type, sender, recipient, whether read

System Execution Logs

Beyond agent-specific activity, several system operations are logged per user.

SSH commands — full command text, remote host, exit code, access classification. Retained indefinitely.

Database queries — SQL previews (first 500 characters), row count, duration, status. Retained indefinitely.

Pipeline job logs — every log line from background pipeline jobs (research, consolidation, export), with level (info, warn, error, debug).

Workflow runs — step-by-step results from outer loop and wizard runs, including cost per run.


Push Notifications

To send notifications to mobile and desktop, Amber stores:

  • FCM device tokens — one per registered device
  • Platform — android, ios, or web
  • Device ID (optional)

Invalid tokens are cleaned up automatically. Notification delivery is suppressed when userActivity shows you’re actively using Amber on any device.
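
The suppression check could look something like the sketch below. The field names, the 30-second activity window, and the `shouldSuppressPush` helper are assumptions. The sketch also assumes the visibility flag means "app in foreground" on mobile platforms.

```typescript
// Hypothetical per-device activity row.
interface UserActivity {
  platform: "web" | "android" | "ios";
  lastActivityAt: number; // epoch milliseconds
  visible: boolean; // tab visible (web) or app in foreground (assumed for mobile)
}

const ACTIVE_WINDOW_MS = 30_000; // assumed threshold, not a documented value

// True when any device shows recent, visible activity, in which case
// a push notification would be redundant.
function shouldSuppressPush(devices: UserActivity[], now: number): boolean {
  return devices.some(
    (device) => device.visible && now - device.lastActivityAt <= ACTIVE_WINDOW_MS,
  );
}
```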


Analytics Aggregations

Two Convex modules compute derived analytics from the raw tables.

Per-User Stats (stats.ts)

Aggregated on demand for the dashboard:

  • Session trends (8-week lookback) — count, cost, token usage, error rate
  • Engagement metrics — messages per session, active days, streak, model distribution
  • Memory analytics — count by type and tier, recall ratings
  • Tool usage breakdown
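
As an example of the kind of aggregation involved, the 8-week session trend can be sketched as a bucketing pass over session rows. This is not the code in stats.ts: the `SessionRow` fields, the rolling (rather than calendar-aligned) weeks, and the `sessionTrends` name are assumptions.

```typescript
interface SessionRow {
  startedAt: number; // epoch milliseconds (assumed field name)
  costUsd: number;
  error?: string;
}

interface WeekBucket {
  weekStart: number;
  count: number;
  costUsd: number;
  errorRate: number; // failed sessions / all sessions in the bucket
}

const MS_PER_WEEK = 7 * 24 * 60 * 60 * 1000;

// Groups sessions into rolling weekly buckets, oldest first.
function sessionTrends(sessions: SessionRow[], now: number, weeks = 8): WeekBucket[] {
  const buckets = Array.from({ length: weeks }, (_, i) => ({
    weekStart: now - (weeks - i) * MS_PER_WEEK,
    count: 0,
    costUsd: 0,
    errors: 0,
  }));
  for (const session of sessions) {
    const age = now - session.startedAt;
    if (age < 0 || age >= weeks * MS_PER_WEEK) continue; // outside the lookback
    const bucket = buckets[weeks - 1 - Math.floor(age / MS_PER_WEEK)];
    if (!bucket) continue;
    bucket.count += 1;
    bucket.costUsd += session.costUsd;
    if (session.error !== undefined) bucket.errors += 1;
  }
  return buckets.map(({ errors, ...rest }) => ({
    ...rest,
    errorRate: rest.count === 0 ? 0 : errors / rest.count,
  }));
}
```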

Team-Level Analytics (usageAnalytics.ts)

Admin-only, sampled from recent activity:

  • Daily and weekly active user counts
  • Feature area adoption this week (Projects, Settings, Skills, Images, etc.)
  • Error hotspots by feature area
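
Daily and weekly active user counts reduce to counting distinct users inside a time window. The sketch below assumes a hypothetical `ActivitySample` row and an `activeUsers` helper. The real module samples recent activity and may compute this differently.

```typescript
// Hypothetical activity sample: one row per observed user action.
interface ActivitySample {
  userId: string;
  at: number; // epoch milliseconds
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Counts distinct users seen within the last `windowDays` days.
function activeUsers(samples: ActivitySample[], now: number, windowDays: number): number {
  const cutoff = now - windowDays * DAY_MS;
  const seen = new Set<string>();
  for (const sample of samples) {
    if (sample.at > cutoff && sample.at <= now) seen.add(sample.userId);
  }
  return seen.size;
}
```

Calling `activeUsers(samples, now, 1)` gives a daily count and `activeUsers(samples, now, 7)` a weekly one.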

Error Tracking

Sentry is configured on both the frontend (admin-ui) and backend (memory-mcp). It captures:

  • Unhandled exceptions with full stack traces
  • HTTP 5xx errors with request context
  • React error boundaries with component stack
  • Breadcrumbs (console logs, network requests leading up to errors)

Sentry is only active when SENTRY_DSN is set in the environment. It is active on production.


What We Are Not Capturing

For completeness:

  • No third-party product analytics — no PostHog, Mixpanel, Amplitude, or Segment
  • No session replay or click heatmaps
  • No page load timing or Core Web Vitals
  • No A/B test assignments
  • No outbound email open tracking

Retention Summary

Category | Retention
HTTP requests, skill events, proxy audit | 30 days
Push tokens, user presence, user activity | Current state only (overwritten)
Chat sessions and messages | Indefinite
Memories and feedback | Indefinite
Agent activity logs | Indefinite
SSH and database query logs | Indefinite
User profile and settings | Indefinite
Sentry errors | Sentry’s default (90 days on free tier)