Cloud Engineering — CollectHive

What Exists

The platform runs on GCP across three environments — local, staging, and production. The CI/CD pipeline is automated. The Memory MCP server, Convex backend, and dev container system are all running. It works.

The problem is that it was built to work, not built to be owned. Nobody on the current team has cloud engineering as their primary discipline. Decisions get made, infrastructure gets stood up, and then it runs until something breaks or costs spike unexpectedly.

What We’d Love Help With

Cost visibility and control. Right now, GCP costs are monitored reactively: we find out about a spike after it has happened. Someone with cloud engineering experience would set up proper alerting, identify waste, and make intentional decisions about what to run where.

Resilience. The three-environment setup works under normal conditions. Edge cases such as unexpected load, container failures, and Convex restarts are handled ad hoc. A cloud engineer would design for failure, not just recover from it.

Security posture. Secrets are managed via GCP Secret Manager. Networking and IAM are functional but not hardened. There’s room to tighten this significantly before the platform opens to more users.

Infrastructure decisions. As the platform grows — more users, more dev containers, more heartbeat jobs — there will be architectural decisions about scaling, container orchestration, and cost trade-offs. These need someone who thinks in this space, not someone Googling their way through it.

Documentation and runbooks. When something breaks at an inconvenient time, there should be a runbook. Right now there isn’t one.

If This Sounds Like You

The platform is genuinely interesting infrastructure — multi-tenant agent execution, shared memory across users, per-user heartbeat scheduling. If you think in this space and want to be part of something being built from the ground up, we’d love to have you involved. You’d have full visibility into how the system works and real input into how it evolves.