Add Healthcheck API for Cassini #96

Cassini’s `ListenerManager` currently operates as an mTLS-enabled TCP server that accepts secure client connections and spawns listener actors for each session. Operators, orchestrators (e.g., the control plane), and CI harnesses have no way to verify that Cassini is healthy, responsive, or ready to accept new connections.

This issue proposes adding a **healthcheck API** for Cassini to expose readiness and liveness information.

---

### **Background**

Cassini is a long-running actor-based service. It doesn't expose HTTP endpoints; all interactions occur over a secured TCP layer. However, external orchestration systems (local controller, future distributed runners, or Kubernetes) need a way to:

* Verify that Cassini is up and listening.
* Confirm it can accept and process new client connections.
* Optionally query internal actor or system health (e.g., listener counts, backlog size).

A healthcheck mechanism is especially important as we move toward distributed test orchestration, where the control plane will need to determine service readiness programmatically before dispatching test plans.

---

### **Questions & Design Considerations**

#### 🧩 1. Should the healthcheck be REST/HTTP-based?

**Pros (HTTP/REST):**

* Standardized pattern (`GET /healthz`, `/readyz`) supported by most infra tooling.
* Works seamlessly with container health probes and service monitors.
* Easy to implement with lightweight HTTP server crates (`axum`, `hyper`, `tiny_http`).

**Cons:**

* Introduces a second protocol (HTTP alongside the existing mTLS TCP protocol).
* Adds slight overhead and may feel inconsistent with Cassini’s current architecture.

**Alternative: TCP-based healthcheck**

* Simpler but lower fidelity: for example, open a TCP connection to Cassini’s port and verify that the TLS handshake completes.
* Confirms the listener is running and certificates are valid.
* Does *not* confirm internal actor state (e.g., if `ListenerManager` has crashed internally but the socket is still bound).
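
To make the trade-off concrete, here is a minimal sketch of the connect-only part of such a probe, using only the standard library. `ProbeResult` and `tcp_probe` are hypothetical names for illustration. The TLS handshake step is omitted because it would need a TLS crate (for example `rustls`) and Cassini's client certificates.

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Result of a plain TCP reachability probe (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum ProbeResult {
    /// The port accepted a TCP connection.
    Reachable,
    /// The connection was refused, timed out, or otherwise failed.
    Unreachable(String),
}

/// Tries to open a TCP connection to `addr` within `timeout`.
/// This only shows that something is bound to the port. It does not
/// perform the TLS handshake or inspect any actor state.
fn tcp_probe(addr: SocketAddr, timeout: Duration) -> ProbeResult {
    match TcpStream::connect_timeout(&addr, timeout) {
        Ok(_stream) => ProbeResult::Reachable,
        Err(e) => ProbeResult::Unreachable(e.to_string()),
    }
}
```

A `Reachable` result here is exactly the low-fidelity signal described above: the socket is bound, but nothing is known about `ListenerManager` itself.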

**Recommendation:**
✅ Implement a **small auxiliary HTTP healthcheck endpoint** (listening on localhost or a management port).
This endpoint can internally query the `ListenerManager` actor to report:

* `status: "ok" | "degraded" | "error"`
* number of active listener actors
* timestamp of last accepted connection
* broker connectivity (optional future check)

This keeps the operational interface standard without modifying the existing TCP listener behavior.
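
One possible way to derive the `status` value and render the report is sketched below. The policy is an assumption, not a decision from this issue: `error` when the manager does not answer, `degraded` when it reports an error, `ok` otherwise. `Snapshot`, `derive_status`, and `render_json` are hypothetical names, and broker connectivity is left out.

```rust
use std::time::Instant;

/// Health levels matching the proposed `status` values.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Status {
    Ok,
    Degraded,
    Error,
}

impl Status {
    fn as_str(self) -> &'static str {
        match self {
            Status::Ok => "ok",
            Status::Degraded => "degraded",
            Status::Error => "error",
        }
    }
}

/// Hypothetical snapshot obtained from `ListenerManager`.
struct Snapshot {
    manager_responsive: bool,
    active_listeners: usize,
    last_connection: Option<Instant>,
    last_error: Option<String>,
}

/// Assumed policy: unresponsive manager => error, reported error => degraded.
fn derive_status(s: &Snapshot) -> Status {
    if !s.manager_responsive {
        Status::Error
    } else if s.last_error.is_some() {
        Status::Degraded
    } else {
        Status::Ok
    }
}

/// Renders the report as JSON by hand to avoid extra dependencies.
/// The last-connection time is reported as an age in whole seconds.
fn render_json(s: &Snapshot, now: Instant) -> String {
    let age = match s.last_connection {
        Some(t) => now.saturating_duration_since(t).as_secs().to_string(),
        None => "null".to_string(),
    };
    format!(
        "{{\"status\":\"{}\",\"active_listeners\":{},\"seconds_since_last_connection\":{}}}",
        derive_status(s).as_str(),
        s.active_listeners,
        age
    )
}
```

An idle instance with no connections yet still reports `ok` under this policy, since a lack of traffic is not a fault on its own.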

---

### **Proposed Implementation Plan**

1. **Create a `HealthState` struct** (stored in the supervisor or updated via message from `ListenerManager`):

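   ```rust
   use std::time::Instant;

   /// Cached health snapshot read by the healthcheck endpoint.
   struct HealthState {
       last_connection: Option<Instant>, // time of the last accepted connection
       active_listeners: usize,          // listener actors currently running
       last_error: Option<String>,       // most recent error, if any
   }
   ```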

2. **Expose an HTTP endpoint** (e.g., port `9090` or configurable):

   * `/healthz` → returns `200 OK` if the listener is active and responsive.
   * `/metrics` (optional) → returns system stats in JSON or Prometheus format.

3. **Implement a simple healthcheck actor**:

   * Periodically queries the `ListenerManager` for its internal state via an actor message.
   * Updates the `HealthState` cache.

4. **Update deployment configs (optional)** to include readiness/liveness probes hitting `localhost:9090/healthz`.
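
Steps 2 and 3 could fit together roughly as follows. This is a sketch under stated assumptions: standard-library channels stand in for Cassini's actor messaging, `ManagerMsg`, `poll_once`, and `handle` are hypothetical names, the `responsive` field is an addition to `HealthState`, and the HTTP layer is reduced to a path-to-response function that any server crate could call.

```rust
use std::sync::mpsc::{self, Sender};
use std::sync::{Arc, Mutex};
use std::time::{Duration, Instant};

/// Cached view of listener health, mirroring the struct in step 1,
/// plus a `responsive` flag (an assumption of this sketch).
#[derive(Clone, Default)]
struct HealthState {
    last_connection: Option<Instant>,
    active_listeners: usize,
    last_error: Option<String>,
    responsive: bool,
}

/// Hypothetical message the healthcheck actor sends to `ListenerManager`.
enum ManagerMsg {
    /// Ask for a snapshot; the reply goes back on the enclosed channel.
    GetHealth(Sender<HealthState>),
}

/// One polling step: ask the manager for its state, wait up to `timeout`
/// for the reply, then update the shared cache.
fn poll_once(manager: &Sender<ManagerMsg>, cache: &Arc<Mutex<HealthState>>, timeout: Duration) {
    let (reply_tx, reply_rx) = mpsc::channel();
    let fresh = match manager.send(ManagerMsg::GetHealth(reply_tx)) {
        Ok(()) => reply_rx.recv_timeout(timeout).ok(),
        Err(_) => None, // the manager's receiving end is gone
    };
    let mut cached = cache.lock().unwrap();
    match fresh {
        Some(state) => *cached = HealthState { responsive: true, ..state },
        None => {
            cached.responsive = false;
            cached.last_error = Some("ListenerManager did not reply".to_string());
        }
    }
}

/// Maps a request path to an HTTP status code and body using only the cache,
/// so serving a probe never blocks on the manager.
fn handle(path: &str, cache: &Arc<Mutex<HealthState>>) -> (u16, String) {
    let state = cache.lock().unwrap();
    match path {
        "/healthz" if state.responsive => {
            (200, format!("ok listeners={}", state.active_listeners))
        }
        "/healthz" => (503, "unhealthy".to_string()),
        _ => (404, "not found".to_string()),
    }
}
```

Reading from the cache keeps `/healthz` cheap and bounded in latency. The cost is that a reported state can be stale by up to one polling interval.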

---

### **Acceptance Criteria**

* [ ] Cassini exposes a healthcheck endpoint returning 200 when the listener is healthy.
* [ ] A failed TCP bind or an actor supervision crash results in a non-200 response.
* [ ] The polar harness controller can use this endpoint to block until Cassini is ready before starting test runs.
* [ ] `HealthCheckResponse` contains data such as uptime and existing topics (if any).
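
For the harness criterion, a blocking wait could look like the sketch below. It assumes a plain-HTTP `/healthz` endpoint as proposed above. `probe_healthz` and `wait_until_ready` are hypothetical names, and a real controller might prefer an HTTP client crate over hand-written requests.

```rust
use std::io::{Read, Write};
use std::net::{SocketAddr, TcpStream};
use std::time::{Duration, Instant};

/// Sends `GET /healthz` and returns true if the status line reports 200.
/// Simplification: assumes the status line arrives in the first read.
fn probe_healthz(addr: SocketAddr, timeout: Duration) -> bool {
    let mut stream = match TcpStream::connect_timeout(&addr, timeout) {
        Ok(s) => s,
        Err(_) => return false,
    };
    let _ = stream.set_read_timeout(Some(timeout));
    let request = "GET /healthz HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n";
    if stream.write_all(request.as_bytes()).is_err() {
        return false;
    }
    let mut buf = [0u8; 64];
    match stream.read(&mut buf) {
        Ok(n) => {
            let head = String::from_utf8_lossy(&buf[..n]);
            head.starts_with("HTTP/1.1 200") || head.starts_with("HTTP/1.0 200")
        }
        Err(_) => false,
    }
}

/// Polls the endpoint every `interval` until it reports healthy or
/// `deadline` has elapsed. Returns whether Cassini became ready in time.
fn wait_until_ready(addr: SocketAddr, deadline: Duration, interval: Duration) -> bool {
    let start = Instant::now();
    loop {
        if probe_healthz(addr, interval) {
            return true;
        }
        if start.elapsed() >= deadline {
            return false;
        }
        std::thread::sleep(interval);
    }
}
```

The controller would call `wait_until_ready` with the management address (for example `127.0.0.1:9090`) and abort the test run if it returns `false`.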

