Experiment Analyst Agent

The Experiment Analyst Agent analyzes A/B tests and experiments, providing statistical insights and ship/no-ship recommendations.

What It Does

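Calculates significance - Tests whether differences in primary metrics are statistically significant at the configured confidence level
Monitors guardrails - Flags regressions in guardrail metrics such as error rate and p99 latency
Detects SRM - Checks for sample ratio mismatch, where the observed traffic split deviates from the intended allocation
Segments results - Breaks down results by cohort, such as mobile, desktop, or new users
Recommends decisions - Gives ship, iterate, or kill guidance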
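
As an illustration of the first and third checks, here is a minimal sketch using only the Python standard library. It is not the agent's actual implementation: the function names `srm_p_value` and `two_proportion_p_value` are hypothetical, the SRM check is assumed to be a chi-square goodness-of-fit test with one degree of freedom, and the significance check is assumed to be a two-sided pooled z-test on conversion counts.

```python
import math
from statistics import NormalDist


def srm_p_value(control_n: int, treatment_n: int,
                expected_control_share: float = 0.5) -> float:
    """Chi-square goodness-of-fit p-value (1 degree of freedom) for the traffic split."""
    total = control_n + treatment_n
    expected_control = total * expected_control_share
    expected_treatment = total - expected_control
    chi2 = ((control_n - expected_control) ** 2 / expected_control
            + (treatment_n - expected_treatment) ** 2 / expected_treatment)
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def two_proportion_p_value(conv_c: int, n_c: int, conv_t: int, n_t: int) -> float:
    """Two-sided p-value of a pooled z-test for a difference in conversion rates."""
    p_c, p_t = conv_c / n_c, conv_t / n_t
    pooled = (conv_c + conv_t) / (n_c + n_t)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_c + 1 / n_t))
    z = (p_t - p_c) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))
```

An exactly even split gives a chi-square statistic of 0 and therefore an SRM p-value of 1. A very small SRM p-value suggests the assignment mechanism is broken, in which case the metric comparison should not be trusted; the cutoff to use is a team convention and is not specified by this document.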

Experimentation Principles

| Principle | Description |
|-----------|-------------|
| Define success upfront | Set metrics before launch |
| Set guardrails | Protect against regressions |
| Calculate sample size | Ensure sufficient power |
| Watch for novelty | Early results may be skewed |
| Document learnings | Record insights for future |
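
The "Calculate sample size" principle can be sketched with the standard normal-approximation formula for comparing two proportions. This is an illustrative sketch, not the agent's method: `sample_size_per_variant` is a hypothetical helper, the minimum detectable effect is assumed to be relative to the baseline rate, and the default power of 0.8 is an assumption.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline_rate: float, relative_mde: float,
                            confidence_level: float = 0.95,
                            power: float = 0.8) -> int:
    """Users needed per variant to detect a relative lift in a conversion rate."""
    alpha = 1 - confidence_level
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)        # assumed: MDE is relative
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)
```

Because the required sample grows with the inverse square of the effect size, halving the minimum detectable effect roughly quadruples the number of users needed, which is why the effect size should be fixed before launch.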

Configuration

agents:
  - name: experiment-analyst
    template: experiment-analyst
    triggers:
      schedule:
        - cron: "0 9 * * *"  # Daily analysis at 09:00
    config:
      # Experimentation platform
      platform: statsig  # or launchdarkly, optimizely

      # Statistical settings
      statistics:
        confidence_level: 0.95
        minimum_detectable_effect: 0.02

      # Guardrail metrics (must not regress)
      guardrails:
        - name: error_rate
          threshold: 0.01
          direction: lower_is_better
        - name: latency_p99
          threshold: 500
          direction: lower_is_better

      # Primary metrics
      primary_metrics:
        - conversion_rate
        - revenue_per_user
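
To make the guardrail settings concrete, here is a minimal sketch of how such entries might be evaluated. It assumes that `threshold` is an absolute limit on the treatment value, which the configuration itself does not state. The function `check_guardrails` and the `higher_is_better` direction are hypothetical; only `lower_is_better` appears in the configuration above.

```python
# Guardrail entries mirroring the configuration above, as plain dictionaries.
GUARDRAILS = [
    {"name": "error_rate", "threshold": 0.01, "direction": "lower_is_better"},
    {"name": "latency_p99", "threshold": 500, "direction": "lower_is_better"},
]


def check_guardrails(guardrails: list, treatment_values: dict) -> dict:
    """Map each guardrail name to "ok", "violated", or "missing"."""
    results = {}
    for guardrail in guardrails:
        name = guardrail["name"]
        value = treatment_values.get(name)
        if value is None:
            results[name] = "missing"
        elif guardrail["direction"] == "lower_is_better":
            # Assumed semantics: the threshold is an absolute ceiling.
            results[name] = "ok" if value <= guardrail["threshold"] else "violated"
        elif guardrail["direction"] == "higher_is_better":
            # Hypothetical counterpart: the threshold is an absolute floor.
            results[name] = "ok" if value >= guardrail["threshold"] else "violated"
        else:
            raise ValueError(f"unknown direction: {guardrail['direction']}")
    return results
```

Under these assumptions, `check_guardrails(GUARDRAILS, {"error_rate": 0.007, "latency_p99": 355})` returns `{"error_rate": "ok", "latency_p99": "ok"}`.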

Experiment Report Example

## Experiment Analysis: New Checkout Flow

### Summary
- **Status:** Running (Day 14 of 21)
- **Sample Size:** 125,430 users (62,715 per variant)
- **Power:** 92% (sufficient)

### Primary Metrics

| Metric | Control | Treatment | Lift | p-value | Significant |
|--------|---------|-----------|------|---------|-------------|
| Conversion Rate | 3.2% | 3.8% | +18.7% | 0.003 | Yes |
| Revenue/User | $4.52 | $4.89 | +8.2% | 0.021 | Yes |

### Guardrail Metrics

| Metric | Control | Treatment | Status |
|--------|---------|-----------|--------|
| Error Rate | 0.8% | 0.7% | Healthy |
| Latency p99 | 340ms | 355ms | Within bounds |

### Segment Analysis

| Segment | Conversion Lift | Significant |
|---------|-----------------|-------------|
| Mobile | +22.1% | Yes |
| Desktop | +12.3% | Marginal |
| New Users | +25.8% | Yes |

### Recommendation
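**SHIP** - Both primary metrics show statistically significant lifts (conversion rate +18.7%, revenue per user +8.2%) with no guardrail violations. The mobile (+22.1%) and new user (+25.8%) segments show the largest conversion lifts; the desktop lift (+12.3%) is only marginally significant.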
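
A recommendation like the one in this report could come from a simple decision rule. The sketch below is one possible rule, not the agent's actual logic: `MetricResult` and `recommend` are hypothetical names, and the rule itself is an assumption (ship when every primary metric shows a significant positive lift and no guardrail is violated, kill on a guardrail violation or a significant negative lift, iterate otherwise).

```python
from dataclasses import dataclass


@dataclass
class MetricResult:
    name: str
    lift: float     # relative lift, e.g. 0.187 for +18.7%
    p_value: float


def recommend(primary: list, guardrails_ok: bool, alpha: float = 0.05) -> str:
    """Return "SHIP", "ITERATE", or "KILL" under the assumed rule described above."""
    significant = [m for m in primary if m.p_value < alpha]
    if not guardrails_ok or any(m.lift < 0 for m in significant):
        return "KILL"
    if primary and len(significant) == len(primary):
        # Every primary metric is significant, and none of them is negative.
        return "SHIP"
    return "ITERATE"
```

Here `alpha` corresponds to one minus the configured `confidence_level` (0.05 for 0.95). Applied to the primary metrics in the report above, with both guardrails passing, this rule returns "SHIP".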