
Package B E2E Smoke Test Runbook

Owner: backend on-call (backup: devops-deployment-agent)

Related files:

  • Backend handler: backend/app/core/ocpp/v16/handlers/core.py::on_meter_values + on_stop_transaction
  • Projection module: backend/app/core/ocpp/v16/projection.py
  • Backfill: scripts/backfill_charger_measurements.py
  • Frontend: frontend/src/components/analysis/analysis-filters.tsx

Last updated: 2026-05-13 (Package B Step 7 - testing-qa-architect)


0. Why Does This Runbook Exist?

Package B keeps writing telemetry to ocpp_meter_samples (long-form) and additionally projects it into the device_measurements (wide-form) table. As a result:

  • /measurements/consumption and the energy_hourly/_daily/_monthly CAGGs cover charger data,
  • the frontend hierarchy Region -> Subregion -> Device (Bolge -> Alt Bolge -> Cihaz) lists chargers,
  • the existing Modbus/MQTT data flow is NOT AFFECTED.

Risks:

  • Long-form persistence can succeed while the wide-form projection fails silently (the handler wraps the projection in its own try/except; this is intentional, but the metric counters must be monitored).
  • The backfill can fail on TimescaleDB compressed chunks when it writes data older than 7 days.
  • The frontend subregion_id mapping fix can regress.

The smoke test verifies these three layers separately and starts with a canary (1-2 chargers).
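
The first risk above (long-form succeeds, projection fails quietly) can be sketched as follows. This is only an illustration of the failure-isolation idea, not the actual handler: handle_meter_values, persist_long_form, project_wide_form and the in-memory projection_outcomes counter are hypothetical names; only the status values success, error and no_device_id come from this runbook.

```python
from collections import Counter
from typing import Callable

# Hypothetical in-memory stand-in for the Prometheus counter.
projection_outcomes: Counter = Counter()


def handle_meter_values(
    payload: dict,
    persist_long_form: Callable[[dict], None],
    project_wide_form: Callable[[dict], None],
) -> None:
    """Persist long-form first; a projection failure must not undo it."""
    # Long-form is the source of truth, so its errors are allowed to propagate.
    persist_long_form(payload)

    if payload.get("device_id") is None:
        projection_outcomes["no_device_id"] += 1
        return

    try:
        project_wide_form(payload)
    except Exception:
        # Swallowed on purpose, but counted so the failure is not silent.
        projection_outcomes["error"] += 1
    else:
        projection_outcomes["success"] += 1
```

Because the exception is swallowed, the counter is the only signal that the projection is failing, which is why Scenario 2 and the thresholds in section 5 watch it.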


1. Prerequisites (final check before deploy)

  • PR CI is green (lint + typecheck + unit + integration + frontend tests)
  • The table in the quality-gate-report Step 7 output has passed
  • DB migration plan: NO NEW MIGRATION in this package (code changes only)
  • The backfill CLI script scripts/backfill_charger_measurements.py exists, and --dry-run has been tested locally (if available)
  • Target site: enerji.kepmark.com (Zeus 2.0 single-server staging+prod)
  • Approved active charger ID (canary): ZEUS-Q95KREQY7U1FAP4H (device_id: c29212e2-4379-4daa-afd7-b712d50d8750)

2. Smoke Scenarios (8 steps, ~15 min)

Each scenario lists its EXPECTED output and its FAIL semantics.

Scenario 1: Backend health

curl -fsSL https://enerji.kepmark.com/health
# EXPECTED: HTTP 200 + {"status":"ok"}

# Charger device endpoint (is subregion_id populated?)
TOKEN="<JWT>"
curl -fsSL -H "Authorization: Bearer $TOKEN" \
  https://enerji.kepmark.com/api/devices/c29212e2-4379-4daa-afd7-b712d50d8750 \
  | jq '.subregion_id, .device_source, .name'
# EXPECTED: subregion_id is a populated UUID (NOT NULL), device_source="ocpp"

FAIL semantics:

  • /health returns 5xx -> the deploy failed; roll back.
  • subregion_id: null -> the DeviceResponse LEFT JOIN (Step 5, backend) is broken; the charger will not appear in the frontend dropdown. A HOTFIX is required.
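
The device check above can also be expressed as a small pure function, which is easier to reuse in a script than a jq one-liner. This is a minimal sketch: check_device_payload is a hypothetical helper, and it assumes the endpoint returns a flat JSON object with the subregion_id and device_source fields queried above.

```python
import json


def check_device_payload(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the check passed."""
    device = json.loads(raw)
    problems = []
    if device.get("subregion_id") is None:
        problems.append("subregion_id is null")
    if device.get("device_source") != "ocpp":
        problems.append(
            f"device_source is {device.get('device_source')!r}, expected 'ocpp'"
        )
    return problems
```

Feed it the body returned by the curl call; a non-empty result maps to the HOTFIX branch of the FAIL semantics.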

Scenario 2: Prometheus metric emission

From inside the backend container or from the scrape endpoint:

# The counter should increase as live MeterValues arrive
curl -fsSL http://<backend-host>:8000/metrics 2>/dev/null \
  | grep -E 'ocpp_messages_total\{[^}]*action="MeterValues_projection"' \
  | head -20
# EXPECTED: within 4-5 minutes of observation the "success" variant is > 0 and
# "no_device_id" stays minimal (only for chargers whose device_id is null).

FAIL semantics:

  • The counter has not increased AT ALL -> the handler import path is broken, or the call path to _persist_meter_values_projection is broken. Grep the backend logs:
    docker compose logs backend --since 5m 2>&1 | grep ocpp_projection
  • Only no_device_id is increasing -> a MIGRATION ran out of order in production, or charger.device_id was left NULL. Check with the Step 4 audit.
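
To compare two snapshots of the counter rather than eyeball grep output, the /metrics text can be parsed per status. The sketch below is an illustration only: projection_counts is a hypothetical helper, and it assumes the counter carries action and status labels as in the PromQL expressions later in this runbook, and that label values contain no closing brace.

```python
import re

# One sample line: metric name, {labels}, value.
_LINE = re.compile(r'^ocpp_messages_total\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)')
_LABEL = re.compile(r'(\w+)="([^"]*)"')


def projection_counts(metrics_text: str) -> dict[str, float]:
    """Sum MeterValues_projection counter values per status label."""
    counts: dict[str, float] = {}
    for line in metrics_text.splitlines():
        match = _LINE.match(line)
        if not match:
            continue  # comments, other metrics, unlabeled samples
        labels = dict(_LABEL.findall(match.group("labels")))
        if labels.get("action") != "MeterValues_projection":
            continue
        status = labels.get("status", "")
        counts[status] = counts.get(status, 0.0) + float(match.group("value"))
    return counts
```

Taking one snapshot, waiting 4-5 minutes, and taking another gives the per-status increase the EXPECTED comment refers to.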

Scenario 3: DB row counts (before/after backfill)

-- 1) Baseline BEFORE the backfill (canary charger)
SELECT count(*) AS dm_count
FROM device_measurements
WHERE device_id='c29212e2-4379-4daa-afd7-b712d50d8750';
-- EXPECTED baseline: 0 (no wide-form writes yet)

-- 2) Long-form count (historical data)
SELECT count(*) AS lf_count
FROM ocpp_meter_samples
WHERE charger_id=(SELECT id FROM ocpp_chargers
                  WHERE device_id='c29212e2-4379-4daa-afd7-b712d50d8750');
-- EXPECTED: ~4131 (record count confirmed on site)

Backfill --dry-run:

docker compose exec backend python -m scripts.backfill_charger_measurements \
  --charger-id <CHARGER_UUID> \
  --start-time 2026-04-15 \
  --dry-run
# EXPECTED: stdout shows projected_rows > 0 and pretty-prints the first 10 groups.

Backfill APPLY (canary charger):

docker compose exec backend python -m scripts.backfill_charger_measurements \
  --charger-id <CHARGER_UUID> \
  --start-time 2026-04-15 \
  --batch-size 500
# EXPECTED: stats report projected_rows >= 4000, integrity_errors=0.

Then:

SELECT count(*) FROM device_measurements
WHERE device_id='c29212e2-4379-4daa-afd7-b712d50d8750';
-- EXPECTED: >= 4000 after the backfill.

FAIL semantics:

  • integrity_errors > 0 -> compressed chunk or PK conflict. Check the logs:
    docker compose logs backend --since 10m | grep backfill_integrity_error
  • Writes to data older than 7 days show up in the compression_warning log; a manual decompress_chunk() may be needed.
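
Before running APPLY it can help to know whether the chosen --start-time reaches into data old enough to be compressed. The sketch below is a rough pre-check only: touches_compressed_chunks is a hypothetical helper, and the 7-day default is an assumption taken from the "older than 7 days" wording above, not from the actual compression policy.

```python
from datetime import datetime, timedelta, timezone


def touches_compressed_chunks(
    start_time: datetime,
    now: datetime,
    compress_after: timedelta = timedelta(days=7),  # assumed policy age
) -> bool:
    """True if the backfill window starts in data old enough to be compressed."""
    return now - start_time > compress_after


# The canary backfill in this runbook starts on 2026-04-15.
start = datetime(2026, 4, 15, tzinfo=timezone.utc)
deploy_day = datetime(2026, 5, 13, tzinfo=timezone.utc)
if touches_compressed_chunks(start, deploy_day):
    print("backfill reaches into compressed chunks; watch compression_warning")
```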

Scenario 4: Consumption endpoint

curl -fsSL -H "Authorization: Bearer $TOKEN" \
  "https://enerji.kepmark.com/api/measurements/devices/c29212e2-4379-4daa-afd7-b712d50d8750/aggregated?interval=hourly&start=2026-05-12T00:00:00Z&end=2026-05-13T00:00:00Z" \
  | jq '.data | length'
# EXPECTED: > 0 (hourly aggregation records)

FAIL semantics:

  • 200 + {"data": []} -> the backfill has not run OR the CAGG has not been refreshed. The CAGG refresh commands are in Scenario 6.
  • 5xx -> the measurements endpoint is broken; inspect the backend log.
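
The two FAIL branches above can be told apart mechanically from the status code and body. A minimal sketch, assuming the response body is a JSON object with a data array as in the jq filter; classify_aggregated_response and its return labels are hypothetical.

```python
import json


def classify_aggregated_response(status_code: int, body: str) -> str:
    """Map an /aggregated response onto the outcomes described above."""
    if status_code >= 500:
        return "backend_error"       # inspect the backend log
    if status_code != 200:
        return "unexpected_status"   # e.g. expired token; not covered above
    data = json.loads(body).get("data") or []
    # An empty list means the backfill or the CAGG refresh is still pending.
    return "ok" if len(data) > 0 else "empty"
```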

Scenario 5: Frontend visual smoke

  1. Open https://enerji.kepmark.com/measurements/consumption (as an authorized user)
  2. Filter:
    • Bolge (Region) dropdown -> Ankara
    • Alt Bolge (Subregion) dropdown -> Ankaref
    • Cihaz (Device) dropdown -> ZEUS-Q95KREQY7U1FAP4H MUST BE VISIBLE
  3. Select the device, set the date range to the last 24 hours, and click "Veri Getir" (Fetch Data)
  4. The Energy / Power / Voltage charts must render (data points > 0)

FAIL semantics:

  • The device is MISSING from the dropdown -> this is the subregion_id null case from Scenario 1; the frontend regression test (analysis-filters.test.tsx) fails locally and should have been caught in CI.
  • The device is listed but the charts are empty -> recheck the backfill/CAGG state from Scenarios 3 and 6.

Scenario 6: After CAGG refresh

After the backfill finishes, DevOps runs the following in psql:

CALL refresh_continuous_aggregate(
'energy_hourly',
'2026-04-15 00:00:00+00',
'2026-05-13 23:59:59+00'
);
CALL refresh_continuous_aggregate(
'energy_daily',
'2026-04-15 00:00:00+00',
'2026-05-13 23:59:59+00'
);
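-- Monthly window widened to whole-month boundaries (assuming calendar-month
-- buckets): a refresh window must cover at least one complete bucket.
CALL refresh_continuous_aggregate(
'energy_monthly',
'2026-04-01 00:00:00+00',
'2026-06-01 00:00:00+00'
);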

Verification:

SELECT count(*) FROM energy_hourly
WHERE device_id='c29212e2-4379-4daa-afd7-b712d50d8750';
-- EXPECTED: > 0

FAIL semantics:

  • refresh_continuous_aggregate fails -> the CAGG WatermarkPolicy may be active; in that case no manual refresh is needed and the aggregate fills automatically within 1-2 hours. Inspect the CAGG metadata:
    SELECT * FROM timescaledb_information.continuous_aggregates
    WHERE view_name IN ('energy_hourly','energy_daily','energy_monthly');
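
One common cause of a refresh error is a window that does not span a full bucket, which matters most for the monthly aggregate. The sketch below widens a window outward to calendar-month boundaries; month_aligned_window is a hypothetical helper, and it assumes energy_monthly uses calendar-month buckets in UTC, which this runbook does not state.

```python
from datetime import datetime, timezone


def month_aligned_window(start: datetime, end: datetime) -> tuple[datetime, datetime]:
    """Widen [start, end) outward to calendar-month boundaries."""
    lo = start.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    floor_end = end.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    if floor_end == end:
        # Already on a month boundary; nothing to widen.
        return lo, end
    if end.month == 12:
        hi = floor_end.replace(year=end.year + 1, month=1)
    else:
        hi = floor_end.replace(month=end.month + 1)
    return lo, hi


lo, hi = month_aligned_window(
    datetime(2026, 4, 15, tzinfo=timezone.utc),
    datetime(2026, 5, 13, 23, 59, 59, tzinfo=timezone.utc),
)
print(lo.isoformat(), hi.isoformat())
```

For the backfill range used in this runbook the widened window runs from 2026-04-01 to 2026-06-01.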

Scenario 7: Long-form vs wide-form divergence

SELECT
  (SELECT count(*) FROM ocpp_meter_samples
   WHERE charger_id=(SELECT id FROM ocpp_chargers
                     WHERE device_id='c29212e2-4379-4daa-afd7-b712d50d8750'))
    AS long_count,
  (SELECT count(*) FROM device_measurements
   WHERE device_id='c29212e2-4379-4daa-afd7-b712d50d8750')
    AS wide_count;

EXPECTED:

  • long_count (long-form) > wide_count (wide-form) is likely: long-form stores one row per measurand×phase, while wide-form stores a SINGLE row per timestamp. Typical ratio: long is ~3-7x wide.
  • wide_count >= 4000 on the canary (and therefore > 0).

FAIL semantics:

  • wide_count == 0 -> the backfill has not run.
  • wide_count > long_count -> extra rows were written (unexpected); either timestamps are duplicated or the ON CONFLICT logic is broken.
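
Why long_count normally exceeds wide_count is easiest to see with a toy pivot. This is a sketch of the general long-to-wide idea, not the code in projection.py: pivot_long_to_wide, the sample dict keys and the measurand_phase column naming are all hypothetical.

```python
from collections import defaultdict


def pivot_long_to_wide(samples: list[dict]) -> dict:
    """Group long-form samples by timestamp into one wide row each."""
    wide: dict = defaultdict(dict)
    for s in samples:
        # One column per measurand, or per measurand+phase when a phase is set.
        if s.get("phase") is None:
            column = s["measurand"]
        else:
            column = f'{s["measurand"]}_{s["phase"]}'
        wide[s["time"]][column] = s["value"]
    return dict(wide)


long_rows = [
    {"time": "t1", "measurand": "Voltage", "phase": "L1", "value": 230.1},
    {"time": "t1", "measurand": "Voltage", "phase": "L2", "value": 229.8},
    {"time": "t1", "measurand": "Energy.Active.Import.Register", "phase": None, "value": 1200},
    {"time": "t2", "measurand": "Voltage", "phase": "L1", "value": 230.4},
    {"time": "t2", "measurand": "Voltage", "phase": "L2", "value": 229.9},
    {"time": "t2", "measurand": "Energy.Active.Import.Register", "phase": None, "value": 1210},
]
wide_rows = pivot_long_to_wide(long_rows)
print(len(long_rows), len(wide_rows))
```

Six long-form rows collapse into two wide rows here, a 3x ratio. A pivot like this can never produce more rows than it consumes, which is why wide_count > long_count points at duplicate writes.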

Scenario 8: Live MeterValues (real time)

While the charger is charging (verify with /devices/.../charge-sessions/active):

-- Wide-form row count over the last 10 minutes
SELECT count(*), MAX(time) AS last_time
FROM device_measurements
WHERE device_id='c29212e2-4379-4daa-afd7-b712d50d8750'
  AND time > NOW() - INTERVAL '10 minutes';

EXPECTED:

  • If the charger is sending MeterValues: count > 0 and last_time is within the last 30 s to 2 min.
  • Depends on configuration: meter_sample_interval_seconds=30 -> ~2 rows per minute.

FAIL semantics:

  • The charger is online but NO new rows arrive -> the handler's MeterValues_projection counter shows error or no_device_id. Backend log:
    docker compose logs backend --since 5m 2>&1 | grep ocpp_projection
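
The "30 s to 2 min" expectation can be turned into an explicit freshness check on last_time. A minimal sketch: is_stream_fresh is a hypothetical helper, and the tolerance of four sample intervals is an assumption chosen so that a 30-second interval yields the 2-minute upper bound quoted above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_stream_fresh(
    last_time: Optional[datetime],
    now: datetime,
    sample_interval_s: int = 30,   # meter_sample_interval_seconds
    tolerance_intervals: int = 4,  # assumed slack: 4 x 30 s = 2 min
) -> bool:
    """True if the newest wide-form row is recent enough to count as live."""
    if last_time is None:
        return False  # MAX(time) is NULL when no rows matched
    max_age = timedelta(seconds=sample_interval_s * tolerance_intervals)
    return now - last_time <= max_age
```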

3. Automation: scripts/smoke_b_paketi.sh

A shell script for a semi-automated smoke test: <REPO_ROOT>/scripts/smoke_b_paketi.sh.

After the Step 10 deploy, DevOps runs bash scripts/smoke_b_paketi.sh. The script does the following:

  1. /health ping
  2. subregion_id null check via /api/devices/<canary_uuid>
  3. Prometheus MeterValues_projection counter snapshot
  4. DB counts (long-form vs wide-form)
  5. Wide-form row flow over the last 10 minutes

The manual steps (frontend visual check + backfill) are outside the script; the operator follows them step by step.


4. Rollback Decision

If the smoke test FAILS:

| Failed scenario | Decision |
|---|---|
| 1 (subregion null) | Hotfix the DeviceResponse LEFT JOIN (small PR by devops). The deploy is NOT rolled back. |
| 2 (counter 0) + 8 (no live writes) | Backend handler regression. Roll back (PR revert). |
| 3 (backfill integrity) | NO rollback; stop the backfill and decompress manually. |
| 4 (consumption empty) | Resolve with a CAGG refresh; NO rollback. |
| 5 (frontend dropdown) | Frontend regression; the follow-up action can be combined with scenario 1. |
| 6 (CAGG fail) | Refresh manually, one aggregate at a time; NO rollback. |
| 7 (wide > long) | Unexpected; open an incident and CONSIDER a rollback. |
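
The decision table can be encoded so that a script reports one overall verdict for a set of failed scenarios. This is a sketch under stated assumptions: decide and its return labels are hypothetical, and it reads the "2 + 8" row as "either scenario alone triggers a rollback", which is an interpretation of the table rather than something it states.

```python
ROLLBACK_SCENARIOS = frozenset({2, 8})  # backend handler regression
INCIDENT_SCENARIOS = frozenset({7})     # wide > long: rollback to be considered


def decide(failed: set[int]) -> str:
    """Return the most severe action implied by the failed scenarios."""
    if failed & ROLLBACK_SCENARIOS:
        return "rollback"
    if failed & INCIDENT_SCENARIOS:
        return "open_incident"
    if failed:
        return "fix_forward"  # hotfix, CAGG refresh, manual decompress
    return "proceed"
```

For example, decide({1, 4}) stays on the fix-forward path, while decide({1, 2}) escalates to a rollback.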

5. Metric Thresholds (after the 24h canary)

At the end of the 24-48h canary observation period, the following thresholds must be green:

# sum() collapses the status label so that numerator and denominator match.

# Projection success rate (24h window): green when > 0.99
sum(rate(ocpp_messages_total{action="MeterValues_projection", status="success"}[24h]))
/ sum(rate(ocpp_messages_total{action="MeterValues_projection"}[24h])) > 0.99

# Unknown measurand rate (Beny typo): alert above 0.05, green when < 0.05
sum(rate(ocpp_messages_total{action="MeterValues_projection", status="unknown_measurand"}[1h]))
/ sum(rate(ocpp_messages_total{action="MeterValues_projection"}[1h])) < 0.05

# no_device_id rate: alert when > 0 (this expression is the alert condition)
sum(rate(ocpp_messages_total{action="MeterValues_projection", status="no_device_id"}[15m])) > 0

These thresholds should be added to prometheus/rules/ocpp_meter_values.yml (output of monitoring-observability-architect Step 9).
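
The same thresholds can be checked offline from per-status counter increases, for instance the difference between two /metrics snapshots. A minimal sketch: evaluate_thresholds is a hypothetical helper, it applies all three checks to a single window (whereas the rules above use 24h, 1h and 15m), and it treats a window with no projection activity as not green.

```python
def evaluate_thresholds(increase_by_status: dict[str, float]) -> dict[str, bool]:
    """Return green (True) or red (False) for each threshold."""
    total = sum(increase_by_status.values())
    if total == 0:
        # No projection activity at all cannot be called healthy.
        return {"success_rate": False, "unknown_measurand": False, "no_device_id": False}
    success = increase_by_status.get("success", 0.0)
    unknown = increase_by_status.get("unknown_measurand", 0.0)
    no_device = increase_by_status.get("no_device_id", 0.0)
    return {
        "success_rate": success / total > 0.99,
        "unknown_measurand": unknown / total < 0.05,
        "no_device_id": no_device == 0,
    }
```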


6. Authorization + Site Notes

  • If on-site help is needed: start and stop a transaction from the charger's HMI
    • Repeat Scenario 4 within 1 minute.
  • Because there is a single enerji.kepmark.com server (staging=prod), the canary phase means on-site observation with 1-2 chargers. There is NO phased rollout, but the metric thresholds must be verified after 24-48h (CLAUDE.md pitfall #7).