Package B (B Paketi) E2E Smoke Test Runbook
Owner: backend on-call (backup: devops-deployment-agent)
Related files:
- Backend handler: backend/app/core/ocpp/v16/handlers/core.py::on_meter_values + on_stop_transaction
- Projection module: backend/app/core/ocpp/v16/projection.py
- Backfill: scripts/backfill_charger_measurements.py
- Frontend: frontend/src/components/analysis/analysis-filters.tsx
Last updated: 2026-05-13 (Package B Step 7 - testing-qa-architect)
0. Why Does This Runbook Exist?
Package B additionally projects ocpp_meter_samples (long-form) telemetry into the device_measurements (wide-form) table. As a result:
- /measurements/consumption and the energy_hourly/_daily/_monthly CAGGs cover charger data,
- The frontend hierarchy (Region -> Subregion -> Device) lists chargers,
- The existing Modbus/MQTT data flow is NOT AFFECTED.
Risks:
- The long-form persist can succeed while the wide-form projection fails silently (the handler uses a separate try/except; this is intentional, but the metric counters must be monitored).
- The backfill can fail on TimescaleDB compressed chunks for data older than 7 days.
- The frontend subregion_id mapping fix can regress.
The smoke test verifies these three layers separately and starts with a canary approach (1-2 chargers).
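The long-form to wide-form projection described above can be pictured with a minimal sketch. It is not the code in projection.py: the tuple layout of the samples and the generated column names (for example voltage_l1) are assumptions made for illustration only.

```python
from collections import defaultdict
from datetime import datetime, timezone


def project_long_to_wide(samples):
    """Pivot long-form samples into one wide-form dict per timestamp.

    samples: iterable of (timestamp, measurand, phase, value) tuples.
    The tuple layout and the column naming are hypothetical.
    """
    rows = defaultdict(dict)
    for ts, measurand, phase, value in samples:
        # "Power.Active.Import" -> "power_active_import", "Voltage" + "L1" -> "voltage_l1"
        column = measurand.lower().replace(".", "_")
        if phase:
            column = f"{column}_{phase.lower()}"
        rows[ts][column] = value
    # One output row per distinct timestamp, ordered by time.
    return [{"time": ts, **rows[ts]} for ts in sorted(rows)]


t0 = datetime(2026, 5, 12, 10, 0, tzinfo=timezone.utc)
samples = [
    (t0, "Voltage", "L1", 229.8),
    (t0, "Voltage", "L2", 230.4),
    (t0, "Power.Active.Import", None, 7200.0),
]
wide = project_long_to_wide(samples)
print(len(samples), "long rows ->", len(wide), "wide row(s)")
```

Several long-form rows that share a timestamp collapse into a single wide-form row, which is why the long-form count is normally at least as large as the wide-form count.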
1. Prerequisites (final check before deploy)
- PR CI is green (lint + typecheck + unit + integration + frontend tests)
- The table in the quality-gate-report Step 7 output has passed
- DB migration plan: NO NEW MIGRATION in this package (code changes only)
- The backfill CLI script scripts/backfill_charger_measurements.py exists + --dry-run has been tested locally (if available)
- Target site: enerji.kepmark.com (Zeus 2.0 single-server staging+prod)
- Approved active charger ID (canary): ZEUS-Q95KREQY7U1FAP4H (device_id: c29212e2-4379-4daa-afd7-b712d50d8750)
2. Smoke Scenarios (8 steps, ~15 min)
Each scenario comes with its EXPECTED output and its FAIL semantics.
Scenario 1: Backend health
curl -fsSL https://enerji.kepmark.com/health
# EXPECTED: HTTP 200 + {"status":"ok"}
# Charger device endpoint (is subregion_id populated?)
TOKEN="<JWT>"
curl -fsSL -H "Authorization: Bearer $TOKEN" \
https://enerji.kepmark.com/api/devices/c29212e2-4379-4daa-afd7-b712d50d8750 \
| jq '.subregion_id, .device_source, .name'
# EXPECTED: subregion_id is a populated UUID (NOT NULL), device_source="ocpp"
FAIL semantics:
- /health 5xx -> deploy failed, rollback.
- subregion_id: null -> the Step 5 (backend) DeviceResponse LEFT JOIN is broken; the charger does not reach the frontend dropdown. HOTFIX required.
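For an automated run, the device check can be expressed as a small function over the decoded JSON response. This is a sketch only: the field names come from the jq filter above, while the function name and the sample payloads are hypothetical.

```python
def check_device_payload(payload):
    """Return a list of failure messages for the device endpoint check."""
    failures = []
    if payload.get("subregion_id") is None:
        failures.append("subregion_id is null: charger will be missing from the dropdown")
    if payload.get("device_source") != "ocpp":
        failures.append(
            f"device_source is {payload.get('device_source')!r}, expected 'ocpp'"
        )
    return failures


# Hypothetical payloads, shaped after the fields the runbook inspects.
healthy = {
    "subregion_id": "11111111-2222-3333-4444-555555555555",
    "device_source": "ocpp",
    "name": "ZEUS-Q95KREQY7U1FAP4H",
}
broken = {"subregion_id": None, "device_source": "ocpp", "name": "ZEUS-Q95KREQY7U1FAP4H"}

print(check_device_payload(healthy))
print(check_device_payload(broken))
```

An empty list means the check passed; any entry maps to the hotfix path described in the FAIL semantics.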
Scenario 2: Prometheus metric emit
From inside the backend container or from the scrape endpoint:
# The counter should increase as live MeterValues arrive
curl -fsSL http://<backend-host>:8000/metrics 2>/dev/null \
| grep -E 'ocpp_messages_total\{[^}]*action="MeterValues_projection"' \
| head -20
# EXPECTED: within 4-5 minutes of observation the "success" variant is > 0,
# "no_device_id" is minimal (only if chargers with a null device_id exist).
FAIL semantics:
- Counter has NOT increased at all -> the handler import path is broken or the _persist_meter_values_projection call path is broken. Grep the backend logs: docker compose logs backend --since 5m 2>&1 | grep ocpp_projection
- Only no_device_id is increasing -> a MIGRATION ran out of order in production, or charger.device_id was left NULL. Check with the Step 4 audit.
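Because the expectation is about the counter increasing, it helps to compare two snapshots rather than read one. The sketch below parses the Prometheus text format for the projection counter. The action and status label names are the ones used in this runbook; the sample text and the helper name are made up for illustration.

```python
import re

LINE_RE = re.compile(r'^ocpp_messages_total\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def projection_counts(metrics_text):
    """Sum ocpp_messages_total per status for action=MeterValues_projection."""
    counts = {}
    for line in metrics_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # comments, blank lines, other metrics
        labels = dict(LABEL_RE.findall(match.group("labels")))
        if labels.get("action") != "MeterValues_projection":
            continue
        status = labels.get("status", "")
        counts[status] = counts.get(status, 0.0) + float(match.group("value"))
    return counts


# Hypothetical scrape output.
sample = """\
# TYPE ocpp_messages_total counter
ocpp_messages_total{action="MeterValues_projection",status="success"} 118
ocpp_messages_total{action="MeterValues_projection",status="no_device_id"} 2
ocpp_messages_total{action="Heartbeat",status="success"} 40
"""
counts = projection_counts(sample)
print(counts)
```

Running the same function on a second scrape a few minutes later and subtracting the two dictionaries gives the per-status increase that the scenario asks about.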
Scenario 3: DB row count (before/after backfill)
-- 1) Baseline BEFORE backfill (canary charger)
SELECT count(*) AS dm_count
FROM device_measurements
WHERE device_id='c29212e2-4379-4daa-afd7-b712d50d8750';
-- EXPECTED baseline: 0 (no wide-form writes yet)
-- 2) Long-form count (historical data)
SELECT count(*) AS lf_count
FROM ocpp_meter_samples
WHERE charger_id=(SELECT id FROM ocpp_chargers
                  WHERE device_id='c29212e2-4379-4daa-afd7-b712d50d8750');
-- EXPECTED: ~4131 (record count confirmed on site)
Backfill --dry-run:
docker compose exec backend python -m scripts.backfill_charger_measurements \
--charger-id <CHARGER_UUID> \
--start-time 2026-04-15 \
--dry-run
# EXPECTED: projected_rows > 0 on stdout, first 10 groups pretty-printed.
Backfill APPLY (canary charger):
docker compose exec backend python -m scripts.backfill_charger_measurements \
--charger-id <CHARGER_UUID> \
--start-time 2026-04-15 \
--batch-size 500
# EXPECTED: stats projected_rows >= 4000, integrity_errors=0.
Then:
SELECT count(*) FROM device_measurements
WHERE device_id='c29212e2-4379-4daa-afd7-b712d50d8750';
-- EXPECTED: >= 4000 after the backfill.
FAIL semantics:
- integrity_errors > 0 -> compressed chunk or PK conflict. Check the logs: docker compose logs backend --since 10m | grep backfill_integrity_error
- Writes to data older than 7 days show up in the compression_warning log; a manual decompress_chunk() may be required.
Scenario 4: Consumption endpoint
curl -fsSL -H "Authorization: Bearer $TOKEN" \
"https://enerji.kepmark.com/api/measurements/devices/c29212e2-4379-4daa-afd7-b712d50d8750/aggregated?interval=hourly&start=2026-05-12T00:00:00Z&end=2026-05-13T00:00:00Z" \
| jq '.data | length'
# EXPECTED: > 0 (hourly aggregation records)
FAIL semantics:
- 200 + {"data": []} -> the backfill was not run OR the CAGG was not refreshed. The CAGG refresh commands are given in Scenario 6.
- 5xx -> the measurements endpoint is broken; inspect the backend logs.
Scenario 5: Frontend visual smoke
- Open https://enerji.kepmark.com/measurements/consumption (authorized user)
- Filter:
  - Bolge (Region) dropdown -> Ankara
  - Alt Bolge (Subregion) dropdown -> Ankaref
  - Cihaz (Device) dropdown -> ZEUS-Q95KREQY7U1FAP4H MUST BE VISIBLE
- Select the device + Date: last 24 hours + "Veri Getir" (Fetch Data)
- Energy / Power / Voltage charts should render (data points > 0)
FAIL semantics:
- Device is MISSING from the dropdown -> the subregion_id null case from Scenario 1; the frontend regression test (analysis-filters.test.tsx) fails locally and should have been caught in CI.
- Device is present but the chart is empty -> recheck the backfill/CAGG state from Scenarios 3 and 6.
Scenario 6: After CAGG refresh
Once the backfill has finished, DevOps runs the following script block via psql:
CALL refresh_continuous_aggregate(
'energy_hourly',
'2026-04-15 00:00:00+00',
'2026-05-13 23:59:59+00'
);
CALL refresh_continuous_aggregate(
'energy_daily',
'2026-04-15 00:00:00+00',
'2026-05-13 23:59:59+00'
);
CALL refresh_continuous_aggregate(
'energy_monthly',
'2026-04-15 00:00:00+00',
'2026-05-13 23:59:59+00'
);
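Verification:
SELECT count(*) FROM energy_hourly
WHERE device_id='c29212e2-4379-4daa-afd7-b712d50d8750';
-- EXPECTED: > 0
FAIL semantics:
- refresh_continuous_aggregate error -> a CAGG refresh policy (WatermarkPolicy) may be active; in that case no manual refresh is needed and the policy fills the data automatically within 1-2 hours. Monitor the CAGG info:
SELECT * FROM timescaledb_information.continuous_aggregate_stats
WHERE view_name IN ('energy_hourly','energy_daily','energy_monthly');
-- If this view is missing on the installed TimescaleDB version (it appears to have been removed in 2.x),
-- timescaledb_information.continuous_aggregates is the likely replacement.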
Scenario 7: Long-form vs wide-form divergence
SELECT
(SELECT count(*) FROM ocpp_meter_samples
WHERE charger_id=(SELECT id FROM ocpp_chargers
WHERE device_id='c29212e2-4379-4daa-afd7-b712d50d8750'))
AS long_count,
(SELECT count(*) FROM device_measurements
WHERE device_id='c29212e2-4379-4daa-afd7-b712d50d8750')
AS wide_count;
EXPECTED:
- long_count (long-form) > wide_count (wide-form) is likely: in long-form each measurand×phase is a separate row, in wide-form there is a SINGLE row per timestamp. Typical ratio: long ~3-7x wide.
- wide_count > 0 and >= 4000 (on the canary).
FAIL semantics:
- wide_count == 0 -> the backfill was not run.
- wide_count > long_count -> extra writes occurred (unexpected); duplicate timestamps or broken ON CONFLICT logic.
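The divergence rules can be captured in a few lines when the two counts are already available, for example from the query above. This is an illustrative sketch; the function name and the WARN level are assumptions, and the 4000 default is the canary expectation stated in this runbook.

```python
def classify_divergence(long_count, wide_count, min_wide=4000):
    """Map the two row counts to an outcome string for the divergence check."""
    if wide_count == 0:
        return "FAIL: backfill was not run"
    if wide_count > long_count:
        return "FAIL: extra wide-form writes (duplicate timestamps or broken ON CONFLICT)"
    if wide_count < min_wide:
        # Hypothetical intermediate level: rows exist but fewer than expected on the canary.
        return "WARN: wide_count below the canary expectation"
    return "OK"


print(classify_divergence(4131, 0))
print(classify_divergence(4131, 4100))
print(classify_divergence(4131, 5000))
```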
Scenario 8: Live MeterValues (real time)
While the charger is charging (verify with /devices/.../charge-sessions/active):
-- Wide-form row count over the last 10 minutes
SELECT count(*), MAX(time) AS last_time
FROM device_measurements
WHERE device_id='c29212e2-4379-4daa-afd7-b712d50d8750'
AND time > NOW() - INTERVAL '10 minutes';
EXPECTED:
- If the charger is sending MeterValues: count > 0, last_time within the last 30 s to 2 min.
- Depends on configuration: meter_sample_interval_seconds=30 -> ~2 rows per minute.
FAIL semantics:
- Charger is online but NO new rows arrive -> the handler's MeterValues_projection counter shows error or no_device_id. Backend log: docker compose logs backend --since 5m 2>&1 | grep ocpp_projection
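The freshness expectation is simple arithmetic and can be checked mechanically. A minimal sketch, assuming the count and last_time values come from the query above; the function names and the 2-minute default lag are illustrative choices based on the expected range.

```python
from datetime import datetime, timedelta, timezone


def live_flow_ok(row_count, last_time, now, max_lag=timedelta(minutes=2)):
    """True when rows arrived in the window and the newest one is recent enough."""
    if row_count == 0 or last_time is None:
        return False
    return now - last_time <= max_lag


def expected_rows(window_seconds, sample_interval_seconds):
    """Rows expected in the window if the charger reports at a fixed interval."""
    return window_seconds // sample_interval_seconds


now = datetime(2026, 5, 13, 12, 0, tzinfo=timezone.utc)
print(live_flow_ok(18, now - timedelta(seconds=45), now))
print(live_flow_ok(18, now - timedelta(minutes=5), now))
print(expected_rows(600, 30))
```

With meter_sample_interval_seconds=30, a 10-minute window should hold about 20 rows, so a count far below that on a charging unit deserves a look even if it is not zero.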
3. Automation: scripts/smoke_b_paketi.sh
A shell script for semi-automated smoke testing: <REPO_ROOT>/scripts/smoke_b_paketi.sh.
After the Step 10 deploy, DevOps runs bash scripts/smoke_b_paketi.sh; the script does the following:
- /health ping
- subregion_id null check via /api/devices/<canary_uuid>
- Prometheus MeterValues_projection counter snapshot
- DB counts (long vs wide form)
- Wide-form row flow over the last 10 minutes
The manual steps (frontend visual check + backfill) are left outside the script; you, the operator, follow them step by step.
4. Rollback Decision
If the smoke test FAILS:
| Failed scenario | Decision |
|---|---|
| 1 (subregion null) | Hotfix the DeviceResponse LEFT JOIN with a small DevOps PR. The deploy is NOT rolled back. |
| 2 (counter 0) + 8 (no live writes) | Backend handler regression. Rollback (PR revert). |
| 3 (backfill integrity) | NO rollback; stop the backfill and decompress manually. |
| 4 (consumption empty) | Resolve with a CAGG refresh; NO rollback. |
| 5 (frontend dropdown) | Frontend regression; the action can be combined with that of Scenario 1. |
| 6 (CAGG fail) | Sequential manual refresh; NO rollback. |
| 7 (wide > long) | Unexpected; open an incident and CONSIDER a rollback. |
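If the smoke script is extended to print a recommendation, the table can be encoded as a lookup. The sketch below is one possible encoding, not part of scripts/smoke_b_paketi.sh. The table only covers Scenarios 2 and 8 failing together, so the handling of either one failing alone is an assumption.

```python
ACTIONS = {
    1: "hotfix DeviceResponse LEFT JOIN, no rollback",
    3: "stop backfill, decompress manually, no rollback",
    4: "refresh CAGGs, no rollback",
    5: "frontend regression, handle together with scenario 1",
    6: "sequential manual refresh, no rollback",
    7: "open incident, consider rollback",
}


def decide(failed):
    """Return the list of decisions for a collection of failed scenario numbers."""
    failed = set(failed)
    decisions = []
    if {2, 8} <= failed:
        decisions.append("rollback (PR revert): backend handler regression")
    elif failed & {2, 8}:
        # Assumption: the table does not say what to do when only one of the two fails.
        decisions.append("investigate: scenario 2 or 8 failed alone (not covered by the table)")
    decisions += [ACTIONS[n] for n in sorted(failed - {2, 8})]
    return decisions


print(decide([2, 8]))
print(decide([4, 1]))
```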
5. Metric Thresholds (after the 24h canary)
At the end of the 24-48h canary observation, the following thresholds must be green:
# Projection success rate (24h window), threshold 0.99
# sum() aggregates over the status label so that success is compared against all outcomes
sum(rate(ocpp_messages_total{action="MeterValues_projection", status="success"}[24h]))
/ sum(rate(ocpp_messages_total{action="MeterValues_projection"}[24h])) > 0.99
# Unknown measurand rate (Beny typo), alert when > 0.05
sum(rate(ocpp_messages_total{action="MeterValues_projection", status="unknown_measurand"}[1h]))
/ sum(rate(ocpp_messages_total{action="MeterValues_projection"}[1h])) < 0.05
# no_device_id rate, alert when > 0 (this expression is the alert condition, not the green state)
rate(ocpp_messages_total{action="MeterValues_projection", status="no_device_id"}[15m]) > 0
These thresholds should be added to prometheus/rules/ocpp_meter_values.yml
(monitoring-observability-architect Step 9 output).
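The same thresholds can be sanity-checked offline from counter increases, for instance the difference between two /metrics snapshots. This is a rough sketch under a simplifying assumption: it applies a single observation window to all three checks, whereas the expressions above use 24h, 1h and 15m windows. The function name and the sample numbers are hypothetical.

```python
def evaluate_thresholds(deltas):
    """Evaluate the three thresholds from per-status counter increases.

    deltas: dict mapping status -> increase of the projection counter
    over one observation window. Returns None when there was no traffic.
    """
    total = sum(deltas.values())
    if total == 0:
        return None  # no projection traffic, nothing to judge
    success_ratio = deltas.get("success", 0) / total
    unknown_ratio = deltas.get("unknown_measurand", 0) / total
    return {
        "success_ok": success_ratio > 0.99,
        "unknown_measurand_ok": unknown_ratio < 0.05,
        "no_device_id_ok": deltas.get("no_device_id", 0) == 0,
    }


print(evaluate_thresholds({"success": 995, "unknown_measurand": 5}))
print(evaluate_thresholds({"success": 980, "no_device_id": 20}))
```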
6. Authorization + Site Notes
- If a site call is needed: start and stop a transaction from the charger's HMI, then repeat Scenario 4 within 1 minute.
- Because there is a single enerji.kepmark.com server (staging=prod), the canary phase means on-site observation with 1-2 chargers. There is NO phased rollout, but the metric thresholds must be verified after 24-48h (CLAUDE.md pitfall #7).