OCPP Auth List Push Silent-Fail
Symptoms
- The Operator/CSO panel shows the message "RFID card sent to device", but field feedback says "the card is not on the device" / "I cannot log in from the screen"
- On the dashboard, the OcppAuthListPushEmptyTargets or OcppAuthListPushHighExceptionRate alarm is active
- The OcppCommandLog table does not contain the expected number of records for action='SendLocalList' in the last hour
- After the tenant-wide push button is pressed, the response shows dispatched_count=0, warning='no_chargers_resolved', or tenant_filter_dropped_N
Possible Causes
- UI bug: the frontend submits with an empty charger_ids (target=specific but the list is empty)
- Tenant filter drift: the user selected UUIDs outside the tenant and the backend dropped them quietly (tenant_filter_dropped_N warning)
- Concurrent push (CAS drift): the old admin panel and the new panel push to the same tenant at the same time; the CAS update reports 0 rows affected and logs a drift
- Chargers offline in bulk: the registry has no owner, so the push lands in the offline counter without reaching any charger (not a silent fail, but the operator assumed it "arrived")
- Dispatcher / pubsub backlog: the _send_ocpp_command wrapper returns a timeout; an OcppCommandLog record is written with log_status='timeout'
- Backend exception: an uncaught error inside push_tenant_auth_list (DB connection, sqlalchemy session expiry, etc.)
Diagnostic Steps
1. Narrow Down the Alarm Cause
# Which metric/result combination triggered the alarm:
# In Grafana, check the {mode, result} breakdown for the last hour
# in the "Auth List Push Attempts" panel.
#
# Prometheus query:
# sum by (mode, result) (rate(ocpp_auth_list_push_attempts_total[1h]))
#
# Expected:
# - result=dispatched > 0 -> the flow is healthy
# - result=empty is the majority -> UI bug or tenant filter problem
# - result=exception > 0 -> uncaught backend error
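The reading rules above can be sketched as a small helper. This is only an illustration: the function name, the returned labels, the "tenant" mode label, and the order in which the rules are checked (exception first, then empty majority, then dispatched) are assumptions, not something the backend defines.

```python
from typing import Dict, Mapping, Tuple


def classify_push_attempts(rates: Mapping[Tuple[str, str], float]) -> str:
    """Map a {(mode, result): rate} breakdown to a coarse diagnosis.

    The input mirrors the output of
    sum by (mode, result) (rate(ocpp_auth_list_push_attempts_total[1h])).
    """
    # Collapse the mode dimension; only the result label drives the diagnosis.
    by_result: Dict[str, float] = {}
    for (_mode, result), value in rates.items():
        by_result[result] = by_result.get(result, 0.0) + value

    total = sum(by_result.values())
    if total == 0:
        return "no_attempts"
    # Assumed precedence: any exception is worth looking at first.
    if by_result.get("exception", 0.0) > 0:
        return "backend_exception"
    if by_result.get("empty", 0.0) > total / 2:
        return "ui_bug_or_tenant_filter"
    if by_result.get("dispatched", 0.0) > 0:
        return "healthy"
    return "unknown"
```

For example, a breakdown where `empty` accounts for most of the rate would point at the UI or the tenant filter rather than at the dispatcher.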
2. Query the OcppCommandLog Audit Records
-- SendLocalList attempts for the tenant in the last hour (audit evidence).
SELECT
id,
charger_id,
user_id,
log_status, -- accepted | rejected | timeout | error
error_code,
error_description,
latency_ms,
created_at
FROM ocpp_command_log
WHERE tenant_id = '<TENANT_UUID>'
AND action = 'SendLocalList'
AND created_at > now() - interval '1 hour'
ORDER BY created_at DESC
LIMIT 50;
- No records at all -> the wrapper was never called (a definite silent fail: empty target or schema validation error).
- Only log_status='timeout' -> dispatcher/charger reply lag; check the ocpp_command_timeouts_total{timeout_kind} breakdown.
- Mostly log_status='rejected' -> the charger is rejecting SendLocalList (possibly a firmware problem or listVersion drift).
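The three cases above can be expressed as a short classifier over the log_status column of the query result. This is a sketch under assumptions: the function name and the returned labels are hypothetical, and "mostly" is interpreted as more than half of the rows.

```python
from collections import Counter
from typing import Iterable


def classify_command_log(statuses: Iterable[str]) -> str:
    """Interpret the log_status values returned by the SendLocalList query."""
    counts = Counter(statuses)
    total = sum(counts.values())
    if total == 0:
        # No rows: the wrapper was never invoked for this tenant.
        return "wrapper_never_called"
    if set(counts) == {"timeout"}:
        return "dispatcher_or_charger_reply_lag"
    # Assumption: "mostly rejected" means a strict majority of rows.
    if counts["rejected"] > total / 2:
        return "charger_rejecting"
    return "no_clear_pattern"
```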
3. Check the AuditLog Tenant Push Record
-- push_tenant_auth_list writes one audit record per call; the payload
-- contains a summary of
-- requested/resolved/dispatched/accepted/rejected/offline/error/warning.
SELECT
id,
actor_user_id,
payload,
created_at
FROM audit_log
WHERE tenant_id = '<TENANT_UUID>'
AND action = 'ocpp.auth_list.tenant_pushed'
AND created_at > now() - interval '1 hour'
ORDER BY created_at DESC
LIMIT 20;
- payload->>'requested_count' > 0 AND payload->>'resolved_count' = 0 -> tenant filter drop. The consumer (frontend) sent UUIDs outside the tenant.
- payload->>'resolved_count' > 0 AND payload->>'dispatched_count' = 0 -> all chargers are offline. Check the registry.
- payload->>'cas_drift_count' > 0 -> there is a concurrent push; the old admin panel may be calling the endpoint.
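The payload checks above can also be applied to rows already fetched from audit_log. The sketch below is illustrative: the function name and the finding labels are assumptions, and it only relies on the payload keys named in the checks. Counts are coerced with int() because PostgreSQL's ->> operator returns text, so values may arrive as strings.

```python
from typing import Any, List, Mapping


def classify_audit_payload(payload: Mapping[str, Any]) -> List[str]:
    """Return the findings that apply to one tenant push audit payload."""

    def count(key: str) -> int:
        # Missing or null keys are treated as zero; text values are coerced.
        return int(payload.get(key) or 0)

    findings: List[str] = []
    if count("requested_count") > 0 and count("resolved_count") == 0:
        findings.append("tenant_filter_drop")
    if count("resolved_count") > 0 and count("dispatched_count") == 0:
        findings.append("all_chargers_offline")
    if count("cas_drift_count") > 0:
        findings.append("concurrent_push")
    return findings
```

More than one finding can apply to the same record, which is why the helper returns a list rather than a single label.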
4. Per-Charger Outcome Distribution
# Grafana "Auth List Push Per-Charger Outcomes" panel (if it does not exist,
# run the Prometheus query manually):
# sum by (outcome) (rate(ocpp_auth_list_push_chargers_total[1h]))
#
# Expected typical distribution:
# accepted > 80%, offline 10-15%, rejected < 5%, timeout/error < 2%
#
# error+timeout > 20% -> dispatcher/Redis pubsub backlog or a charger
# firmware reply problem.
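The 20% rule above amounts to a share calculation over the per-outcome rates. A minimal sketch, assuming the outcome labels are literally `error` and `timeout` as written above; the function names are hypothetical.

```python
from typing import Dict, Mapping


def outcome_shares(rates: Mapping[str, float]) -> Dict[str, float]:
    """Convert per-outcome rates into fractions of the total."""
    total = sum(rates.values())
    if total == 0:
        return {}
    return {outcome: value / total for outcome, value in rates.items()}


def backlog_suspected(rates: Mapping[str, float], threshold: float = 0.20) -> bool:
    """True when error + timeout exceed the threshold share of all outcomes."""
    shares = outcome_shares(rates)
    return shares.get("error", 0.0) + shares.get("timeout", 0.0) > threshold
```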
5. Inspect the Frontend Payload (Empty Pattern)
When a tenant push is triggered from the UI, open the relevant request in the Network tab of the browser devtools:
- Endpoint: POST /api/v1/ocpp/auth-list/push
- Payload:
{
  "target": "specific",
  "charger_ids": ["uuid1", "uuid2", ...]
}
- target="specific" AND charger_ids is an empty array -> frontend bug (selection state lost). Open an urgent ticket with the UI team.
- target="all" AND backend resolved_count=0 -> either the tenant has no chargers (new tenant) or the tenant_id filter in the DB query on the OcppCharger table is wrong (backend regression).
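The empty-submit check can be run against a payload copied from devtools. The helper below is a hypothetical sketch, not the backend's actual validation; it only uses the `target` and `charger_ids` fields shown in the payload above.

```python
from typing import Any, Mapping, Optional


def check_push_payload(payload: Mapping[str, Any]) -> Optional[str]:
    """Return a problem label for a captured push payload, or None if it looks fine."""
    target = payload.get("target")
    if target == "all":
        # charger_ids is not needed for a tenant-wide push.
        return None
    if target == "specific":
        if not payload.get("charger_ids"):
            # Missing, null, or empty list: the selection state was lost.
            return "empty_charger_ids"
        return None
    return "unknown_target"
```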
6. Scan the CAS Drift Warning Logs
# Backend structlog query: ocpp_auth_list_cas_drift warnings in the last hour:
docker compose logs backend --since 1h | grep ocpp_auth_list_cas_drift | tail -20
# Example output:
# {"event":"ocpp_auth_list_cas_drift","charger_id":"...","candidate_version":42,"expected_current":41,...}
#
# Many drift events -> there are concurrent pushes. Query the active
# sessions in the admin panel and the user_ids that submitted in the
# last 5 minutes.
SELECT actor_user_id, created_at
FROM audit_log
WHERE action = 'ocpp.auth_list.tenant_pushed'
AND tenant_id = '<TENANT_UUID>'
AND created_at > now() - interval '5 minutes'
ORDER BY created_at DESC;
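To see which chargers drift most often, the JSON log lines can be counted per charger_id. This sketch assumes the log lines look like the example output above (one JSON object per line, possibly preceded by a docker compose service prefix); the function name is hypothetical.

```python
import json
from collections import Counter
from typing import Iterable


def count_cas_drift(lines: Iterable[str]) -> Counter:
    """Count ocpp_auth_list_cas_drift events per charger_id."""
    counts: Counter = Counter()
    for line in lines:
        # Skip any prefix such as "backend-1  | " before the JSON object.
        start = line.find("{")
        if start == -1:
            continue
        try:
            event = json.loads(line[start:])
        except json.JSONDecodeError:
            continue  # Not a structured log line; ignore it.
        if event.get("event") == "ocpp_auth_list_cas_drift":
            counts[event.get("charger_id", "unknown")] += 1
    return counts
```

The function accepts any iterable of lines, so it can be fed `sys.stdin` with the output of the docker compose logs command piped in.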
7. Backend Exception Trace (Exception Pattern)
# After the F4 refactor, the outer try/finally logs the exception path:
docker compose logs backend --since 30m \
| grep -E "push_tenant_auth_list|ocpp_auth_list_push_charger_failed" \
| tail -50
# Typically:
# - sqlalchemy.exc.OperationalError -> DB connection pool exhausted
# - sqlalchemy.exc.InvalidRequestError -> session expire race
# - asyncpg.exceptions.* -> DB protocol error
Mitigation
Temporary (can be applied within 5 minutes)
- Empty target pattern -> Tell the operator to select chargers one by one in the new panel and push to them (postpone the tenant-wide submit); file a bug report with the frontend team.
- Backend exception pattern -> docker compose restart backend. This does not cause CAS drift because the wrapper performs its own commits; after the restart, the next push starts clean.
- Mostly dispatcher timeouts -> If the PR-E6 fix is not live, docker compose restart backend; if the PR-E6 canary is active, cross-check the OcppDispatcherLoopErrors alarms (runbook: pre6-canary-monitoring).
Permanent
- Frontend empty submit guard: prevent the UI from sending a request with an empty charger_ids (button disabled state).
- Backend rate limiting: at most 2 pushes per 60 seconds for the same tenant (reduces concurrent CAS drift).
- If CAS drift > 5%: consider adding a pessimistic lock (transaction-level advisory lock) inside push_tenant_auth_list.
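The rate limit of 2 pushes per 60 seconds per tenant could be sketched as a sliding-window limiter. This is an in-process illustration only: the class name is hypothetical, and a deployment with several backend workers would need a shared store instead of local memory.

```python
import time
from collections import defaultdict, deque
from typing import Callable, DefaultDict, Deque


class TenantPushLimiter:
    """Sliding-window limiter: at most max_pushes per window per tenant."""

    def __init__(
        self,
        max_pushes: int = 2,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max = max_pushes
        self._window = window_seconds
        self._clock = clock
        self._recent: DefaultDict[str, Deque[float]] = defaultdict(deque)

    def allow(self, tenant_id: str) -> bool:
        """Record and allow the push if the tenant is under its limit."""
        now = self._clock()
        recent = self._recent[tenant_id]
        # Drop timestamps that have left the window.
        while recent and now - recent[0] >= self._window:
            recent.popleft()
        if len(recent) >= self._max:
            return False
        recent.append(now)
        return True
```

The clock is injectable so the window behaviour can be tested without sleeping.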
Rollback Plan
Impact of the F4 PR (push_tenant_auth_list wrapper refactor):
- No DB migration -> easy revert.
- After a revert, the _send_ocpp_command call disappears and the old direct send_command logic comes back -> the silent-fail risk returns.
- A revert should be considered only for an urgent regression (e.g. all pushes return an exception). First identify the real error pattern from the OcppCommandLog audit; the F4 refactor is structurally required.
Commands:
# Revert (emergency only):
git revert <F4-commit-sha>
git push origin main
# Once CI turns green, deploy.yml is triggered automatically.
Dashboard
Panels recommended for the Grafana OCPP Charger Fleet dashboard
(NOT yet present in the existing JSON; to be added manually):
- Auth List Push Attempts (rate/min) -> sum by (mode, result) (rate(ocpp_auth_list_push_attempts_total[5m])) * 60
- Auth List Push Per-Charger Outcomes -> sum by (outcome) (rate(ocpp_auth_list_push_chargers_total[5m]))
- Auth List Push Duration p50/p95/p99 -> histogram_quantile(0.95, sum by (le, mode) (rate(ocpp_auth_list_push_duration_seconds_bucket[5m])))
To add them: in grafana/dashboards/ocpp-fleet.json, add the new panels
with IDs 16-18 as three side-by-side panels of width 6, starting at
gridPos y=47.
Escalation
Escalate to the Backend team in the following cases:
- OcppAuthListPushHighExceptionRate persists for 15+ minutes
- The CAS drift count exceeds 50 per hour (concurrent push storm)
- The error_code distribution for OcppCommandLog log_status='error' contains an unknown code (anything other than dispatcher_error, charger_not_found)
- A rollback is being considered (approval from the PR owner, on-call backend, and DevOps is required)
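The escalation criteria can be collected into a single check for use in an on-call script. This is a sketch: the function name, the parameter names, and the reason labels are hypothetical, and the inputs are assumed to have been gathered from the alarm state and the queries in the diagnostic steps.

```python
from typing import Iterable, List

# The two error codes the runbook treats as known.
KNOWN_ERROR_CODES = frozenset({"dispatcher_error", "charger_not_found"})


def escalation_reasons(
    exception_alarm_minutes: float,
    cas_drift_per_hour: int,
    error_codes: Iterable[str],
    rollback_considered: bool,
) -> List[str]:
    """Return the escalation criteria that currently apply (empty if none)."""
    reasons: List[str] = []
    if exception_alarm_minutes >= 15:
        reasons.append("exception_rate_alarm_persisting")
    if cas_drift_per_hour > 50:
        reasons.append("cas_drift_storm")
    unknown = sorted(set(error_codes) - KNOWN_ERROR_CODES)
    if unknown:
        reasons.append("unknown_error_codes:" + ",".join(unknown))
    if rollback_considered:
        reasons.append("rollback_considered")
    return reasons
```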