OTA Update Stuck
Symptoms
- The OTA job has been in the "in_progress" state for a long time
- The device firmware version has not changed (the old version is still shown)
- The OTA progress percentage is frozen at a certain point (e.g. 30%)
- In a batch OTA, some devices have updated while others are stuck
Possible Causes
- Unstable network connection: the firmware download was interrupted midway
- The firmware file in MinIO is not accessible
- The ESP32 flash memory is insufficient or corrupted
- The OTA timeout has expired (default 7200 s)
- The firmware file is corrupted or its checksum does not match
- Too many devices are updating at once in a batch OTA, so bandwidth is insufficient
Diagnostic Steps
1. Check the OTA Progress Status
# Query the status of in-progress OTA jobs in the database
docker exec -it zeus-postgres psql -U zeus -d zeus_db -c "
SELECT oj.id, oj.device_id, oj.status, oj.progress,
oj.firmware_version, oj.started_at,
NOW() - oj.started_at AS elapsed,
oj.error_message
FROM ota_jobs oj
WHERE oj.status = 'in_progress'
ORDER BY oj.started_at ASC;
"
# Summarize OTA jobs by status (completed, failed, etc.) over the last 24 hours
docker exec -it zeus-postgres psql -U zeus -d zeus_db -c "
SELECT status, COUNT(*) AS job_count
FROM ota_jobs
WHERE created_at > NOW() - INTERVAL '24 hours'
GROUP BY status;
"
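To decide whether a job is actually stuck, the elapsed time can be compared with the OTA timeout. The sketch below is a hypothetical helper (`ota_job_overdue` is not part of the project); it assumes `started_at` is passed as Unix epoch seconds, for example by selecting `EXTRACT(EPOCH FROM started_at)::bigint`, and falls back to the documented 7200 s default when no timeout is given.

```bash
# Hypothetical helper: is an in_progress OTA job older than the OTA timeout?
# $1 = started_at as Unix epoch seconds
# $2 = timeout in seconds (defaults to 7200, the documented default)
# $3 = "now" as epoch seconds (defaults to the current time; handy for testing)
ota_job_overdue() {
  local started_at="$1"
  local timeout="${2:-7200}"
  local now="${3:-$(date +%s)}"
  local elapsed=$(( now - started_at ))
  if (( elapsed > timeout )); then
    echo "overdue ${elapsed}"
  else
    echo "ok ${elapsed}"
  fi
}

# Example usage (psql -t -A prints the bare value without headers or padding):
# started=$(docker exec zeus-postgres psql -U zeus -d zeus_db -t -A -c \
#   "SELECT EXTRACT(EPOCH FROM started_at)::bigint FROM ota_jobs WHERE id = '{ota_job_id}';")
# ota_job_overdue "$started" 7200
```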
2. Subscribe to the MQTT OTA Progress Topic
# Listen for OTA progress messages (stops after 120 seconds)
timeout 120 mosquitto_sub -h localhost -p 1883 \
-t "zeus/+/+/ota/progress" -v
# Listen for OTA result messages
timeout 120 mosquitto_sub -h localhost -p 1883 \
-t "zeus/+/+/ota/result" -v
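Because `mosquitto_sub -v` prints each message as `topic payload`, the output can be reduced to the last progress value per device, which makes frozen devices easy to spot. This sketch assumes the progress payload is JSON with an integer `progress` field (for example `{"progress": 30}`); the real payload format is not documented here, and `last_progress_per_topic` is a hypothetical name.

```bash
# Reads "topic payload" lines (mosquitto_sub -v output) from stdin and prints
# the last reported progress value for each topic, sorted by topic.
# Assumes payloads contain "progress": <integer>.
last_progress_per_topic() {
  awk '
    {
      topic = $1
      # Everything after the first space is the payload
      payload = substr($0, length($1) + 2)
      if (match(payload, /"progress"[ \t]*:[ \t]*[0-9]+/)) {
        value = substr(payload, RSTART, RLENGTH)
        sub(/.*:[ \t]*/, "", value)
        last[topic] = value
      }
    }
    END {
      for (t in last) print t, last[t]
    }
  ' | sort
}
```

For example: `timeout 120 mosquitto_sub -h localhost -p 1883 -t "zeus/+/+/ota/progress" -v | last_progress_per_topic`. A device whose value is the same across two consecutive runs is a candidate for the stuck list.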
3. Check Access to the Firmware File in MinIO
# MinIO liveness check (HTTP 200 means the server is up)
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:9000/minio/health/live
# Check whether the firmware file exists in MinIO
docker exec zeus-minio mc ls local/firmware/
# Check a specific firmware file (size, ETag, last-modified time)
docker exec zeus-minio mc stat local/firmware/{firmware_filename}
# Compute the file's MD5 hash (the size is shown by mc stat above)
docker exec zeus-minio mc cat local/firmware/{firmware_filename} | md5sum
4. Check the ESP32 OTA Logs
# Read the latest retained OTA status message from the ESP32 over MQTT
mosquitto_sub -h localhost -p 1883 \
-t "zeus/{tenant_id}/{gateway_id}/ota/status" -v -C 1 --retained-only
# Inspect the backend OTA service logs
docker logs --tail 300 zeus-backend 2>&1 | grep -i "ota\|firmware\|update"
# Check active OTA tasks on the Celery workers
docker exec zeus-backend celery -A app.core.celery.app inspect active \
| grep -i "ota"
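5. Check Network Bandwidth and Connectivity
# Test the firmware download speed from MinIO (this unsigned URL only works if the bucket allows anonymous read)
time curl -s -o /dev/null -w "%{http_code} %{size_download} bytes %{speed_download} B/s\n" \
"http://localhost:9000/firmware/{firmware_filename}"
# Check the ESP32 device's Wi-Fi signal strength and free heap (latest reports)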
docker exec -it zeus-postgres psql -U zeus -d zeus_db -c "
SELECT device_id, metadata->>'wifi_rssi' AS rssi,
metadata->>'free_heap' AS free_heap,
created_at
FROM device_events
WHERE device_id = '{device_id}'
AND event_type = 'status_report'
ORDER BY created_at DESC
LIMIT 5;
"
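A raw RSSI number is easier to act on once it is bucketed. The thresholds below (-67 dBm and -80 dBm) are common Wi-Fi rules of thumb, not values defined by this project, and `classify_rssi` is a hypothetical helper; adjust the cut-offs to your own field data.

```bash
# Hypothetical helper: bucket a Wi-Fi RSSI value (dBm) into good / weak / poor.
# Thresholds are rule-of-thumb assumptions, not project settings.
classify_rssi() {
  local rssi="$1"
  if ! [[ "$rssi" =~ ^-?[0-9]+$ ]]; then
    echo "unknown"          # empty or non-numeric value in metadata
  elif (( rssi >= -67 )); then
    echo "good"
  elif (( rssi >= -80 )); then
    echo "weak"             # downloads may be slow or interrupted
  else
    echo "poor"             # OTA failures are likely at this level
  fi
}

# Example usage with unaligned psql output (-t -A -F ' '):
# docker exec zeus-postgres psql -U zeus -d zeus_db -t -A -F ' ' -c \
#   "SELECT device_id, metadata->>'wifi_rssi' FROM device_events
#    WHERE device_id = '{device_id}' AND event_type = 'status_report'
#    ORDER BY created_at DESC LIMIT 5;" \
#   | while read -r dev rssi; do echo "$dev $rssi $(classify_rssi "$rssi")"; done
```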
Resolution Steps
Check and Extend the OTA Timeout
# Check the OTA timeout setting (default: 7200 seconds = 2 hours)
docker exec zeus-backend env | grep -i OTA_TIMEOUT
# Extend the timeout if needed
# In docker-compose.yml: OTA_TIMEOUT_SECONDS=10800 (3 hours)
# Recreate the container so the new environment is applied ("docker compose restart" does not pick up compose file changes)
docker compose up -d backend
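Before raising the timeout, it can help to check whether the current value is plausibly too short. This sketch divides the firmware size by a measured download speed (for example curl's `%{speed_download}` from step 5) and multiplies by a safety factor; the factor of 3 for retries, flashing and reboot is an assumption, and `check_ota_timeout` is a hypothetical name. A speed measured on the server itself is an upper bound; an ESP32 on Wi-Fi will usually be slower.

```bash
# Hypothetical helper: does the OTA timeout leave enough headroom for one download?
# $1 = firmware size in bytes
# $2 = download speed in bytes/s (a decimal part such as "1234.000" is dropped)
# $3 = timeout in seconds
# $4 = safety factor (default 3, an assumption covering retries, flash write, reboot)
check_ota_timeout() {
  local size_bytes="$1" speed_bps="${2%.*}" timeout_s="$3" factor="${4:-3}"
  if ! [[ "$speed_bps" =~ ^[0-9]+$ ]] || (( speed_bps == 0 )); then
    echo "invalid speed: $2" >&2
    return 1
  fi
  # Ceiling division: seconds needed for a single download
  local download_s=$(( (size_bytes + speed_bps - 1) / speed_bps ))
  local needed_s=$(( download_s * factor ))
  if (( needed_s <= timeout_s )); then
    echo "sufficient download=${download_s}s needed=${needed_s}s"
  else
    echo "too_short download=${download_s}s needed=${needed_s}s"
  fi
}
```

For example, a 1.5 MB image at 1000 B/s needs about 1500 s per download, so 4500 s with the assumed factor, which still fits inside the 7200 s default.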
Cancel and Restart the Stuck OTA Job
# Mark the stuck OTA job as "failed"
docker exec -it zeus-postgres psql -U zeus -d zeus_db -c "
UPDATE ota_jobs
SET status = 'failed',
error_message = 'Manual cancel - timeout',
completed_at = NOW()
WHERE id = '{ota_job_id}'
AND status = 'in_progress';
"
# Send an OTA cancel command to the device
mosquitto_pub -h localhost -p 1883 \
-t "zeus/{tenant_id}/{gateway_id}/ota/cmd" \
-m '{"action": "cancel"}'
# Start a new OTA job
curl -X POST -H "Authorization: Bearer {TOKEN}" \
-H "Content-Type: application/json" \
"http://localhost:8000/api/v1/ota/jobs" \
-d '{
"device_ids": ["{device_id}"],
"firmware_id": "{firmware_id}"
}'
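When several stuck devices need a new job, typing the `device_ids` array by hand is error-prone. The sketch below builds the same request body as the curl call above from a firmware ID and a list of device IDs; `build_ota_job_payload` is a hypothetical helper, and it assumes the IDs contain no characters that need JSON escaping (quotes, backslashes, control characters).

```bash
# Hypothetical helper: build the POST /api/v1/ota/jobs body.
# $1 = firmware ID, remaining arguments = device IDs.
# Assumes IDs need no JSON escaping.
build_ota_job_payload() {
  local firmware_id="$1"; shift
  local ids="" id
  for id in "$@"; do
    # Add ", " before every element except the first
    ids+="${ids:+, }\"${id}\""
  done
  printf '{"device_ids": [%s], "firmware_id": "%s"}\n' "$ids" "$firmware_id"
}

# Example usage:
# curl -X POST -H "Authorization: Bearer {TOKEN}" \
#   -H "Content-Type: application/json" \
#   "http://localhost:8000/api/v1/ota/jobs" \
#   -d "$(build_ota_job_payload "{firmware_id}" dev-1 dev-2)"
```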
Verify the Firmware Checksum
# Compute the SHA-256 checksum of the firmware file in MinIO by streaming it into sha256sum
docker exec zeus-minio mc cat local/firmware/{firmware_filename} | sha256sum
# Compare it with the checksum recorded in the database
docker exec -it zeus-postgres psql -U zeus -d zeus_db -c "
SELECT id, version, filename, checksum, file_size,
created_at
FROM firmware
WHERE id = '{firmware_id}';
"
# If the checksums do not match, re-upload the firmware
curl -X POST -H "Authorization: Bearer {TOKEN}" \
-F "file=@/path/to/firmware.bin" \
-F "version=v1.2.3" \
"http://localhost:8000/api/v1/firmware/upload"
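The manual comparison above can be scripted so a mismatch is reported explicitly. This sketch assumes the `checksum` column stores a hex-encoded SHA-256 (the algorithm is not stated here); `verify_sha256` is a hypothetical name. It reads the firmware bytes from stdin and normalizes the expected value to lowercase without whitespace.

```bash
# Hypothetical helper: compare the SHA-256 of stdin with an expected hex digest.
# $1 = expected checksum (case and surrounding whitespace are ignored)
verify_sha256() {
  local expected actual
  expected=$(printf '%s' "$1" | tr -d '[:space:]' | tr 'A-F' 'a-f')
  actual=$(sha256sum | awk '{print $1}')
  if [ "$actual" = "$expected" ]; then
    echo "match"
  else
    echo "mismatch expected=${expected} actual=${actual}"
  fi
}

# Example usage (no -it: a TTY would alter the captured and piped output):
# expected=$(docker exec zeus-postgres psql -U zeus -d zeus_db -t -A -c \
#   "SELECT checksum FROM firmware WHERE id = '{firmware_id}';")
# docker exec zeus-minio mc cat local/firmware/{firmware_filename} | verify_sha256 "$expected"
```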
Device Rollback
# Send a rollback command to the device (revert to the previous firmware)
mosquitto_pub -h localhost -p 1883 \
-t "zeus/{tenant_id}/{gateway_id}/ota/cmd" \
-m '{"action": "rollback"}'
# Monitor the rollback status (stops after 300 seconds)
timeout 300 mosquitto_sub -h localhost -p 1883 \
-t "zeus/{tenant_id}/{gateway_id}/ota/result" -v
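To use the rollback result in a script rather than reading it by eye, the status can be pulled out of the result payload. This sketch assumes the payload is JSON with a string `status` field (for example `{"status": "success"}`); that shape is an assumption, and `ota_result_status` is a hypothetical helper. It prints nothing if no `status` field is found.

```bash
# Hypothetical helper: print the "status" value from OTA result messages on stdin.
# Works with or without the topic prefix added by mosquitto_sub -v.
ota_result_status() {
  sed -n 's/.*"status" *: *"\([^"]*\)".*/\1/p'
}

# Example usage (-C 1 makes mosquitto_sub exit after the first message):
# timeout 300 mosquitto_sub -h localhost -p 1883 \
#   -t "zeus/{tenant_id}/{gateway_id}/ota/result" -C 1 | ota_result_status
```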
Manage the Batch OTA Operation
# Check the batch OTA status
docker exec -it zeus-postgres psql -U zeus -d zeus_db -c "
SELECT batch_id, status, COUNT(*) AS device_count
FROM ota_jobs
WHERE batch_id = '{batch_id}'
GROUP BY batch_id, status;
"
# Retry the failed devices (batch retry)
curl -X POST -H "Authorization: Bearer {TOKEN}" \
"http://localhost:8000/api/v1/ota/batches/{batch_id}/retry-failed"
# Reduce the number of devices updated concurrently (to relieve bandwidth)
# Set OTA_CONCURRENT_DEVICES=5 (default: 10), e.g. in docker-compose.yml, then recreate the container
docker compose up -d backend
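Lowering OTA_CONCURRENT_DEVICES trades nothing in total transfer time but gives each device more bandwidth. The arithmetic below is a simplified model (an assumption, not the backend's actual scheduler): devices are updated in waves of the concurrency limit, and each active device gets an equal share of the total bandwidth. `estimate_batch_ota` is a hypothetical helper.

```bash
# Hypothetical helper: rough batch OTA timing under an equal-share, wave-based model.
# $1 = number of devices, $2 = concurrency limit,
# $3 = firmware size in bytes, $4 = total bandwidth in bytes/s
estimate_batch_ota() {
  local devices="$1" concurrent="$2" size_bytes="$3" bandwidth_bps="$4"
  if (( concurrent <= 0 || bandwidth_bps <= 0 )); then
    echo "concurrency and bandwidth must be positive" >&2
    return 1
  fi
  # Number of waves = ceil(devices / concurrent)
  local waves=$(( (devices + concurrent - 1) / concurrent ))
  # Devices actually downloading at once (a small batch may not fill the limit)
  local active=$(( devices < concurrent ? devices : concurrent ))
  # Each active device gets about bandwidth / active, so one download takes
  # ceil(size * active / bandwidth) seconds
  local per_device_s=$(( (size_bytes * active + bandwidth_bps - 1) / bandwidth_bps ))
  echo "waves=${waves} per_device=${per_device_s}s total=$(( waves * per_device_s ))s"
}
```

Under this model, 20 devices with a 1 MB image on a 1 MB/s link take about 20 s in total with either 10 or 5 concurrent devices, but each device's download drops from about 10 s to about 5 s, which makes per-device timeouts less likely.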
Escalation
Escalate in the following cases:
- The device is bricked (no response at all after the OTA): notify the firmware team URGENTLY
- More than 50% of the devices in a batch OTA failed: notify the firmware and network teams
- The MinIO service is completely unreachable: notify the DevOps team
- Security issue: if the wrong firmware was deployed or there is a checksum mismatch, STOP immediately
- The rollback failed and the device remains offline: plan an on-site intervention
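The 50% threshold for batch failures can be checked directly from the batch status query. This sketch reads `status count` or `batch_id status count` lines (for example from that query run with `psql -t -A -F ' '`) and treats only the `failed` status as a failure, which is an assumption; `batch_failure_rate` is a hypothetical helper.

```bash
# Hypothetical helper: failure percentage of a batch from "... status count" lines.
# The last field is the count, the one before it is the status.
# Flags ESCALATE when strictly more than 50% of jobs failed.
batch_failure_rate() {
  awk '
    NF >= 2 {
      total += $NF
      if ($(NF - 1) == "failed") failed += $NF
    }
    END {
      if (total == 0) { print "no_jobs"; exit }
      pct = 100 * failed / total
      printf "failed=%d total=%d pct=%.1f%s\n", failed, total, pct, (pct > 50 ? " ESCALATE" : "")
    }
  '
}

# Example usage:
# docker exec zeus-postgres psql -U zeus -d zeus_db -t -A -F ' ' -c \
#   "SELECT batch_id, status, COUNT(*) FROM ota_jobs WHERE batch_id = '{batch_id}' GROUP BY batch_id, status;" \
#   | batch_failure_rate
```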