fix(push): don't delay overdue down checks after restart (#6866)#7067
fix(push): don't delay overdue down checks after restart (#6866)#7067dharunashokkumar wants to merge 5 commits intolouislam:masterfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Adjusts push monitor startup behavior so overdue/down detection isn’t delayed by a full interval after a server restart, addressing long-interval push monitors staying incorrectly UP.
Changes:
- Preloads the most recent heartbeat for push monitors on
start()to decide whether to run an immediate check. - For push monitors with an existing heartbeat, runs
safeBeat()immediately; for monitors with no prior heartbeat, keeps the initial grace delay.
| previousBeat = await R.findOne("heartbeat", " monitor_id = ? ORDER BY time DESC", [this.id]); | ||
| if (previousBeat) { | ||
| retries = previousBeat.retries; |
There was a problem hiding this comment.
The initial R.findOne("heartbeat", ...) for push monitors runs outside of safeBeat()/beat() error handling. If the DB query fails (e.g., transient DB lock or connection issue), start() will reject and the monitor may fail to start/resume. Consider wrapping this lookup in a try/catch (fall back to previousBeat = null) or moving it into safeBeat() so failures are handled consistently and the monitor can retry instead of aborting startup.
| previousBeat = await R.findOne("heartbeat", " monitor_id = ? ORDER BY time DESC", [this.id]); | |
| if (previousBeat) { | |
| retries = previousBeat.retries; | |
| try { | |
| previousBeat = await R.findOne("heartbeat", " monitor_id = ? ORDER BY time DESC", [this.id]); | |
| if (previousBeat) { | |
| retries = previousBeat.retries; | |
| } | |
| } catch (e) { | |
| console.trace(e); | |
| UptimeKumaServer.errorLog(e, false); | |
| log.error("monitor", "Failed to load previous heartbeat for push monitor, starting without previous state"); | |
| previousBeat = null; |
| if (this.type === "push") { | ||
| setTimeout(() => { | ||
| if (previousBeat) { | ||
| safeBeat(); | ||
| }, this.interval * 1000); | ||
| } else { | ||
| this.heartbeatInterval = setTimeout(safeBeat, this.interval * 1000); | ||
| } |
There was a problem hiding this comment.
This change alters push-monitor startup scheduling (immediate timeout evaluation when a prior heartbeat exists). There are backend monitor tests in test/backend-test/monitors/, but no regression test appears to cover push monitor restart behavior. Adding a unit/integration test that seeds a prior heartbeat and asserts start() triggers an immediate check (and does not wait a full interval) would help prevent regressions for long-interval push monitors.
Summary
In this pull request, the following changes are made:
for push monitors, if an existing heartbeat is already overdue at startup/restart, timeout is checked immediately instead of waiting one more full interval.
for fresh push monitors with no previous heartbeat, the initial grace delay is kept as-is.
this fixes long-interval push monitors staying up after they should be down.
Relates to Push monitors with long intervals (>24h) fail to detect DOWN status #6866
Resolves Push monitors with long intervals (>24h) fail to detect DOWN status #6866
Please follow this checklist to avoid unnecessary back and forth (click to expand)
I understand that I am responsible for and able to explain every line of code I submit.
Screenshots for Visual Changes
UPDOWN