Skip to content

mavlink_mission: GETLIST retry path can prevent mission protocol timeout from ever clearing a stuck upload #27132

@vpaulgreenwood

Description

@vpaulgreenwood

Summary

When PX4 is receiving a mission upload and stops getting the next MISSION_ITEM, the GETLIST retry branch can keep resending MISSION_REQUEST often enough that the protocol timeout handler never runs.

The result is that _transfer_in_progress can remain set until reboot, and later MISSION_COUNT messages are rejected with MAV_MISSION_ERROR.

Observed behavior

  • Mission upload stalls after PX4 has already entered MAVLINK_WPM_STATE_GETLIST.
  • PX4 keeps retrying MISSION_REQUEST.
  • The mission protocol never reaches the timeout path that calls switch_to_idle_state() and clears _transfer_in_progress.
  • Subsequent upload attempts are rejected in handle_mission_count() because _transfer_in_progress is still true.
  • Reboot clears the problem immediately.

Why this looks like a real bug

In affected trees, the timeout logic is ordered like this:

if (_state == MAVLINK_WPM_STATE_GETLIST && retry_timeout_expired) {
    send_mission_request(...);
} else if (_state != MAVLINK_WPM_STATE_IDLE && protocol_timeout_expired) {
    switch_to_idle_state();
    _transfer_in_progress = false;
}

If no new mission item is received, _time_last_recv keeps aging toward the protocol timeout, but the GETLIST retry branch keeps matching first every retry interval. That prevents the protocol-timeout branch from executing.

Reproduction outline

  1. Start a mission upload so PX4 enters MAVLINK_WPM_STATE_GETLIST.
  2. Let PX4 receive the initial mission metadata and at least one request transition into GETLIST.
  3. Drop or stop the next expected MISSION_ITEM from the uploader.
  4. Wait longer than MAVLINK_MISSION_PROTOCOL_TIMEOUT_DEFAULT.
  5. Observe that PX4 keeps resending MISSION_REQUEST and does not clear _transfer_in_progress.
  6. Send a fresh MISSION_COUNT.
  7. Observe that PX4 rejects it because _transfer_in_progress is still set.

Scope

This is closely related to older mission-transfer timeout bugs, but this report is specifically about retry ordering preventing the timeout cleanup from ever running.

Current upstream main may need the same ordering review for any resend branch that sits ahead of protocol-timeout handling, including the newer SENDLIST resend path for dropped MISSION_COUNT.

Proposed fix

Handle protocol timeout before retry resends:

if (_state != MAVLINK_WPM_STATE_IDLE && protocol_timeout_expired) {
    switch_to_idle_state();
    _transfer_in_progress = false;
} else if (_state == MAVLINK_WPM_STATE_GETLIST && retry_timeout_expired) {
    send_mission_request(...);
}

That preserves normal retry behavior before the timeout window, but once the protocol has been silent too long, timeout cleanup wins and PX4 can accept a fresh upload again.

Field impact

On low-bandwidth or lossy links, this can leave the vehicle requiring a reboot to recover mission upload capability even though the protocol already has timeout cleanup logic intended to recover from exactly this kind of stall.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions