Flight Controller Crashes/Resets on USB Device Reset / Replug / Reinit from Host device

Description

Steps to reproduce
1. flight controller is connected via USB
2. USB device is

  • reset without unplugging it, for example
    explicit reset via ioctl(fd, USBDEVFS_RESET, 0);
    implicit reset due to host computer reboot, by OS/BIOS

  • physically unplugged and then replugged (while FC remains powered through servo headers) and subsequently initialized by the OS

expected behaviour

  • graceful USB client device re-initialization. No effect on higher order firmware functions or flight control

actual behaviour

  • flight controller reboots (even if in flight , if USB connection is active in flight)

Debug Trace: (useless)
Program received signal SIGTRAP, Trace/breakpoint trap.
0x08000bd4 in ?? ()
(gdb) bt
#0 0x08000bd4 in ?? ()
#1 0xffffffff in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Code to reproduce:

find out dev tree of flight controller and explicitly reset it by sending aproprate ioctl:
on commandline:
$> lsusb | grep Clay
Bus 001 Device 117: ID 20a0:415e Clay Logic

usbreset /dev/bus/usb/001/117

with usbreset code snippet from
https://askubuntu.com/questions/645/how-do-you-reset-a-usb-device-from-the-command-line

Additional info:
If GCS is connected via USBHID, and the reset is sent, controller does NOT reboot, but the telemetry stream is interrupted. Then two things can happen.
1.
If GCS is manually disconnected and reconnected to the USB device (which might change device ID - sometimes), then this step can be reproduced indefinitely

2. if GCS is kept in its degraded connection state or simply closed and not reconnected, any further RESET ioctl will crash/reset the flight controller. the same happens if GCS was never connected

Environment

STM32f4 for example Openpilot Revolution, revomini, ...

Activity

Show:
Eric Price
October 12, 2018, 3:23 PM
Eric Price
October 15, 2018, 11:35 AM

I updated the pull request. The ISR now works as before, but also checks for the failure case of no data being pushed into the fifo (stalled) separately.

This prevents the reboots, The problem is that a soft (ioctl) or hard (replugging) reset of the USB connection currently makes the HID telemetry unuseable - even if it hadn't been connected yet.

The reboots triggered by the replugging masked this issue so far. But I am tempted to consider this a separate bug, which might be more tricky to fix.

PIOS_COM_USB_**** don't properly handle their underlying device resetting underneath them. Both USBHID and VCP can enter unuseable states, although the exact symptoms differ.

I don't know if the cause is in PIOS_COM_USB_* or in the STM USB device library or both.

Some observations:

PIOS_COM_SendBufferNonBlockingInternal() in pios_com checks via
(com_dev->driver->available(com_dev->lower_id)) if the underlying device has gotten away and if so, flushes the pios_com tx buffer.

but this gets rechecked only every 5 seconds, if the device is nonfunctional thanks to pios.com.c line 592:
if (xSemaphoreTake(com_dev->tx_sem, 5000) != pdTRUE)

that means this flushing never takes place if the device went away for less than 5 seconds

this does not clear low level driver hardware buffer and states in pios_usb_hid.c and pios_usb_cdc.c

pios_usb.c has an IRQ handler for the hardware state
PIOS_USB_ChangeConnectionState() which gets triggered through usbhooks if the USB got unplugged or replugged (which is correctly triggered by both a hard replug and a soft replug)

however this information never reaches the endpoint drivers for hid and vcp.

the COM_Available() function for HID and vcp each call pios_usb.c COM_Available() function for the main connection and don't store internal states themselves.

the code in PIOS_USB_CheckAvailable() in pios_usb.c is unclean.

One major issue is that it checks for USB port availability by means of the 5V line being powered (checking a gpio) which is no guarantee for a established USB connection (the device might not have booted but is powered) and it reports false if a non-standard usb cable doesn't supply power but data only. So this check is prone to both false positives and false negatives. it might be saver to check a state set in PIOS_USB_ChangeConnectionState()

This is IMHO also the right place to trigger the DisconnectionCallback, not in Available() which is only periodically polled and might not get called in time to notice that the connection ever went away.

unfortunately the reconnect() function in pios_usbhooks.c only resets the overall usb state and not the endpoint drivers internal states.

it might be necessary to have a reconnect() hook inside each endpoint driver active, which would have to do ... what? reset buffers and states? reinit the com device?

I don't know. Any insight from anyone?

Eric Price
October 15, 2018, 1:44 PM

I did a bit more checking. it looks like the issue is the sending from flight controller to computer. receiving seems to work. some hacking with manual fifo flushes in usb_dcd_int.c looks more promising than changes in the pios_com layer

Eric Price
October 16, 2018, 10:17 AM

I updated the pull request.

The only thing that can get the USBHID to work again from its degraded fifo state is a complete USB stack reset with reenumeration of both core and endpoint devices.
Since this only happens on a replug or explicit USB RESET ioctl from host, this is probably the correct behaviour in such a case anyway.

To make this happen, I had to change pios_usb.c and pios_usbhook.c

Among the changes is a redesign of PIOS_USB_CheckAvailable() which now checks exclusively for connection state instead of +5V headers (cable connected) and is no longer responsible for dispatching the reconnection dispatcher (which is instead done from the connection state callback as it should be)

This has implications for the bootloader, which checked if a USB cable was detected by calling PIOS_USB_CheckAvailable() instead of PIOS_USB_CableConnected() as it probably should have (USB only really becomes available after a link has been established, which only happens after init, which bootloader only does if it detects a cable)

implementing this defined but unimplemented function and using it in the bootloader fixes this.

this is in effect not a change in bootloader behaviour, as the old PIOS_USB_CheckAvailable() did the exact same thing (check if a cable was plugged in) so this probably does not warrant a new bootloader version, nor requires a bootloader update. the chance that the bug fixed here gets triggered in bootloader are very slim (although theoretically possible)

I would consider LP602 fixed, pending code review in https://bitbucket.org/librepilot/librepilot/pull-requests/518/lp-602-workaround-fix-bug-in-stm32-usb-lib/diff

if you spot any issues, please note here, in pull request, or send me a PM!

Eric Price
October 16, 2018, 2:00 PM

update: there was one small issue with VCP (pios_usb_cdc.c) stopping to listen for incoming packets after a USB reset

to fix this I had to reintroduce rx_status tracking in usb_cdc in order to re-enable rx after a reset if it wa enabled before the reset.

Assignee

Eric Price

Reporter

Eric Price

Labels

None

Components

Fix versions

Priority

Medium
Configure