uhd_find_devices / libusb1_base seg faults #615

Closed
jpalladino opened this issue Jul 28, 2022 · 4 comments

Issue Description

Using UHD 4.1.0.5 on either Ubuntu 18.04 or Ubuntu 20.04 machines, we occasionally see seg faults when executing uhd_find_devices. We traced this to the global session management in 'host/lib/transport/libusb1_base.cpp', specifically 'libusb::session::sptr libusb::session::get_global_session(void)'. The function first checks whether a global session exists; if one does, it then returns a pointer to that session. Occasionally the session appears to expire between the check and the return, so get_global_session returns an empty shared pointer. We have reproduced this on many different host machines.

Setup Details

UHD 4.1.0.5 / Ubuntu 18.04 or 20.04.
Run uhd_find_devices.

Expected Behavior

No Seg Fault

Actual Behaviour

Occasional seg faults.

Steps to reproduce the problem

To reproduce the issue, I ran the following loop:
while true; do date; uhd_find_devices; sleep 6; done
Left running, this produced anywhere from 1 to roughly 100 seg faults over 24 hours.

Additional Information

When the seg fault occurs, the terminal shows:

Mon Jul 25 08:16:00 EDT 2022
[INFO] [UHD] linux; GNU C++ version 7.5.0; Boost_106501; UHD_4.1.0.HEAD-0-g6bd0be9c
[DEBUG] [MPMD] Discovering MPM devices on port 49600
[DEBUG] [MPMD] Discovering MPM devices on port 49600
[DEBUG] [MPMD] Discovering MPM devices on port 49600
Segmentation fault (core dumped)

Running "dmesg -T" afterwards shows:

[Mon Jul 25 08:14:56 2022] uhd_find_device[30881]: segfault at 0 ip 00007f208cc7efd5 sp 00007f2082ffc500 error 4 in libuhd.so.4.1.0[7f208c2f6000+cb8000]
[Mon Jul 25 08:14:56 2022] Code: 48 c7 47 18 00 00 00 00 48 89 07 48 8d 47 08 48 8d 7c 24 30 48 89 44 24 18 e8 67 be ff ff 48 8b 7c 24 30 48 8d 1d db ca ff ff <48> 8b 07 48 8b 40 10 48 39 d8 0f 85 cb 01 00 00 48 39 d8 0f 85 d9

We were able to capture some core dumps. A backtrace in gdb showed:

#0  libusb_session_impl::get_context (this=0x0)
    at /opt/gnuradio/v3.8/src/uhd/host/lib/transport/libusb1_base.cpp:53
#1  uhd::transport::libusb::device_list::make ()
    at /opt/gnuradio/v3.8/src/uhd/host/lib/transport/libusb1_base.cpp:180
#2  0x00007f7ecf2b295a in uhd::transport::usb_device_handle::get_device_list (
    vid_pid_pair_list=std::vector of length 1, capacity 1 = {...})
    at /opt/gnuradio/v3.8/src/uhd/host/lib/transport/libusb1_base.cpp:475
#3  0x00007f7ecf2b30e9 in uhd::transport::usb_device_handle::get_device_list (
    vid=<optimized out>, pid=<optimized out>)
    at /opt/gnuradio/v3.8/src/uhd/host/lib/transport/libusb1_base.cpp:468
#4  0x00007f7ecf10bf59 in b100_find (hint=...)
    at /opt/gnuradio/v3.8/src/uhd/host/lib/usrp/b100/b100_impl.cpp:69
#5  0x00007f7ecf0213c0 in std::_Function_handler<std::vector<uhd::device_addr_t, std::allocator<uhd::device_addr_t> > (uhd::device_addr_t const&), std::vector<uhd::device_addr_t, std::allocator<uhd::device_addr_t> > (*)(uhd::device_addr_t const&)>::_M_invoke(std::_Any_data const&, uhd::device_

Adding a debug message like the following prints the session pointer. When a seg fault occurs, the pointer prints as "0x0". In runs that do not seg fault, it prints a normal, non-null pointer value.

{
public:
    libusb_device_list_impl(void)
    {
        libusb::session::sptr sess = libusb::session::get_global_session();
        UHD_LOGGER_DEBUG("JIMDEBUG") << "Global Session Pointer: " << sess;
        sess->get_context();

To make the problem occur much more frequently, add something that takes time after line 102 in libusb1_base.cpp. With a log message added as follows, the seg fault occurs almost every time uhd_find_devices is run:

// not expired -> get existing session
if (not global_session.expired()){
   UHD_LOGGER_DEBUG("JIMDEBUG")
           << "Using old GS pointer.";
   return global_session.lock();
}

Potential Fix

I modified lines 102 and 103 and changed them from:

if (not global_session.expired())
   return global_session.lock();

to

if (auto g_session_ptr = global_session.lock())
    return g_session_ptr;

After rebuilding with this change, we no longer see any seg faults (with multiple hosts running the uhd_find_devices loop for several days). I believe the fix works because global_session.lock() checks for expiration and creates a shared pointer in a single step. That shared pointer holds ownership and keeps the session alive until get_global_session returns (provided the session had not already expired before the call to global_session.lock()). I don't know whether this is the most appropriate fix, as I'm far from an expert in this area.

Thanks,
Jim

jpalladino (Author) commented Jan 20, 2023

In the original post, I reported the issue on 4.1.0.5. I can confirm that it is still present in UHD 4.3.0.0. However, the potential fix I posted above doesn't seem to help: I'm still getting occasional seg faults. Using UHD 4.3.0.0 on Ubuntu 20.04, the output of uhd_find_devices when it seg faults looks like:

[INFO] [UHD] linux; GNU C++ version 9.4.0; Boost_107100; UHD_ libusb: debug [libusb_get_device_descriptor] Segmentation fault (core dumped)

Thanks,
Jim

mbr0wn (Contributor) commented May 26, 2025

This clearly flew under our radar, despite the fantastic bug report. As @jcmartin noticed in #860, there is another place in the session management where we used the incorrect expired()->lock() sequence. With the merging of #860, this should be history.

joergho closed this as completed May 27, 2025

joergho (Contributor) commented May 27, 2025

The fix will be in master soon.

mbr0wn (Contributor) commented May 27, 2025

FYI, I changed some minor formatting and added some info to the commit message.
