Skip to content

Fixes #615 Remove expired() to remove race condition #860

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

jcmartin
Copy link
Contributor

Pull Request Details

Presently, there is a race condition where the pointer may be freed
between calling a weak pointer expired() and then calling lock()
results in a null pointer being returned. This results in a segfault in the
function expecting a non null function.

Description

This change was required because this race condition did come up in
practice resulting in segfaults. The following is one
such segfault in libusb_device_list_impl(), which expects a pointer
to the session to be non null.

Code expecting a non-null session (sess):

        libusb::session::sptr sess = libusb::session::get_global_session();

        // allocate a new list of devices
        libusb_device** dev_list;
        ssize_t ret = libusb_get_device_list(sess->get_context(), &dev_list);

The stack trace:

* thread #99, stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x0000000123c1aff4 libuhd.4.8.0.dylib`libusb_device_list_impl::libusb_device_list_impl() + 68
    frame #1: 0x0000000123c18678 libuhd.4.8.0.dylib`uhd::transport::usb_device_handle::get_device_list(std::__1::vector<std::__1::pair<unsigned short, unsigned short>, std::__1::allocator<std::__1::pair<unsigned short, unsigned short> > > const&) + 64
    frame #2: 0x0000000123c1860c libuhd.4.8.0.dylib`uhd::transport::usb_device_handle::get_device_list(unsigned short, unsigned short) + 76
    frame #3: 0x0000000123a53d0c libuhd.4.8.0.dylib`b100_find(uhd::device_addr_t const&) + 1380
    frame #4: 0x0000000123c7e658 libuhd.4.8.0.dylib`std::__1::__async_assoc_state<std::__1::vector<uhd::device_addr_t, std::__1::allocator<uhd::device_addr_t> >, std::__1::__async_func<uhd::device::find(uhd::device_addr_t const&, uhd::device::device_filter_t)::$_0> >::__execute() + 56
    frame #5: 0x0000000123c7e908 libuhd.4.8.0.dylib`void* std::__1::__thread_proxy[abi:ue170006]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (std::__1::__async_assoc_state<std::__1::vector<uhd::device_addr_t, std::__1::allocator<uhd::device_addr_t> >, std::__1::__async_func<uhd::device::find(uhd::device_addr_t const&, uhd::device::device_filter_t)::$_0> >::*)(), std::__1::__async_assoc_state<std::__1::vector<uhd::device_addr_t, std::__1::allocator<uhd::device_addr_t> >, std::__1::__async_func<uhd::device::find(uhd::device_addr_t const&, uhd::device::device_filter_t)::$_0> >*> >(void*) + 72
    frame #6: 0x000000018be62f94 libsystem_pthread.dylib`_pthread_start + 136

The following is the exact location (->) where the segfault occurs in the assembly.
You will note that the assembly line corresponds to the dereference of the sess
pointer.

    0x123c1afe0 <+48>:  str    xzr, [x19, #0x8]!
    0x123c1afe4 <+52>:  stp    xzr, xzr, [x0, #0x10]
    0x123c1afe8 <+56>:  add    x8, sp, #0x20
    0x123c1afec <+60>:  bl     0x123c173b4               ; uhd::transport::libusb::session::get_global_session()
    0x123c1aff0 <+64>:  ldr    x0, [sp, #0x20]
->  0x123c1aff4 <+68>:  ldr    x8, [x0]
    0x123c1aff8 <+72>:  ldr    x8, [x8, #0x10]
    0x123c1affc <+76>:  blr    x8
    0x123c1b000 <+80>:  add    x1, sp, #0x18
    0x123c1b004 <+84>:  bl     0x123caf548               ; symbol stub for: libusb_get_device_list
    0x123c1b008 <+88>:  mov    x21, x0
    0x123c1b00c <+92>:  tbnz   x0, #0x3f, 0x123c1b110    ; <+352>
    0x123c1b010 <+96>:  cbz    x21, 0x123c1b0b4          ; <+260>
    0x123c1b014 <+100>: mov    x23, #0x0
    0x123c1b018 <+104>: adrp   x24, 783
    0x123c1b01c <+108>: add    x24, x24, #0xd18          ; vtable for std::__1::__shared_ptr_pointer<libusb_device_impl*, std::__1::shared_ptr<uhd::transport::libusb::device>::__shared_ptr_default_delete<uhd::transport::libusb::device, libusb_device_impl>, std::__1::allocator<libusb_device_impl> > + 16
    0x123c1b020 <+112>: mov    x25, #-0x1
    0x123c1b024 <+116>: b      0x123c1b03c               ; <+140>
    0x123c1b028 <+120>: stp    x22, x0, [x8], #0x10
    0x123c1b02c <+124>: str    x8, [x20, #0x10]
    0x123c1b030 <+128>: add    x23, x23, #0x1
    0x123c1b034 <+132>: cmp    x21, x23
    0x123c1b038 <+136>: b.eq   0x123c1b0b4               ; <+260>
    0x123c1b03c <+140>: mov    w0, #0x20
    0x123c1b040 <+144>: bl     0x123caf230               ; symbol stub for: operator new(unsigned long)
    0x123c1b044 <+148>: mov    x22, x0
    0x123c1b048 <+152>: ldr    x8, [sp, #0x18]
    0x123c1b04c <+156>: ldr    x1, [x8, x23, lsl #3]
    0x123c1b050 <+160>: bl     0x123c1b35c               ; libusb_device_impl::libusb_device_impl(libusb_device*)

Changes were made in accordance to how CPP reference suggests dealing
with weak pointers: https://en.cppreference.com/w/cpp/memory/weak_ptr/lock

Related Issue

#615

Which devices/areas does this affect?

This affects all devices as the code for checking a USRP exists
is not limited to if one has the USRP. I have managed to get it
to segfault without any device connected at all!

Testing Done

Since this is a race condition, one needs to luck out with the timing
such that the pointer is freed between expired() and lock(). The
error could occur in 5 minutes or 5 hours.

To test the change, I repeatedly called uhd_usrp_find for severals
without any segfaults.

Checklist

  • I have read the CONTRIBUTING document.
  • My code follows the code style of this project. See CODING.md.
  • N/A I have updated the documentation accordingly.
  • I have added tests to cover my changes, and all previous tests pass.
  • N/AI have checked all compat numbers if they need updating (FPGA compat,
    MPM compat, noc_shell, specific RFNoC block, ...)

Copy link
Contributor

@mbr0wn mbr0wn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fantastic, thanks @jcmartin. Looks like the newer parts have a correct handling, but I prefer your notation, also making it consistent, as well as the link to cppreference.com.

@joergho joergho closed this May 27, 2025
@github-actions github-actions bot locked and limited conversation to collaborators May 27, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants