[lldb-dap] Refactor lldb-dap event handling. #139669

ashgti · 2025-05-13T06:10:22Z

Refactor lldp-dap to use a lldb_private::MainLoop to handle events and protocol messages.

This should ensure more uniform handling of debugger state by ensuring only one action is being performed at a time. The MainLoop handles both protocol messages and events. As a result, we can remove some API locks and mutex locks that are not needed.

Refactor lldp-dap to use a `lldb_private::MainLoop` to handle events and protocol messages. This should ensure more uniform handling of debugger state by ensuring only one action is being performed at a time. The MainLoop handles both protocol messages and events. As a result, we can remove some API locks and mutex locks that are not needed.

ashgti · 2025-05-13T06:14:11Z

The tests are passing for me locally on macOS, but I need to check Linux at least before this is fully ready for review. I'll let the CI job run and go from there.

JDevlieghere · 2025-05-13T06:18:43Z

I added @labath as a reviewer since he's the MainLoop architect and because we briefly discussed this at EuroLLVM.

I only glanced at this and I'll take a more in-depth look tomorrow. One thing I remember form reading the MainLoop docs is that the class itself is not thread safe, so I think we may need our own synchronization when adding callbacks. This applies more broadly, but especially this patch might be worth building this with TSan and seeing if that catches anything.

labath · 2025-05-13T10:08:04Z

I also haven't looked at it in detail yet, but I think it makes sense overall. I'm not saying it has to be done -- it depends on where we want to take this. Doing everything on one thread makes it easy to avoid races and it's a strategy I generally like, but it also kind of goes against some of our other goals of making things faster by doing work in parallel. But then again, it's only "kind of" since it's still possible to introduce more controlled parallelism to -- I don't know -- fetch inferior threads in parallel if it makes sense. And that may actually be better in the long run.

Since I'm not going to be doing this work, I'll let you figure out the direction here.

I think the patch could use some splitting up. There's a bunch of typo fixes and other things that could go separately. And what's up with all the test changes -- shouldn't this be "NFC" ?

Jonas is right that class isn't (completely) thread safe. That's because it originally started out as a completely single-process thing -- which means there are no threads to synchronise. It's grown beyond that now though, and now it has some synchronization. Specifically, AddPendingCallback is thread-safe (but not async-signal-safe -- I think that's the part that Jonas is remembering). The thing that's not thread-safe is RegisterReadObject. Right now it's possible to work around that by doing something like loop.AddPendingCallback([&] { loop.RegisterReadObject(); }); but if you find yourself needing to do that, maybe we can add that directly to the MainLoop class.

ashgti · 2025-05-13T15:44:41Z

I think the patch could use some splitting up. There's a bunch of typo fixes and other things that could go separately. And what's up with all the test changes -- shouldn't this be "NFC" ?

I can split out some of the type fixes and some other changes I ended up lumping in here.

This isn't really a NFC because this is forcing some things to be serial that were previously not serial. We previously could trigger an event in the middle of a request handler and now we cannot. That forced some tests to change behavior slightly to figure out some synchronization points in tests.

As an aside, I think a number of our tests in general are underspecified about the expected state. If you look in the logs a number of tests are sending 'continue' events when the process isn't paused, which I think may be contributing to our test stability.

The thing that's not thread-safe is RegisterReadObject. Right now it's possible to work around that by doing something like loop.AddPendingCallback([&] { loop.RegisterReadObject(); }); but if you find yourself needing to do that, maybe we can add that directly to the MainLoop class.

Right now, I'm not actually using any read objects, just using the MainLoop to handle dispatching events. I think we can't use the MainLoop for reading from stdin on Windows yet. If it did work with stdin on Windows then we could use RegisterReadObject for handling events and remove a thread.

labath · 2025-05-13T16:14:10Z

I can split out some of the type fixes and some other changes I ended up lumping in here.

That would be great. Thanks.

This isn't really a NFC because this is forcing some things to be serial that were previously not serial. We previously could trigger an event in the middle of a request handler and now we cannot. That forced some tests to change behavior slightly to figure out some synchronization points in tests.

I believe you, but I don't understand. Since this restricts the set of possible message orderings (does it?), I would naively expect that a test which worked previously (with the wider set of orderings) would continue to work afterwards.

The only way I can imagine this not being true is if the tests weren't 100% correct (i.e., they were flaky), but whereas previously the "bad" case would happen 0.1% of the time, this patch now makes it the 100% case. And it's easier to make the test always expect the (previously) bad case than it is to accept all of the technically legal orderings. Is that's what happening? Maybe an example would help...

Right now, I'm not actually using any read objects, just using the MainLoop to handle dispatching events. I think we can't use the MainLoop for reading from stdin on Windows yet. If it did work with stdin on Windows then we could use RegisterReadObject for handling events and remove a thread.

Yeah, that unfortunately doesn't work right now. I would like to make that work one day, but my priority right now is for making it work with pipes (insomuch as you can call this a priority).

ashgti · 2025-05-13T18:09:24Z

I believe you, but I don't understand. Since this restricts the set of possible message orderings (does it?), I would naively expect that a test which worked previously (with the wider set of orderings) would continue to work afterwards.

The only way I can imagine this not being true is if the tests weren't 100% correct (i.e., they were flaky), but whereas previously the "bad" case would happen 0.1% of the time, this patch now makes it the 100% case. And it's easier to make the test always expect the (previously) bad case than it is to accept all of the technically legal orderings. Is that's what happening? Maybe an example would help...

I think our tests are not fully specifying their expected state. For example, lldb/test/API/tools/lldb-dap/console/TestDAP_console.py TestDAP_console.test_diagnositcs was performing an evaluate and then using get_important to fetch output events with category 'important'.

With the change, the important output was always coming after the request was handled. I updated the test to use a self.collect_important instead of get_important. Previously, this could have been emitted while the request was being handled, but now the output would only ever be emitted after the request handler is finished.

ashgti · 2025-05-13T23:28:06Z

Thinking about this some more, would it be possible to treat everything as an SBEvent? We could use the DAP::broadcaster to create events for the DAP protocol messages we receive, right? Then when we listen for events we'd just need to check if the event was a DAPBroadcasterBits::eBroadcastBitProtocolMessage as the event type and handle protocol messages in the same loop we're handling events.

This would have a similar effect as using a MainLoop but unifies the event thread with the DAP handler. We'd still need to transport thread to parse messages and add events to the broadcaster.

labath · 2025-05-14T09:02:32Z

I think our tests are not fully specifying their expected state. For example, lldb/test/API/tools/lldb-dap/console/TestDAP_console.py TestDAP_console.test_diagnositcs was performing an evaluate and then using get_important to fetch output events with category 'important'.

With the change, the important output was always coming after the request was handled. I updated the test to use a self.collect_important instead of get_important. Previously, this could have been emitted while the request was being handled, but now the output would only ever be emitted after the request handler is finished.

Okay, so it's kind of what I said, right? The "important" output was coming before the message was handled most of the time, although it wasn't guaranteed, and the test could fail if that happens. Now, it always comes too late.

If that's true, then the test change to use self.collect_important could be made before the functionality change, right?

Thinking about this some more, would it be possible to treat everything as an SBEvent? We could use the DAP::broadcaster to create events for the DAP protocol messages we receive, right? Then when we listen for events we'd just need to check if the event was a DAPBroadcasterBits::eBroadcastBitProtocolMessage as the event type and handle protocol messages in the same loop we're handling events.

This would have a similar effect as using a MainLoop but unifies the event thread with the DAP handler. We'd still need to transport thread to parse messages and add events to the broadcaster.

Yeah, that should be possible, and it may make more sense in a world where the MainLoop cannot listen on all FD types (since you need the forwarding thread anyway). It's unfortunate that there's no synchronization operation (at least, not a portable one, FUTEX_FD seems kinda nice) that allows you do wait for condition variables and FDs, necessitating these forwarding threads. Since forwarding would add a bit of latency, one of the factors would be which kinds of operations do we want to make slower.

ashgti · 2025-05-14T17:59:11Z

Yeah, that should be possible, and it may make more sense in a world where the MainLoop cannot listen on all FD types (since you need the forwarding thread anyway). It's unfortunate that there's no synchronization operation (at least, not a portable one, FUTEX_FD seems kinda nice) that allows you do wait for condition variables and FDs, necessitating these forwarding threads. Since forwarding would add a bit of latency, one of the factors would be which kinds of operations do we want to make slower.

We could adjust the lldb-dap VSCode extension to launch lldb-dap using a named pipe or local tcp port on Windows. Then we'd uniformly be able to support reading the input using the existing lldb NativeFile/Socket helpers. See https://code.visualstudio.com/api/references/vscode-api#DebugAdapterNamedPipeServer

ashgti · 2025-05-15T18:34:53Z

I'm working on splitting out the test changes, #140107 is the main set that I think are related. If so, this may become a more of a NFC from the test perspective.

labath · 2025-05-16T07:29:26Z

Yeah, that should be possible, and it may make more sense in a world where the MainLoop cannot listen on all FD types (since you need the forwarding thread anyway). It's unfortunate that there's no synchronization operation (at least, not a portable one, FUTEX_FD seems kinda nice) that allows you do wait for condition variables and FDs, necessitating these forwarding threads. Since forwarding would add a bit of latency, one of the factors would be which kinds of operations do we want to make slower.

We could adjust the lldb-dap VSCode extension to launch lldb-dap using a named pipe or local tcp port on Windows. Then we'd uniformly be able to support reading the input using the existing lldb NativeFile/Socket helpers. See https://code.visualstudio.com/api/references/vscode-api#DebugAdapterNamedPipeServer

That might be nice. I didn't realize that was an option.

I'm working on splitting out the test changes, #140107 is the main set that I think are related. If so, this may become a more of a NFC from the test perspective.

👍

JDevlieghere requested a review from labath May 13, 2025 06:12

ashgti mentioned this pull request May 13, 2025

[lldb-dap] While handling events, grab the APIMutex for consistency. #139596

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[lldb-dap] Refactor lldb-dap event handling. #139669

[lldb-dap] Refactor lldb-dap event handling. #139669

Uh oh!

ashgti commented May 13, 2025

Uh oh!

ashgti commented May 13, 2025

Uh oh!

JDevlieghere commented May 13, 2025

Uh oh!

labath commented May 13, 2025

Uh oh!

ashgti commented May 13, 2025

Uh oh!

labath commented May 13, 2025

Uh oh!

ashgti commented May 13, 2025

Uh oh!

ashgti commented May 13, 2025

Uh oh!

labath commented May 14, 2025

Uh oh!

ashgti commented May 14, 2025

Uh oh!

ashgti commented May 15, 2025

Uh oh!

labath commented May 16, 2025

Uh oh!

Uh oh!

[lldb-dap] Refactor lldb-dap event handling. #139669

Are you sure you want to change the base?

[lldb-dap] Refactor lldb-dap event handling. #139669

Uh oh!

Conversation

ashgti commented May 13, 2025

Uh oh!

ashgti commented May 13, 2025

Uh oh!

JDevlieghere commented May 13, 2025

Uh oh!

labath commented May 13, 2025

Uh oh!

ashgti commented May 13, 2025

Uh oh!

labath commented May 13, 2025

Uh oh!

ashgti commented May 13, 2025

Uh oh!

ashgti commented May 13, 2025

Uh oh!

labath commented May 14, 2025

Uh oh!

ashgti commented May 14, 2025

Uh oh!

ashgti commented May 15, 2025

Uh oh!

labath commented May 16, 2025

Uh oh!

Uh oh!