[lldb] Change lldb's breakpoint handling behavior #96260

Merged
[lldb] Change lldb's breakpoint handling behavior
lldb today has two rules.  First, when a thread stops at a BreakpointSite,
we set the thread's StopReason to "breakpoint hit", regardless of
whether we've actually hit the breakpoint or have merely stopped
*at* the breakpoint instruction/point and haven't tripped it yet.
Second, when resuming a process, any thread sitting at a
BreakpointSite is silently stepped over the BreakpointSite, because
we've already flagged the breakpoint hit when we stopped there
originally.

In this patch, I change lldb to only set a thread's stop reason to
breakpoint-hit when we've actually executed the instruction/triggered
the breakpoint.  When we resume, we only silently step past a
BreakpointSite that we've registered as hit.  We preserve this state
across inferior function calls that the user may do while stopped,
etc.

Also, when a user adds a new breakpoint at $pc while stopped, or
changes $pc to be the address of a BreakpointSite, we will silently
step past that breakpoint when the process resumes.  This is purely
a UX call; I don't think anyone wants to set a breakpoint at $pc
and then hit it immediately on resuming.

One non-intuitive UX consequence of this change, which I'm convinced
is necessary:  If you're stopped at a BreakpointSite that has not yet
executed and you `stepi`, you will hit the breakpoint and the pc will
not advance.  The thread has not completed its stepi, and the
thread plan is still on the stack.  If you then `continue` the
thread, lldb will stop and say "instruction step completed",
one instruction past the BreakpointSite.  You can continue a second
time to resume execution.  I discussed this with Jim, and trying
to paper over this behavior will lead to more complicated scenarios
behaving non-intuitively.  Mostly it was the testsuite that was
trying to instruction step past a breakpoint and getting thrown off,
and I changed those tests to expect the new behavior.

The bugs driving this change are all from lldb dropping the real
stop reason for a thread and setting it to breakpoint-hit when that
was not the case.  Jim hit one where we have an aarch64 watchpoint
that triggers one instruction before a BreakpointSite.  On this
arch we are notified of the watchpoint hit after the instruction
has been unrolled -- we disable the watchpoint, instruction step,
re-enable the watchpoint and collect the new value.  But now we're
on a BreakpointSite so the watchpoint-hit stop reason is lost.

Another was reported by ZequanWu in
https://discourse.llvm.org/t/lldb-unable-to-break-at-start/78282 :
when we attach to or launch a process with the pc at a BreakpointSite,
we misbehave.  Caroline Tice mentioned it is also a problem they've
had with putting a breakpoint on _dl_debug_state.

The change to each Process plugin that does execution control
is as follows:

1. If we've stopped at a BreakpointSite (whether we hit it or not),
we call Thread::SetThreadStoppedAtBreakpointSite(pc) to record the
state at the point when the thread stopped.  (so we can detect
newly-added breakpoints, or when the pc is changed to an instruction
that is a BreakpointSite)

2. When we have actually hit a breakpoint, and it is enabled for
this thread, we call Thread::SetThreadHitBreakpointAtAddr(pc) so
we know that it should be silently stepped past when we resume
execution.

When resuming, we silently step over a breakpoint if we've hit it,
or if it is newly added (or the pc was changed to an existing
BreakpointSite).
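
The resume rule above can be sketched as a small standalone decision function. This is only an illustration of the rule as stated in this description, not the code in the patch: `ThreadStopState`, `ShouldStepOverOnResume`, and `kInvalidAddress` are hypothetical names, and the breakpoint site list is modeled as a plain set of addresses.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

using addr_t = std::uint64_t;
// Stand-in for LLDB_INVALID_ADDRESS (assumption: "no address recorded").
constexpr addr_t kInvalidAddress = UINT64_MAX;

// Hypothetical, simplified mirror of the two per-thread values the patch adds.
struct ThreadStopState {
  addr_t hit_bp_at_addr = kInvalidAddress;    // site we actually executed/trapped
  addr_t bpsite_at_stop_pc = kInvalidAddress; // site under the pc when we stopped
};

// Returns true if the breakpoint site under `pc` should be silently stepped
// over when the thread resumes.
bool ShouldStepOverOnResume(const ThreadStopState &state, addr_t pc,
                            const std::set<addr_t> &breakpoint_sites) {
  if (breakpoint_sites.count(pc) == 0)
    return false; // No breakpoint site at the current pc; nothing to step over.
  if (state.hit_bp_at_addr == pc)
    return true; // Already reported as hit at the last stop.
  if (state.bpsite_at_stop_pc != pc)
    return true; // Site was added after the stop, or the pc was moved onto it.
  return false;  // Stopped at the site without executing it: let it trigger.
}
```

With a site at 0x1000, a thread that stopped there without executing it gets `false` (the breakpoint fires on resume), while a thread that hit it, or one whose stop recorded no site at all, gets `true`.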

The biggest set of changes is to StopInfoMachException where we
translate a Mach Exception into a stop reason.  The Mach exception
codes differ in a few places depending on the target (unambiguously),
and I didn't want to duplicate the new code for each target so I've
tested what mach exceptions we get for each action on each target,
and reorganized StopInfoMachException::CreateStopReasonWithMachException
to document these possible values, and handle them without specializing
based on the target arch.
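
As a rough illustration of the target-independent handling described above, the sketch below maps an EXC_BREAKPOINT `[exc_code, exc_sub_code]` pair to the set of possible stop reasons, following the values the patch documents in its comment table. `StopCandidates` and `ClassifyExcBreakpoint` are hypothetical names, and the later disambiguation against the breakpoint and watchpoint lists is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical container for the candidate reasons; more than one may be set
// because some code/sub-code pairs are ambiguous until looked up.
struct StopCandidates {
  bool hit_breakpoint = false;
  bool completed_stepi = false;
  bool watchpoint = false;
  std::optional<std::uint64_t> address; // bp-addr or accessed-addr, if reported
  std::uint32_t pc_decrement = 0;       // bytes to subtract from the reported pc
};

StopCandidates ClassifyExcBreakpoint(std::uint64_t exc_code,
                                     std::uint64_t exc_sub_code, bool is_armv7,
                                     bool pc_already_adjusted) {
  StopCandidates c;
  if (exc_code == 1) {
    if (exc_sub_code == 0) {
      c.completed_stepi = true; // [6, 1, 0]: instruction step
    } else {
      // [6, 1, <addr>]: hardware breakpoint or watchpoint (or an arm64
      // software breakpoint); which one is decided later by address lookup.
      c.hit_breakpoint = true;
      c.watchpoint = true;
      c.address = exc_sub_code;
    }
  } else if (exc_code == 2) {
    c.hit_breakpoint = true; // [6, 2, ...]: x86 software breakpoint
    // Intel KDP reports <bp-addr + 1>, so the pc needs to be backed up.
    if (exc_sub_code != 0 && !pc_already_adjusted)
      c.pc_decrement = 1;
  } else if (exc_code == 3) {
    c.completed_stepi = true; // [6, 3]: Intel KDP instruction step
  } else if (exc_code == 0x102 && exc_sub_code != 0) {
    if (is_armv7) {
      // armv7 uses 0x102 for stepping and software breakpoints as well.
      c.hit_breakpoint = true;
      c.completed_stepi = true;
    }
    c.watchpoint = true; // arm64/armv7 watchpoint
    c.address = exc_sub_code;
  }
  return c;
}
```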

rdar://123942164
jasonmolenda committed Jun 21, 2024
commit 9b541e6a035635e26c6a24eca022de8552fa4c17
29 changes: 29 additions & 0 deletions lldb/include/lldb/Target/Thread.h
@@ -128,6 +128,11 @@ class Thread : public std::enable_shared_from_this<Thread>,
register_backup_sp; // You need to restore the registers, of course...
uint32_t current_inlined_depth;
lldb::addr_t current_inlined_pc;
lldb::addr_t
hit_bp_at_addr; // Set to the address of a breakpoint that we have hit.
lldb::addr_t bpsite_at_stop_pc; // Set to the address of a breakpoint
// instruction that we have not yet hit, but
// will hit when we resume.
};

/// Constructor
@@ -377,6 +382,19 @@ class Thread : public std::enable_shared_from_this<Thread>,

virtual void SetQueueLibdispatchQueueAddress(lldb::addr_t dispatch_queue_t) {}

/// When a thread has executed/trapped a breakpoint, set the address of that
/// breakpoint so we know it has been hit already, and should be silently
/// stepped past on resume.
Review comment (Collaborator):

I'm a little unclear as to why we need to store this separately. Shouldn't this already be stored in the stop reason of the thread (i.e., StopInfoBreakpoint implies we've hit a breakpoint, and the breakpoint site within it should give us the PC value)?

void SetThreadHitBreakpointAtAddr(lldb::addr_t pc) { m_hit_bp_at_addr = pc; }

/// When a thread stops at a breakpoint instruction/address, but has not yet
/// executed/triggered it, record that so we can detect when a user adds a
/// breakpoint (or changes a thread to a breakpoint site) and we need to
/// silently step past that when resuming.
void SetThreadStoppedAtBreakpointSite(lldb::addr_t pc) {
m_bpsite_at_stop_pc = pc;
}
Review comment (Collaborator):

I'm also finding it hard to wrap my head around the meaning of this variable. If I understand correctly it tells us: the pc that we've stopped at; that there was a breakpoint site there at the time of the stop; and we did not hit that site.

I'm wondering if it would be clearer if we unpacked that. What if we:

  • unconditionally stored the PC of the last stop. Maybe this could be even a part of the StopInfo class, as I think it could be useful to see the PC value at the time of the stop, even if the user changed the PC afterwards.
  • stored a flag indicating whether we stopped at a breakpoint site, regardless of whether we've hit it or not (per the previous comment, that could be indicated by the stop reason). This doesn't really look like it belongs to StopInfo class, but I think I'd be fine with putting it there for collocation purposes.
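
A minimal sketch of what that unpacked state might look like, purely to make the suggestion concrete. `StopRecord` and the helper functions are hypothetical and do not exist in lldb. The two values the patch stores are shown as derivable from the unpacked fields, assuming "hit" would come from the stop reason.

```cpp
#include <cassert>
#include <cstdint>

using addr_t = std::uint64_t;
// Stand-in for LLDB_INVALID_ADDRESS.
constexpr addr_t kInvalidAddress = UINT64_MAX;

// Hypothetical record captured at every stop.
struct StopRecord {
  addr_t stop_pc = kInvalidAddress;        // pc at the time of the stop, always stored
  bool stopped_at_breakpoint_site = false; // a site existed at stop_pc when we stopped
  bool hit_breakpoint = false;             // assumption: derived from the stop reason
};

// Equivalent of the patch's m_hit_bp_at_addr.
addr_t HitBreakpointAddr(const StopRecord &r) {
  return (r.stopped_at_breakpoint_site && r.hit_breakpoint) ? r.stop_pc
                                                            : kInvalidAddress;
}

// Equivalent of the patch's m_bpsite_at_stop_pc (set whether or not we hit it).
addr_t BreakpointSiteAtStopPc(const StopRecord &r) {
  return r.stopped_at_breakpoint_site ? r.stop_pc : kInvalidAddress;
}

// Storing the stop pc unconditionally also lets us detect a user-modified pc.
bool PcChangedSinceStop(const StopRecord &r, addr_t current_pc) {
  return r.stop_pc != current_pc;
}
```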


/// Whether this Thread already has all the Queue information cached or not
///
/// A Thread may be associated with a libdispatch work Queue at a given
@@ -1311,6 +1329,17 @@ class Thread : public std::enable_shared_from_this<Thread>,
bool m_should_run_before_public_stop; // If this thread has "stop others"
// private work to do, then it will
// set this.
lldb::addr_t m_hit_bp_at_addr; // If this thread originally stopped at a
// breakpoint instruction, AND HIT IT,
// record the address of that breakpoint.
// LLDB_INVALID_ADDRESS if this thread did
// not stop at a breakpoint insn, or did not
// hit it yet.
lldb::addr_t m_bpsite_at_stop_pc; // If this thread originally stopped at a
// breakpoint site, record the address of
// that breakpoint site.
// LLDB_INVALID_ADDRESS if this thread did
// not stop at a breakpoint site.
const uint32_t m_index_id; ///< A unique 1 based index assigned to each thread
/// for easy UI/command line access.
lldb::RegisterContextSP m_reg_context_sp; ///< The register context for this
296 changes: 118 additions & 178 deletions lldb/source/Plugins/Process/Utility/StopInfoMachException.cpp
@@ -488,38 +488,6 @@ const char *StopInfoMachException::GetDescription() {
return m_description.c_str();
}

static StopInfoSP GetStopInfoForHardwareBP(Thread &thread, Target *target,
uint32_t exc_data_count,
uint64_t exc_sub_code,
uint64_t exc_sub_sub_code) {
// Try hardware watchpoint.
if (target) {
// The exc_sub_code indicates the data break address.
WatchpointResourceSP wp_rsrc_sp =
target->GetProcessSP()->GetWatchpointResourceList().FindByAddress(
(addr_t)exc_sub_code);
if (wp_rsrc_sp && wp_rsrc_sp->GetNumberOfConstituents() > 0) {
return StopInfo::CreateStopReasonWithWatchpointID(
thread, wp_rsrc_sp->GetConstituentAtIndex(0)->GetID());
}
}

// Try hardware breakpoint.
ProcessSP process_sp(thread.GetProcess());
if (process_sp) {
// The exc_sub_code indicates the data break address.
lldb::BreakpointSiteSP bp_sp =
process_sp->GetBreakpointSiteList().FindByAddress(
(lldb::addr_t)exc_sub_code);
if (bp_sp && bp_sp->IsEnabled()) {
return StopInfo::CreateStopReasonWithBreakpointSiteID(thread,
bp_sp->GetID());
}
}

return nullptr;
}

#if defined(__APPLE__)
const char *
StopInfoMachException::MachException::Name(exception_type_t exc_type) {
@@ -633,171 +601,143 @@ StopInfoSP StopInfoMachException::CreateStopReasonWithMachException(
}
break;

// [exc_type, exc_code, exc_sub_code, exc_sub_sub_code]
Review comment (Collaborator):

This table is great, but it deserves a line saying what it is.

//
// Instruction step:
// [6, 1, 0]
// Intel KDP [6, 3, ??]
// armv7 [6, 0x102, <stop-pc>] Same as software breakpoint!
//
// Software breakpoint:
// x86 [6, 2, 0]
// Intel KDP [6, 2, <bp-addr + 1>]
// arm64 [6, 1, <bp-addr>]
// armv7 [6, 0x102, <bp-addr>] Same as instruction step!
//
// Hardware breakpoint:
// x86 [6, 1, <bp-addr>, 0]
// x86/Rosetta not implemented, see software breakpoint
// arm64 [6, 1, <bp-addr>]
// armv7 not implemented, see software breakpoint
//
// Hardware watchpoint:
// x86 [6, 1, <accessed-addr>, 0] (both Intel hw and Rosetta)
// arm64 [6, 0x102, <accessed-addr>, 0]
// armv7 [6, 0x102, <accessed-addr>, 0]
//
// arm64 BRK instruction (imm arg not reflected in the ME)
// [ 6, 1, <addr-of-BRK-insn>]
//
// In order of codes mach exceptions:
// [6, 1, 0] - instruction step
// [6, 1, <bp-addr>] - hardware breakpoint or watchpoint
//
// [6, 2, 0] - software breakpoint
// [6, 2, <bp-addr + 1>] - software breakpoint
//
// [6, 3] - instruction step
//
// [6, 0x102, <stop-pc>] armv7 instruction step
// [6, 0x102, <bp-addr>] armv7 software breakpoint
// [6, 0x102, <accessed-addr>, 0] arm64/armv7 watchpoint
case 6: // EXC_BREAKPOINT
{
bool is_actual_breakpoint = false;
bool is_trace_if_actual_breakpoint_missing = false;
switch (cpu) {
case llvm::Triple::x86:
case llvm::Triple::x86_64:
if (exc_code == 1) // EXC_I386_SGL
{
if (!exc_sub_code) {
// This looks like a plain trap.
// Have to check if there is a breakpoint here as well. When you
// single-step onto a trap, the single step stops you not to trap.
// Since we also do that check below, let's just use that logic.
is_actual_breakpoint = true;
is_trace_if_actual_breakpoint_missing = true;
} else {
if (StopInfoSP stop_info =
GetStopInfoForHardwareBP(thread, target, exc_data_count,
exc_sub_code, exc_sub_sub_code))
return stop_info;
}
} else if (exc_code == 2 || // EXC_I386_BPT
exc_code == 3) // EXC_I386_BPTFLT
{
// KDP returns EXC_I386_BPTFLT for trace breakpoints
if (exc_code == 3)
is_trace_if_actual_breakpoint_missing = true;

is_actual_breakpoint = true;
if (!pc_already_adjusted)
pc_decrement = 1;
}
break;
bool stopped_by_hitting_breakpoint = false;
bool stopped_by_completing_stepi = false;
bool stopped_watchpoint = false;
std::optional<addr_t> value;
Review comment (Collaborator):

Is there a better name for this than "value" - that's pretty generic.


// exc_code 1
if (exc_code == 1 && exc_sub_code == 0)
Review comment (Collaborator):

Despite our fear of indentation, since the two ifs exhaust the exc_code == 1 possibilities, this would be clearer as:

if (exc_code == 1) {
if (exc_sub_code == 0) {
} else {
}
}

stopped_by_completing_stepi = true;
if (exc_code == 1 && exc_sub_code != 0) {
stopped_by_hitting_breakpoint = true;
Review comment (Collaborator):

On the face of it, it seems weird that you are asserting that this is both a "breakpoint" and a "watchpoint" hit. Later on you'll tell the difference based on the exc_sub_code stored in value. This would be easier to read if there were a comment explaining that.

stopped_watchpoint = true;
value = exc_sub_code;
}

case llvm::Triple::arm:
case llvm::Triple::thumb:
if (exc_code == 0x102) // EXC_ARM_DA_DEBUG
{
// LWP_TODO: We need to find the WatchpointResource that matches
// the address, and evaluate its Watchpoints.

// It's a watchpoint, then, if the exc_sub_code indicates a
// known/enabled data break address from our watchpoint list.
lldb::WatchpointSP wp_sp;
if (target)
wp_sp = target->GetWatchpointList().FindByAddress(
(lldb::addr_t)exc_sub_code);
if (wp_sp && wp_sp->IsEnabled()) {
return StopInfo::CreateStopReasonWithWatchpointID(thread,
wp_sp->GetID());
} else {
is_actual_breakpoint = true;
is_trace_if_actual_breakpoint_missing = true;
}
} else if (exc_code == 1) // EXC_ARM_BREAKPOINT
{
is_actual_breakpoint = true;
is_trace_if_actual_breakpoint_missing = true;
} else if (exc_code == 0) // FIXME not EXC_ARM_BREAKPOINT but a kernel
// is currently returning this so accept it
// as indicating a breakpoint until the
// kernel is fixed
{
is_actual_breakpoint = true;
is_trace_if_actual_breakpoint_missing = true;
}
break;
// exc_code 2
if (exc_code == 2 && exc_sub_code == 0)
Review comment (Collaborator):

Same comment about nesting here.

stopped_by_hitting_breakpoint = true;
if (exc_code == 2 && exc_sub_code != 0) {
stopped_by_hitting_breakpoint = true;
// Intel KDP software breakpoint
if (!pc_already_adjusted)
Review comment (Collaborator):

Why don't you need to consult the adjust_pc_if_needed here?

Review comment (Collaborator, Author):

We set pc_decrement here, and later it's applied only if adjust_pc_if_needed is set (although that check doesn't look at pc_already_adjusted; this is so confusingly implemented). I carried all of this logic over from the original and didn't test x86 KDP debugging, so I was reluctant to change much.

pc_decrement = 1;
}

case llvm::Triple::aarch64_32:
case llvm::Triple::aarch64: {
// xnu describes three things with type EXC_BREAKPOINT:
//
// exc_code 0x102 [EXC_ARM_DA_DEBUG], exc_sub_code addr-of-insn
// Watchpoint access. exc_sub_code is the address of the
// instruction which trigged the watchpoint trap.
// debugserver may add the watchpoint number that was triggered
// in exc_sub_sub_code.
//
// exc_code 1 [EXC_ARM_BREAKPOINT], exc_sub_code 0
// Instruction step has completed.
//
// exc_code 1 [EXC_ARM_BREAKPOINT], exc_sub_code address-of-instruction
// Software breakpoint instruction executed.

if (exc_code == 1 && exc_sub_code == 0) // EXC_ARM_BREAKPOINT
{
// This is hit when we single instruction step aka MDSCR_EL1 SS bit 0
// is set
is_actual_breakpoint = true;
is_trace_if_actual_breakpoint_missing = true;
if (thread.GetTemporaryResumeState() != eStateStepping)
not_stepping_but_got_singlestep_exception = true;
}
if (exc_code == 0x102) // EXC_ARM_DA_DEBUG
{
// LWP_TODO: We need to find the WatchpointResource that matches
// the address, and evaluate its Watchpoints.

// It's a watchpoint, then, if the exc_sub_code indicates a
// known/enabled data break address from our watchpoint list.
lldb::WatchpointSP wp_sp;
if (target)
wp_sp = target->GetWatchpointList().FindByAddress(
(lldb::addr_t)exc_sub_code);
if (wp_sp && wp_sp->IsEnabled()) {
return StopInfo::CreateStopReasonWithWatchpointID(thread,
wp_sp->GetID());
}
// EXC_ARM_DA_DEBUG seems to be reused for EXC_BREAKPOINT as well as
// EXC_BAD_ACCESS
if (thread.GetTemporaryResumeState() == eStateStepping)
return StopInfo::CreateStopReasonToTrace(thread);
// exc_code 3
if (exc_code == 3)
stopped_by_completing_stepi = true;

// exc_code 0x102
if (exc_code == 0x102 && exc_sub_code != 0) {
if (cpu == llvm::Triple::arm || cpu == llvm::Triple::thumb) {
stopped_by_hitting_breakpoint = true;
stopped_by_completing_stepi = true;
}
// It looks like exc_sub_code has the 4 bytes of the instruction that
// triggered the exception, i.e. our breakpoint opcode
is_actual_breakpoint = exc_code == 1;
break;
stopped_watchpoint = true;
value = exc_sub_code;
}

default:
break;
}
// Go through the reasons why we stopped, starting
Review comment (Collaborator):

I'd move this comment above where you start gathering these settings. That's where you are actually setting more than one seemingly incompatible option, so it would be better to explain it there.

Review comment (Collaborator):

Then you can say here "Now we're disambiguating..."

// with the easiest to detect unambiguously. We
// may have multiple possible reasons set.

if (is_actual_breakpoint) {
if (stopped_by_hitting_breakpoint) {
ProcessSP process_sp(thread.GetProcess());
RegisterContextSP reg_ctx_sp(thread.GetRegisterContext());
addr_t pc = reg_ctx_sp->GetPC() - pc_decrement;

ProcessSP process_sp(thread.CalculateProcess());

lldb::BreakpointSiteSP bp_site_sp;
if (process_sp)
BreakpointSiteSP bp_site_sp;
addr_t pc = LLDB_INVALID_ADDRESS;
if (reg_ctx_sp)
Review comment (Collaborator, @jimingham, Jul 12, 2024):

I find this logic confusing. At the start of this function, if you had a register context, you calculated the bp_site_sp from the register context pc, but not using the decrement. Now you do it again, but prioritizing using the value's version of the pc, and overwriting the bp_site_sp that was found at the beginning of the function. But if you don't have a value, you either use the breakpoint site you calculated above w/o the decrement if that was found, or look it up with the decrement here.
This at least needs a comment explaining what you are doing.

Review comment (Collaborator, Author):

Yes, this could happen with x86 KDP debugging (where this pc-decrement behavior exists) where we have a breakpoint on instruction 0x100 which was originally a 1-byte instruction, and also a breakpoint on the instruction at 0x101, and we're really stopped at the first breakpoint. We'd falsely mark that we're stopped at an unexecuted breakpoint (0x101) at the top of the method. Then we find we've actually hit the breakpoint at 0x100 but it's reported as pc 0x101, we decrement that, find it's 0x100 and mark this as SetThreadHitBreakpointSite.

I've added a comment at the initial calling of SetThreadStoppedAtUnexecutedBP(pc) to explain how this will be overridden if it turns out to have been wrong.
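
The x86 KDP scenario described in this reply can be walked through with a small standalone model. This is a sketch under stated assumptions, not the patch's logic: `StopMarks` and `AnalyzeStop` are hypothetical, the site list is a plain set, and clearing the provisional mark when a hit is found is only one guess at what "overridden" means here.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

using addr_t = std::uint64_t;
// Stand-in for LLDB_INVALID_ADDRESS.
constexpr addr_t kInvalidAddress = UINT64_MAX;

// Hypothetical result of analyzing one stop.
struct StopMarks {
  addr_t unexecuted_site_pc = kInvalidAddress; // provisional "stopped at a site"
  addr_t hit_site_pc = kInvalidAddress;        // site we actually hit
};

StopMarks AnalyzeStop(const std::set<addr_t> &sites, addr_t reported_pc,
                      bool software_breakpoint_exception) {
  StopMarks marks;
  // Step 1 (top of the method): look up the reported pc without any
  // adjustment and provisionally record a site found there.
  if (sites.count(reported_pc) != 0)
    marks.unexecuted_site_pc = reported_pc;

  // Step 2: a software breakpoint exception on x86 KDP reports the pc one
  // byte past the trap, so back it up before looking for the site we hit.
  if (software_breakpoint_exception) {
    const addr_t pc_decrement = 1;
    assert(reported_pc >= pc_decrement);
    const addr_t bp_pc = reported_pc - pc_decrement;
    if (sites.count(bp_pc) != 0) {
      marks.hit_site_pc = bp_pc;
      // Assumption: the provisional mark from step 1 is dropped once we know
      // the stop was really a hit at the decremented address.
      marks.unexecuted_site_pc = kInvalidAddress;
    }
  }
  return marks;
}
```

With sites at 0x100 and 0x101 and a reported pc of 0x101, step 1 provisionally records 0x101; the breakpoint exception then backs the pc up to 0x100, finds the site there, and records that as the hit instead.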

pc = reg_ctx_sp->GetPC() - pc_decrement;
else if (value)
pc = *value;

if (value)
bp_site_sp = process_sp->GetBreakpointSiteList().FindByAddress(*value);
if (!bp_site_sp && reg_ctx_sp) {
bp_site_sp = process_sp->GetBreakpointSiteList().FindByAddress(pc);
}
if (bp_site_sp && bp_site_sp->IsEnabled()) {
// Update the PC if we were asked to do so, but only do so if we find
// a breakpoint that we know about cause this could be a trap
// instruction in the code
if (pc_decrement > 0 && adjust_pc_if_needed)
reg_ctx_sp->SetPC(pc);

// If the breakpoint is for this thread, then we'll report the hit,
// but if it is for another thread, we can just report no reason. We
// don't need to worry about stepping over the breakpoint here, that
// will be taken care of when the thread resumes and notices that
// there's a breakpoint under the pc. If we have an operating system
// plug-in, we might have set a thread specific breakpoint using the
// operating system thread ID, so we can't make any assumptions about
// the thread ID so we must always report the breakpoint regardless
// of the thread.
// If we have an operating system plug-in, we might have set a thread
// specific breakpoint using the operating system thread ID, so we
// can't make any assumptions about the thread ID so we must always
// report the breakpoint regardless of the thread.
if (bp_site_sp->ValidForThisThread(thread) ||
thread.GetProcess()->GetOperatingSystem() != nullptr)
thread.GetProcess()->GetOperatingSystem() != nullptr) {
// Update the PC if we were asked to do so, but only do so if we find
// a breakpoint that we know about cause this could be a trap
// instruction in the code
if (pc_decrement > 0 && adjust_pc_if_needed && reg_ctx_sp)
reg_ctx_sp->SetPC(pc);
thread.SetThreadHitBreakpointAtAddr(pc);
return StopInfo::CreateStopReasonWithBreakpointSiteID(
thread, bp_site_sp->GetID());
else if (is_trace_if_actual_breakpoint_missing)
return StopInfo::CreateStopReasonToTrace(thread);
else
} else {
thread.SetThreadHitBreakpointAtAddr(pc);
return StopInfoSP();
}
}
}

// Don't call this a trace if we weren't single stepping this thread.
if (is_trace_if_actual_breakpoint_missing &&
thread.GetTemporaryResumeState() == eStateStepping) {
return StopInfo::CreateStopReasonToTrace(thread);
if (stopped_watchpoint && value) {
Review comment (Collaborator):

At this point, we've dealt with the case of a hit breakpoint. So now we're looking for watchpoints or step-i's. A comment to that effect here would make the sections of this analysis easier to follow.

WatchpointResourceSP wp_rsrc_sp =
target->GetProcessSP()->GetWatchpointResourceList().FindByAddress(
*value);
if (wp_rsrc_sp && wp_rsrc_sp->GetNumberOfConstituents() > 0) {
return StopInfo::CreateStopReasonWithWatchpointID(
thread, wp_rsrc_sp->GetConstituentAtIndex(0)->GetID());
}
}

if (stopped_by_completing_stepi) {
if (thread.GetTemporaryResumeState() != eStateStepping)
not_stepping_but_got_singlestep_exception = true;
else
return StopInfo::CreateStopReasonToTrace(thread);
}

} break;

case 7: // EXC_SYSCALL