Support disassembling RISC-V proprietary instructions #145793

Open: wants to merge 1 commit into base: main

tedwoodward

RISC-V supports proprietary extensions, where the TD files don't know about certain instructions, and the disassembler can't disassemble them. Internal users want to be able to disassemble these instructions in LLDB.

With llvm-objdump, the solution is to pipe the output of the disassembly through a filter program. This patch modifies LLDB's disassembly to look more like llvm-objdump's, and includes an example python script that adds a command "fdis" that will disassemble, then pipe the output through a specified filter program. This has been tested with crustfilt, a sample filter located at https://github.com/quic/crustfilt .

Changes in this PR:

  • Decouple "can't disassemble" from "instruction size". DisassemblerLLVMC::MCDisasmInstance::GetMCInst now returns a bool for valid disassembly, and has the size as an out parameter. Use the size even if the disassembly is invalid. Disassemble only if the disassembly is valid.

  • Always print out the opcode when -b is specified. Previously it wouldn't print out the opcode if it couldn't disassemble.

  • Print out RISC-V opcodes the way llvm-objdump does. Add DumpRISCV method based on RISC-V pretty printer in llvm-objdump.cpp.

  • Print <unknown> for instructions that can't be disassembled, matching llvm-objdump, instead of printing nothing.

  • Update max riscv32 and riscv64 instruction size to 8.

  • Add example "fdis" command script.

Change-Id: Ie5a359d9e87a12dde79a8b5c9c7a146440a550c5
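
To illustrate why the first change matters, here is a minimal Python sketch of a linear sweep that keeps the decoded size even when decoding fails. It is not the LLDB implementation: `sweep`, `toy_decode`, and the `(valid, size)` tuple are hypothetical names that only mirror the new bool-plus-out-parameter shape of GetMCInst.

```python
from typing import Callable, List, Tuple

# Hypothetical decoder shape: returns (valid, size_in_bytes).
Decoder = Callable[[bytes], Tuple[bool, int]]


def sweep(code: bytes, decode: Decoder, min_size: int = 2) -> List[Tuple[int, int, bool]]:
    """Walk `code` linearly and return (offset, size, valid) per instruction."""
    out = []
    offset = 0
    while offset < len(code):
        valid, size = decode(code[offset:])
        if size == 0:
            # No usable length came back: fall back to the minimum step.
            size = min_size
        out.append((offset, size, valid))
        offset += size  # advance by the reported size, valid or not
    return out


def toy_decode(data: bytes) -> Tuple[bool, int]:
    """Toy stand-in for a real decoder (assumption, for illustration only)."""
    if data[0] == 0x3F:
        return (False, 8)  # unknown 8-byte instruction: invalid, length known
    return (True, 2)       # everything else: pretend it is a valid 2-byte op
```

With the bytes from the test program, `sweep(bytes.fromhex("01113f004009200020008280"), toy_decode)` steps over the unknown 8-byte instruction as one unit instead of splitting it into 2-byte pieces.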

@tedwoodward tedwoodward self-assigned this Jun 25, 2025
@llvmbot llvmbot added the lldb label Jun 25, 2025
@llvmbot
Member

llvmbot commented Jun 25, 2025

@llvm/pr-subscribers-lldb

Author: None (tedwoodward)

Full diff: https://github.com/llvm/llvm-project/pull/145793.diff

6 Files Affected:

  • (added) lldb/examples/python/filter_disasm.py (+87)
  • (modified) lldb/include/lldb/Core/Opcode.h (+1)
  • (modified) lldb/source/Core/Disassembler.cpp (+11-3)
  • (modified) lldb/source/Core/Opcode.cpp (+38)
  • (modified) lldb/source/Plugins/Disassembler/LLVMC/DisassemblerLLVMC.cpp (+21-18)
  • (modified) lldb/source/Utility/ArchSpec.cpp (+2-2)
diff --git a/lldb/examples/python/filter_disasm.py b/lldb/examples/python/filter_disasm.py
new file mode 100644
index 0000000000000..adb3455209055
--- /dev/null
+++ b/lldb/examples/python/filter_disasm.py
@@ -0,0 +1,87 @@
+"""
+Defines a command, fdis, that does filtered disassembly. The command does the
+lldb disassemble command with -b and any other arguments passed in, and
+pipes that through a provided filter program.
+
+The intention is to support disassembly of RISC-V proprietary instructions.
+This is handled with llvm-objdump by piping the output of llvm-objdump through
+a filter program. This script is intended to mimic that workflow.
+"""
+
+import lldb
+import subprocess
+
+filter_program = "crustfilt"
+
+def __lldb_init_module(debugger, dict):
+    debugger.HandleCommand(
+        'command script add -f filter_disasm.fdis fdis')
+    print("Disassembly filter command (fdis) loaded")
+    print("Filter program set to %s" % filter_program)
+
+
+def fdis(debugger, args, result, dict):
+    """
+  Call the built in disassembler, then pass its output to a filter program
+  to add in disassembly for hidden opcodes.
+  Except for get and set, use the fdis command like the disassemble command.
+  By default, the filter program is crustfilt, from
+  https://github.com/quic/crustfilt . This can be changed by changing
+  the global variable filter_program.
+
+  Usage:
+    fdis [[get] [set <program>] [<disassembly options>]]
+
+    Choose one of the following:
+        get
+            Gets the current filter program
+
+        set <program>
+            Sets the current filter program. This can be an executable, which
+            will be found on PATH, or an absolute path.
+
+        <disassembly options>
+            If the first argument is not get or set, the args will be passed
+            to the disassemble command as is.
+
+    """
+
+    global filter_program
+    args_list = args.split(' ')
+    result.Clear()
+
+    if len(args_list) == 1 and args_list[0] == 'get':
+        result.PutCString(filter_program)
+        result.SetStatus(lldb.eReturnStatusSuccessFinishResult)
+        return
+
+    if len(args_list) == 2 and args_list[0] == 'set':
+        filter_program = args_list[1]
+        result.PutCString("Filter program set to %s" % filter_program)
+        result.SetStatus(lldb.eReturnStatusSuccessFinishResult)
+        return
+
+    res = lldb.SBCommandReturnObject()
+    debugger.GetCommandInterpreter().HandleCommand('disassemble -b ' + args, res)
+    if (len(res.GetError()) > 0):
+        result.SetError(res.GetError())
+        result.SetStatus(lldb.eReturnStatusFailed)
+        return
+    output = res.GetOutput()
+
+    try:
+        proc = subprocess.run([filter_program], capture_output=True, text=True, input=output)
+    except (subprocess.SubprocessError, OSError) as e:
+        result.PutCString("Error occurred. Original disassembly:\n\n" + output)
+        result.SetError(str(e))
+        result.SetStatus(lldb.eReturnStatusFailed)
+        return
+
+    print(proc.stderr)
+    if proc.stderr:
+        pass
+        #result.SetError(proc.stderr)
+        #result.SetStatus(lldb.eReturnStatusFailed)
+    else:
+        result.PutCString(proc.stdout)
+        result.SetStatus(lldb.eReturnStatusSuccessFinishResult)
diff --git a/lldb/include/lldb/Core/Opcode.h b/lldb/include/lldb/Core/Opcode.h
index f72f2687b54fe..88ef17093d3f3 100644
--- a/lldb/include/lldb/Core/Opcode.h
+++ b/lldb/include/lldb/Core/Opcode.h
@@ -200,6 +200,7 @@ class Opcode {
   }
 
   int Dump(Stream *s, uint32_t min_byte_width);
+  int DumpRISCV(Stream *s, uint32_t min_byte_width);
 
   const void *GetOpcodeBytes() const {
     return ((m_type == Opcode::eTypeBytes) ? m_data.inst.bytes : nullptr);
diff --git a/lldb/source/Core/Disassembler.cpp b/lldb/source/Core/Disassembler.cpp
index 833e327579a29..f95e446448036 100644
--- a/lldb/source/Core/Disassembler.cpp
+++ b/lldb/source/Core/Disassembler.cpp
@@ -658,8 +658,13 @@ void Instruction::Dump(lldb_private::Stream *s, uint32_t max_opcode_byte_size,
       // the byte dump to be able to always show 15 bytes (3 chars each) plus a
       // space
       if (max_opcode_byte_size > 0)
-        m_opcode.Dump(&ss, max_opcode_byte_size * 3 + 1);
-      else
+        // make RISC-V opcode dump look like llvm-objdump
+        if (exe_ctx &&
+            exe_ctx->GetTargetSP()->GetArchitecture().GetTriple().isRISCV())
+          m_opcode.DumpRISCV(&ss, max_opcode_byte_size * 3 + 1);
+        else
+          m_opcode.Dump(&ss, max_opcode_byte_size * 3 + 1);
+       else
         m_opcode.Dump(&ss, 15 * 3 + 1);
     } else {
       // Else, we have ARM or MIPS which can show up to a uint32_t 0x00000000
@@ -685,10 +690,13 @@ void Instruction::Dump(lldb_private::Stream *s, uint32_t max_opcode_byte_size,
     }
   }
   const size_t opcode_pos = ss.GetSizeOfLastLine();
-  const std::string &opcode_name =
+  std::string &opcode_name =
       show_color ? m_markup_opcode_name : m_opcode_name;
   const std::string &mnemonics = show_color ? m_markup_mnemonics : m_mnemonics;
 
+  if (opcode_name.empty())
+    opcode_name = "<unknown>";
+
   // The default opcode size of 7 characters is plenty for most architectures
   // but some like arm can pull out the occasional vqrshrun.s16.  We won't get
   // consistent column spacing in these cases, unfortunately. Also note that we
diff --git a/lldb/source/Core/Opcode.cpp b/lldb/source/Core/Opcode.cpp
index 3e30d98975d8a..dbcd18cc0d8d2 100644
--- a/lldb/source/Core/Opcode.cpp
+++ b/lldb/source/Core/Opcode.cpp
@@ -78,6 +78,44 @@ lldb::ByteOrder Opcode::GetDataByteOrder() const {
   return eByteOrderInvalid;
 }
 
+// make RISC-V byte dumps look like llvm-objdump, instead of just dumping bytes
+int Opcode::DumpRISCV(Stream *s, uint32_t min_byte_width) {
+  const uint32_t previous_bytes = s->GetWrittenBytes();
+  // if m_type is not bytes, call Dump
+  if (m_type != Opcode::eTypeBytes)
+    return Dump(s, min_byte_width);
+
+  // from RISCVPrettyPrinter in llvm-objdump.cpp
+  // if size % 4 == 0, print as 1 or 2 32 bit values (32 or 64 bit inst)
+  // else if size % 2 == 0, print as 1 or 3 16 bit values (16 or 48 bit inst)
+  // else fall back and print bytes
+  for (uint32_t i = 0; i < m_data.inst.length;) {
+    if (i > 0)
+      s->PutChar(' ');
+    if (!(m_data.inst.length % 4)) {
+      s->Printf("%2.2x%2.2x%2.2x%2.2x", m_data.inst.bytes[i + 3],
+                                        m_data.inst.bytes[i + 2],
+                                        m_data.inst.bytes[i + 1],
+                                        m_data.inst.bytes[i + 0]);
+      i += 4;
+    } else if (!(m_data.inst.length % 2)) {
+      s->Printf("%2.2x%2.2x", m_data.inst.bytes[i + 1],
+                              m_data.inst.bytes[i + 0]);
+      i += 2;
+    } else {
+      s->Printf("%2.2x", m_data.inst.bytes[i]);
+      ++i;
+    }
+  }
+
+  uint32_t bytes_written_so_far = s->GetWrittenBytes() - previous_bytes;
+  // Add spaces to make sure bytes display comes out even in case opcodes aren't
+  // all the same size.
+  if (bytes_written_so_far < min_byte_width)
+    s->Printf("%*s", min_byte_width - bytes_written_so_far, "");
+  return s->GetWrittenBytes() - previous_bytes;
+}
+
 uint32_t Opcode::GetData(DataExtractor &data) const {
   uint32_t byte_size = GetByteSize();
   uint8_t swap_buf[8];
diff --git a/lldb/source/Plugins/Disassembler/LLVMC/DisassemblerLLVMC.cpp b/lldb/source/Plugins/Disassembler/LLVMC/DisassemblerLLVMC.cpp
index ed6047f8f4ef3..eeb6020abd73a 100644
--- a/lldb/source/Plugins/Disassembler/LLVMC/DisassemblerLLVMC.cpp
+++ b/lldb/source/Plugins/Disassembler/LLVMC/DisassemblerLLVMC.cpp
@@ -61,6 +61,8 @@ class DisassemblerLLVMC::MCDisasmInstance {
 
   uint64_t GetMCInst(const uint8_t *opcode_data, size_t opcode_data_len,
                      lldb::addr_t pc, llvm::MCInst &mc_inst) const;
+  bool GetMCInst(const uint8_t *opcode_data, size_t opcode_data_len,
+                 lldb::addr_t pc, llvm::MCInst &mc_inst, size_t &size) const;
   void PrintMCInst(llvm::MCInst &mc_inst, lldb::addr_t pc,
                    std::string &inst_string, std::string &comments_string);
   void SetStyle(bool use_hex_immed, HexImmediateStyle hex_style);
@@ -524,11 +526,11 @@ class InstructionLLVMC : public lldb_private::Instruction {
           const addr_t pc = m_address.GetFileAddress();
           llvm::MCInst inst;
 
-          const size_t inst_size =
-              mc_disasm_ptr->GetMCInst(opcode_data, opcode_data_len, pc, inst);
-          if (inst_size == 0)
-            m_opcode.Clear();
-          else {
+          size_t inst_size = 0;
+          m_is_valid = mc_disasm_ptr->GetMCInst(opcode_data, opcode_data_len,
+                                                pc, inst, inst_size);
+          m_opcode.Clear();
+          if (inst_size != 0) {
             m_opcode.SetOpcodeBytes(opcode_data, inst_size);
             m_is_valid = true;
           }
@@ -604,10 +606,11 @@ class InstructionLLVMC : public lldb_private::Instruction {
         const uint8_t *opcode_data = data.GetDataStart();
         const size_t opcode_data_len = data.GetByteSize();
         llvm::MCInst inst;
-        size_t inst_size =
-            mc_disasm_ptr->GetMCInst(opcode_data, opcode_data_len, pc, inst);
-
-        if (inst_size > 0) {
+        size_t inst_size = 0;
+        bool valid = mc_disasm_ptr->GetMCInst(opcode_data, opcode_data_len, pc,
+                                             inst, inst_size);
+ 
+        if (valid && inst_size > 0) {
           mc_disasm_ptr->SetStyle(use_hex_immediates, hex_style);
 
           const bool saved_use_color = mc_disasm_ptr->GetUseColor();
@@ -1206,9 +1209,10 @@ class InstructionLLVMC : public lldb_private::Instruction {
     const uint8_t *opcode_data = data.GetDataStart();
     const size_t opcode_data_len = data.GetByteSize();
     llvm::MCInst inst;
-    const size_t inst_size =
-        mc_disasm_ptr->GetMCInst(opcode_data, opcode_data_len, pc, inst);
-    if (inst_size == 0)
+    size_t inst_size = 0;
+    const bool valid = mc_disasm_ptr->GetMCInst(opcode_data, opcode_data_len,
+                                                pc, inst, inst_size);
+    if (!valid)
       return;
 
     m_has_visited_instruction = true;
@@ -1337,19 +1341,18 @@ DisassemblerLLVMC::MCDisasmInstance::MCDisasmInstance(
          m_asm_info_up && m_context_up && m_disasm_up && m_instr_printer_up);
 }
 
-uint64_t DisassemblerLLVMC::MCDisasmInstance::GetMCInst(
+bool DisassemblerLLVMC::MCDisasmInstance::GetMCInst(
     const uint8_t *opcode_data, size_t opcode_data_len, lldb::addr_t pc,
-    llvm::MCInst &mc_inst) const {
+    llvm::MCInst &mc_inst, size_t &size) const {
   llvm::ArrayRef<uint8_t> data(opcode_data, opcode_data_len);
   llvm::MCDisassembler::DecodeStatus status;
 
-  uint64_t new_inst_size;
-  status = m_disasm_up->getInstruction(mc_inst, new_inst_size, data, pc,
+  status = m_disasm_up->getInstruction(mc_inst, size, data, pc,
                                        llvm::nulls());
   if (status == llvm::MCDisassembler::Success)
-    return new_inst_size;
+    return true;
   else
-    return 0;
+    return false;
 }
 
 void DisassemblerLLVMC::MCDisasmInstance::PrintMCInst(
diff --git a/lldb/source/Utility/ArchSpec.cpp b/lldb/source/Utility/ArchSpec.cpp
index 70b9800f4dade..7c71aaae6bcf2 100644
--- a/lldb/source/Utility/ArchSpec.cpp
+++ b/lldb/source/Utility/ArchSpec.cpp
@@ -228,9 +228,9 @@ static const CoreDefinition g_core_definitions[] = {
     {eByteOrderLittle, 4, 4, 4, llvm::Triple::hexagon,
      ArchSpec::eCore_hexagon_hexagonv5, "hexagonv5"},
 
-    {eByteOrderLittle, 4, 2, 4, llvm::Triple::riscv32, ArchSpec::eCore_riscv32,
+    {eByteOrderLittle, 4, 2, 8, llvm::Triple::riscv32, ArchSpec::eCore_riscv32,
      "riscv32"},
-    {eByteOrderLittle, 8, 2, 4, llvm::Triple::riscv64, ArchSpec::eCore_riscv64,
+    {eByteOrderLittle, 8, 2, 8, llvm::Triple::riscv64, ArchSpec::eCore_riscv64,
      "riscv64"},
 
     {eByteOrderLittle, 4, 4, 4, llvm::Triple::loongarch32,

⚠️ Python code formatter, darker found issues in your code. ⚠️

You can test this locally with the following command:
darker --check --diff -r HEAD~1...HEAD lldb/examples/python/filter_disasm.py
View the diff from darker here.
--- filter_disasm.py	2025-06-25 21:39:02.000000 +0000
+++ filter_disasm.py	2025-06-25 21:54:28.640611 +0000
@@ -11,77 +11,79 @@
 import lldb
 import subprocess
 
 filter_program = "crustfilt"
 
+
 def __lldb_init_module(debugger, dict):
-    debugger.HandleCommand(
-        'command script add -f filter_disasm.fdis fdis')
+    debugger.HandleCommand("command script add -f filter_disasm.fdis fdis")
     print("Disassembly filter command (fdis) loaded")
     print("Filter program set to %s" % filter_program)
 
 
 def fdis(debugger, args, result, dict):
     """
-  Call the built in disassembler, then pass its output to a filter program
-  to add in disassembly for hidden opcodes.
-  Except for get and set, use the fdis command like the disassemble command.
-  By default, the filter program is crustfilt, from
-  https://github.com/quic/crustfilt . This can be changed by changing
-  the global variable filter_program.
+    Call the built in disassembler, then pass its output to a filter program
+    to add in disassembly for hidden opcodes.
+    Except for get and set, use the fdis command like the disassemble command.
+    By default, the filter program is crustfilt, from
+    https://github.com/quic/crustfilt . This can be changed by changing
+    the global variable filter_program.
 
-  Usage:
-    fdis [[get] [set <program>] [<disassembly options>]]
+    Usage:
+      fdis [[get] [set <program>] [<disassembly options>]]
 
-    Choose one of the following:
-        get
-            Gets the current filter program
+      Choose one of the following:
+          get
+              Gets the current filter program
 
-        set <program>
-            Sets the current filter program. This can be an executable, which
-            will be found on PATH, or an absolute path.
+          set <program>
+              Sets the current filter program. This can be an executable, which
+              will be found on PATH, or an absolute path.
 
-        <disassembly options>
-            If the first argument is not get or set, the args will be passed
-            to the disassemble command as is.
+          <disassembly options>
+              If the first argument is not get or set, the args will be passed
+              to the disassemble command as is.
 
     """
 
     global filter_program
-    args_list = args.split(' ')
+    args_list = args.split(" ")
     result.Clear()
 
-    if len(args_list) == 1 and args_list[0] == 'get':
+    if len(args_list) == 1 and args_list[0] == "get":
         result.PutCString(filter_program)
         result.SetStatus(lldb.eReturnStatusSuccessFinishResult)
         return
 
-    if len(args_list) == 2 and args_list[0] == 'set':
+    if len(args_list) == 2 and args_list[0] == "set":
         filter_program = args_list[1]
         result.PutCString("Filter program set to %s" % filter_program)
         result.SetStatus(lldb.eReturnStatusSuccessFinishResult)
         return
 
     res = lldb.SBCommandReturnObject()
-    debugger.GetCommandInterpreter().HandleCommand('disassemble -b ' + args, res)
-    if (len(res.GetError()) > 0):
+    debugger.GetCommandInterpreter().HandleCommand("disassemble -b " + args, res)
+    if len(res.GetError()) > 0:
         result.SetError(res.GetError())
         result.SetStatus(lldb.eReturnStatusFailed)
         return
     output = res.GetOutput()
 
     try:
-        proc = subprocess.run([filter_program], capture_output=True, text=True, input=output)
+        proc = subprocess.run(
+            [filter_program], capture_output=True, text=True, input=output
+        )
     except (subprocess.SubprocessError, OSError) as e:
         result.PutCString("Error occurred. Original disassembly:\n\n" + output)
         result.SetError(str(e))
         result.SetStatus(lldb.eReturnStatusFailed)
         return
 
     print(proc.stderr)
     if proc.stderr:
         pass
-        #result.SetError(proc.stderr)
-        #result.SetStatus(lldb.eReturnStatusFailed)
+        # result.SetError(proc.stderr)
+        # result.SetStatus(lldb.eReturnStatusFailed)
     else:
         result.PutCString(proc.stdout)
         result.SetStatus(lldb.eReturnStatusSuccessFinishResult)

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff HEAD~1 HEAD --extensions h,cpp -- lldb/include/lldb/Core/Opcode.h lldb/source/Core/Disassembler.cpp lldb/source/Core/Opcode.cpp lldb/source/Plugins/Disassembler/LLVMC/DisassemblerLLVMC.cpp lldb/source/Utility/ArchSpec.cpp
View the diff from clang-format here.
diff --git a/lldb/source/Core/Disassembler.cpp b/lldb/source/Core/Disassembler.cpp
index f95e44644..e1ba74d22 100644
--- a/lldb/source/Core/Disassembler.cpp
+++ b/lldb/source/Core/Disassembler.cpp
@@ -664,7 +664,7 @@ void Instruction::Dump(lldb_private::Stream *s, uint32_t max_opcode_byte_size,
           m_opcode.DumpRISCV(&ss, max_opcode_byte_size * 3 + 1);
         else
           m_opcode.Dump(&ss, max_opcode_byte_size * 3 + 1);
-       else
+      else
         m_opcode.Dump(&ss, 15 * 3 + 1);
     } else {
       // Else, we have ARM or MIPS which can show up to a uint32_t 0x00000000
@@ -690,8 +690,7 @@ void Instruction::Dump(lldb_private::Stream *s, uint32_t max_opcode_byte_size,
     }
   }
   const size_t opcode_pos = ss.GetSizeOfLastLine();
-  std::string &opcode_name =
-      show_color ? m_markup_opcode_name : m_opcode_name;
+  std::string &opcode_name = show_color ? m_markup_opcode_name : m_opcode_name;
   const std::string &mnemonics = show_color ? m_markup_mnemonics : m_mnemonics;
 
   if (opcode_name.empty())
diff --git a/lldb/source/Core/Opcode.cpp b/lldb/source/Core/Opcode.cpp
index dbcd18cc0..a09db58bb 100644
--- a/lldb/source/Core/Opcode.cpp
+++ b/lldb/source/Core/Opcode.cpp
@@ -94,13 +94,12 @@ int Opcode::DumpRISCV(Stream *s, uint32_t min_byte_width) {
       s->PutChar(' ');
     if (!(m_data.inst.length % 4)) {
       s->Printf("%2.2x%2.2x%2.2x%2.2x", m_data.inst.bytes[i + 3],
-                                        m_data.inst.bytes[i + 2],
-                                        m_data.inst.bytes[i + 1],
-                                        m_data.inst.bytes[i + 0]);
+                m_data.inst.bytes[i + 2], m_data.inst.bytes[i + 1],
+                m_data.inst.bytes[i + 0]);
       i += 4;
     } else if (!(m_data.inst.length % 2)) {
       s->Printf("%2.2x%2.2x", m_data.inst.bytes[i + 1],
-                              m_data.inst.bytes[i + 0]);
+                m_data.inst.bytes[i + 0]);
       i += 2;
     } else {
       s->Printf("%2.2x", m_data.inst.bytes[i]);
diff --git a/lldb/source/Plugins/Disassembler/LLVMC/DisassemblerLLVMC.cpp b/lldb/source/Plugins/Disassembler/LLVMC/DisassemblerLLVMC.cpp
index eeb6020ab..ae780a0a5 100644
--- a/lldb/source/Plugins/Disassembler/LLVMC/DisassemblerLLVMC.cpp
+++ b/lldb/source/Plugins/Disassembler/LLVMC/DisassemblerLLVMC.cpp
@@ -608,8 +608,8 @@ public:
         llvm::MCInst inst;
         size_t inst_size = 0;
         bool valid = mc_disasm_ptr->GetMCInst(opcode_data, opcode_data_len, pc,
-                                             inst, inst_size);
- 
+                                              inst, inst_size);
+
         if (valid && inst_size > 0) {
           mc_disasm_ptr->SetStyle(use_hex_immediates, hex_style);
 
@@ -1341,14 +1341,15 @@ DisassemblerLLVMC::MCDisasmInstance::MCDisasmInstance(
          m_asm_info_up && m_context_up && m_disasm_up && m_instr_printer_up);
 }
 
-bool DisassemblerLLVMC::MCDisasmInstance::GetMCInst(
-    const uint8_t *opcode_data, size_t opcode_data_len, lldb::addr_t pc,
-    llvm::MCInst &mc_inst, size_t &size) const {
+bool DisassemblerLLVMC::MCDisasmInstance::GetMCInst(const uint8_t *opcode_data,
+                                                    size_t opcode_data_len,
+                                                    lldb::addr_t pc,
+                                                    llvm::MCInst &mc_inst,
+                                                    size_t &size) const {
   llvm::ArrayRef<uint8_t> data(opcode_data, opcode_data_len);
   llvm::MCDisassembler::DecodeStatus status;
 
-  status = m_disasm_up->getInstruction(mc_inst, size, data, pc,
-                                       llvm::nulls());
+  status = m_disasm_up->getInstruction(mc_inst, size, data, pc, llvm::nulls());
   if (status == llvm::MCDisassembler::Success)
     return true;
   else

@tedwoodward
Author

tedwoodward commented Jun 25, 2025

Before this change, disassembly of the crustfilt test program looks like this:

(lldb) dis -b
riscv_filter_disasm`main:
    0x390 <+0>:  01 11        addi   sp, sp, -0x20
    0x392 <+2>:  06 ce        sw     ra, 0x1c(sp)
    0x394 <+4>:  22 cc        sw     s0, 0x18(sp)
    0x396 <+6>:  00 10        addi   s0, sp, 0x20
    0x398 <+8>:  01 45        li     a0, 0x0
    0x39a <+10>: 23 2a a4 fe  sw     a0, -0xc(s0)
->  0x39e <+14>: 23 28 a4 fe  sw     a0, -0x10(s0)
    0x3a2 <+18>: 13 05 00 02  li     a0, 0x20
    0x3a6 <+22>: 23 26 a4 fe  sw     a0, -0x14(s0)
    0x3aa <+26>: <invalid>           
    0x3ac <+28>: 40 09        addi   s0, sp, 0x94
    0x3ae <+30>: 20 00        addi   s0, sp, 0x8
    0x3b0 <+32>: 20 00        addi   s0, sp, 0x8
    0x3b2 <+34>: <invalid>           
    0x3b4 <+36>: 00 00        unimp  
    0x3b6 <+38>: 00 10        addi   s0, sp, 0x20
    0x3b8 <+40>: <invalid>           
    0x3ba <+42>: <invalid>           
    0x3bc <+44>: <invalid>           
    0x3be <+46>: <invalid>           
    0x3c0 <+48>: <invalid>           
    0x3c2 <+50>: <invalid>           
    0x3c4 <+52>: 03 25 04 ff  lw     a0, -0x10(s0)
    0x3c8 <+56>: 83 25 c4 fe  lw     a1, -0x14(s0)
    0x3cc <+60>: 33 05 b5 02  mul    a0, a0, a1
    0x3d0 <+64>: f2 40        lw     ra, 0x1c(sp)
    0x3d2 <+66>: 62 44        lw     s0, 0x18(sp)
    0x3d4 <+68>: 05 61        addi   sp, sp, 0x20
    0x3d6 <+70>: 82 80        ret    

Note that the instruction at 0x3aa is an 8-byte instruction, but lldb's disassembler incorrectly treats it as a 2-byte instruction and then misdisassembles the bytes that follow as separate 2-byte instructions.
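
The 8-byte length can be checked by hand from the first halfword of the encoding. The sketch below assumes the conventional RISC-V variable-length encoding scheme, in which the low bits of the first 16-bit parcel select the length. `riscv_inst_length` is a hypothetical helper, not an LLDB or LLVM function, and this PR does not show how LLVM's decoder derives the size.

```python
def riscv_inst_length(first_halfword: int) -> int:
    """Return the instruction length in bytes from the lowest 16 bits."""
    if first_halfword & 0b11 != 0b11:
        return 2  # compressed 16-bit instruction
    if first_halfword & 0b11100 != 0b11100:
        return 4  # standard 32-bit instruction
    if first_halfword & 0b111111 == 0b011111:
        return 6  # 48-bit instruction
    if first_halfword & 0b1111111 == 0b0111111:
        return 8  # 64-bit instruction
    raise ValueError("longer or reserved encoding")
```

The bytes at 0x3aa are `3f 00`, so the first little-endian halfword is 0x003f and the helper reports 8 bytes. The halfword 0x021f at 0x3b2 gives 6 bytes.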

After this change, disassembly looks like this:

(lldb) dis -b
riscv_filter_disasm`main:
    0x390 <+0>:  1101                     addi   sp, sp, -0x20
    0x392 <+2>:  ce06                     sw     ra, 0x1c(sp)
    0x394 <+4>:  cc22                     sw     s0, 0x18(sp)
    0x396 <+6>:  1000                     addi   s0, sp, 0x20
    0x398 <+8>:  4501                     li     a0, 0x0
    0x39a <+10>: fea42a23                 sw     a0, -0xc(s0)
->  0x39e <+14>: fea42823                 sw     a0, -0x10(s0)
    0x3a2 <+18>: 02000513                 li     a0, 0x20
    0x3a6 <+22>: fea42623                 sw     a0, -0x14(s0)
    0x3aa <+26>: 0940003f 00200020        <unknown>
    0x3b2 <+34>: 021f 0000 1000           <unknown>
    0x3b8 <+40>: 084f940b                 <unknown>
    0x3bc <+44>: b8f2                     <unknown>
    0x3be <+46>: 084f940b                 <unknown>
    0x3c2 <+50>: b8f2                     <unknown>
    0x3c4 <+52>: ff042503                 lw     a0, -0x10(s0)
    0x3c8 <+56>: fec42583                 lw     a1, -0x14(s0)
    0x3cc <+60>: 02b50533                 mul    a0, a0, a1
    0x3d0 <+64>: 40f2                     lw     ra, 0x1c(sp)
    0x3d2 <+66>: 4462                     lw     s0, 0x18(sp)
    0x3d4 <+68>: 6105                     addi   sp, sp, 0x20
    0x3d6 <+70>: 8082                     ret    

The instruction at 0x3aa is now identified as a 64-bit instruction, its opcode is shown, and the instruction is marked as <unknown>.
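
The opcode column follows the grouping rule described in the DumpRISCV comments: 32-bit words when the length is a multiple of 4, otherwise 16-bit halfwords when it is a multiple of 2, otherwise single bytes. A rough Python equivalent, for illustration only (`format_riscv_opcode` is a hypothetical name, and `min_width` stands in for `min_byte_width`):

```python
def format_riscv_opcode(raw: bytes, min_width: int = 0) -> str:
    """Group little-endian instruction bytes the way the new dump does."""
    n = len(raw)
    if n % 4 == 0:
        step = 4  # one or two 32-bit words
    elif n % 2 == 0:
        step = 2  # one or three 16-bit halfwords
    else:
        step = 1  # fall back to plain bytes
    groups = []
    for i in range(0, n, step):
        # Reverse each group so the most significant byte prints first.
        groups.append(raw[i:i + step][::-1].hex())
    # Pad with spaces so the mnemonic column lines up across sizes.
    return " ".join(groups).ljust(min_width)
```

For example, the bytes `3f 00 40 09 20 00 20 00` at 0x3aa come out as `0940003f 00200020`, and the six bytes at 0x3b2 come out as `021f 0000 1000`. With a maximum opcode size of 8, the width passed in by Instruction::Dump is 8 * 3 + 1 = 25 characters.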

The output from fdis looks like this:


riscv_filter_disasm`main:
    0x390 <+0>:  1101                     addi   sp, sp, -0x20
    0x392 <+2>:  ce06                     sw     ra, 0x1c(sp)
    0x394 <+4>:  cc22                     sw     s0, 0x18(sp)
    0x396 <+6>:  1000                     addi   s0, sp, 0x20
    0x398 <+8>:  4501                     li     a0, 0x0
    0x39a <+10>: fea42a23                 sw     a0, -0xc(s0)
->  0x39e <+14>: fea42823                 sw     a0, -0x10(s0)
    0x3a2 <+18>: 02000513                 li     a0, 0x20
    0x3a6 <+22>: fea42623                 sw     a0, -0x14(s0)
    0x3aa <+26>: 0940003f 00200020        Fake64
    0x3b2 <+34>: 021f 0000 1000           xqci.e.li 4, 268435456
    0x3b8 <+40>: 084f940b                 insbi x8, #0x1f, #0x4, #0x4
    0x3bc <+44>: b8f2                     CmPush {ra, s0-s11},-0
    0x3be <+46>: 084f940b                 insbi x8, #0x1f, #0x4, #0x4
    0x3c2 <+50>: b8f2                     CmPush {ra, s0-s11},-0
    0x3c4 <+52>: ff042503                 lw     a0, -0x10(s0)
    0x3c8 <+56>: fec42583                 lw     a1, -0x14(s0)
    0x3cc <+60>: 02b50533                 mul    a0, a0, a1
    0x3d0 <+64>: 40f2                     lw     ra, 0x1c(sp)
    0x3d2 <+66>: 4462                     lw     s0, 0x18(sp)
    0x3d4 <+68>: 6105                     addi   sp, sp, 0x20
    0x3d6 <+70>: 8082                     ret

The filter replaces each <unknown> entry with the disassembly that it produces for that opcode.
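
For readers who want to write their own filter, a minimal sketch of the idea follows. It is not how crustfilt is implemented: the `KNOWN` table, the regular expression, and the function names are assumptions for illustration, and the two table entries are simply copied from the output above.

```python
import re
from typing import Iterable, TextIO

# Hypothetical opcode table, keyed by the opcode text as lldb prints it.
KNOWN = {
    "084f940b": "insbi x8, #0x1f, #0x4, #0x4",
    "b8f2": "CmPush {ra, s0-s11},-0",
}

# Matches "<address>: <opcode groups><padding><unknown>".
LINE = re.compile(
    r"^(?P<head>.*?:\s+)"
    r"(?P<opcode>[0-9a-f]+(?: [0-9a-f]+)*)"
    r"(?P<pad>\s+)<unknown>\s*$"
)


def filter_line(line: str) -> str:
    """Replace <unknown> with a mnemonic if the opcode is in the table."""
    m = LINE.match(line)
    if not m:
        return line  # not an <unknown> line: pass through untouched
    text = KNOWN.get(m.group("opcode"))
    if text is None:
        return line  # opcode not recognised: leave <unknown> in place
    return m.group("head") + m.group("opcode") + m.group("pad") + text


def filter_stream(src: Iterable[str], dst: TextIO) -> None:
    """Filter every line from `src` and write the result to `dst`."""
    for line in src:
        dst.write(filter_line(line.rstrip("\n")) + "\n")
```

Calling `filter_stream(sys.stdin, sys.stdout)` from a small script (after `import sys`) would make this usable as the program that fdis pipes its output through.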

@apazos
Contributor

apazos commented Jun 26, 2025

Thanks @tedwoodward!
This is useful for evaluating new instructions (as part of ISA evolution efforts) and for supporting proprietary custom instructions.
