[NVPTX] use untyped loads and stores wherever possible #137698

Merged: 3 commits into llvm:main on May 10, 2025

Conversation

AlexMaclean
Member

In most cases, the type information attached to load and store instructions is meaningless and inconsistently applied. We can usually use ".b" loads and avoid the complexity of trying to assign the correct type. The one exception is sign-extending loads, which will continue to use ".s" to ensure the sign extension into a larger register is done correctly.

@llvmbot
Member

llvmbot commented Apr 28, 2025

@llvm/pr-subscribers-debuginfo
@llvm/pr-subscribers-clang

@llvm/pr-subscribers-backend-nvptx

Author: Alex MacLean (AlexMaclean)

Changes

In most cases, the type information attached to load and store instructions is meaningless and inconsistently applied. We can usually use ".b" loads and avoid the complexity of trying to assign the correct type. The one exception is sign-extending loads, which will continue to use ".s" to ensure the sign extension into a larger register is done correctly.


Patch is 2.20 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/137698.diff

189 Files Affected:

  • (modified) llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp (+13-54)
  • (modified) llvm/lib/Target/NVPTX/NVPTXInstrInfo.td (+15-15)
  • (modified) llvm/lib/Target/NVPTX/NVPTXIntrinsics.td (+33-33)
  • (modified) llvm/test/CodeGen/NVPTX/LoadStoreVectorizer.ll (+87-87)
  • (modified) llvm/test/CodeGen/NVPTX/MachineSink-call.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/MachineSink-convergent.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/access-non-generic.ll (+12-12)
  • (modified) llvm/test/CodeGen/NVPTX/addr-mode.ll (+10-10)
  • (modified) llvm/test/CodeGen/NVPTX/addrspacecast-folding.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/addrspacecast-ptx64.ll (+16-16)
  • (modified) llvm/test/CodeGen/NVPTX/addrspacecast.ll (+19-19)
  • (modified) llvm/test/CodeGen/NVPTX/aggregate-return.ll (+15-15)
  • (modified) llvm/test/CodeGen/NVPTX/and-or-setcc.ll (+4-4)
  • (modified) llvm/test/CodeGen/NVPTX/anonymous-fn-param.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/applypriority.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/atomics-sm70.ll (+13-13)
  • (modified) llvm/test/CodeGen/NVPTX/atomics-sm90.ll (+13-13)
  • (modified) llvm/test/CodeGen/NVPTX/atomics.ll (+66-66)
  • (modified) llvm/test/CodeGen/NVPTX/barrier.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/bf16-instructions.ll (+57-57)
  • (modified) llvm/test/CodeGen/NVPTX/bf16.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/bf16x2-instructions.ll (+20-20)
  • (modified) llvm/test/CodeGen/NVPTX/bfe.ll (+14-14)
  • (modified) llvm/test/CodeGen/NVPTX/bswap.ll (+4-4)
  • (modified) llvm/test/CodeGen/NVPTX/bug21465.ll (+3-3)
  • (modified) llvm/test/CodeGen/NVPTX/bug22246.ll (+5-5)
  • (modified) llvm/test/CodeGen/NVPTX/bug26185-2.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/bug26185.ll (+4-4)
  • (modified) llvm/test/CodeGen/NVPTX/call-with-alloca-buffer.ll (+3-3)
  • (modified) llvm/test/CodeGen/NVPTX/chain-different-as.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/cmpxchg-sm60.ll (+630-630)
  • (modified) llvm/test/CodeGen/NVPTX/cmpxchg-sm70.ll (+630-630)
  • (modified) llvm/test/CodeGen/NVPTX/cmpxchg-sm90.ll (+630-630)
  • (modified) llvm/test/CodeGen/NVPTX/cmpxchg.ll (+210-210)
  • (modified) llvm/test/CodeGen/NVPTX/combine-mad.ll (+28-28)
  • (modified) llvm/test/CodeGen/NVPTX/convert-fp-i8.ll (+10-10)
  • (modified) llvm/test/CodeGen/NVPTX/convert-int-sm20.ll (+6-6)
  • (modified) llvm/test/CodeGen/NVPTX/convert-sm100.ll (+4-4)
  • (modified) llvm/test/CodeGen/NVPTX/convert-sm100a.ll (+21-21)
  • (modified) llvm/test/CodeGen/NVPTX/convert-sm80.ll (+25-25)
  • (modified) llvm/test/CodeGen/NVPTX/convert-sm90.ll (+4-4)
  • (modified) llvm/test/CodeGen/NVPTX/copysign.ll (+18-18)
  • (modified) llvm/test/CodeGen/NVPTX/cp-async-bulk-tensor-g2s.ll (+146-146)
  • (modified) llvm/test/CodeGen/NVPTX/cp-async-bulk-tensor-prefetch.ll (+31-31)
  • (modified) llvm/test/CodeGen/NVPTX/cp-async-bulk-tensor-reduce.ll (+51-51)
  • (modified) llvm/test/CodeGen/NVPTX/cp-async-bulk-tensor-s2g.ll (+60-60)
  • (modified) llvm/test/CodeGen/NVPTX/cp-async-bulk.ll (+31-31)
  • (modified) llvm/test/CodeGen/NVPTX/ctlz.ll (+14-14)
  • (modified) llvm/test/CodeGen/NVPTX/dag-cse.ll (+4-4)
  • (modified) llvm/test/CodeGen/NVPTX/demote-vars.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/discard.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/disjoint-or-addr.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/distributed-shared-cluster.ll (+13-13)
  • (modified) llvm/test/CodeGen/NVPTX/div.ll (+3-3)
  • (modified) llvm/test/CodeGen/NVPTX/dot-product.ll (+37-37)
  • (modified) llvm/test/CodeGen/NVPTX/dynamic-stackalloc-regression.ll (+5-5)
  • (modified) llvm/test/CodeGen/NVPTX/dynamic_stackalloc.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/elect.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/extloadv.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/extractelement.ll (+7-7)
  • (modified) llvm/test/CodeGen/NVPTX/f16-instructions.ll (+32-32)
  • (modified) llvm/test/CodeGen/NVPTX/f16x2-instructions.ll (+45-45)
  • (modified) llvm/test/CodeGen/NVPTX/f32-ex2.ll (+4-4)
  • (modified) llvm/test/CodeGen/NVPTX/f32-lg2.ll (+4-4)
  • (modified) llvm/test/CodeGen/NVPTX/fabs-intrinsics.ll (+6-6)
  • (modified) llvm/test/CodeGen/NVPTX/fexp2.ll (+20-20)
  • (modified) llvm/test/CodeGen/NVPTX/flo.ll (+8-8)
  • (modified) llvm/test/CodeGen/NVPTX/flog2.ll (+8-8)
  • (modified) llvm/test/CodeGen/NVPTX/fma-relu-contract.ll (+12-12)
  • (modified) llvm/test/CodeGen/NVPTX/fma-relu-fma-intrinsic.ll (+9-9)
  • (modified) llvm/test/CodeGen/NVPTX/fma-relu-instruction-flag.ll (+18-18)
  • (modified) llvm/test/CodeGen/NVPTX/fns.ll (+3-3)
  • (modified) llvm/test/CodeGen/NVPTX/forward-ld-param.ll (+9-9)
  • (modified) llvm/test/CodeGen/NVPTX/fp-contract.ll (+21-21)
  • (modified) llvm/test/CodeGen/NVPTX/fp128-storage-type.ll (+6-6)
  • (modified) llvm/test/CodeGen/NVPTX/frem.ll (+32-32)
  • (modified) llvm/test/CodeGen/NVPTX/funnel-shift-clamp.ll (+10-10)
  • (modified) llvm/test/CodeGen/NVPTX/generic-to-nvvm.ll (+3-3)
  • (modified) llvm/test/CodeGen/NVPTX/globals_lowering.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/half.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/i1-ext-load.ll (+4-4)
  • (modified) llvm/test/CodeGen/NVPTX/i1-icmp.ll (+20-20)
  • (modified) llvm/test/CodeGen/NVPTX/i1-load-lower.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/i1-select.ll (+20-20)
  • (modified) llvm/test/CodeGen/NVPTX/i128-array.ll (+6-6)
  • (modified) llvm/test/CodeGen/NVPTX/i128-param.ll (+6-6)
  • (modified) llvm/test/CodeGen/NVPTX/i128-retval.ll (+4-4)
  • (modified) llvm/test/CodeGen/NVPTX/i128.ll (+14-14)
  • (modified) llvm/test/CodeGen/NVPTX/i16x2-instructions.ll (+102-102)
  • (modified) llvm/test/CodeGen/NVPTX/i8-param.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/i8x2-instructions.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/i8x4-instructions.ll (+129-129)
  • (modified) llvm/test/CodeGen/NVPTX/idioms.ll (+4-4)
  • (modified) llvm/test/CodeGen/NVPTX/indirect_byval.ll (+4-4)
  • (modified) llvm/test/CodeGen/NVPTX/inline-asm-b128-test1.ll (+8-8)
  • (modified) llvm/test/CodeGen/NVPTX/inline-asm-b128-test2.ll (+4-4)
  • (modified) llvm/test/CodeGen/NVPTX/inline-asm-b128-test3.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/intrinsics.ll (+20-20)
  • (modified) llvm/test/CodeGen/NVPTX/jump-table.ll (+6-6)
  • (modified) llvm/test/CodeGen/NVPTX/ld-addrspace.ll (+36-36)
  • (modified) llvm/test/CodeGen/NVPTX/ld-generic.ll (+12-12)
  • (modified) llvm/test/CodeGen/NVPTX/ld-st-addrrspace.py (+15-2)
  • (modified) llvm/test/CodeGen/NVPTX/ldg-invariant.ll (+20-20)
  • (modified) llvm/test/CodeGen/NVPTX/ldparam-v4.ll (+3-3)
  • (modified) llvm/test/CodeGen/NVPTX/ldu-i8.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/ldu-ldg.ll (+42-42)
  • (modified) llvm/test/CodeGen/NVPTX/ldu-reg-plus-offset.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/load-sext-i1.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/load-store-scalars.ll (+576-576)
  • (modified) llvm/test/CodeGen/NVPTX/load-store-sm-70.ll (+960-960)
  • (modified) llvm/test/CodeGen/NVPTX/load-store-sm-90.ll (+384-384)
  • (modified) llvm/test/CodeGen/NVPTX/load-store-vectors.ll (+264-264)
  • (modified) llvm/test/CodeGen/NVPTX/load-with-non-coherent-cache.ll (+44-44)
  • (modified) llvm/test/CodeGen/NVPTX/local-stack-frame.ll (+16-16)
  • (modified) llvm/test/CodeGen/NVPTX/lower-aggr-copies.ll (+11-11)
  • (modified) llvm/test/CodeGen/NVPTX/lower-alloca.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/lower-args-gridconstant.ll (+40-40)
  • (modified) llvm/test/CodeGen/NVPTX/lower-args.ll (+29-29)
  • (modified) llvm/test/CodeGen/NVPTX/lower-byval-args.ll (+154-154)
  • (modified) llvm/test/CodeGen/NVPTX/lower-kernel-ptr-arg.ll (+12-12)
  • (modified) llvm/test/CodeGen/NVPTX/machine-sink.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/match.ll (+8-8)
  • (modified) llvm/test/CodeGen/NVPTX/math-intrins.ll (+152-152)
  • (modified) llvm/test/CodeGen/NVPTX/misaligned-vector-ldst.ll (+40-40)
  • (modified) llvm/test/CodeGen/NVPTX/misched_func_call.ll (+6-6)
  • (modified) llvm/test/CodeGen/NVPTX/mulhi-intrins.ll (+12-12)
  • (modified) llvm/test/CodeGen/NVPTX/nounroll.ll (+8-8)
  • (modified) llvm/test/CodeGen/NVPTX/nvvm-reflect-arch-O0.ll (+3-3)
  • (modified) llvm/test/CodeGen/NVPTX/param-add.ll (+4-4)
  • (modified) llvm/test/CodeGen/NVPTX/param-align.ll (+15-15)
  • (modified) llvm/test/CodeGen/NVPTX/param-load-store.ll (+118-118)
  • (modified) llvm/test/CodeGen/NVPTX/param-overalign.ll (+14-14)
  • (modified) llvm/test/CodeGen/NVPTX/param-vectorize-device.ll (+25-25)
  • (modified) llvm/test/CodeGen/NVPTX/param-vectorize-kernel.ll (+104-104)
  • (modified) llvm/test/CodeGen/NVPTX/pr13291-i1-store.ll (+4-4)
  • (modified) llvm/test/CodeGen/NVPTX/pr16278.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/prefetch.ll (+4-4)
  • (modified) llvm/test/CodeGen/NVPTX/proxy-reg-erasure-ptx.ll (+9-9)
  • (modified) llvm/test/CodeGen/NVPTX/rcp-opt.ll (+6-6)
  • (modified) llvm/test/CodeGen/NVPTX/read-global-variable-constant.ll (+3-3)
  • (modified) llvm/test/CodeGen/NVPTX/reduction-intrinsics.ll (+178-178)
  • (modified) llvm/test/CodeGen/NVPTX/redux-sync-f32.ll (+24-24)
  • (modified) llvm/test/CodeGen/NVPTX/reg-types.ll (+10-10)
  • (modified) llvm/test/CodeGen/NVPTX/rotate-add.ll (+20-20)
  • (modified) llvm/test/CodeGen/NVPTX/rotate.ll (+82-82)
  • (modified) llvm/test/CodeGen/NVPTX/rotate_64.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/sad-intrins.ll (+18-18)
  • (modified) llvm/test/CodeGen/NVPTX/sched1.ll (+4-4)
  • (modified) llvm/test/CodeGen/NVPTX/sched2.ll (+4-4)
  • (modified) llvm/test/CodeGen/NVPTX/sext-params.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/sext-setcc.ll (+4-4)
  • (modified) llvm/test/CodeGen/NVPTX/shfl-p.ll (+32-32)
  • (modified) llvm/test/CodeGen/NVPTX/shfl-sync-p.ll (+40-40)
  • (modified) llvm/test/CodeGen/NVPTX/shfl-sync.ll (+20-20)
  • (modified) llvm/test/CodeGen/NVPTX/shfl.ll (+9-9)
  • (modified) llvm/test/CodeGen/NVPTX/short-ptr.ll (+3-3)
  • (modified) llvm/test/CodeGen/NVPTX/shuffle-vec-undef-init.ll (+3-3)
  • (modified) llvm/test/CodeGen/NVPTX/st-addrspace.ll (+36-36)
  • (modified) llvm/test/CodeGen/NVPTX/st-generic.ll (+12-12)
  • (modified) llvm/test/CodeGen/NVPTX/st-param-imm.ll (+147-147)
  • (modified) llvm/test/CodeGen/NVPTX/st_bulk.ll (+6-6)
  • (modified) llvm/test/CodeGen/NVPTX/stacksaverestore.ll (+3-3)
  • (modified) llvm/test/CodeGen/NVPTX/store-retval.ll (+3-3)
  • (modified) llvm/test/CodeGen/NVPTX/store-undef.ll (+28-28)
  • (modified) llvm/test/CodeGen/NVPTX/surf-read-cuda.ll (+7-7)
  • (modified) llvm/test/CodeGen/NVPTX/surf-read.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/surf-write-cuda.ll (+5-5)
  • (modified) llvm/test/CodeGen/NVPTX/tcgen05-alloc.ll (+12-12)
  • (modified) llvm/test/CodeGen/NVPTX/tcgen05-commit.ll (+12-12)
  • (modified) llvm/test/CodeGen/NVPTX/tcgen05-cp.ll (+36-36)
  • (modified) llvm/test/CodeGen/NVPTX/tcgen05-ld.ll (+10-10)
  • (modified) llvm/test/CodeGen/NVPTX/tcgen05-shift.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/tcgen05-st.ll (+654-654)
  • (modified) llvm/test/CodeGen/NVPTX/tex-read-cuda.ll (+11-11)
  • (modified) llvm/test/CodeGen/NVPTX/tex-read.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/texsurf-queries.ll (+4-4)
  • (modified) llvm/test/CodeGen/NVPTX/unaligned-param-load-store.ll (+43-43)
  • (modified) llvm/test/CodeGen/NVPTX/unfold-masked-merge-vector-variablemask.ll (+87-87)
  • (modified) llvm/test/CodeGen/NVPTX/vaargs.ll (+21-21)
  • (modified) llvm/test/CodeGen/NVPTX/variadics-backend.ll (+46-46)
  • (modified) llvm/test/CodeGen/NVPTX/vec-param-load.ll (+26-26)
  • (modified) llvm/test/CodeGen/NVPTX/vec8.ll (+3-3)
  • (modified) llvm/test/CodeGen/NVPTX/vector-args.ll (+4-4)
  • (modified) llvm/test/CodeGen/NVPTX/vector-call.ll (+3-3)
  • (modified) llvm/test/CodeGen/NVPTX/vector-compare.ll (+4-4)
  • (modified) llvm/test/CodeGen/NVPTX/vector-loads.ll (+17-17)
  • (modified) llvm/test/CodeGen/NVPTX/vector-select.ll (+4-4)
  • (modified) llvm/test/CodeGen/NVPTX/vector-stores.ll (+4-4)
  • (modified) llvm/test/CodeGen/NVPTX/vectorize-misaligned.ll (+4-4)
diff --git a/llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp b/llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp
index 295ed666a1902..e81448ff227be 100644
--- a/llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp
@@ -1077,21 +1077,6 @@ pickOpcodeForVT(MVT::SimpleValueType VT, unsigned Opcode_i8,
   }
 }
 
-static int getLdStRegType(EVT VT) {
-  if (VT.isFloatingPoint())
-    switch (VT.getSimpleVT().SimpleTy) {
-    case MVT::f16:
-    case MVT::bf16:
-    case MVT::v2f16:
-    case MVT::v2bf16:
-      return NVPTX::PTXLdStInstCode::Untyped;
-    default:
-      return NVPTX::PTXLdStInstCode::Float;
-    }
-  else
-    return NVPTX::PTXLdStInstCode::Unsigned;
-}
-
 bool NVPTXDAGToDAGISel::tryLoad(SDNode *N) {
   MemSDNode *LD = cast<MemSDNode>(N);
   assert(LD->readMem() && "Expected load");
@@ -1122,24 +1107,14 @@ bool NVPTXDAGToDAGISel::tryLoad(SDNode *N) {
   //          type is integer
   // Float  : ISD::NON_EXTLOAD or ISD::EXTLOAD and the type is float
   MVT SimpleVT = LoadedVT.getSimpleVT();
-  MVT ScalarVT = SimpleVT.getScalarType();
   // Read at least 8 bits (predicates are stored as 8-bit values)
-  unsigned FromTypeWidth = std::max(8U, (unsigned)ScalarVT.getSizeInBits());
-  unsigned int FromType;
+  unsigned FromTypeWidth = std::max(8U, (unsigned)SimpleVT.getSizeInBits());
 
   // Vector Setting
-  unsigned VecType = NVPTX::PTXLdStInstCode::Scalar;
-  if (SimpleVT.isVector()) {
-    assert((Isv2x16VT(LoadedVT) || LoadedVT == MVT::v4i8) &&
-           "Unexpected vector type");
-    // v2f16/v2bf16/v2i16 is loaded using ld.b32
-    FromTypeWidth = 32;
-  }
-
-  if (PlainLoad && (PlainLoad->getExtensionType() == ISD::SEXTLOAD))
-    FromType = NVPTX::PTXLdStInstCode::Signed;
-  else
-    FromType = getLdStRegType(ScalarVT);
+  unsigned int FromType =
+      (PlainLoad && (PlainLoad->getExtensionType() == ISD::SEXTLOAD))
+          ? NVPTX::PTXLdStInstCode::Signed
+          : NVPTX::PTXLdStInstCode::Untyped;
 
   // Create the machine instruction DAG
   SDValue Offset, Base;
@@ -1147,7 +1122,7 @@ bool NVPTXDAGToDAGISel::tryLoad(SDNode *N) {
   SDValue Ops[] = {getI32Imm(Ordering, DL),
                    getI32Imm(Scope, DL),
                    getI32Imm(CodeAddrSpace, DL),
-                   getI32Imm(VecType, DL),
+                   getI32Imm(NVPTX::PTXLdStInstCode::Scalar, DL),
                    getI32Imm(FromType, DL),
                    getI32Imm(FromTypeWidth, DL),
                    Base,
@@ -1214,7 +1189,7 @@ bool NVPTXDAGToDAGISel::tryLoadVector(SDNode *N) {
   unsigned ExtensionType = N->getConstantOperandVal(N->getNumOperands() - 1);
   unsigned FromType = (ExtensionType == ISD::SEXTLOAD)
                           ? NVPTX::PTXLdStInstCode::Signed
-                          : getLdStRegType(MemVT.getScalarType());
+                          : NVPTX::PTXLdStInstCode::Untyped;
 
   unsigned VecType;
   unsigned FromTypeWidth;
@@ -1232,8 +1207,8 @@ bool NVPTXDAGToDAGISel::tryLoadVector(SDNode *N) {
   }
 
   if (isSubVectorPackedInI32(EltVT)) {
+    assert(ExtensionType == ISD::NON_EXTLOAD);
     EltVT = MVT::i32;
-    FromType = NVPTX::PTXLdStInstCode::Untyped;
   }
 
   SDValue Offset, Base;
@@ -1434,21 +1409,7 @@ bool NVPTXDAGToDAGISel::tryStore(SDNode *N) {
   auto [Ordering, Scope] = insertMemoryInstructionFence(DL, Chain, ST);
 
   // Vector Setting
-  MVT SimpleVT = StoreVT.getSimpleVT();
-  unsigned VecType = NVPTX::PTXLdStInstCode::Scalar;
-
-  // Type Setting: toType + toTypeWidth
-  // - for integer type, always use 'u'
-  MVT ScalarVT = SimpleVT.getScalarType();
-  unsigned ToTypeWidth = ScalarVT.getSizeInBits();
-  if (SimpleVT.isVector()) {
-    assert((Isv2x16VT(StoreVT) || StoreVT == MVT::v4i8) &&
-           "Unexpected vector type");
-    // v2x16 is stored using st.b32
-    ToTypeWidth = 32;
-  }
-
-  unsigned int ToType = getLdStRegType(ScalarVT);
+  const unsigned ToTypeWidth = StoreVT.getSimpleVT().getSizeInBits();
 
   // Create the machine instruction DAG
   SDValue Value = PlainStore ? PlainStore->getValue() : AtomicStore->getVal();
@@ -1460,8 +1421,8 @@ bool NVPTXDAGToDAGISel::tryStore(SDNode *N) {
                    getI32Imm(Ordering, DL),
                    getI32Imm(Scope, DL),
                    getI32Imm(CodeAddrSpace, DL),
-                   getI32Imm(VecType, DL),
-                   getI32Imm(ToType, DL),
+                   getI32Imm(NVPTX::PTXLdStInstCode::Scalar, DL),
+                   getI32Imm(NVPTX::PTXLdStInstCode::Untyped, DL),
                    getI32Imm(ToTypeWidth, DL),
                    Base,
                    Offset,
@@ -1507,7 +1468,6 @@ bool NVPTXDAGToDAGISel::tryStoreVector(SDNode *N) {
   // Type Setting: toType + toTypeWidth
   // - for integer type, always use 'u'
   const unsigned TotalWidth = StoreVT.getSimpleVT().getSizeInBits();
-  unsigned ToType = getLdStRegType(StoreVT.getSimpleVT().getScalarType());
 
   SmallVector<SDValue, 12> Ops;
   SDValue N2;
@@ -1534,7 +1494,6 @@ bool NVPTXDAGToDAGISel::tryStoreVector(SDNode *N) {
 
   if (isSubVectorPackedInI32(EltVT)) {
     EltVT = MVT::i32;
-    ToType = NVPTX::PTXLdStInstCode::Untyped;
   }
 
   SDValue Offset, Base;
@@ -1542,8 +1501,8 @@ bool NVPTXDAGToDAGISel::tryStoreVector(SDNode *N) {
 
   Ops.append({getI32Imm(Ordering, DL), getI32Imm(Scope, DL),
               getI32Imm(CodeAddrSpace, DL), getI32Imm(VecType, DL),
-              getI32Imm(ToType, DL), getI32Imm(ToTypeWidth, DL), Base, Offset,
-              Chain});
+              getI32Imm(NVPTX::PTXLdStInstCode::Untyped, DL),
+              getI32Imm(ToTypeWidth, DL), Base, Offset, Chain});
 
   std::optional<unsigned> Opcode;
   switch (N->getOpcode()) {
diff --git a/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td b/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
index 043da14bcb236..21846583a8c04 100644
--- a/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
+++ b/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
@@ -2296,11 +2296,11 @@ def LoadParamMemV2I8   : LoadParamV2MemInst<Int16Regs, ".b8">;
 def LoadParamMemV4I32  : LoadParamV4MemInst<Int32Regs, ".b32">;
 def LoadParamMemV4I16  : LoadParamV4MemInst<Int16Regs, ".b16">;
 def LoadParamMemV4I8   : LoadParamV4MemInst<Int16Regs, ".b8">;
-def LoadParamMemF32    : LoadParamMemInst<Float32Regs, ".f32">;
-def LoadParamMemF64    : LoadParamMemInst<Float64Regs, ".f64">;
-def LoadParamMemV2F32  : LoadParamV2MemInst<Float32Regs, ".f32">;
-def LoadParamMemV2F64  : LoadParamV2MemInst<Float64Regs, ".f64">;
-def LoadParamMemV4F32  : LoadParamV4MemInst<Float32Regs, ".f32">;
+def LoadParamMemF32    : LoadParamMemInst<Float32Regs, ".b32">;
+def LoadParamMemF64    : LoadParamMemInst<Float64Regs, ".b64">;
+def LoadParamMemV2F32  : LoadParamV2MemInst<Float32Regs, ".b32">;
+def LoadParamMemV2F64  : LoadParamV2MemInst<Float64Regs, ".b64">;
+def LoadParamMemV4F32  : LoadParamV4MemInst<Float32Regs, ".b32">;
 
 defm StoreParamI64    : StoreParamInst<Int64Regs, i64imm, ".b64">;
 defm StoreParamI32    : StoreParamInst<Int32Regs, i32imm, ".b32">;
@@ -2319,13 +2319,13 @@ defm StoreParamV4I32  : StoreParamV4Inst<Int32Regs, i32imm, ".b32">;
 defm StoreParamV4I16  : StoreParamV4Inst<Int16Regs, i16imm, ".b16">;
 defm StoreParamV4I8   : StoreParamV4Inst<Int16Regs, i8imm,  ".b8">;
 
-defm StoreParamF32    : StoreParamInst<Float32Regs, f32imm, ".f32">;
-defm StoreParamF64    : StoreParamInst<Float64Regs, f64imm, ".f64">;
+defm StoreParamF32    : StoreParamInst<Float32Regs, f32imm, ".b32">;
+defm StoreParamF64    : StoreParamInst<Float64Regs, f64imm, ".b64">;
 
-defm StoreParamV2F32  : StoreParamV2Inst<Float32Regs, f32imm, ".f32">;
-defm StoreParamV2F64  : StoreParamV2Inst<Float64Regs, f64imm, ".f64">;
+defm StoreParamV2F32  : StoreParamV2Inst<Float32Regs, f32imm, ".b32">;
+defm StoreParamV2F64  : StoreParamV2Inst<Float64Regs, f64imm, ".b64">;
 
-defm StoreParamV4F32  : StoreParamV4Inst<Float32Regs, f32imm, ".f32">;
+defm StoreParamV4F32  : StoreParamV4Inst<Float32Regs, f32imm, ".b32">;
 
 def StoreRetvalI64    : StoreRetvalInst<Int64Regs, ".b64">;
 def StoreRetvalI32    : StoreRetvalInst<Int32Regs, ".b32">;
@@ -2341,11 +2341,11 @@ def StoreRetvalV4I32  : StoreRetvalV4Inst<Int32Regs, ".b32">;
 def StoreRetvalV4I16  : StoreRetvalV4Inst<Int16Regs, ".b16">;
 def StoreRetvalV4I8   : StoreRetvalV4Inst<Int16Regs, ".b8">;
 
-def StoreRetvalF64    : StoreRetvalInst<Float64Regs, ".f64">;
-def StoreRetvalF32    : StoreRetvalInst<Float32Regs, ".f32">;
-def StoreRetvalV2F64  : StoreRetvalV2Inst<Float64Regs, ".f64">;
-def StoreRetvalV2F32  : StoreRetvalV2Inst<Float32Regs, ".f32">;
-def StoreRetvalV4F32  : StoreRetvalV4Inst<Float32Regs, ".f32">;
+def StoreRetvalF64    : StoreRetvalInst<Float64Regs, ".b64">;
+def StoreRetvalF32    : StoreRetvalInst<Float32Regs, ".b32">;
+def StoreRetvalV2F64  : StoreRetvalV2Inst<Float64Regs, ".b64">;
+def StoreRetvalV2F32  : StoreRetvalV2Inst<Float32Regs, ".b32">;
+def StoreRetvalV4F32  : StoreRetvalV4Inst<Float32Regs, ".b32">;
 
 def CallArgBeginInst : NVPTXInst<(outs), (ins), "(", [(CallArgBegin)]>;
 def CallArgEndInst1  : NVPTXInst<(outs), (ins), ");", [(CallArgEnd (i32 1))]>;
diff --git a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
index 3eedb43e4c81a..4d56cf38531e7 100644
--- a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
+++ b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
@@ -2302,12 +2302,12 @@ class LDU_G<string TyStr, NVPTXRegClass regclass>
                "ldu.global." # TyStr # " \t$result, [$src];",
                       []>, Requires<[hasLDU]>;
 
-def INT_PTX_LDU_GLOBAL_i8  : LDU_G<"u8", Int16Regs>;
-def INT_PTX_LDU_GLOBAL_i16 : LDU_G<"u16", Int16Regs>;
-def INT_PTX_LDU_GLOBAL_i32 : LDU_G<"u32", Int32Regs>;
-def INT_PTX_LDU_GLOBAL_i64 : LDU_G<"u64", Int64Regs>;
-def INT_PTX_LDU_GLOBAL_f32 : LDU_G<"f32", Float32Regs>;
-def INT_PTX_LDU_GLOBAL_f64 : LDU_G<"f64", Float64Regs>;
+def INT_PTX_LDU_GLOBAL_i8  : LDU_G<"b8", Int16Regs>;
+def INT_PTX_LDU_GLOBAL_i16 : LDU_G<"b16", Int16Regs>;
+def INT_PTX_LDU_GLOBAL_i32 : LDU_G<"b32", Int32Regs>;
+def INT_PTX_LDU_GLOBAL_i64 : LDU_G<"b64", Int64Regs>;
+def INT_PTX_LDU_GLOBAL_f32 : LDU_G<"b32", Float32Regs>;
+def INT_PTX_LDU_GLOBAL_f64 : LDU_G<"b64", Float64Regs>;
 
 // vector
 
@@ -2324,19 +2324,19 @@ class VLDU_G_ELE_V4<string TyStr, NVPTXRegClass regclass>
                "ldu.global.v4." # TyStr # " \t{{$dst1, $dst2, $dst3, $dst4}}, [$src];", []>;
 
 
-def INT_PTX_LDU_G_v2i8_ELE : VLDU_G_ELE_V2<"u8", Int16Regs>;
-def INT_PTX_LDU_G_v2i16_ELE : VLDU_G_ELE_V2<"u16", Int16Regs>;
-def INT_PTX_LDU_G_v2i32_ELE : VLDU_G_ELE_V2<"u32", Int32Regs>;
-def INT_PTX_LDU_G_v2f32_ELE : VLDU_G_ELE_V2<"f32", Float32Regs>;
-def INT_PTX_LDU_G_v2i64_ELE : VLDU_G_ELE_V2<"u64", Int64Regs>;
-def INT_PTX_LDU_G_v2f64_ELE : VLDU_G_ELE_V2<"f64", Float64Regs>;
+def INT_PTX_LDU_G_v2i8_ELE : VLDU_G_ELE_V2<"b8", Int16Regs>;
+def INT_PTX_LDU_G_v2i16_ELE : VLDU_G_ELE_V2<"b16", Int16Regs>;
+def INT_PTX_LDU_G_v2i32_ELE : VLDU_G_ELE_V2<"b32", Int32Regs>;
+def INT_PTX_LDU_G_v2f32_ELE : VLDU_G_ELE_V2<"b32", Float32Regs>;
+def INT_PTX_LDU_G_v2i64_ELE : VLDU_G_ELE_V2<"b64", Int64Regs>;
+def INT_PTX_LDU_G_v2f64_ELE : VLDU_G_ELE_V2<"b64", Float64Regs>;
 
-def INT_PTX_LDU_G_v4i8_ELE : VLDU_G_ELE_V4<"u8", Int16Regs>;
-def INT_PTX_LDU_G_v4i16_ELE : VLDU_G_ELE_V4<"u16", Int16Regs>;
-def INT_PTX_LDU_G_v4i32_ELE  : VLDU_G_ELE_V4<"u32", Int32Regs>;
+def INT_PTX_LDU_G_v4i8_ELE : VLDU_G_ELE_V4<"b8", Int16Regs>;
+def INT_PTX_LDU_G_v4i16_ELE : VLDU_G_ELE_V4<"b16", Int16Regs>;
+def INT_PTX_LDU_G_v4i32_ELE  : VLDU_G_ELE_V4<"b32", Int32Regs>;
 def INT_PTX_LDU_G_v4f16_ELE   : VLDU_G_ELE_V4<"b16", Int16Regs>;
 def INT_PTX_LDU_G_v4f16x2_ELE  : VLDU_G_ELE_V4<"b32", Int32Regs>;
-def INT_PTX_LDU_G_v4f32_ELE  : VLDU_G_ELE_V4<"f32", Float32Regs>;
+def INT_PTX_LDU_G_v4f32_ELE  : VLDU_G_ELE_V4<"b32", Float32Regs>;
 
 
 //-----------------------------------
@@ -2352,12 +2352,12 @@ class LDG_G<string TyStr, NVPTXRegClass regclass>
                "ld.global.nc." # TyStr # " \t$result, [$src];",
                         []>, Requires<[hasLDG]>;
 
-def INT_PTX_LDG_GLOBAL_i8 : LDG_G<"u8", Int16Regs>;
-def INT_PTX_LDG_GLOBAL_i16 : LDG_G<"u16", Int16Regs>;
-def INT_PTX_LDG_GLOBAL_i32 : LDG_G<"u32", Int32Regs>;
-def INT_PTX_LDG_GLOBAL_i64 : LDG_G<"u64", Int64Regs>;
-def INT_PTX_LDG_GLOBAL_f32 : LDG_G<"f32", Float32Regs>;
-def INT_PTX_LDG_GLOBAL_f64 : LDG_G<"f64", Float64Regs>;
+def INT_PTX_LDG_GLOBAL_i8 : LDG_G<"b8", Int16Regs>;
+def INT_PTX_LDG_GLOBAL_i16 : LDG_G<"b16", Int16Regs>;
+def INT_PTX_LDG_GLOBAL_i32 : LDG_G<"b32", Int32Regs>;
+def INT_PTX_LDG_GLOBAL_i64 : LDG_G<"b64", Int64Regs>;
+def INT_PTX_LDG_GLOBAL_f32 : LDG_G<"b32", Float32Regs>;
+def INT_PTX_LDG_GLOBAL_f64 : LDG_G<"b64", Float64Regs>;
 
 // vector
 
@@ -2374,17 +2374,17 @@ class VLDG_G_ELE_V4<string TyStr, NVPTXRegClass regclass> :
             "ld.global.nc.v4." # TyStr # " \t{{$dst1, $dst2, $dst3, $dst4}}, [$src];", []>;
 
 // FIXME: 8-bit LDG should be fixed once LDG/LDU nodes are made into proper loads.
-def INT_PTX_LDG_G_v2i8_ELE : VLDG_G_ELE_V2<"u8", Int16Regs>;
-def INT_PTX_LDG_G_v2i16_ELE : VLDG_G_ELE_V2<"u16", Int16Regs>;
-def INT_PTX_LDG_G_v2i32_ELE : VLDG_G_ELE_V2<"u32", Int32Regs>;
-def INT_PTX_LDG_G_v2f32_ELE : VLDG_G_ELE_V2<"f32", Float32Regs>;
-def INT_PTX_LDG_G_v2i64_ELE : VLDG_G_ELE_V2<"u64", Int64Regs>;
-def INT_PTX_LDG_G_v2f64_ELE : VLDG_G_ELE_V2<"f64", Float64Regs>;
-
-def INT_PTX_LDG_G_v4i8_ELE : VLDG_G_ELE_V4<"u8", Int16Regs>;
-def INT_PTX_LDG_G_v4i16_ELE : VLDG_G_ELE_V4<"u16", Int16Regs>;
-def INT_PTX_LDG_G_v4i32_ELE : VLDG_G_ELE_V4<"u32", Int32Regs>;
-def INT_PTX_LDG_G_v4f32_ELE : VLDG_G_ELE_V4<"f32", Float32Regs>;
+def INT_PTX_LDG_G_v2i8_ELE : VLDG_G_ELE_V2<"b8", Int16Regs>;
+def INT_PTX_LDG_G_v2i16_ELE : VLDG_G_ELE_V2<"b16", Int16Regs>;
+def INT_PTX_LDG_G_v2i32_ELE : VLDG_G_ELE_V2<"b32", Int32Regs>;
+def INT_PTX_LDG_G_v2f32_ELE : VLDG_G_ELE_V2<"b32", Float32Regs>;
+def INT_PTX_LDG_G_v2i64_ELE : VLDG_G_ELE_V2<"b64", Int64Regs>;
+def INT_PTX_LDG_G_v2f64_ELE : VLDG_G_ELE_V2<"b64", Float64Regs>;
+
+def INT_PTX_LDG_G_v4i8_ELE : VLDG_G_ELE_V4<"b8", Int16Regs>;
+def INT_PTX_LDG_G_v4i16_ELE : VLDG_G_ELE_V4<"b16", Int16Regs>;
+def INT_PTX_LDG_G_v4i32_ELE : VLDG_G_ELE_V4<"b32", Int32Regs>;
+def INT_PTX_LDG_G_v4f32_ELE : VLDG_G_ELE_V4<"b32", Float32Regs>;
 
 
 multiclass NG_TO_G<string Str, bit Supports32 = 1, list<Predicate> Preds = []> {
diff --git a/llvm/test/CodeGen/NVPTX/LoadStoreVectorizer.ll b/llvm/test/CodeGen/NVPTX/LoadStoreVectorizer.ll
index 8f0964c2d5eba..78b57badc06e8 100644
--- a/llvm/test/CodeGen/NVPTX/LoadStoreVectorizer.ll
+++ b/llvm/test/CodeGen/NVPTX/LoadStoreVectorizer.ll
@@ -16,8 +16,8 @@ define i32 @f(ptr %p) {
 ; ENABLED-NEXT:    .reg .b64 %rd<2>;
 ; ENABLED-EMPTY:
 ; ENABLED-NEXT:  // %bb.0:
-; ENABLED-NEXT:    ld.param.u64 %rd1, [f_param_0];
-; ENABLED-NEXT:    ld.v2.u32 {%r1, %r2}, [%rd1];
+; ENABLED-NEXT:    ld.param.b64 %rd1, [f_param_0];
+; ENABLED-NEXT:    ld.v2.b32 {%r1, %r2}, [%rd1];
 ; ENABLED-NEXT:    add.s32 %r3, %r1, %r2;
 ; ENABLED-NEXT:    st.param.b32 [func_retval0], %r3;
 ; ENABLED-NEXT:    ret;
@@ -28,9 +28,9 @@ define i32 @f(ptr %p) {
 ; DISABLED-NEXT:    .reg .b64 %rd<2>;
 ; DISABLED-EMPTY:
 ; DISABLED-NEXT:  // %bb.0:
-; DISABLED-NEXT:    ld.param.u64 %rd1, [f_param_0];
-; DISABLED-NEXT:    ld.u32 %r1, [%rd1];
-; DISABLED-NEXT:    ld.u32 %r2, [%rd1+4];
+; DISABLED-NEXT:    ld.param.b64 %rd1, [f_param_0];
+; DISABLED-NEXT:    ld.b32 %r1, [%rd1];
+; DISABLED-NEXT:    ld.b32 %r2, [%rd1+4];
 ; DISABLED-NEXT:    add.s32 %r3, %r1, %r2;
 ; DISABLED-NEXT:    st.param.b32 [func_retval0], %r3;
 ; DISABLED-NEXT:    ret;
@@ -49,7 +49,7 @@ define half @fh(ptr %p) {
 ; ENABLED-NEXT:    .reg .b64 %rd<2>;
 ; ENABLED-EMPTY:
 ; ENABLED-NEXT:  // %bb.0:
-; ENABLED-NEXT:    ld.param.u64 %rd1, [fh_param_0];
+; ENABLED-NEXT:    ld.param.b64 %rd1, [fh_param_0];
 ; ENABLED-NEXT:    ld.v4.b16 {%rs1, %rs2, %rs3, %rs4}, [%rd1];
 ; ENABLED-NEXT:    ld.b16 %rs5, [%rd1+8];
 ; ENABLED-NEXT:    cvt.f32.f16 %f1, %rs2;
@@ -78,7 +78,7 @@ define half @fh(ptr %p) {
 ; DISABLED-NEXT:    .reg .b64 %rd<2>;
 ; DISABLED-EMPTY:
 ; DISABLED-NEXT:  // %bb.0:
-; DISABLED-NEXT:    ld.param.u64 %rd1, [fh_param_0];
+; DISABLED-NEXT:    ld.param.b64 %rd1, [fh_param_0];
 ; DISABLED-NEXT:    ld.b16 %rs1, [%rd1];
 ; DISABLED-NEXT:    ld.b16 %rs2, [%rd1+2];
 ; DISABLED-NEXT:    ld.b16 %rs3, [%rd1+4];
@@ -125,14 +125,14 @@ define float @ff(ptr %p) {
 ; ENABLED-NEXT:    .reg .b64 %rd<2>;
 ; ENABLED-EMPTY:
 ; ENABLED-NEXT:  // %bb.0:
-; ENABLED-NEXT:    ld.param.u64 %rd1, [ff_param_0];
-; ENABLED-NEXT:    ld.v4.f32 {%f1, %f2, %f3, %f4}, [%rd1];
-; ENABLED-NEXT:    ld.f32 %f5, [%rd1+16];
+; ENABLED-NEXT:    ld.param.b64 %rd1, [ff_param_0];
+; ENABLED-NEXT:    ld.v4.b32 {%f1, %f2, %f3, %f4}, [%rd1];
+; ENABLED-NEXT:    ld.b32 %f5, [%rd1+16];
 ; ENABLED-NEXT:    add.rn.f32 %f6, %f1, %f2;
 ; ENABLED-NEXT:    add.rn.f32 %f7, %f3, %f4;
 ; ENABLED-NEXT:    add.rn.f32 %f8, %f6, %f7;
 ; ENABLED-NEXT:    add.rn.f32 %f9, %f8, %f5;
-; ENABLED-NEXT:    st.param.f32 [func_retval0], %f9;
+; ENABLED-NEXT:    st.param.b32 [func_retval0], %f9;
 ; ENABLED-NEXT:    ret;
 ;
 ; DISABLED-LABEL: ff(
@@ -141,17 +141,17 @@ define float @ff(ptr %p) {
 ; DISABLED-NEXT:    .reg .b64 %rd<2>;
 ; DISABLED-EMPTY:
 ; DISABLED-NEXT:  // %bb.0:
-; DISABLED-NEXT:    ld.param.u64 %rd1, [ff_param_0];
-; DISABLED-NEXT:    ld.f32 %f1, [%rd1];
-; DISABLED-NEXT:    ld.f32 %f2, [%rd1+4];
-; DISABLED-NEXT:    ld.f32 %f3, [%rd1+8];
-; DISABLED-NEXT:    ld.f32 %f4, [%rd1+12];
-; DISABLED-NEXT:    ld.f32 %f5, [%rd1+16];
+; DISABLED-NEXT:    ld.param.b64 %rd1, [ff_param_0];
+; DISABLED-NEXT:    ld.b32 %f1, [%rd1];
+; DISABLED-NEXT:    ld.b32 %f2, [%rd1+4];
+; DISABLED-NEXT:    ld.b32 %f3, [%rd1+8];
+; DISABLED-NEXT:    ld.b32 %f4, [%rd1+12];
+; DISABLED-NEXT:    ld.b32 %f5, [%rd1+16];
 ; DISABLED-NEXT:    add.rn.f32 %f6, %f1, %f2;
 ; DISABLED-NEXT:    add.rn.f32 %f7, %f3, %f4;
 ; DISABLED-NEXT:    add.rn.f32 %f8, %f6, %f7;
 ; DISABLED-NEXT:    add.rn.f32 %f9, %f8, %f5;
-; DISABLED-NEXT:    st.param.f32 [func_retval0], %f9;
+; DISABLED-NEXT:    st.param.b32 [func_retval0], %f9;
 ; DISABLED-NEXT:    ret;
   %p.1 = getelementptr float, ptr %p, i32 1
   %p.2 = getelementptr float, ptr %p, i32 2
@@ -176,9 +176,9 @@ define void @combine_v16i8(ptr noundef align 16 %ptr1, ptr noundef align 16 %ptr
 ; ENABLED-NEXT:    .reg .b64 %rd<3>;
 ; ENABLED-EMPTY:
 ; ENABLED-NEXT:  // %bb.0:
-; ENABLED-NEXT:    ld.param.u64 %rd1, [combine_v16i8_param_0];
+; ENABLED-NEXT:    ld.param.b64 %rd1, [combine_v16i8_param_0];
 ; ENABLED-NEXT:    ld.v4.b32 {%r1, %r2, %r3, %r4}, [%rd1];
-; ENABLED-NEXT:    ld.param.u64 %rd2, [combine_v16i8_param_1];
+; ENABLED-NEXT:    ld.param.b64 %rd2, [combine_v16i8_param_1];
 ; ENABLED-NEXT:    bfe.u32 %r5, %r1, 0, 8;
 ; ENABLED-NEXT:    bfe.u32 %r6, %r1, 8, 8;
 ; ENABLED-NEXT:    bfe.u32 %r7, %r1, 16, 8;
@@ -210,7 +210,7 @@ define void @combine_v16i8(ptr noundef align 16 %ptr1, ptr noundef align 16 %ptr
 ; ENABLED-NEXT:    add.s32 %r33, %r32, %r18;
 ; ENABLED-NEXT:    add.s32 %r34, %r33, %r19;
 ; ENABLED-NEXT:    add.s32 %r35, %r34, %r20;
-; ENABLED-NEXT:    st.u32 [%rd2], %r35;
+; ENABLED-NEXT:    st.b32 [%rd2], %r35;
 ; ENABLED-NEXT:    ret;
 ;
 ; DISABLED-LABEL: combine_v16i8(
@@ -219,24 +219,24 @@ define void @combine_v16i8(ptr noundef align 16 %ptr1, ptr noundef align 16 %ptr
 ; DISABLED-NEXT:    .reg .b64 %rd<3>;
 ; DISABLED-EMPTY:
 ; DISABLED-NEXT:  // %bb.0:
-; DISABLED-NEXT:    ld.param.u64 %rd1, [combine_v16i8_param_0];
-; DISABLED-NEXT:    ld.u8 %r1, [%rd1];
-; DISABLED-NEXT:    ld.param.u64 %rd2, [combine_v16i8_param_1];
-; DISABLED-NEXT:    ld.u8 %r2, [%rd1+1];
-; DISABLED-NEXT:    ld.u8 %r3, [%rd1+2];
-; DISABLED-NEXT:    ld.u8 %r4, [%rd1+3];
-; DISABLED-NEXT:    ld.u8 %r5, [%rd1+4];
-; DISABLED-NEXT:    ld.u8 %r6, [%rd1+5];
-; DISABLED-NEXT:    ld.u8 %r7, [%rd1+6];
-; DISABLED-NEXT:    ld.u8 %r8, [%rd1+7];
-; DISABLED-NEXT:    ld.u8 %r9, [%rd1+8];
-; DISABLED-NEXT:    ld.u8 %r10, [%rd1+9];
-; DISABLED-NEXT:    ld.u8 %r11, [%rd1+10];
-; DISABLED-NEXT:    ld.u8 %r12, [%rd1+11];
-; DISABLED-NEXT:    ld.u8 %r13, [%rd1+12];
-; DISABLED-NEXT:    ld.u8 %r14, [%rd1+13];
-; DISABLED-NEXT:    ld.u8 %r15, [%rd1+14];
-; DISABLED-NEXT:    ld.u8 %r16, [%rd1+15];
+; DISABLED-NEXT:    ld.param.b64 %rd1, [combine_v16i8_param_0];
+; DISABLED-NEXT:    ld.b8 %r1, [%rd1];
+; DISABLED-NEXT:    ld.param.b64 %rd2, [combine_v16i8_param_1];
+; DISABLED-NEXT:    ld.b8 %r2, [%rd1+1];
+; DISABLED-NEXT:    ld.b8 %r3, [%rd1+...
[truncated]

@Artem-B Artem-B (Member) left a comment

Nice. I think I can see where you're going and I like the direction.
LGTM.

@gonzalobg
Contributor

This is awesome, LGTM!

@AlexMaclean AlexMaclean force-pushed the dev/amaclean/upstream/untyped-ldst branch from f0cdc61 to 36abb65 Compare April 30, 2025 20:02
@llvmbot llvmbot added clang Clang issues not falling into any other category debuginfo llvm:transforms labels Apr 30, 2025
@AlexMaclean AlexMaclean force-pushed the dev/amaclean/upstream/untyped-ldst branch from 36abb65 to c9d08db Compare April 30, 2025 20:44
@AlexMaclean AlexMaclean force-pushed the dev/amaclean/upstream/untyped-ldst branch from c9d08db to 4bb455f Compare May 9, 2025 21:36
@AlexMaclean AlexMaclean merged commit 369891b into llvm:main May 10, 2025
11 checks passed
Labels
backend:NVPTX clang Clang issues not falling into any other category debuginfo llvm:transforms

5 participants