-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[flang][OpenMP] Implement HAS_DEVICE_ADDR clause #128568
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The HAS_DEVICE_ADDR indicates that the object(s) listed exists at an address that is a valid device address. Specifically, `has_device_addr(x)` means that (in C/C++ terms) `&x` is a device address. When entering a target region, `x` does not need to be allocated on the device, or have its contents copied over (in the absence of additional mapping clauses). Passing its address verbatim to the region for use is sufficient, and is the intended goal of the clause. Some Fortran objects use descriptors in their in-memory representation. If `x` had a descriptor, both the descriptor and the contents of `x` would be located in the device memory. However, the descriptors are managed by the compiler, and can be regenerated at various points as needed. The address of the effective descriptor may change, hence it's not safe to pass the address of the descriptor to the target region. Instead, the descriptor itself is always copied, but for objects like `x`, no further mapping takes place (as this keeps the storage pointer in the descriptor unchanged).
@llvm/pr-subscribers-mlir @llvm/pr-subscribers-flang-fir-hlfir Author: Krzysztof Parzyszek (kparzysz) ChangesThe HAS_DEVICE_ADDR indicates that the object(s) listed exists at an address that is a valid device address. Specifically, When entering a target region, Some Fortran objects use descriptors in their in-memory representation. If Patch is 79.81 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/128568.diff 20 Files Affected:
diff --git a/flang/include/flang/Support/OpenMP-utils.h b/flang/include/flang/Support/OpenMP-utils.h
index d8f82e1cf94f1..7805626632487 100644
--- a/flang/include/flang/Support/OpenMP-utils.h
+++ b/flang/include/flang/Support/OpenMP-utils.h
@@ -35,6 +35,7 @@ struct EntryBlockArgsEntry {
/// arguments associated to all clauses that can define them.
struct EntryBlockArgs {
llvm::ArrayRef<mlir::Value> hostEvalVars;
+ EntryBlockArgsEntry hasDeviceAddr;
EntryBlockArgsEntry inReduction;
EntryBlockArgsEntry map;
EntryBlockArgsEntry priv;
@@ -44,21 +45,21 @@ struct EntryBlockArgs {
EntryBlockArgsEntry useDevicePtr;
bool isValid() const {
- return inReduction.isValid() && map.isValid() && priv.isValid() &&
- reduction.isValid() && taskReduction.isValid() &&
+ return hasDeviceAddr.isValid() && inReduction.isValid() && map.isValid() &&
+ priv.isValid() && reduction.isValid() && taskReduction.isValid() &&
useDeviceAddr.isValid() && useDevicePtr.isValid();
}
auto getSyms() const {
- return llvm::concat<const semantics::Symbol *const>(inReduction.syms,
- map.syms, priv.syms, reduction.syms, taskReduction.syms,
- useDeviceAddr.syms, useDevicePtr.syms);
+ return llvm::concat<const semantics::Symbol *const>(hasDeviceAddr.syms,
+ inReduction.syms, map.syms, priv.syms, reduction.syms,
+ taskReduction.syms, useDeviceAddr.syms, useDevicePtr.syms);
}
auto getVars() const {
- return llvm::concat<const mlir::Value>(hostEvalVars, inReduction.vars,
- map.vars, priv.vars, reduction.vars, taskReduction.vars,
- useDeviceAddr.vars, useDevicePtr.vars);
+ return llvm::concat<const mlir::Value>(hasDeviceAddr.vars, hostEvalVars,
+ inReduction.vars, map.vars, priv.vars, reduction.vars,
+ taskReduction.vars, useDeviceAddr.vars, useDevicePtr.vars);
}
};
diff --git a/flang/lib/Lower/OpenMP/ClauseProcessor.cpp b/flang/lib/Lower/OpenMP/ClauseProcessor.cpp
index e21d299570b86..48610fabd4288 100644
--- a/flang/lib/Lower/OpenMP/ClauseProcessor.cpp
+++ b/flang/lib/Lower/OpenMP/ClauseProcessor.cpp
@@ -913,14 +913,34 @@ bool ClauseProcessor::processDepend(mlir::omp::DependClauseOps &result) const {
}
bool ClauseProcessor::processHasDeviceAddr(
- mlir::omp::HasDeviceAddrClauseOps &result,
- llvm::SmallVectorImpl<const semantics::Symbol *> &isDeviceSyms) const {
- return findRepeatableClause<omp::clause::HasDeviceAddr>(
- [&](const omp::clause::HasDeviceAddr &devAddrClause,
- const parser::CharBlock &) {
- addUseDeviceClause(converter, devAddrClause.v, result.hasDeviceAddrVars,
- isDeviceSyms);
+ lower::StatementContext &stmtCtx, mlir::omp::HasDeviceAddrClauseOps &result,
+ llvm::SmallVectorImpl<const semantics::Symbol *> &hasDeviceSyms) const {
+ // For HAS_DEVICE_ADDR objects, implicitly map the top-level entities.
+ // Their address (or the whole descriptor, if the entity had one) will be
+ // passed to the target region.
+ std::map<Object, OmpMapParentAndMemberData> parentMemberIndices;
+ bool clauseFound = findRepeatableClause<omp::clause::HasDeviceAddr>(
+ [&](const omp::clause::HasDeviceAddr &clause,
+ const parser::CharBlock &source) {
+ mlir::Location location = converter.genLocation(source);
+ llvm::omp::OpenMPOffloadMappingFlags mapTypeBits =
+ llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_TO |
+ llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_IMPLICIT;
+ omp::ObjectList baseObjects;
+ llvm::transform(clause.v, std::back_inserter(baseObjects),
+ [&](const omp::Object &object) {
+ if (auto maybeBase = getBaseObject(object, semaCtx))
+ return *maybeBase;
+ return object;
+ });
+ processMapObjects(stmtCtx, location, baseObjects, mapTypeBits,
+ parentMemberIndices, result.hasDeviceAddrVars,
+ hasDeviceSyms);
});
+
+ insertChildMapInfoIntoParent(converter, semaCtx, stmtCtx, parentMemberIndices,
+ result.hasDeviceAddrVars, hasDeviceSyms);
+ return clauseFound;
}
bool ClauseProcessor::processIf(
diff --git a/flang/lib/Lower/OpenMP/ClauseProcessor.h b/flang/lib/Lower/OpenMP/ClauseProcessor.h
index 889a09a8f0cd8..5de8798d25945 100644
--- a/flang/lib/Lower/OpenMP/ClauseProcessor.h
+++ b/flang/lib/Lower/OpenMP/ClauseProcessor.h
@@ -71,8 +71,9 @@ class ClauseProcessor {
bool processFinal(lower::StatementContext &stmtCtx,
mlir::omp::FinalClauseOps &result) const;
bool processHasDeviceAddr(
+ lower::StatementContext &stmtCtx,
mlir::omp::HasDeviceAddrClauseOps &result,
- llvm::SmallVectorImpl<const semantics::Symbol *> &isDeviceSyms) const;
+ llvm::SmallVectorImpl<const semantics::Symbol *> &hasDeviceSyms) const;
bool processHint(mlir::omp::HintClauseOps &result) const;
bool processInclusive(mlir::Location currentLocation,
mlir::omp::InclusiveClauseOps &result) const;
diff --git a/flang/lib/Lower/OpenMP/Clauses.cpp b/flang/lib/Lower/OpenMP/Clauses.cpp
index 831ba23870360..de60146e10f16 100644
--- a/flang/lib/Lower/OpenMP/Clauses.cpp
+++ b/flang/lib/Lower/OpenMP/Clauses.cpp
@@ -159,8 +159,8 @@ std::optional<Object> getBaseObject(const Object &object,
return Object{SymbolAndDesignatorExtractor::symbol_addr(comp->symbol()),
ea.Designate(evaluate::DataRef{
SymbolAndDesignatorExtractor::AsRvalueRef(*comp)})};
- } else if (base.UnwrapSymbolRef()) {
- return std::nullopt;
+ } else if (auto *symRef = base.UnwrapSymbolRef()) {
+ return Object{const_cast<semantics::Symbol *>(&**symRef), std::nullopt};
}
} else {
assert(std::holds_alternative<evaluate::CoarrayRef>(ref.u) &&
diff --git a/flang/lib/Lower/OpenMP/OpenMP.cpp b/flang/lib/Lower/OpenMP/OpenMP.cpp
index e0d23fc53eeca..c4e8f314400d0 100644
--- a/flang/lib/Lower/OpenMP/OpenMP.cpp
+++ b/flang/lib/Lower/OpenMP/OpenMP.cpp
@@ -321,6 +321,7 @@ static void bindEntryBlockArgs(lower::AbstractConverter &converter,
// Process in clause name alphabetical order to match block arguments order.
// Do not bind host_eval variables because they cannot be used inside of the
// corresponding region, except for very specific cases handled separately.
+ bindMapLike(args.hasDeviceAddr.syms, op.getHasDeviceAddrBlockArgs());
bindPrivateLike(args.inReduction.syms, args.inReduction.vars,
op.getInReductionBlockArgs());
bindMapLike(args.map.syms, op.getMapBlockArgs());
@@ -1645,7 +1646,7 @@ static void genTargetClauses(
cp.processBare(clauseOps);
cp.processDepend(clauseOps);
cp.processDevice(stmtCtx, clauseOps);
- cp.processHasDeviceAddr(clauseOps, hasDeviceAddrSyms);
+ cp.processHasDeviceAddr(stmtCtx, clauseOps, hasDeviceAddrSyms);
if (!hostEvalInfo.empty()) {
// Only process host_eval if compiling for the host device.
processHostEvalClauses(converter, semaCtx, stmtCtx, eval, loc);
@@ -2191,6 +2192,10 @@ genTargetOp(lower::AbstractConverter &converter, lower::SymMap &symTable,
if (dsp.getAllSymbolsToPrivatize().contains(&sym))
return;
+ // These symbols are mapped individually in processHasDeviceAddr.
+ if (llvm::is_contained(hasDeviceAddrSyms, &sym))
+ return;
+
// Structure component symbols don't have bindings, and can only be
// explicitly mapped individually. If a member is captured implicitly
// we map the entirety of the derived type when we find its symbol.
@@ -2281,8 +2286,9 @@ genTargetOp(lower::AbstractConverter &converter, lower::SymMap &symTable,
auto targetOp = firOpBuilder.create<mlir::omp::TargetOp>(loc, clauseOps);
- llvm::SmallVector<mlir::Value> mapBaseValues;
+ llvm::SmallVector<mlir::Value> mapBaseValues, hasDeviceAddrBaseValues;
extractMappedBaseValues(clauseOps.mapVars, mapBaseValues);
+ extractMappedBaseValues(clauseOps.hasDeviceAddrVars, hasDeviceAddrBaseValues);
EntryBlockArgs args;
args.hostEvalVars = clauseOps.hostEvalVars;
@@ -2291,6 +2297,8 @@ genTargetOp(lower::AbstractConverter &converter, lower::SymMap &symTable,
args.map.vars = mapBaseValues;
args.priv.syms = dsp.getDelayedPrivSymbols();
args.priv.vars = clauseOps.privateVars;
+ args.hasDeviceAddr.syms = hasDeviceAddrSyms;
+ args.hasDeviceAddr.vars = hasDeviceAddrBaseValues;
genBodyOfTargetOp(converter, symTable, semaCtx, eval, targetOp, args, loc,
queue, item, dsp);
diff --git a/flang/lib/Optimizer/OpenMP/MapInfoFinalization.cpp b/flang/lib/Optimizer/OpenMP/MapInfoFinalization.cpp
index beea7543e54b3..0d8026eed17ad 100644
--- a/flang/lib/Optimizer/OpenMP/MapInfoFinalization.cpp
+++ b/flang/lib/Optimizer/OpenMP/MapInfoFinalization.cpp
@@ -224,10 +224,19 @@ class MapInfoFinalizationPass
}
/// Adjusts the descriptor's map type. The main alteration that is done
- /// currently is transforming the map type to `OMP_MAP_TO` where possible.
- /// This is because we will always need to map the descriptor to device
- /// (or at the very least it seems to be the case currently with the
- /// current lowered kernel IR), as without the appropriate descriptor
+ /// currently is transforming the map type to `OMP_MAP_TO` where possible,
+ /// plus adding OMP_MAP_ALWAYS flag. Descriptors will always be copied,
+ /// even if the object was listed on the `has_device_addr` clause.
+ /// This is because the descriptor can be rematerialized by the compiler,
+ /// and so the address of the descriptor for a given object at one place in
+ /// the code may differ from that address in another place. The contents
+ /// of the descriptor (the base address in paticular) will remain unchanged
+ /// though. Non-descriptor objects listed on the `has_device_addr` clause
+ /// can be passed to the kernel by just passing their address without any
+ /// remapping.
+ /// The adding the OMP_MAP_TO flag is done because we will always need to
+ /// map the descriptor to device, especially without device address clauses,
+ /// as without the appropriate descriptor
/// information on the device there is a risk of the kernel IR
/// requesting for various data that will not have been copied to
/// perform things like indexing. This can cause segfaults and
@@ -247,16 +256,25 @@ class MapInfoFinalizationPass
mlir::omp::TargetUpdateOp>(target))
return mapTypeFlag;
- bool hasImplicitMap =
- (llvm::omp::OpenMPOffloadMappingFlags(mapTypeFlag) &
- llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_IMPLICIT) ==
+ llvm::omp::OpenMPOffloadMappingFlags Implicit =
+ llvm::omp::OpenMPOffloadMappingFlags(mapTypeFlag) &
llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_IMPLICIT;
return llvm::to_underlying(
- hasImplicitMap
- ? llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_TO |
- llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_IMPLICIT
- : llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_TO);
+ llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_ALWAYS |
+ llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_TO | Implicit);
+ }
+
+ /// Check if the mapOp is present in the HasDeviceAddr clause on
+ /// the userOp. Only applies to TargetOp.
+ bool isHasDeviceAddr(mlir::omp::MapInfoOp mapOp, mlir::Operation *userOp) {
+ if (auto targetOp = llvm::dyn_cast<mlir::omp::TargetOp>(userOp)) {
+ for (mlir::Value hda : targetOp.getHasDeviceAddrVars()) {
+ if (hda.getDefiningOp() == mapOp)
+ return true;
+ }
+ }
+ return false;
}
mlir::omp::MapInfoOp genDescriptorMemberMaps(mlir::omp::MapInfoOp op,
@@ -268,8 +286,7 @@ class MapInfoFinalizationPass
// TODO: map the addendum segment of the descriptor, similarly to the
// base address/data pointer member.
mlir::Value descriptor = getDescriptorFromBoxMap(op, builder);
- auto baseAddr = genBaseAddrMap(descriptor, op.getBounds(),
- op.getMapType().value_or(0), builder);
+
mlir::ArrayAttr newMembersAttr;
mlir::SmallVector<mlir::Value> newMembers;
llvm::SmallVector<llvm::SmallVector<int64_t>> memberIndices;
@@ -286,6 +303,12 @@ class MapInfoFinalizationPass
// member information to now have one new member for the base address, or
// we are expanding a parent that is a descriptor and we have to adjust
// all of its members to reflect the insertion of the base address.
+ //
+ // If we're expanding a top-level descriptor for a map operation that
+ // resulted from "has_device_addr" clause, then we want the base pointer
+ // from the descriptor to be used verbatim, i.e. without additional
+ // remapping. To avoid this remapping, simply don't generate any map
+ // information for the descriptor members.
if (!mapMemberUsers.empty()) {
// Currently, there should only be one user per map when this pass
// is executed. Either a parent map, holding the current map in its
@@ -296,6 +319,8 @@ class MapInfoFinalizationPass
assert(mapMemberUsers.size() == 1 &&
"OMPMapInfoFinalization currently only supports single users of a "
"MapInfoOp");
+ auto baseAddr = genBaseAddrMap(descriptor, op.getBounds(),
+ op.getMapType().value_or(0), builder);
ParentAndPlacement mapUser = mapMemberUsers[0];
adjustMemberIndices(memberIndices, mapUser.index);
llvm::SmallVector<mlir::Value> newMemberOps;
@@ -307,7 +332,9 @@ class MapInfoFinalizationPass
mapUser.parent.getMembersMutable().assign(newMemberOps);
mapUser.parent.setMembersIndexAttr(
builder.create2DI64ArrayAttr(memberIndices));
- } else {
+ } else if (!isHasDeviceAddr(op, target)) {
+ auto baseAddr = genBaseAddrMap(descriptor, op.getBounds(),
+ op.getMapType().value_or(0), builder);
newMembers.push_back(baseAddr);
if (!op.getMembers().empty()) {
for (auto &indices : memberIndices)
@@ -448,6 +475,12 @@ class MapInfoFinalizationPass
addOperands(useDevPtrMutableOpRange, target,
argIface.getUseDevicePtrBlockArgsStart() +
argIface.numUseDevicePtrBlockArgs());
+ } else if (auto targetOp = llvm::dyn_cast<mlir::omp::TargetOp>(target)) {
+ mlir::MutableOperandRange hasDevAddrMutableOpRange =
+ targetOp.getHasDeviceAddrVarsMutable();
+ addOperands(hasDevAddrMutableOpRange, target,
+ argIface.getHasDeviceAddrBlockArgsStart() +
+ argIface.numHasDeviceAddrBlockArgs());
}
}
diff --git a/flang/lib/Support/OpenMP-utils.cpp b/flang/lib/Support/OpenMP-utils.cpp
index 178a6e38dc0f2..97e7723f0be8c 100644
--- a/flang/lib/Support/OpenMP-utils.cpp
+++ b/flang/lib/Support/OpenMP-utils.cpp
@@ -18,10 +18,11 @@ mlir::Block *genEntryBlock(mlir::OpBuilder &builder, const EntryBlockArgs &args,
llvm::SmallVector<mlir::Type> types;
llvm::SmallVector<mlir::Location> locs;
- unsigned numVars = args.hostEvalVars.size() + args.inReduction.vars.size() +
- args.map.vars.size() + args.priv.vars.size() +
- args.reduction.vars.size() + args.taskReduction.vars.size() +
- args.useDeviceAddr.vars.size() + args.useDevicePtr.vars.size();
+ unsigned numVars = args.hasDeviceAddr.vars.size() + args.hostEvalVars.size() +
+ args.inReduction.vars.size() + args.map.vars.size() +
+ args.priv.vars.size() + args.reduction.vars.size() +
+ args.taskReduction.vars.size() + args.useDeviceAddr.vars.size() +
+ args.useDevicePtr.vars.size();
types.reserve(numVars);
locs.reserve(numVars);
@@ -34,6 +35,7 @@ mlir::Block *genEntryBlock(mlir::OpBuilder &builder, const EntryBlockArgs &args,
// Populate block arguments in clause name alphabetical order to match
// expected order by the BlockArgOpenMPOpInterface.
+ extractTypeLoc(args.hasDeviceAddr.vars);
extractTypeLoc(args.hostEvalVars);
extractTypeLoc(args.inReduction.vars);
extractTypeLoc(args.map.vars);
diff --git a/flang/test/Integration/OpenMP/map-types-and-sizes.f90 b/flang/test/Integration/OpenMP/map-types-and-sizes.f90
index e0221ef254192..78e25b45f3845 100644
--- a/flang/test/Integration/OpenMP/map-types-and-sizes.f90
+++ b/flang/test/Integration/OpenMP/map-types-and-sizes.f90
@@ -31,7 +31,7 @@ subroutine mapType_array
end subroutine mapType_array
!CHECK: @.offload_sizes{{.*}} = private unnamed_addr constant [4 x i64] [i64 0, i64 24, i64 8, i64 0]
-!CHECK: @.offload_maptypes{{.*}} = private unnamed_addr constant [4 x i64] [i64 32, i64 281474976711169, i64 281474976711171, i64 281474976711187]
+!CHECK: @.offload_maptypes{{.*}} = private unnamed_addr constant [4 x i64] [i64 32, i64 281474976711173, i64 281474976711171, i64 281474976711187]
subroutine mapType_ptr
integer, pointer :: a
!$omp target
@@ -40,7 +40,7 @@ subroutine mapType_ptr
end subroutine mapType_ptr
!CHECK: @.offload_sizes{{.*}} = private unnamed_addr constant [4 x i64] [i64 0, i64 24, i64 8, i64 0]
-!CHECK: @.offload_maptypes{{.*}} = private unnamed_addr constant [4 x i64] [i64 32, i64 281474976711169, i64 281474976711171, i64 281474976711187]
+!CHECK: @.offload_maptypes{{.*}} = private unnamed_addr constant [4 x i64] [i64 32, i64 281474976711173, i64 281474976711171, i64 281474976711187]
subroutine mapType_allocatable
integer, allocatable :: a
allocate(a)
@@ -51,7 +51,7 @@ subroutine mapType_allocatable
end subroutine mapType_allocatable
!CHECK: @.offload_sizes{{.*}} = private unnamed_addr constant [4 x i64] [i64 0, i64 24, i64 8, i64 0]
-!CHECK: @.offload_maptypes{{.*}} = private unnamed_addr constant [4 x i64] [i64 32, i64 281474976710657, i64 281474976710659, i64 281474976710675]
+!CHECK: @.offload_maptypes{{.*}} = private unnamed_addr constant [4 x i64] [i64 32, i64 281474976710661, i64 281474976710659, i64 281474976710675]
subroutine mapType_ptr_explicit
integer, pointer :: a
!$omp target map(tofrom: a)
@@ -60,7 +60,7 @@ subroutine mapType_ptr_explicit
end subroutine mapType_ptr_explicit
!CHECK: @.offload_sizes{{.*}} = private unnamed_addr constant [4 x i64] [i64 0, i64 24, i64 8, i64 0]
-!CHECK: @.offload_maptypes{{.*}} = private unnamed_addr constant [4 x i64] [i64 32, i64 281474976710657, i64 281474976710659, i64 281474976710675]
+!CHECK: @.offload_maptypes{{.*}} = private unnamed_addr constant [4 x i64] [i64 32, i64 281474976710661, i64 281474976710659, i64 281474976710675]
subroutine mapType_allocatable_explicit
integer, allocatable :: a
allocate(a)
@@ -212,7 +212,7 @@ subroutine mapType_derived_explicit_nested_member_with_bounds
end subroutine mapType_derived_explicit_nested_member_with_bounds
!CHECK: @.offload_sizes{{.*}} = private unnamed_addr constant [4 x i64] [i64 0, i64 48, i64 8, i64 0]
-!CHECK: @.offload_maptypes{{.*}} = private unnamed_addr constant [4 x i64] [i64 32, i64 281474976710657, i64 281474976710659, i64 281474976710675]
+!CHECK: @.offload_maptypes{{.*}} = private unnamed_addr constant [4 x i64] [i64 32, i64 281474976710661, i64 281474976710659, i64 281474976710675]
subroutine mapType_derived_type_alloca()
type :: one_layer
real(4) :: i
@@ -233,7 +233,7 @@ subroutine mapType_derived_type_alloca()
end subroutine
!CHECK: @.offload_sizes{{.*}} = private unnamed_addr constant [8 x i64] [i64 0, i64 40, i64 8, i64 0, i64 48, i64 8, i64 0, i64 4]
-!CHECK: @.offload_maptypes{{.*}} = private unnamed_addr constant [8 x i64] [i64 32, i64 281474976710657, i64 281474976710659, i64 281474976710675, i64 281474976710657, i64 281474976710659, i64 281474976710675, i64 281474976710659]
+!CHECK: @.offload_maptypes{{.*}} = private unnamed_addr constant [8 x i64] [i64 32, i64 281474976710661, i64 281474976710659, i64 281474976710675, i64 281474976710661, i64 281474976710659, i64 281474976710675, i64 281474976710659]
subroutine mapType_alloca_derived_type()
type :: one_layer
real(4) :: i
@@ -256,7 +256,7 @@ subroutine mapType_alloca_derived_type()
end subroutine
!CHECK: @.offload_sizes{{.*}} = private unnamed_addr constant [8 x i64] [i64 0, i64 40, i64 8, i64 0, i64 48, i64 8, i64 0, i64 4]
-!CHECK: @.offload_maptypes{{.*}...
[truncated]
|
@llvm/pr-subscribers-mlir-llvm Author: Krzysztof Parzyszek (kparzysz) ChangesThe HAS_DEVICE_ADDR indicates that the object(s) listed exists at an address that is a valid device address. Specifically, When entering a target region, Some Fortran objects use descriptors in their in-memory representation. If Patch is 79.81 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/128568.diff 20 Files Affected:
diff --git a/flang/include/flang/Support/OpenMP-utils.h b/flang/include/flang/Support/OpenMP-utils.h
index d8f82e1cf94f1..7805626632487 100644
--- a/flang/include/flang/Support/OpenMP-utils.h
+++ b/flang/include/flang/Support/OpenMP-utils.h
@@ -35,6 +35,7 @@ struct EntryBlockArgsEntry {
/// arguments associated to all clauses that can define them.
struct EntryBlockArgs {
llvm::ArrayRef<mlir::Value> hostEvalVars;
+ EntryBlockArgsEntry hasDeviceAddr;
EntryBlockArgsEntry inReduction;
EntryBlockArgsEntry map;
EntryBlockArgsEntry priv;
@@ -44,21 +45,21 @@ struct EntryBlockArgs {
EntryBlockArgsEntry useDevicePtr;
bool isValid() const {
- return inReduction.isValid() && map.isValid() && priv.isValid() &&
- reduction.isValid() && taskReduction.isValid() &&
+ return hasDeviceAddr.isValid() && inReduction.isValid() && map.isValid() &&
+ priv.isValid() && reduction.isValid() && taskReduction.isValid() &&
useDeviceAddr.isValid() && useDevicePtr.isValid();
}
auto getSyms() const {
- return llvm::concat<const semantics::Symbol *const>(inReduction.syms,
- map.syms, priv.syms, reduction.syms, taskReduction.syms,
- useDeviceAddr.syms, useDevicePtr.syms);
+ return llvm::concat<const semantics::Symbol *const>(hasDeviceAddr.syms,
+ inReduction.syms, map.syms, priv.syms, reduction.syms,
+ taskReduction.syms, useDeviceAddr.syms, useDevicePtr.syms);
}
auto getVars() const {
- return llvm::concat<const mlir::Value>(hostEvalVars, inReduction.vars,
- map.vars, priv.vars, reduction.vars, taskReduction.vars,
- useDeviceAddr.vars, useDevicePtr.vars);
+ return llvm::concat<const mlir::Value>(hasDeviceAddr.vars, hostEvalVars,
+ inReduction.vars, map.vars, priv.vars, reduction.vars,
+ taskReduction.vars, useDeviceAddr.vars, useDevicePtr.vars);
}
};
diff --git a/flang/lib/Lower/OpenMP/ClauseProcessor.cpp b/flang/lib/Lower/OpenMP/ClauseProcessor.cpp
index e21d299570b86..48610fabd4288 100644
--- a/flang/lib/Lower/OpenMP/ClauseProcessor.cpp
+++ b/flang/lib/Lower/OpenMP/ClauseProcessor.cpp
@@ -913,14 +913,34 @@ bool ClauseProcessor::processDepend(mlir::omp::DependClauseOps &result) const {
}
bool ClauseProcessor::processHasDeviceAddr(
- mlir::omp::HasDeviceAddrClauseOps &result,
- llvm::SmallVectorImpl<const semantics::Symbol *> &isDeviceSyms) const {
- return findRepeatableClause<omp::clause::HasDeviceAddr>(
- [&](const omp::clause::HasDeviceAddr &devAddrClause,
- const parser::CharBlock &) {
- addUseDeviceClause(converter, devAddrClause.v, result.hasDeviceAddrVars,
- isDeviceSyms);
+ lower::StatementContext &stmtCtx, mlir::omp::HasDeviceAddrClauseOps &result,
+ llvm::SmallVectorImpl<const semantics::Symbol *> &hasDeviceSyms) const {
+ // For HAS_DEVICE_ADDR objects, implicitly map the top-level entities.
+ // Their address (or the whole descriptor, if the entity had one) will be
+ // passed to the target region.
+ std::map<Object, OmpMapParentAndMemberData> parentMemberIndices;
+ bool clauseFound = findRepeatableClause<omp::clause::HasDeviceAddr>(
+ [&](const omp::clause::HasDeviceAddr &clause,
+ const parser::CharBlock &source) {
+ mlir::Location location = converter.genLocation(source);
+ llvm::omp::OpenMPOffloadMappingFlags mapTypeBits =
+ llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_TO |
+ llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_IMPLICIT;
+ omp::ObjectList baseObjects;
+ llvm::transform(clause.v, std::back_inserter(baseObjects),
+ [&](const omp::Object &object) {
+ if (auto maybeBase = getBaseObject(object, semaCtx))
+ return *maybeBase;
+ return object;
+ });
+ processMapObjects(stmtCtx, location, baseObjects, mapTypeBits,
+ parentMemberIndices, result.hasDeviceAddrVars,
+ hasDeviceSyms);
});
+
+ insertChildMapInfoIntoParent(converter, semaCtx, stmtCtx, parentMemberIndices,
+ result.hasDeviceAddrVars, hasDeviceSyms);
+ return clauseFound;
}
bool ClauseProcessor::processIf(
diff --git a/flang/lib/Lower/OpenMP/ClauseProcessor.h b/flang/lib/Lower/OpenMP/ClauseProcessor.h
index 889a09a8f0cd8..5de8798d25945 100644
--- a/flang/lib/Lower/OpenMP/ClauseProcessor.h
+++ b/flang/lib/Lower/OpenMP/ClauseProcessor.h
@@ -71,8 +71,9 @@ class ClauseProcessor {
bool processFinal(lower::StatementContext &stmtCtx,
mlir::omp::FinalClauseOps &result) const;
bool processHasDeviceAddr(
+ lower::StatementContext &stmtCtx,
mlir::omp::HasDeviceAddrClauseOps &result,
- llvm::SmallVectorImpl<const semantics::Symbol *> &isDeviceSyms) const;
+ llvm::SmallVectorImpl<const semantics::Symbol *> &hasDeviceSyms) const;
bool processHint(mlir::omp::HintClauseOps &result) const;
bool processInclusive(mlir::Location currentLocation,
mlir::omp::InclusiveClauseOps &result) const;
diff --git a/flang/lib/Lower/OpenMP/Clauses.cpp b/flang/lib/Lower/OpenMP/Clauses.cpp
index 831ba23870360..de60146e10f16 100644
--- a/flang/lib/Lower/OpenMP/Clauses.cpp
+++ b/flang/lib/Lower/OpenMP/Clauses.cpp
@@ -159,8 +159,8 @@ std::optional<Object> getBaseObject(const Object &object,
return Object{SymbolAndDesignatorExtractor::symbol_addr(comp->symbol()),
ea.Designate(evaluate::DataRef{
SymbolAndDesignatorExtractor::AsRvalueRef(*comp)})};
- } else if (base.UnwrapSymbolRef()) {
- return std::nullopt;
+ } else if (auto *symRef = base.UnwrapSymbolRef()) {
+ return Object{const_cast<semantics::Symbol *>(&**symRef), std::nullopt};
}
} else {
assert(std::holds_alternative<evaluate::CoarrayRef>(ref.u) &&
diff --git a/flang/lib/Lower/OpenMP/OpenMP.cpp b/flang/lib/Lower/OpenMP/OpenMP.cpp
index e0d23fc53eeca..c4e8f314400d0 100644
--- a/flang/lib/Lower/OpenMP/OpenMP.cpp
+++ b/flang/lib/Lower/OpenMP/OpenMP.cpp
@@ -321,6 +321,7 @@ static void bindEntryBlockArgs(lower::AbstractConverter &converter,
// Process in clause name alphabetical order to match block arguments order.
// Do not bind host_eval variables because they cannot be used inside of the
// corresponding region, except for very specific cases handled separately.
+ bindMapLike(args.hasDeviceAddr.syms, op.getHasDeviceAddrBlockArgs());
bindPrivateLike(args.inReduction.syms, args.inReduction.vars,
op.getInReductionBlockArgs());
bindMapLike(args.map.syms, op.getMapBlockArgs());
@@ -1645,7 +1646,7 @@ static void genTargetClauses(
cp.processBare(clauseOps);
cp.processDepend(clauseOps);
cp.processDevice(stmtCtx, clauseOps);
- cp.processHasDeviceAddr(clauseOps, hasDeviceAddrSyms);
+ cp.processHasDeviceAddr(stmtCtx, clauseOps, hasDeviceAddrSyms);
if (!hostEvalInfo.empty()) {
// Only process host_eval if compiling for the host device.
processHostEvalClauses(converter, semaCtx, stmtCtx, eval, loc);
@@ -2191,6 +2192,10 @@ genTargetOp(lower::AbstractConverter &converter, lower::SymMap &symTable,
if (dsp.getAllSymbolsToPrivatize().contains(&sym))
return;
+ // These symbols are mapped individually in processHasDeviceAddr.
+ if (llvm::is_contained(hasDeviceAddrSyms, &sym))
+ return;
+
// Structure component symbols don't have bindings, and can only be
// explicitly mapped individually. If a member is captured implicitly
// we map the entirety of the derived type when we find its symbol.
@@ -2281,8 +2286,9 @@ genTargetOp(lower::AbstractConverter &converter, lower::SymMap &symTable,
auto targetOp = firOpBuilder.create<mlir::omp::TargetOp>(loc, clauseOps);
- llvm::SmallVector<mlir::Value> mapBaseValues;
+ llvm::SmallVector<mlir::Value> mapBaseValues, hasDeviceAddrBaseValues;
extractMappedBaseValues(clauseOps.mapVars, mapBaseValues);
+ extractMappedBaseValues(clauseOps.hasDeviceAddrVars, hasDeviceAddrBaseValues);
EntryBlockArgs args;
args.hostEvalVars = clauseOps.hostEvalVars;
@@ -2291,6 +2297,8 @@ genTargetOp(lower::AbstractConverter &converter, lower::SymMap &symTable,
args.map.vars = mapBaseValues;
args.priv.syms = dsp.getDelayedPrivSymbols();
args.priv.vars = clauseOps.privateVars;
+ args.hasDeviceAddr.syms = hasDeviceAddrSyms;
+ args.hasDeviceAddr.vars = hasDeviceAddrBaseValues;
genBodyOfTargetOp(converter, symTable, semaCtx, eval, targetOp, args, loc,
queue, item, dsp);
diff --git a/flang/lib/Optimizer/OpenMP/MapInfoFinalization.cpp b/flang/lib/Optimizer/OpenMP/MapInfoFinalization.cpp
index beea7543e54b3..0d8026eed17ad 100644
--- a/flang/lib/Optimizer/OpenMP/MapInfoFinalization.cpp
+++ b/flang/lib/Optimizer/OpenMP/MapInfoFinalization.cpp
@@ -224,10 +224,19 @@ class MapInfoFinalizationPass
}
/// Adjusts the descriptor's map type. The main alteration that is done
- /// currently is transforming the map type to `OMP_MAP_TO` where possible.
- /// This is because we will always need to map the descriptor to device
- /// (or at the very least it seems to be the case currently with the
- /// current lowered kernel IR), as without the appropriate descriptor
+ /// currently is transforming the map type to `OMP_MAP_TO` where possible,
+ /// plus adding OMP_MAP_ALWAYS flag. Descriptors will always be copied,
+ /// even if the object was listed on the `has_device_addr` clause.
+ /// This is because the descriptor can be rematerialized by the compiler,
+ /// and so the address of the descriptor for a given object at one place in
+ /// the code may differ from that address in another place. The contents
+ /// of the descriptor (the base address in paticular) will remain unchanged
+ /// though. Non-descriptor objects listed on the `has_device_addr` clause
+ /// can be passed to the kernel by just passing their address without any
+ /// remapping.
+ /// The adding the OMP_MAP_TO flag is done because we will always need to
+ /// map the descriptor to device, especially without device address clauses,
+ /// as without the appropriate descriptor
/// information on the device there is a risk of the kernel IR
/// requesting for various data that will not have been copied to
/// perform things like indexing. This can cause segfaults and
@@ -247,16 +256,25 @@ class MapInfoFinalizationPass
mlir::omp::TargetUpdateOp>(target))
return mapTypeFlag;
- bool hasImplicitMap =
- (llvm::omp::OpenMPOffloadMappingFlags(mapTypeFlag) &
- llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_IMPLICIT) ==
+ llvm::omp::OpenMPOffloadMappingFlags Implicit =
+ llvm::omp::OpenMPOffloadMappingFlags(mapTypeFlag) &
llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_IMPLICIT;
return llvm::to_underlying(
- hasImplicitMap
- ? llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_TO |
- llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_IMPLICIT
- : llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_TO);
+ llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_ALWAYS |
+ llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_TO | Implicit);
+ }
+
+ /// Check if the mapOp is present in the HasDeviceAddr clause on
+ /// the userOp. Only applies to TargetOp.
+ bool isHasDeviceAddr(mlir::omp::MapInfoOp mapOp, mlir::Operation *userOp) {
+ if (auto targetOp = llvm::dyn_cast<mlir::omp::TargetOp>(userOp)) {
+ for (mlir::Value hda : targetOp.getHasDeviceAddrVars()) {
+ if (hda.getDefiningOp() == mapOp)
+ return true;
+ }
+ }
+ return false;
}
mlir::omp::MapInfoOp genDescriptorMemberMaps(mlir::omp::MapInfoOp op,
@@ -268,8 +286,7 @@ class MapInfoFinalizationPass
// TODO: map the addendum segment of the descriptor, similarly to the
// base address/data pointer member.
mlir::Value descriptor = getDescriptorFromBoxMap(op, builder);
- auto baseAddr = genBaseAddrMap(descriptor, op.getBounds(),
- op.getMapType().value_or(0), builder);
+
mlir::ArrayAttr newMembersAttr;
mlir::SmallVector<mlir::Value> newMembers;
llvm::SmallVector<llvm::SmallVector<int64_t>> memberIndices;
@@ -286,6 +303,12 @@ class MapInfoFinalizationPass
// member information to now have one new member for the base address, or
// we are expanding a parent that is a descriptor and we have to adjust
// all of its members to reflect the insertion of the base address.
+ //
+ // If we're expanding a top-level descriptor for a map operation that
+ // resulted from "has_device_addr" clause, then we want the base pointer
+ // from the descriptor to be used verbatim, i.e. without additional
+ // remapping. To avoid this remapping, simply don't generate any map
+ // information for the descriptor members.
if (!mapMemberUsers.empty()) {
// Currently, there should only be one user per map when this pass
// is executed. Either a parent map, holding the current map in its
@@ -296,6 +319,8 @@ class MapInfoFinalizationPass
assert(mapMemberUsers.size() == 1 &&
"OMPMapInfoFinalization currently only supports single users of a "
"MapInfoOp");
+ auto baseAddr = genBaseAddrMap(descriptor, op.getBounds(),
+ op.getMapType().value_or(0), builder);
ParentAndPlacement mapUser = mapMemberUsers[0];
adjustMemberIndices(memberIndices, mapUser.index);
llvm::SmallVector<mlir::Value> newMemberOps;
@@ -307,7 +332,9 @@ class MapInfoFinalizationPass
mapUser.parent.getMembersMutable().assign(newMemberOps);
mapUser.parent.setMembersIndexAttr(
builder.create2DI64ArrayAttr(memberIndices));
- } else {
+ } else if (!isHasDeviceAddr(op, target)) {
+ auto baseAddr = genBaseAddrMap(descriptor, op.getBounds(),
+ op.getMapType().value_or(0), builder);
newMembers.push_back(baseAddr);
if (!op.getMembers().empty()) {
for (auto &indices : memberIndices)
@@ -448,6 +475,12 @@ class MapInfoFinalizationPass
addOperands(useDevPtrMutableOpRange, target,
argIface.getUseDevicePtrBlockArgsStart() +
argIface.numUseDevicePtrBlockArgs());
+ } else if (auto targetOp = llvm::dyn_cast<mlir::omp::TargetOp>(target)) {
+ mlir::MutableOperandRange hasDevAddrMutableOpRange =
+ targetOp.getHasDeviceAddrVarsMutable();
+ addOperands(hasDevAddrMutableOpRange, target,
+ argIface.getHasDeviceAddrBlockArgsStart() +
+ argIface.numHasDeviceAddrBlockArgs());
}
}
diff --git a/flang/lib/Support/OpenMP-utils.cpp b/flang/lib/Support/OpenMP-utils.cpp
index 178a6e38dc0f2..97e7723f0be8c 100644
--- a/flang/lib/Support/OpenMP-utils.cpp
+++ b/flang/lib/Support/OpenMP-utils.cpp
@@ -18,10 +18,11 @@ mlir::Block *genEntryBlock(mlir::OpBuilder &builder, const EntryBlockArgs &args,
llvm::SmallVector<mlir::Type> types;
llvm::SmallVector<mlir::Location> locs;
- unsigned numVars = args.hostEvalVars.size() + args.inReduction.vars.size() +
- args.map.vars.size() + args.priv.vars.size() +
- args.reduction.vars.size() + args.taskReduction.vars.size() +
- args.useDeviceAddr.vars.size() + args.useDevicePtr.vars.size();
+ unsigned numVars = args.hasDeviceAddr.vars.size() + args.hostEvalVars.size() +
+ args.inReduction.vars.size() + args.map.vars.size() +
+ args.priv.vars.size() + args.reduction.vars.size() +
+ args.taskReduction.vars.size() + args.useDeviceAddr.vars.size() +
+ args.useDevicePtr.vars.size();
types.reserve(numVars);
locs.reserve(numVars);
@@ -34,6 +35,7 @@ mlir::Block *genEntryBlock(mlir::OpBuilder &builder, const EntryBlockArgs &args,
// Populate block arguments in clause name alphabetical order to match
// expected order by the BlockArgOpenMPOpInterface.
+ extractTypeLoc(args.hasDeviceAddr.vars);
extractTypeLoc(args.hostEvalVars);
extractTypeLoc(args.inReduction.vars);
extractTypeLoc(args.map.vars);
diff --git a/flang/test/Integration/OpenMP/map-types-and-sizes.f90 b/flang/test/Integration/OpenMP/map-types-and-sizes.f90
index e0221ef254192..78e25b45f3845 100644
--- a/flang/test/Integration/OpenMP/map-types-and-sizes.f90
+++ b/flang/test/Integration/OpenMP/map-types-and-sizes.f90
@@ -31,7 +31,7 @@ subroutine mapType_array
end subroutine mapType_array
!CHECK: @.offload_sizes{{.*}} = private unnamed_addr constant [4 x i64] [i64 0, i64 24, i64 8, i64 0]
-!CHECK: @.offload_maptypes{{.*}} = private unnamed_addr constant [4 x i64] [i64 32, i64 281474976711169, i64 281474976711171, i64 281474976711187]
+!CHECK: @.offload_maptypes{{.*}} = private unnamed_addr constant [4 x i64] [i64 32, i64 281474976711173, i64 281474976711171, i64 281474976711187]
subroutine mapType_ptr
integer, pointer :: a
!$omp target
@@ -40,7 +40,7 @@ subroutine mapType_ptr
end subroutine mapType_ptr
!CHECK: @.offload_sizes{{.*}} = private unnamed_addr constant [4 x i64] [i64 0, i64 24, i64 8, i64 0]
-!CHECK: @.offload_maptypes{{.*}} = private unnamed_addr constant [4 x i64] [i64 32, i64 281474976711169, i64 281474976711171, i64 281474976711187]
+!CHECK: @.offload_maptypes{{.*}} = private unnamed_addr constant [4 x i64] [i64 32, i64 281474976711173, i64 281474976711171, i64 281474976711187]
subroutine mapType_allocatable
integer, allocatable :: a
allocate(a)
@@ -51,7 +51,7 @@ subroutine mapType_allocatable
end subroutine mapType_allocatable
!CHECK: @.offload_sizes{{.*}} = private unnamed_addr constant [4 x i64] [i64 0, i64 24, i64 8, i64 0]
-!CHECK: @.offload_maptypes{{.*}} = private unnamed_addr constant [4 x i64] [i64 32, i64 281474976710657, i64 281474976710659, i64 281474976710675]
+!CHECK: @.offload_maptypes{{.*}} = private unnamed_addr constant [4 x i64] [i64 32, i64 281474976710661, i64 281474976710659, i64 281474976710675]
subroutine mapType_ptr_explicit
integer, pointer :: a
!$omp target map(tofrom: a)
@@ -60,7 +60,7 @@ subroutine mapType_ptr_explicit
end subroutine mapType_ptr_explicit
!CHECK: @.offload_sizes{{.*}} = private unnamed_addr constant [4 x i64] [i64 0, i64 24, i64 8, i64 0]
-!CHECK: @.offload_maptypes{{.*}} = private unnamed_addr constant [4 x i64] [i64 32, i64 281474976710657, i64 281474976710659, i64 281474976710675]
+!CHECK: @.offload_maptypes{{.*}} = private unnamed_addr constant [4 x i64] [i64 32, i64 281474976710661, i64 281474976710659, i64 281474976710675]
subroutine mapType_allocatable_explicit
integer, allocatable :: a
allocate(a)
@@ -212,7 +212,7 @@ subroutine mapType_derived_explicit_nested_member_with_bounds
end subroutine mapType_derived_explicit_nested_member_with_bounds
!CHECK: @.offload_sizes{{.*}} = private unnamed_addr constant [4 x i64] [i64 0, i64 48, i64 8, i64 0]
-!CHECK: @.offload_maptypes{{.*}} = private unnamed_addr constant [4 x i64] [i64 32, i64 281474976710657, i64 281474976710659, i64 281474976710675]
+!CHECK: @.offload_maptypes{{.*}} = private unnamed_addr constant [4 x i64] [i64 32, i64 281474976710661, i64 281474976710659, i64 281474976710675]
subroutine mapType_derived_type_alloca()
type :: one_layer
real(4) :: i
@@ -233,7 +233,7 @@ subroutine mapType_derived_type_alloca()
end subroutine
!CHECK: @.offload_sizes{{.*}} = private unnamed_addr constant [8 x i64] [i64 0, i64 40, i64 8, i64 0, i64 48, i64 8, i64 0, i64 4]
-!CHECK: @.offload_maptypes{{.*}} = private unnamed_addr constant [8 x i64] [i64 32, i64 281474976710657, i64 281474976710659, i64 281474976710675, i64 281474976710657, i64 281474976710659, i64 281474976710675, i64 281474976710659]
+!CHECK: @.offload_maptypes{{.*}} = private unnamed_addr constant [8 x i64] [i64 32, i64 281474976710661, i64 281474976710659, i64 281474976710675, i64 281474976710661, i64 281474976710659, i64 281474976710675, i64 281474976710659]
subroutine mapType_alloca_derived_type()
type :: one_layer
real(4) :: i
@@ -256,7 +256,7 @@ subroutine mapType_alloca_derived_type()
end subroutine
!CHECK: @.offload_sizes{{.*}} = private unnamed_addr constant [8 x i64] [i64 0, i64 40, i64 8, i64 0, i64 48, i64 8, i64 0, i64 4]
-!CHECK: @.offload_maptypes{{.*}...
[truncated]
|
@llvm/pr-subscribers-flang-openmp Author: Krzysztof Parzyszek (kparzysz) ChangesThe HAS_DEVICE_ADDR indicates that the object(s) listed exists at an address that is a valid device address. Specifically, When entering a target region, Some Fortran objects use descriptors in their in-memory representation. If Patch is 79.81 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/128568.diff 20 Files Affected:
diff --git a/flang/include/flang/Support/OpenMP-utils.h b/flang/include/flang/Support/OpenMP-utils.h
index d8f82e1cf94f1..7805626632487 100644
--- a/flang/include/flang/Support/OpenMP-utils.h
+++ b/flang/include/flang/Support/OpenMP-utils.h
@@ -35,6 +35,7 @@ struct EntryBlockArgsEntry {
/// arguments associated to all clauses that can define them.
struct EntryBlockArgs {
llvm::ArrayRef<mlir::Value> hostEvalVars;
+ EntryBlockArgsEntry hasDeviceAddr;
EntryBlockArgsEntry inReduction;
EntryBlockArgsEntry map;
EntryBlockArgsEntry priv;
@@ -44,21 +45,21 @@ struct EntryBlockArgs {
EntryBlockArgsEntry useDevicePtr;
bool isValid() const {
- return inReduction.isValid() && map.isValid() && priv.isValid() &&
- reduction.isValid() && taskReduction.isValid() &&
+ return hasDeviceAddr.isValid() && inReduction.isValid() && map.isValid() &&
+ priv.isValid() && reduction.isValid() && taskReduction.isValid() &&
useDeviceAddr.isValid() && useDevicePtr.isValid();
}
auto getSyms() const {
- return llvm::concat<const semantics::Symbol *const>(inReduction.syms,
- map.syms, priv.syms, reduction.syms, taskReduction.syms,
- useDeviceAddr.syms, useDevicePtr.syms);
+ return llvm::concat<const semantics::Symbol *const>(hasDeviceAddr.syms,
+ inReduction.syms, map.syms, priv.syms, reduction.syms,
+ taskReduction.syms, useDeviceAddr.syms, useDevicePtr.syms);
}
auto getVars() const {
- return llvm::concat<const mlir::Value>(hostEvalVars, inReduction.vars,
- map.vars, priv.vars, reduction.vars, taskReduction.vars,
- useDeviceAddr.vars, useDevicePtr.vars);
+ return llvm::concat<const mlir::Value>(hasDeviceAddr.vars, hostEvalVars,
+ inReduction.vars, map.vars, priv.vars, reduction.vars,
+ taskReduction.vars, useDeviceAddr.vars, useDevicePtr.vars);
}
};
diff --git a/flang/lib/Lower/OpenMP/ClauseProcessor.cpp b/flang/lib/Lower/OpenMP/ClauseProcessor.cpp
index e21d299570b86..48610fabd4288 100644
--- a/flang/lib/Lower/OpenMP/ClauseProcessor.cpp
+++ b/flang/lib/Lower/OpenMP/ClauseProcessor.cpp
@@ -913,14 +913,34 @@ bool ClauseProcessor::processDepend(mlir::omp::DependClauseOps &result) const {
}
bool ClauseProcessor::processHasDeviceAddr(
- mlir::omp::HasDeviceAddrClauseOps &result,
- llvm::SmallVectorImpl<const semantics::Symbol *> &isDeviceSyms) const {
- return findRepeatableClause<omp::clause::HasDeviceAddr>(
- [&](const omp::clause::HasDeviceAddr &devAddrClause,
- const parser::CharBlock &) {
- addUseDeviceClause(converter, devAddrClause.v, result.hasDeviceAddrVars,
- isDeviceSyms);
+ lower::StatementContext &stmtCtx, mlir::omp::HasDeviceAddrClauseOps &result,
+ llvm::SmallVectorImpl<const semantics::Symbol *> &hasDeviceSyms) const {
+ // For HAS_DEVICE_ADDR objects, implicitly map the top-level entities.
+ // Their address (or the whole descriptor, if the entity had one) will be
+ // passed to the target region.
+ std::map<Object, OmpMapParentAndMemberData> parentMemberIndices;
+ bool clauseFound = findRepeatableClause<omp::clause::HasDeviceAddr>(
+ [&](const omp::clause::HasDeviceAddr &clause,
+ const parser::CharBlock &source) {
+ mlir::Location location = converter.genLocation(source);
+ llvm::omp::OpenMPOffloadMappingFlags mapTypeBits =
+ llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_TO |
+ llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_IMPLICIT;
+ omp::ObjectList baseObjects;
+ llvm::transform(clause.v, std::back_inserter(baseObjects),
+ [&](const omp::Object &object) {
+ if (auto maybeBase = getBaseObject(object, semaCtx))
+ return *maybeBase;
+ return object;
+ });
+ processMapObjects(stmtCtx, location, baseObjects, mapTypeBits,
+ parentMemberIndices, result.hasDeviceAddrVars,
+ hasDeviceSyms);
});
+
+ insertChildMapInfoIntoParent(converter, semaCtx, stmtCtx, parentMemberIndices,
+ result.hasDeviceAddrVars, hasDeviceSyms);
+ return clauseFound;
}
bool ClauseProcessor::processIf(
diff --git a/flang/lib/Lower/OpenMP/ClauseProcessor.h b/flang/lib/Lower/OpenMP/ClauseProcessor.h
index 889a09a8f0cd8..5de8798d25945 100644
--- a/flang/lib/Lower/OpenMP/ClauseProcessor.h
+++ b/flang/lib/Lower/OpenMP/ClauseProcessor.h
@@ -71,8 +71,9 @@ class ClauseProcessor {
bool processFinal(lower::StatementContext &stmtCtx,
mlir::omp::FinalClauseOps &result) const;
bool processHasDeviceAddr(
+ lower::StatementContext &stmtCtx,
mlir::omp::HasDeviceAddrClauseOps &result,
- llvm::SmallVectorImpl<const semantics::Symbol *> &isDeviceSyms) const;
+ llvm::SmallVectorImpl<const semantics::Symbol *> &hasDeviceSyms) const;
bool processHint(mlir::omp::HintClauseOps &result) const;
bool processInclusive(mlir::Location currentLocation,
mlir::omp::InclusiveClauseOps &result) const;
diff --git a/flang/lib/Lower/OpenMP/Clauses.cpp b/flang/lib/Lower/OpenMP/Clauses.cpp
index 831ba23870360..de60146e10f16 100644
--- a/flang/lib/Lower/OpenMP/Clauses.cpp
+++ b/flang/lib/Lower/OpenMP/Clauses.cpp
@@ -159,8 +159,8 @@ std::optional<Object> getBaseObject(const Object &object,
return Object{SymbolAndDesignatorExtractor::symbol_addr(comp->symbol()),
ea.Designate(evaluate::DataRef{
SymbolAndDesignatorExtractor::AsRvalueRef(*comp)})};
- } else if (base.UnwrapSymbolRef()) {
- return std::nullopt;
+ } else if (auto *symRef = base.UnwrapSymbolRef()) {
+ return Object{const_cast<semantics::Symbol *>(&**symRef), std::nullopt};
}
} else {
assert(std::holds_alternative<evaluate::CoarrayRef>(ref.u) &&
diff --git a/flang/lib/Lower/OpenMP/OpenMP.cpp b/flang/lib/Lower/OpenMP/OpenMP.cpp
index e0d23fc53eeca..c4e8f314400d0 100644
--- a/flang/lib/Lower/OpenMP/OpenMP.cpp
+++ b/flang/lib/Lower/OpenMP/OpenMP.cpp
@@ -321,6 +321,7 @@ static void bindEntryBlockArgs(lower::AbstractConverter &converter,
// Process in clause name alphabetical order to match block arguments order.
// Do not bind host_eval variables because they cannot be used inside of the
// corresponding region, except for very specific cases handled separately.
+ bindMapLike(args.hasDeviceAddr.syms, op.getHasDeviceAddrBlockArgs());
bindPrivateLike(args.inReduction.syms, args.inReduction.vars,
op.getInReductionBlockArgs());
bindMapLike(args.map.syms, op.getMapBlockArgs());
@@ -1645,7 +1646,7 @@ static void genTargetClauses(
cp.processBare(clauseOps);
cp.processDepend(clauseOps);
cp.processDevice(stmtCtx, clauseOps);
- cp.processHasDeviceAddr(clauseOps, hasDeviceAddrSyms);
+ cp.processHasDeviceAddr(stmtCtx, clauseOps, hasDeviceAddrSyms);
if (!hostEvalInfo.empty()) {
// Only process host_eval if compiling for the host device.
processHostEvalClauses(converter, semaCtx, stmtCtx, eval, loc);
@@ -2191,6 +2192,10 @@ genTargetOp(lower::AbstractConverter &converter, lower::SymMap &symTable,
if (dsp.getAllSymbolsToPrivatize().contains(&sym))
return;
+ // These symbols are mapped individually in processHasDeviceAddr.
+ if (llvm::is_contained(hasDeviceAddrSyms, &sym))
+ return;
+
// Structure component symbols don't have bindings, and can only be
// explicitly mapped individually. If a member is captured implicitly
// we map the entirety of the derived type when we find its symbol.
@@ -2281,8 +2286,9 @@ genTargetOp(lower::AbstractConverter &converter, lower::SymMap &symTable,
auto targetOp = firOpBuilder.create<mlir::omp::TargetOp>(loc, clauseOps);
- llvm::SmallVector<mlir::Value> mapBaseValues;
+ llvm::SmallVector<mlir::Value> mapBaseValues, hasDeviceAddrBaseValues;
extractMappedBaseValues(clauseOps.mapVars, mapBaseValues);
+ extractMappedBaseValues(clauseOps.hasDeviceAddrVars, hasDeviceAddrBaseValues);
EntryBlockArgs args;
args.hostEvalVars = clauseOps.hostEvalVars;
@@ -2291,6 +2297,8 @@ genTargetOp(lower::AbstractConverter &converter, lower::SymMap &symTable,
args.map.vars = mapBaseValues;
args.priv.syms = dsp.getDelayedPrivSymbols();
args.priv.vars = clauseOps.privateVars;
+ args.hasDeviceAddr.syms = hasDeviceAddrSyms;
+ args.hasDeviceAddr.vars = hasDeviceAddrBaseValues;
genBodyOfTargetOp(converter, symTable, semaCtx, eval, targetOp, args, loc,
queue, item, dsp);
diff --git a/flang/lib/Optimizer/OpenMP/MapInfoFinalization.cpp b/flang/lib/Optimizer/OpenMP/MapInfoFinalization.cpp
index beea7543e54b3..0d8026eed17ad 100644
--- a/flang/lib/Optimizer/OpenMP/MapInfoFinalization.cpp
+++ b/flang/lib/Optimizer/OpenMP/MapInfoFinalization.cpp
@@ -224,10 +224,19 @@ class MapInfoFinalizationPass
}
/// Adjusts the descriptor's map type. The main alteration that is done
- /// currently is transforming the map type to `OMP_MAP_TO` where possible.
- /// This is because we will always need to map the descriptor to device
- /// (or at the very least it seems to be the case currently with the
- /// current lowered kernel IR), as without the appropriate descriptor
+ /// currently is transforming the map type to `OMP_MAP_TO` where possible,
+ /// plus adding OMP_MAP_ALWAYS flag. Descriptors will always be copied,
+ /// even if the object was listed on the `has_device_addr` clause.
+ /// This is because the descriptor can be rematerialized by the compiler,
+ /// and so the address of the descriptor for a given object at one place in
+ /// the code may differ from that address in another place. The contents
+ /// of the descriptor (the base address in paticular) will remain unchanged
+ /// though. Non-descriptor objects listed on the `has_device_addr` clause
+ /// can be passed to the kernel by just passing their address without any
+ /// remapping.
+ /// The adding the OMP_MAP_TO flag is done because we will always need to
+ /// map the descriptor to device, especially without device address clauses,
+ /// as without the appropriate descriptor
/// information on the device there is a risk of the kernel IR
/// requesting for various data that will not have been copied to
/// perform things like indexing. This can cause segfaults and
@@ -247,16 +256,25 @@ class MapInfoFinalizationPass
mlir::omp::TargetUpdateOp>(target))
return mapTypeFlag;
- bool hasImplicitMap =
- (llvm::omp::OpenMPOffloadMappingFlags(mapTypeFlag) &
- llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_IMPLICIT) ==
+ llvm::omp::OpenMPOffloadMappingFlags Implicit =
+ llvm::omp::OpenMPOffloadMappingFlags(mapTypeFlag) &
llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_IMPLICIT;
return llvm::to_underlying(
- hasImplicitMap
- ? llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_TO |
- llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_IMPLICIT
- : llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_TO);
+ llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_ALWAYS |
+ llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_TO | Implicit);
+ }
+
+ /// Check if the mapOp is present in the HasDeviceAddr clause on
+ /// the userOp. Only applies to TargetOp.
+ bool isHasDeviceAddr(mlir::omp::MapInfoOp mapOp, mlir::Operation *userOp) {
+ if (auto targetOp = llvm::dyn_cast<mlir::omp::TargetOp>(userOp)) {
+ for (mlir::Value hda : targetOp.getHasDeviceAddrVars()) {
+ if (hda.getDefiningOp() == mapOp)
+ return true;
+ }
+ }
+ return false;
}
mlir::omp::MapInfoOp genDescriptorMemberMaps(mlir::omp::MapInfoOp op,
@@ -268,8 +286,7 @@ class MapInfoFinalizationPass
// TODO: map the addendum segment of the descriptor, similarly to the
// base address/data pointer member.
mlir::Value descriptor = getDescriptorFromBoxMap(op, builder);
- auto baseAddr = genBaseAddrMap(descriptor, op.getBounds(),
- op.getMapType().value_or(0), builder);
+
mlir::ArrayAttr newMembersAttr;
mlir::SmallVector<mlir::Value> newMembers;
llvm::SmallVector<llvm::SmallVector<int64_t>> memberIndices;
@@ -286,6 +303,12 @@ class MapInfoFinalizationPass
// member information to now have one new member for the base address, or
// we are expanding a parent that is a descriptor and we have to adjust
// all of its members to reflect the insertion of the base address.
+ //
+ // If we're expanding a top-level descriptor for a map operation that
+ // resulted from "has_device_addr" clause, then we want the base pointer
+ // from the descriptor to be used verbatim, i.e. without additional
+ // remapping. To avoid this remapping, simply don't generate any map
+ // information for the descriptor members.
if (!mapMemberUsers.empty()) {
// Currently, there should only be one user per map when this pass
// is executed. Either a parent map, holding the current map in its
@@ -296,6 +319,8 @@ class MapInfoFinalizationPass
assert(mapMemberUsers.size() == 1 &&
"OMPMapInfoFinalization currently only supports single users of a "
"MapInfoOp");
+ auto baseAddr = genBaseAddrMap(descriptor, op.getBounds(),
+ op.getMapType().value_or(0), builder);
ParentAndPlacement mapUser = mapMemberUsers[0];
adjustMemberIndices(memberIndices, mapUser.index);
llvm::SmallVector<mlir::Value> newMemberOps;
@@ -307,7 +332,9 @@ class MapInfoFinalizationPass
mapUser.parent.getMembersMutable().assign(newMemberOps);
mapUser.parent.setMembersIndexAttr(
builder.create2DI64ArrayAttr(memberIndices));
- } else {
+ } else if (!isHasDeviceAddr(op, target)) {
+ auto baseAddr = genBaseAddrMap(descriptor, op.getBounds(),
+ op.getMapType().value_or(0), builder);
newMembers.push_back(baseAddr);
if (!op.getMembers().empty()) {
for (auto &indices : memberIndices)
@@ -448,6 +475,12 @@ class MapInfoFinalizationPass
addOperands(useDevPtrMutableOpRange, target,
argIface.getUseDevicePtrBlockArgsStart() +
argIface.numUseDevicePtrBlockArgs());
+ } else if (auto targetOp = llvm::dyn_cast<mlir::omp::TargetOp>(target)) {
+ mlir::MutableOperandRange hasDevAddrMutableOpRange =
+ targetOp.getHasDeviceAddrVarsMutable();
+ addOperands(hasDevAddrMutableOpRange, target,
+ argIface.getHasDeviceAddrBlockArgsStart() +
+ argIface.numHasDeviceAddrBlockArgs());
}
}
diff --git a/flang/lib/Support/OpenMP-utils.cpp b/flang/lib/Support/OpenMP-utils.cpp
index 178a6e38dc0f2..97e7723f0be8c 100644
--- a/flang/lib/Support/OpenMP-utils.cpp
+++ b/flang/lib/Support/OpenMP-utils.cpp
@@ -18,10 +18,11 @@ mlir::Block *genEntryBlock(mlir::OpBuilder &builder, const EntryBlockArgs &args,
llvm::SmallVector<mlir::Type> types;
llvm::SmallVector<mlir::Location> locs;
- unsigned numVars = args.hostEvalVars.size() + args.inReduction.vars.size() +
- args.map.vars.size() + args.priv.vars.size() +
- args.reduction.vars.size() + args.taskReduction.vars.size() +
- args.useDeviceAddr.vars.size() + args.useDevicePtr.vars.size();
+ unsigned numVars = args.hasDeviceAddr.vars.size() + args.hostEvalVars.size() +
+ args.inReduction.vars.size() + args.map.vars.size() +
+ args.priv.vars.size() + args.reduction.vars.size() +
+ args.taskReduction.vars.size() + args.useDeviceAddr.vars.size() +
+ args.useDevicePtr.vars.size();
types.reserve(numVars);
locs.reserve(numVars);
@@ -34,6 +35,7 @@ mlir::Block *genEntryBlock(mlir::OpBuilder &builder, const EntryBlockArgs &args,
// Populate block arguments in clause name alphabetical order to match
// expected order by the BlockArgOpenMPOpInterface.
+ extractTypeLoc(args.hasDeviceAddr.vars);
extractTypeLoc(args.hostEvalVars);
extractTypeLoc(args.inReduction.vars);
extractTypeLoc(args.map.vars);
diff --git a/flang/test/Integration/OpenMP/map-types-and-sizes.f90 b/flang/test/Integration/OpenMP/map-types-and-sizes.f90
index e0221ef254192..78e25b45f3845 100644
--- a/flang/test/Integration/OpenMP/map-types-and-sizes.f90
+++ b/flang/test/Integration/OpenMP/map-types-and-sizes.f90
@@ -31,7 +31,7 @@ subroutine mapType_array
end subroutine mapType_array
!CHECK: @.offload_sizes{{.*}} = private unnamed_addr constant [4 x i64] [i64 0, i64 24, i64 8, i64 0]
-!CHECK: @.offload_maptypes{{.*}} = private unnamed_addr constant [4 x i64] [i64 32, i64 281474976711169, i64 281474976711171, i64 281474976711187]
+!CHECK: @.offload_maptypes{{.*}} = private unnamed_addr constant [4 x i64] [i64 32, i64 281474976711173, i64 281474976711171, i64 281474976711187]
subroutine mapType_ptr
integer, pointer :: a
!$omp target
@@ -40,7 +40,7 @@ subroutine mapType_ptr
end subroutine mapType_ptr
!CHECK: @.offload_sizes{{.*}} = private unnamed_addr constant [4 x i64] [i64 0, i64 24, i64 8, i64 0]
-!CHECK: @.offload_maptypes{{.*}} = private unnamed_addr constant [4 x i64] [i64 32, i64 281474976711169, i64 281474976711171, i64 281474976711187]
+!CHECK: @.offload_maptypes{{.*}} = private unnamed_addr constant [4 x i64] [i64 32, i64 281474976711173, i64 281474976711171, i64 281474976711187]
subroutine mapType_allocatable
integer, allocatable :: a
allocate(a)
@@ -51,7 +51,7 @@ subroutine mapType_allocatable
end subroutine mapType_allocatable
!CHECK: @.offload_sizes{{.*}} = private unnamed_addr constant [4 x i64] [i64 0, i64 24, i64 8, i64 0]
-!CHECK: @.offload_maptypes{{.*}} = private unnamed_addr constant [4 x i64] [i64 32, i64 281474976710657, i64 281474976710659, i64 281474976710675]
+!CHECK: @.offload_maptypes{{.*}} = private unnamed_addr constant [4 x i64] [i64 32, i64 281474976710661, i64 281474976710659, i64 281474976710675]
subroutine mapType_ptr_explicit
integer, pointer :: a
!$omp target map(tofrom: a)
@@ -60,7 +60,7 @@ subroutine mapType_ptr_explicit
end subroutine mapType_ptr_explicit
!CHECK: @.offload_sizes{{.*}} = private unnamed_addr constant [4 x i64] [i64 0, i64 24, i64 8, i64 0]
-!CHECK: @.offload_maptypes{{.*}} = private unnamed_addr constant [4 x i64] [i64 32, i64 281474976710657, i64 281474976710659, i64 281474976710675]
+!CHECK: @.offload_maptypes{{.*}} = private unnamed_addr constant [4 x i64] [i64 32, i64 281474976710661, i64 281474976710659, i64 281474976710675]
subroutine mapType_allocatable_explicit
integer, allocatable :: a
allocate(a)
@@ -212,7 +212,7 @@ subroutine mapType_derived_explicit_nested_member_with_bounds
end subroutine mapType_derived_explicit_nested_member_with_bounds
!CHECK: @.offload_sizes{{.*}} = private unnamed_addr constant [4 x i64] [i64 0, i64 48, i64 8, i64 0]
-!CHECK: @.offload_maptypes{{.*}} = private unnamed_addr constant [4 x i64] [i64 32, i64 281474976710657, i64 281474976710659, i64 281474976710675]
+!CHECK: @.offload_maptypes{{.*}} = private unnamed_addr constant [4 x i64] [i64 32, i64 281474976710661, i64 281474976710659, i64 281474976710675]
subroutine mapType_derived_type_alloca()
type :: one_layer
real(4) :: i
@@ -233,7 +233,7 @@ subroutine mapType_derived_type_alloca()
end subroutine
!CHECK: @.offload_sizes{{.*}} = private unnamed_addr constant [8 x i64] [i64 0, i64 40, i64 8, i64 0, i64 48, i64 8, i64 0, i64 4]
-!CHECK: @.offload_maptypes{{.*}} = private unnamed_addr constant [8 x i64] [i64 32, i64 281474976710657, i64 281474976710659, i64 281474976710675, i64 281474976710657, i64 281474976710659, i64 281474976710675, i64 281474976710659]
+!CHECK: @.offload_maptypes{{.*}} = private unnamed_addr constant [8 x i64] [i64 32, i64 281474976710661, i64 281474976710659, i64 281474976710675, i64 281474976710661, i64 281474976710659, i64 281474976710675, i64 281474976710659]
subroutine mapType_alloca_derived_type()
type :: one_layer
real(4) :: i
@@ -256,7 +256,7 @@ subroutine mapType_alloca_derived_type()
end subroutine
!CHECK: @.offload_sizes{{.*}} = private unnamed_addr constant [8 x i64] [i64 0, i64 40, i64 8, i64 0, i64 48, i64 8, i64 0, i64 4]
-!CHECK: @.offload_maptypes{{.*}...
[truncated]
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you Krzysztof, this generally looks good to me. I have a few small comments and questions, though @agozillon or @TIFitis would probably know more about mapping to review changes to collectMapDataFromMapOperands
in OpenMPToLLVMIRTranslation.cpp or the MapInfoFinalizationPass
in more detail.
@@ -308,17 +308,6 @@ llvm.func @target_device(%x : i32) { | |||
|
|||
// ----- | |||
|
|||
llvm.func @target_has_device_addr(%x : !llvm.ptr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should now add a test to check that the OpenMP to LLVM IR translation of this clause works as expected. That could be added in openmp-llvm.mlir in this same directory, for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like you missed this comment, btw.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Co-authored-by: Sergio Afonso <[email protected]>
Co-authored-by: Sergio Afonso <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM! Thank you for implementing this, and adding the always map type to the descriptor like it is downstream (or coming up with the same thought I did, and adding more credence to it!), will cut down my other PR set a bit :-)
However, worth mentioning it does in certain cases, cause issues (primarily some cases in UMT) for map, which I'll try to address in some future PRs in some manner, but not a problem for you to worry about! I would however ask you to add one or two runtime tests to the offload/test/offloading/fortran "suite", mainly just so that I am aware if any changes I (or anyone else make) break something runtime wise!
P.S. please do wait on approval from Sergio as well!
bool hasImplicitMap = | ||
(llvm::omp::OpenMPOffloadMappingFlags(mapTypeFlag) & | ||
llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_IMPLICIT) == | ||
llvm::omp::OpenMPOffloadMappingFlags Implicit = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should likely be "implicit", with a small i, to align with the case style of everything!
And as an aside to this, I have a feeling as it is just now, the always map type may break a few tests in the offloading suite (check-offload), I recollect having to implement this yet to land code blob for it: #119588 where we "cut around" the data pointer, so as not to map over it when remapping the descriptor. As unless we explicitly specify always, we should never enforce that mapping on the address portion or we'll cause problems in a lot of more complicated map cases! And even that isn't quite perfect, ran into a few edge cases in UMT, I could try and carve out the relevant bits and upstream it if you run into issues with any tests in check-offload, if that's helpful at all. P.S. the check-offload fortran tests are terrible in the sense that they don't always run depending on configuration, so do try to intentionally fail one to check if they're running! |
Yes, I remember seeing that happen with the downstream code. I was wondering why that was instead of mapping everything and then following that with PTR_AND_OBJECT for the pointer specifically. Anyways, maybe we could only mark HAS_DEVICE_ADDR descriptors as AWAYS? Would that help avoid the problem? |
In this case, just mimicking how Clang performs it for record types. We could do it that way, but we'd likely be mapping more data to device than is necessary (not that enforcing always map on the descriptor isn't doing that, but probably good to limit things where feasible), and having the support to perform that kind of mapping is still useful as it applies to other types of mapping, not just this specific use-case :-) And we again go back to the scenario of likely having to know that it's a descriptor we're mapping in the lowering to perform that kind of "specialized" mapping to keep it isolated to these Fortran cases, which we currently don't do upstream (was some pushback originally, but might change in the near future). Sure, I imagine it would work and I'm happy with that! Just please do check it doesn't break any offload cases when it's in place. The only thing is it might pose some scenarios where using has_device_addr in conjunction with a map of the same variable causes some issues, but I am not too sure, and that'll be fixed whenever the PR stack lands, so it seems more than fine! |
This version limits the use of the ALWAYS flag only to the descriptors of objects listed on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you Krzysztof, I only have very minor comments, so no need for another look by me. LGTM!
@@ -4780,6 +4823,12 @@ convertOmpTarget(Operation &opInst, llvm::IRBuilderBase &builder, | |||
moduleTranslation.lookupValue(mapInfoOp.getVarPtr()); | |||
moduleTranslation.mapValue(arg, mapOpValue); | |||
} | |||
for (auto [arg, mapOp] : llvm::zip_equal(hdaBlockArgs, hdaVars)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Feel free to ignore, but since this loop and the one above have the same body, maybe it makes sense to do instead:
for (auto [arg, mapOp] : llvm::zip_equal(llvm::concat(mapBlockArgs, hdaBlockArgs), llvm::concat(mapVars, hdaVars))) {
auto mapInfoOp = cast<omp::MapInfoOp>(mapOp.getDefiningOp());
llvm::Value *mapOpValue =
moduleTranslation.lookupValue(mapInfoOp.getVarPtr());
moduleTranslation.mapValue(arg, mapOpValue);
}
integer :: res1, res2 | ||
|
||
nullify (first_scalar_device_addr) | ||
errors = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this value is never updated, so you can probably delete it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
The HAS_DEVICE_ADDR indicates that the object(s) listed exists at an address that is a valid device address. Specifically,
has_device_addr(x)
means that (in C/C++ terms)&x
is a device address.When entering a target region,
x
does not need to be allocated on the device, or have its contents copied over (in the absence of additional mapping clauses). Passing its address verbatim to the region for use is sufficient, and is the intended goal of the clause.Some Fortran objects use descriptors in their in-memory representation. If
x
had a descriptor, both the descriptor and the contents ofx
would be located in the device memory. However, the descriptors are managed by the compiler, and can be regenerated at various points as needed. The address of the effective descriptor may change, hence it's not safe to pass the address of the descriptor to the target region. Instead, the descriptor itself is always copied, but for objects likex
, no further mapping takes place (as this keeps the storage pointer in the descriptor unchanged).