
Commit 2ecb2c7

wenleix authored and facebook-github-bot committed
Pass Scalar by reference (#53583)
Summary:
Pull Request resolved: #53583

`Scalar` takes 32 bytes because `c10::complex<double>` requires 16-byte alignment. Passing `Scalar` by reference shows about a 1% improvement in instruction count (a size/alignment sketch follows the commit metadata below). All the changes in this commit are codemodded except for the following four files (which code-gen the signatures):

```
tools/codegen/api/cpp.py
tools/codegen/api/native.py
tools/codegen/api/structured.py
caffe2/contrib/aten/gen_op.py
```

# Codemod

## Main Step

For the codemod part, here are the main commands used:

```
fastmod --extensions h '([a-zA-Z_+]\([^)]*,?\s*)Scalar (\w+)' '${1}const Scalar& ${2}'
fastmod --extensions h '([a-zA-Z_+]\([^)]*,?\s*)optional<Scalar> (\w+)' '${1}const optional<Scalar>& ${2}'
fastmod --extensions cpp '([a-zA-Z_+]\([^)]*,?\s*)Scalar (\w+)' '${1}const Scalar& ${2}'
fastmod --extensions cpp '([a-zA-Z_+]\([^)]*,?\s*)optional<Scalar> (\w+)' '${1}const optional<Scalar>& ${2}'
```

As you can tell, this codemods both `Scalar` and `optional<Scalar>`. Apply these commands iteratively until reaching a fix-point, since one method signature might contain multiple `Scalar` parameters (see the illustration just below). In retrospect, excluding `third_party` and `torch/csrc/jit` would have been a good idea; I reverted those changes manually later (see #53479 for reference).

## Pre-Step

Some `Scalar` parameters are written as `at::Scalar` or `c10::Scalar`, so I codemodded some of them in advance, before applying the main commands. Here is an incomplete list:

```
fastmod --extensions h '([a-zA-Z_+]\([^)]*,?\s*)at::Scalar (\w+)' '${1}const at::Scalar& ${2}'
fastmod --extensions cpp '([a-zA-Z_+]\([^)]*,?\s*)at::Scalar (\w+)' '${1}const at::Scalar& ${2}'
fastmod --extensions h '([a-zA-Z_+]\([^)]*,?\s*)c10::optional<Scalar> (\w+)' '${1}const c10::optional<Scalar>& ${2}'
fastmod --extensions cpp '([a-zA-Z_+]\([^)]*,?\s*)c10::optional<Scalar> (\w+)' '${1}const c10::optional<Scalar>& ${2}'
```

## Fixup

There are a couple of post-codemod fixups. For example, `const Scalar` gets codemodded into `const const Scalar&`, and `at::Scalar` into `at::const Scalar&` (if the pre-step is not done comprehensively). Here is an incomplete list:

```
fastmod --extensions cpp 'const const Scalar' 'const Scalar'
fastmod --extensions h 'const const c10::optional<Scalar>' 'const c10::optional<Scalar>'
fastmod --extensions cpp 'const const c10::optional<Scalar>' 'const c10::optional<Scalar>'
fastmod 'at::const Scalar&' 'const at::Scalar&'
```

## Supplementary

`.cu` and `.mm` files also need to be codemodded, for example:

```
fastmod --extensions cu 'at::const Scalar&' 'const at::Scalar&'
fastmod --extensions mm '([a-zA-Z_+]\([^)]*,?\s*)Scalar (\w+)' '${1}const Scalar& ${2}'
```

Function pointers are not covered by the commands above. Here is an incomplete list of the additional rewrites:

```
# Cover case: using index_fill_fn = void(*)(TensorIterator & iter, int64_t dim, int64_t self_dim_size, int64_t self_dim_stride, Scalar source);
fastmod --extensions h '(void\s*\(\s*\*\s*\)\([^)]*,?\s*)Scalar (\w+)' '${1}const Scalar& ${2}'

# Cover case: using softplus_fn = void (*)(TensorIterator&, Scalar, Scalar);
fastmod --extensions h '(void\s*\(\s*\*\s*\)\([^)]*,?\s*)Scalar([, \)])' '${1}const Scalar&${2}'
fastmod --extensions cpp '(void\s*\(\s*\*\s*\)\([^)]*,?\s*)Scalar([, \)])' '${1}const Scalar&${2}'
fastmod --extensions h '(void\s*\(\s*\*\s*\)\([^)]*,?\s*)optional<Scalar>([, \)])' '${1}const optional<Scalar>&${2}'
```

Some corner cases need to be fixed manually.
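To make the fix-point note concrete, here is an illustrative before/after that is not part of the commit; the `_th_renorm_v*` names are hypothetical stand-ins for the real `_th_renorm_` declaration changed in `LegacyTHFunctionsCPU.h` below. Each fastmod pass rewrites at most one of the remaining `Scalar` parameters in a declaration, so a signature with two of them needs two passes:

```cpp
// Illustration only: why the codemod is re-applied until it reaches a fix-point.
#include <ATen/ATen.h>
#include <cstdint>
using at::Scalar;
using at::Tensor;

// Original declaration with two Scalar parameters:
Tensor& _th_renorm_v0(Tensor& self, Scalar p, int64_t dim, Scalar maxnorm);

// After one pass, only one parameter has been rewritten (which one goes first
// depends on how the regex engine resolves the greedy `[^)]*` group):
Tensor& _th_renorm_v1(Tensor& self, Scalar p, int64_t dim, const Scalar& maxnorm);

// After a second pass, nothing matches any more -- the fix-point:
Tensor& _th_renorm_v2(Tensor& self, const Scalar& p, int64_t dim, const Scalar& maxnorm);
```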
ghstack-source-id: 123970306
Test Plan: Imported from OSS
Reviewed By: smessmer
Differential Revision: D26904445
fbshipit-source-id: 8d8a002af4b5125f153a32f03c6956be7ae5671d
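For context on the size claim in the Summary, below is a minimal, self-contained sketch. The `MockScalar` and `MockComplexDouble` types are invented for illustration and are not the real `c10` classes; they only mimic the layout described above (a small tag plus a 16-byte-aligned complex payload), which is enough to show why the whole object ends up at 32 bytes and why a `const` reference is cheaper than a by-value copy:

```cpp
#include <cstdint>
#include <cstdio>

// Mock of a complex<double>-style payload that requires 16-byte alignment,
// mirroring the commit's note about c10::complex<double>.
struct alignas(16) MockComplexDouble {
  double real;
  double imag;
};

// Mock tagged union in the spirit of a Scalar: a small tag plus a payload.
struct MockScalar {
  int32_t tag;  // which union member is active
  union {
    double d;
    int64_t i;
    bool b;
    MockComplexDouble z;  // 16-byte alignment pushes this member to offset 16
  } v;
};

static_assert(alignof(MockComplexDouble) == 16, "payload is 16-byte aligned");
static_assert(sizeof(MockScalar) == 32, "4-byte tag + padding + 16-byte payload rounds up to 32");

// Passing by value copies all 32 bytes at every call site; a const reference
// passes a single pointer instead.
double real_part(const MockScalar& s) {
  return s.tag == 3 ? s.v.z.real : 0.0;
}

int main() {
  MockScalar s{};
  s.v.z = MockComplexDouble{1.0, 2.0};
  s.tag = 3;
  std::printf("sizeof(MockScalar) = %zu, real part = %f\n", sizeof(MockScalar), real_part(s));
  return 0;
}
```

On common 64-bit ABIs an argument this large is passed through memory anyway, so taking `const Scalar&` avoids the copy without changing call sites, which lines up with the small instruction-count win reported above.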
1 parent 4dd1c72 commit 2ecb2c7

File tree

133 files changed (+846, -828 lines)


aten/src/ATen/BatchingRegistrations.cpp (+21, -21)

```diff
@@ -187,19 +187,19 @@ std::vector<Tensor> chunk_batching_rule(const Tensor& self, int64_t chunks, int6
   return result;
 }
 
-Tensor clamp_batching_rule(const Tensor& self, optional<Scalar> min, optional<Scalar> max) {
+Tensor clamp_batching_rule(const Tensor& self, const optional<Scalar>& min, const optional<Scalar>& max) {
   auto self_physical = MultiBatchVmapTransform::logicalToPhysical(self);
   auto result = at::clamp(self_physical.tensor(), min, max);
   return self_physical.getPhysicalToLogicalMap().apply(result);
 }
 
-Tensor clamp_min_batching_rule(const Tensor& self, Scalar min) {
+Tensor clamp_min_batching_rule(const Tensor& self, const Scalar& min) {
   auto self_physical = MultiBatchVmapTransform::logicalToPhysical(self);
   auto result = at::clamp_min(self_physical.tensor(), min);
   return self_physical.getPhysicalToLogicalMap().apply(result);
 }
 
-Tensor clamp_max_batching_rule(const Tensor& self, Scalar max) {
+Tensor clamp_max_batching_rule(const Tensor& self, const Scalar& max) {
   auto self_physical = MultiBatchVmapTransform::logicalToPhysical(self);
   auto result = at::clamp_max(self_physical.tensor(), max);
   return self_physical.getPhysicalToLogicalMap().apply(result);
@@ -233,7 +233,7 @@ Tensor unsqueeze_batching_rule(const Tensor& self, int64_t dim) {
   return self_physical.getPhysicalToLogicalMap().apply(result);
 }
 
-Tensor& fill_inplace_scalar_batching_rule(Tensor& self, Scalar value) {
+Tensor& fill_inplace_scalar_batching_rule(Tensor& self, const Scalar& value) {
   auto self_physical = MultiBatchVmapTransform::logicalToPhysical(self);
   self_physical.tensor().fill_(value);
   return self;
@@ -708,7 +708,7 @@ Tensor unwrap_and_call_method(const Tensor& input, ExtraArgs... extra_args) {
   return makeBatched(output_physical, BatchDims(old_bdims.begin(), old_bdims.end()));
 }
 
-Tensor pow_scalar_Tensor_batching_rule(Scalar other, const Tensor& self) {
+Tensor pow_scalar_Tensor_batching_rule(const Scalar& other, const Tensor& self) {
   auto* self_batched = unsafeGetBatchedImpl(self);
   auto output_physical = at::pow(other, self_batched->value());
   auto old_bdims = self_batched->bdims();
@@ -1120,36 +1120,36 @@ TORCH_LIBRARY_IMPL(aten, Batched, m) {
 #undef TO_BATCHING_RULE
   m.impl("clone", clone_batching_rule);
 
-  using TensorTensorScalarType = Tensor (*)(const Tensor&, const Tensor&, Scalar);
+  using TensorTensorScalarType = Tensor (*)(const Tensor&, const Tensor&, const Scalar&);
   using TensorTensorType = Tensor (*)(const Tensor&, const Tensor&);
-  using TensorScalarType = Tensor (*)(const Tensor&, Scalar);
+  using TensorScalarType = Tensor (*)(const Tensor&, const Scalar&);
 
 #define BINARY_POINTWISE(op) \
   m.impl(#op".Tensor", binary_pointwise_batching_rule<TensorTensorType, at::op>); \
-  m.impl(#op".Scalar", unwrap_and_call<TensorScalarType, at::op, Scalar>);
+  m.impl(#op".Scalar", unwrap_and_call<TensorScalarType, at::op, const Scalar&>);
 #define BINARY_POINTWISE_VA(op, ...) \
   { \
     using Binop = Tensor (*)(const Tensor&, const Tensor&, __VA_ARGS__); \
-    using Unop = Tensor (*)(const Tensor&, Scalar, __VA_ARGS__); \
+    using Unop = Tensor (*)(const Tensor&, const Scalar&, __VA_ARGS__); \
     m.impl(#op".Tensor", binary_pointwise_batching_rule<Binop, at::op, __VA_ARGS__>); \
-    m.impl(#op".Scalar", unwrap_and_call<Unop, at::op, Scalar, __VA_ARGS__>); \
+    m.impl(#op".Scalar", unwrap_and_call<Unop, at::op, const Scalar&, __VA_ARGS__>); \
   }
 
-  BINARY_POINTWISE_VA(add, Scalar);
-  BINARY_POINTWISE_VA(sub, Scalar);
-  BINARY_POINTWISE_VA(rsub, Scalar);
+  BINARY_POINTWISE_VA(add, const Scalar&);
+  BINARY_POINTWISE_VA(sub, const Scalar&);
+  BINARY_POINTWISE_VA(rsub, const Scalar&);
   BINARY_POINTWISE(mul);
   BINARY_POINTWISE(div);
   {
     using Binop = Tensor (*)(const Tensor&, const Tensor&, std::string);
-    using Unop = Tensor (*)(const Tensor&, Scalar, std::string);
+    using Unop = Tensor (*)(const Tensor&, const Scalar&, std::string);
     m.impl("div.Tensor_mode", binary_pointwise_batching_rule<Binop, at::div, std::string>);
-    m.impl("div.Scalar_mode", unwrap_and_call<Unop, at::div, Scalar, std::string>);
+    m.impl("div.Scalar_mode", unwrap_and_call<Unop, at::div, const Scalar&, std::string>);
   }
 
   // at::pow has three out-of-place overloads
   m.impl("pow.Tensor_Tensor", binary_pointwise_batching_rule<TensorTensorType, at::pow>);
-  m.impl("pow.Tensor_Scalar", unwrap_and_call<TensorScalarType, at::pow, Scalar>);
+  m.impl("pow.Tensor_Scalar", unwrap_and_call<TensorScalarType, at::pow, const Scalar&>);
   m.impl("pow.Scalar", pow_scalar_Tensor_batching_rule);
 
   m.impl("sigmoid_backward", binary_pointwise_batching_rule<TensorTensorType, at::sigmoid_backward>);
@@ -1158,15 +1158,15 @@ TORCH_LIBRARY_IMPL(aten, Batched, m) {
       binary_pointwise_batching_rule<
           TensorTensorScalarType,
           at::threshold_backward,
-          Scalar>);
+          const Scalar&>);
 
   // for at::result_type, call the native::result_type implementation.
   // We don't have to do anything special because native::result_type operates
   // on the logical shape of the tensors.
   m.impl("result_type.Tensor", static_cast<ScalarType (*)(const Tensor&, const Tensor&)>(native::result_type));
-  m.impl("result_type.Scalar", static_cast<ScalarType (*)(const Tensor&, Scalar)>(native::result_type));
-  m.impl("result_type.Scalar_Tensor", static_cast<ScalarType (*)(Scalar, const Tensor&)>(native::result_type));
-  m.impl("result_type.Scalar_Scalar", static_cast<ScalarType (*)(Scalar, Scalar)>(native::result_type));
+  m.impl("result_type.Scalar", static_cast<ScalarType (*)(const Tensor&, const Scalar&)>(native::result_type));
+  m.impl("result_type.Scalar_Tensor", static_cast<ScalarType (*)(const Scalar&, const Tensor&)>(native::result_type));
+  m.impl("result_type.Scalar_Scalar", static_cast<ScalarType (*)(const Scalar&, const Scalar&)>(native::result_type));
 
 #undef BINARY_POINTWISE_VA
 #undef BINARY_POINTWISE
@@ -1207,7 +1207,7 @@ TORCH_LIBRARY_IMPL(aten, Batched, m) {
   // Comparison ops
 #define COMPARISON_POINTWISE(op) \
   m.impl(#op".Tensor", comparison_pointwise_batching_rule<TensorTensorType, at::op>); \
-  m.impl(#op".Scalar", unwrap_and_call<TensorScalarType, at::op, Scalar>);
+  m.impl(#op".Scalar", unwrap_and_call<TensorScalarType, at::op, const Scalar&>);
 
   COMPARISON_POINTWISE(eq);
   COMPARISON_POINTWISE(gt);
```

aten/src/ATen/LegacyTHFunctionsCPU.cpp (+5, -5)

```diff
@@ -442,7 +442,7 @@ Tensor _th_std(const Tensor & self, bool unbiased) {
             AT_ERROR("_th_std not supported on CPUType for ", dispatch_scalar_type);
     }
 }
-Tensor & _th_renorm_out(Tensor & result, const Tensor & self, Scalar p, int64_t dim, Scalar maxnorm) {
+Tensor & _th_renorm_out(Tensor & result, const Tensor & self, const Scalar& p, int64_t dim, const Scalar& maxnorm) {
     // DeviceGuard omitted
     auto dispatch_scalar_type = infer_scalar_type(self);
 
@@ -468,7 +468,7 @@ Tensor & _th_renorm_out(Tensor & result, const Tensor & self, Scalar p, int64_t
     }
     return result;
 }
-Tensor _th_renorm(const Tensor & self, Scalar p, int64_t dim, Scalar maxnorm) {
+Tensor _th_renorm(const Tensor & self, const Scalar& p, int64_t dim, const Scalar& maxnorm) {
     // DeviceGuard omitted
     auto dispatch_scalar_type = infer_scalar_type(self);
     auto result_ = c10::make_intrusive<TensorImpl, UndefinedTensorImpl>(c10::Storage(c10::Storage::use_byte_size_t(), 0, allocator(), true),DispatchKey::CPU, scalarTypeToTypeMeta(dispatch_scalar_type)).release();
@@ -493,7 +493,7 @@ Tensor _th_renorm(const Tensor & self, Scalar p, int64_t dim, Scalar maxnorm) {
     }
     return result;
 }
-Tensor & _th_renorm_(Tensor & self, Scalar p, int64_t dim, Scalar maxnorm) {
+Tensor & _th_renorm_(Tensor & self, const Scalar& p, int64_t dim, const Scalar& maxnorm) {
     // DeviceGuard omitted
     auto dispatch_scalar_type = infer_scalar_type(self);
 
@@ -517,7 +517,7 @@ Tensor & _th_renorm_(Tensor & self, Scalar p, int64_t dim, Scalar maxnorm) {
     }
     return self;
 }
-Tensor & _th_histc_out(Tensor & result, const Tensor & self, int64_t bins, Scalar min, Scalar max) {
+Tensor & _th_histc_out(Tensor & result, const Tensor & self, int64_t bins, const Scalar& min, const Scalar& max) {
     // DeviceGuard omitted
     auto dispatch_scalar_type = infer_scalar_type(self);
 
@@ -543,7 +543,7 @@ Tensor & _th_histc_out(Tensor & result, const Tensor & self, int64_t bins, Scala
     }
     return result;
 }
-Tensor _th_histc(const Tensor & self, int64_t bins, Scalar min, Scalar max) {
+Tensor _th_histc(const Tensor & self, int64_t bins, const Scalar& min, const Scalar& max) {
    // DeviceGuard omitted
    auto dispatch_scalar_type = infer_scalar_type(self);
    auto result_ = c10::make_intrusive<TensorImpl, UndefinedTensorImpl>(c10::Storage(c10::Storage::use_byte_size_t(), 0, allocator(), true),DispatchKey::CPU, scalarTypeToTypeMeta(dispatch_scalar_type)).release();
```

aten/src/ATen/LegacyTHFunctionsCPU.h (+5, -5)

```diff
@@ -30,11 +30,11 @@ std::tuple<Tensor &,Tensor &> _th_mode_out(Tensor & values, Tensor & indices, co
 std::tuple<Tensor,Tensor> _th_mode(const Tensor & self, int64_t dim, bool keepdim);
 Tensor _th_var(const Tensor & self, bool unbiased);
 Tensor _th_std(const Tensor & self, bool unbiased);
-Tensor & _th_renorm_out(Tensor & result, const Tensor & self, Scalar p, int64_t dim, Scalar maxnorm);
-Tensor _th_renorm(const Tensor & self, Scalar p, int64_t dim, Scalar maxnorm);
-Tensor & _th_renorm_(Tensor & self, Scalar p, int64_t dim, Scalar maxnorm);
-Tensor & _th_histc_out(Tensor & result, const Tensor & self, int64_t bins, Scalar min, Scalar max);
-Tensor _th_histc(const Tensor & self, int64_t bins, Scalar min, Scalar max);
+Tensor & _th_renorm_out(Tensor & result, const Tensor & self, const Scalar& p, int64_t dim, const Scalar& maxnorm);
+Tensor _th_renorm(const Tensor & self, const Scalar& p, int64_t dim, const Scalar& maxnorm);
+Tensor & _th_renorm_(Tensor & self, const Scalar& p, int64_t dim, const Scalar& maxnorm);
+Tensor & _th_histc_out(Tensor & result, const Tensor & self, int64_t bins, const Scalar& min, const Scalar& max);
+Tensor _th_histc(const Tensor & self, int64_t bins, const Scalar& min, const Scalar& max);
 std::tuple<Tensor &,Tensor &> _th_gels_out(Tensor & res1, Tensor & res2, const Tensor & self, const Tensor & A);
 std::tuple<Tensor,Tensor> _th_gels(const Tensor & self, const Tensor & A);
 std::tuple<Tensor &,Tensor &> _th_geqrf_out(Tensor & res1, Tensor & res2, const Tensor & self);
```

aten/src/ATen/LegacyTHFunctionsCUDA.h (+13, -13)

```diff
@@ -18,8 +18,8 @@ namespace native {
 namespace legacy {
 namespace cuda {
 
-Tensor & _th_masked_fill_(Tensor & self, const Tensor & mask, Scalar value);
-Tensor & _th_masked_fill_bool_(Tensor & self, const Tensor & mask, Scalar value);
+Tensor & _th_masked_fill_(Tensor & self, const Tensor & mask, const Scalar& value);
+Tensor & _th_masked_fill_bool_(Tensor & self, const Tensor & mask, const Scalar& value);
 Tensor & _th_index_copy_(Tensor & self, int64_t dim, const Tensor & index, const Tensor & source);
 Tensor & _th_take_out(Tensor & result, const Tensor & self, const Tensor & index);
 Tensor _th_take(const Tensor & self, const Tensor & index);
@@ -32,9 +32,9 @@ std::tuple<Tensor &,Tensor &> _th_sort_out_stable(Tensor & values, Tensor & indi
 std::tuple<Tensor,Tensor> _th_sort_stable(const Tensor & self, c10::optional<bool> stable, int64_t dim, bool descending);
 std::tuple<Tensor &,Tensor &> _th_topk_out(Tensor & values, Tensor & indices, const Tensor & self, int64_t k, int64_t dim, bool largest, bool sorted);
 std::tuple<Tensor,Tensor> _th_topk(const Tensor & self, int64_t k, int64_t dim, bool largest, bool sorted);
-Tensor & _th_renorm_out(Tensor & result, const Tensor & self, Scalar p, int64_t dim, Scalar maxnorm);
-Tensor _th_renorm(const Tensor & self, Scalar p, int64_t dim, Scalar maxnorm);
-Tensor & _th_renorm_(Tensor & self, Scalar p, int64_t dim, Scalar maxnorm);
+Tensor & _th_renorm_out(Tensor & result, const Tensor & self, const Scalar& p, int64_t dim, const Scalar& maxnorm);
+Tensor _th_renorm(const Tensor & self, const Scalar& p, int64_t dim, const Scalar& maxnorm);
+Tensor & _th_renorm_(Tensor & self, const Scalar& p, int64_t dim, const Scalar& maxnorm);
 Tensor & _th_cross_kernel_out(Tensor & result, const Tensor & self, const Tensor & other, int64_t dim);
 Tensor _th_cross_kernel(const Tensor & self, const Tensor & other, int64_t dim);
 std::tuple<Tensor &,Tensor &> _th_gels_out(Tensor & res1, Tensor & res2, const Tensor & self, const Tensor & A);
@@ -44,10 +44,10 @@ Tensor _th_potri(const Tensor & self, bool upper);
 std::tuple<Tensor &,Tensor &> _th_geqrf_out(Tensor & res1, Tensor & res2, const Tensor & self);
 std::tuple<Tensor,Tensor> _th_geqrf(const Tensor & self);
 Tensor & _th_copy_ignoring_overlaps_(Tensor & self, const Tensor & src);
-Tensor & _thnn_multi_margin_loss_forward_out(Tensor & output, const Tensor & self, const Tensor & target, Scalar p, Scalar margin, const Tensor & weight, int64_t reduction);
-Tensor _thnn_multi_margin_loss_forward(const Tensor & self, const Tensor & target, Scalar p, Scalar margin, const Tensor & weight, int64_t reduction);
-Tensor & _thnn_multi_margin_loss_backward_out(Tensor & grad_input, const Tensor & grad_output, const Tensor & self, const Tensor & target, Scalar p, Scalar margin, const Tensor & weight, int64_t reduction);
-Tensor _thnn_multi_margin_loss_backward(const Tensor & grad_output, const Tensor & self, const Tensor & target, Scalar p, Scalar margin, const Tensor & weight, int64_t reduction);
+Tensor & _thnn_multi_margin_loss_forward_out(Tensor & output, const Tensor & self, const Tensor & target, const Scalar& p, const Scalar& margin, const Tensor & weight, int64_t reduction);
+Tensor _thnn_multi_margin_loss_forward(const Tensor & self, const Tensor & target, const Scalar& p, const Scalar& margin, const Tensor & weight, int64_t reduction);
+Tensor & _thnn_multi_margin_loss_backward_out(Tensor & grad_input, const Tensor & grad_output, const Tensor & self, const Tensor & target, const Scalar& p, const Scalar& margin, const Tensor & weight, int64_t reduction);
+Tensor _thnn_multi_margin_loss_backward(const Tensor & grad_output, const Tensor & self, const Tensor & target, const Scalar& p, const Scalar& margin, const Tensor & weight, int64_t reduction);
 std::tuple<Tensor &,Tensor &> _thnn_multilabel_margin_loss_forward_out(Tensor & output, Tensor & is_target, const Tensor & self, const Tensor & target, int64_t reduction);
 std::tuple<Tensor,Tensor> _thnn_multilabel_margin_loss_forward(const Tensor & self, const Tensor & target, int64_t reduction);
 Tensor & _thnn_multilabel_margin_loss_backward_out(Tensor & grad_input, const Tensor & grad_output, const Tensor & self, const Tensor & target, int64_t reduction, const Tensor & is_target);
@@ -68,10 +68,10 @@ std::tuple<Tensor &,Tensor &> _thnn_log_sigmoid_forward_out(Tensor & output, Ten
 std::tuple<Tensor,Tensor> _thnn_log_sigmoid_forward(const Tensor & self);
 Tensor & _thnn_log_sigmoid_backward_out(Tensor & grad_input, const Tensor & grad_output, const Tensor & self, const Tensor & buffer);
 Tensor _thnn_log_sigmoid_backward(const Tensor & grad_output, const Tensor & self, const Tensor & buffer);
-Tensor & _thnn_rrelu_with_noise_forward_out(Tensor & output, const Tensor & self, const Tensor & noise, Scalar lower, Scalar upper, bool training, c10::optional<at::Generator> generator);
-Tensor _thnn_rrelu_with_noise_forward(const Tensor & self, const Tensor & noise, Scalar lower, Scalar upper, bool training, c10::optional<at::Generator> generator);
-Tensor _thnn_rrelu_with_noise_backward(const Tensor & grad_output, const Tensor & self, const Tensor & noise, Scalar lower, Scalar upper, bool training);
-Tensor & _thnn_rrelu_with_noise_forward_(Tensor & self, const Tensor & noise, Scalar lower, Scalar upper, bool training, c10::optional<at::Generator> generator);
+Tensor & _thnn_rrelu_with_noise_forward_out(Tensor & output, const Tensor & self, const Tensor & noise, const Scalar& lower, const Scalar& upper, bool training, c10::optional<at::Generator> generator);
+Tensor _thnn_rrelu_with_noise_forward(const Tensor & self, const Tensor & noise, const Scalar& lower, const Scalar& upper, bool training, c10::optional<at::Generator> generator);
+Tensor _thnn_rrelu_with_noise_backward(const Tensor & grad_output, const Tensor & self, const Tensor & noise, const Scalar& lower, const Scalar& upper, bool training);
+Tensor & _thnn_rrelu_with_noise_forward_(Tensor & self, const Tensor & noise, const Scalar& lower, const Scalar& upper, bool training, c10::optional<at::Generator> generator);
 std::tuple<Tensor &,Tensor &,Tensor &> _thnn_conv2d_forward_out(Tensor & output, Tensor & columns, Tensor & ones, const Tensor & self, const Tensor & weight, IntArrayRef kernel_size, const Tensor & bias, IntArrayRef stride, IntArrayRef padding);
 std::tuple<Tensor,Tensor,Tensor> _thnn_conv2d_forward(const Tensor & self, const Tensor & weight, IntArrayRef kernel_size, const Tensor & bias, IntArrayRef stride, IntArrayRef padding);
 std::tuple<Tensor &,Tensor &,Tensor &> _thnn_conv2d_backward_out(Tensor & grad_input, Tensor & grad_weight, Tensor & grad_bias, const Tensor & grad_output, const Tensor & self, const Tensor & weight, IntArrayRef kernel_size, IntArrayRef stride, IntArrayRef padding, const Tensor & columns, const Tensor & ones);
```

aten/src/ATen/ScalarOps.cpp (+3, -3)

```diff
@@ -13,23 +13,23 @@
 namespace at {
 namespace {
 template <typename scalar_t>
-inline void fill_inplace(Tensor& self, Scalar value_scalar) {
+inline void fill_inplace(Tensor& self, const Scalar& value_scalar) {
   auto value = value_scalar.to<scalar_t>();
   scalar_t* dptr = static_cast<scalar_t*>(self.data_ptr());
   *dptr = value;
 }
 }
 
 namespace detail {
-Tensor& scalar_fill(Tensor& self, Scalar value) {
+Tensor& scalar_fill(Tensor& self, const Scalar& value) {
   AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND3(
       kHalf, kBool, kBFloat16, self.scalar_type(), "fill_out", [&]() {
         fill_inplace<scalar_t>(self, value);
       });
   return self;
 }
 
-Tensor scalar_tensor_static(Scalar s, c10::optional<ScalarType> dtype_opt, c10::optional<Device> device_opt) {
+Tensor scalar_tensor_static(const Scalar& s, c10::optional<ScalarType> dtype_opt, c10::optional<Device> device_opt) {
   at::tracer::impl::NoTracerDispatchMode tracer_guard;
   at::AutoNonVariableTypeMode non_var_type_mode(true);
   auto result = at::detail::empty_cpu({}, dtype_opt, c10::nullopt, device_opt, c10::nullopt, c10::nullopt);
```

aten/src/ATen/ScalarOps.h (+3, -3)

```diff
@@ -11,8 +11,8 @@ namespace detail {
 // Ideally this fast pass should be implemented in TensorIterator,
 // but we also want to skip compute_types which in not avoidable
 // in TensorIterator for now.
-Tensor& scalar_fill(Tensor& self, Scalar value);
-TORCH_API Tensor scalar_tensor_static(Scalar s, c10::optional<ScalarType> dtype_opt, c10::optional<Device> device_opt);
+Tensor& scalar_fill(Tensor& self, const Scalar& value);
+TORCH_API Tensor scalar_tensor_static(const Scalar& s, c10::optional<ScalarType> dtype_opt, c10::optional<Device> device_opt);
 } // namespace detail
 } // namespace at
 
@@ -21,7 +21,7 @@ namespace c10 {
 
 // FIXME: this should be (and was) Scalar::toTensor, but there is currently no way
 // to implement this without going through Derived Types (which are not part of core).
-inline at::Tensor scalar_to_tensor(Scalar s, const Device device = at::kCPU) {
+inline at::Tensor scalar_to_tensor(const Scalar& s, const Device device = at::kCPU) {
   // This is the fast track we have for CPU scalar tensors.
   if (device == at::kCPU) {
     if (s.isFloatingPoint()) {
```
