Simplify copy kernel #28428

Closed
wants to merge 9 commits

Conversation

@zasdfgbnm (Collaborator) commented Oct 22, 2019

Stack from ghstack:

Using the new type promotion and dynamic casting added to
`TensorIterator`, the copy kernels can be greatly simplified.
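
As a rough illustration (an assumed shape, not the exact code in this diff), the device copy kernel essentially reduces to one dtype dispatch around an identity lambda, with `TensorIterator` casting loads and stores on the fly:

```cpp
// Sketch only: assumes ATen's AT_DISPATCH_ALL_TYPES_AND3 macro and the
// gpu_kernel helper for TensorIterator-based loops, and ignores the
// non-dtype concerns (cross-device copies, non-blocking copies, etc.).
static void copy_kernel_cuda(TensorIterator& iter) {
  AT_DISPATCH_ALL_TYPES_AND3(kHalf, kBool, kBFloat16, iter.dtype(0), "copy_", [&] {
    // The lambda is a plain identity; TensorIterator performs the dtype
    // conversion when loading the source element and storing the result.
    gpu_kernel(iter, [] GPU_LAMBDA(scalar_t x) -> scalar_t { return x; });
  });
}
```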

Benchmark on CUDA:

```python
import torch
import timeit
import pandas
import itertools
from tqdm.notebook import tqdm
import math
print(torch.__version__)
print()

_10M = 10 * 1024 ** 2

d = {}

for from_, to in tqdm(itertools.product(torch.testing.get_all_dtypes(), repeat=2)):
    if from_ not in d:
        d[from_] = {}
    a = torch.empty(_10M, dtype=from_, device='cuda')
    min_ = math.inf
    for i in range(100):
        torch.cuda.synchronize()
        start = timeit.default_timer()
        a.to(to)
        torch.cuda.synchronize()
        end = timeit.default_timer()
        elapsed = end - start
        if elapsed < min_:
            min_ = elapsed
    d[from_][to] = int(min_ * 1000 * 1000)

pandas.DataFrame(d)
```

original:
![image](https://user-images.githubusercontent.com/1032377/67623519-e3e6dd80-f7da-11e9-86ea-9cc9f237123b.png)

new:
![image](https://user-images.githubusercontent.com/1032377/67623527-fc56f800-f7da-11e9-82bd-dc1ff9821b68.png)

Differential Revision: D18170995

zasdfgbnm added a commit that referenced this pull request Oct 22, 2019
Using the new type promotion and dynamic casting added to
`TensorIterator`, the copy kernels can be greatly simplified.

**Script:**
```python
import torch
import timeit
import pandas
import itertools
from tqdm import tqdm
import math
print(torch.__version__)
print()

_10M = 10 * 1024 ** 2

d = {}

for from_, to in tqdm(itertools.product(torch.testing.get_all_dtypes(), repeat=2)):
    if from_ not in d:
        d[from_] = {}
    a = torch.zeros(_10M, dtype=from_)
    min_ = math.inf
    for i in range(100):
        start = timeit.default_timer()
        a.to(to)
        end = timeit.default_timer()
        elapsed = end - start
        if elapsed < min_:
            min_ = elapsed
    d[from_][to] = int(min_ * 1000 * 1000)

pandas.DataFrame(d)
```

**Before:**
![image](https://user-images.githubusercontent.com/1032377/67171274-2e93d000-f36b-11e9-8fa0-91edd7dbc8ec.png)

**After:**
![image](https://user-images.githubusercontent.com/1032377/67171200-d361dd80-f36a-11e9-9b22-66292e395a09.png)

ghstack-source-id: 8754f6a
Pull Request resolved: #28428
zasdfgbnm added a commit that referenced this pull request Oct 22, 2019 (ghstack-source-id: b764aff).

zasdfgbnm added a commit that referenced this pull request Oct 23, 2019 (ghstack-source-id: a8356a7).

zasdfgbnm added a commit that referenced this pull request Oct 24, 2019 (ghstack-source-id: 9ebae7b).

zasdfgbnm added a commit that referenced this pull request Oct 25, 2019 (ghstack-source-id: 54b21a3).

@zasdfgbnm (Collaborator, Author) commented:

This PR is actually a good test for non-copying type promotion: if someone breaks type promotion again by inserting a `.to()` call somewhere, this version of the copy kernel will end up in an infinite loop, so the problem would be detected easily by CI.
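
To make that failure mode concrete, here is a purely hypothetical sketch (names are approximate, not the actual ATen code) of why a `.to()` call on the promotion path would recurse back into the copy:

```cpp
// Hypothetical sketch only: copy_ is now built directly on TensorIterator's
// dynamic casting, and Tensor::to() is itself implemented via copy_.
Tensor& copy_(Tensor& dst, const Tensor& src) {
  auto iter = TensorIterator();
  iter.add_output(dst);
  iter.add_input(src);
  iter.build();  // type promotion must only record dtypes, never allocate
  copy_stub(iter.device_type(), iter, /*non_blocking=*/false);
  return dst;
}

// If the promotion path were ever "fixed" by materializing the cast, e.g.
//   Tensor tmp = src.to(dst.scalar_type());  // Tensor::to() dispatches to copy_
// then copy_ -> to() -> copy_ -> ... would never terminate, and the hang
// would show up immediately in CI.
```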

zasdfgbnm added a commit that referenced this pull request Oct 26, 2019 (ghstack-source-id: 3fb1c46).

zasdfgbnm added a commit that referenced this pull request Oct 26, 2019 (ghstack-source-id: 1269ecc).

@zasdfgbnm (Collaborator, Author) commented on this diff hunk:

```cpp
// This is intentionally done after build() because copy has a "promotion"
// rule that always "promotes" to the target dtype.
iter.promote_common_dtype();
AT_DISPATCH_ALL_TYPES_AND3(kHalf, kBool, kBFloat16, iter.dtype(0), "copy_", [&] {
```
I see a PR trying to enable BFloat16 for CUDA:
https://github.com/pytorch/pytorch/pull/27259/files#diff-6684cb81a1865b7d52d9f2f1789cd0ceR68-R70
so I include BFloat16 in this PR as well, so that I can benchmark it.

@zasdfgbnm (Collaborator, Author) commented:

There is a small regression on GPU, so I am not sure whether this should be merged.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Oct 28, 2019
Summary:
Pull Request resolved: pytorch/pytorch#28428

Using the new type promotion and dynamic casting added to
`TensorIterator`, the copy kernels can be greatly simplified.
(The benchmark script and before/after results are the same as in the PR description above.)

Test Plan: Imported from OSS

Differential Revision: D18170995

Pulled By: ezyang

fbshipit-source-id: 461b53641813dc6cfa872a094ae917e750c60759
@facebook-github-bot (Contributor) commented:

@ezyang merged this pull request in 5c5b2c6.

zasdfgbnm deleted the gh/zasdfgbnm/13/head branch October 29, 2019 04:13
@vishwakftw (Contributor) commented:

Did this break the Windows build?

@zasdfgbnm (Collaborator, Author) commented:

@vishwakftw It seems we need to add `--expt-extended-lambda` somewhere?

@ezyang (Contributor) commented Oct 29, 2019:

I'm unlanding the stack.

zasdfgbnm restored the gh/zasdfgbnm/13/head branch October 29, 2019 17:46
zasdfgbnm reopened this Oct 29, 2019
zasdfgbnm closed this Oct 29, 2019
facebook-github-bot deleted the gh/zasdfgbnm/13/head branch November 2, 2019 14:17