"NVIDIA CUDA"
Direct GPU/FPGA Communication Via PCI Express
For our testing, we chose an nVidia GeForce GTX 580, a high-end consumer GPU that supports the whole CUDA 4.1 API, with the sole exception of the...
High Performance Discrete Fourier Transforms on Graphics Processors
We implemented our algorithms using the NVIDIA CUDA API and compared their performance with NVIDIA’s CUFFT library and an optimized CPU-implementation...
DeepSpeed & ZeRO-2: Shattering barriers of deep learning speed & scale
Compared to the best-known result from NVIDIA that takes 47 minutes using 1,472 V100 GPUs, DeepSpeed is faster while using 30% less resources. While using...
Microsoft Translator enhanced with Z-code Mixture of Experts models
We partnered with NVIDIA to optimize faster engines that can be used at runtime to deploy the new Z-code/MoE models on GPUs. NVIDIA developed custom CUDA...
Direct GPU/FPGA Communication Via PCI Express
Test procedure: For our testing, we chose an nVidia GeForce GTX 580, a high-end consumer GPU that supports the CUDA 4.1 API (with the exception of...
Esri helps cities gain key climate science insights using geospatial ...
Esri’s Reality Engine, the technology behind reality mapping, uses NVIDIA CUDA®-enabled GPUs to accelerate processing. CUDA provides a development...
Runtime Performance Prediction for Deep Learning Models with Graph ...
DL frameworks provide a hybrid programming paradigm: developers invoke high-level interfaces only to construct DL models, while low-level computational...
DeepSpeed: Accelerating large-scale model inference and training via ...
Inference-optimized CUDA kernels boost per-GPU efficiency by fully utilizing the GPU resources through deep fusion and novel kernel scheduling. Effective...
AI Inference Task Migration from CPU to GPU: Methodology Overview
This repository ties together the entire methodology with a minimalistic example: first identify computational hotspots on the CPU, then rewrite loops...
FAST COMPUTATION OF GENERAL FOURIER TRANSFORMS ON GPUS
using both an NVIDIA GeForce 8800 GTX and an ATI XT1900 on a PC with an Intel Core 2 Duo E6600 CPU clocked at 2.66 GHz. We compared the performance of both...
Estimating GPU Memory Consumption of Deep Learning Models
It is hard to analyze the GPU memory usage of low-level framework operators (e.g., Conv2d), since they are usually implemented with proprietary NVIDIA...
Graviton: Trusted Execution Environments on GPUs
We review the NVIDIA GPU architecture and the CUDA programming model to illustrate how a compute task is offloaded and executed on the GPU. We focus on the...
gpusurface-v3 - microsoft.com
Our parallel surface reconstruction algorithm is implemented using NVIDIA’s CUDA [NVIDIA 2007]. In addition to providing a general-purpose C language...
Accelerating Deep Convolutional Neural Networks Using Specialized Hardware
Table 1 shows the throughput of image classification (forward propagation only) using well-known models such as CIFAR-10 based on cuda-convnet [4], and...
PilotFish: Harvesting Free Cycles of Cloud Gaming with Deep Learning ...
We use the Nvidia RTX 2060 GPU, of which the computing ability is 6.4 teraflops, as the experimental platform, which has comparable performance to the Xbox...
Deploying Azure ND H100 v5 Instances in AKS with NVIDIA MIG GPU Slicing
Azure’s ND H100 v5 VM series offers powerful NVIDIA H100 GPUs for high-end AI training and HPC workloads. This guide walks through creating an AKS cluster...
DeepSpeed: Extreme-scale model training for everyone
Using a machine with a single NVIDIA V100 GPU, our users can run models of up to 13 billion parameters without running out of memory, 10x bigger than the...
Tutel: An efficient mixture-of-experts implementation for large DNN ...
Tutel also implements a fast cumsum-minus-one operator, achieving a 24x speedup compared with the fairseq implementation. Tutel also leverages NVRTC, a...
Supercharge Your Deep Learning Workflows with NVIDIA Nsight Systems
What is NVIDIA Nsight Systems? NVIDIA Nsight Systems is a performance analysis tool designed to help you analyze and optimize your applications' behavior on...
Cryptojacking: Understanding and defending against cloud compute ...
When comparing NVIDIA GPU performance for cryptomining, the number of Compute Unified Device Architecture (CUDA) cores can be used as a rough representation...