Commit a2c69c7

Loop unroll sample (oneapi-src#306)
* Update simple add sample
  Signed-off-by: Maria, Moushumi <[email protected]>
* Update make files
  Signed-off-by: Maria, Moushumi <[email protected]>
* Update fpga make file
  Signed-off-by: Maria, Moushumi <[email protected]>
* Add dpc_common.hpp
* Update sample.json
* Fix Makefile.win
* Update Makefile.win
* Update sample.json
* Remove dpc_common.hpp
* Update VS project file
* Update README.md
* Update sample.json
* Add stb
* Update read me file
* Initial commit
* Update License.txt
* Change location of matrix multiplication sample
* Fix matrix mul sample VS project file
* Update samples for beta10 release
* Fix for Windows
* Fix for FPGA
* Fix for FPGA
* Fix for FPGA to support both beta09 and beta10
* Add header comment
* Samples: block apsp and merge spmv
* Add readme files
* Update readme file
* Update sample.json
* Update sample.json
* Update samples
* Update sample.json
* Loop unroll sample
* Add readme file

Co-authored-by: JoeOster <[email protected]>
1 parent ece7702 commit a2c69c7

8 files changed: +412 -2 lines changed

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
1+
cmake_minimum_required (VERSION 3.5)
2+
set(CMAKE_CXX_COMPILER dpcpp)
3+
4+
project (loop-unroll)
5+
6+
# Set default build type to RelWithDebInfo if not specified
7+
if (NOT CMAKE_BUILD_TYPE)
8+
message (STATUS "CMAKE_BUILD_TYPE not set; defaulting to RelWithDebInfo")
9+
set (CMAKE_BUILD_TYPE "RelWithDebInfo" CACHE
10+
STRING "Choose the type of build, options are: None Debug Release RelWithDebInfo MinSizeRel"
11+
FORCE)
12+
endif()
13+
14+
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -O3 -std=c++17")
15+
16+
add_executable(loop-unroll src/loop-unroll.cpp)
17+
18+
if(WIN32)
19+
add_custom_target(run loop-unroll.exe)
20+
else()
21+
add_custom_target(run ./loop-unroll)
22+
endif()
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
1+
Copyright Intel Corporation
2+
3+
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
4+
5+
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
6+
7+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Lines changed: 111 additions & 0 deletions
@@ -0,0 +1,111 @@
1+
2+
# Unrolling Loops
3+
The Loop Unroll sample demonstrates how unrolling loops can improve the throughput of a DPC++ program for GPU offload.
4+
5+
For comprehensive instructions regarding DPC++ Programming, go to https://software.intel.com/en-us/oneapi-programming-guide and search based on relevant terms noted in the comments.
6+
7+
| Optimized for | Description
8+
|:--- |:---
9+
| OS | Linux* Ubuntu* 18.04
10+
| Hardware | Skylake with GEN9 or newer
11+
| Software | Intel® oneAPI DPC++ Compiler (beta)
12+
| What you will learn | How to unroll loops to improve throughput with oneAPI on CPU and GPU
13+
| Time to complete | 30 min
14+
15+
16+
## Purpose
17+
18+
Loop unrolling increases program parallelism by duplicating the compute logic within a loop. The number of times the loop logic is duplicated is called the *unroll factor*. When the unroll factor equals the number of loop iterations, the loop is *fully unrolled*; when it is smaller, the loop is *partially unrolled*.
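
As a rough illustration of the unroll factor, the sketch below performs an element-wise vector add in plain host C++. It is not the sample's DPC++ kernel, which this README does not show. The helper name `add_unrolled` and its `Factor` template parameter are hypothetical. The outer loop advances `Factor` elements per iteration (a partial unroll), and the inner loop has a constant trip count that a compiler can unroll fully.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: computes c = a + b, processing Factor elements per
// outer-loop iteration. Factor plays the role of the unroll factor.
template <std::size_t Factor>
std::vector<float> add_unrolled(const std::vector<float>& a,
                                const std::vector<float>& b) {
  static_assert(Factor > 0, "unroll factor must be positive");
  assert(a.size() == b.size());

  const std::size_t n = a.size();
  std::vector<float> c(n);

  // Largest multiple of Factor that does not exceed n.
  const std::size_t main_end = n - (n % Factor);

  std::size_t i = 0;
  for (; i < main_end; i += Factor) {
    // The trip count is a compile-time constant, so the compiler is free
    // to replace this inner loop with Factor copies of the body.
    for (std::size_t j = 0; j < Factor; ++j) {
      c[i + j] = a[i + j] + b[i + j];
    }
  }

  // Remainder loop: handles the tail when n is not a multiple of Factor.
  for (; i < n; ++i) {
    c[i] = a[i] + b[i];
  }
  return c;
}
```

Calling `add_unrolled<1>(a, b)` gives the rolled baseline. Larger factors do the same work in fewer outer iterations, and a factor equal to the array length corresponds to a full unroll.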
19+
20+
21+
## Key Concepts
22+
* Basics of loop unrolling.
23+
* How to unroll loops in your program.
24+
* Determining the optimal unroll factor for your program.
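
One common way to find a good unroll factor is empirical: time the same work at several candidate factors and keep the fastest. The sketch below shows that idea in plain C++. The helper names `time_ms` and `fastest_factor` are hypothetical and are not taken from the sample's source.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical helper: wall-clock duration of one call, in milliseconds.
double time_ms(const std::function<void()>& work) {
  const auto start = std::chrono::steady_clock::now();
  work();
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Hypothetical helper: given (unroll factor, measured milliseconds) pairs,
// returns the factor with the smallest measured time.
int fastest_factor(const std::vector<std::pair<int, double>>& timings) {
  assert(!timings.empty());
  std::pair<int, double> best = timings.front();
  for (const auto& t : timings) {
    if (t.second < best.second) {
      best = t;
    }
  }
  return best.first;
}
```

In practice each candidate would be timed several times and the measurements averaged, since a single run can be noisy.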
25+
26+
## License
27+
This code sample is licensed under the MIT license.
28+
29+
30+
## Building the `loop-unroll` Tutorial
31+
32+
> Note: if you have not already done so, set up your CLI
33+
> environment by sourcing the setvars script located in
34+
> the root of your oneAPI installation.
35+
>
36+
> Linux Sudo: . /opt/intel/oneapi/setvars.sh
37+
> Linux User: . ~/intel/oneapi/setvars.sh
38+
> Windows: C:\Program Files (x86)\Intel\oneAPI\setvars.bat
39+
40+
### Include Files
41+
The included header `dpc_common.hpp` is located at `%ONEAPI_ROOT%\dev-utilities\latest\include` on your development system.
42+
43+
### Running Samples in DevCloud
44+
If running a sample in the Intel DevCloud, remember that you must specify the compute node (CPU or GPU) as well as whether to run in batch or interactive mode. For more information, see the Intel® oneAPI Base Toolkit Get Started Guide ([https://devcloud.intel.com/oneapi/get-started/base-toolkit/](https://devcloud.intel.com/oneapi/get-started/base-toolkit/)).
45+
46+
47+
## Building the `loop-unroll` Program for CPU and GPU
48+
49+
### Running Samples In DevCloud
50+
51+
If running a sample in the Intel DevCloud remember that you must
52+
specify the compute node (CPU, GPU, FPGA) as well as whether to run in
53+
batch or interactive mode. For more information see the Intel® oneAPI
54+
Base Toolkit Get Started Guide
55+
(https://devcloud.intel.com/oneapi/get-started/base-toolkit/)
56+
57+
### On a Linux* System
58+
1. Build the program using the following `cmake` commands.
59+
60+
```
61+
$ cd loop-unroll
62+
$ mkdir build
63+
$ cd build
64+
$ cmake ..
65+
$ make
66+
```
67+
68+
2. Run the program
69+
70+
```
71+
$ make run
72+
```
73+
74+
3. Clean the program
75+
76+
```
77+
$ make clean
78+
```
79+
80+
### On a Windows* System
81+
82+
* Build the program using VS2017 or VS2019: right-click the solution
83+
file and open it in the VS2017 or VS2019 IDE. Right-click the
84+
project in Solution Explorer and select Rebuild. From the top menu,
85+
select Debug -> Start Without Debugging.
86+
87+
* Build the program using MSBuild: open "x64 Native Tools Command
88+
Prompt for VS2017" or "x64 Native Tools Command Prompt for VS2019"
89+
and run: MSBuild loop-unroll.sln /t:Rebuild
90+
/p:Configuration="Release"
91+
92+
93+
## Running the Sample
94+
95+
96+
### Example of Output
97+
```
98+
Input array size: 67108864
99+
Running on device: Intel(R) Gen9
100+
Unroll factor: 1 Kernel time: 13710.9 ms
101+
Throughput for kernel with unroll factor 1: 0.005 GFlops
102+
Unroll factor: 2 Kernel time: 8906.831 ms
103+
Throughput for kernel with unroll factor 2: 0.008 GFlops
104+
Unroll factor: 4 Kernel time: 4661.967 ms
105+
Throughput for kernel with unroll factor 4: 0.014 GFlops
106+
Unroll factor: 8 Kernel time: 2669.343 ms
107+
Throughput for kernel with unroll factor 8: 0.025 GFlops
108+
Unroll factor: 16 Kernel time: 2421.305 ms
109+
Throughput for kernel with unroll factor 16: 0.028 GFlops
110+
PASSED: The results are correct.
111+
```
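
The throughput figures above are consistent with counting one floating-point operation per input element and dividing by the kernel time. That operation count is inferred from the printed numbers, not confirmed by the sample's source. A minimal sketch of the arithmetic follows; the helper names `gflops` and `speedup` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: throughput in GFlops for `num_ops` floating-point
// operations completed in `kernel_ms` milliseconds.
double gflops(std::size_t num_ops, double kernel_ms) {
  assert(kernel_ms > 0.0);
  const double seconds = kernel_ms / 1000.0;
  return static_cast<double>(num_ops) / seconds / 1e9;
}

// Hypothetical helper: how many times faster the tuned run is than the
// baseline run.
double speedup(double baseline_ms, double tuned_ms) {
  assert(tuned_ms > 0.0);
  return baseline_ms / tuned_ms;
}
```

With the values printed above, `gflops(67108864, 13710.9)` is about 0.0049, matching the reported 0.005 GFlops for unroll factor 1, and `speedup(13710.9, 2421.305)` is about 5.7 for unroll factor 16. Going from factor 8 to factor 16 gains only about 10%, which suggests the benefit is levelling off on this device.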
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
1+
2+
Microsoft Visual Studio Solution File, Format Version 12.00
3+
# Visual Studio Version 16
4+
VisualStudioVersion = 16.0.30104.148
5+
MinimumVisualStudioVersion = 10.0.40219.1
6+
Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "loop-unroll", "loop-unroll.vcxproj", "{FF83A9FA-449F-425D-80EB-08AF092538EB}"
7+
EndProject
8+
Global
9+
GlobalSection(SolutionConfigurationPlatforms) = preSolution
10+
Debug|x64 = Debug|x64
11+
Release|x64 = Release|x64
12+
EndGlobalSection
13+
GlobalSection(ProjectConfigurationPlatforms) = postSolution
14+
{FF83A9FA-449F-425D-80EB-08AF092538EB}.Debug|x64.ActiveCfg = Debug|x64
15+
{FF83A9FA-449F-425D-80EB-08AF092538EB}.Debug|x64.Build.0 = Debug|x64
16+
{FF83A9FA-449F-425D-80EB-08AF092538EB}.Release|x64.ActiveCfg = Release|x64
17+
{FF83A9FA-449F-425D-80EB-08AF092538EB}.Release|x64.Build.0 = Release|x64
18+
EndGlobalSection
19+
GlobalSection(SolutionProperties) = preSolution
20+
HideSolutionNode = FALSE
21+
EndGlobalSection
22+
GlobalSection(ExtensibilityGlobals) = postSolution
23+
SolutionGuid = {55B4768C-0988-4CD1-8FE6-6BD999E8002A}
24+
EndGlobalSection
25+
EndGlobal
Lines changed: 87 additions & 0 deletions
@@ -0,0 +1,87 @@
1+
<?xml version="1.0" encoding="utf-8"?>
2+
<Project DefaultTargets="Build" ToolsVersion="15.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
3+
<ItemGroup Label="ProjectConfigurations">
4+
<ProjectConfiguration Include="Debug|x64">
5+
<Configuration>Debug</Configuration>
6+
<Platform>x64</Platform>
7+
</ProjectConfiguration>
8+
<ProjectConfiguration Include="Release|x64">
9+
<Configuration>Release</Configuration>
10+
<Platform>x64</Platform>
11+
</ProjectConfiguration>
12+
</ItemGroup>
13+
<PropertyGroup Label="Globals">
14+
<VCProjectVersion>15.0</VCProjectVersion>
15+
<ProjectGuid>{ff83a9fa-449f-425d-80eb-08af092538eb}</ProjectGuid>
16+
<Keyword>Win32Proj</Keyword>
17+
<RootNamespace>loop_unroll</RootNamespace>
18+
</PropertyGroup>
19+
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.Default.props" />
20+
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'" Label="Configuration">
21+
<ConfigurationType>Application</ConfigurationType>
22+
<UseDebugLibraries>true</UseDebugLibraries>
23+
<PlatformToolset>Intel(R) oneAPI DPC++ Compiler</PlatformToolset>
24+
<CharacterSet>Unicode</CharacterSet>
25+
</PropertyGroup>
26+
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'" Label="Configuration">
27+
<ConfigurationType>Application</ConfigurationType>
28+
<UseDebugLibraries>false</UseDebugLibraries>
29+
<PlatformToolset>Intel(R) oneAPI DPC++ Compiler</PlatformToolset>
30+
<WholeProgramOptimization>true</WholeProgramOptimization>
31+
<CharacterSet>Unicode</CharacterSet>
32+
</PropertyGroup>
33+
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
34+
<ImportGroup Label="ExtensionSettings">
35+
</ImportGroup>
36+
<ImportGroup Label="Shared">
37+
</ImportGroup>
38+
<ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
39+
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
40+
</ImportGroup>
41+
<ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
42+
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
43+
</ImportGroup>
44+
<PropertyGroup Label="UserMacros" />
45+
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
46+
<LinkIncremental>true</LinkIncremental>
47+
</PropertyGroup>
48+
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
49+
<LinkIncremental>false</LinkIncremental>
50+
</PropertyGroup>
51+
<ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
52+
<ClCompile>
53+
<PrecompiledHeader>
54+
</PrecompiledHeader>
55+
<PrecompiledHeaderFile>
56+
</PrecompiledHeaderFile>
57+
<SYCLWarningLevel>Level3</SYCLWarningLevel>
58+
<AdditionalIncludeDirectories>%ONEAPI_ROOT%\dev-utilities\latest\include;%(AdditionalIncludeDirectories)</AdditionalIncludeDirectories>
59+
</ClCompile>
60+
<Link>
61+
<SubSystem>Console</SubSystem>
62+
<GenerateDebugInformation>true</GenerateDebugInformation>
63+
</Link>
64+
</ItemDefinitionGroup>
65+
<ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
66+
<ClCompile>
67+
<PrecompiledHeader>
68+
</PrecompiledHeader>
69+
<PrecompiledHeaderFile>
70+
</PrecompiledHeaderFile>
71+
<SYCLWarningLevel>Level3</SYCLWarningLevel>
72+
<AdditionalIncludeDirectories>%ONEAPI_ROOT%\dev-utilities\latest\include;%(AdditionalIncludeDirectories)</AdditionalIncludeDirectories>
73+
</ClCompile>
74+
<Link>
75+
<SubSystem>Console</SubSystem>
76+
<EnableCOMDATFolding>true</EnableCOMDATFolding>
77+
<OptimizeReferences>true</OptimizeReferences>
78+
<GenerateDebugInformation>true</GenerateDebugInformation>
79+
</Link>
80+
</ItemDefinitionGroup>
81+
<ItemGroup>
82+
<ClCompile Include="src/loop-unroll.cpp" />
83+
</ItemGroup>
84+
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
85+
<ImportGroup Label="ExtensionTargets">
86+
</ImportGroup>
87+
</Project>
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
1+
{
2+
"guid": "EB08D6D9-84ED-4C64-BA0F-69D3B9C5A136",
3+
"name": "loop-unroll",
4+
"categories": [ "Toolkit/Intel® oneAPI Base Toolkit/oneAPI DPC++/C++ Compiler/CPU and GPU" ],
5+
"description": "Demonstrates the use of loop unrolling as a simple optimization technique to speed up compute and increase memory access throughput.",
6+
"toolchain": [ "dpcpp" ],
7+
"targetDevice": [ "CPU", "GPU" ],
8+
"languages": [ { "cpp": {} } ],
9+
"os": [ "linux", "windows" ],
10+
"builder": [ "ide", "cmake" ],
11+
"ciTests": {
12+
"linux": [{
13+
"steps": [
14+
"mkdir build",
15+
"cd build",
16+
"cmake ..",
17+
"make",
18+
"make run"
19+
]
20+
}],
21+
"windows": [{
22+
"steps": [
23+
"MSBuild loop-unroll.sln /t:Rebuild /p:Configuration=\"Release\"",
24+
"cd x64/Release",
25+
"loop-unroll.exe"
26+
]
27+
}]
28+
}
29+
}
