
Intel® Virtual RAID on CPU (Intel® VROC)
Detailed Comparison to RAID HBA
Notices & Disclaimers

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.

Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. See backup for configuration details. No product or component can be absolutely secure.

Your costs and results may vary.

Intel technologies may require enabled hardware, software or service activation.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the
property of others.



Purpose:
Broad categorical comparison of Intel VROC (Integrated RAID) vs HW RAID HBAs
on features, performance, latency, CPU% and power usage.

Agenda:
1. Architecture and Feature Comparison
2. Key findings
3. Intel® Optane™ SSD Comparisons
4. Test Configuration Details
5. Pass-thru Mode (No RAID) Comparison
6. RAID0/1/5/10 Performance Results
7. Detailed RAID0/5 Review (Latency, CPU%, Power)



Architecture and Feature Comparison



Intel® VROC vs RAID HBA

Legacy RAID architecture (RAID HBA):
• Product: MegaRAID 9560-16i
• Category: HW RAID
• PCIe Generation: Gen. 4
• Storage Uplink: x8 PCIe lanes (potential PCIe uplink bottleneck)
• # Drives: 4 SSDs

Intel VROC (Intel® Xeon® Scalable Processor):
• Product: Intel VROC
• Category: Integrated RAID
• PCIe Generation: Gen. 4
• Storage Uplink: x4 PCIe lanes per SSD
• # Drives: 4 SSDs

Intel® VROC onboards RAID HBA functionality onto Intel® Xeon® CPUs1
1-Intel VROC and Intel VMD are available on all generations (Gen. 1, 2 and 3) and SKUs (Bronze, Silver, Gold, and Platinum) of Intel Xeon Scalable Processor
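As a rough, back-of-the-envelope illustration of the uplink bottleneck called out above (assuming roughly 2 GB/s of usable bandwidth per PCIe Gen4 lane; the exact figure depends on encoding and protocol overhead), the shared x8 uplink can carry about half of what four directly attached x4 Gen4 SSDs can demand:

    # Illustrative only: approximate aggregate PCIe bandwidth available to 4 NVMe SSDs
    GBPS_PER_GEN4_LANE = 2.0                    # rough rule of thumb, GB/s per lane
    hba_uplink = 8 * GBPS_PER_GEN4_LANE         # one x8 uplink shared by all drives -> ~16 GB/s
    vroc_direct = 4 * 4 * GBPS_PER_GEN4_LANE    # 4 drives x 4 dedicated lanes each  -> ~32 GB/s
    print(f"HBA shared uplink: ~{hba_uplink:.0f} GB/s, VROC direct attach: ~{vroc_direct:.0f} GB/s")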
Intel® VROC vs RAID HBA

Major RAID Features (HW RAID / Intel VROC), with Intel® VROC comments:
• Error Handling/Isolation: √ / √. Both architectures isolate SSD error/event handling to reduce OS crashes/reboots.
• Reliable data storage: √ / √. Enterprise data protection, even when power loss occurs.
• Boot support: √ / √. Redundant system volume = less down-time/crashes.
• In-band Management Tools: √ / √. Various UEFI, GUI, and CLI utilities for each.
• Out-of-band RAID Config.: √ / X. Intel VROC has OOB on the roadmap for upcoming releases.
• Full NVMe SSD x4 Bandwidth: X / √. Intel VROC + Intel VMD allows full x4 access to SSDs, no HW uplink.
• RAID Processing Location: On HBA / On Intel® Xeon®. Uses the powerful Intel® Xeon® CPU to RAID the fast NVMe* SSDs; better scaling for heavy workloads (see Detailed CPU Review).
• Supported RAID Levels: 0/1/5/6/10/50/60 / 0/1/5/10. RAID 6/50/60 not needed for the performance/AFR of NVMe SSDs.
• Write Back Cache: DRAM + BBU / Integrated caching + Intel® Optane™ SSD. Replace the DRAM WB cache + BBU with persistent Intel® Optane™ media.
• SED Key Management: On HBA / Platform integrated. Intel VROC uses platform protocols and a remote KMS to manage keys.
• Idle Power1: 577W / 562W. Tested 15W reduction in idle power usage with Intel VROC.

See backup for configuration details. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
Key Findings



Summary (Highlights)1,2
1. Intel VROC has compelling features to replace a RAID HBA, plus a roadmap to fill any gaps (OOB).
2. Intel VROC is the only RAID solution that scales with the Intel Optane SSD solution to deliver extraordinary performance (over 5.6M IOPS!).
3. Intel VROC performance for all RAID levels is equal to or better than the RAID HBA (↑ Performance, ↓ Latency).
4. Intel VROC can improve resource utilization by removing the HBA and its related choke points (↓ CPU Usage, ↓ Power).
5. Intel VROC has a scalable, integrated design that is better suited to NVMe SSDs (↑ IOPS/CPU Core, ↑ IOPS/W).
See backup for configuration details. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.



Test Configuration Details



Test Configuration Details (Optane)
Drives under test: 4 x 400GB Intel Optane P5800X SSDs, attached to either the legacy RAID HBA or the Intel® Xeon® Scalable Processor
• Write Spec: 1,150,000 IOPS
• Read Spec: 1,500,000 IOPS
Tested Configurations:
• Single Drive Performance
• 4x Drives pass-thru in parallel (no RAID)
• 4x Drive RAID0/5/10
• 2x Drive RAID1
Workload Details:
• 4k Random: 70/30 R/W
• 16 Threads, 16 IODepth
Metrics
• Performance: IOPS
• Bandwidth: MB/sec
• Latency: µsec
• CPU Usage*: Effective Intel Xeon Cores used
Data in the "Intel Optane Comparisons" section
*CPU Usage measured as total platform CPU % consumption; includes workload generation, storage stack (RAID) usage, and background activity.
Measured as "Cores Used" = reported CPU % × number of cores in the system (64 cores)
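The 4K random 70/30 workload above can be reproduced with fio (version 3.25 per the backup configuration). A minimal sketch of the invocation, built in Python so the parameters are explicit; the target device path and runtime are placeholders, not values from the original testing:

    import subprocess

    # Illustrative fio invocation for the Optane test: 4K random I/O, 70% reads,
    # 16 jobs x 16 outstanding I/Os (hypothetical target path; adjust as needed).
    fio_cmd = [
        "fio", "--name=optane-70-30",
        "--filename=/dev/md0",            # RAID volume under test (placeholder)
        "--ioengine=libaio", "--direct=1",
        "--rw=randrw", "--rwmixread=70", "--bs=4k",
        "--numjobs=16", "--iodepth=16",
        "--time_based", "--runtime=300", "--group_reporting",
    ]
    subprocess.run(fio_cmd, check=True)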



Test Configuration Details (NAND)
Drives under test: 4 x 3.84TB Intel D7-P5510 SSDs, attached to either the legacy RAID HBA or the Intel® Xeon® Scalable Processor
• Write Spec: 170,000 IOPS
• Read Spec: 700,000 IOPS
Tested Configurations:
• Single Drive Performance
• 4x Drives pass-thru in parallel (no RAID)
• 4x Drive RAID0/5/10
• 2x Drive RAID1
Workload Details:
• 4k Random: 100% Reads, 70/30 R/W, 100% Writes
• 1 Thread, 1 IODepth (isolate storage path)
• 16 Threads, 64 and 256 IODepth (Peak performance)
Metrics
• Performance: IOPS
• Power: Watts (Idle and under load)
• Latency: µsec
• CPU Usage*: Effective Intel Xeon Cores used
Data in the NAND SSD comparison sections below

*CPU Usage measured as total platform CPU % consumption; includes workload generation, storage stack (RAID) usage, and background activity.
Measured as "Cores Used" = reported CPU % × number of cores in the system (64 cores)
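The derived efficiency metrics quoted later (effective cores used, IOPS per core, IOPS per watt) follow directly from the definitions above. A small worked example, using illustrative numbers rather than measured results:

    # Effective cores used = reported total CPU% x cores in the system (64 here).
    TOTAL_CORES = 64

    def cores_used(cpu_pct: float) -> float:
        return cpu_pct / 100.0 * TOTAL_CORES

    # Illustrative values only, not measured results.
    iops, cpu_pct, watts = 2_800_000, 7.8, 750
    cores = cores_used(cpu_pct)            # 7.8% of 64 cores is ~5 effective cores
    print(f"cores used    : {cores:.1f}")
    print(f"IOPS per core : {iops / cores:,.0f}")
    print(f"IOPS per watt : {iops / watts:,.0f}")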



Intel Optane Comparisons



RAID Levels Performance Comparison1
Intel® Optane™ SSDs: 16 Thread, 16 IODepth: 70/30 R/W

[Chart: R/W IOPS comparison by RAID level (RAID0/5/10/1), VROC reads/writes vs HBA reads/writes; higher is better]

Intel VROC achieves up to 5.6 million IOPS with RAID0 on mixed workloads.

Intel VROC has up to:
• 161% more IOPS on RAID0
• 50% more IOPS on RAID5
• 248% more IOPS on RAID10
• 138% more IOPS on RAID1

Intel VROC RAID5 > HBA RAID10 performance

See backup for configuration details. Results may vary



RAID0 Simultaneous Read/Write Comparison1
Intel® Optane™ SSDs: 16 Thread, 16 IODepth: 70/30 R/W

Intel VROC RAID0 reads/writes provide:
• ↑ IOPS
• ↓ Latency
• ↓ CPU Usage
• ↑ Bandwidth

Chart data (Intel VROC vs RAID HBA):
• IOPS (higher is better): 5,689,962 vs 2,177,404
• Latency (lower is better): 43 µs vs 111 µs
• CPU cores used (lower is better): 8 vs 9
• Bandwidth (higher is better): 22,760 MB/s vs 8,710 MB/s

RAID0 provides higher performance metrics but with lower resource usage (CPU):
• Up to 161% more read/write IOPS
• Up to 61% lower latency

See backup for configuration details. Results may vary
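As a quick arithmetic check, the percentages quoted above follow directly from the chart values (a sketch; the chart numbers are the measured quantities):

    # Ratios behind the RAID0 claims above (values taken from the chart data).
    vroc_iops, hba_iops = 5_689_962, 2_177_404
    vroc_lat, hba_lat = 43, 111                    # average latency, usec

    more_iops = (vroc_iops / hba_iops - 1) * 100   # ~161% more IOPS
    lower_lat = (1 - vroc_lat / hba_lat) * 100     # ~61% lower latency
    print(f"{more_iops:.0f}% more IOPS, {lower_lat:.0f}% lower latency")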



RAID5 Simultaneous Read/Write Comparison1
Intel® Optane™ SSDs: 16 Thread, 16 IODepth: 70/30 R/W

Intel VROC RAID5 reads/writes provide:
• ↑ IOPS
• ↓ Latency
• ↑ CPU Usage*
• ↑ Bandwidth

Chart data (Intel VROC vs RAID HBA):
• IOPS (higher is better): 1,121,999 vs 743,373
• Latency (lower is better): 362 µs vs 461 µs
• CPU cores used (lower is better): 7 vs 3
• Bandwidth (higher is better): 4,488 MB/s vs 2,973 MB/s

*RAID5 on Intel VROC uses 4 more cores but delivers up to 380K additional IOPS.

• Up to 50% more read/write IOPS
• Up to 50% more bandwidth

See backup for configuration details. Results may vary



NAND SSD Comparisons



Pass-thru Mode (No RAID) Comparison



Low Workload, Pass-Thru Comparison2
NAND SSDs: 1 Thread, 1 IODepth

[Charts: 1x and 4x pass-thru IOPS (higher is better) and average latency (lower is better) for 100% writes, 70/30 R/W, and 100% reads; Intel VROC vs RAID HBA]

Intel VROC provides unimpeded access to storage for lower-latency I/O:
▪ Single drive, 100% write: {40% IOPS ↑, 32% latency ↓}
▪ Single drive, 100% read: {29% IOPS ↑, 23% latency ↓}

The single-drive performance improvements scale to multiple drives.


See backup for configuration details. Results may vary



Peak Performance, Pass-Thru Comparison2
NAND SSDs: 16 Thread, 64 IODepth

[Charts: 1x and 4x pass-thru IOPS (higher is better) and CPU cores used (lower is better) for 100% writes, 70/30 R/W, and 100% reads; Intel VROC vs RAID HBA]

Power usage delta, WΔ = RAID HBA (W) - Intel VROC (W):
• 100% Writes: 1x pass-thru 13W, 4x pass-thru 20W
• 70/30 R/W: 1x pass-thru 17W, 4x pass-thru 30W
• 100% Reads: 1x pass-thru 22W, 4x pass-thru 46W

Higher workloads saturate the storage on both solutions:
▪ Latency differences are masked, and performance becomes equivalent.

Other architecture differences are exposed: power and CPU usage.
▪ Additional HBA power draw creates a positive WΔ; Intel VROC ↓ Power.
▪ RAID HBA on-card processing is oversaturated by larger workloads; Intel VROC ↓ CPU Usage (see Detailed CPU Review).

See backup for configuration details. Results may vary
RAID0/1/5/10 Performance Results



RAID Levels Performance Comparison2
NAND SSDs: 16 Thread, 64 IODepth

[Charts: write IOPS and read IOPS comparison by RAID level (RAID0/5/10/1), Intel VROC vs RAID HBA, with the RAID-level max IOPS spec marked]

▪ Intel VROC has 33% more IOPS on RAID5 writes.

Intel VROC read performance scales to the maximum 4x SSD spec (~2.8M IOPS on RAID0/5/10).
▪ The HBA hits a 2.2M IOPS bottleneck; Intel VROC delivers up to 27% more IOPS on RAID0/5/10 reads.

See backup for configuration details. Results may vary



Detailed RAID0/5 Review (Latency, CPU%, Power)



RAID0/5 Read Comparison2
NAND SSDs: 16 Thread, 64 IODepth (Intel VROC vs RAID HBA)

Intel VROC RAID0/5 reads provide:
• ↑ IOPS
• ↓ Latency
• ↓ CPU Usage
• ↓ Power Consumption

Chart data (Intel VROC vs RAID HBA):
• IOPS (higher is better): ~2.81M on both RAID0 and RAID5 vs 2,255,698 (RAID0) and 2,196,795 (RAID5)
• Latency (lower is better): 362 µs on both RAID levels vs 447 µs (RAID0) and 461 µs (RAID5)
• CPU cores used (lower is better): roughly 4-5 vs 7-8
• Platform power (lower is better): roughly 748-751 W vs 768-771 W

Integrated RAID is a more effective RAID architecture for NVMe SSDs:
• Up to 30% more read IOPS/W
• Up to 164% more read IOPS per CPU core used

See backup for configuration details. Results may vary



RAID0/5 Write Comparison2
NAND SSDs: 16 Thread, 64 IODepth (Intel VROC vs RAID HBA)

Intel VROC RAID0/5 writes provide:
• ↑ IOPS
• ↓ Latency

Chart data (Intel VROC vs RAID HBA):
• IOPS (higher is better): RAID0 601,711 vs 595,907; RAID5 214,286 vs 161,029
• Latency (lower is better): RAID0 850 µs vs 857 µs; RAID5 2,386 µs vs 3,178 µs
• CPU cores used (lower is better): RAID0 0.50 vs 1.23; RAID5 2.62 vs 1.87

RAID0 also ↓ CPU usage and power usage.

RAID5 provides higher performance metrics but with higher resource usage (CPU and power). This is not the whole story; see 'CPU% Usage Explained' for more.

• Up to 28% more write IOPS/W

See backup for configuration details. Results may vary



CPU% Usage Explained



CPU% Usage-Perception2
Common perception: the RAID HBA consumes fewer host CPU resources due to HBA offload.
Reality: Intel VROC can deliver ↑ performance and consume ↓ CPU resources!

[Charts: IOPS vs CPU cores used for writes and reads, across 1x pass-thru, 4x pass-thru, RAID0, and RAID5; Intel VROC vs RAID HBA. The one exception called out is RAID5 writes, where Intel VROC uses more cores.]

HOW?
See backup for configuration details. Results may vary



CPU% Usage-Reality Explained2

NVMe SSD performance can overwhelm the RAID HBA offload design:
• 16 Threads, 64 IODepth → hundreds of thousands of write IOPS and millions of read IOPS

The HBA architecture has choke points that can bottleneck performance; Intel® VROC improves on each:
1. Limited PCIe uplink (x8 PCIe lanes) → full x4 bandwidth per NVMe SSD
2. Fixed amount of RAID processing → scaled compute on the powerful Intel Xeon CPU
3. SCSI-based RAID stack → NVMe-optimized RAID stack

These limitations cause thrash on CPU% and can lead to iowait%.


See backup for configuration details. Results may vary



iowait% Closer Look2
NAND SSDs: 16 Thread, 64 IODepth → 16 Thread, 256 IODepth (Intel VROC vs RAID HBA)

RAID5 writes require high CPU%:
• The highest of any Intel VROC supported RAID level per IOP.

RAID HBA offload generates iowait at higher workloads:
• If the limits of the HBA architecture are reached (more IO), host CPU usage ramps up in iowait%.
• iowait could be wasted cycles, depending on the application.

Chart data, RAID5 writes (IOPS, higher is better): Intel VROC 214,286 (64 IODepth) → 219,960 (256 IODepth); RAID HBA 161,029 → 161,031. CPU cores used (lower is better): Intel VROC stays roughly flat with no ramping of iowait, while the RAID HBA ramps to roughly 3.7 cores at 256 IODepth, with part of that increase spent in iowait.

Intel VROC is more efficient for RAID5 writes:
• No ramping of iowait
• Up to 4% more write IOPS per CPU core used*
*when accounting for iowait%

See backup for configuration details. Results may vary
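One way to observe the iowait behavior described above on a Linux host is to sample /proc/stat and convert the busy and iowait fractions into effective cores, mirroring the "Cores Used" metric from the test configuration. A minimal sketch (assumes a Linux host and the standard /proc/stat field layout; the sampling window is arbitrary):

    import time

    TOTAL_CORES = 64   # cores in the system under test (per the configuration details)

    def read_cpu_times():
        # First line of /proc/stat: "cpu  user nice system idle iowait irq softirq steal ..."
        with open("/proc/stat") as f:
            fields = [int(x) for x in f.readline().split()[1:]]
        total = sum(fields[:8])               # user..steal; guest time is already folded into user
        return total, fields[3], fields[4]    # total, idle, iowait

    t0, idle0, iow0 = read_cpu_times()
    time.sleep(5)                             # sample over the measurement window
    t1, idle1, iow1 = read_cpu_times()

    dt = t1 - t0
    busy_cores = (dt - (idle1 - idle0) - (iow1 - iow0)) / dt * TOTAL_CORES
    iowait_cores = (iow1 - iow0) / dt * TOTAL_CORES
    print(f"busy cores: {busy_cores:.1f}, cores in iowait: {iowait_cores:.1f}")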



CPU% Usage-Customer Impact

Server design must plan for Peak Storage Load.
Peak Storage Load (PSL): the maximum IO during data center operation.

RAID solution response to PSL:
• RAID HBA: bottlenecks performance; iowait% ramps and latency rises; operational thrash if the storage architecture is not properly planned.
• Intel VROC: scales performance to absorb the PSL; CPU usage and latency ramp proportionally; mitigates server thrash with fewer CPU cores dedicated to RAID.

Intel VROC servers often require fewer CPU cores to handle Peak Storage Load.
See backup for configuration details. Results may vary



Backup



Configuration Details
1. Intel VROC vs RAID HBA Comparison (Optane)
System configuration: Beta Coyote Pass M50CYP2SB2U/M50CYP2SBSTD (chassis M50CYP2UR208BPP), 2 x Intel® Xeon® Platinum 8358 CPU @ 2.60GHz,
32 cores each, DRAM 128GB , BIOS Release 04/02/2021, BIOS Version: SE5C6200.86B.0020.P24.2104020811
OS: RedHat* Enterprise Linux 8.1, kernel-4.18.0-147.el8.x86_64, mdadm - v4.1 - 2018-10-01, Intel® VROC Pre-OS version 7.5.0.1152
Storage: Both configurations used 4 x 400GB Intel Optane P5800X PCIe Gen4 U.2 SSDs (Model: SSDPF21Q400GB, Firmware: L0310100) connected to a backplane, which is connected via SlimSAS cables directly to a Broadcom 9560-16i (x8) card on Riser 2, PCIe slot 1 on CPU2.
BIOS settings: SpeedStep(Enabled), Turbo(Enabled), ProcessorC6(Enabled), PackageC-State(C0/C1 State), CPU_PowerAndPerformancePolicy(Performance), HardwareP-States(NativeMode), WorkloadConfiguration(I/O Sensitive)
RAID Configurations: 4-Disk RAID0/5/10 and 2-Disk RAID1 with Intel VROC and Broadcom MegaRAID 9560-16i
Workload Generator: FIO 3.25, 16-thread 16-IODepth
Performance results are based on testing as of 6/25/2021 and may not reflect all publicly available updates. See configuration disclosure for details. No
product can be absolutely secure.
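For the Intel VROC side, RAID volumes like those listed above are typically created with mdadm using IMSM (Intel Matrix Storage Manager) metadata: first a container spanning the SSDs, then a volume inside it. A hedged sketch in Python (device names and volume names are placeholders; the exact flow may differ by VROC release and was not part of the original material):

    import subprocess

    nvme_devs = ["/dev/nvme0n1", "/dev/nvme1n1", "/dev/nvme2n1", "/dev/nvme3n1"]  # placeholders

    # Step 1: create an IMSM container spanning the four NVMe SSDs.
    subprocess.run(["mdadm", "--create", "/dev/md/imsm0", "--metadata=imsm",
                    "--raid-devices=4", *nvme_devs], check=True)

    # Step 2: create a 4-disk RAID5 volume inside the container.
    subprocess.run(["mdadm", "--create", "/dev/md/vol0", "--level=5",
                    "--raid-devices=4", "/dev/md/imsm0"], check=True)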



Configuration Details
2. Intel VROC vs RAID HBA Comparison (NAND)
System configuration: Beta Coyote Pass M50CYP2SB2U/M50CYP2SBSTD (chassis M50CYP2UR208BPP), 2 x Intel® Xeon® Platinum 8358 CPU @ 2.60GHz,
32 cores each, DRAM 128GB , BIOS Release 03/22/2021, BIOS Version: SE5C6200.86B.0022.D08.2103221623
OS: RedHat* Enterprise Linux 8.1, kernel-4.18.0-147.el8.x86_64, mdadm - v4.1 - 2018-10-01, Intel® VROC Pre-OS version 7.5.0.1152
Storage: Both configurations used 4 x 3.84TB Intel® D7-P5510 Series SSDs (Model: SSDPF2KX038TZ, Firmware: JCV10016) connected to the internal backplane. With the Intel VROC configuration, the backplane connects directly to CPU2 via SlimSAS. With the RAID HBA, the backplane connects to the RAID HBA on Riser 2, PCIe slot 1 on CPU2.
BIOS setting: SpeedStep(Enabled), Turbo(Enabled), ProcessorC6(Enabled), PackageC-State(C0/C1 State),
CPU_PowerAndPerformancePolicy(Performance), HardwareP-States(NativeMode), WorkloadConfiguration(I/O Sensitive)
RAID Configurations: 4-Disk RAID0/5/10 and 2-Disk RAID1 with Intel VROC and Broadcom MegaRAID 9560-16i
Workload Generator: FIO 3.25, 1-thread 1-IODepth, 16 thread 64/256 IODepth
Performance results are based on testing as of 5/3/2020 and may not reflect all publicly available updates. See configuration disclosure for details. No
product can be absolutely secure.
