What does Vibration do to Your SSD?

Janki Bhimani Tirthak Patel Ningfang Mi Devesh Tiwari


Northeastern University Northeastern University Northeastern University Northeastern University

Abstract

Vibration generated in modern computing environments such as autonomous vehicles, edge computing infrastructure, and data center systems is an increasing concern. In this paper, we systematically measure, quantify and characterize the impact of vibration on the performance of SSD devices. Our experiments and analysis uncover that exposure to both short-term and long-term vibration, even within the vendor-specified limits, can significantly affect SSD I/O performance and reliability.

Keywords

SSD; Vibration; Reliability; Data Centers; Autonomous Vehicles

ACM Reference Format:
Janki Bhimani, Tirthak Patel, Ningfang Mi, and Devesh Tiwari. 2019. What does Vibration do to Your SSD? In The 56th Annual Design Automation Conference 2019 (DAC ’19), June 2–6, 2019, Las Vegas, NV, USA. ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/3316781.3317931

This project was partially supported by the NSF Career Award CNS-1452751.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
© 2019 Association for Computing Machinery. ACM ISBN 978-1-4503-6725-7/19/06...$15.00

1 Introduction

There has been an increasing concern about the effect of noise and vibration on the performance of computing and storage infrastructures from data centers to autonomous vehicles [13, 15, 23]. Recent events have highlighted the significant disruptions caused by noise and vibration to operations of computing centers [11, 24]. The most notable and severely affected examples include the Nasdaq Nordic stock exchange data center in Finland (2018), the Microsoft Azure data center in Europe (2017), and the ING Bank data center in Romania (2016) [23, 25, 26]. As computing and storage devices will increasingly operate in harsh environments such as space exploration, edge computing, and autonomous vehicles [9, 13, 15, 16], the effects of vibration will continue to worsen.

Vibration has been shown to primarily affect the performance of hard disk drives (HDDs) because HDDs have moving mechanical parts which can be physically perturbed by vibration [5, 6]. However, HDDs are increasingly being replaced with solid state drives (SSDs) due to their lower I/O access latency and higher I/O bandwidth. SSDs have also been hypothesized to be less prone to vibration-related side-effects because of the absence of any mechanical components [6, 21]. This work aims to perform a systematic study to investigate if SSDs are resistant to the adverse effects of vibration. In particular, this is the first work to investigate the following research questions (RQs):

RQ1: What is the impact of vibration on SSD performance (e.g., I/O operation latency and bandwidth)?

RQ2: Does the impact of vibration on SSD performance vary across different SSD vendors and I/O operation types (e.g., read and write)?

RQ3: Is the performance of SSD devices sensitive to the length of vibration exposure?

In this work, we systematically measure, quantify and characterize the impact of vibration on the performance of SSD devices. Our results show that exposure to vibration, even within the vendor-specified limits, can significantly affect SSD I/O performance. Our experiments discover that the degree of impact varies across vendors and workload types; in some cases, vibration can negatively affect the read/write tail latency by more than 10%, which is critical for safety in autonomous vehicles [15] and performance in data center computing environments [11]. Interestingly, we also observe that repeated exposure to short-term vibration has lingering after-effects on SSD performance even in the absence of vibration. On the other hand, long-term exposure to vibration may result in more than 30% performance degradation. Long-term exposure to vibration can lead to performance slowdowns and abrupt failures, although SSDs continue to function again after a restart.

During this study, we experimented for thousands of SSD-hours with close to one hundred SSDs from different vendors. By the end of it, many SSDs came out permanently bruised due to vibration, and some SSDs succumbed to its adverse effects, despite the vibration being within the vendor-specified limits. We analyzed a large amount of sensor and performance data from these SSDs via various I/O tools. However, we share only selected findings and observations that we could conclude with high statistical significance. Anonymized experimental data is being made publicly available at https://github.com/GoodwillComputingLab/SSDVibration for the research community to better understand, model, and mitigate the impact of vibration on SSDs.

2 Background

This section describes the architectural components of an SSD and prior works that study the impact of vibration on storage devices.

2.1 SSD Internal Components

As mentioned earlier, SSDs provide higher I/O bandwidth and lower I/O operation latency than HDDs, and hence, are becoming increasingly prevalent from data centers to autonomous systems. An SSD uses semiconductor chips to persistently store data, as opposed to an HDD, which uses magnetic platters. The absence of mechanical components distinguishes SSDs from conventional electro-mechanical
HDDs, which contain spinning disks and movable read/write heads. As shown in Fig. 1, the two main components that compose an SSD are the flash chips and the flash memory controller. The flash chips are made of NAND flash cells that store data bits, and the flash memory controller manages all the I/O operations.

Figure 1: Internal components of an SSD.

NAND Flash Memory is a type of nonvolatile storage technology that does not require power to retain data and uses NAND flash cells to store data. There are different types of NAND flash memories depending on the number of bits stored in each cell and the arrangement of the cells.

The Flash Memory Controller manages the data stored on flash memory and communicates with the compute system. The flash memory controller includes the Flash Translation Layer (FTL), which maps the host-side logical block addresses (LBAs) to the physical addresses of the flash memory. This controller is responsible for implementing flash management algorithms such as over-provisioning, wear leveling, and garbage collection. To make the storage device operate properly, the controller maps out bad flash memory cells and allocates spare cells from the over-provisioned area to be substituted for cells that fail in the future. To mitigate write-endurance issues, the controller performs wear leveling to distribute write I/O operations uniformly and ensure a similar rate of aging among data blocks. The controller also periodically performs garbage collection to improve endurance, but this also causes high tail latency.

We note that the flash management algorithms, including over-provisioning, wear leveling, and garbage collection, have a significant impact on SSD performance, but the vendors do not disclose the details of this proprietary information. This limits our ability to identify the root causes and provide explanations for our findings about the impact of vibration on SSD performance.

2.2 Effect of Vibration on Storage Devices

Autonomous vehicles operate in a dynamic environment where vibration, shock, high temperature, humidity and other environmental conditions can affect the computing and storage devices on board [9, 13, 15, 16]. Data centers house a large number of server and storage racks with sophisticated power supply and cooling systems which maintain efficient operations. Thus, data center vibration can be generated via multiple sources including computer servers and power/cooling infrastructure. In a data center, servers with high load, high-velocity airflow, large fans, cooling units, chillers, compressors, and standby power sources can contribute to vibration. Prior related works have attempted to identify the effects of this vibration on different components of the computing and storage systems [6, 24]. These works have observed that the performance of traditional HDDs is significantly impacted by vibration [12]. I/O performance of HDDs can degrade by up to 50% in some cases [7]. Different electro-mechanical faults on storage drives have also been linked to vibration [8, 11, 28].

As SSDs do not have any moving mechanical parts, they are believed to be more resistant to vibration [6, 21]. However, as discussed earlier, SSDs are composed of sensitive integrated circuit (IC) assemblies for the NAND chips and the controller, and the effect of vibration on these assemblies has not been studied. Therefore, to bridge this knowledge gap, we study the impact of vibration on SSDs.

3 Experimental Methodology

In this section, we describe the experimental methodology to systematically study the impact of vibration on SSDs in our controlled environment. More specifically, we create a controlled experimental setup that enables us to accurately capture and analyze the effect of vibration on SSD performance. Previous works have performed field studies to observe and analyze the effects of vibration on HDDs in large-scale data centers [6, 11, 24]. However, performing accurate, interference-free and fine-grained experiments to develop a systematic understanding of vibration’s impact is often not possible in real-world data centers and autonomous vehicles. Therefore, the approach of this study is to draw conclusions by performing controlled experiments on different types of SSDs.

3.1 Experimental Platform Setup

Fig. 2 shows the major components of our experimental platform setup and how these components are connected to each other. The SSD is placed on the vibration plate of the vibration generator. The intensity of the vibration in the vibration generator is controlled by the wave function generator. We place the Operating System (OS) on a separate disk, mounted on a rack and insulated from vibration, to ensure that the system kernel is decoupled from the impact of vibration. The SSDs are extended from the system connector by SATA extension cables to ensure that none of the system components are impacted by vibration except the SSDs. The SATA cable connecting the SSDs is tightly secured to guard it against loose connection problems while performing vibration experiments. We have kept the SSD firmly in place using tape and metal plates while under vibration. However, we note that tightly packed devices in a data center rack or a moving vehicle are still affected by vibration since not all metals can absorb vibration. Specially designed racks that absorb vibration are typically 4x more expensive than traditional racks [2].

Figure 2: Experimental setup for SSD vibration tests.

We note that the whole setup is inside an isolated room under normal operating conditions, away from other kinds of vibration, heat or external factors that may affect our conclusions. The servers
are placed on vibration absorbing carpet. We place only a single SSD on each vibration generator to ensure that the magnitude of applied vibration is not dampened because of the weight of the SSDs. Throughout our experiments, we preserve SSDs in the same form as we receive them from vendors, without removing their ICs from the original chassis of the SSDs. To generate vibration that replicates data center vibration, we use high accuracy equipment from “3B Scientific”. Specifically, we use the FG100 function generator and the U56001 vibration generator, which can produce sine, square and saw-tooth vibration waves with adjustable amplitude and frequency.

3.2 Vibration Environment

Type: To generate the vibration, we use a classical sine wave (e.g., 60 Hz AC power). We also induced vibrations with different waveforms, including square and saw-tooth waveforms, and observed that they lead to similar trends and results.

Intensity: We ensure that our vibration intensity does not violate the limit specified in the warranty sheet of the respective SSD vendor. We have chosen 10A-20Hz vibration intensity for consistency across all SSD types, and this intensity is well below the typical threshold specified by SSD vendors [14, 17–20].

Orientation: In a data center, the angle between the axis of vibration and the alignment of the components of SSDs can be between 0° and 180°, as different servers use different types of rack mounting equipment, and the sources of vibration also originate from different relative angles. Therefore, in this work, to capture real-world settings while keeping the experiments tractable, we consider two major orientations of the SSD with respect to the axis of vibration: parallel and perpendicular.

As shown by the setup of equipment on the left-hand side of Fig. 3, for the parallel orientation of the SSD to the axis of vibration, we place the SSD horizontally on the plate that is vibrating in the up-down motion. The short-hand notation used in this paper for representing the parallel/horizontal orientation is the symbol “ = ”. The right-hand side of Fig. 3 shows the perpendicular orientation of the SSD. The short-hand notation used in this paper for representing the perpendicular orientation is the symbol “ ⊥ ”.

Figure 3: Parallel and perpendicular orientations of the SSD w.r.t. the vibration axis.

3.3 Testbed Setup

Table 1 summarizes the configuration of our testbed. The SSDs are set up as hot-plug components to enable and disable their connection to a running computer system without significant interruption to the system’s operation. We use open source measurement tools (dstat [27], iostat [10], blktrace [4], smartctl [1]) to measure performance metrics. We use SSDs from three different major SSD vendors. For anonymity reasons, we do not disclose the vendor names, but they are major representative SSD vendors who share a large market fraction. We have chosen these vendors to ensure broad coverage of NAND type and variety in the flash management algorithms (although the details are proprietary).

Table 1: Testbed configuration.

Component               Specs
Server                  Optiplex9020
Processor               Intel(R) Core(TM) i7-4770 CPU
Processor Speed         3.40GHz
Processor Cores         16 Cores
L3 Cache Size           8192K
Memory Capacity         16GB
Operating System        Ubuntu 16.04 LTS
Kernel                  4.4.0-137-generic
SSD Capacity            120 GB
SSD Type                Vendor A, Vendor B, and Vendor C
SATA Version            SATA 3.2, 6.0 Gb/s
Form Factor             2.5 inches
Vibration Wave Type     Sinusoidal
Vibration Intensity     10A, 20Hz

3.4 I/O Workloads

The FIO (Flexible I/O Tester) benchmark [3] is used to generate different types of I/O operations via the “libaio” I/O engine. We perform direct I/O operations to the SSDs, bypassing the host file system. To emulate the operations of real applications, we configure the I/O depth as 16 and formulate different FIO workloads. We primarily focus on random I/O patterns because random I/O is noted to be more critical for obtaining high performance and makes it more challenging to guarantee SLAs for tail latency. Our workload generates random I/O of different sizes from 4KB to 1MB and is composed of both read and write requests. We report I/O tail latency and bandwidth as the primary metrics for SSD performance [9, 11, 15].

3.5 Short-term vs Long-term Vibration Phases

We evaluate the impact of vibration using multiple SSDs from different vendors. For each of our experiments, we use brand-new SSDs which were never exposed to vibration to the best of our knowledge. We precondition a fresh SSD by writing through its whole address space twice to ensure steady-state performance. Then, to explore the impacts of “short-term” vibration on SSDs, we first execute an I/O workload for six hours without vibration to obtain the baseline performance. Then, on the same SSD, we run the same workload for another six hours while vibrating the SSD and compare it with the no-vibration phase performance on the same SSD to avoid manufacturing variability. We chose six hours as the short-term window to attain statistical significance with a high number of samples (over 42,000 samples in six hours for each SSD).

We perform this set of experiments with multiple SSDs from the same vendor and observed very small differences in performance across SSDs from the same vendor. If we identified an outlier SSD (an inherently slow SSD in a large set of SSDs from the same vendor), we dropped it from our setup. To study the impact of “long-term” vibration, we execute a workload continuously for 120 hours. We use three separate SSDs to conduct no-vibration, parallel (“ = ”) vibration, and perpendicular (“ ⊥ ”) vibration tests. Again, we carefully conduct our experiments on different SSDs for different vibration types to avoid interference and post-effects of one vibration type on another vibration type. We collect over 840,000
Figure 4: Read tail latencies increase significantly under ⊥ vibration, and slightly under = vibration. Results are normalized to the no-vibration case for respective percentiles.

Figure 5: Write tail latencies increase significantly under ⊥ vibration, and slightly under = vibration. Results are normalized to the no-vibration case for respective percentiles.

Figure 6: Short-term exposure to = and ⊥ vibration may have post-effect on read tail latencies (even when the SSD is not under vibration).

Figure 7: Short-term exposure to = and ⊥ vibration may have post-effect on write tail latencies (even when the SSD is not under vibration).
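
The per-percentile normalization mentioned in the captions of Figs. 4 and 5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' analysis code: the nearest-rank percentile definition, the default percentiles, the function names, and the sample latencies are all hypothetical.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of latency samples (0 < pct <= 100)."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100.0)  # 1-based nearest rank
    return ordered[max(rank, 1) - 1]


def normalized_tail_degradation(baseline, vibrated, pcts=(90, 95, 99, 99.9)):
    """Percent change of each percentile relative to the same
    percentile of the no-vibration baseline (positive = slower)."""
    result = {}
    for pct in pcts:
        base = percentile(baseline, pct)
        vib = percentile(vibrated, pct)
        result[pct] = (vib - base) / base * 100.0
    return result


# Hypothetical latencies in microseconds, NOT measured data.
baseline_us = [100] * 98 + [200, 400]
vibrated_us = [100] * 98 + [220, 480]
print(normalized_tail_degradation(baseline_us, vibrated_us, pcts=(50, 99, 100)))
```

Because each percentile is divided by its own no-vibration value, an unchanged median reports zero change even when the highest percentiles grow, which matches the observation that vibration affects the tail but not the mean.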
samples in 120 hours for each SSD to attain statistical significance for the long-term experiments. We also ensure that no SSD surpasses or comes near its threshold for write-endurance during our experiments. To avoid performance interference between different workloads, we repeat the same experiment process with a new set of SSDs from each vendor and avoid manufacturing variability by discarding inherently slow SSDs under no vibration.

4 Results and Analysis

In this section, we present our results and analysis of the impact of vibration on the performance of SSD devices. We begin by discussing how SSD performance is affected during active vibration. Then, we discuss the post-effects of vibration on SSD performance when the SSD is not under active vibration. Finally, we discuss the long-term impact of vibration on SSD performance.

Effect of vibration on SSD performance during active vibration phase: First, we assess SSD performance by measuring the bandwidth and latency of I/O operations under vibration. We found that the mean performance is not affected by vibration. That is, the mean I/O bandwidth and latency for both read and write operations under vibration are the same as when the SSD is not under vibration. Interestingly, further analysis revealed that while mean I/O latency is not affected, the tail latency is significantly impacted. Figs. 4 and 5 show the degradation in read and write tail latencies under vibration as compared to the baselines, which are the corresponding read and write tail latencies when the SSD is not subjected to any kind of vibration (referred to as “no-vibration”). The degradation in performance is normalized to the corresponding no-vibration percentile value (i.e., our results isolate the effect of increasing tail latency as the percentile number grows and depict the degradation relative to the base case for the chosen percentile). We make several interesting observations.

First, both the read and write tail latencies are affected by vibration. The observed performance degradation can be more than 10% in some cases and up to 30% in the worst-case scenario. Second, SSDs from all three vendors show performance degradation in tail latency, although to varying degrees and without clear trends. For example, vendor C shows only small degradation in the read tail latency (less than 5%), but experiences large degradation in the write tail latency (up to 18%). Finally, our results show that perpendicular vibration has a relatively higher negative impact on the tail latency compared to parallel vibration; this is true in almost all cases, for both read and write operations and different vendor types. The difference in performance degradation with different orientations can be as significant as 30%.

We examined the SMART attributes to identify hidden patterns and potential root causes. We did not notice any considerable differences in “media-wearout” of the NAND flash chips between no-vibration and short-term vibration. Also, we did not find a higher rate of increase in corrected ECC errors under vibration. We theorize that the increase in tail latency under short-term vibration might be due to FTL operations getting affected, as the flash memory controller consists of CPUs which are susceptible to vibration effects [22]. However, the lack of proprietary knowledge of FTL and flash controller workings limits our ability to pinpoint the exact root cause.

Post-effect of short-term vibration on SSD performance during no-vibration phase: Next, we investigate if exposure to short-term vibration has any post-effects. That is, how does SSD performance change between two no-vibration periods that are separated only by a short-term exposure to vibration?

As the baseline result, we noted that the performance does not change significantly across no-vibration periods when vibration is not applied in between. However, when we apply short-term vibration between the two short-term no-vibration periods, the performance of the second no-vibration period gets affected. Figs. 6 and 7 show the degradation in read and write tail latencies of the second no-vibration period compared to the first no-vibration period when = or ⊥ vibration is applied in between them. Vibration seems to
Figure 8: Long-term exposure to vibration significantly degrades the SSD read tail latency under both vibration types. This is
especially true for vendors B and C.

Figure 9: Long-term exposure to vibration significantly degrades the SSD write tail latency under both vibration types. This is
especially true for vendors B and C.

kernel: [1209891.438012] sd 0:0:0:0: [sda] Synchronizing SCSI cache
kernel: [1209891.438033] sd 0:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
kernel: [1209891.438034] sd 0:0:0:0: [sda] Stopping disk
kernel: [1209891.438038] sd 0:0:0:0: [sda] Start/Stop Unit failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
system-udevd[28027]: Process ‘/lib/udev/hdparm’ failed with exit code 5.

Figure 11: Error reported by syslog upon SSD failure.
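
A log signature like the one in Fig. 11 can be searched for automatically. The sketch below is a hypothetical helper, not a tool used in this study: the regular expression and the name `find_transient_stops` are assumptions, and the pattern is modeled only on the kernel messages shown in Fig. 11, so other kernels or drivers may word these messages differently.

```python
import re

# Assumed pattern, modeled on the kernel lines in Fig. 11, e.g.
# "kernel: [1209891.438033] sd 0:0:0:0: [sda] Synchronize Cache(10)
#  failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK"
FAILURE_RE = re.compile(
    r"kernel: \[\s*(?P<ts>\d+\.\d+)\] sd [\d:]+ \[(?P<dev>\w+)\] "
    r"(?P<op>.+?) failed: Result: hostbyte=(?P<host>\w+)"
)


def find_transient_stops(lines, hostbyte="DID_BAD_TARGET"):
    """Return (kernel_timestamp, device, failed_operation) for every
    log line whose SCSI command failed with the given host byte."""
    events = []
    for line in lines:
        match = FAILURE_RE.search(line)
        if match and match.group("host") == hostbyte:
            events.append(
                (float(match.group("ts")), match.group("dev"), match.group("op"))
            )
    return events


# The kernel lines from Fig. 11, one message per line.
SAMPLE_LOG = [
    "kernel: [1209891.438012] sd 0:0:0:0: [sda] Synchronizing SCSI cache",
    "kernel: [1209891.438033] sd 0:0:0:0: [sda] Synchronize Cache(10) failed: "
    "Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK",
    "kernel: [1209891.438034] sd 0:0:0:0: [sda] Stopping disk",
    "kernel: [1209891.438038] sd 0:0:0:0: [sda] Start/Stop Unit failed: "
    "Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK",
]

for event in find_transient_stops(SAMPLE_LOG):
    print(event)
```

Since the drive is reported correctly again after a restart, a periodic scan of this kind is one way such transient stops could be recorded instead of being classified as no defect found.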

maintain some post-effect. However, the magnitude is not as large as the degradation during the vibration period. Interestingly, in some cases, = vibration appears to have a relatively higher post-effect than ⊥ vibration, although the effect of = vibration on tail latency during the vibration phase itself is lower than that of ⊥ vibration. Lingering post-effects of vibration during no-vibration phases, although small, are persistent and could be the root cause of fail-slow type performance defects observed in field studies [11].

Effect of long-term vibration on SSD performance: Motivated by the significant immediate impact of vibration during the short-term, we explore if long-term exposure to vibration affects SSD performance. To capture and compare the effects methodically against no-vibration, we employ multiple SSDs. All SSDs run the same random read-write workload for an equal amount of time; one-third of the group each is kept under no-vibration, = vibration, and ⊥ vibration. Performance after the long-term is normalized with respect to the initial short-term period in each group to avoid manufacturing variability across devices.

Figs. 8 and 9 show the degradation in read and write tail latencies across vibration types and vendors. We make several interesting observations. First, when the SSD experiences no vibration, the performance deteriorates negligibly over the long term (considered as 120 hours in this study) across all vendor types and I/O operation types. However, the long-term impact of vibration on SSD performance is dramatic, up to 45% in many cases. Second, the degree of performance degradation due to long-term exposure varies significantly across vendors. Vendor A observes relatively small impact (less than 5%) compared to the other vendors, potentially because of the NAND type (e.g., MLC, TLC). Third, our results reveal that = vibration has a much higher impact compared to ⊥ vibration. Note that ⊥ vibration results in high performance degradation in the short-term itself, while = vibration does not. Our results show that while harmless in the short-term, = vibration becomes harmful in the long-term, almost as bad as the short-term effect of ⊥ vibration. The degradation due to ⊥ vibration does not further increase dramatically over the long-term.

Interestingly, we observed that after long-term exposure to vibration, the mean bandwidth also drops noticeably, up to 10% for = vibration (Fig. 10). While not shown in the results due to space constraints, we observed higher variation in observed bandwidth under = and ⊥ vibration compared to no-vibration after running the SSD for the specified long-term period. We also examined SMART attributes such as Media_Wearout_Indicator, Available_Reservd_Space, and Hardware_ECC_Recovered, but did not observe any conclusive impact of vibration. This indicates that even when vibration is causing the performance degradation, the corresponding symptoms may not be visible even in the long-term via traditional performance and health check tools.

Figure 10: Long-term exposure to vibration can also decrease the mean read and write I/O bandwidth.

Long-term exposure to vibration can lead to silent failures: We continued our long-term vibration experiments with the intent to let them run until the SSD wears out by writing more data than what is specified in the warranty sheet. To our surprise, some SSDs running under vibration started experiencing silent and transient failures much before they surpassed the write endurance limit and soon after the length of our long-term window. We note that these failures were not observed in all SSDs under vibration, making it difficult to predict and proactively manage such failures. Also,

our previous long-term discussion did not include any SSD performance data with such failures. On the other hand, any SSD which was under no-vibration did not show any such behavior.

The SSD failures resulted in the running workload being terminated unexpectedly; however, the SSDs worked fine after a restart until the next failure. Fig. 11 shows the syslog snippet for one such occurrence of this type of SSD failure. The error in syslog indicates that the SSD, which is connected as “sda”, suddenly goes undetected. Then, upon relaunching the workload, it resumes proper execution. Also, the Linux command “lsblk” reports the SSD correctly. Thus, this transient stop fault of the SSD is prone to go undetected or be classified as an NDF (no defect found) in the data center setting. We performed more in-depth analysis to understand the performance trend during these transient failures.

Notably, we observed that the performance of SSDs which experience such transient faults drops significantly. Figs. 12 and 13 show the Cumulative Distribution Function (CDF) and mean of bandwidth for two failing SSDs from vendors A and C under different types of vibration, as representative examples. These figures show the performance during the period between multiple transient failures. We make several new observations. First, when we compare the SSD I/O bandwidth without vibration and with vibration, we observe a significant performance drop after the transient failure. Second, between consecutive phases of no-vibration separated by transient failures, the bandwidth decreases by more than 20% in one case. We note that this decrease is larger than the decrease observed in the long-term under no-vibration. Upon further inspection, we estimated that the “media wear-out” suddenly increases at a higher rate, despite the fact that the SSD should have been far from its write-endurance limit. This is potentially because the damaged NAND cells are replaced by spare NAND cells of the over-provision (OP) region. Essentially, these failures appear to be silent and transient at first but eventually lead to premature death of the SSD, as we found in several cases during our study.

Figure 12: Write bandwidth CDF and mean write bandwidth of vendor A SSD may change in between the stop failures.

Figure 13: Write bandwidth CDF and mean write bandwidth of vendor C SSD may change in between the stop failures.

Future Implications: Tail latency and performance slowdowns are the most critical factors both for autonomous systems to make real-time decisions and for data center providers to guarantee SLAs. Our results show that vibration may have a significant impact on these factors for SSD devices. Thus, storage system researchers and practitioners need to pay closer attention to such impacts for better provisioning and management of SSDs. Our results also indicate that SSD manufacturers need to devise better strategies to contain and mitigate the side-effects of vibration on SSD performance in variable computing environments including autonomous vehicles and data centers.

5 Conclusion

This paper begins by posing a simple question for investigation: “how does vibration impact the performance of your SSD?” We conclude by observing for the first time that vibration can have a severe impact on the tail latency of the SSD and that this impact is dependent on the vendor. We discovered that exposure to vibration can, surprisingly, leave post-effects even when the SSD is not under vibration. Additionally, it can damage the SSD performance in the long-term, which has serious implications for data center SLAs and usage of SSDs in autonomous vehicles.

References

[1] Bruce Allen. 2018. smartmontools. https://linux.die.net/man/8/smartctl
[2] Startup Takes Aim at Performance-Killing Vibration in Datacenter. 2010. vibrationrack. https://bit.ly/2FGNH6L
[3] Jens Axboe. 2018. FIO. https://fio.readthedocs.io/en/latest/fio_doc.html
[4] Jens Axboe et al. 2018. blktrace. https://linux.die.net/man/8/blktrace
[5] Ethan Brush. 2018. Noise and Vibration Considerations for Data Centers and IT Facilities. https://bit.ly/2UpXK9u
[6] Christine S Chan, Boxiang Pan, et al. 2014. Correcting Vibration-Induced Performance Degradation in Enterprise Servers. ACM SIGMETRICS Performance Evaluation Review 41, 3 (2014), 83–88.
[7] Trinoy Dutta and Andrew R Barnard. 2017. Performance of Hard Disk Drives in High Noise Environments. Noise Control Engineering Journal 65, 5 (2017).
[8] Takehiko Eguchi, Yohei Asai, et al. 2017. Airborne and Structure-Borne Transmission of High Frequency Fan Vibration in a Storage Box. In Conference on Information Storage and Processing Systems. ASME.
[9] Ming Yang et al. 2019. Re-thinking CNN Frameworks for Time-Sensitive Autonomous-Driving Applications: Addressing an Industrial Challenge. IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS) (2019).
[10] Sebastien Godard. 2018. iostat. https://linux.die.net/man/1/iostat
[11] Haryadi S Gunawi, Riza O Suminto, et al. 2018. Fail-slow at Scale: Evidence of Hardware Performance Faults in Large Production Systems. ACM Transactions on Storage (TOS) 14, 3 (2018), 23.
[12] YY Hu, S Yoshida, et al. 2009. Analysis of Built-In Speaker-Induced Structural-Acoustic Vibration of Hard Disk Drives in Notebook PCs. IEEE Transactions on Magnetics 45, 11 (2009), 4950–4955.
[13] R Wayne Johnson et al. 2004. The changing automotive environment: high-temperature electronics. IEEE Transactions on Electronics Packaging Manufacturing 27, 3 (2004), 164–176.
[14] Kingston. 2018. A400 SSD. https://bit.ly/2WF6hTk
[15] Shih-Chieh Lin, Yunqi Zhang, Chang-Hong Hsu, Matt Skach, Md E Haque, Lingjia Tang, and Jason Mars. 2018. The architectural implications of autonomous driving: Constraints and acceleration. In ASPLOS 2018. ACM, 751–766.
[16] Shaoshan Liu, Jie Tang, Zhe Zhang, and Jean-Luc Gaudiot. 2017. Computer architectures for autonomous driving. Computer 50, 8 (2017), 18–25.
[17] Micron. 2018. 5100 Series. https://bit.ly/2Upqpvt
[18] Mydigitalssd. 2018. Superboot. https://mydigitalssd.com/2.5-inch-sata-ssd.php
[19] Samsung. 2018. 850 EVO SSD. https://images-eu.ssl-images-amazon.com/images/I/61HGJaHYy-L.pdf
[20] Sandisk. 2018. Extreme II. http://mp3support.sandisk.com/downloads/qsg/extreme2-ssd-datasheet.pdf
[21] Christine Taylor. 2018. SSD vs. HDD. http://www.enterprisestorageforum.com/storage-hardware/ssd-vs.-hdd.html
[22] Techspot. 2018. Effect of Vibrations on CPU. https://www.techspot.com/community/topics/cpu-fan-vibrating.99261/
[23] Iain Thomson. 2018. Azure Fell over for 7 Hours in Europe because Someone Accidentally Set Off the Fire Extinguishers. https://www.theregister.co.uk/2017/10/03/faulty_fire_systems_take_down_azure_across_northern_europe/
[24] Julian Turner. 2010. Effects of Data Center Vibration on Compute System Performance. In SustainIT.
[25] Marcel van den Berg. 2018. Bank’s Data Center Shut Down. http://up2v.nl/2016/09/12/a-loud-sound-just-shut-down-a-banks-data-center-for-10-hours/
[26] Marcel van den Berg. 2018. Datacenter Failure. http://up2v.nl/2018/04/25/nasdaq-nordic-datacenter-failure-because-of-noise-of-fire-suppression-system/
[27] Dag Wieers. 2018. dstat. https://dag.wiee.rs/home-made/dstat
[28] Jiaping Yang, Cheng Peng Tan, et al. 2017. An Effective System-Level Vibration Prediction Analysis Approach for Data Storage System Chassis. Microsystem Technologies 23, 8 (2017), 3097–3105.
