
The LinuxKI Toolset 4.1
LinuxKI is ALIVE!!!
Mark C. Ray, Global Solutions Engineering

March 8th, 2016


Typical Performance Troubleshooting/Tuning Methodology

– Infer root cause based on system level statistics
– Look for top CPU users, high Disk I/O, Network activity, Memory Usage
– Misses a whole class of problems

– Check Google, Bugzilla
– Read the whitepapers
– Apply best practices
– Apply popular tunables
– Upgrade patches, firmware, OS
– Shotgun changes
– Guess!

– Sometimes we get lucky!
– Often we don’t
– Customers frustrated by “try this, try that” approach
– You waste time
– You lose credibility
– Customer loses $$$
– Resulting in lost business for HPE

2
The need

– A new approach for demanding mission critical customers and the new style of IT
– Take out the guess work!
– Application-centric, systematic approach to performance tuning and troubleshooting
– Make troubleshooting and tuning less complex for customers and internal HPE
engineers
– Reduce time to resolve critical performance issues
– Effectively tune customer, PoC and benchmark systems in order to compete in the
market

With a worldwide capability to quickly and reliably solve performance problems, we would win more deals and have happier customers.

3
What is the LinuxKI Toolset?

Linux kernel tracing tool designed to answer the following questions:

If it’s running, what is it doing?

If it’s waiting, what is it waiting for?

4
LinuxKI Toolset
How does it work?
[Workflow diagram] LiKI tracepoints in the kernel feed per-cpu trace buffers. From there the data flows two ways:

1 runki (kiinfo -likidump) -> KI dump (ki_all.<hostname>.<timestamp>.tgz)
2 ftp -> dump sent to HPE
3 kiall (kiinfo -kiall) -> static LinuxKI reports

Online analysis (real time): kiinfo -kipid -a <secs>, kiinfo -kidsk -a <secs>, kiinfo -kitrace pid=63721 -a <secs> -> LinuxKI reports

Tracepoints collect data from performance-sensitive parts of the kernel. Trace data is streamed to disk (KI dump) or can be analyzed in real time. For KI dumps, the configuration data and trace data is ftp'ed to HPE. The kiinfo program reads the binary trace data and generates the Linux KI reports.
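A minimal command sketch of this workflow, using only the commands named in the diagram (the -a interval value and the traced PID are illustrative; exact options may vary by LinuxKI version):

$ runki                              # step 1: collect per-cpu trace data into ki_all.<hostname>.<timestamp>.tgz (uses kiinfo -likidump)
# step 2 (optional): ftp the ki_all archive to HPE for analysis
$ kiall                              # step 3: read the binary trace data and generate the static LinuxKI reports (uses kiinfo -kiall)

# Online analysis, without a dump, straight from the live per-cpu trace buffers:
$ kiinfo -kipid -a 10                # per-PID report over 10-second intervals
$ kiinfo -kidsk -a 10                # per-disk report over 10-second intervals
$ kiinfo -kitrace pid=63721 -a 10    # raw trace records for one PID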

5
6
7
LinuxKI Toolset
A Brief History
– 1.0 - Initial Release 03/28/2013
– Runki data collection and kiall post processing
– Support for RHEL/SLES
– 2.0 - 10/11/2013
– Expanded Linux distribution support (included Ubuntu, CentOS, OEL, etc.)
– Single rpm or deb package
– No password for HP physical servers
– IRQ reporting, SCSI trace records
– Online Analysis with kiinfo reporting
– Generate optional CSV formatted files

8
LinuxKI Toolset
A Brief History
– 3.0 - 03/12/2014
– Cluster reporting enhancements (clparse, CMU integration)
– Visualization charts and graphs (JSON files)
– Monthly passwords for non-HP physical servers
– User stack traces and symbol table lookups
– io_submit/io_getevents syscall enhancements
– IP:Port addresses on Network system calls
– Included LiKI DLKM source for building the DLKM on a wide variety of systems
– 4.0 - 12/07/2015
– PID/TGID/CPU/DEV filtering during data collection (LiKI DLKM)
– Stealtime reporting for VM Guests (KVM only for now)
– Reporting of inode/dev for filesystem system calls
– System call filtering

9
LinuxKI Toolset version

– 4.1 - 03/04/2016
– Curses based user interface
– New PID timeline and Task Scheduler timeline visualization
– Support through 4.2.0-16 Linux kernels
– Reduced memory usage and fixed memory leaks
– Add Java thread names in KI reports when jstack output is collected
– runki script no longer collects perf data, sar data, or collectl/MW logs by default (use -M -U -X; see the sketch after this list)
– Identify UDP and IPv6 socket related system calls
– Bug fixes, bug fixes, and more bug fixes
– ARM support???
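A hedged usage sketch of the collection change above. The -M, -U, and -X flags are taken from the slide; which flag re-enables which collector is not stated there, so verify against the runki documentation:

$ runki                 # 4.1 default: perf data, sar data, and collectl/MW logs are no longer collected
$ runki -M -U -X        # re-enable the optional collectors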

10
Where to use Linux KI?
If it runs Linux...
– Supported on RHEL, SLES, Ubuntu, Fedora, CentOS, OEL, UEK, hlinux, kylin, l4tm
– Mission Critical systems - Oracle, DB2, SAP HANA, Java, banking, telecom, manufacturing,
web servers
– New Style of IT
– Virtualization - KVM, VMware
– Cloud - Helion/OpenStack
– Big Data - Hadoop, Vertica, Cassandra
– High Performance Computing
– Emerging Technologies
– Dockers
– NVDIMM
– The Machine

11
LinuxKI is ALIVE!!!

12
Live Demo #1
Online Analysis

13
LinuxKI Toolset
Curses Based User Interface
[Diagram] LiKI tracepoints in the kernel feed per-cpu trace buffers, which the curses interface reads directly for live analysis:

kiinfo -live -a 5
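A usage sketch of live mode, assuming -a takes an interval in seconds as it does for the other kiinfo reports (5 is the value shown on the slide):

$ kiinfo -live -a 5      # curses UI over the live per-cpu trace buffers, refreshing every 5 seconds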

14
Demo #1

15
Live Demo #2
Cassandra / Java

16
LinuxKI Toolset
Curses Based User Interface
[Diagram] The same collection workflow as before: LiKI per-cpu trace buffers -> 1 runki (kiinfo -likidump) -> KI dump (ki_all.<hostname>.<timestamp>.tgz) -> 2 ftp -> 3 kiall (kiinfo -kiall) -> static LinuxKI reports. The curses interface can also be pointed at collected trace data:

kiinfo -live -ts 0214_0123
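A sketch of browsing previously collected data in the curses interface, assuming the -ts argument is the timestamp portion of the collected trace files (0214_0123 is the example from the slide):

$ kiinfo -live -ts 0214_0123    # open the curses UI on the trace data collected at timestamp 0214_0123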

17
Demo #2 - Cassandra/Java
Using jstack to get Java thread names
– Collect LinuxKI dump and include Java jstack output
$ runki -j [-J path_to_jstack]
– Primary purpose of the demo is to show Java thread names (a usage sketch follows)
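A hedged sketch of the collection step. The -j and -J flags are from the slide; the jstack path and the PID are illustrative. The useful detail is that jstack thread dumps include nid=0x..., the native Linux thread id in hex, which is how Java thread names can be matched to the PIDs in the KI reports:

$ runki -j                                     # collect the LinuxKI dump and include Java jstack output
$ runki -j -J /usr/java/default/bin/jstack     # or point runki at a specific jstack binary (path is illustrative)
$ jstack 63721 | grep 'nid='                   # each thread line shows its name and nid=0x... (hex native thread id)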

18
Demo #2

19
Case Study #3
Dockers Part 1

20
Demo #3 - Dockers
Poor performing Docker containers
– HPE Internal Dockers testing
– SuperdomeX (8 blade / 16 sockets)
– RHEL 7.1
– One Oracle Instance per Docker container per SDx blade
– 2 of the 8 Docker containers showed very bad Oracle read times:

Top 10 Foreground Events by Total Wait Time

Event                            Waits     Total Wait Time (sec)   Wait Avg (ms)   % DB time   Wait Class
db file scattered read           126,247   13.5K                   107.05          48.0        User I/O
db file sequential read          128,698   13.5K                   104.90          48.0        User I/O
enq: TX - row lock contention    3,879     869.1                   224.05          3.1         Application
DB CPU                                     322.3                                   1.1

21
Demo #3

22
Case Study #3
Dockers
– Conclusion
– There are pages in the file/page cache when Direct I/O is performed
– The pages overlapping the I/O request MUST be kicked out of the cache
– Pages were likely brought into cache by non-Oracle programs, such as tar, dd, cp, etc.
– Potential Solutions (choose one; see the command sketch after this list)
– Drop all the pages from the page cache
$ echo 1 >/proc/sys/vm/drop_caches
– Unmount and re-mount the filesystem
– Install RHEL 7.2
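A minimal sketch of the first two remedies as commands. The drop_caches line is the one from the slide; the sync beforehand and the /u01 mount point are illustrative assumptions:

# Option 1: drop clean pages from the page cache system-wide
$ sync
$ echo 1 >/proc/sys/vm/drop_caches

# Option 2: unmount and re-mount the filesystem holding the Oracle data files
$ umount /u01 && mount /u01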

23
Case Study #4
Dockers Part 2

24
Demo #4 - Dockers
Poor Dockers performance compared to non-Docker environment
– HPE Internal Dockers testing
– SuperdomeX (8 blade / 16 sockets)
– RHEL 7.1
– One Oracle Instance per Docker container per SDx blade
– Performance is far lower when using Dockers than in the non-Docker environment

25
Demo #4

26
Case Study #4
Dockers
– Conclusion
– Hard interrupts (hardirq) for network cards spread out across all CPUs in node 0
– Soft interrupts occurred on all CPUs in node 0, BUT wakeups forwarded to CPU 244
– Soft interrupts saturate CPU 244
– Since ALL Docker containers share the same network card, the performance of all Dockers is impacted
– Potential Solutions (choose one; see the command sketch after this list)
– Change the Docker network model from --net=bridge to --net=host
– Implement Receive Packet Steering (RPS) and direct soft interrupts to all CPUs in node 0
– Configure one network card on appropriate blade for each Docker Container
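A hedged sketch of the first two remedies. docker run --net=host is standard Docker syntax; the container image, the eth0 device, the rx-0 queue, and the CPU mask are illustrative, and the mask would need to cover the node 0 CPUs on this SDx system:

# Option 1: use the host network stack instead of the Docker bridge
$ docker run --net=host oracle-image ...

# Option 2: enable Receive Packet Steering so softirq work is spread across the node 0 CPUs
$ echo <node0_cpu_mask> > /sys/class/net/eth0/queues/rx-0/rps_cpus    # repeat for each rx queue on the card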

27
Visualization Enhancements
How-to and getting started guide

28
Visualization Charts
Getting started - kiall -V

When KI data is processed using "kiall -V" or "kiall -r -V", the kparse report (kp.<timestamp>.html) will have PID numbers linked to the PID-specific visualizations. Other stand-alone charts are also created.

Section 8.3 will have links to the other system-wide Visualization charts.

A good starting point for a system-wide overview is the Server Activity Timeline chart.
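A minimal sketch of generating and viewing the charts, using only the flags named above (the report timestamp is illustrative):

$ kiall -V                 # process the KI data and generate the kparse report plus visualization charts
$ kiall -r -V              # alternate invocation from the slide
# open kp.<timestamp>.html in a browser; PID numbers link to the PID-specific charts,
# and section 8.3 links to the system-wide charts such as the Server Activity Timeline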

29
Kparse (kp.<ts>.html) PID links

Although not shown here, below these charts you'll find the usual kipid text report.

30
System-wide Activity Timeline

Choose encoding metrics (44+) for color and height.
Right click/drag on upper timeline to zoom/span.
Right click on lower timeline for popup with detail and text report links.

31
PID Activity Timeline
Choose encoding of color and height.
Right click/drag on upper timeline to zoom/span.
Right click on lower timeline for popup with detail and text report links.

Interval scheduling timeline link


32
Task Scheduling Timeline

Right click and drag to zoom/pan

33
Task Scheduling Timeline

Right click on expanded portion for context popup with record details and links for text file creation

34
Other charts remain unchanged

Network Chord chart: network.html
Disk and Futex scatter charts: kidsk_scatter.html, futex_scatter.html
Generic CSV viewer charts: kidsk.html, kifile.html, kipid_io.html & kipid_sched.html, kirunq.html, kiwait.html
35
The future of LinuxKI
“To Infinity, and Beyond” - Buzz Lightyear - Space Ranger

36
The future of LinuxKI
Planned features
–Include CPU counters in sched_switch records
– Calculate Last Level Cache (LLC) hit rate on a per-PID and per-CPU basis
– Calculate Cycles Per Instruction (CPI) on a per-PID and per-CPU basis
– Determine Turbo Boost ratio
– Other metrics are under investigation
–Docker metrics
– Show CPU usage per Docker container
– Top tasks using CPU on each Docker
–Logical I/O by device/volume

37
The future of LinuxKI

–Analyze new workloads


– NVDIMM
– High Performance Computing
– The Machine
– Software Defined Storage
–New uses for LinuxKI
– Data mining / Machine learning
– Integrate LinuxKI with application tracing

38
Linux KI Toolset
For more information
–Linux KI Masterclass
http://intranet.hp.com/tsg/WW2/CPT/Linux/LinuxKIMasterClass/LinuxKIMasterclass.aspx
–Technical Seminars
http://intranet.hp.com/tsg/WW2/CPT/Linux/Pages/Training.aspx
– The Linux KI Toolset - Redefining Performance Analysis on Linux (May 2013)
– Linux KI Toolset v2.0 (Dec 2013)
– Linux KI Toolset v3.0 (March 2015)
–Performance Articles
http://intranet.hp.com/tsg/WW2/CPT/Linux/Pages/PerformanceArticles.aspx
– Poor Direct I/O read performance using XFS
– ksoftirqd using 100% CPU on RHEL 7.1 and 7.2
– Poor Oracle LogWriter performance using XFS
– Understanding EMC ScaleIO Architecture using LinuxKI
Thank You!

40
Backup Slides
“In case of fire, break glass”

41
Demo #1

42
g - Global Task List

43
? - Help

44
l - Global Node Stats

45
c - Global CPU Stats

46
C - Select CPU Stats

47
h - Global HT CPU Stats

48
i - Global IRQ Stats

49
f - Global File Stats

50
n - Global Socket Stats

51
u - Global Futex Stats

52
X - Select Futex Stats

53
d - Global Disk Stats

54
T - Select Disk Stats

55
s - Select Task

56
P - Task Profile Stats

57
W - Task Wait Stats

58
O - Task Coop Stats

59
L - Task System Calls

60
F - Task File Stats

61
Demo #2

62
g - Global Task List

63
s - Select Task

64
O - Task Coop Stats

65
Demo #3

66
g - Global Task List

Will discuss this in the next demo

67
t - Global IO by PID

Read performance is bad, so let's pick one and hope we get lucky.

68
s - Select Task

Looks like heavy mutex contention

I/O service times look good!

Logical I/O looks bad!

69
F - Task File Stats

Here is the file with mutex contention

70
W - Task Wait Stats

Need to check source to see why there is lock contention in xfs_file_aio_read()

71
If Direct I/O is used and there are
pages in the file/page cache, then
the shared lock is dropped and
the exclusive lock is obtained.

72
Demo #4

73
g - Global Task List

ksoftirqd/244 is pretty busy

74
l - Global Node Stats

NUMA node 0 is doing the most interrupt processing

75
s - Select Node Stats

CPU 244 is saturated

76
C - Select CPU Stats

2.042 msecs per softirq

77
C - Select CPU Stats

0.011 msecs per softirq

78
