LinuxKI 4.1 CURSO
LinuxKI is ALIVE!!!
Mark C. Ray, Global Solutions Engineering
The traditional approach:
– Infer root cause based on system-level statistics
– Look for top CPU users, high Disk I/O, Network activity, Memory Usage
– Apply popular tunables
– Upgrade patches, firmware, OS
– Check Google, Bugzilla
– Read the whitepapers
– Apply best practices
– Shotgun changes
– Guess!
The results:
– Sometimes we get lucky!
– Often we don't
– Misses a whole class of problems
– Customers frustrated by the "try this, try that" approach
– You waste time
– You lose credibility
– Customer loses $$$
– Resulting in lost business for HPE
The need
– A new approach for demanding mission-critical customers and the new style of IT
– Take out the guesswork!
– An application-centric, systematic approach to performance tuning and troubleshooting
– Make troubleshooting and tuning less complex for customers and internal HPE engineers
– Reduce time-to-resolution for critical performance issues
– Effectively tune customer, PoC, and benchmark systems in order to compete in the market
What is the LinuxKI Toolset?
LinuxKI Toolset
How does it work?
1 runki (kiinfo -likidump): static LiKI tracepoints collect data from performance-sensitive parts of the kernel into per-CPU trace buffers; the trace data is streamed to disk as a KI dump, or can be analyzed in real time.
2 ftp: for KI dumps, the configuration data and trace data are packaged as ki_all.<hostname>.<timestamp>.tgz and ftp'ed to HPE.
3 kiall (kiinfo -kiall): the kiinfo program reads the binary trace data and generates the LinuxKI reports.

Online analysis (real time, against the live trace buffers):
kiinfo -kipid -a <secs>
kiinfo -kidsk -a <secs>
kiinfo -kitrace pid=63721 -a <secs>
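A minimal end-to-end sketch of the flow above (only the commands shown on the slides are used; default options are assumed):

$ runki                  # step 1: collect a KI dump, producing ki_all.<hostname>.<timestamp>.tgz
$ kiall                  # step 3: post-process the dump into the LinuxKI reports

Or analyze the live system directly, e.g. per-PID activity over a 10-second interval (the 10 is an example value for -a <secs>):

$ kiinfo -kipid -a 10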
LinuxKI Toolset
A Brief History
– 1.0 - Initial Release 03/28/2013
– Runki data collection and kiall post processing
– Support for RHEL/SLES
– 2.0 - 10/11/2013
– Expanded Linux distribution support (included Ubuntu, CentOS, OEL, etc.)
– Single rpm or deb package
– No password for HP physical servers
– IRQ reporting, SCSI trace records
– Online Analysis with kiinfo reporting
– Generate optional CSV formatted files
LinuxKI Toolset
A Brief History
– 3.0 - 03/12/2014
– Cluster reporting enhancements (clparse, CMU integration)
– Visualization charts and graphs (JSON files)
– Monthly passwords for non-HP physical servers
– User stack traces and symbol table lookups
– io_submit/io_getevents syscall enhancements
– IP:Port addresses on Network system calls
– Included LiKI DLKM source for building the DLKM on a wide variety of systems
– 4.0 - 12/07/2015
– PID/TGID/CPU/DEV filtering during data collection (LiKI DLKM)
– Stealtime reporting for VM Guests (KVM only for now)
– Reporting of inode/dev for filesystem system calls
– System call filtering
LinuxKI Toolset version 4.1
– 4.1 - 03/04/2016
– Curses based user interface
– New PID timeline and Task Scheduler timeline visualization
– Support through 4.2.0-16 Linux kernels
– Reduced memory usage and fixed memory leaks
– Add Java thread names in KI reports when jstack output is collected
– runki script no longer collects perf data, sar data, or collectl/MW logs by default (use -M -U -X)
– Identify UDP and IPv6 socket related system calls
– Bug fixes, bug fixes, and more bug fixes
– ARM support???
Where to use Linux KI?
If it runs Linux...
– Supported on RHEL, SLES, Ubuntu, Fedora, CentOS, OEL, UEK, hlinux, kylin, l4tm
– Mission Critical systems - Oracle, DB2, SAP HANA, Java, banking, telecom, manufacturing, web servers
– New Style of IT
– Virtualization - KVM, VMware
– Cloud - Helion/OpenStack
– Big Data - Hadoop, Vertica, Cassandra
– High Performance Computing
– Emerging Technologies
– Docker
– NVDIMM
– The Machine
LinuxKI is ALIVE!!!
Live Demo #1
Online Analysis
LinuxKI Toolset
Curses Based User Interface
kiinfo -live -a 5 reads the kernel's per-CPU trace buffers and analyzes them in real time.
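A minimal usage sketch (root privileges are assumed, since LiKI kernel tracing must be enabled; per the other online-analysis examples, the -a value is the interval in seconds):

$ kiinfo -live -a 5

Once running, single-key commands select the views (g for the Global Task List, d for Global Disk Stats, etc.; see the backup slides for the full list).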
Demo #1
Live Demo #2
Cassandra / Java
LinuxKI Toolset
Curses Based User Interface
The collection flow is the same as before: 1 runki (kiinfo -likidump) writes the kernel's per-CPU trace buffers to a KI dump, 2 the ki_all.<hostname>.<timestamp>.tgz archive is ftp'ed, and 3 kiall (kiinfo -kiall) generates the LinuxKI reports. In addition, kiinfo -live -ts 0214_0123 replays a previously collected dump in the curses interface.
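For offline analysis, the same interface can replay a collected dump; a minimal sketch (0214_0123 is the example timestamp from the slide, i.e. the <timestamp> portion of the ki_all archive name):

$ kiinfo -live -ts 0214_0123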
Demo #2 - Cassandra/Java
Using jstack to get Java thread names
– Collect a LinuxKI dump and include the Java jstack output
$ runki -j [-J path_to_jstack]
– The primary purpose of this demo is to show Java thread names (see the example below)
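For example (the -j and -J options are from the slide above; the jstack path is hypothetical and depends on the installed JDK):

$ runki -j -J /usr/lib/jvm/java-8-openjdk/bin/jstack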
Demo #2
Case Study #3
Docker, Part 1
Demo #3 - Docker
Poorly performing Docker containers
– HPE internal Docker testing
– SuperdomeX (8 blades / 16 sockets)
– RHEL 7.1
– One Oracle instance per Docker container per SDx blade
– 2 of the 8 Docker containers showed very bad Oracle read times:
Event                           Waits     Total Wait Time (sec)  Wait Avg (ms)  % DB time  Wait Class
db file scattered read          126,247   13.5K                  107.05         48.0       User I/O
db file sequential read         128,698   13.5K                  104.90         48.0       User I/O
enq: TX - row lock contention   3,879     869.1                  224.05         3.1        Application
DB CPU                                    322.3                                 1.1
Demo #3
Case Study #3
Docker
– Conclusion
– There were pages in the file/page cache when Direct I/O was performed
– The pages overlapping the I/O request MUST be kicked out of the cache
– The pages were likely brought into the cache by non-Oracle programs, such as tar, dd, cp, etc.
– Potential Solutions (choose one; see the verification sketch below)
– Drop all the pages from the page cache (as root):
$ echo 1 > /proc/sys/vm/drop_caches
– Unmount and re-mount the filesystem
– Install RHEL 7.2
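A minimal sketch for verifying the first solution (as root; "Cached" in /proc/meminfo is system-wide, so this is only a rough check and is an illustration rather than anything from the slides):

$ grep ^Cached /proc/meminfo           # page-cache size before
$ echo 1 > /proc/sys/vm/drop_caches    # drop clean page-cache pages
$ grep ^Cached /proc/meminfo           # should fall sharply if cached datafile pages were present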
Case Study #4
Docker, Part 2
Demo #4 - Docker
Poor Docker performance compared to a non-Docker environment
– HPE internal Docker testing
– SuperdomeX (8 blades / 16 sockets)
– RHEL 7.1
– One Oracle instance per Docker container per SDx blade
– Performance is far lower when using Docker than in the non-Docker environment
Demo #4
Case Study #4
Docker
– Conclusion
– Hard interrupts (hardirq) for the network cards were spread out across all CPUs in node 0
– Soft interrupts occurred on all CPUs in node 0, BUT wakeups were forwarded to CPU 244
– Soft interrupts saturated CPU 244
– Since ALL Docker containers share the same network card, the performance of every container is impacted
– Potential Solutions (choose one; see the RPS sketch below)
– Change the Docker network model from --net=bridge to --net=host
– Implement Receive Packet Steering (RPS) and direct soft interrupts to all CPUs in node 0
– Configure one network card on the appropriate blade for each Docker container
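A minimal sketch of the RPS option (as root; the interface name, queue number, and CPU mask are assumptions and must be chosen to match the NIC and the node-0 CPUs of the system in question):

$ echo ffff > /sys/class/net/eth4/queues/rx-0/rps_cpus   # let RPS steer receive processing to CPUs 0-15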
Visualization Enhancements
How-to and getting started guide
Visualization Charts
Getting started: kiall -V
When KI data is processed using "kiall -V" or "kiall -r -V", the kparse report (kp.<timestamp>.html) will have PID numbers linked to the PID-specific visualizations. Other stand-alone charts are also created.
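For example, to process a dump (or re-process an existing one with -r) and generate the visualization files along with the reports:

$ kiall -V
$ kiall -r -V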
Kparse (kp.<ts>.html) PID links
System-wide Activity Timeline
PID Activity Timeline
Right-click and drag on the upper timeline to zoom/span.
Right-click on the lower timeline for a popup with details.
Choose the encoding for color and height, and text report links.
Task Scheduling Timeline
Other charts remain unchanged
Network Chord chart: network.html
Disk and Futex scatter charts: kidsk_scatter.html, futex_scatter.html
Generic CSV viewer charts: kidsk.html, kifile.html, kipid_io.html & kipid_sched.html, kirunq.html, kiwait.html
The future of LinuxKI
“To Infinity, and Beyond” - Buzz Lightyear - Space Ranger
The future of LinuxKI
Planned features
–Include CPU counters in sched_switch records
– Calculate Last Level Cache (LLC) hit rate on a per-PID and per-CPU basis
– Calculate Cycles Per Instruction (CPI) on a per-PID and per-CPU basis
– Determine Turbo Boost ratio
– Other metrics are under investigation
–Docker metrics
– Show CPU usage per Docker container
– Top tasks using CPU on each Docker container
–Logical I/O by device/volume
Linux KI Toolset
For more information
–Linux KI Masterclass
http://intranet.hp.com/tsg/WW2/CPT/Linux/LinuxKIMasterClass/LinuxKIMasterclass.aspx
–Technical Seminars
http://intranet.hp.com/tsg/WW2/CPT/Linux/Pages/Training.aspx
– The Linux KI Toolset - Redefining Performance Analysis on Linux (May 2013)
– Linux KI Toolset v2.0 (Dec 2013)
– Linux KI Toolset v3.0 (March 2015)
–Performance Articles
http://intranet.hp.com/tsg/WW2/CPT/Linux/Pages/PerformanceArticles.aspx
– Poor Direct I/O read performance using XFS
– ksoftirqd using 100% CPU on RHEL 7.1 and 7.2
– Poor Oracle LogWriter performance using XFS
– Understanding EMC ScaleIO Architecture using LinuxKI
Thank You!
Backup Slides
“In case of fire, break glass”
Demo #1
g - Global Task List
? - Help
l - Global Node Stats
c - Global CPU Stats
C - Select CPU Stats
h - Global HT CPU Stats
i - Global IRQ Stats
f - Global File Stats
n - Global Socket Stats
u - Global Futex Stats
X - Select Futex Stats
d - Global Disk Stats
T - Select Disk Stats
s - Select Task
P - Task Profile Stats
W - Task Wait Stats
O - Task Coop Stats
L - Task System Calls
F - Task File Stats
Demo #2
g - Global Task List
s - Select Task
O - Task Coop Stats
Demo #3
g - Global Task List
t - Global IO by PID
s - Select Task
F - Task File Stats
W - Task Wait Stats
Note: If Direct I/O is used and there are pages in the file/page cache, then the shared lock is dropped and the exclusive lock is obtained.
Demo #4
g - Global Task List
l - Global Node Stats
s - Select Node Stats
C - Select CPU Stats