
HPC debugging

Victor Eijkhout

2022

Eijkhout: programming
Defensive programming
Debugging
Memory debugging
Parallel Debugging

Profiling and debugging; optimization and programming strategies.


1 Analysis basics

• Measurements: repeated and controlled
  (beware of transients; do you know where your data is?)
• Document everything
• Script everything
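
One way to make a measurement "repeated and controlled" is to run the kernel a few times untimed, so that transients such as cold caches and page faults are out of the way, and then report the fastest of several timed runs. The following is a minimal sketch under that assumption. The names `wall_seconds`, `sum_kernel`, `sum_args`, and `time_min` are hypothetical, and `clock_gettime` assumes a POSIX system.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* assumption: POSIX, for clock_gettime */
#endif
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Wall-clock time in seconds, read from a monotonic clock. */
double wall_seconds(void) {
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
}

/* Hypothetical kernel under test: sum an array of n doubles. */
typedef struct {
  const double *data;
  size_t n;
  double result;
} sum_args;

void sum_kernel(void *p) {
  sum_args *a = (sum_args *)p;
  double s = 0.;
  for (size_t i = 0; i < a->n; i++)
    s += a->data[i];
  a->result = s;
}

/* Run `kernel` `warmup` times untimed, then `repeats` times timed;
   return the fastest timed run in seconds. */
double time_min(void (*kernel)(void *), void *arg, int warmup, int repeats) {
  assert(kernel != NULL && warmup >= 0 && repeats > 0);
  for (int i = 0; i < warmup; i++)
    kernel(arg);
  double best = -1.;
  for (int i = 0; i < repeats; i++) {
    double t0 = wall_seconds();
    kernel(arg);
    double t = wall_seconds() - t0;
    if (best < 0. || t < best)
      best = t;
  }
  return best;
}
```

A call such as `time_min(sum_kernel, &args, 2, 10)` would then give one number per configuration. Putting that call in a script together with the compiler flags used covers the "document" and "script" bullets as well.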


2 Compiler options

• Defaults are a starting point
• Use reporting options: -opt-report, -vec-report
  useful to check whether an optimization happened or could not happen
• Test numerical correctness before and after changing optimization options
  (there are compiler options for numerical correctness)


3 Optimization basics

• Use libraries when possible: don’t reinvent the wheel
• Premature optimization is the root of all evil (Knuth)


4 Code design for performance

• Keep inner loops simple: no conditionals, function calls, casts
• Avoid small functions: try macros or inlining
• Keep in mind all the cache, TLB, SIMD stuff from before
• SIMD: Fortran array syntax helps
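
As an illustration of "no conditionals in inner loops", the sketch below shows a loop with a branch in its body next to a version where the decision is made once, outside the loop. The function names `axpy_branchy` and `axpy_hoisted` are hypothetical. An optimizing compiler may perform this transformation on its own, so treat it as an illustration of the idea rather than a guaranteed speedup.

```c
#include <assert.h>
#include <stddef.h>

/* Conditional inside the loop body: tested on every iteration. */
void axpy_branchy(size_t n, double a, const double *x, double *y, int subtract) {
  assert(x != NULL && y != NULL);
  for (size_t i = 0; i < n; i++) {
    if (subtract)
      y[i] -= a * x[i];
    else
      y[i] += a * x[i];
  }
}

/* Same result, but the decision is taken once, before the loop.
   The remaining loop body is a single multiply-add. */
void axpy_hoisted(size_t n, double a, const double *x, double *y, int subtract) {
  assert(x != NULL && y != NULL);
  const double s = subtract ? -a : a;
  for (size_t i = 0; i < n; i++)
    y[i] += s * x[i];
}
```

The hoisted loop has a body with no control flow, which is the shape that vectorization reports tend to accept.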


5 Multicore / multithread

• Use numactl: prevent process migration
• ‘first touch’ policy: allocate data where it will be used
• Scaling behaviour mostly influenced by bandwidth
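
The ‘first touch’ bullet can be sketched as follows, assuming an OpenMP compiler and an operating system whose default policy places a memory page near the thread that first writes to it (typical on Linux, but an assumption here). The helper names `alloc_first_touch` and `sum_array` are hypothetical. The point is that the initialization loop uses the same static schedule as the later compute loop, so each thread touches the part of the array it will work on.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocate n doubles and initialize them in parallel, so that each
   thread is the first to write the pages it will later compute on. */
double *alloc_first_touch(size_t n) {
  double *a = malloc(n * sizeof *a);
  if (a == NULL)
    return NULL;
  #pragma omp parallel for schedule(static)
  for (long i = 0; i < (long)n; i++)
    a[i] = 0.;
  return a;
}

/* Compute loop with the same static schedule as the initialization. */
double sum_array(const double *a, size_t n) {
  assert(a != NULL);
  double s = 0.;
  #pragma omp parallel for schedule(static) reduction(+:s)
  for (long i = 0; i < (long)n; i++)
    s += a[i];
  return s;
}
```

Without OpenMP enabled the pragmas are ignored and the code runs sequentially, which is convenient for debugging. Initializing the array in a sequential loop instead would put all pages near one thread.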


6 Multinode performance

• Influenced by load balancing
• Use HPCToolkit, Scalasca, TAU for plotting
• Explore ‘eager’ limit (mvapich2: environment variables)


7 Classes of programming errors

Logic errors:
functions behave differently from how you thought,
or interact in ways you didn’t envision

Hard to debug


8 More classes of errors

Coding errors:
send without receive
forget to allocate buffer

Debuggers can help


Defensive programming


9 Defensive programming

• Keep It Simple (‘restrict expressivity’)
• Example: use a collective instead of spelling it out
• Easier to write / harder to get wrong;
  the library and runtime are likely to be better at optimizing than you


10 Memory management

Beware of memory leaks:
keep allocation and free in the same lexical scope

C++ does this automatically with RAII
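
In C there is no RAII, so the pairing has to be done by hand. A minimal sketch of the convention, with hypothetical function names `mean_of_squares` and `run_analysis`: the routine that does the work never allocates, and the caller allocates and frees the workspace within one scope, where the pairing is easy to check by eye.

```c
#include <assert.h>
#include <stdlib.h>

/* Does no allocation: the caller owns `work` (n doubles of scratch space). */
double mean_of_squares(const double *x, double *work, size_t n) {
  assert(x != NULL && work != NULL && n > 0);
  double s = 0.;
  for (size_t i = 0; i < n; i++) {
    work[i] = x[i] * x[i];
    s += work[i];
  }
  return s / (double)n;
}

/* malloc and free are visibly paired in the same scope.
   Returns 0 on success, -1 on bad input or allocation failure. */
int run_analysis(const double *x, size_t n, double *result) {
  if (x == NULL || result == NULL || n == 0)
    return -1;
  double *work = malloc(n * sizeof *work);
  if (work == NULL)
    return -1;
  *result = mean_of_squares(x, work, n);
  free(work);
  return 0;
}
```

With this layout a leak can only come from an early exit between the `malloc` and the `free`, and there are few enough lines between them to check each one.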


11 Modular design

Design for debuggability, also easier to optimize

Separation of concerns: try to keep code aspects separate

Premature optimization is the root of all evil (Knuth)


12 MPI performance design

Be aware of latencies: bundle messages
(this may go against separation of concerns)

Consider ‘eager limit’

Process placement, reduction in number of processes


Debugging


13

Debugging is like being the detective in a crime movie where you are also the murderer. (Filipe Fortes, 2013)

What do you do when your program misbehaves?

• Insert print statements, recompile, run again.
• Run your program in a debugger
• (also: attach a debugger, inspect a core dump)


14 Simple example: listing

tutorials/gdb/c/hello.c

#include <stdlib.h>
#include <stdio.h>
int main() {
  printf("hello world\n");
  return 0;
}


15 Simple example: running


%% cc -g -o hello hello.c
# regular invocation:
%% ./hello
hello world
# invocation from gdb:
%% gdb hello
GNU gdb 6.3.50-20050815 # ..... [version info]
Copyright 2004 Free Software Foundation, Inc. .... [copyright info
(gdb) run
Starting program: /home/eijkhout/tutorials/gdb/hello
Reading symbols for shared libraries +. done
hello world

Program exited normally.


(gdb) quit
%%

16 Source listing

%% cc -o hello hello.c
%% gdb hello
GNU gdb 6.3.50-20050815 # ..... version info
(gdb) list

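Important to use the -g compile option! Without it, as in the compile line above, the executable carries no debug information and gdb cannot list the source.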

17 Run with arguments

tutorials/gdb/c/say.c

#include <stdlib.h>
#include <stdio.h>
int main(int argc,char **argv) {
  int i;
  for (i=0; i<atoi(argv[1]); i++)
    printf("hello world\n");
  return 0;
}

%% gdb say
.... the usual messages ...
(gdb) run 2
Starting program: /home/eijkhout/tutorials/gdb/c/say 2
Reading symbols for shared libraries +. done
hello world
hello world

18 Memory problems 1

// square.c
int nmax,i;
float *squares,sum;

fscanf(stdin,"%d",nmax);
for (i=1; i<=nmax; i++) {
  squares[i] = 1./(i*i); sum += squares[i];
}
printf("Sum: %e\n",sum);

%% cc -g -o square square.c
%% ./square
5000
Segmentation fault

The debugger will stop at the problem.



19 Stack trace

Displaying a stack trace

gdb            lldb
(gdb) where    (lldb) thread backtrace

(gdb) backtrace
#0 0x00007fff824295ca in __svfscanf_l ()
#1 0x00007fff8244011b in fscanf ()
#2 0x0000000100000e89 in main (argc=1, argv=0x7fff5fbfc7c0) at sq


20 Inspecting a stack frame

Investigate a specific frame

gdb        lldb
frame 2    frame select 2

Then print variables and such.


21 Out-of-bounds errors

// up.c
int nlocal = 100,i;
double s, *array = (double*) malloc(nlocal*sizeof(double));
for (i=0; i<nlocal; i++) {
  double di = (double)i;
  array[i] = 1/(di*di);
}
s = 0.;
for (i=nlocal-1; i>=0; i++) {
  double di = (double)i;
  s += array[i];
}


22 Out of bounds in debugger

Program received signal EXC_BAD_ACCESS, Could not access memo
Reason: KERN_INVALID_ADDRESS at address: 0x0000000100200000
0x0000000100000f43 in main (argc=1, argv=0x7fff5fbfe2c0) at u
15 s += array[i];
(gdb) print array
$1 = (double *) 0x100104d00
(gdb) print i
$2 = 128608
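
The value of i printed by the debugger, far larger than nlocal, points at the second loop of up.c: it starts at nlocal-1 but increments, so the test i>=0 never fails and the index runs off the end of the array until it reaches unmapped memory. A corrected sketch follows. The function name `sum_backwards` is hypothetical, and starting the denominators at 1 instead of 0, to avoid the division by zero in the first element, is an assumption about the intent.

```c
#include <assert.h>
#include <stdlib.h>

/* Fill an array with 1/k^2 for k = 1..nlocal, then sum it from the
   last element down to the first. Returns -1 on allocation failure. */
double sum_backwards(int nlocal) {
  assert(nlocal > 0);
  double *array = malloc((size_t)nlocal * sizeof *array);
  if (array == NULL)
    return -1.;
  for (int i = 0; i < nlocal; i++) {
    double di = (double)(i + 1); /* assumption: start at 1, not 0 */
    array[i] = 1. / (di * di);
  }
  double s = 0.;
  for (int i = nlocal - 1; i >= 0; i--) /* i--: the index moves toward 0 */
    s += array[i];
  free(array);
  return s;
}
```

Rerunning under the debugger with a breakpoint on the summation line and printing i for a few iterations is a quick way to confirm that the index now decreases.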


23 Breakpoints

Set a breakpoint at a line

gdb               lldb
break foo.c:12    breakpoint set [ -f foo.c ] -l 12


24 Stepping

Stepping through a program

gdb     lldb    meaning
run             start a run
cont            continue from breakpoint
next            next statement on same level
step            next statement, this level or next


Memory debugging

25 Program with problems

tutorials/gdb/c/square1.c

#include <stdlib.h>
#include <stdio.h>
//codesnippet gdbsquare1c
int main(int argc,char **argv) {
  int nmax,i;
  float *squares,sum;

  fscanf(stdin,"%d",&nmax);
  squares = (float*) malloc(nmax*sizeof(float));
  for (i=1; i<=nmax; i++) {
    squares[i] = 1./(i*i);
    sum += squares[i];
  }
  printf("Sum: %e\n",sum);
//codesnippet end

  return 0;
}

26 Valgrind output

%% valgrind square1
==53695== Memcheck, a memory error detector
==53695== [stuff]
10
==53695== Invalid write of size 4
==53695== at 0x100000EB0: main (square1.c:10)
==53695== Address 0x10027e148 is 0 bytes after a block of si
==53695== at 0x1000101EF: malloc (vg_replace_malloc.c:236)
==53695== by 0x100000E77: main (square1.c:8)
==53695==
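
The "Invalid write of size 4 ... 0 bytes after a block" message says that a float was written directly behind the malloc'ed block. That matches the loop in square1.c: the block holds nmax floats with valid indices 0 to nmax-1, but the loop runs i from 1 to nmax and so writes squares[nmax]. A corrected sketch, wrapped in a hypothetical function `sum_inverse_squares` so that it can be tested without reading stdin; it also initializes sum, which square1.c leaves uninitialized.

```c
#include <assert.h>
#include <stdlib.h>

/* Store 1/i^2 for i = 1..nmax and add the terms up.
   Returns 0 on success, -1 on bad input or allocation failure. */
int sum_inverse_squares(int nmax, float *sum_out) {
  if (nmax <= 0 || sum_out == NULL)
    return -1;
  float *squares = malloc((size_t)nmax * sizeof *squares);
  if (squares == NULL)
    return -1;
  float sum = 0.f; /* initialized before accumulating */
  for (int i = 1; i <= nmax; i++) {
    /* term i is stored at index i-1, so indices stay in 0..nmax-1;
       converting to float first avoids integer overflow in i*i */
    squares[i - 1] = 1.f / ((float)i * (float)i);
    sum += squares[i - 1];
  }
  free(squares);
  *sum_out = sum;
  return 0;
}
```

Running the corrected program under valgrind again is a reasonable check that the invalid write is gone.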


Parallel Debugging


27 Debugging

I assume you know about gdb and valgrind...

• Interactive use of gdb, starting up multiple xterms:
  feasible on small scale
• Use gdb to inspect a core dump:
  can be useful, but often a program crashes hard and leaves no dump

Note: compile with options -g -O0

28 Parallel debuggers


29 Buggy code

for (it=0; ; it++) {
  double randomnumber = ntids * ( rand() / (double)RAND_MAX )
  printf("[%d] iteration %d, random %e\n",mytid,it,randomnumb
  if (randomnumber>mytid && randomnumber<mytid+1./(ntids+1))
    MPI_Finalize();
  MPI_Barrier(comm);
}

30 Parallel inspection


31 Stack trace


32 Variable inspection
