Multi Threading
Rajkumar Buyya
School of Computer Science and Software Engineering Monash Technology Melbourne, Australia Email: [email protected] URL: http://www.dgs.monash.edu.au/~rajkumar
Objectives
Explain parallel computing, from architecture, OS, and programming paradigms through to applications
Explain the multithreading paradigm and all aspects of how to use it in an application
Cover all basic MT concepts
Explore issues related to MT
Contrast Solaris, POSIX, and Java threads
Look at the APIs in detail
Examine some Solaris, POSIX, and Java code examples
Debate on MPP and Cluster Computing
2
Agenda
Overview of Computing
Operating Systems Issues
Threads Basics
Multithreading with Solaris and POSIX threads
Multithreading in Java
Distributed Computing
Grand Challenges
Solaris, POSIX, and Java example code
Computing Elements
(Figure: computing elements: applications run as processes and threads executing on processors.)
Parallel Era
An ancient calculating tablet has three calculating positions. From these multiple positions we can infer they were used for reliability and/or speed.
Motivating Factors
Just as we learned to fly, not by constructing a machine that flaps its wings like birds, but by applying aerodynamic principles demonstrated by nature...
Motivating Factors
The aggregated speed with which the brain carries out complex calculations, even though an individual neuron's response is slow (ms), demonstrates the feasibility of parallel processing.
8
Computational requirements are ever increasing: visualization, distributed databases, simulations, scientific prediction (earthquakes), etc.
Sequential architectures are reaching physical limitations (speed of light, thermodynamics).
9
Technical Computing
Solving technology problems using computer modeling, simulation and analysis
Geographic Information Systems Life Sciences Aerospace
10
(Graph: computational power improvement (C.P.I.) versus number of processors, for a multiprocessor and a uniprocessor.)
11
(Graph: vertical versus horizontal growth plotted against age, 5 to 45.)
12
Hardware improvements like pipelining, superscalar execution, etc., are non-scalable and require sophisticated compiler technology. Vector processing works well only for certain kinds of problems.
14
Parallel processing: multiple processes are active simultaneously, solving a given problem, generally on multiple processors. Communication and synchronization between these processes form the core of parallel programming effort.
15
Processing Elements
Simple classification by Flynn:
(No. of instruction and data streams)
SISD - conventional
SIMD - data parallel, vector computing
MISD - systolic arrays
MIMD - very general, multiple approaches
Current focus is on MIMD model, using general purpose processors. (No shared memory)
17
SISD: A Conventional Computer
(Figure: data input flows through a single processor to data output.)
Speed is limited by the rate at which the computer can transfer information internally.
Ex: PC, Macintosh, workstations
18
MISD Architecture
(Figure: instruction streams A, B, C drive processors A, B, C operating on a single data stream.)
More of an intellectual exercise than a practical configuration; few have been built, and none is commercially available.
19
SIMD Architecture
(Figure: a single instruction stream is broadcast to processors A, B, C, each operating on its own data stream.)
20
MIMD Architecture
(Figure: independent instruction streams A, B, C drive processors A, B, C.)
Unlike SISD and MISD, a MIMD computer works asynchronously.
Shared memory (tightly coupled) MIMD
Distributed memory (loosely coupled) MIMD
21
Shared Memory MIMD
(Figure: processors A, B, C access a global memory over memory buses.)
22
Distributed Memory MIMD
(Figure: processors A, B, C, each with its own memory system and memory bus, connected through an IPC channel.)
Communication: IPC over a high-speed network. The network can be configured as a tree, mesh, cube, etc.
Unlike shared-memory MIMD, it is easily/readily expandable.
Highly reliable (failure of any one CPU does not affect the whole system).
23
Laws of caution.....
Speed of a computer is proportional to the square of its cost (speed = cost²).
(Graph: speed S versus cost.)
24
Caution....
Very fast development in parallel processing and related areas has blurred concept boundaries, causing a lot of terminological confusion: concurrent computing/programming, parallel computing/processing, multiprocessing, distributed computing, etc.
25
Caution....
27
Caution....
There are no strict delimiters for contributors to the area of parallel processing: computer architecture, operating systems, high-level languages, databases, and computer networks all have a role to play. This makes it a hot topic of research.
28
Q
Please
30
High Performance Computing

function1( ) { //......function stuff }
function2( ) { //......function stuff }

Parallel machine (MPP: a massively parallel system containing thousands of CPUs):
function1( ) || function2( )
Time: max(t1, t2)
31

Serial machine (single CPU):
function1( ); function2( );
Time: add(t1, t2)
32
CPU
33
Multi-threading, continued...
Multi-threaded OS enables parallel, scalable I/O
(Figure: multiple applications issue requests through a multithreaded OS kernel, which runs them across several CPUs.)
Multiple, independent I/O requests can be satisfied simultaneously because all the major disk, tape, and network drivers have been multithreaded, allowing any given driver to run on multiple CPUs simultaneously.
34
(Figure: traditional processes, each with its own stack, data, and text segments.)
35
(Figure: a running process containing multiple threads, each with its own thread stack.)
Processes are independent executables; threads are all parts of a single process, hence communication between them is easier and simpler.
37
Levels of Parallelism
Code granularity (code item):
Large grain (task level): program, e.g., independent tasks i-1, i, i+1
Fine grain (data level): loop, e.g., a(0)=.., a(1)=.., a(2)=.. and b(0)=.., b(1)=.., b(2)=.. computed in parallel
Very fine grain (multiple issue): individual instructions such as a load, exploited by hardware
38
39
Multithreading - Uniprocessors
Concurrency vs. Parallelism
(Figure: concurrency: processes P1, P2, P3 interleaved on a single CPU over time.)
41
Multithreading Multiprocessors
Concurrency vs. Parallelism
(Figure: parallelism: P1, P2, P3 each run simultaneously on its own CPU over time.)
42
Computational Model
(Figure: user-level threads are mapped to virtual processors by the user-level scheduler; virtual processors are mapped to physical processors by the kernel-level scheduler.)
True parallelism: threads-to-processors map = 1:1
43
Process Parallelism
int add(int a, int b, int &result)   // function stuff
int sub(int a, int b, int &result)   // function stuff

pthread_t t1, t2;
pthread_create(&t1, add, a, b, &r1);
pthread_create(&t2, sub, c, d, &r2);
pthread_par(2, t1, t2);

(Figure: one processor executes instruction stream IS1 (add) and another executes IS2 (sub), operating on the data a, b, r1, c, d, r2.)
45
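The calls above are slide pseudocode (pthread_par, for example, is not a standard POSIX call). A minimal runnable sketch of the same function-parallelism idea with the real pthread_create/pthread_join API, using small argument structs of my own, might look like this:

#include <pthread.h>
#include <stdio.h>

struct args { int a, b, result; };

/* worker: add its two inputs */
static void *add(void *p) {
    struct args *x = p;
    x->result = x->a + x->b;
    return NULL;
}

/* worker: subtract its two inputs */
static void *sub(void *p) {
    struct args *x = p;
    x->result = x->a - x->b;
    return NULL;
}

int main(void) {
    struct args r1 = { 5, 3, 0 }, r2 = { 9, 4, 0 };
    pthread_t t1, t2;

    /* the two functions run in parallel: time is max(t1, t2), not t1 + t2 */
    pthread_create(&t1, NULL, add, &r1);
    pthread_create(&t2, NULL, sub, &r2);

    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    printf("add = %d, sub = %d\n", r1.result, r2.result);
    return 0;
}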
Data Parallelism
sort(int *array, int count)
//......
//......

pthread_t thread1, thread2;
pthread_create(&thread1, sort, array, N/2);
pthread_create(&thread2, sort, &array[N/2], N/2);
pthread_par(2, thread1, thread2);

(Figure: two processors execute the same Sort instruction stream, one on data d0..dN/2 and the other on dN/2+1..dN: SIMD-style processing.)
46
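Again, pthread_par is pseudocode. A hedged, runnable sketch of the same data-parallel idea, with each thread running qsort on its own half of the array (a final merge step would still be needed to fully sort it), could be:

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define N 8

struct chunk { int *base; int count; };

static int cmp(const void *a, const void *b) {
    return (*(const int *)a > *(const int *)b) - (*(const int *)a < *(const int *)b);
}

/* worker: sort one chunk of the array (same code, different data) */
static void *sort_chunk(void *p) {
    struct chunk *c = p;
    qsort(c->base, c->count, sizeof(int), cmp);
    return NULL;
}

int main(void) {
    int array[N] = { 7, 3, 9, 1, 8, 2, 6, 4 };
    struct chunk lo = { array, N / 2 };
    struct chunk hi = { array + N / 2, N - N / 2 };
    pthread_t t1, t2;

    pthread_create(&t1, NULL, sort_chunk, &lo);
    pthread_create(&t2, NULL, sort_chunk, &hi);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    /* each half is now sorted; a merge pass would produce the final order */
    for (int i = 0; i < N; i++) printf("%d ", array[i]);
    printf("\n");
    return 0;
}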
Purpose: creation of a new thread of control

Process Model        Threads Model
fork( )              thr_create( )
exec( )              [thr_create( ) builds the new thread and starts the execution]
wait( )              thr_join( )
exit( )              thr_exit( )
47
Code Comparison
Segment (Process):
main( ) { fork( ); fork( ); fork( ); }

Segment (Thread):
main( ) { thread_create(0,0,func( ),0,0); thread_create(0,0,func( ),0,0); thread_create(0,0,func( ),0,0); }
48
Printing Thread
Editing Thread
49
Independent Threads
printing( )
{
  - - - - - - - - - - -
}

editing( )
{
  - - - - - - - - - - -
}

main( )
{
  - - - - - - - - - - -
  id1 = thread_create(printing);
  id2 = thread_create(editing);
  thread_run(id1, id2);
  - - - - - - - - - - -
}
50
(Figure: double buffering using buff[0] and buff[1].)
51
RPC Call
(Figure: a client process making an RPC call to a server process.)
52
Multithreaded Server
(Figure: client processes send requests to a server process, which dispatches them to multiple server threads.)
53
Multithreaded Compiler
(Figure: source code passes through a preprocessor thread and a compiler thread to produce object code.)
54
55
The boss/worker model
(Figure: a boss thread, main( ), reads requests from an input stream and dispatches tasks taskX, taskY, taskZ to worker threads; the workers use program resources such as files, databases, disks, and special devices.)
56
Example
main( )  /* the boss */
{
  forever {
    get a request;
    switch( request ) {
      case X: pthread_create(....,taskX);
      case Y: pthread_create(....,taskY);
      ....
    }
  }
}

taskX( )  /* worker */
{
  perform the task, sync if accessing shared resources
}

taskY( )  /* worker */
{
  perform the task, sync if accessing shared resources
}
....

The runtime overhead of creating a thread per request can be avoided with a thread pool: the boss thread creates all worker threads at program initialization, and each worker thread suspends itself immediately to wait for a wakeup call from the boss.
57
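As an illustration only (not the author's code), here is a minimal boss/worker sketch in POSIX threads, where the boss fakes incoming requests and hands each one to a detached worker thread:

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* worker: handle one request, then exit; a real worker would
   synchronize around any shared resources it touches */
static void *worker(void *arg) {
    int request = (int)(long)arg;
    printf("worker handling request %d\n", request);
    sleep(1);                       /* stand-in for real work */
    return NULL;
}

int main(void) {                    /* the boss */
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    /* detached: the boss never joins its workers */
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);

    for (int request = 0; request < 5; request++) {   /* "forever { get a request; ... }" */
        pthread_t t;
        pthread_create(&t, &attr, worker, (void *)(long)request);
    }

    pthread_attr_destroy(&attr);
    sleep(2);                       /* crude: give the detached workers time to finish */
    return 0;
}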
The peer model
(Figure: peer worker threads taskX, taskY, taskZ each take (static) input and work directly on the program's resources: files, databases, disks, and special devices.)
58
Example
main( )
{
  pthread_create(....,thread1...task1);
  pthread_create(....,thread2...task2);
  ....
  signal all workers to start
  wait for all workers to finish
  do any cleanup
}

task1( )  /* worker */
{
  wait for start
  perform the task, sync if accessing shared resources
}

task2( )  /* worker */
{
  wait for start
  perform the task, sync if accessing shared resources
}
59
A thread pipeline
(Figure: an input stream flows through filter threads in stage 1, stage 2, and stage 3; each stage uses its own resources and special devices.)
60
Example
main( )
{
  pthread_create(....,stage1);
  pthread_create(....,stage2);
  ....
  wait for all pipeline threads to finish
  do any cleanup
}

stage1( )
{
  get next input for the program
  do stage 1 processing of the input
  pass result to next thread in pipeline
}

stage2( )
{
  get input from previous thread in pipeline
  do stage 2 processing of the input
  pass result to next thread in pipeline
}

stageN( )
{
  get input from previous thread in pipeline
  do stage N processing of the input
  pass result to program output
}
61
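The hand-off between stages is the interesting part. Below is a small sketch of my own (not from the slides) of a two-stage pipeline in which stage 1 passes results to stage 2 through a one-slot buffer guarded by a mutex and condition variable:

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cv   = PTHREAD_COND_INITIALIZER;
static int slot, full = 0, done = 0;    /* one-slot buffer between the stages */

static void *stage1(void *arg) {
    for (int i = 1; i <= 5; i++) {
        int result = i * i;              /* "do stage 1 processing" */
        pthread_mutex_lock(&lock);
        while (full) pthread_cond_wait(&cv, &lock);
        slot = result; full = 1;         /* pass result to the next stage */
        pthread_cond_signal(&cv);
        pthread_mutex_unlock(&lock);
    }
    pthread_mutex_lock(&lock);
    done = 1;                            /* tell stage 2 there is no more input */
    pthread_cond_signal(&cv);
    pthread_mutex_unlock(&lock);
    return NULL;
}

static void *stage2(void *arg) {
    for (;;) {
        pthread_mutex_lock(&lock);
        while (!full && !done) pthread_cond_wait(&cv, &lock);
        if (!full && done) { pthread_mutex_unlock(&lock); break; }
        int v = slot; full = 0;          /* take input from the previous stage */
        pthread_cond_signal(&cv);
        pthread_mutex_unlock(&lock);
        printf("stage 2 got %d\n", v);   /* "do stage 2 processing" */
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, stage1, NULL);
    pthread_create(&t2, NULL, stage2, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}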
A x B = C
C[1,1] = A[1,1]*B[1,1] + A[1,2]*B[2,1] + ...
C[m,n] = the sum of the products of corresponding elements in row m of A and column n of B.
Each resultant element can be computed independently.
62
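Since each element (indeed each row) of C is independent, a simple hedged sketch can assign one POSIX thread per row of C (the fixed 3x3 matrices are chosen only for illustration):

#include <pthread.h>
#include <stdio.h>

#define N 3   /* square matrices, for brevity */

static double A[N][N] = {{1,2,3},{4,5,6},{7,8,9}};
static double B[N][N] = {{9,8,7},{6,5,4},{3,2,1}};
static double C[N][N];

/* worker: compute one row of C; rows are independent, so no locking is needed */
static void *row_worker(void *arg) {
    int i = (int)(long)arg;
    for (int j = 0; j < N; j++) {
        double sum = 0.0;
        for (int k = 0; k < N; k++)
            sum += A[i][k] * B[k][j];
        C[i][j] = sum;
    }
    return NULL;
}

int main(void) {
    pthread_t tid[N];
    for (int i = 0; i < N; i++)
        pthread_create(&tid[i], NULL, row_worker, (void *)(long)i);
    for (int i = 0; i < N; i++)
        pthread_join(tid[i], NULL);

    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) printf("%6.1f ", C[i][j]);
        printf("\n");
    }
    return 0;
}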
63
Multithreaded Server...
void main( int argc, char *argv[] )
{
  int server_socket, client_socket, clilen;
  struct sockaddr_in serv_addr, cli_addr;
  int one, port_id;
#ifdef _POSIX_THREADS
  pthread_t service_thr;
#endif

  port_id = 4000;   /* default port_id */

  if( (server_socket = socket( AF_INET, SOCK_STREAM, 0 )) < 0 ) {
    printf("Error: Unable to open socket in parmon server.\n");
    exit( 1 );
  }
  memset( (char*) &serv_addr, 0, sizeof(serv_addr));
  serv_addr.sin_family = AF_INET;
  serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);
  serv_addr.sin_port = htons( port_id );
  setsockopt(server_socket, SOL_SOCKET, SO_REUSEADDR, (char *)&one, sizeof(one));
64
Multithreaded Server...
  if( bind( server_socket, (struct sockaddr *)&serv_addr, sizeof(serv_addr)) < 0 ) {
    printf( "Error: Unable to bind socket in parmon server->%d\n", errno );
    exit( 1 );
  }
  listen( server_socket, 5 );

  while( 1 ) {
    clilen = sizeof(cli_addr);
    client_socket = accept( server_socket, (struct sockaddr *)&cli_addr, &clilen );
    if( client_socket < 0 ) {
      printf( "connection to client failed in server.\n" );
      continue;
    }
#ifdef _POSIX_THREADS
    pthread_create( &service_thr, NULL, service_dispatch, client_socket );
#else
    thr_create( NULL, 0, service_dispatch, client_socket, THR_DETACHED, &service_thr );
#endif
  }
}
65
Multithreaded Server
// Service function -- thread function
void *service_dispatch(int client_socket)
{
  /* get user request */
  if( readline( client_socket, command, 100 ) > 0 ) {
    /* identify user request */
    /* do necessary processing */
    /* send results back to the client */
  }
  /* close connection and terminate thread */
  close( client_socket );
#ifdef _POSIX_THREADS
  pthread_exit( (void *)0 );
#endif
}
66
The Value of MT
Program structure
Parallelism
Throughput
Responsiveness
System resource usage
Distributed objects
Single source across platforms (POSIX)
Single binary for any number of CPUs
67
If all operations are CPU intensive, do not expect multithreading to go far (on a uniprocessor).
Thread creation is very cheap, but it is not free: a thread that has only five lines of code would not be useful.
69
(Figure: DOS: application code and data sit directly above the hardware.)
70
Multitasking OSs
(Figure: UNIX-style multitasking: each process has user-space code and data plus a kernel-space process structure, above the hardware.)
Multitasking Systems
(Figure: processes P1, P2, P3, P4 running on a multitasking system.)
Multithreaded Process
(Figure: a multithreaded process: threads T1, T2, T3 each have their own stack pointer (SP) and program counter (PC) while sharing the user code, global data, and process structure managed by the kernel.)
Kernel Structures
Traditional UNIX Process Structure
(Figure: the process structure holds the process ID, UID, GID, EUID, EGID, CWD, file descriptors, etc.)
74
Threading models: M:1 (e.g., HP-UNIX), M:M, and 2-level.
75
(Figure: processes Proc 2 through Proc 5, each illustrating a different mapping of user threads onto kernel-level resources.)
(Figure: the main thread creates a new thread T2.)

POSIX:
main( ) {
  ...
  pthread_create( func, arg );
  ...
}
void * func( ) { .... }

Solaris:
main( ) {
  thr_create( ..func.., arg.. );
  ...
}
77
(Figure: the main thread waits for thread T2 to finish.)

POSIX:
main( ) {
  ...
  pthread_join( T2 );
  ...
}
void * func( ) { .... }

Solaris:
main( ) {
  thr_join( T2, &val_ptr );
  ...
}
78
(Figure: thread states: a thread moves among RUNNABLE, ACTIVE, SLEEPING, and STOPPED via sleep, wakeup, preempt, stop, and continue transitions.)
79
Preemption
The process of rudely interrupting a thread and forcing it to relinquish its LWP (or CPU) to another.
CPU2 cannot change CPU3's registers directly; it can only issue a hardware interrupt to CPU3. It is up to CPU3's interrupt handler to look at CPU2's request and decide what to do.
Higher priority threads always preempt lower priority threads.
Preemption != time slicing.
All of the libraries are preemptive.
80
Cancellation
Cancellation is the means by which a thread can tell another thread that it should exit.
(Figure: thread T2 cancels thread T1 with pthread_cancel( ); T1 then exits via pthread_exit( ).)

POSIX:                     OS/2:                     Windows NT:
main( ) {...               main( ) {...              main( ) {...
  pthread_cancel(T1);        DosKillThread(T1);        TerminateThread(T1);
}                          }                         }

There is no special relation between the killer of a thread and the victim.
(UI threads must roll their own using signals.)
82
Type:
PTHREAD_CANCEL_ASYNCHRONOUS (any time whatsoever; not generally used)
PTHREAD_CANCEL_DEFERRED (only at cancellation points)

(Only POSIX has state and type.)
(OS/2 is effectively always enabled asynchronous.)
(NT is effectively always enabled asynchronous.)
83
84
Returning Status
POSIX and UI:
A detached thread cannot be joined and cannot return status.
An undetached thread must be joined, and can return a status.

OS/2:
Any thread can be waited for.
No thread can return status.
No thread needs to be waited for.

NT:
No threads can be waited for.
Any thread can return status.
85
Suspending a Thread
(Figure: thread T1 is suspended with suspend( ) and resumed with continue( ).)
86
Be Careful
87
Synchronization
Webster's: To represent or arrange events to indicate coincidence or coexistence.
Lewis: To arrange events so that they occur in a specified order.
* Serialized access to controlled resources.
Synchronization is not just an MP issue. It is not even strictly an MT issue!
89
Thread Synchronization:
On shared memory: shared variables, semaphores
On distributed memory: within a task, semaphores; across tasks, by passing messages
90
Your->BankBalance+= deposit;
91
Atomic Actions
An action which must be started and completed with no possibility of interruption. A machine instruction could need to be atomic. (not all are!) A line of C code could need to be atomic. (not all are) An entire database transaction could need to be atomic. All MP machines provide at least one complex atomic instruction, from which you can build anything. A section of code which you have forced to be atomic is a Critical Section. 92
(Figure: two threads run writer( ), which locks DISK before touching the shared data and unlocks it afterwards.)

writer( )
{
  - - - - - - - - -
  lock(DISK);
  ..............
  ..............
  unlock(DISK);
  - - - - - - - - -
}

Shared Data
93
(Figure: without locking, two threads run writer( ) and access the shared data concurrently.)

writer( )
{
  - - - - - - - - -
  ..............
  ..............
  - - - - - - - - -
}

Shared Data
94
95
Mutexes
Thread 1:
item = create_and_fill_item();
mutex_lock( &m );
item->next = list;
list = item;
mutex_unlock( &m );

Thread 2:
mutex_lock( &m );
this_item = list;
list = list->next;
mutex_unlock( &m );
.....
func( this_item );

POSIX and UI: owner not recorded, blocked threads queue in priority order.
OS/2 and NT: owner recorded, blocked threads queue in FIFO order.
96
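The fragments above use the UI names mutex_lock/mutex_unlock. A self-contained POSIX sketch of the same pattern, with a producer pushing items onto the shared list and a consumer popping them (the consumer simply spins when the list is empty, to keep the example short):

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct item { int value; struct item *next; };

static struct item *list = NULL;
static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

/* producer: push items onto the shared list */
static void *producer(void *arg) {
    for (int i = 0; i < 5; i++) {
        struct item *it = malloc(sizeof *it);
        it->value = i;
        pthread_mutex_lock(&m);       /* critical section: update the list head */
        it->next = list;
        list = it;
        pthread_mutex_unlock(&m);
    }
    return NULL;
}

/* consumer: pop items off the shared list */
static void *consumer(void *arg) {
    int taken = 0;
    while (taken < 5) {
        pthread_mutex_lock(&m);
        struct item *this_item = list;
        if (this_item != NULL) { list = list->next; taken++; }
        pthread_mutex_unlock(&m);
        if (this_item != NULL) {
            printf("got %d\n", this_item->value);
            free(this_item);
        }
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, producer, NULL);
    pthread_create(&t2, NULL, consumer, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}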
Synchronization Variable
(Figure: threads in two processes coordinate through a shared synchronization variable S.)
97
Synchronization Problems
98
Deadlocks
Thread 1
lock( M1 ); lock( M2 );
Thread 2
lock( M2 ); lock( M1 );
Thread 1 is waiting for the resource (M2) locked by Thread 2, and Thread 2 is waiting for the resource (M1) locked by Thread 1.
99
Avoiding Deadlocks
Establish a hierarchy: always lock Mutex_1 before Mutex_2, etc.
Use the trylock primitives if you must violate the hierarchy:

{
  while (1) {
    pthread_mutex_lock (&m2);
    if( EBUSY != pthread_mutex_trylock (&m1) )
      break;
    else {
      pthread_mutex_unlock (&m2);
      wait_around_or_do_something_else();
    }
  }
  do_real_work();  /* Got 'em both! */
}

Use LockLint or some similar static analysis program to scan your code for hierarchy violations.
100
Race Conditions
A race condition is where the results of a program are different depending upon the timing of the events within the program. Some race conditions result in different answers and are clearly bugs.
Thread 1:
mutex_lock(&m);
v = v - 1;
mutex_unlock(&m);

Thread 2:
mutex_lock(&m);
v = v * 2;
mutex_unlock(&m);

--> If v = 1 initially, the result can be 0 or 1, depending on which thread gets the chance to enter the critical region first.
101
102
Library Goals
Make it fast! Make it MT safe! Retain UNIX semantics!
103
104
ERRNO
In UNIX, the distinguished variable errno is used to hold the error code for any system calls that fail. Clearly, should two threads both be issuing system calls around the same time, it would not be possible to figure out which one set the value of errno. Therefore errno is defined in the header file to be a call to thread-specific data. This is done only when the flag _REENTRANT (UI) or _POSIX_C_SOURCE=199506L (POSIX) is passed to the compiler, allowing older, non-MT programs to continue to run. There is the potential for problems if you use some libraries which are not reentrant. (This is often a problem when using third-party libraries.)
105
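A small sketch of per-thread errno, assuming the program is compiled with the appropriate flag (e.g. -D_REENTRANT or -D_POSIX_C_SOURCE=199506L): each thread makes a failing system call and reads its own private errno without interfering with the other.

#include <pthread.h>
#include <stdio.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* worker: fail a system call and report this thread's private errno */
static void *worker(void *name) {
    int fd = open("/no/such/file", O_RDONLY);   /* fails and sets errno */
    if (fd < 0)
        printf("%s: open failed, errno = %d\n", (char *)name, errno);
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, "thread 1");
    pthread_create(&t2, NULL, worker, "thread 2");
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}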
The APIs
108
Base primitives
Scheduling classes: local/global
Mutexes: simple
Counting semaphores: simple
R/W locks: simple/buildable
Condition variables: simple
Multiple-object synchronization: buildable
Thread suspension: yes
Cancellation: buildable
Thread-specific data: yes
Signal-handling primitives: yes
Compiler changes required: no
Vendor libraries MT-safe? Most
ISV libraries MT-safe? Some
109
(Figure: thread API facilities: create, suspend, exit, key creation, thread-specific data, semaphore variables, mutex variables, condition variables, reader/writer variables, priorities, sigmask, kill, concurrency setting, daemon threads.)
110
111
Attribute Objects
UI, OS/2, and NT all use flags and direct arguments to indicate what the special details of the objects being created should be. POSIX requires the use of attribute objects:

thr_create(NULL, NULL, foo, NULL, THR_DETACHED);

vs:

pthread_attr_t attr;
pthread_attr_init(&attr);
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
pthread_create(NULL, &attr, foo, NULL);
112
Attribute Objects
Although a bit of a pain in the *** compared to passing all the arguments directly, attribute objects allow the designers of the threads library more latitude to add functionality without changing the old interfaces. (If they decide they really want to, say, pass the signal mask at creation time, they just add a function pthread_attr_set_signal_mask() instead of adding a new argument to pthread_create().)
There are attribute objects for:
Threads: stack size, stack base, scheduling policy, scheduling class, scheduling scope, scheduling inheritance, detach state
Mutexes: cross process, priority inheritance
Condition Variables: cross process
113
Attribute Objects
Attribute objects must be:
Allocated
Initialized
Values set (presumably)
Used
Destroyed (if they are to be freed)

pthread_attr_t attr;
pthread_attr_init (&attr);
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
pthread_create(NULL, &attr, foo, NULL);
pthread_attr_destroy (&attr);
114
UI (Solaris): int sema_wait, int sema_post, int sema_trywait, int sema_destroy
POSIX: int sem_init(..., count), int sem_post, int sem_trywait, int sem_destroy
(POSIX semaphores are not part of pthreads. Use libposix4.so and posix4.h.)
125
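A hedged sketch using the POSIX counting-semaphore calls (sem_init, sem_wait, sem_post, sem_destroy; on Solaris these live in the posix4 library), with the semaphore limiting how many threads may be inside a resource at once:

#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>
#include <unistd.h>

static sem_t slots;   /* counting semaphore: number of free "slots" in the resource */

/* worker: acquire a slot, use the resource, release the slot */
static void *worker(void *arg) {
    int id = (int)(long)arg;
    sem_wait(&slots);                 /* blocks if no slot is free */
    printf("thread %d in the resource\n", id);
    sleep(1);                         /* stand-in for real work */
    printf("thread %d leaving\n", id);
    sem_post(&slots);
    return NULL;
}

int main(void) {
    pthread_t tid[4];
    sem_init(&slots, 0, 2);           /* pshared = 0, initial count = 2 */

    for (int i = 0; i < 4; i++)
        pthread_create(&tid[i], NULL, worker, (void *)(long)i);
    for (int i = 0; i < 4; i++)
        pthread_join(tid[i], NULL);

    sem_destroy(&slots);
    return 0;
}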
Cancellation (POSIX)
int pthread_cancel (pthread_t thread)
void pthread_cleanup_pop (int execute)
void pthread_cleanup_push (void (*function)(void *), void *arg)
int pthread_setcancelstate (int state, int *old_state)
void pthread_testcancel (void)
128
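A minimal sketch of my own showing deferred cancellation together with a cleanup handler: main cancels the worker, and the handler pushed with pthread_cleanup_push releases the worker's buffer.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* cleanup handler: runs if the thread is cancelled (or exits) while pushed */
static void cleanup(void *arg) {
    printf("cleanup handler: freeing buffer\n");
    free(arg);
}

static void *worker(void *unused) {
    char *buf = malloc(64);           /* resource that must not leak on cancel */
    pthread_cleanup_push(cleanup, buf);
    for (;;) {
        sleep(1);                     /* sleep() is a cancellation point in deferred mode */
        printf("worker still running\n");
    }
    pthread_cleanup_pop(1);           /* pairs the push; not reached in this sketch */
    return NULL;
}

int main(void) {
    pthread_t t;
    pthread_create(&t, NULL, worker, NULL);
    sleep(3);
    pthread_cancel(t);                /* ask the worker to exit at its next cancellation point */
    pthread_join(t, NULL);            /* by now the cleanup handler has run */
    printf("worker cancelled\n");
    return 0;
}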
Other APIs
UI: thr_self(void), thr_yield( )
POSIX:
int pthread_atfork (void (*prepare)(void), void (*parent)(void), void (*child)(void))
pthread_equal (pthread_t t1, pthread_t t2)
pthread_once (pthread_once_t *once_control, void (*init_routine)(void))
pthread_self (void)
pthread_yield( )
(Thread IDs in Solaris recycle every 2^32 threads, or about once a month if you do create/exit as fast as possible.)
129
Compiling
130
Solaris Libraries
Solaris has three libraries: libthread.so, libpthread.so, libposix4.so
Corresponding new include files: synch.h, thread.h, pthread.h, posix4.h
Bundled with all O/S releases
Running an MT program requires no extra effort
Compiling an MT program requires only a compiler (any compiler!)
Writing an MT program requires only a compiler (but a few MT tools will come in very handy)
131
All MT-safe libraries should be compiled using the _REENTRANT flag, even though they may be used in a single-threaded program.
132
All MT-safe libraries should be compiled using the _POSIX_C_SOURCE=199506L flag, even though they may be used in a single-threaded program.
133
Summary
Threads provide a more natural programming paradigm
Improve efficiency on uniprocessor systems
Allow full advantage to be taken of multiprocessor hardware
Improve throughput: asynchronous I/O is simple to implement
Leverage special features of the OS
Many applications are already multithreaded
MT is not a silver bullet for all programming problems
There is already a standard for multithreading: POSIX
Multithreading support is already available in the form of language syntax: Java
Threads allow real-world objects to be modeled (e.g., in Java)
135
Java
Multithreading in Java
136
Java - An Introduction
Java - the new programming language from Sun Microsystems
Java - allows anyone to publish a web page with Java code in it
Java - a CPU-independent language
Created for consumer electronics
Java - by James, Arthur Van, and others
Java - the name that survived a patent search
Oak - the predecessor of Java
Java is C++ -- ++
137
138
139
140
141
Threads
Java has built-in thread support for:
Multithreading
Synchronization
Thread scheduling
Inter-thread communication
Methods: currentThread, start, setPriority, yield, run, getPriority, sleep, stop, suspend, resume
The Java garbage collector is a low-priority thread.
142
new MyThread();
name);
Thread Priority...
// HiLoPri.java
class Clicker implements Runnable {
    int click = 0;
    private Thread t;
    private boolean running = true;

    public Clicker(int p) {
        t = new Thread(this);
        t.setPriority(p);
    }
    public void run() {
        while (running)
            click++;
    }
    public void start() {
        t.start();
    }
    public void stop() {
        running = false;
    }
}
150
...Thread Priority
class HiLoPri {
    public static void main(String args[]) {
        Thread.currentThread().setPriority(Thread.MAX_PRIORITY);
        Clicker Hi = new Clicker(Thread.NORM_PRIORITY + 2);
        Clicker Lo = new Clicker(Thread.NORM_PRIORITY - 2);
        Lo.start();
        Hi.start();
        try { Thread.sleep(10000); } catch (Exception e) { }
        Lo.stop();
        Hi.stop();
        System.out.println(Lo.click + " vs. " + Hi.click);
    }
}

Run 1 (on Solaris): 0 vs. 956228
Run 2 (on Windows 95): 304300 vs. 4066666
151
Threads Synchronisation...
// Synch.java: race condition without synchronisation
class Callme {
    // Check synchronized and unsynchronized methods
    /* synchronized */ void call(String msg) {
        System.out.print("[" + msg);
        try { Thread.sleep(1000); } catch (Exception e) { }
        System.out.println("]");
    }
}

class Caller implements Runnable {
    String msg;
    Callme Target;

    public Caller(Callme t, String s) {
        Target = t;
        msg = s;
        new Thread(this).start();
    }
153
...Threads Synchronisation.
    public void run() {
        Target.call(msg);
    }
}

class Synch {
    public static void main(String args[]) {
        Callme Target = new Callme();
        new Caller(Target, "Hello");
        new Caller(Target, "Synchronized");
        new Caller(Target, "World");
    }
}

Run 1: with the unsynchronized call method (race condition):
[Hello[Synchronized[World]
]
]
Run 2: with the synchronized call method:
[Hello]
[Synchronized]
[World]
Run 3: with a synchronized object:
synchronized(Target) {
    Target.call(msg);
}
The output is the same as Run 2.
154
Deadlock...
// DeadLock.java
class A {
    synchronized void foo(B b) {
        String name = Thread.currentThread().getName();
        System.out.println(name + " entered A.foo");
        try { Thread.sleep(1000); } catch (Exception e) { }
        System.out.println(name + " trying to call B.last()");
        b.last();
    }
    synchronized void last() {
        System.out.println("Inside A.last");
    }
}
162
Deadlock...
class B {
    synchronized void bar(A a) {
        String name = Thread.currentThread().getName();
        System.out.println(name + " entered B.bar");
        try { Thread.sleep(1000); } catch (Exception e) { }
        System.out.println(name + " trying to call A.last()");
        a.last();
    }
    synchronized void last() {
        System.out.println("Inside B.last");
    }
}
163
...Deadlock.
class DeadLock implements Runnable {
    A a = new A();
    B b = new B();

    DeadLock() {
        Thread.currentThread().setName("Main Thread");
        new Thread(this).start();
        a.foo(b);
        System.out.println("Back in the main thread.");
    }
    public void run() {
        Thread.currentThread().setName("Racing Thread");
        b.bar(a);
        System.out.println("Back in the other thread");
    }
    public static void main(String args[]) {
        new DeadLock();
    }
}

Run:
Main Thread entered A.foo
Racing Thread entered B.bar
Main Thread trying to call B.last()
Racing Thread trying to call A.last()
^C
164
(Chart: peak performance in GFLOPS compared across single-processor and shared-memory systems; figure not recovered.)
167