WHERE DO DIFFERENT KINDS OF VARIABLES GO

The memory of a process is divided into different segments :- stack, heap, data, and code.

The code section contains executable instructions.

Global/Static Variables :- data

Const Variables :- code (compilers typically place read-only data in or alongside the text/code section)

Local/Temporary Variables :- stack

Dynamically allocated memory :- heap

Pointers :- stack/heap/data. The pointer variable itself follows the same rules as any other variable (a global pointer goes to data, a local pointer goes to the stack), while the memory it points to may live in any segment, often the heap.
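To make the mapping concrete, here is a small C sketch annotating where each variable typically lands; exact placement is compiler- and platform-dependent:

```c
#include <stdlib.h>

int global_counter = 42;         /* data segment (initialized data) */
static int file_static;          /* data segment (BSS, zero-initialized) */
const char *greeting = "hello";  /* the pointer lives in data; the string
                                    literal sits in read-only memory, which
                                    is typically mapped with the code */

void demo(void) {
    int local = 7;               /* stack */
    static int calls = 0;        /* data segment, despite its local scope */
    int *p = malloc(sizeof *p);  /* p itself is on the stack; *p is heap */
    *p = local + calls++;
    free(p);
}

int main(void) {
    (void)greeting; (void)file_static;  /* silence unused warnings */
    demo();
    return 0;
}
```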

Threads of a process have their own stack, program counter, and registers, but they share the code and data sections, open files, address space, etc.

When we say threads share the same address space, we mean that if a variable foo has global scope, all threads see the same foo. Similarly, all threads can call exactly the same global functions.

Modern operating systems also provide thread-local storage for variables of global scope that are not shared, e.g. errno.

Threads have independent call stacks; however, memory in another thread's stack is still accessible, since all the stacks live in the shared address space.
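A minimal sketch of this: the main thread passes the address of one of its stack variables to a child thread, which writes through it. This works because both stacks live in the single shared address space (the pthread_join() here also orders the write before the read):

```c
#include <pthread.h>
#include <stdio.h>

static void *child(void *arg) {
    int *main_stack_var = arg;   /* points into main's stack frame */
    *main_stack_var = 99;        /* legal: stacks share the address space */
    return NULL;
}

int main(void) {
    pthread_t tid;
    int on_my_stack = 0;         /* lives on main's stack */
    pthread_create(&tid, NULL, child, &on_my_stack);
    pthread_join(tid, NULL);     /* join orders the write before the read */
    printf("on_my_stack = %d\n", on_my_stack);   /* prints 99 */
    return 0;
}
```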

Why do threads share the heap?

Probably so that they can share data between them that is not global. But shared heap data does need synchronization.
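A sketch of that pattern, assuming a simple counter: two threads share one heap object, with a mutex providing the synchronization just mentioned:

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct shared { pthread_mutex_t lock; long total; };

static void *adder(void *arg) {
    struct shared *s = arg;              /* same heap object in both threads */
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&s->lock);    /* synchronize the shared update */
        s->total++;
        pthread_mutex_unlock(&s->lock);
    }
    return NULL;
}

int main(void) {
    struct shared *s = malloc(sizeof *s);
    pthread_mutex_init(&s->lock, NULL);
    s->total = 0;

    pthread_t a, b;
    pthread_create(&a, NULL, adder, s);
    pthread_create(&b, NULL, adder, s);
    pthread_join(a, NULL);
    pthread_join(b, NULL);

    printf("total = %ld\n", s->total);   /* 200000, thanks to the mutex */
    pthread_mutex_destroy(&s->lock);
    free(s);
    return 0;
}
```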

Also, memory allocation and deallocation have significant overhead, so a single shared heap is cheaper than maintaining a separate one per thread.

Benefits of Multi-Threading

  1. Efficiency :- creating and switching threads is cheaper than doing so for processes
  2. Responsiveness :- especially in UIs
  3. Resource sharing
  4. Multi-processing :- threads can run in parallel on multicore processors

Concurrency vs Multi-processing

One processor can run at most one process at a time.

Concurrency is when many processes are interleaved with each other :- one processor lets many processes make progress, giving the illusion of multi-processing.

Multi-processing is when two processes are actually running in parallel, on different processors or cores.

Problems faced by programmers with multithreading :-

  1. Identifying tasks that can run in parallel
  2. Balancing work across threads
  3. Data dependency between tasks
  4. Testing and debugging

Data Parallelism vs Task Parallelism

In data parallelism, we divide the data among threads and perform the same operation on each piece in parallel (see the sketch below).

In task parallelism, we distribute different tasks across threads; the tasks may operate on the same or on different data.
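A data-parallelism sketch in Pthreads: two threads run the same summing code over different halves of one array:

```c
#include <pthread.h>
#include <stdio.h>

#define N 1000
static int data[N];

struct range { int lo, hi; long sum; };

static void *sum_range(void *arg) {
    struct range *r = arg;
    r->sum = 0;
    for (int i = r->lo; i < r->hi; i++)   /* same task, different data */
        r->sum += data[i];
    return NULL;
}

int main(void) {
    for (int i = 0; i < N; i++) data[i] = i;

    struct range halves[2] = { {0, N / 2, 0}, {N / 2, N, 0} };
    pthread_t t[2];
    for (int i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, sum_range, &halves[i]);
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);

    printf("sum = %ld\n", halves[0].sum + halves[1].sum);  /* 499500 */
    return 0;
}
```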

Multi-Threading Models

Application programmers work with user threads, but threads actually execute on kernel threads, so we need a mapping from user threads to kernel threads.

Many to One Mapping

In this model, many user threads are mapped to a single kernel thread. It has several disadvantages :-

  1. If one user thread makes a blocking call, the entire process gets blocked.

  2. Since all user threads of a program are mapped to a single kernel thread, parallelism is not possible even on multicore processors.

This model allows developers to create as many threads as needed, but true parallelism doesn't happen because only one thread can be scheduled at a time.

One To One Mapping

In this model, each user thread is mapped to exactly one kernel thread. It solves the problems of the many-to-one model :-

  1. If one user thread makes a blocking call, other threads of the same process can still run.

  2. It offers parallelism on multicore processors and better concurrency than the many-to-one model.

Disadvantages

  1. Creating a kernel thread requires switching from user mode to kernel mode and making system calls, so it carries overhead that can burden the system and degrade performance. That's why most implementations of this model restrict the number of threads available to the user.

Many to Many Model

In this model, many user threads are mapped to many kernel threads (usually fewer than or equal to the number of user threads).

The number of kernel threads can be restricted per application or per machine.

This allows developers to create as many threads as needed, and it also allows better concurrency and parallelism on multicore processors.

Also, a blocking call by one thread does not block the whole application.

In practice, most modern systems use a two-level model, which is a combination of the one-to-one and many-to-many models.

Asynchronous vs synchronous threading

In asynchronous threading, once the parent creates the child, the parent continues, so parent and child execute concurrently. Each thread runs independently of every other thread, and the parent need not know when its child terminates. Hence little data sharing between threads is needed.

In synchronous threading, the parent must wait for all its children to terminate before it resumes execution, the so-called fork-join strategy. Synchronous threading usually involves substantial data sharing.
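The fork-join pattern in Pthreads looks like this: the parent creates a child thread and then blocks in pthread_join() until the child terminates:

```c
#include <pthread.h>
#include <stdio.h>

static void *child(void *arg) {
    (void)arg;
    printf("child: doing work\n");
    return (void *)42;                        /* value handed back to parent */
}

int main(void) {
    pthread_t tid;
    void *result;
    pthread_create(&tid, NULL, child, NULL);  /* fork */
    pthread_join(tid, &result);               /* join: wait for termination */
    printf("parent: child returned %ld\n", (long)result);
    return 0;
}
```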

Thread Libraries

A thread library provides the API for creating and managing threads. A thread library can be implemented either in user space or in kernel space.

In user space, all code and data structures of the library exist in user space, meaning that invoking a library function results in a local function call, not a system call.

In kernel space, code and data structures exist in kernel space, meaning that invoking a library function results in a system call.

Pthreads

Pthreads is a specification for a thread library, not an implementation.

Creating a thread is certainly cheaper than creating a new process, but it still takes time, and the thread is discarded once its work is completed. Also, we can't allow unrestricted creation of threads, as it can exhaust CPU and memory resources.

Implicit Threading

It is difficult for application programmers to build applications containing hundreds or thousands of threads. One way to solve this is to take the burden of creating threads off the programmer and let the run-time system do implicit threading.

One way to do so is a thread pool.

The idea is to create a number of threads at process startup and place them in a pool, where they sit and wait for work. When work arrives, a thread is awakened from the pool and assigned the work; once the thread completes its work, it returns to the pool.

  1. Awakening an existing thread is faster than creating a new one.

  2. A thread pool limits the number of threads that exist at any one time.

The number of threads in the pool can be set heuristically based on factors such as the number of CPUs in the system, the amount of physical memory, and the expected number of concurrent client requests.
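A minimal thread-pool sketch (a toy, not a production implementation; it omits queue-full handling, and pool_submit is a name made up here): a fixed set of worker threads sleeps on a condition variable and pulls jobs from a small queue:

```c
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define NWORKERS 4
#define QSIZE 16

typedef void (*job_fn)(int);

static struct {
    job_fn fn[QSIZE];
    int arg[QSIZE];
    int head, tail, count, shutdown;
    pthread_mutex_t lock;
    pthread_cond_t nonempty;
} q = { .lock = PTHREAD_MUTEX_INITIALIZER,
        .nonempty = PTHREAD_COND_INITIALIZER };

static void *worker(void *unused) {
    (void)unused;
    for (;;) {
        pthread_mutex_lock(&q.lock);
        while (q.count == 0 && !q.shutdown)
            pthread_cond_wait(&q.nonempty, &q.lock);  /* sleep in the pool */
        if (q.count == 0 && q.shutdown) {
            pthread_mutex_unlock(&q.lock);
            return NULL;
        }
        job_fn fn = q.fn[q.head];
        int arg = q.arg[q.head];
        q.head = (q.head + 1) % QSIZE;
        q.count--;
        pthread_mutex_unlock(&q.lock);
        fn(arg);                             /* run the job, then loop back */
    }
}

static void pool_submit(job_fn fn, int arg) {   /* hypothetical API */
    pthread_mutex_lock(&q.lock);
    q.fn[q.tail] = fn;
    q.arg[q.tail] = arg;
    q.tail = (q.tail + 1) % QSIZE;
    q.count++;
    pthread_cond_signal(&q.nonempty);        /* wake one waiting worker */
    pthread_mutex_unlock(&q.lock);
}

static void print_job(int n) { printf("job %d\n", n); }

int main(void) {
    pthread_t tids[NWORKERS];
    for (int i = 0; i < NWORKERS; i++)
        pthread_create(&tids[i], NULL, worker, NULL);
    for (int i = 0; i < 8; i++)
        pool_submit(print_job, i);
    sleep(1);                                 /* crude: let the jobs drain */
    pthread_mutex_lock(&q.lock);
    q.shutdown = 1;
    pthread_cond_broadcast(&q.nonempty);      /* wake everyone to exit */
    pthread_mutex_unlock(&q.lock);
    for (int i = 0; i < NWORKERS; i++)
        pthread_join(tids[i], NULL);
    return 0;
}
```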

Threading issues

The fork() and exec() system calls

When a thread in a multithreaded process calls fork(), we have to decide whether to duplicate all threads or to make the new process single-threaded.

We don't have this problem with exec() :- exec() replaces the entire process image with a new program, so all existing threads are discarded anyway.

So if exec() is called immediately after fork(), only the calling thread needs to be duplicated; otherwise, all threads need to be duplicated.
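The common case in code: fork() followed immediately by exec() in the child, which is why duplicating only the calling thread is sufficient there:

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();          /* child contains only the calling thread */
    if (pid == 0) {
        /* exec replaces the whole process image, so any threads the parent
           had would have been thrown away here anyway. */
        execlp("ls", "ls", "-l", (char *)NULL);
        perror("execlp");        /* reached only if exec fails */
        return 1;
    }
    waitpid(pid, NULL, 0);       /* parent waits for the child */
    return 0;
}
```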

Signal Handling

Signals are used to notify a process or thread that a particular event has occurred.

  1. A signal is generated by the occurrence of a particular event.

  2. The signal is sent to the process.

  3. The process handles the signal by invoking a signal handler.

Signals can be of two types :-

  1. Asynchronous signals :- generated by events external to the running process, e.g. a specific keystroke or a timer expiring.

  2. Synchronous signals :- generated by events internal to the running process, e.g. division by zero or an illegal memory access (segmentation fault).

Whenever a signal is generated for a multithreaded process, we have a few delivery options :-

  1. Deliver it to every thread.

  2. Deliver it to the specific thread to which the signal applies.

  3. Deliver it to certain threads.

  4. Assign a specific thread to receive all signals for the process.

Synchronous signals need to be delivered only to the thread that generated them.

Asynchronous signals, however, have more delivery options.

A terminating signal needs to be sent to all threads.

The kill() system call in Linux generates an asynchronous signal for a process; pthread_kill() directs a signal at a specific thread.
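A sketch of option 4, the dedicated signal-receiving thread: the signals of interest are blocked in every thread (the mask is inherited at thread creation), and one thread synchronously receives them with sigwait(). pthread_kill() is used at the end to show directing a signal at that specific thread:

```c
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static void *signal_waiter(void *arg) {
    sigset_t *set = arg;
    int sig;
    for (;;) {
        sigwait(set, &sig);          /* block until a signal in 'set' arrives */
        printf("signal thread got signal %d\n", sig);
        if (sig == SIGTERM) break;
    }
    return NULL;
}

int main(void) {
    sigset_t set;
    pthread_t tid;

    sigemptyset(&set);
    sigaddset(&set, SIGINT);
    sigaddset(&set, SIGTERM);
    /* Block these signals in the main thread; threads created afterwards
       inherit the mask, so only the dedicated thread will receive them. */
    pthread_sigmask(SIG_BLOCK, &set, NULL);

    pthread_create(&tid, NULL, signal_waiter, &set);
    /* ... main thread does its normal work ... */
    sleep(1);
    pthread_kill(tid, SIGTERM);      /* direct a signal at a specific thread */
    pthread_join(tid, NULL);
    return 0;
}
```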

Thread Cancelation

Thread cancellation means terminating a thread before it has completed its work.

Thread cancellation can be of two types :-

  1. Asynchronous cancellation :- one thread immediately terminates the target thread.

  2. Deferred cancellation :- the target thread periodically checks, at safe points, whether it should terminate. This allows the thread to terminate safely (see the sketch below).

Asynchronous cancellation has a few problems :-

  1. What if the thread is terminated while it is in the middle of updating data shared with other threads?

  2. The OS reclaims system resources from a cancelled thread but will not reclaim all resources (e.g. memory the thread allocated). So cancelling a thread asynchronously may not free all resources.
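Deferred cancellation avoids these problems, and it is the Pthreads default. A sketch: the worker only honours a pending cancellation request at explicit safe points via pthread_testcancel():

```c
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static void *worker(void *arg) {
    (void)arg;
    /* Deferred cancellation (the Pthreads default): the thread is cancelled
       only at cancellation points, such as pthread_testcancel(). */
    pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, NULL);
    for (;;) {
        /* ... do a unit of work, leaving shared data consistent ... */
        pthread_testcancel();   /* safe point: honour a pending cancel here */
    }
    return NULL;
}

int main(void) {
    pthread_t tid;
    pthread_create(&tid, NULL, worker, NULL);
    sleep(1);
    pthread_cancel(tid);        /* request cancellation */
    pthread_join(tid, NULL);    /* wait until the target actually exits */
    printf("worker cancelled\n");
    return 0;
}
```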

Thread Local Storage

Each thread may need its own copy of certain data. We can store such data in thread-local storage (TLS).

TLS differs from local variables in that TLS data persists across function calls.

TLS is much like static data, with the only difference being that TLS data is unique to each thread.
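A small TLS sketch using the GCC/Clang __thread storage class (C11 spells it _Thread_local): each thread increments its own copy of counter, so both print 3:

```c
#include <pthread.h>
#include <stdio.h>

/* Each thread gets its own copy of this variable. */
static __thread int counter = 0;

static void *worker(void *arg) {
    const char *name = arg;
    for (int i = 0; i < 3; i++)
        counter++;              /* touches only this thread's copy */
    printf("%s: counter = %d\n", name, counter);  /* always prints 3 */
    return NULL;
}

int main(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, worker, "thread A");
    pthread_create(&b, NULL, worker, "thread B");
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return 0;
}
```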

Scheduler Activation

A final issue to be considered with multithreaded programs concerns communication between the kernel and the thread library to help ensure the best performance.

Many systems implementing either the many-to-many or the two-level model place an intermediate data structure between the user and kernel threads. This data structure is typically known as a lightweight process, or LWP.

To the user-thread library, the LWP appears to be a virtual processor on which the application can schedule a user thread to run. Each LWP is attached to a kernel thread, and it is kernel threads that the OS schedules to run on physical processors.

If a kernel thread blocks (such as while waiting for an I/O operation to complete), the LWP blocks as well. Up the chain, the user-level thread attached to the LWP also blocks.

An application may require any number of LWPs to run efficiently. Consider a CPU-bound application running on a single processor. In this scenario, only one thread can run at a time, so one LWP is sufficient. An application that is I/O-intensive may require multiple LWPs to execute, however. Typically, an LWP is required for each concurrent blocking system call. Suppose, for example, that five different file-read requests occur simultaneously. Five LWPs are needed, because all could be waiting for I/O completion in the kernel. If a process has only four LWPs, then the fifth request must wait for one of the LWPs to return from the kernel.

One scheme for communication between the user-thread library and the kernel is known as scheduler activation. It works as follows: the kernel provides an application with a set of virtual processors (LWPs), and the application can schedule user threads onto an available virtual processor. Furthermore, the kernel must inform an application about certain events. This procedure is known as an upcall. Upcalls are handled by the thread library with an upcall handler, and upcall handlers must run on a virtual processor.

One event that triggers an upcall occurs when an application thread is about to block. In this scenario, the kernel makes an upcall to the application informing it that a thread is about to block and identifying the specific thread. The kernel then allocates a new virtual processor to the application. The application runs an upcall handler on this new virtual processor, which saves the state of the blocking thread and relinquishes the virtual processor on which the blocking thread is running. The upcall handler then schedules another thread that is eligible to run on the new virtual processor.

When the event that the blocking thread was waiting for occurs, the kernel makes another upcall to the thread library informing it that the previously blocked thread is now eligible to run. The upcall handler for this event also requires a virtual processor, and the kernel may allocate a new virtual processor or preempt one of the user threads and run the upcall handler on its virtual processor. After marking the unblocked thread as eligible to run, the application schedules an eligible thread to run on an available virtual processor.