In this article there isn't a single line of code and it's not aimed at teaching the intricacies of multithreaded programming in any given programming language but to give a short introduction, focusing primarily on how and especially why and when multithreaded programming would be useful.
Nowadays all operating systems (except maybe the educational ones) are multitasking operating systems. This means they can run multiple tasks or processes at the same time. This has not always been so, the ms-dos was a monotasking system which means that only one task could run on the system simultaneously.
Before rushing to explain multithreaded programming is desirable to review certain features common to operating systems.
When we talk about multitasking operating systems we say they can run multiple tasks simultaneously in parallel. This is not entirely true, just multiprocessor systems (with more than one processor) can make real multitasking, in a system with one processor we talk about the illusion of multitasking.
In both cases a mechanism of the operating system called scheduler intervenes. The sheduler function is to administer the different tasks that are in execution and to decide when each task executes and - if it is applicable - the processor in which it will be executed. When a task is in execution it has the whole processor power available for itself, it doesn't share it with any other one. The same thing happens with memory space, the task that is executing has the whole available memory of the system.
So, in order for real multitasking or at least the sensation of it to take place, the tasks have to share the processor, that is to say, the task that is currently executing, will have to get out and leave the processor to another task which will take its place, this - for wich the sheduler is also in charge - is called context switch and, among other things, implies a certain overload of resources (it is necessary to keep certain relative data to the process that comes out and to load other relative to the process that comes in).
The process memory state is related with the concept of multitasking. Each process in a given instant has its own memory space, that is, the data of the process (variables, the stack, the code and the heap) are in a logical space separated from the rest so that, for the other processes all that data doesn't exist (in fact the rest of the processes are not "aware" that there are more processes, if they wanted to know about the rest of the processes running they should always look it up by means of a call to the operating system API).
When we speak of multithreaded programming we are refering to the situation in wich, inside our own application, there are diverse tasks, that is, our application distributes the work among different tasks some way.
In this sense it is necessary to remark that, for the operating system, we continue being a single process (although the operating system "knows" that we are using several threads), and being so our slice time (the time assigned by the operating system to our process) keeps constant. On the other hand only one of our threads will be executing at a given time (for each processor available).
Where's the difference then? The difference resides in that, when a process stops to execute (due to a I/O read for example) and another process begins executing a context switch takes place, involving the waste of resources stated before. However when the same case takes place among threads that context switch doesn't take place. This is because threads that belong to the same process share the same memory space.
Deciding when to go for multithreading is not an easy decision. In most cases the application is simple enough to make multithreading needless. In other cases where the aplication might be complex enough maybe the overload or the added complexity of the multithreading way (as multithreading programming is always more complex than sequencial) may cancel the possible advantages.
Generally speaking, the decision of using multiple threads must be taken when tasks exists with a high computational cost (that is, task in wich the computational cost will be clearly higher than a context switch). It will be desirable too, that any given task doesn't rely too much in the results or data provided by other tasks (as it will make no sense to program three task which will be continously wating one another) and when we foresee that they can also be "delayed" in certain moments during its execution (due to disk reads, database accesses ans do on) so that other threads may execute while that happens.
The main reason for the these points lie in what was previosly said about the executing model of the operating system.
When we talk about a blocked thread we are talking about a thread which is wating for an extern reource to complete an operation.
There are some operations that, due to the limitations of the resources accessed, are a lot slower than a normal CPU operation. That's specially true in the case of IO operations as well as accesses to remote machines. When a process (or thread) performs one of this operations it switch to a blocked state and another thread takes its place. This way, if we foresee our application can get blocked by one of this operations for a given time (while we're waiting for a database response for example), that time might be used to perform another work. If that is the case, we should consider dividing the application into different task to make the most of this "idle time" while we're accessing the database.
Let's see an example of a situation in which using multiple threads will improve the efficiency of our application. Imagine we have to access several databases located on different machines and operate with the returned results. With a single thread the trace of execution will be:
However if we implement a thread by query to the database we'll be able to be make the querys in parallel, and will be obtaining our data concurrently from them.
Notice how performance increase even if we have just one processor as, while Thread1 blocks to query the database, Thread2 executes.
Although until now we have been talking mainly about the concept of performance improvement by means of a differentiated thread use, sometimes it is interesting to separate the application in several threads although the yield gain is not very great if a suitable conceptual separation is obtained or if we want to guarantee that certain part of the application executes with a greater priority.
For example if the application has a section for loggin, a section for music management and a section for vital support, it seems advisable to separate it in 3 threads and to assign the highest priority to the part that handles the vital support.
Another example could be the design of a videogame in which, for example, we may have parts like the processing of the user input, the renderization process, the artificial intelligence subsystem... which conceptually seem like tasks that would have to be performed simultaneously instead of sequentially.
In addition to all that, we should never despise the possibility of the application being executed in machines with more than one processor (or with some technology like hyperthreading wich provides logical processing units). An application that wouldn't have anything to gain in a single processor scenario(because no input/output operations take place for example) could gain much if there are two processors available and each thread can execute in a different processor. In our videogame example, one thread could be calculating the trajectory of a falling body while the artificial intelligence system decides if it should activate the starboard thrusters.
In a multitasking operating system the processor "is distributed" between the different tasks in execution. Each task has a set of data associated wich is exclusive of this task. This data is grouped in a structure called PCB (process control block). In the PCB we can find, among other things:
In addition to the PCB, part of the information of the process consists on the resident set. The resident set of a process is the portion of a process that exists in physical memory (opposed to the rest of the program which exists in swap.
When a context switch takes place, the PCB from the exiting process is saved, the BCP of the incoming process is loaded and the incoming process begins to execute. If the previous process had cleared data of the process that had just enter (because it needed space for its own data), the data of the entering process will be located in swap, in the hard disk, so it will be necessary to bring it to physical memory again, which is a slow operation. It's also possible that in the process of bringing that data from the hard drive we will need to remove of another process that will in turn need it in its time slice.
Several reasons exist by which a process that is in execution can get blocked.
In most of operating systems, the time of the CPU it is organized in time slices, when a process enters to execute in the CPU he is assigned maximum time period which once consumed cause the process to be suspended to allow another process to execute. (This is a simplification, actual operating system have a much more complex way of distributing processes having into account priority, starvation and other things).
In order for this to take place there's a hardware mechanism that acts as a timer firing up what is called an interrupt, wich in place brings up the operating system task an allows it to block (given the case) the thread wich is executing and start executing some segment of the operating system itself which is in charge of deciding whether the current thread must give way to another one and in such case take the actions needed to remove the thread that was executing and start executing the new one.
If a process or thread currently in executing performs an IO operation, for example reading the disk, or accesing a memory location which is not currently in main memory, it will automatically get blocked while the data is fetches from the disk or the page of memory is brought to main memory.
Lastly, if a process raises an exception, by trying to access a privileged memory location, the thread or process will become blocked as the operating system handles the exception.