10. Module 3: The Concurrency Module

Figure 10.1: Schematic of the concurrency adaptor

UNTIL NOW WE HAVE ADDRESSED two sub-problems. These are a) the problem of enforcing an action on a component in such a way that the concurrency strategy is bypassed and b) the problem of learning how a component is best kept alive. The last module, which we will now present, is necessary to control access to resources by multiple clients.

Firstly we will explain how we can deduce which resources are involved, secondly we will explain what exactly the problem is we want to solve. Thirdly we will explain when a component has control over a certain resource. Fourthly, we will explain how our concurrency module can exert control over the resources involved. And finally we explain how race-conditions can be avoided.

10.1 Resources

BEFORE WE CAN EXPLAIN how we can manage access to the different resources we need to define which resources we will consider.

A resource in our case is defined as all the $logic$ transitions that can be found in the elementary net, expanded from the colored Petri-net. Every such a transition is a resource in its own right. This approach to resources in a Petri-net is good enough for our case. Clearly the real resources will have a much larger granularity than the definition we impose here. For instance a resource in our whiteboard is at least the size of a square. If we consider resources to be the state of transitions then we have 4 times as many resources: a setPosition(1,1), returnSetPosition(1,1), getPosition(50,70) and returnGetPosition(14,3) for every square. However this is an advantage because it allows for a greater flexibility in aligning the resources. For instance, with our definition of resources one can specify that square $x,y$ is locked for all operations and that square $x+1,y$ is locked for the read-operations. Furthermore, this definition of resources is also complete in the sense that no resource can be found that cannot be expressed based on the state (enabled or disabled) of transitions. This definition can be extended to include the state of $logic$ places of the core functionality. However we will continue to work with only the transitions. Definition:

A resource within our Petri-nets is defined as every possible logic transition in the elementary Petri-net expanded from the colored Petri-net.

With this definition of resources, we will now investigate what kind of race conditions can occur, and why we need to manage this.

10.2 The Requirement: What is a Race-Condition ?

THE NO-CONFLICT REQUIREMENT (given in section 7.2.1 on page [*]) states that no action requested by the client should at any moment be impossible on the server. Simply by connecting the liveness-module to the enforce-action module we can easily guarantee this. However, if we do so, the no-race requirement is not satisfied. Below we illustrate why.

Figure 10.2: A race condition caused by the adaptor. The yellow and green lines are critical sections. As can be seen the critical sections interleave (race) on the server.

The reason why connecting the liveness module to the enforce-action module does not work is that we effectively remove any concurrency strategy on the server. As such, a client can execute any server action at any time it wants. This allows for a greater freedom, and effectively avoids the conflict. However, the server can no longer guarantee that a transition that is enabled for the first client is not reassigned to another client during execution. E.g., it is possible for an adaptor to unlock a resource locked by $A$, lock the resource for component $B$, unlock the resource for component $B$, and finally lock it again for component $A$. Here a concurrency problem will occur because component $A$ expected the resource to be locked the whole time, while in practice another component was able to change the state of the given resource. This is pictured in figure 10.2.

To avoid this we should no longer only look at the fact whether a transition is enabled or disabled. The real problem lies with the change of enabled to disabled transitions not with the state they are in. If we want to make a concurrency module that avoids race conditions the concurrency module needs to understand when which component can change the 'enabledness' of a transition, or in other words when which component has control over which resource.

10.3 Control

IN THIS SECTION WE INVESTIGATE how resources can be controlled by the different clients. A resource is (as stated earlier) defined as the transitions within the expanded Petri-net. When a process has control over a certain resource it means that the process can access that resource and use it, or modify its state. In our case, we can easily define control over a resource as the ability to execute a transition, or modify the state of the transition. However, not all transitions of a Petri-net can be controlled by the underlying component. Most outgoing messages (messages such as Lock, Read, Write, Unlock) can be under control of the underlying component, however the incoming messages (messages such as LockDone, ReadDone, WriteDone) are typically not under control of the client component. This approach can also be found back in [MA97]. Definition:

A transition is under control of a client component if it can be executed immediately or its state can be modified directly or indirectly through using $logic$ messages.

Because the $incoming$ and $outgoing$ $logic$ messages can be sent (or involuntarily) received, they take part in the process of verifying whether the state of a transition can be changed. Once an $outgoing$ $logic$ message is sent out, the concurrency module will simply pass it through to the server component. If an $incoming$ $logic$ message comes back from the server, this message is again passed through to the client component. On the other hand, $outgoing$ $synchro$ messages can be sent out by the client component, but do not need to be passed through and offer the concurrency adaptor a choice where it can choose which enabled $incoming$ $synchro$ message to send back. For instance, such a choice can be the request for a lock. The concurrency adaptor has the choice to accept or reject the lock by sending back a LockTrue or LockFalse message. However, both choices will result in different enabled/disabled $logic$ transitions.

Once a LockTrue is sent back the client component can choose to execute a Read or Write, thus the read and write transitions are under control of the client component. The messages that can be sent back by the server component are readDone and writeDone, thus indirectly under control of the client-component. Given a Petri-net and a corresponding marking, this controlled-by information can be calculated automatically by means of a reachability analysis, similar to the one described in section 8.5.

10.4 Choice-Points

BEFORE INVESTIGATING HOW WE CAN SATISFY the no-race requirement we need to know what kind of actions can be taken by the concurrency module.

It is clear that the concurrency module can not simply insert new $logic$ transitions, because this would alter the runtime behavior of the system. However, it should have the possibility to insert $synchro$ transitions at certain places. However, inserting this kind of messages is already done by the enforce-action module and the keep-alive module. Nevertheless, the link between the concurrency module and the enforce-action/keep-alive module is still not fully explained.

Therefore, to be able to let the concurrency-adaptor choose a certain action, the keep-alive-adaptor will supply a set of possible futures, sorted according to their priority. This will allow the concurrency-adaptor to first try to reach a certain target future and acknowledge that future. If such a future cannot be reached without breaking the no-race requirement, another future will be tried until a suitable one is found. If no possible future can be realized the component will be set to wait until such a future can be realized.

From now on we will assume that the choices offered by the keep-alive module contains information as how to select a certain future and control-information that describes which states become under control of the given component when that future is selected. If a transition comes under control of the component we will mark it with a '+'. If a transition does not come under the control of the component we mark it with a '-', otherwise, if it is undecided we mark the transition with an '?'. We will write this down as an $1\times m$ matrix, where $m$ is the number of $logic$ transitions in the exploded colored Petri-net. For instance

& \textrm{clearPosition} & \textrm{j...
...& \textrm{setPosition}(1,1)\\
( & - & + & + & + & )\end{array}\end{displaymath}

Transitions for which it is undecided what the future will bring, will be considered to be under control of the client component.

10.5 Avoiding Race Conditions

GIVEN THE DEFINITION OF RESOURCES and the possibility to choose certain futures that extend different control over the resources, we can now explain how the concurrency module avoids colliding critical sections.

We will do this by means of two matrices. The first matrix, called the control-matrix is of size $n\times m$ where $m$ is the number of resources available and $n$ is the number of clients available. On this matrix the rows contains the clients. On the columns the transitions (resources) will be placed. If a resource is under control of a client a $+$ is placed in the matrix, otherwise a $-$.

The second matrix has the same layout but marks whether a certain resource is managed or not by the concurrency module. This is necessary for two reasons. First, a resource can always be enabled by different components (a resource like joinActor for instance) and second, sometimes certain clients don't have a concurrency strategy at all. It is clearly impossible to manage concurrency for a client that does not specify any concurrency strategy, or for resources that do not take part in a concurrency strategy. Therefore, the concurrency module needs to keep track of the resources and clients it can and will manage. If a resource/client is managed it is marked with a $+$, otherwise it is marked with a $-$.

& joinActor & retJoinActor & ...
... - & - & + & -\\
Client5 & + & + & - & - & +\end{array}\right)\end{displaymath}

& joinActor & retJoinActor & ...
... + & + & + & +\\
Client5 & - & - & - & - & -\end{array}\right)\end{displaymath}

The above example of the control matrix $C$ and the managed matrix $M$ illustrates the difference between both concepts. For instance column one and two can be a joinActor and a returnJoinActor, For all clients (except the fourth), this joinActor and returnJoinActor are under control of the client because the clients can choose to execute that transition at any time. However, the fourth actor, is unable to extend control over the joinActor because it must (for example) first call a joinLocking first. However, after the first joinLocking message, the concurrency module will recognize that the joinActor messages are effectively managed by the client component, hence the $M$ matrix marks these resources and clients with a true. ($M[4,1..2]$).

The columns 3,4 and 5 are the separate operations on all the squares off the board. The first and second client currently extends no control over the resources, however, only the second client actually makes use of a concurrency strategy for those resources ($M[2,3..5]$). The fifth client on those resource has currently control over the last resource $C[5,5]$, however, this control is unmanaged by the concurrency module $M[5,5]$ and as such must be treated differently when realizing a certain future.

Therefore, we will now explain how $logic$ and $synchro$ messages are handled by the concurrency module.

10.5.1 Handling Incoming $logic$ Requests

The concurrency module handles incoming $logic$ requests by verifying whether the message is allowed for that component, given the current situation. If an action is allowed it can be passed through to the server component. If an action is not allowed, the concurrency module has no option but to place the action in a queue and wait until a suitable configuration of the other components arises to send the message through. An action can be allowed for two reasons. Firstly, it can be allowed because the action itself is managed by nobody, or the clients managing the resource currently does not extend any control over the resource. Secondly, an action can be allowed because it concerns a resource that is managed by the client-component and is controlled by the client. Formally:

A request $t$ coming from client $id$ is allowed $\iff$

C[id,t]\wedge M[id,t]\vee(\forall i \vert 0<i\not=id<n : \neg(C[i,t]\wedge M[i,t]))
\end{displaymath} (10.1)

10.5.2 Handling Incoming $synchro$ Requests

A second kind of messages that should be handled by the concurrency module are requests to choose a certain future. These requests come from the client component only if there are multiple futures to choose from. The responsibility of the concurrency module is to choose a correct future that does not collide with the existing configuration of other clients. Therefore the $C$ and $M$ matrices will be used. However, three things need to be discussed. First we need to discuss how we can detect whether a resource is managed, second we will discuss when a future can be realized and third we will discuss what happens when such a future is realized.

In the discussion below we assume that a client with identification $id$ has proposed a number of futures $F_{i} (i=1..f)$. As already explained, a future is specified as a $1\times m$ matrix, where $m$ is the number of resources involved. Detecting managed resources

A resource is managed at the moment a client component offers two different futures for that resource and enables the concurrency module to choose between both. Therefore, we will update the $M$ matrix as follows

M'[i,j]=M[i,j]\vee\bigvee_{k<f,  l<f}F_{k}[j]\not=F_{l}[j]\end{displaymath}

The above line states that a resource is managed if different futures have different control over the resource. Is a future realizable ?

Given this information, we can now verify whether a certain future $F$ can be realized for client $id$. In general, a future $F$ can be realized if its resources can be realized. A resource $j$ can be realized only if it realization does not block out all clients (including itself) to use the resource. To understand better how such a situation might occur, we investigate all the possible tracks. We do this by only looking at one resource $j$. If there exists one offending resource (in $F$) then the entire future is not realizable.

Intuitively, when future $F$ specifies that it does not want to control resource $j$, then client $id$ will not prohibit any other client from using the resource. This can be seen in equation 10.1. If $C[id,j]$ is false, then the term $\neg(C[id,j]\wedge M[id,j])$ will be true, hence not blocking other clients from using $j$. On the other hand, if $F$ wants to control resource $j$ then we must look at the state of the other clients. We now look for a client (denoted $k$) that might give rise to a situation that locks out all clients.

  1. $C[k,j]$ is false. This will lead intuitively to no conflict because client $k$ does not extend any control over the resource, hence will not collide in the future. If this is the case the module will not prohibit F from being realized.10.1
  2. When $C[k,j]$ is true, we have a problem. Client $id$ wants to lock resource $j$, but client $k$ already has locked resource $j$. Now, depending on how the resources are managed different results may be obtained:

    1. $M[k,j]$ is true. Here, resource $k$ is managed and controlled by client $j$. Depending on whether $id$ manages its resource, we have different possibilities

      1. $M[id,j]$ is true. If we would allow $F$ from being realized then we would have two clients accessing the same resource. This cannot be allowed, hence we must prohibit $F$ from being realized.
      2. $M[id,j]$ is false. If we allow $C[id,j]$ to become true, client $id$ would have no access to the resource as long as client $k$ has locked resource $j$. Hence, this situation does not prohibit the realization of $F$.
    2. $M[k,j]$ is false. Here, resource $j$ is unmanaged by client $k,$ but the resource is under control of that client. Depending on whether $id$ manages the resource $j$, different situations can arise. If client $id$ does manage the resource then client $k$ will queue its $logic$ requests, while client $id$ will be allowed to use the resource. Hence this does not prohibit $F$ from being realized. However if client $id$ does not manage the resource either, but still claims to control it then we will might have race-condition. Of course, this is to be expected because none of both clients manage the resource, so we should allow this behavior.
Formally we express this intuition as:

F\textrm{ can be realized }\iff \forall0<j\leq m: \neg F[j] \vee \bigwedge_{k\not=id}\neg collide(k,id,j)\end{displaymath}

collide(k,id,j) \iff  C[k,j] \wedge (M[k,j]\wedge M[id,j])\end{displaymath} Realizing a future

One the concurrency module has determined which futures can be realized, it will choose one to realize. Realizing a future is easy. The only thing we need to do is copy $F$ to the correct place in the matrix $C$. Thus:


If no future can be realized the request is placed in a queue until it can be realized.

10.6 Discussion

BELOW WE DISCUSS THIS APPROACH. We investigate how certain clients behave when mediated by means of this module. We will explain that our module in most cases behaves as expected, however in certain special cases it will fail to work.

10.6.1 Server Requirements

The first observation about this module is that the concurrency strategy of the server does not matter ! The enforce-action module uses the concurrency strategy only to bring the server in a required state. The fact that we no longer need to take into account the offered concurrency strategy is important for two reasons. First, it is a strong result because not a single server component in such an automatically mediated conversation needs to offer a concurrency strategy. Second, it allows us to look only at the indirect conflicts between different client components that want to access the same server. This is what we will do now.

10.6.2 Unmanaged Concurrency

client 1 client 2 client 3 without adaptor with adaptor

We return to our conflict cases and investigate how our adaptor might improve the communication between multiple clients and a server. To do so, we first observe the behavior of a set of clients with no concurrency strategy. Afterward we observe the same set of clients for which only one requires a concurrency strategy.

If no client involved in a communication session specifies a concurrency strategy then the adaptor will simply pass through all requests. To understand this, we point out that, if no concurrency strategy is introduced, then no resource is managed, hence we end up in branch 2b (section 10.5.2). By passing simply passing through any request, the module will not avoiding race-conditions. Hence, the adaptor will mediate the concurrency conflict and will still exhibit the implicitly required behavior, that is, not to manage concurrency.

On the other hand, if one of the connected clients offers a concurrency strategy, all the unmanaged clients might need to wait until the concurrency managing client has finished its operations. To understand this we point out that a concurrency strategy will lead to managed and controlled resources. When one of those is present, all other clients will set to wait until the one managing its resources has finished.

This strategy clearly improves the working of the system. If the adaptor would simply mediate the differences, the unmanaged concurrency clients might access the server while the managed concurrency strategy is supposed to access resources exclusively. This observation shows how our adaptor might not only mediate, but also improve the concurrency behavior.

10.6.3 Livelocks

client 1 client 2 client 3 without adaptor with adaptor

The presented concurrency module does not care about livelocks. For instance, if none of the involved components takes the possibility of starvation into account, then the resulting behavior may exhibit a behavior that will lock out a certain client. However, it is possible to insert a learning algorithm in the concurrency module that will learn how to avoid livelocks and learns which timing to apply to keep all involved components alive. Because this behavior was not part of the required interface, this is clearly an improvement. However, improving the working of the whole is not always be possible, such as in the case for deadlocks.

10.6.4 Deadlocks

client 1 client 2 client 3 without adaptor with adaptor

If the involved components require a concurrency strategy which is deadlock-prone, such as a waiting square locking strategy, then the adaptor will not avoid deadlocks. The adaptor will still mediate the differences in syntax, control-flow, re-entrance, granularity conflicts and other variabilities. This shows that the adaptor working as required, however it will not resolve deadlocks automatically. It would be nice to have an adaptor that not only mediates the differences but also avoids any concurrency problems. In section 13.8.6 we will briefly explain how we could approach this. However, our adaptor will be able to detect a deadlock if one might arise.

10.6.5 Implicit Requirements

client 1 client 2 client 3 without adaptor with adaptor

Another observation we can make, is that certain implicit requirements cannot be detected, thus not mediated. Sometimes deadlocks can be avoided by cooperating clients if they apply a staircasing technique (see section 5.8.3 on page [*]). If the clients apply such a technique then the adaptor mediates the differences and works as expected, thus avoid deadlocks. However, if another client, one that does not honor this implicit requirement, joins, then the whole conversation might deadlock. In section 13.5 we come back to this issue.

10.6.6 Virtual Resources

Figure 10.3: A Petri-net of a layered concurrency strategy. The green boxes are the logic transitions which form the resources involved.

The only case where the concurrency module does not behave as expected is when certain resources are 'virtual', in the sense that they cannot be deduced from the Petri-net. Such a situation occurs for instance in conflict 6.12 (page [*]). In this conflict the client requires the possibility to first lock the entire field. However, the Petri-net involved (picture 10.3) does not have any logic transition that indicates that the entire server should be considered to be a resource. Specifically, it is possible that the client requests to enter the server. This forms two problems. First, because the liveness module has no choice but return enterOk. Second, because if this future is involuntarily realized, none of the resources come under control of the client component. In this case the concurrency adaptor fails to work as expected. However, if the enter and leave transitions would be coupled to real logic transitions, then this would not form a problem.

From these results we can conclude, that if we present our adaptor with a set of concurrency strategies the result will work as expected and mediate the conflict correctly. On the other hand, if the input contains virtual resources or contains implicit requirements, the resulting behavior might be incorrect by introducing deadlocks. However, in every case, the adaptor will still mediate the conflict.

10.6.7 Implementation Issues

One of the implementation issues we encountered is a problem of performance. We define a resource as a transition within the expanded elementary Petri-net. This means that for a simple $32\times32$ whiteboard, the expanded Petri-net will offer around $32\times32\times5$ resources. Creating this expanded Petri-net takes a long time if not optimized.

Also, because the liveness module need to specify which resources they will control, they have to transmit relatively much information. To optimize this only change sets could be used. Also, certain resources are similar. If an act can be done on a resource, then a getPosition is also always possible. Deducing this information might help in optimizing this module, it might also help in better defining what a resource exactly is, given the Petri-net.

10.7 Summary

IN THIS CHAPTER WE HAVE EXPLAINED how we can easily create a module that avoids race-condition on the server. Firstly we explained which resources we consider. Secondly, we explained the requirement to avoid race conditions. Thirdly, we explained when a component controls a certain resource. This was necessary to be able to specify when a resource is involved in a critical section. Fourthly, we explained how our concurrency module is able to control the behavior of the clients involved and how it manages to avoid race conditions.


Please note that we say 'does not prohibit $F$ from being realized' instead of 'allows $F$ to be realized'. This is because we are discussing when one client $k$ will block a future. It is perfectly possible that client $k$ does not block a future, but that client $k+1$ does so.
Werner 2004-03-07