Subsections

12. Discussion

IN THIS CHAPTER we validate our approach by observing the experimental results. First, we will briefly summarize our approach and the results. Second, we make some important observations based on the experiments and third we present some technical limitations of the presented work.

12.1 Introduction

WE WILL NOW BRIEFLY summarize our approach and the experimental results. In our approach we created a general concurrency adaptor that can be used to mediate the differences between conflicting concurrency strategies: one server and multiple clients. As stated in the introduction section 1.3.5, the adaptor we wanted to create should be able to mediate concurrency conflicts in such a way that no conflict arises. We have defined a conflict as a situation in which the mutually agreed behavior between the different communicating partners cannot be executed over the interconnection between them. In our case study, the whiteboard, the mutually agreed behavior between the different actors was not to cross each others boundaries.

The adaptor itself contains three modules. The first module is an enforce-action module which bypasses the concurrency behavior of a server component. This module makes use of a prolog program. The second modules is the liveness module. For every client component present, a liveness module will be created. Such a liveness module bypasses the required concurrency strategy in such a way that the client component can continue with its core functionality. The liveness module has been implemented by means of a reinforcement learning algorithm. Experimentally we observe that this approach is good if the client component returns correct rewards (that is rewards that are statistically correlated to the state of the involved Petri-net). The third module is the concurrency module, which takes care that no race conditions occur within the message order between the participating clients and the server. This module has been explained in chapter 10.

12.2 Observations about the Concurrency Adaptors

Table 12.2: Overview of the conflicts and the adaptors.

Conflict		Adaptor			Result
Description	Possibly	Module
	Deadlock	Liveness	Concurrency	Enforce
6.1: Simple Syntax		Ok	Ok	Ok	Ok
6.2: Parameter Encoding		Ok	Ok	Ok	Ok
6.3: Simple Reentrancy	Yes	No Choice	+Dead	Ok	Ok
6.4: Async Reentrancy		Ok	Ok	Ok	Ok
6.5: Simple Control flow		Ok	Ok	Ok	Ok
6.6: Idle Control flow		Non Markov	N/A	Ok	Fail
6.7: Waiting client, Non-waiting server	Yes	No Choice	+Dead	Ok	Ok
6.8: Nested line server, square client		Ok	Ok	Ok	Ok
6.9: Non-nested line server, square client		Ok	Ok	Ok	Ok
6.10: Square server, field client		Ok	Ok	Ok	Ok
6.11: Layered server, non-layered client	Yes	Ok	+Dead	Ok	Ok
6.12: Layered client, non-layered server		Ok	-Dead	Ok	Fail
6.13:Transition		Ok	State	State	Ok
6.14: Nested Transition		Ok	State	State	Ok
6.15: Layered Transition		Ok	State	State	Ok
6.16: Transactional Client		Ok	State	State	Ok
6.17: Empty server	Yes	No Choice	+Dead	Ok	Ok
6.18: Upscale granularity	Yes	Ok	+Dead	Ok	Ok
6.19: Downscale granularity		Ok	Ok	Ok	Ok
6.20: Waiting clients, non-waiting server	Yes	Ok	Ok	Ok	Ok
6.21: Multi-multi conflict		Ok	Distri	Ok	Out Of Scope
6.22: Multi-multi conflict 2		Ok	Distri	Ok	Out Of Scope

IN TABLE WE give an overview of the discussed conflicts from chapter 6. The ``description'' column describes which conflict we are covering. The ``possibly deadlock'' column contains 'Yes' if the approach used by the clients involved in the conflict could possibly lead to a deadlock. (for instance, a waiting locking strategy). The ``Liveness'' column specifies whether the liveness module works on the involved client. If a 'No choice' is mentioned, the liveness module has not much choice to keep a component alive. For instance, a waiting locking strategy, where a lock request can only result in a lockTrue. 'Non Markov' in this column means that the rewards are specified in such a way that the Markov property does not hold. The concurrency column specifies how the concurrency adaptor works. '+Dead' means that the concurrency module could possible deadlock, if the involved components exhibit such a behavior that might lead to a deadlock. '-Dead' means that the concurrency module will possibly introduce deadlocks, even when the client components will not lead to a deadlock. 'State' means that not only the resources need to be considered, but also the content of the resources. The ``Enforce'' column specifies how the enforce-action module performs on the server involved in the conflict. The result column states whether the overall behavior will be correct or not.

We will now observe some of the behavior of the 22 adaptors:

The idle control flow conflict (6.6) cannot be mediated because the rewards are non Markov, therefore it is impossible to get the liveness module to work.
The layered control flow conflict 6.12, occurs because a client expects a field-lock to be available, while the server doesn't offer one. This conflict can be mediated. However, if multiple similar clients join, the result might deadlock, while the clients are specifically designed to avoid deadlocks by first obtaining a server lock. The reason why this layered conflict cannot be mediated is because the resource (the server-lock) is a virtual resource.This is discussed in more detail in section 10.6.6.
The transitional conflicts (6.13, 6.14, 6.15 and 6.16) can be mediated if the enforce-action module and the concurrency module take into account the state of the Petri-net places that are common to the involved components.
The two multi-multi conflicts (6.21 and 6.22) cannot be mediated because the concurrency module is not designed to be able to handle multiple servers.

Given the required mutually agreed behavior between the different actors: not to cross each other's boundaries, we observe that in most cases (18 of the 20 tested conflicts) the concurrency strategy is mediated correctly and allows the components to execute as they intended. Below we will continue with some further observations we can make with respect to these results.

12.3 Concurrency Strategies are Very Simple

AN OBSERVATION WE CAN MAKE about the current concurrency strategies is that they are essentially very simple. Even the rollback-able concurrency strategies, which are not necessarily easy to implement, are often nothing more than finite state machines. This statement is based on two observations. First, our reachability analysis can deduce very quickly how to bring a server component in a certain state. Secondly, our liveness module learns very quickly what to do in a certain situation. This is to be expected to a certain extent because we have carefully created the representation of our situation-recognition system. However, this representation itself is also nothing more than a finite state machine.

If we think about interfaces (not only concurrency interfaces), it is not commonplace to hide features and functionalities within the API. On the contrary, most interfaces are designed to make it easy to access all the necessary operations and modify the state of an object/component.

From these two observations one might think that it would have been more appropriate to use finite state machines to describe concurrency strategies. This is not the case because the concurrency strategies themselves often contain parallel resources that act independently. This is very difficult to express with standard finite state machines. In section 3.7 we have given two examples that demonstrates the need and advantages of using Petri-nets.

12.4 Technical Observations

IN THIS SECTION WE DISCUSS some technical limitations and observations we can make, based on the experiments we have performed.

12.4.1 Non-Incremental Approach: The Problem of Late Joiners

A limitation to our work is the problem of late joiners. We did not investigate what will happen when a new component joins a set of already communicating components. Currently, this situation is not allowed. However we feel this will not constitute an obstacle. A keep-alive module can be created for every joining component and the concurrency strategy can be extended to allow new components to join.

12.4.2 The Component System

During our research we implemented a new component system. We conclude from our experiments that this component system offers a lot of advantages, especially with respect to glueing components together. Its high level reified message representation, together with its disciplined non blocking messaging system allows for easy adaptor creation. In comparison with object oriented languages, where the problem of glueing together different libraries and programs is well known, this is an advantage. For instance,

The component system allows for writing adaptors in an easy, understandable way as discussed in chapter 2. Compared to object oriented languages, the problem of late binding, polymorphism an inheritance, which complicates the writing of adaptors, is not present.
The component system allows for transparent distribution because the model itself doesn't follow 'call' semantics and is connection based.

In appendix 13.8.6 we discuss how difficult it would have been if we would have used something like Java RMI to do this work.

12.4.3 Thread based versus Event Based systems

During the experiments we have used an event based component system. This allowed us to write adaptors easily and is a model representation of an open distributed system. However, open distributed systems are not the only area where conflicting interfaces occur and not all applications are written by using an event based approach. If we look at applications that make use of object oriented frameworks then we encounter similar problems because not always all upgrades of part of the application are entirely backward compatible [LSMD96]. A small alteration in the provided behavior or a wrong estimate of the usage scenarios can result in a complete application breakdown. This also happens for applications that dynamically link to external libraries (.DLL's offering extra functionality or plugins). Typical for this kind of object oriented application is its seemingly 'sequential' behavior over object boundaries. In fact, this kind of application is often written within a thread based context and they rely heavily on a shared memory (in the broadest sense of the word). However, the use of a shared memory as a communication medium between threads (or between different parts of an application) makes it very difficult to isolate the communicating partners and modify the flow of information between them. Since it is necessary to modify the information flow to resolve any conflicts it might be very difficult to use a similar approach as the one we have presented. Appendix 13.8.6 contains an extensive explanation of the difficulties of using such a thread based model.

12.4.4 Learning Algorithms

Another observation that can be made is that a randomized testing procedure (such as is implemented by different learning algorithms) is the best guarantee to write well-tested and robust software. Actually, our experiments took longer than expected due to the fact that our learning algorithm always created unanticipated situations. This happened in more than one area as discussed below:

Petri-net evaluator: We are now able to present a Petri-net evaluator which has been thoroughly tested, both the evaluation of all kinds of random expressions as well as the evaluation of randomized Petri-nets has been tested.
Implementation of Actors: On the actor level a lot of unexpected interactions were detected, such as in the flood fill algorithm 25 on page . During all our tests we were focused on a flood fill that slowly increases its terrain. However we specified a Petri-net that simply declares a non-nested locking strategy. This means that every lock can return a lockFalse. The checkpoint (1c) in the algorithm doesn't expect a possible lockFalse and will fail if a lockFalse is returned in the line before. This was only found out after measuring the fitness of the adaptor.

This has led us to use genetic algorithms not as a means to solve the problem of conflicting interfaces, but rather as a tool which helps in creating a suitable representation and which has helps in black box testing our Petri-net evaluator.

12.5 Observations about Petri-net Interface Descriptions

THE USE OF PETRI-NETS AS AN INTERFACE DESCRIPTION LANGUAGE allows for a better software development process. Not only is the code documented in a readable and formal way, the formal properties of the documentation allow for automatic testing of the underlying component. Given the Petri-net and a correct component we then can

check completeness: A testing program can take random walks through the Petri-net and check whether the underlying component works as expected: it does not crash, it does not end up in an invalid state and so on. When using timed Petri-nets, the timings can be verified and when needed memory-requirements can be verified against executed transitions.
check consistency: A testing program can verify whether all incoming and outgoing messages from and to the component are consistent with the Petri-net. For instance, it can check that all messages sent out by the component are enabled within the Petri-net.
track behavior of external components: By inserting a tracing component (section 3.6.3) it is possible to verify whether external components honor the messages they are allowed to sent.
If we even go a step further, the Petri-net can help in guarding access to the component, by disallowing unexpected messages.
do a formal analysis to guarantee that no dead ends can be reached within the component. It is also possible to verify whether there is a home-state that is always reachable from any possible reachable state. This is important because this guarantees that a component can be reset at any time.

It is clear that the advantages offered by using a Petri-net description of an interface can outweigh the difficulties of writing one.

12.6 Performance

ALTHOUGH THIS DISSERTATION MAKES NO claim about performance, we will briefly discuss the performance of the entire adaptor. To do so we will discuss three separate estimates. For the entire adaptor we will investigate

$t_{adaptor}$ : how much time is spend in handling one message ?
$m_{adaptor}$ : how much memory is required by the adaptor ? We will express this in function of the minimum memory requirements of every involved component.
$n_{adaptor}$ : how many messages are sent for every single message from a client to a server and vice versa.

To make these estimates we will investigate the timing and memory behavior of the three different modules. Afterwards we will investigate the message count. To be able to make these estimates we will need a number of variables:

The number of server messages sent will be denoted $n_{server}$ . A message is a server message if it is sent out by the server towards one of the clients.
The number of client messages sent will be denoted $n_{client}$ . A message is a client message if it is sent out by a client towards the server. The $n_{client}$ refers to the sum of all messages transferred from any clients to the server.
$\char93 clients$ which is the number of clients involved.
In the upcoming discussion we will sometimes need an estimate of how long it takes to verify which transitions are enabled in the Petri-net. Theoretically (as explained in section 3.4.4) this is exponential because one might need to verify all combinations of tokens versus free variables. However, in practice such situations seldom arise. As we have observed experimentally, the verification whether a transition is enabled can be done in . However, in the upcoming discussion we will consider $t_{transition}$ to be the mean time over all Petri-nets to verify whether a transition is enabled. Afterward, we will, depending on whether we want a worst case estimate or an expected estimate, fill in $t_{transition}$ with an appropriate expression.
We also need a description of the memory-requirements of a Petri-net. We will denote this size $m_{server}$ or $m_{client}$ , depending on which Petri-net is involved. Intuitively the size required to represent a Petri-net is at most the size needed by the underlying component itself. For instance, in our case the server Petri-net describing the state of the whiteboard (of size by ) will need $O(x\times y)$ tokens. The server component itself also needs to keep track of the same information. Petri-nets describing more different states than recognized by the involved components are less useful (see Petri-net guidelines, section 3.8).

We will now start with discussing the time- and speed- estimates of the liveness module, the enforce-action module and the concurrency module.

12.6.1 The Liveness Module

Giving a performance estimate of the liveness module is difficult because the performance of this module largely depends on how well trained it is. Initially it might need a large number of messages and rewards to understand which behavior exactly is required by the client component. However, once a correct behavior is installed it will behave more or less constantly, depending on how much exploration is allowed. Here, we will only consider the case where the liveness module exploits correctly learned information.

To give an estimate of the time spent within the liveness module, we observe that the liveness module needs to find out a) which transitions are enabled and b) select from the enabled transitions an appropriate one by means of Q-learning. Step a) can be done in $t_{transition}$ by using the locality property of Petri-nets. Step b) can be done in because the number of active 'added' transitions is kept constant. During execution of the liveness module, non-fit transitions are removed and new transitions are added. The process of removing a transition is and the process of adding a new transition is also . This results in a time estimate of $t_{liveness}=O(t_{transition})$ .

The memory requirements of the liveness module are the memory requirements of the Petri-net describing the state of the client component and the newly added transitions. Because these newly added transition are bounded to a certain maximum, we consider it to be a constant. The size of the Petri-net is bounded by $m_{client}$ . Hence $m_{liveness}=O(m_{client})$ .

Later on, when giving a performance estimate of the entire adaptor it should be taken into account that the concurrency adaptor contains multiple liveness modules. Therefore we will add an extra index to the liveness measure: $m_{liveness_{i}}=O(m_{client_{i}})$ and $t_{liveness_{i}}=O(t_{transition})$

12.6.2 The Enforce Action Module

Giving a performance estimate of the enforce action module is even more difficult. We currently rely on a prolog implementation to decide how to enable a certain action. This makes it difficult to give correct estimations. Potentially both the memory estimate as well as the timing estimate might be exponential in function of the necessary search depth, $O(c_{1}^{searchdepth})$ . So the worst case time estimate is $t_{enforcer}=O(c_{1}^{searchdepth})$ . The worst case memory estimate is $m_{enforcer}=O(m_{server}+c_{2}^{searchdepth})$ because we also need to take into account the memory-requirements of the server side Petri-net.

However, as observed during the experiments the search depth is always very small and most concurrency strategies offer the possibility to reach a certain state quickly. Therefore we will consider a practical time estimate of $t_{enforcer}=O(1)$ . And, similarly, a practical memory estimate of $m_{enforcer}=O(m_{server})$ .

12.6.3 The Concurrency Module

The concurrency module needs to keep track of all the enabled transitions for every client component indirectly connected to it. For every client a set of controlled transitions and managed transitions will be kept in memory. The Petri-net of the server is not necessary within this module. This gives a memory estimate of $m_{concurrency}=\sum_{i=1}^{\char93 clients}O(2.m_{client_{i}})$ . The time spent within the concurrency module is a constant. For every incoming request a cross-table is checked and a decision is taken immediately. This leads to $t_{concurrency}=O(1)$ .

12.6.4 Message Sends

To determine how many messages are sent between the client components and the server component we will consider two cases: functional messages and synchronization messages.

12.6.4.1 Functional (Logic) Messages

Every functional message going from a client toward a server will go through several connections:

the ``client component $\leftrightarrow$ liveness module'' connection
the ``liveness module $\leftrightarrow$ concurrency module'' connection
the ``concurrency module $\leftrightarrow$ enforce action module'' connection
the ``enforce action module $\leftrightarrow$ server component'' connection

Messages coming from the server and going to the client will pass the same stages in reverse order. This means that for functional messages the number of messages equals $4.n_{client}+4.n_{server}$ . These results can be slightly improved by merging the concurrency module and the action-enforce module into one module at the server side. Hence, we could have an estimate of $O(3.n_{server}+3.n_{client})$ .

12.6.4.2 Synchronization Messages

The behavior of synchronization messages is not as simple as the behavior of logic messages. The problem comes from the fact that synchronization messages are not simply passed through from module to module, but are a) altered to represent the possible choices between the liveness module and the concurrency module and b) are not present between the concurrency module and the enforce action module. However, this observation does not prohibit us from making a global observation about the synchronization communication: We know that the synchronization messages introduced by the adaptor is minimal to support all components: the enforce action module will only send out the necessary synchronization messages, while the liveness module will have learned which synchronization messages are favored by the client. From this observation we know that the adaptor will not introduce useless synchronization messages. This already allows us to give an estimate of $O(n_{server}+n_{client})$ . However, to estimate how many extra messages might be added we give a trace of a synchronization request from client to server:

client component $\leftrightarrow$ liveness module: the client component sends a synchronization message to the liveness module: $n_{client}$ .
the liveness module $\leftrightarrow$ concurrency module: communicates the changing state and possible choices to the concurrency module: $n_{client}$ .
concurrency module $\leftrightarrow$ liveness module: in the worst case scenario, the concurrency module must always make a choice and communicate it back to the liveness module: $n_{client}$ . In this performance estimate we will assume that this will only occur when the real server would in practice also return a message, hence: $n_{server}$ .
the liveness module $\leftrightarrow$ client component: the liveness module sends back a synchronization message to the client component: $n_{server}$ .

The communication of messages from the server to the client happens only when a certain functional message arrives at the action enforcer for which synchronization messages are required. Such a message trace looks like:

enforce-action module $\leftrightarrow$ server component: when necessary a message will be sent to bring the server in the required state. So, we can assume that this is correlated with the number of client synchronization messages: $n_{client}$ .
server component $\leftrightarrow$ enforce-action module: this is captured in the $n_{server}$ variable.

This results in a number of messages of $3.n_{server}+3.n_{client}$

From the results of the functional messages and the synchronization messages we can conclude with an estimate of $O(3.n_{server}+3.n_{client})$ .

12.6.5 Overall Performance

From the previous performance estimate we can now summarize the worst case performance:

$\begin{displaymath} n_{adaptor}=O([3.]n_{client}+[3.]n_{server})\end{displaymath}$

$\begin{displaymath} t_{adaptor}=O(c_{1}^{searchdepth})+\char93 clients.O(t_{transition})=O(c_{1}^{searchdepth}+t_{transition})\end{displaymath}$

$\begin{displaymath} m_{adaptor}=O(c_{2}^{searchdepth}+m_{server})+\sum_{i=1}^{\char93 clients}O([4.]m_{client_{i}})\end{displaymath}$

If we consider, as explained before, that the time to verify a transition's state (enabled vs disabled) is constant and we assume that the prolog implementation is able to deduce a correct path in then we can give an expected estimate of:

$\begin{displaymath} t_{adaptor}=O(1)\end{displaymath}$

$\begin{displaymath} m_{adaptor}=O(m_{server})+\sum_{i=1}^{\char93 clients}O([4.]m_{client_{i}})\end{displaymath}$

This means that we expect that the adaptor needs as much memory as required by the different states of all involved components. Also, the time necessary to handle one message is constant.

12.7 Motivation vs Solution

IN THIS SECTION WE WILL DISCUSS our initial motivations and compare them with the solution we have presented. Our main focus here will be to investigate how our prototype could be made to work within the context of open distributed systems, multi agent systems and embedded systems.

12.7.1 Open Distributed Systems

Our adaptor as it is now does not run automatically in open distributed systems. Below we discuss two obstacles still remaining. The first being adaptor deployment (or who decides to run an adaptor) and the second being the problem of meta-conflicts (conflicts between the involved Petri-net descriptions).

12.7.1.1 Adaptor Deployment

A very simple fact about our adaptor is that, before it will be used, somebody has to insert it somewhere. We will now discuss the implications of this.

Typically, the client is the first one to observe any conflicts with the server. This means that the client will be the first to want to introduce an adaptor. This is not necessarily easy because

To insert an adaptor, extra formal documentation, describing the behavior of the client component, is needed. If the software used on the client does not provide such a documentation then it might be difficult to run an adaptor.
Our adaptor needs the possibility to isolate the server from its environment. Hence, the client side cannot solely take the decision to insert the adaptor because it requires cooperation of the server side.

The server side on the other hand is often the last one to observe any conflicts and might not want to introduce an adaptor for a number of reasons:

The server administrator needs to make the decision to insert an adaptor. Technically speaking this doesn't form a problem. However, (s)he might feel the clients should upgrade.
The performance penalty of the adaptor is considered to be too high.
Inserting an adaptor might requires a quality tracking process (how well does it work, are there any performance penalties, does it introduce bugs, is the end-result more stable than the initial situation and so on) which might take too long a time.
The extra formal documentation, describing the behavior of the server, might not be available.
The required extra formal documentation might not be offered by the clients. In such a situation the server needs to 'know' the correct definitions of different clients. This results in conflicts because the server-side interface descriptions can be incompatible with the actual clients.

Some of these problems are relatively subjective and closely linked with the involved business model, however two major observations still remain. Firstly, not everybody might agree to follow our standard of Petri-nets as a formal documentation technique. Secondly, the formal documentation itself might lead to conflicts. We will discuss these two problems below.

12.7.1.2 Why would somebody want to use Petri-nets ?

One of the motivations behind this research was the fact that standards in open distributed systems are sometimes merely used as a means to stay in control of how software is used instead of being a means to communicate with other software. If the intention of a company is to be incompatible to force partners to make certain decisions, then there is nothing that will prohibit such a company from doing so, not even Petri-nets. However, the description of this behavior was only used as a motivational example and not as the problem we wanted to solve.

On the other hand, companies that want to be compatible with other software will find that our proposal of using Petri-nets offers a big advantage that renders their usage worthwhile. The strength of the documentation format comes from automatically checking completeness and consistency. Completeness being that all messages that a component understands and uses are described within the Petri-net. Consistency being that the documentation is compatible with the behavior of the component. Both properties can be checked automatically. In other words, Petri-nets help the developer in the software development process by offering better and more accurate documentation.

12.7.1.3 Meta-Conflicts

**Figure 12.1:** Stack of adaptors
$\includegraphics[% width=1.0\textwidth]{AdaptorStack.eps}$

In section 1.1.1 we argued that defining one standard in open distributed systems is unrealistic because competitors will always try to adopt the standard and unexpected incompatibilities will arise. This problem could also occur between the different formal interface descriptions. This kind of conflicts are meta-conflicts. The problem of meta-conflicts is wide ranged; from slight discrepancies between the Petri-net descriptions to conflicts with entirely new, possible better, formal descriptions.

If the formal interface description are entirely compatible then it might be possible to create a super interface description that can express all details of the different formal descriptions. The only thing that needs to be done then is implementing converters from the different interface description languages to this super interface description language. However, it could also be possible that required information in one interface description language is simply not present in another interface description language. This would complicate this method. However before this kind of meta-conflicts can be investigated, a number of realistic conflicts between interface description languages need to be investigated. Only then can this problem be approached pragmatically. This remains future work.

If there are only slight differences between the Petri-net format used, or between the terms used within the Petri-net then other approaches can help:

meta-adaptors based on Petri-net evaluation. Such a meta-adaptor is pictured in figure 12.1. If we would create an adaptor similar to the one we have created for conflicting concurrency strategies then we need a 'common ground' that can be used to verify the behavior of the different Petri-nets. In our case, the common ground was the compatible functional behavior. In the case of conflicting Petri-nets a common ground could be the fact that a Petri-net can be verified for completeness and consistency. By tracing the incompatible net and recreating a compatible net during such a trace it should be relatively easy to convert an incompatible net to a compatible net. This can be investigated in more detail in future work.
automatic deduction of the states and transitions within a component by inspecting the code and data segments in a typical communication. If we could avoid the need for Petri-nets of an interface description at all, then the problem of meta-conflicts might be avoided. How the interface of a component can be learned is discussed in section 13.8.5.

12.7.2 Mobile Multi Agent System

A second motivation for doing this work was the problem of concurrency management in mobile multi agent systems. Because communication happens in a peer to peer manner between agents, we now discusses how our solution relates to the problems of peer to peer systems. First, we discuss how our adaptor preserves the required behavior if no central locking server is used. Second, if a central locking server is used to coordinate the behavior of a group then we see how our approach fails to offer a solution.

12.7.2.1 The Adaptor Preserves Behavior

In this section we illustrate how our adaptor correctly mediates concurrency problems in a peer to peer application that makes no use of a central coordination server.

**Figure 12.2:** Peer to peer communication between agents in a mobile multi agent system. The arrows indicate which agents require behavior from another agent. The left figure is the normal interconnection without inserting adaptors. The right figure is the same interconnection but with adaptors inserted at the server ports/interfaces.
$\includegraphics[% width=0.35\textwidth]{peer2peerintroductie.eps}$ $\includegraphics[% width=0.49\textwidth]{peer2peerwithadaptors.eps}$

In a mobile multi agent system there is at first sight not really a difference between client-components and server-components. However, if we look at the behavior they require or provide (or in other words, which agents implement a certain functionality), then we can actually make the difference between a client component and a server component. A client agent will expect a certain functionality from another agent, while a server agent will provide a certain functionality. However, contrary to a typical client-server setting, in this situation every agent can be a server as well as a client. To easily support such a model we will assume that the server behavior of such an agent is offered on one port, while the client behavior is offered on another port. This allows us to introduce adaptors on every agent (peer) that offers a service. The resulting interconnections are pictured in figure 12.2.

**Figure 12.3:** Deadlock over the different adaptors in a peer to peer situation.
$\includegraphics[% height=6cm]{peer2peerwithadaptordeadlock.eps}$

Because every agent can be a client as well as a server, it becomes difficult to identify a 'session'. In the typical client-server architecture, the client manages a session and tries to get things done on the server. Within a peer to peer system, sessions hop from one agent to another and cannot be localized anymore in one agent. This might lead to deadlocks between communicating peers, without the possibility for the adaptors to do something about it. This is illustrated in figure 12.3. The adaptors cannot resolve such a deadlock because they are part of the waiting loop. However, this problem has little to do with the ability of the adaptor to 'mediate' the conflicting behavior. In fact, if the application is programmed in such a way that it relies on concurrency strategies that lead to unwanted behavior then this unwanted behavior will also occur with adaptors, otherwise it will not.

12.7.2.2 The Adaptor does not Preserve Coordination

In a peer to peer application there can also situations where our adaptor fails to work. The example we present to illustrate this uses a central locking server that is used by all components to identify sessions and control access to the application's resources.

**Figure 12.4:** Distributed Transaction server.
$\includegraphics[% height=6cm]{DistributedTxServer.eps}$

In peer to peer systems it is often necessary to bring multiple peers from one consistent state to another consistent state. This however requires from every peer the ability to obtain a number of locks, distributed over different peers. Once all the locks are held the session can apply the changes on the involved resources and unlock them again. Often this is done by passing session id's from one peer to another such that receiving peers can verify whether the resources they hold can be accessed by the incoming session. However, the session id's themselves are often provided by other components than the receiving peer itself. For instance, consider a transaction server such as pictured in figure 12.4. The transaction server will be used by any peer that needs a transaction id (this will be our session id). This transaction id will then be passed from peer to peer (without going through the transaction component).

In this setup, multiple concurrency strategies can be present. First there are the concurrency strategies offered by one peer towards another peer, secondly there are the concurrency strategies offered from one peer toward the transaction server and thirdly there is the concurrency strategy offered from the transaction server toward the peers. We will assume that we have mediated the conflicts by placing at every server interface a concurrency adaptor. This might lead to problems:

Our adaptor requires the availability of a compatible functional behavior. This poses a problem because the only functionality our central transaction component offers is about synchronization. If the transaction component is hardwired with respect to the resources then we can use the common set of resources to be the functional link between them. However, if the transaction component uses first class resources: i.e. resources that are published by clients and recognized by the central server, then it might be very difficult to find any useful common behavior.
If peers that have no notion of session id's are pulled into the application then our adaptor will not provide them with this notion. This means that their resources will be freely available to everybody wanting to lock them, effectively destroying the use of the central transaction server. So, if we allow components to join in an 'open' way then this kind of conflict cannot be mediated.
Even if peers have a notion of session id's then we still have a problem because the application is using first class session id's and the information about such a session is localized at one central place, while in fact all the concurrency adaptors on the peers need to know about these session id's. E.g: if agent E requests a session id from the transaction server, then the Petri-net in the liveness module connected to agent E, will recognize the session id to be owned by agent E. However, if this session id is passed to agent A, then this agent will be completely lost, because the Petri-net in the liveness module of agent A, connected to agent E does not recognize this session id.
If we would be able to resolve these problems, (by not using first class session id's or first class resources) another problem might occur: synchronization races due to hidden communication. Because our adaptor might delay the passing through of a synchronization request until really necessary, a resource might not yet be locked if the agent wants to access it through another channel. Even worse, this resource might never become locked if the agent doesn't access it directly on the transaction component.

From these examples we see that a) first class resources, b) first class sessions, c) hidden communication and d) the possibility that the concurrency strategies simply cannot be mediated, still form a major obstacle. Investigating how these problems can be solved remains future work.

12.7.3 Embedded Systems

One of the motivations behind this dissertation were conflicts between connected embedded systems. The rise of smaller and mobile embedded devices increases the chance of encountering interface conflicts. In this section we investigate how suitable our solution would be for this kind of domain. First we must note that the biggest difference between desktop software and embedded software is the constraints put on it. Among others, there can be memory-constraints, speed constraints, bandwidth/latency constraints and real time constraints. Secondly, aside from these constraints, there is also an entire process involved in creating embedded systems. Typically first a prototype is developed which is, as the project continues, scaled down into a much smaller artifact. During the entire process, quality control is of utmost importance, because, especially for consumer devices, it is not always possible to fix a bug once the device has been sold [SEE99]. From a high level point of view this provides an opportunity for our adaptor because it decreases the likelihood of bugs introduced due to concurrency strategy conflicts. However, as embedded systems are developed pragmatically, we will now discuss how it would be possible to reduce the requirements of our adaptor in such a way that it could fit within certain constraints.

12.7.3.1 Memory constraints

Very often embedded systems are limited in their memory. There are two approaches that might help in reducing the memory requirements of the adaptor:

Given the performance estimation earlier

$\begin{displaymath} m_{adaptor}=O(m_{server})+\sum_{i=1}^{\char93 clients}O(4[.]m_{client_{i}})\end{displaymath}$

we see that the adaptor requires 4 times as much memory as for every Petri-net describing the behavior of a client. For desktop software this doesn't pose much problems, however for embedded systems it would be nice to be able to reduce this number. Collapsing the different modules into one might help, not because it reduces the amount of required 'information', but because the overhead of storing this information can be reduced. For instance it is possible to store the information on whether a resource is controlled and/or managed into the transitions of the Petri-nets themselves.
A second important place where storage can be optimized is our prolog module. The prolog interpreter is needed to determine how we can force the server to be in a certain state. To resolve this problem we can either try to remove it or try to make better use of the interpreter.
1. Removing the prolog interpreter: this could be done by integrating the adaptor more tightly with the server side component. By removing the concurrency strategy at the server component we can make progress on two ends. Firstly, removing the server side concurrency strategy reduces memory-requirements with respect to data space (to store locking information) and code space (to implement the concurrency strategy). Secondly, not implementing a concurrency strategy at the server component removes the need for an enforce-action module.
2. Better usage: If it is not possible to remove the concurrency strategy at the server (because the embedded system is not under control of the one inserting the adaptor) then we can make better use of the interpreter by implementing the other modules (the liveness and concurrency module) also as prolog programs. Making use of an interpreter in itself would also drastically reduce the memory-requirements and might even be better in comparison to compiling the adaptor. However, this could also introduce a speed penalty. So, it is the standard tradeoff between speed vs place requirements.

12.7.3.2 Bandwidth Constraints

Certain embedded systems have low bandwidth constraints, which means that they can only send and/or receive a small amount of bytes per second. As stated in section 12.6.4, our current implementation of the concurrency adaptor requires 3 times more messages than there are messages posted by the client and server component together. This could weigh heavily in a low-bandwidth environment. Therefore we now investigate how we could reduce the bandwidth used by the adaptor by decreasing the amount of extra message sends between the modules or by making the penalty of doing so less severe.

reducing the amount of extra messages can be done straightforwardly by combining all modules in one embedded system such that there is no extra communication necessary between the modules. It should also be noted that by decreasing the number of messages that need to be sent, we might not only optimize bandwidth, but also the latency of client/server communication and the internal speed of the adaptor (as explained in the next section).
decreasing the penalty: if it a) is possible to modify the server such that it doesn't offer any concurrency strategy or b) we can place the enforce-action module on the server-side component, then it is possible to decrease communication towards the server because no synchronization messages ought to be posted between the concurrency module and the server-side. The only remaining messages would be functional (logic) messages.

12.7.3.3 Speed constraints

There are two distinct approaches one can use to make the adaptor more efficient with respect to speed:

One of the biggest bottlenecks in the adaptor is its modularized nature. This gives rise to sending three times as many messages as there are messages being communicated. By demodularizing the adaptor and allowing the modules to work together in one shared data-space it is possible to increase speed substantially. Instead of copying the messages, pointers to the messages could be passed. Actually, by doing so it could be possible to have no overhead at all in 'extra' communication'.
As described in section 3.4.4, Petri-nets can be implemented in hardware very efficiently (data flow machines). This could make it possible to reduce the time necessary to verify whether a transition is enabled, $t_{transition}$ , with a certain factor. It should be noted that such a hardware implementation will only provide a marginal improvement and not solve the combinatorial problem of searching for a matching set of tokens. Still, for embedded systems, a speed increase with a certain factor is always welcome.
Another possibility to increase the speed of an adaptor is by parallelizing the process. By implementing every module within a separate piece of silicon it is possible to increase speed linearly.

12.7.3.4 Real time constraints

Real time constraints often require a much better quality control process [SEE99]. This process should be extended to our adaptor as well.

Embedded systems requiring real time constraints should not rely on 'learning' approaches, because these might behave indeterministically. Nevertheless, if such a system would be build anyway, then the deployment phase should be carefully investigated. The learning algorithm should, during an off-line training-phase, be allowed to learn but when placed in an on-line setting it should no longer learn further.
Systems constrained by hard runtime deadlines should not rely on synchronization operations that might block the system for an unknown amount of time. Depending on the behavior of various clients it might be possible that the concurrency adaptor decides to set certain, possible important, real time events in wait. In case this should be unavoidable a formal analysis of the entire software within the embedded system should be made. This should of course be done after the system has been scaled down from a prototype to an optimized artifact.

In this section we have discussed how our adaptor, as it is, could be used within embedded systems, how it can be scaled down to fit extra non functional requirements.

Werner 2004-03-07