Subsections

13. Conclusions

**Figure 13.1:** High level module overview
$\includegraphics[% width=1.0\textwidth]{ConclusionOverview.eps}$

IN THIS CHAPTER we first summarize our work. Then we will elaborate on our scientific contributions as well as on the applicability of the thesis. We point out the limitations of our work, give guidelines how an adaptor for other conflict domains might be created and finish the dissertation by pointing out possible future work.

13.1 Technical Summary

BASED ON THE CASE STUDY of conflicting concurrency strategies, we have shown how intelligent adaptors can be created automatically. We have illustrated this by creating such an intelligent adaptor as a sequence of three modules, where every module has different responsibilities.

The liveness module learns how to keep the components behind the concurrency strategy alive. For the concurrency adaptor as a whole, such a module is necessary to offer to the underlying components an optimal feedback behavior, as opposed to for example, returning a behavior that leads to endless live-locks such as lock-lockfalse-lock-lockfalse ad infinitum. Because it is still unknown whether liveness is a decidable property of Petri-nets, we have decided to use a reinforcement learning algorithm (Q-learning and an $\epsilon$ -greedy strategy) to approach the problem. Feedback for the learning algorithm comes from the underlying component by means of statically placed checkpoints. Every time a checkpoint is reached a reward is sent to the adaptor and the learner can, when necessary, modify its future behavior. The reinforcement learning algorithm itself has been very efficiently mapped onto the structure of colored Petri-nets, in such a way that storage requirements are minimal, without losing essential information. This optimization is mainly based on a) the explicit state information of the Petri-net already present and b) the creation of a system that can recognize certain situations and knows how to handle them. Our situation-recognition system is expressed as a Petri-net. During execution, new transitions are constantly added to the Petri-net. The learning algorithm automatically explores the new avenues offered by these transitions and, when suitable, these new transitions are kept alive and the new situation is remembered. The reason why we use Petri-net transitions as situation handlers has been explained thoroughly, and possible alternative representations have been investigated. To verify the suitability of our representation we used genetic algorithms.

The concurrency module of our concurrency adaptor inserts an appropriate concurrency strategy. To do so, it communicates with the two other modules about resources (for instance the possible squares in a whiteboard) and actions (for instance a setPosition). There is no explicit annotation of resources and actions on resources, both are implicitly represented within the Petri-net. This information is automatically deduced by the first module under the form of enabled transitions and equivalent states. This, together with extra information of the liveness module that describes which futures are possible and which futures the component would like, enables this module to make correct decisions about what to do. Once such a decision is taken the client component can perform its actions on the server component.

The enforce-action module bypasses the concurrency strategy of a component entirely. This allows for more freedom in the two other modules. The enforce-action module does this by offering a logic interface to the outside world while inserting the necessary synchronization messages at the appropriate times. To do so, the module needs to know how certain markings within the underlying Petri-net can be reached. In this dissertation we have achieved this by means of a reachability analysis in prolog.

In these three modules, the adaptor needs a) to analyze how to enable certain actions, b) to know how to reach a certain state and c) to know which resources are involved. In order to be able to know this the adaptor requires a formal documentation in the form of Petri-nets. Every component needs to provide a Petri-net describing its own required or provided concurrency strategy in a consistent and complete way. Consistent means that any action that is allowed by the Petri-net should be accepted by the component and complete means that no action that is allowed upon the component is undocumented.

13.2 Thesis Statement

WE STATED that in open distributed systems, one needs intelligent adaptors to mediate interface conflicts. We identified three reasons why one might need an intelligent adaptor: a) in open systems, it is very unlikely that one standard will be used by everybody, b) the highly volatile nature of peer to peer computational networks gives rise to interface conflicts and c) with the current advent of interconnected embedded systems and ambient intelligence the problem of conflicting interfaces will become even more prominent. We claimed that it is possible to create, for certain categories of concurrency interface conflicts, a concurrency-adaptor that automatically learns how to resolve the conflict in such a way that a) the required behavior of the involved components can be executed over the adapted interconnection and b) it is able to work on-line in certain open system. We have validated this by constructing such a concurrency-adaptor. Below we discuss the different requirements and how we have satisfied them.

To show the necessity for such adaptors, we have introduced our real-world case of conflicting concurrency strategies. We have investigated different concurrency strategies and explained why components in open distributed systems need a concurrency interface. Relying on this analysis we discussed the problem: not everybody will write the same concurrency interface for his components and thus concurrency interface conflicts will arise. We identified and explored in much detail a large number of practical conflicts. More specifically, we investigated conflicts that can easily occur when one or both of the concurrency interfaces is upgraded without prior notification and without full backward compatibility.

To show that we can create such an intelligent adaptor we have constructively developed one. The adaptor itself needs detailed information of the provided and required interfaces of all participating components. The adaptor works by means of three modules. It contains two modules that essentially bypass the provided or required interface and one module that actually implements a suitable concurrency strategy. To bypass a concurrency strategy we investigated two techniques. The first technique was a reachability analysis of a Petri-net, the second technique was a reinforcement learning algorithm. Finally we explain how resources and actions can be detected automatically within a Petri-net. This chain of three modules constitutes an intelligent adaptor that is able to set up a link between conflicting concurrency interfaces, for which the adaptor does not know the exact interface beforehand. The adaptor is restricted to client server architectures. In section 12.7.2 we have explained in detail what can go wrong with peer to peer applications. Essentially, the adaptor does not work in open peer to peer systems because not all communication between different parts of the application can be mediated by one adaptor.

13.2.1 Abstract Requirements

We now reconsider the abstract requirements from the introductory section.

13.2.1.1 The adaptor should be able to work on-line

This requirement was necessary to make the adaptor useful in an open system. The performance estimates as well as the setup of the adaptor enables it to work on-line. The liveness module is an on-line prolog program that can deduce how a certain state can be enforced upon the server, without needing to 'test' the server's behavior. The concurrency module is a program that can be executed on-line and the liveness module has been specifically designed, by selecting Q-learning as a learning technique, to work on-line.

13.2.1.2 The adaptor should work by only modifying the message flow between the involved components.

The adaptor is a central component that is placed solely on the connection between different components. Hence the only modifications it can introduce is on the messages between the different components. Therefore, this requirement has been satisfied.

13.2.1.3 The adaptor should resolve conflicts

The most important abstract requirement of an intelligent adaptor is based on the notion of an overall required behavior. In our case study the overall required behavior between the different conflicting concurrency strategies was to avoid actors crossing each others boundaries. As we have experimentally observed, our adaptor supports this implicit requirement. Two cases can occur:

all the components agree on an overall behavior and offer/require a concurrency strategy.
some components do not offer a concurrency strategy

In the cases where there is a full agreement, we have experimentally observed that our adaptor works. This has been described in chapter 12. If there is no full agreement on the required behavior then the adaptor should try as hard as possible to satisfy as much as possible of the required behavior. This requirement is satisfied because our concurrency module allows the presence of an actor that does not manage certain resources. In such a case these actors will be set 'on hold' until a suitable moment arises to realize the required changes.

13.3 Contributions

THE MAIN CONTRIBUTION OF THIS THESIS is the process we have been following to create an adaptor. Instead of trying out one technique and verifying whether it works or not, we have kept on searching to find a (set of) suitable technique(s). To reduce the complexity of the problem we have restricted ourselves to the case of concurrency conflicts, which we have expressed in an event based model. When necessary we have invented new techniques (The extremely useful Petri-net documentation for describing interfaces and a new technique to verify the bias of computer generated programs), or combined existing techniques (Petri-nets & Reinforcement learning, the three different modules that constitute an adaptor). This has lead to the creation of a concurrency adaptor that mediates differences between conflicting concurrency strategies. Below we summarize the contributions.

Open systems need concurrency interfaces: We showed that the components of an open distributed application need concurrency interfaces. The reason is that one cannot foresee how a component in an open distributed system will be used, therefore one cannot know which actions are supposed to be atomic, and therefore every component should allow other components to specify a critical section. Hence the components themselves need to offer a concurrency strategy. We have illustrated this extensively by means of a whiteboard.
Open systems require concurrency adaptors: We have shown that components within an open distributed application can offer all kinds of concurrency strategies and that it is very likely that concurrency interface conflicts will arise between different components. We have extensively discussed a set of real-world concurrency interface conflicts and we used these conflicts to validate our claim.
Petri-nets as a formalisms for interface description: We showed how Petri-nets can be used to describe the behavior of an interface in a formal way. This formal interface description is useful because it can always be synchronized with the source-code, as consistency and completeness can be checked in an automatic way.
A concurrency adaptor: We showed how an intelligent protocol adaptor can be constructed in order to resolve incompatibilities between communicating components. Such an approach is indispensable to cope with the combinatorial explosion of protocol adaptors that are needed in an open distributed setting, when dealing with interconnected embedded devices or when writing components. In all these cases components interact with other components in unpredictable ways.
Automatic bias verification: We showed how genetic algorithms can be used to verify a representation instead of providing a solution. We consider that if a certain problem can be solved easily under a certain representation, it is a good representation. If the same genetic algorithm has more trouble finding a solution the representation might not be so suitable.
Unification of Petri-nets and reinforcement learning: We showed how reinforcement learning can be unified with colored Petri-nets in an efficient way by exploiting the structural properties of the colored nets.
Component based software engineering allows for easy adaptor creation: We showed how an event based model allows for easy adaptor creation if it is used in a disciplined way, that is, if it is purely event based and no compromises are made to integrate with other models, such as a thread based models, and that no communication happens outside the system.
Guidelines for developing interface adaptors: This dissertation can be used as a platform that offers recommendations and guidelines for the development of conflict adaptors for other domains.

13.4 Limitations

IN THIS SECTION WE ELABORATE ON SOME LIMITATIONS OF THE PRESENTED APPROACH:

Only conflicting concurrency strategies: Conflicting interfaces represent a major problem that has to be dealt with. To approach this problem we have investigated an accessible but smaller problem of concurrency conflicts. This limits the immediate applicability of our technique. Nevertheless,
1. the case itself is chosen such that its correct working cannot be measured easily at runtime. E.g; How can one measure a race-condition ? This clearly depends on what the components consider to be a valid state. By using such an interesting requirement, we have investigated an interface conflict that cannot be automatically deduced from a state-machine (such as done in [Wyd01]). Also, another requirement of our concurrency conflicts was the fact that the adaptor should keep the components alive. This also made it difficult to use one standard technique to solve the problem. So, our case is indeed a small subset of all possible conflicting interfaces, nevertheless it is certainly not a trivial case.
2. in the domain of open distributed systems this work is an important contribution because the problem of conflicting interfaces is often overlooked and because our work allows for a decoupling of the required and provided concurrency interfaces. If we would want to extend this research to other application domains, then a substantial part of it should be redone. For instance, it might no longer be feasible to use Petri-nets, or to use a liveness module and/or enforce action module. Later on we will discuss a case in which conflicting socket libraries are investigated.
Well-working functional link: An important limitation of our work is the fact that we rely on the availability of an existing, well-working link between the conflicting components. More specifically, we can only generate an adaptor on non-functional requirements between two interfaces if the core functionality of both components is compatible. Such an assumption is not unexpected because it helps in creating a predictable environment that allows for verification of the generated adaptor. E.g, a whiteboard offering set_position (with underscores), while a client expect SetPosition (without underscores) are clearly in conflict. Because this is a conflict on the logic interface between both agents, our adaptor will not be able to mediate such a conflict. Some approaches might help to solve this kind of conflicts
1. Instead of relying on a correct functional link, we could probably also rely on another reference frame for which the correct working of the adaptor could be measured. For instance, a requirement such as 'make sure that both partners draw a moving actor on the whiteboard' could result in a guiding principle for an off-line learning algorithm. Later on, we will give an example of conflicting socket libraries to illustrate this.
2. Similarly, other research requires the availability of a placeholder that describes the required interaction [Wyd01]. This research however is very limited to the requirements it can describe.
3. Depending on the application it might be sufficient to just rely on the commonalities between the expected and provided state of the involved components instead of relying on the common actions.
A Controlled Case: We have only been using one case, a whiteboard, to illustrate concurrency strategy conflicts. We did not investigate other examples. Nevertheless, this whiteboard case has been explicitly created to illustrate practical problems. With it, we were able to illustrate race-conditions, deadlocks, livelocks and transactions in such a way that the nature of open distributed systems was still preserved. From an academic point of view, this case describes everything one might need in open systems. From an industry point of view, a lot of problems have been avoided. For instance, in the 'real world' not everybody agrees to follow an event based model. Also not everybody will agree to write Petri-nets, some might even disagree with the fact that a concurrency strategy is necessary. However, instead of seeing such a controlled case as a weakness, we would rather like to see it as one of the strengths of this dissertation. By using a simple model, we were able to describe all necessary issues, from simple locks to transactions in such a way that a) their need was clearly demonstrated b) many of the concepts, necessary in the real world are present (resources, actions, processes,....) and c) the common problem, recognized by different solution providers are present (livelocks, deadlocks, transactions,...).
Component based development: We have limited our work to conflicts between components in a component based system. We did not investigate how our work can be unified with models that have a concept of shared memory. Neither did we investigate how our results can be mapped on thread based models. The main reason for doing so is that in an experimental model it is impossible to take into account all possible technical details. Appendix 13.8.6 covers a detailed explanation of what kind of problems we have avoided by doing so.
Only one component system: In this dissertation we have limited our research of adaptor creation to one component system, which we created ourselves. This was mainly due to practical reasons. The project that financed my research, the SEESCOA project, needed a definition of components and an implementation of a component system in Java. It was only out of practical considerations that we have used this component system because it allowed us to use it actively on more than one front. First, in the test-case for the SEESCOA project and secondly as a means to carry out our own research. By doing so, we were able to deliver a reusable and high quality component system. A second reason why we favored the SEESCOA component system is because it has a mature design. After creating components (agents) in the Borg mobile multi agent system, some design flaws came up, that were not easy to fix. These design flaws have been corrected in our current implementation of the component system. Aside from practical considerations, the component system itself is also representative for open distributed systems, applications within open distributed systems and can be considered as a common denominator of existing component systems.
Technical limitations: The adaptor created in this dissertation works when the involved conflicts arise from syntactical conflicts, the ordering of messages, re-entrance conflicts, conflicts between static resources and conflicts on the layering of concurrency strategies. However, this certainly does not cover all conflicts that might arise. For instance, our adaptor fails to work when:
1. First class resources and/or sessions are present and used as discussed in section 12.7.2.
2. There is hidden communication between peers. Because our adaptor does not see this communication it cannot adapt it. (section 12.7.2)
3. There are non-sequence requirements. These are requirements that have nothing to do with the sequencing of messages. Such requirements can be
  1. timing: For instance, a requirement such as 'every 2 seconds an alive notification should be sent'
  2. memory: For instance 'when handling this message, the memory should remain below ...'
  3. state information: For instance, ``after a rollback the component should be in such a state''. In our case this can be relatively easily added by using the places of the Petri-net.

13.5 Application Scope of our Adaptor

IN THIS SECTION WE discuss the application scope of our adaptor, first, without giving examples of such applications. We discuss why completely open systems are not feasible and why our adaptor is useful even in situations where exact requirements on the interface, but also on the usage of that interface, are necessary. Afterward we give some immediate practical applications of our adaptor.

13.5.1 Truly Open Systems: Feasible ?

Some conflicts are impossible to mediate. In section 6.1.2 we have given examples of such conflicts. A notable example of such a conflict was our line example. In this example we assumed that the line actor would always have a lock on a square somewhere on the whiteboard. Another client required the possibility to lock the entire whiteboard. If those two actors were to work together on the same whiteboard then the second actor (which wants to lock the entire field), would never have a chance to do something because it would never be able to lock the entire whiteboard (simply because the line actor will always keep one square locked). This conflict is notable because it demonstrates that even realistic locking strategies can easily lead to impossible to solve concurrency conflicts. In this example the conflict arises from a subtle difference within the usage scenario involved.

This indicates that truly open systems, such as advertised by the mobile multi agent scene, might be impossible to build: a small change to a component or to one of the involved usage scenarios might transform a 'possible to mediate conflict' to an 'impossible to mediate conflict'. For open systems this means that probably the only way these systems can work is if everybody agrees on exactly the same specification. This specification should be exact (formal) with respect to what kind of usage scenarios and what kind of interfaces are allowed. This is clearly not realistic for open systems.

13.5.2 Then: Why Do We Need Adaptors ?

**Figure 13.2:** Interface boundaries and usage boundaries. Top of picture: small interface, limited usage. Middle: small interface, large usage range. Bottom: large interface, large usage range.
$\includegraphics[% width=0.80\textwidth]{Coupling.eps}$

The statement 'open systems require an agreement between all partners to follow exactly the same specification' might sound contradictory with our initial motivations. However an exact specification does not prohibit from allowing certain degrees of freedom. An adaptor might shift the problem of being compatible with a very narrow interface description to being compatible with a wide range of allowed interfaces (within certain degrees of freedom). We explain this by elaborating on the different possibilities (all pictured in figure 13.2)

When the interfaces and the usage are narrowly specified, then we have an interface that is barely usable by other components[Ore98]. If somebody wants to use the component in a slightly different situation (a situation with other usage scenarios) it will no longer be compatible with the exact specification of the usage scenario. A component with such a strict requirement is very limited with respect to the client components it can serve.
When the interfaces are narrowly specified, but the component usage is wider ranged, then possible client components have, depending on their goals, more freedom in using the provided interface. However, this poses a problem because certain usage scenarios are not necessarily easy to implement over the narrow interface. E.g, implementing a field-lock over an interface that offers the locking of squares. If we could, in such situations, allow other, slightly different, interfaces to access the same functionality then the development of open systems would become easier.
When interfaces are specified as wider ranged interface-descriptions (instead of exact syntactical descriptions) and the usage scenarios are wider ranged as well, then one component can use another as it sees fit. E.g.: one component specifies a reentrant locking strategy, while another specifies a non-reentrant locking strategy. However, before such wide ranged interface can communicate some conversion needs to take place. This is possible if we are able to mediate the differences between the different interfaces. This is where our adaptor comes in because it enables such scenarios.

To summarize, within certain boundaries, an automatic adaptor is useful because it allows any component a choice in the interface it provides or requires. This can have a major impact on the way components are developed as we will explain below. Also, it essentially offers a decoupling of the required interface from the provided interface.

13.5.3 Impact on Component Development

**Figure 13.3:** Demonstration how a Petri-net can be written that will immediately lock all fields necessary for the line and its trail.
$\includegraphics[% height=8cm]{CPN-LineAndTrail.eps}$

We initially started our investigation trying to foresee what the Internet would be in the future. We tried to foresee what kind of problems will arise. We concluded that conflicting concurrency strategies would become a large problem. In the approach we have presented we tackle this problem by assuming that every component will offer a concurrency strategy and that these will need to be mediated. In doing so, the problem of 'writing correct adaptors' is shifted to the problem of 'specifying the required/provided concurrency behavior precisely'. The result of this approach is that the developer does not have to implement the adaptors directly, but that he instead has the responsibility of writing a complete and consistent Petri-net. The developer can use such a Petri-net to write down a strategy specifically suited for the problem at hand. Instead of sticking to one of the standard concurrency approaches he can now easily specify which resources need to be locked simultaneously. Depending on the required usage scenario it might be possible to require (or provide) better suited interfaces for the needed functionality. For instance it is possible to write a client component that specifies that it wants to lock an entire line and its trail immediately. This is pictured in figure 13.3 and would make writing our line actor more easy. On top of this is the fact that this Petri-net exactly specifies what is needed and thus makes mediating conflicts involving this concurrency strategy even more easy.

13.5.4 Decoupling Client / Server

**Figure 13.4:** Using a concurrency server to which components can send a concurrency representation agent.
$\includegraphics[% width=1.0\textwidth]{ConcurrencyServerIdea.eps}$

If we follow this line of thought we end up with concurrency servers on which different components can publish the resources they have and the behavior they need. This might become even more interesting if, instead of supplying such a server with a Petri-net of the behavior, we could supply such a server with an actual program, that represents our original component. This program can express even more behavioral information than our Petri-net descriptions. In these cases mobility of components would be very useful. This is pictured in figure 13.4.

13.6 Some Practical Applications

Below, we describe some applications or application domains where our concurrency adaptor, as presented here, is useful. We first give a number of examples where the adaptor is useful in situations where the interface not necessarily changes. The applications we have been looking for were constrained by a number of requirements

The application should have a client-server architecture, otherwise our adaptor is not immediately applicable.
A concurrency strategy is necessary to guarantee a correct overall behavior
the resources should be statically defined.
when possible, the interface should change often, because this will naturally lead to numerous conflicts. However, this is not strictly necessary to make our adaptor useful.
the system must be event-based, or expressible within an event-based system.

Afterward we give an example of an interface with frequently changing semantics in which this research can provide an added value, without relying explicitly on the previous requirements.

13.6.1 Collaborative Computer Supported Work

A computer collaborative supported work environment often contacts a central server that offers some kind of whiteboard (in the broad sense of the word). Users can join such a whiteboard and through it communicate with each other, while still following some rules. In this case, the client program offers a user interface to the end-user, while the central server stores and manages the data. In such a multi-user environment it is necessary to lock operations on the whiteboard. Depending on the kind of locking strategy required by the client an adaptor might be necessary. This is a case where locking is necessary, the resource can be described statically, the clients might require a different concurrency strategy and it can be considered as an event based system. Therefore, our adaptor is useful in this context.

13.6.2 Databases Access

Databases are another area where our work might be useful. Databases are often centrally placed servers that allow different clients to update records. To guarantee that this happens in a consistent way, a locking interface needs to be present. If such a database would, together with the schema's of the different tables, also supply a Petri-net that describes the offered concurrency strategies, then it might be easier for clients to access this data because they in turn can define a more suitable required concurrency strategy. This can make implementing clients for databases much easier.

13.6.3 Frequently Changing interfaces: TCP/IP Sockets

Table 13.1: Some important, barely documented, differences between Linux sockets and windows sockets.

Linux

Windows '98
Operation

We now present an example of conflicting interfaces within a domain other than concurrency interfaces: socket interfaces. Later on we will use this example as an illustration of how (part of) the techniques used in this dissertation can be used to implement adaptors in other domains.

When we were developing Borg, the biggest problem of creating a working version for all platforms was, contrary to what might be expected, not the user interface, but rather the socket libraries. We have developed Borg for PalmOS, Windows, Linux and Macintosh. Every operating system offers its own version of a socket library. This has posed major problems because not every socket library is as easy to use as any other socket library. Table 13.1 contains the most important differences between Linux and windows.^13.1 Between the two presented API's some important differences exist:

Syntactical differences on multiple levels. The error constants under windows are all merged into h_errno, while under Linux, the error constants are separated in two variables, h_errno and errno. Another syntactical conflict can be found when marking a socket non-blocking.
Initialization of the library is different. Linux doesn't require initialization, Windows does.
Windows will not signal when data is received, hence the application needs either to poll constantly the sockets it manages, or it should start a new thread. Linux will signal when data might be available, hence not requiring a new thread or polling loop. This is a conflict that is not easy to solve in a platform independent way.
Between different Linux versions other conflicts might arise. In the kernel version 2.2 series, circular sends would not deadlock but return an error, while in the 2.4 series such a circular send will wait and deadlock. Depending on the usage scenario this might form a problem.

As illustrated by this example, interfaces that often get a new implementation, so called hotspots, will automatically give rise to differences between implementors. This often leads to conflicts. In this example the availability of Petri-nets to describe the semantics of the different functions would have been extremely helpful as well as the possibility to mediate the differences automatically. When we present the guidelines, we will investigate this example in more detail and give a sketch how to approach the problem of conflicting socket interfaces.

13.7 Guidelines

CONFLICTING INTERFACES REPRESENT A major problem that has to be dealt with. To approach this problem we have investigated a smaller, but more accessible problem. The solution presented in this dissertation is not a general solution in the sense that it can be straightforwardly applied to other domains. Nevertheless, in identifying solutions to a subset of a smaller problem, we have tried to keep both the solution and the subset of problems as general as possible. This allows us to identify guidelines which can help shaping new solutions to other problems of conflicting interfaces. We will now discuss these guidelines and explain them by giving an example of how the previous identified problem of conflicting socket libraries might be solved. This example will be presented in another font.

13.7.1 Explore the Environment

The first thing that needs to be done is to explore the environment. The best way to get grip on a certain domain is to explore existing real world conflicts and investigate what exactly causes the conflicts. In this dissertation we have done this by identifying a set of concurrency conflicts (chapter 6). The process of exploring the environment should result in a set of variabilities (that need not to be strictly orthogonal). In a later stage the variabilities are used as input for defining the requirements and possible solutions.

Once this is done, we should check whether it is necessary to represent these conflicts in a scaled down model. This might be necessary to reduce the technical problems involved and/or to speed up the process of testing conflicts (the prototyping cycle). In this dissertation we have created our own component model instead of using standard CORBA services or other available techniques (chapter 2).

Important questions:

what kind of conflicts do we want to investigate ?
what are the variabilities ?
what kind of test conflicts will be investigated ?
what kind of prototyping model will be used ?

Example: Our socket case has already been described in section 13.6.3. We have identified a number of variabilities such as

blocking vs non-blocking calls: a call can be either blocking or non-blocking. A non-blocking call returns immediately, while a blocking call will wait until the kernel has finished doing his job.
signaling vs polling vs waiting: will the kernel inform us when something interesting (like an incoming packet) has occurred or do we need to poll/wait constantly?
syntactical conflicts: how are the different calls named ? Are the same constants available, what is their meaning ?
initialization of the libraries, opening of sockets and shutting down.
timing of different calls. How long does it take before a timeout occurs ?

To identify a set of conflicts we can rely on a number of different applications ported to different platforms and kernels, such as we have done with the Borg mobile multi agent platform. In this case we don't need an actual model, however in defining the environment we can decide to start using a cross-compiler because this makes it easy to quickly test different programs.

13.7.2 Goal & Reference Frame

Once the environment is explored, it is time to look for a suitable reference frame. A reference frame is part of the environment and is common to all involved conflicts. The reference frame should be chosen very carefully because, later on, it should allow for a measurement of the correct working of an adaptor. In this dissertation, we have used as a reference frame the assumption that the 'functional link' between the involved components was compatible (chapter 7). If we would have chosen a reference frame such as 'every component can send a message', then we would not have been able to define what a good working adaptor was exactly. Therefore, when choosing a reference frame, it is important to find good answers for the following questions:

What is the goal of the adaptor ?
Where do you want to place the adaptor ?
What are the commonalities ?
Which commonalities are useful ?

Example: In our socket case, we try to make user applications, written for certain socket implementations, compatible with other socket implementations, without modifications to either the application or the kernel/library source code. We do this by inserting an adaptor between the application and the used library. Towards the application our adaptor will offer all the necessary socket primitives and from the underlying kernel/library, it will use the offered socket primitives. Our socket case has a number of commonalities over different implementations.

Everything is about communication between to computers. Clearly our adaptor works if it is able to communicate with another program that makes use of any other socket library.
Everything is also about what kind of data is sent out over/received from, for instance, an Ethernet cable. If we could compare the data one socket library puts on the cable with the data another one wants to put on the cable then we have a common ground.

From these commonalities we will use the last one as reference frame because the protocol is the largest common denominator between different socket implementations and because it does not require a second computer to agree to the protocol you are using.

13.7.3 Identify Necessary Extra Information

Once a) the goal of the adaptor and b) a usable reference frame have been identified, it becomes necessary to explicitly state the requirements of our adaptor. Before we can state the requirements more formally, it is often necessary to introduce extra information. By analyzing the variabilities and the problems of expressing a requirement, it becomes clear what kind of extra information is needed. In a later stage we will need this information to create the requirements for the adaptor.

Some standard questions for every variability:

can we express it as a requirement ?
if so, how easy-to-check can we make this requirement ?
if not, what kind of information is missing ?
can this information be obtained automatically ?

Example: We go back to the variabilities we have identified earlier.

Clearly to be able to handle syntactical conflicts, blocking vs non-blocking conflicts, signaling vs waiting vs polling conflicts, we need the ability to alter the event flow between libraries. This is only possible if we at least know what kind of event flow is requested and/or provided.
A second important issue is timing. Socket libraries often rely on timeouts on all kinds of operations (sends, receives, connection acceptance and so on). To be able to react to such requirements it is necessary to understand them. Hence, the extra documentation should contain timing information.
Without the knowledge of when a certain block of data is put on the Ethernet cable, it is virtually impossible to understand the working of a socket library. Therefore, we need communication information.

Summarized, we need a) message flow information (captures blocking / non-blocking, signaling / polling / waiting and syntactical conflicts), b) protocol information (which data is put on the cable by which function) and c) timing information. None of these is directly available in the standard syntactical API accompanying socket libraries.

13.7.4 State the Adaptor Requirements Explicitly

Once the necessary, but missing, extra information is identified, we should be able to state the requirements explicitly. To do this we can again rely on the variabilities we have identified earlier. Which of these variabilities does the adaptor need to understand ? Can we define what a good working adaptor is for such a variability based on the reference frame ? In this case it might be necessary to re-investigate the reference frame, however we consider this to be part of the prototyping cycle. When defining the requirements it is important to distinguish between different kinds of requirements. These are:

Some requirements can be statically checked (off-line), without running the program, only by looking at the two conflicting components a program can be generated that will satisfy the requirement. This is a most favorable situation, however such a situation will not always occur. In this dissertation we didn't encounter such a requirement.
Some requirements can only be checked by running the program. For instance, a requirement such as the liveness-requirement cannot be automatically satisfied before program execution (off-line) because it requires runtime-rewards from the client component.
Some requirements cannot practically be checked. It should be noted that it is not because a requirement cannot be verified (or is too resource consuming to verify), that it is impossible to solve it. In this dissertation, the no-race requirement, is such a requirement. Checking whether no race-condition occurs at runtime is much more difficult than solving race conditions. It is clear that this kind of requirements is much more difficult to satisfy because it requires expert knowledge of the domain and knowledge of standard programming idioms to tackle these kind of problems.

Example: We go back to the variabilities we have identified earlier. We assume that the protocol that is used by the conflicting socket libraries is the same and compatible. We also assume that the information communicated by this protocol is compatible. So, the essential problem is in how the application communicates with the socket library to achieve a certain effect. With this we are able to specify the requirements informally:

The application should be able to use any event structure it requires to realize a data stream. The application will provide the adaptor with the required ordering of 'events'.
The application should be able to use any timing structure it requires. The application will provide the adaptor with these requirements
The adaptor should only use the provided socket implementation to realize the required behavior. The socket implementation provides a description of its event structure and its timing structure.

In the above, 'event' refers to a call, a return, a signal, a data block sent or received. This requirements are relatively informal, if this adaptor would be made, a more strict set of requirements would be necessary.

13.7.5 Correctly Represent Missing Information

**Figure 13.5:** Petri-net description of a socket net.
$\includegraphics[% width=1.0\columnwidth]{SocketNet.eps}$

The missing information must come from somewhere. Either the user of the adaptor supplies it, or the implementors of the different interfaces have to supply it. Essentially where it comes from does not matter, as long as it is there. To motivate users to supply the missing information, it might be helpful to make it useful. In this dissertation we have done this with our Petri-nets, they not only help in creating an adaptor, but they also help in detecting all kinds of interesting features of an interface/component (see chapter 3).

Secondly, the process of writing down the missing information should itself not be more difficult than the process of writing an adaptor. Therefore, it is necessary to carefully create a language that is able to capture all essential information in such a way that it is intuitive and expressive enough. However, here it should be avoided to introduce a Turing-complete language because then a number of the requirements might suddenly become impossible to verify. So, keep an eye on how the requirements can be validated afterward.

Questions with respect to the missing information are:

how can we make it useful to the one who needs to supply it ?
how can we present it in a language that is natural to humans and compact ?
how can we avoid a language that is Turing-complete ?

Example: For our socket case, which was missing event flow information, protocol information and timing information, we will introduce timed Petri-nets, with two special places, which we describe below. By doing so we can express all the missing information.

event flow information will help in avoiding conflicts between blocking / non-blocking calls, signaling / polling / waiting architectures and syntactical problems.
timing information. By simulating the Petri-net in such a way that timers trigger certain places it might be possible to implement 'required' timing information'. By observing timers and detecting timeouts at 'offered' behavior, it is possible to detect errors.
communication: by introducing two places, a special receive-source place and a special send-sink place it might be possible to verify the correct working of the communication link. One place is a source-place to receive raw protocol data. The second is a sink-place that is used to send out raw data.

However, how we present these Petri-nets to the end-user is another problem. In this dissertation we have already created a Petri-net text format that is easy to use and can be expanded to more difficult Petri-nets offering different kinds of behavior. In this case we will need to add optional timing information on the arcs. Also, we cannot expect that the end-user exactly knows the underlying protocol (with the exact control data), therefore it might be a better choice to simply store the actual data in these places. Also because the use of buffers is an important issue in this case, it might be useful to introduce such an abstract data type into out Petri-nets. Also a necessity is the notion of a session, this will also be marked within the tokens. An example is given in figure 13.5.

13.7.6 Create the Adaptor

Once the missing information is present and the requirements are known, it is time to look for solutions. This involves finding out which requirements can result in a compilation of an adaptor, and which requirements require the insertion of a general algorithm. In both cases it is important to decide what kind of internal representation will be used. In this dissertation the internal representation of the liveness-module and the enforce-action module was similar to the Petri-net representation, however, the representation of the concurrency module was slightly different and more tuned toward its efficient working.

When using learning algorithms, make sure to use an automatic process to verify whether the representation is good. In the end, a computer needs to use the representation to (re)act correctly to certain situations. Therefore it is important to have a representation that quickly leads to a correct behavior. In other words: the bias of the representation should be measured and should be optimal. In this dissertation we have verified our Petri-net representation of the liveness-module by means of a genetic algorithm. E.g, the representation of the liveness module.

Once solutions have been identified to match the different requirements investigate how these solutions can be modularized. Afterward investigate what kind of topology is necessary to make the entire adaptor work. Important issues:

how can every requirement be satisfied ? compilation vs interpretation ?
when using learning algorithms: How to guarantee a correct bias ?
can certain requirements be modularized ?
what kind of topology will be used ?

Example: In our socket case, the implementation of the adaptor might be similar to the adaptor presented in this dissertation.

To guarantee a correct data flow, our adaptor can investigate at the application side which data it places in the output tokens. At the kernel/library side it can do a reachability analysis to deduce how to send data over the socket.
Solving timing information can be done by inserting a Petri-net evaluator that knows how to handle timing.

Probably the only two modules necessary to make this work is one module that communicates to the user level application and a second module that communicates to the actual library/kernel. The communication link between these two modules can be a common place, on one side understood as a receiving place, on the other side understood as a sending place. (figure 13.5 contains dotted transitions and lines which link them both together).

The evaluation of these socket nets should be able to handle the notion of threads: creating ones when necessary, managing incoming threads and destroying others. This can be done by using the 'session' knowledge present within the different tokens.

The above guidelines of course should be embedded in a typical prototype/experiment cycle. With these guidelines it should be possible to create adaptors for domains other than concurrency conflicts in open client server architectures. Essentially , any place where one interface is fitted with multiple implementations is a candidate for automatic adaptation.

13.8 Future Work

13.8.1 Meta-conflicts

An area that is not investigated in this dissertation are meta-conflicts. These are conflicts that occur between the different interface-descriptions. We have already touched this issue in section 12.7.1. However, before the problem of meta-conflicts can be investigated, a large number of adaptors based on extra formal documentation needs to be available.

13.8.2 The problem of checkpoints

13.8.2.1 Test-scenarios

**Figure 13.6:** Two test-scenarios to verify the correct working of an adaptor.
$\includegraphics[% height=8cm]{TestScenarios1.eps}$

In our solution to the liveness problem we assumed that the programmer can easily specify checkpoints in the source code. However, if this is not possible, test-scenarios can be used. A test scenario specifies which traces are considered to be 'good'. Based on this information, rewards can be obtained. Illustration 13.6 shows two test-scenarios. Scenario 1 shows that the client will send out a Lock and expects a LockTrue in response. Afterward an Act will be sent out and an ActDone is expected to return. Finally an Unlock is sent out and an UnlockDone should return. The second scenario illustrates more or less the same, the only difference is that now two Act's are sent out. The second scenario can be used to teach a learning algorithm that after the first act message multiple other act messages might be sent out. Given these two test-scenarios one can easily implement a tracker in the Petri-net that will assign rewards when appropriate.

If we go even a step further, it might be possible to convert a number of test-scenarios automatically to the Petri-net description of a component [EK98]. However such a process heavily relies on a correct generalization. Given the two scenarios in figure 13.6, one might assume that the component will also work for 3 or more acts at the same time. However, nothing ensures this is the case. Depending on the algorithm another generalization might occur.

13.8.2.2 Annotated Petri-nets

Instead of using Petri-nets and separate checkpoints, the developer of the Petri-nets could also tag arcs with a priority, which would resemble the rewards offered by the different components.

13.8.3 Other Learning Approaches ?

We have made the assumption that every future reward is statistically correlated to the current state of the Petri-net involved. This was necessary to demonstrate that the mapping of our liveness problem to a reinforcement learning problem was well defined. However, if we have made the assumption of a reward/marking correlation, we only did this from a practical point of view: for the programmer it is easy to specify checkpoints. In the conducted experiments we have observed that in general this did not form any problem, however a large scale investigation of this property would be very useful, especially to learn the boundaries of our approach.

We have already seen that if the reward is not entirely dependent on the locking strategy some oscillation in the values will occur, on the other hand if the reward is used to express a hidden property of the locking strategy it will not be able to learn it.

13.8.4 Peer to Peer Concurrency

In this dissertation we did not fully investigate how peer to peer concurrency could be managed. All the examples we have presented work with only one server. As explained in section 12.7.2 there are some major problems involved with peer to peer concurrency. The biggest problem of such an environment is that not all communication can be mediated and that different components might communicate indirectly with other components. To solve this problem adaptors themselves should be able to coordinate their behavior and learn how to behave to support an implicit present overall behavior. Theoretically, game theory might be applicable.

Game theory [Nas50b,Nas50a,Nas51,Nas53] requires agents that choose a discrete action. Depending on the action they choose they can either win or lose. The outcome of a game not only depends on the action chosen by one agent but on the interaction between all the chosen actions (this is typically the situation in peer to peer networks). A game is typically characterized by 5 elements:

the players, how many are there ? Is there a chance-player (or is there some random element) ? In our situation these could be the components and the delay times when sending messages.
a set of possible actions for every player. This largely depends on what kind of possibilities a component has. Probably this will be mapped onto the sending of a message or waiting with sending the message.
the information players have available when choosing their actions. It is clear that no adaptor in a peer to peer network will have all the information.
a measurement which describes the payoff for all combinations of actions. It should be possible to embody the overall required behavior into some kind of reward system.
a description of what every player tries to accomplish. These are the requirement of every agent. In this dissertation we have assumed that every agent tries to accomplish the same.

The 5 elements that characterize a game are present in typical peer to peer networks, so game theory might be a good start to investigate the problem of conflicting interfaces in peer to peer networks.

13.8.5 Learning an Interface Description

As explained in section 12.3, all investigated concurrency interfaces behave as simple finite state machines. Therefore it might be possible to deduce the behavior of an interface in an automatic way. For instance, a learning algorithm could, given an API of a component's interface, try out which actions are possible at which time and construct a Petri-net description of the interface automatically.

To do so a program $\rho$ , that is supposed to learn a Petri-net description of component $\alpha$ , which offers only a syntactical interface description $\delta$ , could follow different strategies:

learn hidden variables of $\alpha$ by only accessing $\delta$ . The problem of modeling the behavior of $\alpha$ by only looking at $\delta$ is essentially finding out which variables are useful to describe the behavior of $\alpha$ over $\delta$ . The problem is that all too often these variables are hidden within the semantics and not actually present at $\delta$ . Therefore two ideas might help
1. observe the behavior of a standard communication and find out hidden variables by statistical analysis of the message flow.
2. generation of test-messages by $\rho$ to validate and/or test certain variables:
  1. useful to verify whether a hidden variable is truly a distinct state and not merely a coincidence.
  2. useful to check the boundaries of variables.
directly observe the component $\alpha$ . Instead of looking for hidden variables it might also be possible to directly investigate the behavior of the component.
1. observing its binary state before and after handling a message. This avoids the use of looking for hidden variables but can easily create too large models and describe states that are entirely redundant to model the behavior of $\delta$ .
2. observe the control flow within the program when handling a message. This should make it possible to detect whether all branches of the component have been investigated and possible how other branches can be investigated. This might help in creating the expressions on the arcs of the Petri-net.
model generation & verification. Generate a random model $\gamma$ . Afterward $\rho$ can verify, by checking consistency and completeness whether $\gamma$ is a correct model of $\alpha$ . To do so, genetic algorithms/programming might be useful (given the correct bias of course). After verification of a model $\gamma$ , crossover and mutation could help in refining and creation of a better model.
Petri-net representation in such a way that
1. the Petri-net it is not too small. For instance, one place that captured the entire state of the component, with 5000+ transitions describing every possible message is clearly not a good Petri-net
2. the Petri-net is not too large. It is also useless to create a place for every memory-cell available to the component.
3. the representation should only contain places to model a certain requirement such as i) when a certain message can be send by only looking at the client, or ii) when a certain message can be send, looking at the server, or iii) to capture the state of a component such that it can be rolled back at a later time, or others...
4. automatic reduction of the Petri-net might help in removing obsolete states, not necessary to model $\delta$ .

The above are some ideas about how one might try to learn the behavior of an interface automatically. Whether it is truly possible to learn the behavior of an interface automatically remains an open question. If we would be able to do this, then the impact of our work would be substantially greater, because then we would be able to mediate differences between components in an even more automatic way. As a trade-off, the advantages of using a formal interface specification would be lost. We would no longer be able to verify the completeness and consistency of an interface description.

13.8.6 Determinism versus Non-Determinism

This dissertation focuses on the generation of intelligent adaptors. We made the assumption that we have the concurrency strategies specified as a Petri-net. From a pragmatic point of view, this is very useful because it offers the developer all kinds of interesting information. However, the essential reason why we need Petri-nets is to introduce determinism and avoid that the liveness module could bring the client components in an invalid state by sending out a wrong message. E.g., if the client component does not know that before the first setPosition, a joinActor needs to be sent, the client component might fail to work properly.

In the enforce-action module we also need a similar deterministic property. This module needs to know how it can bring the server component in a certain state. Here we assume that the server is well written and that every possible synchronization action that can be invoked upon the server can easily be undone. Of course, to avoid denial of service attacks and to increase its robustness, every good server offers this kind of functionality. However, in general, the concurrency module needs be sure that it can deduce deterministically what to do and that it will always be able to reach a certain state.

Within the concurrency module a similar deterministic problem occurs. However this problem cannot be solved easily. The concurrency module resolves race conditions. However, it is unable to resolve deadlocks when it detects them. The reason is that it is unable to anticipate a deadlock, because it has no knowledge of it, and that, once it has detected a deadlock it is unable to undo previous actions on the involved components.

From these three modules we see how required determinism poses problems. During this dissertation we have tried to generate adaptors that work without making mistakes that cannot be revised afterward. The reason why we imposed such requirement is that all real-world computation happens in a controlled non-deterministic (hence deterministic) way. If two components communicate, there are exactly two components and the functionality offered by those two components is exactly understood. If an adaptor wants to mediate the differences between those components it should be deterministically correct, otherwise it would certainly bring disaster upon one of both (if not both) components. This deterministic view on computation is an illusion, for in open distributed there are many opportunities when a component might fail; all too often the reasons lie within uncontrolled upgrades or unreliable infrastructure. Nevertheless developers still hold on to an illusion of determinism and construct programs that behave completely deterministically.

In this dissertation we made exactly the same assumption. However, one can also approach the problem from the opposite side. One might be tempted to embrace the non-determinism found in open distributed systems and investigate how different but similar working components might emerge behavior that satisfies certain requirements.

By exploiting non-determinism one might create programs, which act creatively and intelligently, because they would be truly able to find and apply clues in the environment which a deterministic process cannot see. This kind of research however is still barely started and certainly not available in the field of 'services' offered on the Internet. Some pointers might give a lead such as swarm intelligence [KRCE01], amorphous computing [HAAC⁺00] and cellular automata [TM87].

Footnotes

...^13.1: The version for Macintosh and PalmOS have been omitted but can be found on the site http://borg.rave.org/cgi-bin/borgcvs/borg/cborgcore/. The different files are named StdLinux.h, StdLinux.c, StdWindows.h, StdWindows.c, StdPalm.h, StdPalm.c, StdMacintosh.h and StdMacintosh.c.

Werner 2004-03-07