High level module overview
IN THIS CHAPTER we first summarize our work. Then we will
elaborate on our scientific contributions as well as on the applicability
of the thesis. We point out the limitations of our work, give guidelines
how an adaptor for other conflict domains might be created and finish
the dissertation by pointing out possible future work.
BASED ON THE CASE STUDY of conflicting concurrency strategies,
we have shown how intelligent adaptors can be created automatically.
We have illustrated this by creating such an intelligent adaptor as
a sequence of three modules, where every module has different responsibilities.
The liveness module learns how to keep the components behind the concurrency
strategy alive. For the concurrency adaptor as a whole, such a module
is necessary to offer to the underlying components an optimal feedback
behavior, as opposed to for example, returning a behavior that leads
to endless live-locks such as lock-lockfalse-lock-lockfalse
ad infinitum. Because it is still unknown whether liveness is a decidable
property of Petri-nets, we have decided to use a reinforcement learning
algorithm (Q-learning and an -greedy strategy) to approach
the problem. Feedback for the learning algorithm comes from the underlying
component by means of statically placed checkpoints. Every time a
checkpoint is reached a reward is sent to the adaptor and the learner
can, when necessary, modify its future behavior. The reinforcement
learning algorithm itself has been very efficiently mapped onto the
structure of colored Petri-nets, in such a way that storage requirements
are minimal, without losing essential information. This optimization
is mainly based on a) the explicit state information of the Petri-net
already present and b) the creation of a system that can recognize
certain situations and knows how to handle them. Our situation-recognition
system is expressed as a Petri-net. During execution, new transitions
are constantly added to the Petri-net. The learning algorithm automatically
explores the new avenues offered by these transitions and, when suitable,
these new transitions are kept alive and the new situation is remembered.
The reason why we use Petri-net transitions as situation handlers
has been explained thoroughly, and possible alternative representations
have been investigated. To verify the suitability of our representation
we used genetic algorithms.
The concurrency module of our concurrency adaptor inserts an appropriate
concurrency strategy. To do so, it communicates with the two other
modules about resources (for instance the possible squares in a whiteboard)
and actions (for instance a setPosition). There is no
explicit annotation of resources and actions on resources, both are
implicitly represented within the Petri-net. This information is automatically
deduced by the first module under the form of enabled transitions
and equivalent states. This, together with extra information of the
liveness module that describes which futures are possible and which
futures the component would like, enables this module to make correct
decisions about what to do. Once such a decision is taken the client
component can perform its actions on the server component.
The enforce-action module bypasses the concurrency strategy of a component
entirely. This allows for more freedom in the two other modules. The
enforce-action module does this by offering a logic interface to the
outside world while inserting the necessary synchronization messages
at the appropriate times. To do so, the module needs to know how certain
markings within the underlying Petri-net can be reached. In this dissertation
we have achieved this by means of a reachability analysis in prolog.
In these three modules, the adaptor needs a) to analyze how to enable
certain actions, b) to know how to reach a certain state and c) to
know which resources are involved. In order to be able to know this
the adaptor requires a formal documentation in the form of Petri-nets.
Every component needs to provide a Petri-net describing its own required
or provided concurrency strategy in a consistent and complete way.
Consistent means that any action that is allowed by the Petri-net
should be accepted by the component and complete means that no action
that is allowed upon the component is undocumented.
WE STATED that in open distributed systems, one needs intelligent
adaptors to mediate interface conflicts. We identified three reasons
why one might need an intelligent adaptor: a) in open systems, it
is very unlikely that one standard will be used by everybody, b) the
highly volatile nature of peer to peer computational networks gives
rise to interface conflicts and c) with the current advent of interconnected
embedded systems and ambient intelligence the problem of conflicting
interfaces will become even more prominent. We claimed that it
is possible to create, for certain categories of concurrency interface
conflicts, a concurrency-adaptor that automatically learns how to
resolve the conflict in such a way that a) the required behavior of
the involved components can be executed over the adapted interconnection
and b) it is able to work on-line in certain open system. We have
validated this by constructing such a concurrency-adaptor. Below we
discuss the different requirements and how we have satisfied them.
To show the necessity for such adaptors, we have introduced our real-world
case of conflicting concurrency strategies. We have investigated different
concurrency strategies and explained why components in open distributed
systems need a concurrency interface. Relying on this analysis
we discussed the problem: not everybody will write the same concurrency
interface for his components and thus concurrency interface conflicts
will arise. We identified and explored in much detail a large number
of practical conflicts. More specifically, we investigated conflicts
that can easily occur when one or both of the concurrency interfaces
is upgraded without prior notification and without full backward compatibility.
To show that we can create such an intelligent adaptor we have constructively
developed one. The adaptor itself needs detailed information of the
provided and required interfaces of all participating components.
The adaptor works by means of three modules. It contains two modules
that essentially bypass the provided or required interface and one
module that actually implements a suitable concurrency strategy. To
bypass a concurrency strategy we investigated two techniques. The
first technique was a reachability analysis of a Petri-net, the second
technique was a reinforcement learning algorithm. Finally we explain
how resources and actions can be detected automatically within a Petri-net.
This chain of three modules constitutes an intelligent adaptor that
is able to set up a link between conflicting concurrency
interfaces, for which the adaptor does not know the exact interface
beforehand. The adaptor is restricted to client server architectures.
In section 12.7.2 we have explained in detail
what can go wrong with peer to peer applications. Essentially, the
adaptor does not work in open peer to peer systems because not all
communication between different parts of the application can be mediated
by one adaptor.
We now reconsider the abstract requirements from the introductory
This requirement was necessary to make the adaptor useful in an open
system. The performance estimates as well as the setup of the adaptor
enables it to work on-line. The liveness module is an on-line prolog
program that can deduce how a certain state can be enforced upon the
server, without needing to 'test' the server's behavior. The concurrency
module is a program that can be executed on-line and the liveness
module has been specifically designed, by selecting Q-learning as
a learning technique, to work on-line.
The adaptor is a central component that is placed solely on the connection
between different components. Hence the only modifications it can
introduce is on the messages between the different components. Therefore,
this requirement has been satisfied.
The most important abstract requirement of an intelligent adaptor
is based on the notion of an overall required behavior. In
our case study the overall required behavior between the different
conflicting concurrency strategies was to avoid actors crossing each
others boundaries. As we have experimentally observed, our adaptor
supports this implicit requirement. Two cases can occur:
In the cases where there is a full agreement, we have experimentally
observed that our adaptor works. This has been described in chapter
12. If there is no full agreement on the required
behavior then the adaptor should try as hard as possible to satisfy
as much as possible of the required behavior. This requirement is
satisfied because our concurrency module allows the presence of an
actor that does not manage certain resources. In such a case these
actors will be set 'on hold' until a suitable moment arises to realize
the required changes.
- all the components agree on an overall behavior and offer/require
a concurrency strategy.
- some components do not offer a concurrency strategy
THE MAIN CONTRIBUTION OF THIS THESIS is the process we have
been following to create an adaptor. Instead of trying out one technique
and verifying whether it works or not, we have kept on searching to
find a (set of) suitable technique(s). To reduce the complexity of
the problem we have restricted ourselves to the case of concurrency
conflicts, which we have expressed in an event based model. When necessary
we have invented new techniques (The extremely useful Petri-net documentation
for describing interfaces and a new technique to verify the bias of
computer generated programs), or combined existing techniques (Petri-nets
& Reinforcement learning, the three different modules that constitute
an adaptor). This has lead to the creation of a concurrency adaptor
that mediates differences between conflicting concurrency strategies.
Below we summarize the contributions.
- Open systems need concurrency interfaces: We showed that the
components of an open distributed application need concurrency
interfaces. The reason is that one cannot foresee how a component
in an open distributed system will be used, therefore one cannot know
which actions are supposed to be atomic, and therefore every component
should allow other components to specify a critical section. Hence
the components themselves need to offer a concurrency strategy. We
have illustrated this extensively by means of a whiteboard.
- Open systems require concurrency adaptors: We have shown that
components within an open distributed application can offer all kinds
of concurrency strategies and that it is very likely that concurrency
interface conflicts will arise between different components. We have
extensively discussed a set of real-world concurrency interface conflicts
and we used these conflicts to validate our claim.
- Petri-nets as a formalisms for interface description: We showed
how Petri-nets can be used to describe the behavior of an interface
in a formal way. This formal interface description is useful because
it can always be synchronized with the source-code, as consistency
and completeness can be checked in an automatic way.
- A concurrency adaptor: We showed how an intelligent protocol
adaptor can be constructed in order to resolve incompatibilities between
communicating components. Such an approach is indispensable to cope
with the combinatorial explosion of protocol adaptors that are needed
in an open distributed setting, when dealing with interconnected embedded
devices or when writing components. In all these cases components
interact with other components in unpredictable ways.
- Automatic bias verification: We showed how genetic algorithms
can be used to verify a representation instead of providing a solution.
We consider that if a certain problem can be solved easily under a
certain representation, it is a good representation. If the same genetic
algorithm has more trouble finding a solution the representation might
not be so suitable.
- Unification of Petri-nets and reinforcement learning: We showed
how reinforcement learning can be unified with colored Petri-nets
in an efficient way by exploiting the structural properties of the
- Component based software engineering allows for easy adaptor
creation: We showed how an event based model allows for easy adaptor
creation if it is used in a disciplined way, that is, if it is purely
event based and no compromises are made to integrate with other models,
such as a thread based models, and that no communication happens outside
- Guidelines for developing interface adaptors: This dissertation
can be used as a platform that offers recommendations and guidelines
for the development of conflict adaptors for other domains.
IN THIS SECTION WE ELABORATE ON SOME LIMITATIONS OF THE PRESENTED
- Only conflicting concurrency strategies: Conflicting interfaces
represent a major problem that has to be dealt with. To approach this
problem we have investigated an accessible but smaller problem of
concurrency conflicts. This limits the immediate applicability of
our technique. Nevertheless,
- the case itself is chosen such that its correct working cannot
be measured easily at runtime. E.g; How can one measure a race-condition
? This clearly depends on what the components consider to be a valid
state. By using such an interesting requirement, we have investigated
an interface conflict that cannot be automatically deduced from a
state-machine (such as done in [Wyd01]). Also, another requirement
of our concurrency conflicts was the fact that the adaptor should
keep the components alive. This also made it difficult to use one
standard technique to solve the problem. So, our case is indeed a
small subset of all possible conflicting interfaces, nevertheless
it is certainly not a trivial case.
- in the domain of open distributed systems this work is an important
contribution because the problem of conflicting interfaces is often
overlooked and because our work allows for a decoupling of the required
and provided concurrency interfaces. If we would want to extend this
research to other application domains, then a substantial part of
it should be redone. For instance, it might no longer be feasible
to use Petri-nets, or to use a liveness module and/or enforce action
module. Later on we will discuss a case in which conflicting socket
libraries are investigated.
- Well-working functional link: An important limitation of our
work is the fact that we rely on the availability of an existing,
well-working link between the conflicting components. More specifically,
we can only generate an adaptor on non-functional requirements between
two interfaces if the core functionality of both components is compatible.
Such an assumption is not unexpected because it helps in creating
a predictable environment that allows for verification of the generated
adaptor. E.g, a whiteboard offering set_position (with
underscores), while a client expect SetPosition (without
underscores) are clearly in conflict. Because this is a conflict on
the logic interface between both agents, our adaptor will not
be able to mediate such a conflict. Some approaches might help to
solve this kind of conflicts
- Instead of relying on a correct functional link, we could probably
also rely on another reference frame for which the correct working
of the adaptor could be measured. For instance, a requirement such
as 'make sure that both partners draw a moving actor on the whiteboard'
could result in a guiding principle for an off-line learning algorithm.
Later on, we will give an example of conflicting socket libraries
to illustrate this.
- Similarly, other research requires the availability of a placeholder
that describes the required interaction [Wyd01]. This research
however is very limited to the requirements it can describe.
- Depending on the application it might be sufficient to just rely on
the commonalities between the expected and provided state of
the involved components instead of relying on the common actions.
- A Controlled Case: We have only been using one case, a whiteboard,
to illustrate concurrency strategy conflicts. We did not investigate
other examples. Nevertheless, this whiteboard case has been explicitly
created to illustrate practical problems. With it, we were able to
illustrate race-conditions, deadlocks, livelocks and transactions
in such a way that the nature of open distributed systems was still
preserved. From an academic point of view, this case describes everything
one might need in open systems. From an industry point of view, a
lot of problems have been avoided. For instance, in the 'real world'
not everybody agrees to follow an event based model. Also not everybody
will agree to write Petri-nets, some might even disagree with the
fact that a concurrency strategy is necessary. However, instead of
seeing such a controlled case as a weakness, we would rather like
to see it as one of the strengths of this dissertation. By using a
simple model, we were able to describe all necessary issues, from
simple locks to transactions in such a way that a) their need was
clearly demonstrated b) many of the concepts, necessary in the real
world are present (resources, actions, processes,....) and c) the
common problem, recognized by different solution providers are present
(livelocks, deadlocks, transactions,...).
- Component based development: We have limited our work to conflicts
between components in a component based system. We did not investigate
how our work can be unified with models that have a concept of shared
memory. Neither did we investigate how our results can be mapped on
thread based models. The main reason for doing so is that in an experimental
model it is impossible to take into account all possible technical
details. Appendix 13.8.6 covers a detailed explanation of what
kind of problems we have avoided by doing so.
- Only one component system: In this dissertation we have limited
our research of adaptor creation to one component system, which we
created ourselves. This was mainly due to practical reasons. The project
that financed my research, the SEESCOA project, needed a definition
of components and an implementation of a component system in Java.
It was only out of practical considerations that we have used this
component system because it allowed us to use it actively
on more than one front. First, in the test-case for the SEESCOA project
and secondly as a means to carry out our own research. By doing so,
we were able to deliver a reusable and high quality component system.
A second reason why we favored the SEESCOA component system is because
it has a mature design. After creating components (agents) in the
Borg mobile multi agent system, some design flaws came up, that were
not easy to fix. These design flaws have been corrected in our current
implementation of the component system. Aside from practical considerations,
the component system itself is also representative for open distributed
systems, applications within open distributed systems and can be considered
as a common denominator of existing component systems.
- Technical limitations: The adaptor created in this dissertation
works when the involved conflicts arise from syntactical conflicts,
the ordering of messages, re-entrance conflicts, conflicts between
static resources and conflicts on the layering of concurrency strategies.
However, this certainly does not cover all conflicts that might arise.
For instance, our adaptor fails to work when:
- First class resources and/or sessions are present and
used as discussed in section 12.7.2.
- There is hidden communication between peers. Because our adaptor
does not see this communication it cannot adapt it. (section 12.7.2)
- There are non-sequence requirements. These are requirements
that have nothing to do with the sequencing of messages. Such requirements
- timing: For instance, a requirement such as 'every 2 seconds
an alive notification should be sent'
- memory: For instance 'when handling this message, the memory
should remain below ...'
- state information: For instance, ``after a rollback the
component should be in such a state''. In our case this can be relatively
easily added by using the places of the Petri-net.
13.5 Application Scope of our Adaptor
IN THIS SECTION WE discuss the application scope of our adaptor,
first, without giving examples of such applications. We discuss why
completely open systems are not feasible and why our adaptor is useful
even in situations where exact requirements on the interface, but
also on the usage of that interface, are necessary. Afterward
we give some immediate practical applications of our adaptor.
Some conflicts are impossible to mediate. In section 6.1.2
we have given examples of such conflicts. A notable example of such
a conflict was our line example. In this example we assumed that the
line actor would always have a lock on a square somewhere on the whiteboard.
Another client required the possibility to lock the entire whiteboard.
If those two actors were to work together on the same whiteboard then
the second actor (which wants to lock the entire field), would never
have a chance to do something because it would never be able to lock
the entire whiteboard (simply because the line actor will always keep
one square locked). This conflict is notable because it demonstrates
that even realistic locking strategies can easily lead to impossible
to solve concurrency conflicts. In this example the conflict arises
from a subtle difference within the usage scenario involved.
This indicates that truly open systems, such as advertised by the
mobile multi agent scene, might be impossible to build: a small change
to a component or to one of the involved usage scenarios might transform
a 'possible to mediate conflict' to an 'impossible to mediate conflict'.
For open systems this means that probably the only way these systems
can work is if everybody agrees on exactly the same
specification. This specification should be exact (formal)
with respect to what kind of usage scenarios and what kind of interfaces
are allowed. This is clearly not realistic for open systems.
Interface boundaries and usage boundaries. Top
of picture: small interface, limited usage. Middle: small interface,
large usage range. Bottom: large interface, large usage range.
The statement 'open systems require an agreement between all partners
to follow exactly the same specification' might sound contradictory
with our initial motivations. However an exact specification does
not prohibit from allowing certain degrees of freedom. An adaptor
might shift the problem of being compatible with a very narrow interface
description to being compatible with a wide range of allowed interfaces
(within certain degrees of freedom). We explain this by elaborating
on the different possibilities (all pictured in figure 13.2)
To summarize, within certain boundaries, an automatic adaptor is useful
because it allows any component a choice in the interface it
provides or requires. This can have a major impact on the way components
are developed as we will explain below. Also, it essentially offers
a decoupling of the required interface from the provided interface.
- When the interfaces and the usage are narrowly specified,
then we have an interface that is barely usable by other components[Ore98].
If somebody wants to use the component in a slightly different situation
(a situation with other usage scenarios) it will no longer be compatible
with the exact specification of the usage scenario. A component with
such a strict requirement is very limited with respect to the client
components it can serve.
- When the interfaces are narrowly specified, but the component
usage is wider ranged, then possible client components have,
depending on their goals, more freedom in using the provided interface.
However, this poses a problem because certain usage scenarios are
not necessarily easy to implement over the narrow interface. E.g,
implementing a field-lock over an interface that offers the locking
of squares. If we could, in such situations, allow other, slightly
different, interfaces to access the same functionality then the development
of open systems would become easier.
- When interfaces are specified as wider ranged interface-descriptions
(instead of exact syntactical descriptions) and the usage scenarios
are wider ranged as well, then one component can use another as it
sees fit. E.g.: one component specifies a reentrant locking strategy,
while another specifies a non-reentrant locking strategy. However,
before such wide ranged interface can communicate some conversion
needs to take place. This is possible if we are able to mediate the
differences between the different interfaces. This is where our adaptor
comes in because it enables such scenarios.
Demonstration how a Petri-net can be
written that will immediately lock all fields necessary for the line
and its trail.
We initially started our investigation trying to foresee what the
Internet would be in the future. We tried to foresee what kind of
problems will arise. We concluded that conflicting concurrency strategies
would become a large problem. In the approach we have presented we
tackle this problem by assuming that every component will offer a
concurrency strategy and that these will need to be mediated. In doing
so, the problem of 'writing correct adaptors' is shifted to the problem
of 'specifying the required/provided concurrency behavior precisely'.
The result of this approach is that the developer does not have to
implement the adaptors directly, but that he instead has the responsibility
of writing a complete and consistent Petri-net. The developer can
use such a Petri-net to write down a strategy specifically suited
for the problem at hand. Instead of sticking to one of the standard
concurrency approaches he can now easily specify which resources need
to be locked simultaneously. Depending on the required usage scenario
it might be possible to require (or provide) better suited interfaces
for the needed functionality. For instance it is possible to write
a client component that specifies that it wants to lock an entire
line and its trail immediately. This is pictured in figure 13.3
and would make writing our line actor more easy. On top of this is
the fact that this Petri-net exactly specifies what is needed and
thus makes mediating conflicts involving this concurrency strategy
even more easy.
Using a concurrency server to which
components can send a concurrency representation agent.
If we follow this line of thought we end up with concurrency servers
on which different components can publish the resources they have
and the behavior they need. This might become even more interesting
if, instead of supplying such a server with a Petri-net of the behavior,
we could supply such a server with an actual program, that represents
our original component. This program can express even more behavioral
information than our Petri-net descriptions. In these cases mobility
of components would be very useful. This is pictured in figure 13.4.
Below, we describe some applications or application domains where
our concurrency adaptor, as presented here, is useful. We first give
a number of examples where the adaptor is useful in situations where
the interface not necessarily changes. The applications we have been
looking for were constrained by a number of requirements
Afterward we give an example of an interface with frequently changing
semantics in which this research can provide an added value, without
relying explicitly on the previous requirements.
- The application should have a client-server architecture, otherwise
our adaptor is not immediately applicable.
- A concurrency strategy is necessary to guarantee a correct overall
- the resources should be statically defined.
- when possible, the interface should change often, because this will
naturally lead to numerous conflicts. However, this is not strictly
necessary to make our adaptor useful.
- the system must be event-based, or expressible within an event-based
A computer collaborative supported work environment often contacts
a central server that offers some kind of whiteboard (in the broad
sense of the word). Users can join such a whiteboard and through it
communicate with each other, while still following some rules. In
this case, the client program offers a user interface to the end-user,
while the central server stores and manages the data. In such a multi-user
environment it is necessary to lock operations on the whiteboard.
Depending on the kind of locking strategy required by the client an
adaptor might be necessary. This is a case where locking is necessary,
the resource can be described statically, the clients might
require a different concurrency strategy and it can be considered
as an event based system. Therefore, our adaptor is useful in this
Databases are another area where our work might be useful. Databases
are often centrally placed servers that allow different clients to
update records. To guarantee that this happens in a consistent way,
a locking interface needs to be present. If such a database would,
together with the schema's of the different tables, also supply a
Petri-net that describes the offered concurrency strategies, then
it might be easier for clients to access this data because they in
turn can define a more suitable required concurrency strategy. This
can make implementing clients for databases much easier.
13.6.3 Frequently Changing interfaces:
Some important, barely documented,
differences between Linux sockets and windows sockets.
We now present an example of conflicting interfaces within a domain
other than concurrency interfaces: socket interfaces. Later on we
will use this example as an illustration of how (part of) the techniques
used in this dissertation can be used to implement adaptors in other
When we were developing Borg, the biggest problem of creating a working
version for all platforms was, contrary to what might be expected,
not the user interface, but rather the socket libraries. We have developed
Borg for PalmOS, Windows, Linux and Macintosh. Every operating system
offers its own version of a socket library. This has posed major problems
because not every socket library is as easy to use as any other socket
library. Table 13.1 contains the most important
differences between Linux and windows.13.1 Between the two presented API's some important differences exist:
As illustrated by this example, interfaces that often get a new implementation,
so called hotspots, will automatically give rise to differences between
implementors. This often leads to conflicts. In this example the availability
of Petri-nets to describe the semantics of the different functions
would have been extremely helpful as well as the possibility to mediate
the differences automatically. When we present the guidelines, we
will investigate this example in more detail and give a sketch how
to approach the problem of conflicting socket interfaces.
- Syntactical differences on multiple levels. The error constants
under windows are all merged into h_errno, while under
Linux, the error constants are separated in two variables, h_errno
and errno. Another syntactical conflict can be found when
marking a socket non-blocking.
- Initialization of the library is different. Linux doesn't require
initialization, Windows does.
- Windows will not signal when data is received, hence the application
needs either to poll constantly the sockets it manages, or it should
start a new thread. Linux will signal when data might be available,
hence not requiring a new thread or polling loop. This is a conflict
that is not easy to solve in a platform independent way.
- Between different Linux versions other conflicts might arise. In the
kernel version 2.2 series, circular sends would not deadlock but return
an error, while in the 2.4 series such a circular send will wait and
deadlock. Depending on the usage scenario this might form a problem.
CONFLICTING INTERFACES REPRESENT A major problem that has
to be dealt with. To approach this problem we have investigated a
smaller, but more accessible problem. The solution presented in this
dissertation is not a general solution in the sense that it can be
straightforwardly applied to other domains. Nevertheless, in identifying
solutions to a subset of a smaller problem, we have tried to keep
both the solution and the subset of problems as general as possible.
This allows us to identify guidelines which can help shaping new solutions
to other problems of conflicting interfaces. We will now discuss these
guidelines and explain them by giving an example of how the previous
identified problem of conflicting socket libraries might be solved.
This example will be presented in another font.
The first thing that needs to be done is to explore the environment.
The best way to get grip on a certain domain is to explore existing
real world conflicts and investigate what exactly causes the conflicts.
In this dissertation we have done this by identifying a set of concurrency
conflicts (chapter 6). The process of exploring the
environment should result in a set of variabilities (that need not
to be strictly orthogonal). In a later stage the variabilities are
used as input for defining the requirements and possible solutions.
Once this is done, we should check whether it is necessary to represent
these conflicts in a scaled down model. This might be necessary to
reduce the technical problems involved and/or to speed up the process
of testing conflicts (the prototyping cycle). In this dissertation
we have created our own component model instead of using standard
CORBA services or other available techniques (chapter 2).
Example: Our socket
case has already been described in section 13.6.3.
We have identified a number of variabilities such as
- what kind of conflicts do we want to investigate ?
- what are the variabilities ?
- what kind of test conflicts will be investigated ?
- what kind of prototyping model will be used ?
To identify a set of conflicts we can rely
on a number of different applications ported to different platforms
and kernels, such as we have done with the Borg mobile multi agent
platform. In this case we don't need an actual model, however in defining
the environment we can decide to start using a cross-compiler because
this makes it easy to quickly test different programs.
- blocking vs non-blocking calls:
a call can be either blocking or non-blocking. A non-blocking call
returns immediately, while a blocking call will wait until the kernel
has finished doing his job.
- signaling vs polling vs
waiting: will the kernel inform
us when something interesting (like an incoming packet) has occurred
or do we need to poll/wait constantly?
- syntactical conflicts:
how are the different calls named ? Are the same constants available,
what is their meaning ?
- initialization of
the libraries, opening of sockets
and shutting down.
- timing of different
calls. How long does it take before a timeout occurs ?
Once the environment is explored, it is time to look for a suitable
reference frame. A reference frame is part of the environment and
is common to all involved conflicts. The reference frame should be
chosen very carefully because, later on, it should allow for a measurement
of the correct working of an adaptor. In this dissertation, we have
used as a reference frame the assumption that the 'functional link'
between the involved components was compatible (chapter 7).
If we would have chosen a reference frame such as 'every component
can send a message', then we would not have been able to define what
a good working adaptor was exactly. Therefore, when choosing a reference
frame, it is important to find good answers for the following questions:
Example: In our socket case,
we try to make user applications, written for certain socket implementations,
compatible with other socket implementations, without modifications
to either the application or the kernel/library source code. We do
this by inserting an adaptor between the application and the used
library. Towards the application our adaptor will offer all the necessary
socket primitives and from the underlying kernel/library, it will
use the offered socket primitives. Our socket case has a number of
commonalities over different implementations.
- What is the goal of the adaptor ?
- Where do you want to place the adaptor ?
- What are the commonalities ?
- Which commonalities are useful ?
From these commonalities we will use the last one as
reference frame because the protocol is the largest common denominator
between different socket implementations and because it does not require
a second computer to agree to the protocol you are using.
- Everything is about communication
between to computers. Clearly
our adaptor works if it is able to communicate with another program
that makes use of any other socket library.
- Everything is also about what kind of data is sent
out over/received from, for instance, an Ethernet cable. If we could
compare the data one socket library puts on the cable with the data
another one wants to put on the cable then we have a common ground.
Once a) the goal of the adaptor and b) a usable reference frame have
been identified, it becomes necessary to explicitly state the requirements
of our adaptor. Before we can state the requirements more formally,
it is often necessary to introduce extra information. By analyzing
the variabilities and the problems of expressing a requirement, it
becomes clear what kind of extra information is needed. In a later
stage we will need this information to create the requirements for
Some standard questions for every variability:
Example: We go back
to the variabilities we have identified earlier.
- can we express it as a requirement ?
- if so, how easy-to-check can we make this requirement ?
- if not, what kind of information is missing ?
- can this information be obtained automatically ?
Summarized, we need a) message flow information
(captures blocking / non-blocking, signaling / polling / waiting and
syntactical conflicts), b) protocol information (which data is put
on the cable by which function) and c) timing information. None of
these is directly available in the standard syntactical API accompanying
- Clearly to be able to handle syntactical
conflicts, blocking vs non-blocking conflicts, signaling vs waiting
vs polling conflicts, we need the ability to alter the event flow
between libraries. This is only possible if we at least know
what kind of event flow is requested and/or provided.
- A second important issue is timing.
Socket libraries often rely on timeouts on all kinds of operations
(sends, receives, connection acceptance and so on). To be able to
react to such requirements it is necessary to understand them. Hence,
the extra documentation should contain timing information.
- Without the knowledge of when a certain block
of data is put on the Ethernet cable, it is virtually impossible to
understand the working of a socket library. Therefore, we need communication
Once the necessary, but missing, extra information is identified,
we should be able to state the requirements explicitly. To do this
we can again rely on the variabilities we have identified earlier.
Which of these variabilities does the adaptor need to understand ?
Can we define what a good working adaptor is for such a variability
based on the reference frame ? In this case it might be necessary
to re-investigate the reference frame, however we consider this to
be part of the prototyping cycle. When defining the requirements it
is important to distinguish between different kinds of requirements.
Example: We go back
to the variabilities we have identified earlier. We assume that the
protocol that is used by the conflicting socket libraries is the same
and compatible. We also assume
that the information communicated by this protocol is compatible.
So, the essential problem is in how the application communicates with
the socket library to achieve a certain effect. With this we are able
to specify the requirements informally:
- Some requirements can be statically checked (off-line), without
running the program, only by looking at the two conflicting components
a program can be generated that will satisfy the requirement. This
is a most favorable situation, however such a situation will not always
occur. In this dissertation we didn't encounter such a requirement.
- Some requirements can only be checked by running the program.
For instance, a requirement such as the liveness-requirement cannot
be automatically satisfied before program execution (off-line) because
it requires runtime-rewards from the client component.
- Some requirements cannot practically be checked. It should
be noted that it is not because a requirement cannot be verified (or
is too resource consuming to verify), that it is impossible to solve
it. In this dissertation, the no-race requirement, is such a requirement.
Checking whether no race-condition occurs at runtime is much more
difficult than solving race conditions. It is clear that this kind
of requirements is much more difficult to satisfy because it requires
expert knowledge of the domain and knowledge of standard programming
idioms to tackle these kind of problems.
In the above, 'event' refers to a call, a
return, a signal, a data block sent or received. This requirements
are relatively informal, if this adaptor would be made, a more strict
set of requirements would be necessary.
- The application should be able to use any
event structure it requires to realize a data stream. The application
will provide the adaptor with the required ordering of 'events'.
- The application should be able to use any
timing structure it requires. The application will provide the adaptor
with these requirements
- The adaptor should only use the provided
socket implementation to realize the required behavior. The socket
implementation provides a description of its event structure and its
Petri-net description of a socket net.
The missing information must come from somewhere. Either the user
of the adaptor supplies it, or the implementors of the different interfaces
have to supply it. Essentially where it comes from does not matter,
as long as it is there. To motivate users to supply the missing information,
it might be helpful to make it useful. In this dissertation we have
done this with our Petri-nets, they not only help in creating an adaptor,
but they also help in detecting all kinds of interesting features
of an interface/component (see chapter 3).
Secondly, the process of writing down the missing information should
itself not be more difficult than the process of writing an adaptor.
Therefore, it is necessary to carefully create a language that is
able to capture all essential information in such a way that it is
intuitive and expressive enough. However, here it should be avoided
to introduce a Turing-complete language because then a number of the
requirements might suddenly become impossible to verify. So, keep
an eye on how the requirements can be validated afterward.
Questions with respect to the missing information are:
Example: For our
socket case, which was missing event flow information, protocol information
and timing information, we will introduce timed Petri-nets, with two
special places, which we describe below. By doing so we can express
all the missing information.
- how can we make it useful to the one who needs to supply it ?
- how can we present it in a language that is natural to humans and
- how can we avoid a language that is Turing-complete ?
However, how we present these Petri-nets
to the end-user is another problem. In this dissertation we have already
created a Petri-net text format that is easy to use and can be expanded
to more difficult Petri-nets offering different kinds of behavior.
In this case we will need to add optional timing information on the
arcs. Also, we cannot expect that the end-user exactly knows the underlying
protocol (with the exact control data), therefore it might be a better
choice to simply store the actual data in these places. Also because
the use of buffers is an important issue in this case, it might be
useful to introduce such an abstract data type into out Petri-nets.
Also a necessity is the notion of a session, this will also be marked
within the tokens. An example is given in figure 13.5.
- event flow information will
help in avoiding conflicts between blocking / non-blocking calls,
signaling / polling / waiting architectures and syntactical problems.
- timing information.
By simulating the Petri-net in such a way that timers trigger certain
places it might be possible to implement 'required' timing information'.
By observing timers and detecting timeouts at 'offered' behavior,
it is possible to detect errors.
- communication: by
introducing two places, a special receive-source place and a special
send-sink place it might be possible to verify the correct working
of the communication link. One place is a source-place to receive
raw protocol data. The second is a sink-place that is used to send
out raw data.
Once the missing information is present and the requirements are known,
it is time to look for solutions. This involves finding out which
requirements can result in a compilation of an adaptor, and which
requirements require the insertion of a general algorithm. In both
cases it is important to decide what kind of internal representation
will be used. In this dissertation the internal representation of
the liveness-module and the enforce-action module was similar to the
Petri-net representation, however, the representation of the concurrency
module was slightly different and more tuned toward its efficient
When using learning algorithms, make sure to use an automatic process
to verify whether the representation is good. In the end, a computer
needs to use the representation to (re)act correctly to certain situations.
Therefore it is important to have a representation that quickly leads
to a correct behavior. In other words: the bias of the representation
should be measured and should be optimal. In this dissertation we
have verified our Petri-net representation of the liveness-module
by means of a genetic algorithm. E.g, the representation of the liveness
Once solutions have been identified to match the different requirements
investigate how these solutions can be modularized. Afterward investigate
what kind of topology is necessary to make the entire adaptor work.
Example: In our
socket case, the implementation of the adaptor might be similar to
the adaptor presented in this dissertation.
- how can every requirement be satisfied ? compilation vs interpretation
- when using learning algorithms: How to guarantee a correct bias ?
- can certain requirements be modularized ?
- what kind of topology will be used ?
Probably the only two modules necessary to
make this work is one module that communicates to the user level application
and a second module that communicates to the actual library/kernel.
The communication link between these two modules can be a common place,
on one side understood as a receiving place, on the other side understood
as a sending place. (figure 13.5 contains dotted transitions
and lines which link them both together).
- To guarantee a correct data flow, our adaptor
can investigate at the application side which data it places in the
output tokens. At the kernel/library side it can do a reachability
analysis to deduce how to send data over the socket.
- Solving timing information can be done by
inserting a Petri-net evaluator that knows how to handle timing.
The evaluation of these socket nets should
be able to handle the notion of threads: creating ones when necessary,
managing incoming threads and destroying others. This can be done
by using the 'session' knowledge present within the different tokens.
The above guidelines of course should be embedded in a typical prototype/experiment
cycle. With these guidelines it should be possible to create adaptors
for domains other than concurrency conflicts in open client server
architectures. Essentially , any place where one interface is fitted
with multiple implementations is a candidate for automatic adaptation.
An area that is not investigated in this dissertation are meta-conflicts.
These are conflicts that occur between the different interface-descriptions.
We have already touched this issue in section 12.7.1.
However, before the problem of meta-conflicts can be investigated,
a large number of adaptors based on extra formal documentation needs
to be available.
13.8.2 The problem of checkpoints
In our solution to the liveness problem we assumed that the programmer
can easily specify checkpoints in the source code. However, if this
is not possible, test-scenarios can be used. A test scenario specifies
which traces are considered to be 'good'. Based on this information,
rewards can be obtained. Illustration 13.6 shows
two test-scenarios. Scenario 1 shows that the client will send out
a Lock and expects a LockTrue in response.
Afterward an Act will be sent out and an ActDone
is expected to return. Finally an Unlock is sent out and
an UnlockDone should return. The second scenario illustrates
more or less the same, the only difference is that now two Act's
are sent out. The second scenario can be used to teach a learning
algorithm that after the first act message multiple other
act messages might be sent out. Given these two test-scenarios
one can easily implement a tracker in the Petri-net that will assign
rewards when appropriate.
Two test-scenarios to verify the correct working
of an adaptor.
If we go even a step further, it might be possible to convert a number
of test-scenarios automatically to the Petri-net description of a
component [EK98]. However such a process heavily
relies on a correct generalization. Given the two scenarios in figure
13.6, one might assume that the component will also
work for 3 or more acts at the same time. However, nothing ensures
this is the case. Depending on the algorithm another generalization
Instead of using Petri-nets and separate checkpoints, the developer
of the Petri-nets could also tag arcs with a priority, which would
resemble the rewards offered by the different components.
We have made the assumption that every future reward is statistically
correlated to the current state of the Petri-net involved. This was
necessary to demonstrate that the mapping of our liveness problem
to a reinforcement learning problem was well defined. However, if
we have made the assumption of a reward/marking correlation, we only
did this from a practical point of view: for the programmer it is
easy to specify checkpoints. In the conducted experiments we have
observed that in general this did not form any problem, however a
large scale investigation of this property would be very useful, especially
to learn the boundaries of our approach.
We have already seen that if the reward is not entirely dependent
on the locking strategy some oscillation in the values will occur,
on the other hand if the reward is used to express a hidden property
of the locking strategy it will not be able to learn it.
In this dissertation we did not fully investigate how peer to peer
concurrency could be managed. All the examples we have presented work
with only one server. As explained in section 12.7.2
there are some major problems involved with peer to peer concurrency.
The biggest problem of such an environment is that not all communication
can be mediated and that different components might communicate indirectly
with other components. To solve this problem adaptors themselves should
be able to coordinate their behavior and learn how to behave to support
an implicit present overall behavior. Theoretically, game theory might
Game theory [Nas50b,Nas50a,Nas51,Nas53] requires agents that
choose a discrete action. Depending on the action they choose they
can either win or lose. The outcome of a game not only depends on
the action chosen by one agent but on the interaction between all
the chosen actions (this is typically the situation in peer to peer
networks). A game is typically characterized by 5 elements:
The 5 elements that characterize a game are present in typical peer
to peer networks, so game theory might be a good start to investigate
the problem of conflicting interfaces in peer to peer networks.
- the players, how many are there ? Is there a chance-player
(or is there some random element) ? In our situation these could be
the components and the delay times when sending messages.
- a set of possible actions for every player. This largely depends
on what kind of possibilities a component has. Probably this will
be mapped onto the sending of a message or waiting with sending the
- the information players have available when choosing their
actions. It is clear that no adaptor in a peer to peer network will
have all the information.
- a measurement which describes the payoff for all combinations
of actions. It should be possible to embody the overall required behavior
into some kind of reward system.
- a description of what every player tries to accomplish. These
are the requirement of every agent. In this dissertation we have assumed
that every agent tries to accomplish the same.
13.8.5 Learning an Interface Description
As explained in section 12.3, all investigated
concurrency interfaces behave as simple finite state machines. Therefore
it might be possible to deduce the behavior of an interface in an
automatic way. For instance, a learning algorithm could, given an
API of a component's interface, try out which actions are possible
at which time and construct a Petri-net description of the interface
To do so a program , that is supposed to learn a Petri-net
description of component , which offers only a syntactical
interface description , could follow different strategies:
The above are some ideas about how one might try to learn the behavior
of an interface automatically. Whether it is truly possible to learn
the behavior of an interface automatically remains an open question.
If we would be able to do this, then the impact of our work would
be substantially greater, because then we would be able to mediate
differences between components in an even more automatic way. As a
trade-off, the advantages of using a formal interface specification
would be lost. We would no longer be able to verify the completeness
and consistency of an interface description.
- learn hidden variables of by only accessing .
The problem of modeling the behavior of by only looking
at is essentially finding out which variables are useful
to describe the behavior of over . The problem
is that all too often these variables are hidden within the semantics
and not actually present at . Therefore two ideas might help
- observe the behavior of a standard communication and find out
hidden variables by statistical analysis of the message flow.
- generation of test-messages by to validate and/or test
- useful to verify whether a hidden variable is truly a distinct state
and not merely a coincidence.
- useful to check the boundaries of variables.
- directly observe the component . Instead of looking
for hidden variables it might also be possible to directly investigate
the behavior of the component.
- observing its binary state before and after handling a message.
This avoids the use of looking for hidden variables but can easily
create too large models and describe states that are entirely redundant
to model the behavior of .
- observe the control flow within the program when handling a
message. This should make it possible to detect whether all branches
of the component have been investigated and possible how other branches
can be investigated. This might help in creating the expressions on
the arcs of the Petri-net.
- model generation & verification. Generate a random model .
Afterward can verify, by checking consistency and completeness
whether is a correct model of . To do so, genetic
algorithms/programming might be useful (given the correct bias of
course). After verification of a model , crossover and mutation
could help in refining and creation of a better model.
- Petri-net representation in such a way that
- the Petri-net it is not too small. For instance, one place that captured
the entire state of the component, with 5000+ transitions describing
every possible message is clearly not a good Petri-net
- the Petri-net is not too large. It is also useless to create a place
for every memory-cell available to the component.
- the representation should only contain places to model a certain requirement
such as i) when a certain message can be send by only looking at the
client, or ii) when a certain message can be send, looking at the
server, or iii) to capture the state of a component such that it can
be rolled back at a later time, or others...
- automatic reduction of the Petri-net might help in removing obsolete
states, not necessary to model .
13.8.6 Determinism versus Non-Determinism
This dissertation focuses on the generation of intelligent adaptors.
We made the assumption that we have the concurrency strategies specified
as a Petri-net. From a pragmatic point of view, this is very useful
because it offers the developer all kinds of interesting information.
However, the essential reason why we need Petri-nets is to
introduce determinism and avoid that the liveness module could bring
the client components in an invalid state by sending out a wrong message.
E.g., if the client component does not know that before the first
setPosition, a joinActor needs to be sent,
the client component might fail to work properly.
In the enforce-action module we also need a similar deterministic
property. This module needs to know how it can bring the server
component in a certain state. Here we assume that the server is well
written and that every possible synchronization action that can be
invoked upon the server can easily be undone. Of course, to avoid
denial of service attacks and to increase its robustness, every good
server offers this kind of functionality. However, in general, the
concurrency module needs be sure that it can deduce deterministically
what to do and that it will always be able to reach a certain
Within the concurrency module a similar deterministic problem
occurs. However this problem cannot be solved easily. The concurrency
module resolves race conditions. However, it is unable to resolve
deadlocks when it detects them. The reason is that it is unable to
anticipate a deadlock, because it has no knowledge of it, and
that, once it has detected a deadlock it is unable to undo previous
actions on the involved components.
From these three modules we see how required determinism poses
problems. During this dissertation we have tried to generate adaptors
that work without making mistakes that cannot be revised afterward.
The reason why we imposed such requirement is that all real-world
computation happens in a controlled non-deterministic (hence
deterministic) way. If two components communicate, there are exactly
two components and the functionality offered by those two components
is exactly understood. If an adaptor wants to mediate the differences
between those components it should be deterministically correct,
otherwise it would certainly bring disaster upon one of both
(if not both) components. This deterministic view on computation is
an illusion, for in open distributed there are many opportunities
when a component might fail; all too often the reasons lie within
uncontrolled upgrades or unreliable infrastructure. Nevertheless developers
still hold on to an illusion of determinism and construct programs
that behave completely deterministically.
In this dissertation we made exactly the same assumption. However,
one can also approach the problem from the opposite side. One might
be tempted to embrace the non-determinism found in open distributed
systems and investigate how different but similar working components
might emerge behavior that satisfies certain requirements.
By exploiting non-determinism one might create programs, which act
creatively and intelligently, because they would be truly able to
find and apply clues in the environment which a deterministic process
cannot see. This kind of research however is still barely started
and certainly not available in the field of 'services' offered on
the Internet. Some pointers might give a lead such as swarm intelligence
[KRCE01], amorphous computing [HAAC+00]
and cellular automata [TM87].
- The version for Macintosh and PalmOS have been omitted but can be
found on the site http://borg.rave.org/cgi-bin/borgcvs/borg/cborgcore/.
The different files are named StdLinux.h, StdLinux.c,
StdWindows.h, StdWindows.c, StdPalm.h,
StdPalm.c, StdMacintosh.h and StdMacintosh.c.