Subsections
12. Discussion
IN THIS CHAPTER we validate our approach by observing the
experimental results. First, we will briefly summarize our approach
and the results. Second, we make some important observations based
on the experiments and third we present some technical limitations
of the presented work.
WE WILL NOW BRIEFLY summarize our approach and the experimental
results. In our approach we created a general concurrency adaptor
that can be used to mediate the differences between conflicting concurrency
strategies: one server and multiple clients. As stated in the introduction
section 1.3.5, the adaptor we wanted to create
should be able to mediate concurrency conflicts in such a way that
no conflict arises. We have defined a conflict as a situation in which
the mutually agreed behavior between the different communicating partners
cannot be executed over the interconnection between them. In our case
study, the whiteboard, the mutually agreed behavior between the different
actors was not to cross each others boundaries.
The adaptor itself contains three modules. The first module is an
enforce-action module which bypasses the concurrency behavior of a
server component. This module makes use of a prolog program. The second
modules is the liveness module. For every client component present,
a liveness module will be created. Such a liveness module bypasses
the required concurrency strategy in such a way that the client component
can continue with its core functionality. The liveness module has
been implemented by means of a reinforcement learning algorithm. Experimentally
we observe that this approach is good if the client component
returns correct rewards (that is rewards that are statistically correlated
to the state of the involved Petri-net). The third module is the concurrency
module, which takes care that no race conditions occur within the
message order between the participating clients and the server. This
module has been explained in chapter 10.
12.2 Observations about the Concurrency
Adaptors
Table 12.2:
Overview of the conflicts and the adaptors.
Conflict |
Adaptor |
Result |
Description |
Possibly |
Module |
|
|
Deadlock |
Liveness |
Concurrency |
Enforce |
|
6.1: Simple Syntax |
|
Ok |
Ok |
Ok |
Ok |
6.2: Parameter Encoding |
|
Ok |
Ok |
Ok |
Ok |
6.3: Simple Reentrancy |
Yes |
No Choice |
+Dead |
Ok |
Ok |
6.4: Async Reentrancy |
|
Ok |
Ok |
Ok |
Ok |
6.5: Simple Control flow |
|
Ok |
Ok |
Ok |
Ok |
6.6: Idle Control flow |
|
Non Markov |
N/A |
Ok |
Fail |
6.7: Waiting client, Non-waiting
server |
Yes |
No Choice |
+Dead |
Ok |
Ok |
6.8: Nested line server, square
client |
|
Ok |
Ok |
Ok |
Ok |
6.9: Non-nested line server,
square client |
|
Ok |
Ok |
Ok |
Ok |
6.10: Square server, field
client |
|
Ok |
Ok |
Ok |
Ok |
6.11: Layered server, non-layered
client |
Yes |
Ok |
+Dead |
Ok |
Ok |
6.12: Layered client, non-layered
server |
|
Ok |
-Dead |
Ok |
Fail |
6.13:Transition |
|
Ok |
State |
State |
Ok |
6.14: Nested Transition |
|
Ok |
State |
State |
Ok |
6.15: Layered Transition |
|
Ok |
State |
State |
Ok |
6.16: Transactional Client |
|
Ok |
State |
State |
Ok |
6.17: Empty server |
Yes |
No Choice |
+Dead |
Ok |
Ok |
6.18: Upscale granularity |
Yes |
Ok |
+Dead |
Ok |
Ok |
6.19: Downscale granularity |
|
Ok |
Ok |
Ok |
Ok |
6.20: Waiting clients, non-waiting server |
Yes |
Ok |
Ok |
Ok |
Ok |
6.21: Multi-multi conflict |
|
Ok |
Distri |
Ok |
Out Of Scope |
6.22: Multi-multi conflict 2 |
|
Ok |
Distri |
Ok |
Out Of Scope |
|
IN TABLE WE give an overview of the
discussed conflicts from chapter 6. The ``description''
column describes which conflict we are covering. The ``possibly
deadlock'' column contains 'Yes' if the approach used by the clients
involved in the conflict could possibly lead to a deadlock. (for instance,
a waiting locking strategy). The ``Liveness'' column specifies
whether the liveness module works on the involved client. If a 'No
choice' is mentioned, the liveness module has not much choice to keep
a component alive. For instance, a waiting locking strategy, where
a lock request can only result in a lockTrue.
'Non Markov' in this column means that the rewards are specified in
such a way that the Markov property does not hold. The concurrency
column specifies how the concurrency adaptor works. '+Dead' means
that the concurrency module could possible deadlock, if the
involved components exhibit such a behavior that might lead to a deadlock.
'-Dead' means that the concurrency module will possibly introduce
deadlocks, even when the client components will not lead to a deadlock.
'State' means that not only the resources need to be considered, but
also the content of the resources. The ``Enforce'' column specifies
how the enforce-action module performs on the server involved in the
conflict. The result column states whether the overall behavior will
be correct or not.
We will now observe some of the behavior of the 22 adaptors:
- The idle control flow conflict (6.6)
cannot be mediated because the rewards are non Markov, therefore it
is impossible to get the liveness module to work.
- The layered control flow conflict 6.12,
occurs because a client expects a field-lock to be available, while
the server doesn't offer one. This conflict can be mediated. However,
if multiple similar clients join, the result might deadlock, while
the clients are specifically designed to avoid deadlocks by first
obtaining a server lock. The reason why this layered conflict cannot
be mediated is because the resource (the server-lock) is a virtual
resource.This is discussed in more detail in section 10.6.6.
- The transitional conflicts (6.13, 6.14,
6.15 and 6.16)
can be mediated if the enforce-action module and the concurrency
module take into account the state of the Petri-net places that are
common to the involved components.
- The two multi-multi conflicts (6.21 and 6.22)
cannot be mediated because the concurrency module is not designed
to be able to handle multiple servers.
Given the required mutually agreed behavior between the different
actors: not to cross each other's boundaries, we observe that in most
cases (18 of the 20 tested conflicts) the concurrency strategy is
mediated correctly and allows the components to execute as they intended.
Below we will continue with some further observations we can make
with respect to these results.
12.3 Concurrency Strategies are Very
Simple
AN OBSERVATION WE CAN MAKE about the current concurrency
strategies is that they are essentially very simple. Even the rollback-able
concurrency strategies, which are not necessarily easy to implement,
are often nothing more than finite state machines. This statement
is based on two observations. First, our reachability analysis can
deduce very quickly how to bring a server component in a certain state.
Secondly, our liveness module learns very quickly what to do in a
certain situation. This is to be expected to a certain extent because
we have carefully created the representation of our situation-recognition
system. However, this representation itself is also nothing more than
a finite state machine.
If we think about interfaces (not only concurrency interfaces), it
is not commonplace to hide features and functionalities within the
API. On the contrary, most interfaces are designed to make it easy
to access all the necessary operations and modify the state of an
object/component.
From these two observations one might think that it would have been
more appropriate to use finite state machines to describe concurrency
strategies. This is not the case because the concurrency strategies
themselves often contain parallel resources that act independently.
This is very difficult to express with standard finite state machines.
In section 3.7 we have given two examples
that demonstrates the need and advantages of using Petri-nets.
IN THIS SECTION WE DISCUSS some technical limitations and
observations we can make, based on the experiments we have performed.
A limitation to our work is the problem of late joiners. We did not
investigate what will happen when a new component joins a set of already
communicating components. Currently, this situation is not allowed.
However we feel this will not constitute an obstacle. A keep-alive
module can be created for every joining component and the concurrency
strategy can be extended to allow new components to join.
During our research we implemented a new component system. We conclude
from our experiments that this component system offers a lot of advantages,
especially with respect to glueing components together. Its high level
reified message representation, together with its disciplined non
blocking messaging system allows for easy adaptor creation. In comparison
with object oriented languages, where the problem of glueing together
different libraries and programs is well known, this is an advantage.
For instance,
- The component system allows for writing adaptors in an easy, understandable
way as discussed in chapter 2. Compared to object oriented
languages, the problem of late binding, polymorphism an inheritance,
which complicates the writing of adaptors, is not present.
- The component system allows for transparent distribution because the
model itself doesn't follow 'call' semantics and is connection based.
In appendix 13.8.6 we discuss how difficult it would have been
if we would have used something like Java RMI to do this work.
During the experiments we have used an event based component system.
This allowed us to write adaptors easily and is a model representation
of an open distributed system. However, open distributed systems are
not the only area where conflicting interfaces occur and not all applications
are written by using an event based approach. If we look at applications
that make use of object oriented frameworks then we encounter similar
problems because not always all upgrades of part of the application
are entirely backward compatible [LSMD96]. A small alteration
in the provided behavior or a wrong estimate of the usage scenarios
can result in a complete application breakdown. This also happens
for applications that dynamically link to external libraries (.DLL's
offering extra functionality or plugins). Typical for this kind of
object oriented application is its seemingly 'sequential' behavior
over object boundaries. In fact, this kind of application is often
written within a thread based context and they rely heavily on a shared
memory (in the broadest sense of the word). However, the use of a
shared memory as a communication medium between threads (or between
different parts of an application) makes it very difficult to isolate
the communicating partners and modify the flow of information between
them. Since it is necessary to modify the information flow to resolve
any conflicts it might be very difficult to use a similar approach
as the one we have presented. Appendix 13.8.6 contains an extensive
explanation of the difficulties of using such a thread based model.
Another observation that can be made is that a randomized testing
procedure (such as is implemented by different learning algorithms)
is the best guarantee to write well-tested and robust software. Actually,
our experiments took longer than expected due to the fact that our
learning algorithm always created unanticipated situations. This happened
in more than one area as discussed below:
- Petri-net evaluator: We are now able to present a Petri-net
evaluator which has been thoroughly tested, both the evaluation of
all kinds of random expressions as well as the evaluation of randomized
Petri-nets has been tested.
- Implementation of Actors: On the actor level a lot of unexpected
interactions were detected, such as in the flood fill algorithm 25
on page . During all our tests
we were focused on a flood fill that slowly increases its terrain.
However we specified a Petri-net that simply declares a non-nested
locking strategy. This means that every lock can return a lockFalse.
The checkpoint (1c) in the algorithm doesn't expect a possible lockFalse
and will fail if a lockFalse is returned in the line before.
This was only found out after measuring the fitness of the adaptor.
This has led us to use genetic algorithms not as a means to solve
the problem of conflicting interfaces, but rather as a tool which
helps in creating a suitable representation and which has helps in
black box testing our Petri-net evaluator.
THE USE OF PETRI-NETS AS AN INTERFACE DESCRIPTION LANGUAGE
allows for a better software development process. Not only is the
code documented in a readable and formal way, the formal properties
of the documentation allow for automatic testing of the underlying
component. Given the Petri-net and a correct component we then can
- check completeness: A testing program can take random walks
through the Petri-net and check whether the underlying component works
as expected: it does not crash, it does not end up in an invalid state
and so on. When using timed Petri-nets, the timings can be verified
and when needed memory-requirements can be verified against executed
transitions.
- check consistency: A testing program can verify whether all
incoming and outgoing messages from and to the component are consistent
with the Petri-net. For instance, it can check that all messages sent
out by the component are enabled within the Petri-net.
- track behavior of external components: By inserting a tracing
component (section 3.6.3) it is possible to
verify whether external components honor the messages they are allowed
to sent.
- If we even go a step further, the Petri-net can help in guarding
access to the component, by disallowing unexpected messages.
- do a formal analysis to guarantee that no dead ends can be
reached within the component. It is also possible to verify whether
there is a home-state that is always reachable from any possible reachable
state. This is important because this guarantees that a component
can be reset at any time.
It is clear that the advantages offered by using a Petri-net description
of an interface can outweigh the difficulties of writing one.
ALTHOUGH THIS DISSERTATION MAKES NO claim about performance,
we will briefly discuss the performance of the entire adaptor. To
do so we will discuss three separate estimates. For the entire adaptor
we will investigate
- : how much time is spend in handling one message ?
- : how much memory is required by the adaptor ? We will
express this in function of the minimum memory requirements of every
involved component.
- : how many messages are sent for every single message
from a client to a server and vice versa.
To make these estimates we will investigate the timing and memory
behavior of the three different modules. Afterwards we will investigate
the message count. To be able to make these estimates we will need
a number of variables:
- The number of server messages sent will be denoted .
A message is a server message if it is sent out by the server towards
one of the clients.
- The number of client messages sent will be denoted .
A message is a client message if it is sent out by a client towards
the server. The refers to the sum of all messages
transferred from any clients to the server.
- which is the number of clients involved.
- In the upcoming discussion we will sometimes need an estimate of how
long it takes to verify which transitions are enabled in the Petri-net.
Theoretically (as explained in section 3.4.4)
this is exponential because one might need to verify all combinations
of tokens versus free variables. However, in practice such situations
seldom arise. As we have observed experimentally, the verification
whether a transition is enabled can be done in . However, in
the upcoming discussion we will consider
to be the
mean time over all Petri-nets to verify whether a transition
is enabled. Afterward, we will, depending on whether we want a worst
case estimate or an expected estimate, fill in
with
an appropriate expression.
- We also need a description of the memory-requirements of a Petri-net.
We will denote this size or , depending
on which Petri-net is involved. Intuitively the size required to represent
a Petri-net is at most the size needed by the underlying component
itself. For instance, in our case the server Petri-net describing
the state of the whiteboard (of size by ) will need
tokens. The server component itself also needs to keep track of the
same information. Petri-nets describing more different states
than recognized by the involved components are less useful (see Petri-net
guidelines, section 3.8).
We will now start with discussing the time- and speed- estimates of
the liveness module, the enforce-action module and the concurrency
module.
Giving a performance estimate of the liveness module is difficult
because the performance of this module largely depends on how well
trained it is. Initially it might need a large number of messages
and rewards to understand which behavior exactly is required by the
client component. However, once a correct behavior is installed it
will behave more or less constantly, depending on how much exploration
is allowed. Here, we will only consider the case where the liveness
module exploits correctly learned information.
To give an estimate of the time spent within the liveness module,
we observe that the liveness module needs to find out a) which transitions
are enabled and b) select from the enabled transitions an appropriate
one by means of Q-learning. Step a) can be done in
by using the locality property of Petri-nets. Step b) can be done
in because the number of active 'added' transitions is kept
constant. During execution of the liveness module, non-fit transitions
are removed and new transitions are added. The process of removing
a transition is and the process of adding a new transition
is also . This results in a time estimate of
.
The memory requirements of the liveness module are the memory requirements
of the Petri-net describing the state of the client component and
the newly added transitions. Because these newly added transition
are bounded to a certain maximum, we consider it to be a constant.
The size of the Petri-net is bounded by . Hence
.
Later on, when giving a performance estimate of the entire adaptor
it should be taken into account that the concurrency adaptor contains
multiple liveness modules. Therefore we will add an extra index to
the liveness measure:
and
Giving a performance estimate of the enforce action module is even
more difficult. We currently rely on a prolog implementation to decide
how to enable a certain action. This makes it difficult to give correct
estimations. Potentially both the memory estimate as well as the timing
estimate might be exponential in function of the necessary search
depth,
. So the worst case time estimate
is
. The worst case memory estimate
is
because we also
need to take into account the memory-requirements of the server side
Petri-net.
However, as observed during the experiments the search depth is always
very small and most concurrency strategies offer the possibility to
reach a certain state quickly. Therefore we will consider a practical
time estimate of
. And, similarly, a practical
memory estimate of
.
The concurrency module needs to keep track of all the enabled transitions
for every client component indirectly connected to it. For every client
a set of controlled transitions and managed transitions will
be kept in memory. The Petri-net of the server is not necessary within
this module. This gives a memory estimate of
.
The time spent within the concurrency module is a constant. For every
incoming request a cross-table is checked and a decision is taken
immediately. This leads to
.
12.6.4 Message Sends
To determine how many messages are sent between the client components
and the server component we will consider two cases: functional messages
and synchronization messages.
Every functional message going from a client toward a server will
go through several connections:
- the ``client component
liveness module'' connection
- the ``liveness module
concurrency module''
connection
- the ``concurrency module
enforce action module''
connection
- the ``enforce action module
server component''
connection
Messages coming from the server and going to the client will pass
the same stages in reverse order. This means that for functional messages
the number of messages equals
. These
results can be slightly improved by merging the concurrency module
and the action-enforce module into one module at the server side.
Hence, we could have an estimate of
.
The behavior of synchronization messages is not as simple as the behavior
of logic messages. The problem comes from the fact that synchronization
messages are not simply passed through from module to module, but
are a) altered to represent the possible choices between the liveness
module and the concurrency module and b) are not present between the
concurrency module and the enforce action module. However, this observation
does not prohibit us from making a global observation about the synchronization
communication: We know that the synchronization messages introduced
by the adaptor is minimal to support all components: the enforce action
module will only send out the necessary synchronization messages,
while the liveness module will have learned which synchronization
messages are favored by the client. From this observation we know
that the adaptor will not introduce useless synchronization messages.
This already allows us to give an estimate of
.
However, to estimate how many extra messages might be added we give
a trace of a synchronization request from client to server:
- client component
liveness module: the client component
sends a synchronization message to the liveness module: .
- the liveness module
concurrency module: communicates
the changing state and possible choices to the concurrency module:
.
- concurrency module
liveness module: in the worst
case scenario, the concurrency module must always make a choice and
communicate it back to the liveness module: . In this
performance estimate we will assume that this will only occur when
the real server would in practice also return a message, hence: .
- the liveness module
client component: the liveness
module sends back a synchronization message to the client component:
.
The communication of messages from the server to the client happens
only when a certain functional message arrives at the action enforcer
for which synchronization messages are required. Such a message trace
looks like:
- enforce-action module
server component: when necessary
a message will be sent to bring the server in the required state.
So, we can assume that this is correlated with the number of client
synchronization messages: .
- server component
enforce-action module: this is
captured in the variable.
This results in a number of messages of
From the results of the functional messages and the synchronization
messages we can conclude with an estimate of
.
From the previous performance estimate we can now summarize the worst
case performance:
If we consider, as explained before, that the time to verify a transition's
state (enabled vs disabled) is constant and we assume that the prolog
implementation is able to deduce a correct path in then we
can give an expected estimate of:
This means that we expect that the adaptor needs as much memory as
required by the different states of all involved components.
Also, the time necessary to handle one message is constant.
IN THIS SECTION WE WILL DISCUSS our initial motivations and
compare them with the solution we have presented. Our main focus here
will be to investigate how our prototype could be made to work within
the context of open distributed systems, multi agent systems and embedded
systems.
12.7.1 Open Distributed Systems
Our adaptor as it is now does not run automatically in open distributed
systems. Below we discuss two obstacles still remaining. The first
being adaptor deployment (or who decides to run an adaptor) and the
second being the problem of meta-conflicts (conflicts between the
involved Petri-net descriptions).
A very simple fact about our adaptor is that, before it will be used,
somebody has to insert it somewhere. We will now discuss
the implications of this.
Typically, the client is the first one to observe any conflicts with
the server. This means that the client will be the first to want
to introduce an adaptor. This is not necessarily easy because
- To insert an adaptor, extra formal documentation, describing the behavior
of the client component, is needed. If the software used on the client
does not provide such a documentation then it might be difficult to
run an adaptor.
- Our adaptor needs the possibility to isolate the server from its environment.
Hence, the client side cannot solely take the decision to insert the
adaptor because it requires cooperation of the server side.
The server side on the other hand is often the last one to observe
any conflicts and might not want to introduce an adaptor for a number
of reasons:
- The server administrator needs to make the decision to insert an adaptor.
Technically speaking this doesn't form a problem. However, (s)he might
feel the clients should upgrade.
- The performance penalty of the adaptor is considered to be too high.
- Inserting an adaptor might requires a quality tracking process (how
well does it work, are there any performance penalties, does it introduce
bugs, is the end-result more stable than the initial situation and
so on) which might take too long a time.
- The extra formal documentation, describing the behavior of the server,
might not be available.
- The required extra formal documentation might not be offered by the
clients. In such a situation the server needs to 'know' the correct
definitions of different clients. This results in conflicts because
the server-side interface descriptions can be incompatible with the
actual clients.
Some of these problems are relatively subjective and closely linked
with the involved business model, however two major observations still
remain. Firstly, not everybody might agree to follow our standard
of Petri-nets as a formal documentation technique. Secondly, the formal
documentation itself might lead to conflicts. We will discuss these
two problems below.
One of the motivations behind this research was the fact that standards
in open distributed systems are sometimes merely used as a means to
stay in control of how software is used instead of being a means to
communicate with other software. If the intention of a company is
to be incompatible to force partners to make certain decisions, then
there is nothing that will prohibit such a company from doing
so, not even Petri-nets. However, the description of this behavior
was only used as a motivational example and not as the problem we
wanted to solve.
On the other hand, companies that want to be compatible with other
software will find that our proposal of using Petri-nets offers a
big advantage that renders their usage worthwhile. The strength of
the documentation format comes from automatically checking completeness
and consistency. Completeness being that all messages that a component
understands and uses are described within the Petri-net. Consistency
being that the documentation is compatible with the behavior of the
component. Both properties can be checked automatically. In other
words, Petri-nets help the developer in the software development process
by offering better and more accurate documentation.
12.7.1.3 Meta-Conflicts
Figure 12.1:
Stack of adaptors
|
In section 1.1.1 we argued that defining one standard
in open distributed systems is unrealistic because competitors will
always try to adopt the standard and unexpected incompatibilities
will arise. This problem could also occur between the different formal
interface descriptions. This kind of conflicts are meta-conflicts.
The problem of meta-conflicts is wide ranged; from slight discrepancies
between the Petri-net descriptions to conflicts with entirely new,
possible better, formal descriptions.
If the formal interface description are entirely compatible then it
might be possible to create a super interface description that can
express all details of the different formal descriptions. The only
thing that needs to be done then is implementing converters from the
different interface description languages to this super interface
description language. However, it could also be possible that required
information in one interface description language is simply not present
in another interface description language. This would complicate this
method. However before this kind of meta-conflicts can be investigated,
a number of realistic conflicts between interface description languages
need to be investigated. Only then can this problem be approached
pragmatically. This remains future work.
If there are only slight differences between the Petri-net format
used, or between the terms used within the Petri-net then other approaches
can help:
- meta-adaptors based on Petri-net evaluation. Such a meta-adaptor
is pictured in figure 12.1. If we would create
an adaptor similar to the one we have created for conflicting concurrency
strategies then we need a 'common ground' that can be used to verify
the behavior of the different Petri-nets. In our case, the common
ground was the compatible functional behavior. In the case of conflicting
Petri-nets a common ground could be the fact that a Petri-net can
be verified for completeness and consistency. By tracing the incompatible
net and recreating a compatible net during such a trace it should
be relatively easy to convert an incompatible net to a compatible
net. This can be investigated in more detail in future work.
- automatic deduction of the states and transitions within a
component by inspecting the code and data segments in a typical communication.
If we could avoid the need for Petri-nets of an interface description
at all, then the problem of meta-conflicts might be avoided. How the
interface of a component can be learned is discussed in section 13.8.5.
12.7.2 Mobile Multi Agent System
A second motivation for doing this work was the problem of concurrency
management in mobile multi agent systems. Because communication happens
in a peer to peer manner between agents, we now discusses how our
solution relates to the problems of peer to peer systems. First, we
discuss how our adaptor preserves the required behavior if no central
locking server is used. Second, if a central locking server is used
to coordinate the behavior of a group then we see how our approach
fails to offer a solution.
In this section we illustrate how our adaptor correctly mediates concurrency
problems in a peer to peer application that makes no use of a central
coordination server.
Figure 12.2:
Peer to peer communication between
agents in a mobile multi agent system. The arrows indicate which agents
require behavior from another agent. The left figure is the normal
interconnection without inserting adaptors. The right figure is the
same interconnection but with adaptors inserted at the server ports/interfaces.
|
In a mobile multi agent system there is at first sight not really
a difference between client-components and server-components. However,
if we look at the behavior they require or provide (or in other words,
which agents implement a certain functionality), then we can actually
make the difference between a client component and a server component.
A client agent will expect a certain functionality from another agent,
while a server agent will provide a certain functionality. However,
contrary to a typical client-server setting, in this situation every
agent can be a server as well as a client. To easily support such
a model we will assume that the server behavior of such an agent is
offered on one port, while the client behavior is offered on another
port. This allows us to introduce adaptors on every agent (peer) that
offers a service. The resulting interconnections are pictured in figure
12.2.
Figure 12.3:
Deadlock over the different adaptors in
a peer to peer situation.
|
Because every agent can be a client as well as a server, it becomes
difficult to identify a 'session'. In the typical client-server architecture,
the client manages a session and tries to get things done on the server.
Within a peer to peer system, sessions hop from one agent to another
and cannot be localized anymore in one agent. This might lead to deadlocks
between communicating peers, without the possibility for the adaptors
to do something about it. This is illustrated in figure 12.3.
The adaptors cannot resolve such a deadlock because they are part
of the waiting loop. However, this problem has little to do with the
ability of the adaptor to 'mediate' the conflicting behavior. In fact,
if the application is programmed in such a way that it relies on concurrency
strategies that lead to unwanted behavior then this unwanted behavior
will also occur with adaptors, otherwise it will not.
In a peer to peer application there can also situations where our
adaptor fails to work. The example we present to illustrate this uses
a central locking server that is used by all components to identify
sessions and control access to the application's resources.
Figure 12.4:
Distributed Transaction server.
|
In peer to peer systems it is often necessary to bring multiple peers
from one consistent state to another consistent state. This however
requires from every peer the ability to obtain a number of locks,
distributed over different peers. Once all the locks are held the
session can apply the changes on the involved resources and unlock
them again. Often this is done by passing session id's from one peer
to another such that receiving peers can verify whether the resources
they hold can be accessed by the incoming session. However, the session
id's themselves are often provided by other components than the receiving
peer itself. For instance, consider a transaction server such as pictured
in figure 12.4. The transaction server
will be used by any peer that needs a transaction id (this will be
our session id). This transaction id will then be passed from peer
to peer (without going through the transaction component).
In this setup, multiple concurrency strategies can be present. First
there are the concurrency strategies offered by one peer towards another
peer, secondly there are the concurrency strategies offered from one
peer toward the transaction server and thirdly there is the concurrency
strategy offered from the transaction server toward the peers. We
will assume that we have mediated the conflicts by placing at every
server interface a concurrency adaptor. This might lead to problems:
- Our adaptor requires the availability of a compatible functional behavior.
This poses a problem because the only functionality our central transaction
component offers is about synchronization. If the transaction component
is hardwired with respect to the resources then we can use the common
set of resources to be the functional link between them. However,
if the transaction component uses first class resources: i.e.
resources that are published by clients and recognized by the central
server, then it might be very difficult to find any useful common
behavior.
- If peers that have no notion of session id's are pulled into the application
then our adaptor will not provide them with this notion. This means
that their resources will be freely available to everybody wanting
to lock them, effectively destroying the use of the central transaction
server. So, if we allow components to join in an 'open' way then this
kind of conflict cannot be mediated.
- Even if peers have a notion of session id's then we still have a problem
because the application is using first class session id's and the
information about such a session is localized at one central place,
while in fact all the concurrency adaptors on the peers need to know
about these session id's. E.g: if agent E requests a session id from
the transaction server, then the Petri-net in the liveness module
connected to agent E, will recognize the session id to be owned by
agent E. However, if this session id is passed to agent A, then this
agent will be completely lost, because the Petri-net in the liveness
module of agent A, connected to agent E does not recognize this session
id.
- If we would be able to resolve these problems, (by not using first
class session id's or first class resources) another problem might
occur: synchronization races due to hidden communication. Because
our adaptor might delay the passing through of a synchronization request
until really necessary, a resource might not yet be locked if the
agent wants to access it through another channel. Even worse, this
resource might never become locked if the agent doesn't access it
directly on the transaction component.
From these examples we see that a) first class resources, b) first
class sessions, c) hidden communication and d) the possibility that
the concurrency strategies simply cannot be mediated, still form a
major obstacle. Investigating how these problems can be solved remains
future work.
One of the motivations behind this dissertation were conflicts between
connected embedded systems. The rise of smaller and mobile embedded
devices increases the chance of encountering interface conflicts.
In this section we investigate how suitable our solution would be
for this kind of domain. First we must note that the biggest difference
between desktop software and embedded software is the constraints
put on it. Among others, there can be memory-constraints, speed constraints,
bandwidth/latency constraints and real time constraints. Secondly,
aside from these constraints, there is also an entire process involved
in creating embedded systems. Typically first a prototype is developed
which is, as the project continues, scaled down into a much smaller
artifact. During the entire process, quality control is of utmost
importance, because, especially for consumer devices, it is not always
possible to fix a bug once the device has been sold [SEE99].
From a high level point of view this provides an opportunity for our
adaptor because it decreases the likelihood of bugs introduced due
to concurrency strategy conflicts. However, as embedded systems are
developed pragmatically, we will now discuss how it would be possible
to reduce the requirements of our adaptor in such a way that it could
fit within certain constraints.
Very often embedded systems are limited in their memory. There are
two approaches that might help in reducing the memory requirements
of the adaptor:
- Given the performance estimation earlier
we see that the adaptor requires 4 times as much memory as for every
Petri-net describing the behavior of a client. For desktop software
this doesn't pose much problems, however for embedded systems it would
be nice to be able to reduce this number. Collapsing the different
modules into one might help, not because it reduces the amount of
required 'information', but because the overhead of storing this information
can be reduced. For instance it is possible to store the information
on whether a resource is controlled and/or managed into the transitions
of the Petri-nets themselves.
- A second important place where storage can be optimized is our prolog
module. The prolog interpreter is needed to determine how we can force
the server to be in a certain state. To resolve this problem we can
either try to remove it or try to make better use of the interpreter.
- Removing the prolog interpreter: this could be done by integrating
the adaptor more tightly with the server side component. By removing
the concurrency strategy at the server component we can make progress
on two ends. Firstly, removing the server side concurrency strategy
reduces memory-requirements with respect to data space (to store locking
information) and code space (to implement the concurrency strategy).
Secondly, not implementing a concurrency strategy at the server
component removes the need for an enforce-action module.
- Better usage: If it is not possible to remove the concurrency
strategy at the server (because the embedded system is not under control
of the one inserting the adaptor) then we can make better use of the
interpreter by implementing the other modules (the liveness and concurrency
module) also as prolog programs. Making use of an interpreter in itself
would also drastically reduce the memory-requirements and might even
be better in comparison to compiling the adaptor. However, this could
also introduce a speed penalty. So, it is the standard tradeoff between
speed vs place requirements.
Certain embedded systems have low bandwidth constraints, which means
that they can only send and/or receive a small amount of bytes per
second. As stated in section 12.6.4, our current
implementation of the concurrency adaptor requires 3 times more messages
than there are messages posted by the client and server component
together. This could weigh heavily in a low-bandwidth environment.
Therefore we now investigate how we could reduce the bandwidth used
by the adaptor by decreasing the amount of extra message sends between
the modules or by making the penalty of doing so less severe.
- reducing the amount of extra messages can be done straightforwardly
by combining all modules in one embedded system such that there is
no extra communication necessary between the modules. It should also
be noted that by decreasing the number of messages that need to be
sent, we might not only optimize bandwidth, but also the latency of
client/server communication and the internal speed of the adaptor
(as explained in the next section).
- decreasing the penalty: if it a) is possible to modify the
server such that it doesn't offer any concurrency strategy or b) we
can place the enforce-action module on the server-side component,
then it is possible to decrease communication towards the server
because no synchronization messages ought to be posted between
the concurrency module and the server-side. The only remaining messages
would be functional (logic) messages.
There are two distinct approaches one can use to make the adaptor
more efficient with respect to speed:
- One of the biggest bottlenecks in the adaptor is its modularized nature.
This gives rise to sending three times as many messages as there are
messages being communicated. By demodularizing the adaptor and allowing
the modules to work together in one shared data-space it is possible
to increase speed substantially. Instead of copying the messages,
pointers to the messages could be passed. Actually, by doing so it
could be possible to have no overhead at all in 'extra' communication'.
- As described in section 3.4.4, Petri-nets
can be implemented in hardware very efficiently (data flow machines).
This could make it possible to reduce the time necessary to verify
whether a transition is enabled,
, with a certain
factor. It should be noted that such a hardware implementation will
only provide a marginal improvement and not solve the combinatorial
problem of searching for a matching set of tokens. Still, for embedded
systems, a speed increase with a certain factor is always welcome.
- Another possibility to increase the speed of an adaptor is by parallelizing
the process. By implementing every module within a separate piece
of silicon it is possible to increase speed linearly.
Real time constraints often require a much better quality control
process [SEE99]. This process should be extended to
our adaptor as well.
- Embedded systems requiring real time constraints should not rely on
'learning' approaches, because these might behave indeterministically.
Nevertheless, if such a system would be build anyway, then the deployment
phase should be carefully investigated. The learning algorithm should,
during an off-line training-phase, be allowed to learn but when placed
in an on-line setting it should no longer learn further.
- Systems constrained by hard runtime deadlines should not rely on synchronization
operations that might block the system for an unknown amount of time.
Depending on the behavior of various clients it might be possible
that the concurrency adaptor decides to set certain, possible important,
real time events in wait. In case this should be unavoidable a formal
analysis of the entire software within the embedded system should
be made. This should of course be done after the system has
been scaled down from a prototype to an optimized artifact.
In this section we have discussed how our adaptor, as it is, could
be used within embedded systems, how it can be scaled down to fit
extra non functional requirements.
Werner
2004-03-07