6. Transaction Basics in Borg

Johan Fabry¹ - jfabry@dcc.uchile.cl

1- Programming Technology Lab (PROG) Department of Computer Science (DINF) Vrije Universiteit Brussel (VUB); Pleinlaan 2; 1050 Brussel; Belgium

Abstract : These excercises were made by Johan Fabry for the course Distributed Systems. These were quite interesting and are presented here for the sake of completeness. They were used in 1999-2000.

Reference: Johan Fabry; Transaction Basics in Borg;

This page contains the Transaction exercises for the Distributed Systems course. The relevant files can be found on the ftp site. We use CBORG. We will explore what happens when multiple agents access a central store, what goes wrong at what points, and how we can avoid this.

Setting: A Simple bank example.

For these exercises we assume a simple bank example, where a central database keeps a record of accounts, each account having a string as key, and a number as value (= the contents of the account).

We will use 3 agents as clients throughout the first couple of examples. This is the first one:

{ ssend(to,msg)::{send(to,msg);sync(to,msg)}; srecv(from,msg)::recv(sync(from,msg),msg); BeginTransaction(txs)::{txs.BeginTransaction(agentself()); srecv(txs,any)}; Read(ctx,name)::{ctx.Read(name);srecv(ctx,any)}; Write(ctx,name,value)::{ctx.Write(name,value);srecv(ctx,any)}; Commit(ctx)::{ctx.Commit();srecv(ctx,any)}; Abort(ctx)::{ctx.Abort();srecv(ctx,any)};

go():: { server:remotedict("master/txserver"); for(i:0, i:=i+1, i<100, {display(i); context:BeginTransaction(server); a:Read(context,"a"); b:Read(context,"b"); Write(context,"a",a-1); Write(context,"b",b+1); Commit(context) }) }

As you can see, the agents transfers money from account a to account b. It repeats this operation multiple times, this is to exaggerate the results, so later you can easily observe some weirdness. The BeginTransaction and Commit operations will be described later. For now, remember that every group of database transactions needs to be preceded by a BeginTransaction and followed by a Commit.

You can try this bank client with this agent (simpleserver.cborg) as simple server.

Now create a similar agent, which transfers money from account c to account b, repeating this operation the same number of times. For reasons which will become clear later, first read the balance of c and second the balance of b.

Also create a third agent, which makes the sum of the balances in the accounts (do not forget the BeginTransaction before, and the Commit after reading a, then b, then c) and records this somewhere locally, again repeating this operation (save the result for each iteration separately!, e.g. in a table).

The first Bank database

Now for the weirdness.

Run 2 CBorg interpreters. In one interpreter, run the first agent and the server you used before. In the other, run the second agent. Try to start the agents at the same time. You can write a third agent for this, for example:

{ a:remotedict("master/client1"); b:remotedict("master/client2"); a.go(); b.go() }

When both agents have finished, have a look a the contents of the database. Write it down somewhere. Reset the database and try again. You will most probably get different results!

WTF is going on here? What you have here is a classic case of the "lost update" problem, as a result of the concurrent access of the 2 clients to a datum (in this case the bank account b). Have a look at the following interleaving of threads for the two clients:

Agent 1	Agent 2
a: Read(context,"a"); b: Read(context,"b");
	c: Read(context, "c"); b: Read(context, "b"); Write(context,"c",c-1); Write(context,"b",b+1);
Write(context,"a",a-1); Write(context,"b",b+1);

It is clear that in this case, the Write(context,"b",b+1); operation of agent 2 is lost. This is the "lost update".

Try something else. Instead of running the second agent in the second interpreter, run the third agent concurrently with the first agent in the first interpreter. Note the results, and try again.

Again, weirdness. This time, we have the "inconsistent retrievals" problem. This happens, for example in the following interleaving of threads.

Agent 1	Agent 3
a: Read(context,"a"); b: Read(context,"b"); Write(context,"a",a-1);
	a: Read(context, "a"); b: Read(context, "b"); c: Read(context, "c");
Write(context,"b",b+1);

When agent 3 accesses the data, this data is in an inconsistent state, leading to the "inconsistent retrievals" problem.

Avoiding interference: Transactions

Our bank would be in deep problems if these things happen, so we must fix this. We must make sure that data which is 'in use' is not concurrently accessed if this can lead to lost updates or inconsitent retrievals.

This is what transactions are made for. Let's have a tentative definition:

A transaction is an atomic set of operations which, from a clients point of view, transform the database from one consistent state to another consistent state.

A transaction is started by a client by calling the BeginTransaction operation, this sets up a so-called transaction context, in which all operations on the database are performed. A transaction is ended by either a commit, which saves all the work in the database, or a rollback, which undoes all the work performed by the transaction.

We could have the server only process one transaction at a time, in effect having no concurrency at all, but this has a big negative impact on performance, which we do not want. Therefore we want to maximize concurrency. Transactions are allowed to execute concurrently if this would have the same effect on the data as a serial execution, i.e. if they are "serially equivalent".

The use of serial equivalence as a criterion for correct concurrent execution avoids the lost update and inconsistent retrievals problems we showed above.

The most common way to ensure serial equivalence is through the use of locks in the database.

We have made a start on this in the lockingserver.cborg file. Here no locking is actually performed (all accesses are granted), and transactions can not be rolled back. Change the server such that serial equivalence is ensured, and verify this by running the above two tests (no need to implement rollbacks yet).

Deadlocks & Aborting Transactions

Run Agent 2 and 3 concurrently. If you have a bit of bad luck, the system freezes up on you.

What is going on now? Consider this interleaving of threads:

Agent 2	Agent 3
	a: Read(context, "a"); (a is now locked by agent 3) b: Read(context, "b"); (b is now locked by agent 3)
c: Read(context, "c"); (c is now locked by agent 2) b: Read(context, "b"); (b is already locked by agent 3 -> wait for release of lock)
	c: Read(context, "c"); (c is already locked by agent 2 -> wait for release of lock)

Welcome to the wonderful world of deadlocks. Deadlocks are one of those things which can make your life really miserable.

Note that if Agent2 first reads b, then c, no deadlocks will occur, that's why we wanted you to implement it differently. Of course you realise it is impossible to statically ensure no deadlocks will occur by reordering read and write sequences when writing larger systems

A good transaction server will provide some relief here, in that it can detect deadlocks and undertake some action to break them.

This boils down to the fact that the server will select one transaction, and forcibly rollback it. All changes made to the database by the transaction will be undone, and the program will receive an error message as a result of the read or write operation it was executing when the deadlock occurred.

This implies that clients have to be prepared to handle these forced rollbacks at each operation on the database, which can make the code quite complex, but can sadly not be avoided.

There are various ways to detect that a deadlock has occurred. Deadlock detection can easily be added to the locking server you created above, by adding timeouts for each lock. If a lock is requested by a different context and denied, decrease a timeout counter. When the counter reaches zero, rollback the transaction which has the lock. The obvious disadvantage of this is that it can lead to forced rollbacks when no actual deadlock has occurred.

Implement this deadlock detection on your previous server. Dont forget:
- to correctly implement rollbacks (for the database, it should be as if the transaction never occurred)
- to change the implementations of the clients, so they can handle rollbacks.

Think of alternative ways to detect deadlocks (HINT: "wait-for graph"), and implement such a scheme.

Transactions: ACID properties

A last bit of terminology: You will often hear about the ACID properties: transactions are

Atomic : All actions in a transaction are performed (comitted), or none of them are performed(rolled back)

Consistent : A transaction takes the system from one consistent state to another consistent state

Isolated : A transaction does not see the effects of other (non-committed) transactions; intermediate effects of a transaction are not seen by other transactions.

Durable : The changes made by a transaction are kept in permanent storage