Saturday, September 30, 2006

Java Keeps Keeping On

The real reasons why industry analysts love to predict Java's fall... and why they will spend another 10+ years being proven wrong.

A very good entry discussing some of the reasons why Java continues to defy the doomsday predictions of so-called "industry experts".

Java not being the new kid on the block is not news; it's an achievement that every professional (a state of mind, not a state of employment) Java programmer can be proud of.

Wednesday, September 13, 2006

Concurrent Javadoc

Writing correct concurrent code is hard, and what I'm finding equally challenging is documenting concurrent behavior in prose (javadoc, to be specific).

Pre Java 1.5, all we had were java.lang.Thread, the Runnable interface, the synchronized keyword, and volatile variables. Because synchronized was the only real mechanism for concurrency control, if you wanted to reason about the thread safety of a class all you had to do was look for the synchronized keyword. Some of you may be asking, "but what about volatile?" volatile then was nowhere near as powerful as it is now. Back then the only thing volatile really meant was "do not cache". Nowadays volatile not only means "do not cache"; it also affects the visibility of other non-volatile variables and imposes ordering constraints. Basically, synchronized and volatile no longer act in isolation. All activity inside the JVM is now governed by a well-defined memory model (the JMM). It's no longer enough to look for the synchronized keyword; you have to keep the memory model in mind at all times, which is damn hard to document. Therefore I'm seeking feedback.
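
To make that concrete, here is a minimal sketch of the new volatile semantics described above (the class and field names are mine, purely for illustration): a plain write made before a volatile write is guaranteed to be visible to any thread that subsequently reads the volatile field and sees the new value.

class Publisher
{
    private int data;                // plain, non-volatile field
    private volatile boolean ready;  // volatile "gate"

    void writer()  // runs on thread A
    {
        data = 42;     // plain write...
        ready = true;  // ...published by the volatile write
    }

    void reader()  // runs on thread B
    {
        if ( ready )                     // volatile read establishes the happens-before edge
            System.out.println( data );  // under the new JMM this must print 42
    }
}

Under the old JMM the reader could legally have seen ready == true and data == 0; under the new JMM it cannot, because the volatile write/read pair orders the plain write before the plain read.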

Imagine you have been asked to maintain a body of code with which you have had no previous experience, and the only thing you know is that it's targeted at Java 1.5 and higher (the new JMM). You come across this (partial) class definition:

/**
 * Ties {@link BlastMaster}, {@link BlastManager}, and {@link BlastAgent} objects together around a blast. It is the
 * highest layer of abstraction available on a blast. It provides the hooks for the lifecycle management of a blast and
 * is the communication channel between BlastAgent(s) and BlastManager(s) and BlastMaster.
 * <p>
 * The lifecycle of a context goes something like this. It is created by BlastMaster, injected into the outbound queue
 * by BlastMaster, scheduled by BlastManager, delivered by BlastAgent(s), descheduled by BlastManager, rescheduled by
 * BlastManager, delivered by BlastAgent(s) ....
 *
 * The scheduling, delivering, and descheduling steps repeat until a) all the recipients have been sent the blast or b)
 * the duration of the blast has expired.
 * </p>
 *
 * @author HashiDiKo
 */
public final class BlastContext extends Constants
{
    /**
     * Keeps track of the number of domains yet to be delivered. When this number reaches zero it means the current
     * iteration of trying to deliver the blast has ended.
     *
     * @see BlastAgent#decrementRemaining()
     * @see #fire(ContextEvent.Event)
     */
    final AtomicInteger remaining = new AtomicInteger();

    /**
     * Provides a mechanism for a {@link BlastManager} to be notified when the {@link BlastAgent}s executing the context
     * stop executing it.
     * <p>This object does a great deal in terms of establishing <em>happens-before</em> edges between a blast manager
     * and its agents and between blast agents.</p>
     */
    final Set<Thread> agents = Collections.synchronizedSet( new HashSet<Thread>( MAX_BLAST_AGENTS + 1 ) );

    /**
     * The message.
     *
     * <h4>Visibility</h4>
     *
     * <p>The blast manager thread is the only thread that writes this field while blast agents are the only threads
     * to read this field. This field is <i>correctly synchronized</i> because the blast manager writing this field
     * <b>always</b> <i>happens-before</i> the blast agents reading it. This <i>happens-before</i> relationship is
     * established in the code starting with {@link #init()} and its call to {@link #doInit()}. {@link #doInit()} is
     * the only place in the code where this field is written. You simply need to follow the flows in
     * {@link BlastManager} starting from where {@link #init()} is called to see the synchronization points that
     * establish the <i>happens-before</i> relationship between the blast manager and its blast agents in regard to
     * this field.
     * </p>
     */
    Message message;

    /**
     * Indicates that this context belongs to an old blast. In other words, the system was restarted and the blast
     * was loaded from the database.
     *
     * @see #init()
     * @see #BlastContext(boolean, EmailBlast, BlastMaster)
     */
    private final boolean old;

    /**
     * The home directory of the blast.
     */
    private final File home;

    /**
     * The duration of the blast in milliseconds.
     */
    private final long durationMillis;

    /**
     * Used to implement our backoff algorithm.
     *
     * <h4>Visibility</h4>
     *
     * <p>The field is only ever read/written by blast agents, thus the visibility of this field is limited to
     * blast agents in general and specifically the {@link #fire(ContextEvent.Event) fire} method. Even though
     * {@link #fire(ContextEvent.Event) fire} is not <tt>synchronized</tt> and this field is non-volatile, it is
     * <em>correctly synchronized</em>. There are two things at work that make this so. First, there is never
     * concurrent access to this field: at most one blast agent thread will write/read this field per delivery cycle.
     * This is guaranteed by {@link BlastAgent#decrementRemaining()} because of a) how it is implemented and b) it
     * being the only place where a blast agent accesses this field. Second, the fact that blast agents call
     * {@link #register(Thread)} before they execute a blast and {@link #unregister(Thread)} when they are done
     * executing a blast (both contain <tt>synchronized</tt> blocks using the same monitor) guarantees that the update
     * of this field will be visible to any subsequent reads by different blast agents.
     * </p>
     *
     * @see #fire(ContextEvent.Event)
     */
    private int magnitude;

    /**
     * Used to implement our backoff algorithm.
     *
     * <h4>Visibility</h4>
     *
     * <p>The field is only ever read/written by blast agents, thus the visibility of this field is limited to
     * blast agents in general and specifically the {@link #fire(ContextEvent.Event) fire} method. Even though
     * {@link #fire(ContextEvent.Event) fire} is not <tt>synchronized</tt> and this field is non-volatile, it is
     * <em>correctly synchronized</em>. There are two things at work that make this so. First, there is never
     * concurrent access to this field: at most one blast agent thread will write/read this field per delivery cycle.
     * This is guaranteed by {@link BlastAgent#decrementRemaining()} because of a) how it is implemented and b) it
     * being the only place where a blast agent accesses this field. Second, the fact that blast agents call
     * {@link #register(Thread)} before they execute a blast and {@link #unregister(Thread)} when they are done
     * executing a blast (both contain <tt>synchronized</tt> blocks using the same monitor) guarantees that the update
     * of this field will be visible to any subsequent reads by different blast agents.
     * </p>
     *
     * @see #fire(ContextEvent.Event)
     */
    private boolean hrs;

    /**
     * Stores the backoff timeout (the end result of the backoff algorithm computation).
     *
     * <h4>Visibility</h4>
     *
     * <p>The visibility requirements for this field are kind of funky. It is a read/write field for
     * {@link BlastManager}s and a write-only field for {@link BlastAgent}s. So the core visibility issue is writes by
     * blast agents being visible to the blast manager. The <em>happens-before</em> edge that makes this possible is
     * established by the monitor acquisition of {@link #agents} by the blast agent thread that writes this field and
     * the subsequent acquisition of the same monitor by the blast manager.
     * </p>
     *
     * @see #fire(ContextEvent.Event)
     */
    private long backoffTimeout;

    /**
     * Indicates whether {@link #doInit()} has been called.
     * This field is only ever read/written by the blast manager thread for this context, thus it is correctly
     * synchronized.
     */
    private boolean didInit;

    /**
     * Temporarily stores the shortest domain blacklist expiration.
     * <p>
     * When this value is greater than zero it indicates to the blast manager thread that even though {@link #domainQ}
     * is empty this context has undelivered recipients.
     * </p>
     * <p>This field is <em>correctly synchronized</em> because it is only ever read/written by the blast manager
     * thread.</p>
     *
     * @see #getDomains()
     */
    private long shortestDomainTimeout;

    /**
     * Gate to {@link #manager}. It establishes a <em>happens-before</em> edge with {@link #manager}.
     */
    private final CountDownLatch managerLatch = new CountDownLatch( 1 );
}

So. Do the javadoc comments say enough about how the non-volatile class members manage to be correctly synchronized?
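
For anyone trying to picture the register/unregister argument made for magnitude and hrs, here is a stripped-down, hypothetical sketch of the idiom. The real register and unregister methods are not shown above, so the bodies here are my own invention; only the monitor-piggybacking pattern is the point.

import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class MonitorPiggyback
{
    final Set<Thread> agents = Collections.synchronizedSet( new HashSet<Thread>() );

    private int magnitude;  // plain field, like the one in BlastContext

    // Called by an agent thread when it is done executing a blast.
    void unregister( Thread t )
    {
        magnitude++;         // plain write by agent A...
        agents.remove( t );  // ...then A releases the set's monitor
    }

    // Called by the next agent thread before it executes a blast.
    void register( Thread t )
    {
        agents.add( t );        // agent B acquires the same monitor...
        int local = magnitude;  // ...so B is guaranteed to see A's write
    }
}

Note the sketch only holds under the invariant the javadoc asserts: no two agents ever touch magnitude concurrently, so the shared monitor is being used purely to publish the write, not to provide mutual exclusion.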

Saturday, September 09, 2006

A Developer's Journal; Solaris/CMT #5

I am feeling disingenuous about my "A Developer's Journal; Solaris 10 ..." title because I've made 5 entries so far and have said very little about Solaris 10. The truth of the matter is Solaris is the means and not the end. Let me explain. I have absolutely no interest in Solaris as an end in itself at this point because of Linux. And though Sun has made awesome progress with Solaris on x86, it still has a way to go before it is as seamless as Solaris on Sparc. So my key interest in Solaris is strictly with Solaris on Sparc in general and Solaris on Niagara processors specifically. Therefore all entries, starting with this one, will be titled "A Developer's Journal; Solaris/CMT ...". My apologies to any readers who have arrived here under the assumption that these are strictly Solaris 10 entries.

Friday, September 08, 2006

A Developer's Journal; Solaris 10 #4

I brought it home. It's still in the box. It's sitting in my living room staring at me and refusing to blink. I know I don't stand a chance of winning the staring game, but I stare anyway; eyes bloodshot, I stare. It's all I can do with it at the moment. I have deadlines to meet, and missing deadlines is bad for business, so I can't cut into my dev time to play with it. The clock is ticking, ticking, ticking. As I look away from my monitor to rest my eyes it catches my gaze. It stares at me and I stare back, neither of us blinking while the clock ticks, ticks, ticks.

Monday, September 04, 2006

A Developer's Journal; Solaris 10 #3

The T1 is here. Andrew called to let me know it's at the office. It only took 6 days. I'm impressed. I guess Sun has finally got the kinks in their supply chain hammered out.

I need to figure out where I'm going to put it. I can set it up at the office or take it down to the data center or set it up at home. I just don't know right now because my thoughts are elsewhere. I've got some integration work to do with Jetty and a heck of a lot of code to write for the web service interface for my project.

So much to do so little time.

Sunday, September 03, 2006

Latches in Java

David Holmes provides an excellent description of the difference between java.util.concurrent.CountDownLatch and java.util.concurrent.CyclicBarrier on the concurrency-interest mailing list. I have posted it here for those of you who don't subscribe to the list and may not have a good grasp of these classes. And since I use CountDownLatch in code I've posted in the past, and in code I'm planning to post, I'm happy to have such a clear explanation that I can link to in future entries.

... As Brian describes one notion of state "latching" is that it progresses to a terminal value from whence it no longer changes. It is permanently "latched". A synchronization object that behaves in this way is the CountDownLatch - once "open" it never closes and can't be reset.

More generally though latches can be reset - consider a digital flip-flop such as a "gated D-latch", the flip-flop latches the value of the data when the gate is pulsed/strobed; if you change the data without changing the gate then the latched value is unchanged, but change the gate and the latched value is updated.

In synchronization object terms a latch is sometimes called a "gate" - the connotation being that if the gate is open anyone/everyone can pass through; while if it is shut no one can pass through. The CountDownLatch operates this way, but a more general "gate" is the CyclicBarrier which can also be reset (and automatically does so). Of course the semantics of CountDownLatch and CyclicBarrier are somewhat different. CyclicBarrier is what can be called a "weighted gate" or "weighted bucket" - it is set up to expect N threads to arrive, when they arrive they have sufficient "weight" to open the gate, or tip the bucket - in this case the gate/bucket is spring-loaded and closes/rights-itself as soon as the threads leave, so it is ready for the next set of threads to use. CountDownLatch on the other hand is like a gate with multiple padlocks - when the last padlock is removed, the gate opens and it stays open. Aren't these analogies quaint :-) We could have defined CountDownLatch to allow reset but reset semantics are messy and usually not needed for CDL usage, in contrast barrier designs typically always use the barrier multiple times.

It seems the database folk are using the term "latch" for lightweight lock, which is an uncommon usage from a synchronization perspective and a poor choice in my view, though arguably there is an analogy between "locking a door/window" and just "latching it shut". In that sense "latching" is a weaker form of "locking". But I don't like the usage.
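
To make the padlock and spring-loaded gate analogies concrete, here is a small, self-contained example using the standard java.util.concurrent classes (the class name and thread counts are mine) that exercises both:

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.CyclicBarrier;

public class GateDemo
{
    public static void main( String[] args ) throws Exception
    {
        // CountDownLatch: a gate with 3 padlocks. When the last padlock is
        // removed the gate opens and stays open; there is no reset.
        final CountDownLatch latch = new CountDownLatch( 3 );
        for ( int i = 0; i < 3; i++ )
        {
            new Thread( new Runnable()
            {
                public void run() { latch.countDown(); }  // remove one padlock
            } ).start();
        }
        latch.await();  // blocks until the count reaches zero
        latch.await();  // returns immediately; the latch is permanently open

        // CyclicBarrier: a spring-loaded gate "weighted" for 2 threads. It
        // trips when both arrive, then automatically resets for the next 2.
        final CyclicBarrier barrier = new CyclicBarrier( 2, new Runnable()
        {
            public void run() { System.out.println( "gate tripped" ); }
        } );
        for ( int i = 0; i < 4; i++ )  // 4 threads, so the gate trips twice
        {
            new Thread( new Runnable()
            {
                public void run()
                {
                    try { barrier.await(); }
                    catch ( Exception e ) { e.printStackTrace(); }
                }
            } ).start();
        }
    }
}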

A Developer's Journal; Solaris 10 #2

After 2 "Try and Buy" applications Sun Microsystems has finally shipped my T1. The thought of getting my hands on it gets my geek johnson tingling. You may be wondering "why just a tingle?" Well, when I first read about CMT coming to a server near me my geek johnson pitched a tent with enough head room for Yao Ming to enter without bending his head. But the truth of the matter is I am a cynic to the core. Vendors are notorious for contorting the truth and calling it marketing.  They all claim their product is the greatest thing since Internet porn which is obviously a lie. THERE IS NO THING BETTER THAN INTERNET PORN! There. I said it. Go tell your friends. I'll wait for you in the next paragraph....

So I'll read the technical specs and the research and whitepapers on a technology/product but I don't drink the marketing Kool-Aid. So until I get my hands on the box and start playing with it you'll only get a tingle out of me.

Friday, September 01, 2006

A Developer's Journal; Solaris 10 #1

I'm testing the Solaris 10 waters. My plan for this journal is to record my Solaris journey in as much useful detail as possible. But before I start reporting my trials and tribulations I want to discuss why I'm taking this path (remember, Linux is my OS of choice). There are 5 reasons why I'm looking at Solaris 10 now (from most to least important):

  1. Try and Buy
  2. Chip Multi-Threading (CMT)
  3. DTrace
  4. ZFS
  5. Hotspot

Try and Buy

Try and Buy made the list and snatched the number one spot because it makes number 2 accessible given my limited R&D budget.

CMT

Sun has jumped ahead of the competition (Intel/AMD/IBM) in terms of throughput computing with the release of the T1 processor, code-named Niagara. I had previously discussed the coming of throughput computing here and here and I'm surprised yet thrilled to see it come to pass so rapidly. Most of the reviews of Niagara I've read are from people who don't have a clue about multi-threading and/or what makes it great. I'm guessing they have a rudimentary understanding of what a process is and probably don't know why scaling systems with threads is more efficient and performant than doing it with processes. So I'm really excited to get a crack at throwing an application that was designed from the ground up for multi-threaded deployment at Niagara.

DTrace

I'm a performance junkie. I like to make things go fast, and it is hard to make something go fast when you don't understand what the thing is doing. Before DTrace it was possible to get pieces of the puzzle, and sometimes it was possible to postulate about the whole and get it right, but that's usually hit or miss, and it takes years of experience to develop the faculties to get more hits than misses. With DTrace there is no need to postulate; it becomes possible to know with absolute certainty what the hell is going on. DTrace takes the guessing out of performance tuning.

ZFS

The disk subsystem is probably the most mission-critical part of a computer because it's where all the programs and data are stored, and without programs or data a computer isn't worth squat.

Aside:

Okay I concede. You can turn it into a modern art piece or make an ugly end table but other than that it isn't worth squat!

Unfortunately the disk subsystem is also the most unreliable. So any practical technology that handles reliability w/o major performance costs is a big deal to me. For years we've been subject to either uber-expensive SCSI/RAID systems or cheap, unreliable (and slower) IDE systems. But that's all changing: SATA and its ilk are catching up to SCSI in terms of reliability and performance, which will eventually drive SCSI out of the picture or force the price of SCSI down to the point where there is really no difference in terms of choosing one over the other. The bottom line is ZFS makes reliable disk subsystems something everyone can afford.

Hotspot

Given that the Hotspot JVM is a Sun product, and so is Solaris, I conjecture that Hotspot works better on Solaris than anywhere else, especially Solaris on Sparc. So even though I'm a big Linux fan, I'm an even bigger Java one.