Java Threads an Inconvenient Truth

This tutorial assumes a good understanding of multi-threading in the traditional Unix memory model.

Traditional Unix Threads

With the traditional C model one could call fork(). This creates a child process that gets its own copy of the parent's memory space. Parent and child then communicate through pipes, sockets and other file descriptors, which Unix treats essentially as files. vfork() lets the child share the parent's memory space, but only one of them runs at a time: the parent is suspended until the child calls exec() or exits. exec() replaces the current process image with a new program, and on Linux clone() creates a new process or thread while letting you choose which resources, such as memory, it shares with its parent. Communicating accurately between processes proved very difficult, so eventually POSIX threads (pthreads) came along, bringing mutex locks and condition variables (with POSIX semaphores alongside them) to help coordinate access to shared memory (locking). These are all very low-level primitives, and using them correctly takes a lot of care and consideration.

The Java Model

We all take Java for granted. We launch Threads and assume that they share the same memory space, and that each Thread is created equal and will get fair access to the processor. We assume that we can share information between threads through variables, and we use Threads, Tasks and Runnables with no regard for when our writes become visible to other threads. The sad truth is that we were misinformed. You may have noticed that there are many pairs of standard classes, like HashMap and Hashtable, that seem to do the same thing, except that one is "thread safe" or "synchronized" and the other is not. More importantly, you may have noticed that nobody really explained what these terms mean, other than saying that synchronization is a costly operation.

Hoare Monitor Basics

Java uses a synchronization mechanism called a monitor, an idea described by C. A. R. Hoare (strictly speaking, Java's wait/notify follows the Mesa-style "signal and continue" variant rather than Hoare's original semantics). The motivation is that coordinating multiple threads by hand, acquiring and releasing mutex locks in exactly the right places, is difficult. To simplify this, a monitor allows only one thread at a time into its critical sections while all other threads wait. Each thread therefore has only one lock per monitor to acquire and release, which makes the concurrency you introduce much easier to control. A common misconception is that using monitors means a single lock for the entire operating system or application. In Java, every object has its own monitor, and you enter an object's critical section through the synchronized keyword, in one of two forms.

1. Synchronized Method

synchronized int myCriticalSection(){
	// Critical section
	return result;
}

2. Synchronized Block of Code.

...
	synchronized(someNonPrimitiveObject) {
		// Critical section
	}
...

What happens when you use #2

// Acquire lock + MAGIC
	// Critical section
// MAGIC + release lock

Note the MAGIC; we will talk about it later.

What happens when you use #1

int myCriticalSection(){
	synchronized (this) {
		// Critical section
		return result;
	}
}

or

int myCriticalSection(){
	// Acquire lock + MAGIC
		// Critical section
	// MAGIC + release lock
	return result;
}
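The equivalence above can be checked with a small, self-contained sketch. The class name MonitorEquivalence and its methods are hypothetical. One method uses form #1 and the other uses form #2; because both lock the same monitor (this), increments made through the two paths never interleave, so no update is lost.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: a synchronized method and a synchronized (this) block
// lock the same monitor, so they exclude each other.
public class MonitorEquivalence {
    private int count; // shared state, guarded by "this"

    // Form #1: the whole method body runs while holding this object's monitor.
    public synchronized void incrementViaMethod() {
        count++;
    }

    // Form #2: an explicit block that locks the same monitor ("this").
    public void incrementViaBlock() {
        synchronized (this) {
            count++;
        }
    }

    public synchronized int getCount() {
        return count;
    }

    public static void main(String[] args) throws InterruptedException {
        MonitorEquivalence counter = new MonitorEquivalence();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            final boolean useMethod = (i % 2 == 0); // half the threads use each form
            Thread t = new Thread(() -> {
                for (int j = 0; j < 100_000; j++) {
                    if (useMethod) {
                        counter.incrementViaMethod();
                    } else {
                        counter.incrementViaBlock();
                    }
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join(); // wait for every worker to finish
        }
        System.out.println(counter.getCount()); // 400000
    }
}
```

If you remove synchronized from either path, the final count will usually fall short of 400000, because the two paths no longer exclude each other.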

This Is Where The MAGIC Happens

Most people think they understand the examples above and believe they could be replaced by clever uses of sleeps, interrupts and loops that poll a variable. This is not the case. The MAGIC is what makes the threads' views of memory consistent. A thread that enters a monitor is guaranteed to see every write made by any thread before that thread released (left) the same monitor. Changes a thread makes after leaving the monitor are not guaranteed to be visible to anyone but that thread. If we remove the synchronized keywords above and rely on sleeps and polling loops instead, the values written to memory by one thread won't necessarily propagate to the other threads; Thread.sleep() in particular has no synchronization semantics at all.
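Here is a minimal sketch of the classic case, a worker thread polling a stop flag. StopFlag and its methods are hypothetical names. The flag is only read and written through synchronized methods of the same object, so the worker is guaranteed to eventually see the main thread's write.

```java
// Hypothetical demo: a stop flag shared through synchronized accessors.
// If stopRequested were a plain field read directly in a loop such as
//     while (!stopRequested) Thread.sleep(10);
// the JLS allows the compiler to read it only once, so such a worker
// might never stop.
public class StopFlag {
    private boolean stopRequested; // guarded by "this"

    public synchronized void requestStop() {
        stopRequested = true;
    }

    public synchronized boolean isStopRequested() {
        return stopRequested;
    }

    public static void main(String[] args) throws InterruptedException {
        StopFlag flag = new StopFlag();
        Thread worker = new Thread(() -> {
            long iterations = 0;
            while (!flag.isStopRequested()) { // enters the monitor on every check
                iterations++;                 // stand-in for real work
            }
            System.out.println("worker stopped after " + iterations + " iterations");
        });
        worker.start();
        Thread.sleep(100);  // the sleep itself gives no visibility guarantee...
        flag.requestStop(); // ...the synchronized write/read pair does
        worker.join();
    }
}
```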

Let's go over a simple example.

class MySexyMonitor{

	private String mString;

	public synchronized String getSyncedString(){
		return getString();
	}	

	public synchronized void setSyncedString(String string){
		setString(string);
	}

	public String getString(){
		return mString;
	}

	public void setString(String string){
		mString = string;
	}
}

Assume that these calls happen atomically, in the following order:

Cycle | Thread1                  | Thread2                        | Thread3
1     | setString("unSynced");   |                                |
2     |                          |                                | setSyncedString("Synced");
3     |                          | String s1 = getString();       |
4     |                          | String s2 = getSyncedString(); |
5     | String s3 = getString(); |                                |

What are the possible values for s1, s2 and s3?

null, “unSynced” or “Synced”?

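All three are possible for s1. s3 could be either “unSynced” or “Synced”, but never null, because Thread1 always sees its own earlier write. s2 is guaranteed to see at least Thread3's write, because Thread3 left the monitor before Thread2 entered it, so s2 can never be null; it will normally be “Synced” (the exception is discussed below). Thread1 has set the value of mString to “unSynced” in its own view of memory. That memory is dirty, but it may never actually get synchronized with the views of Thread2 or Thread3.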

Synchronized blocks of code create a happens-before relationship: when a thread leaves a monitor, every write it made up to that point is visible to the next thread that enters that same monitor. Note the importance of which monitor you synchronize on. If code guarded by two different monitors touches the same variable, threads holding one monitor will not necessarily see the changes made by threads holding the other.

It is also important to note that, in the MySexyMonitor example, s2 could even be “unSynced”. Thread1's write is not ordered by any happens-before relationship with Thread3's write, so the Java Memory Model allows it to become visible late, after Thread3's write, and Thread2 may then read it. This shows the danger of mixing unsynchronized code into a multi-threaded environment.
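To make the point about which monitor you synchronize on concrete, here is a sketch (the ConsistentPair class is hypothetical) in which two fields must always change together. Every method that touches them locks the same private lock object, so a reader can never observe a half-finished update.

```java
// Hypothetical demo: x and y must always be equal when observed.
// Both the writer and the reader lock the SAME monitor ("lock").
// If isConsistent() locked a different object (or none), it could
// see a new x together with an old y.
public class ConsistentPair {
    private final Object lock = new Object(); // the one monitor guarding x and y
    private int x;
    private int y;

    public void moveTo(int value) {
        synchronized (lock) {
            x = value;
            y = value; // invariant x == y holds again before the lock is released
        }
    }

    public boolean isConsistent() {
        synchronized (lock) { // same monitor as moveTo()
            return x == y;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ConsistentPair pair = new ConsistentPair();
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 100_000; i++) {
                pair.moveTo(i);
            }
        });
        writer.start();
        boolean sawBrokenInvariant = false;
        while (writer.isAlive()) { // keep checking while the writer runs
            if (!pair.isConsistent()) {
                sawBrokenInvariant = true;
            }
        }
        writer.join();
        System.out.println("broken invariant observed: " + sawBrokenInvariant); // false
    }
}
```

Using a private lock object rather than this is a design choice, not a requirement: it prevents outside code from synchronizing on your monitor and interfering with your locking.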

Deadlock and Lock Acquisition

It is still possible to deadlock because of the order in which you acquire different monitors' locks, so be careful: if one thread holds monitor A and waits for B while another holds B and waits for A, neither can ever proceed. Acquiring locks in a consistent global order avoids this. Otherwise the synchronized keyword is well designed: the lock is always released when the block exits, even when an exception is thrown, so nobody can forget to release it. Monitor locks are also re-entrant: once you hold a monitor's lock, you can enter other synchronized blocks that require the same lock. Other threads have to wait until you exit the outermost synchronized block before they can enter the monitor.
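A minimal sketch of both points, assuming a hypothetical Account class with a unique numeric id: transfer() needs two monitors and always takes them in id order, and the synchronized withdraw() and deposit() calls inside it re-enter monitors the thread already holds.

```java
// Hypothetical demo: deadlock avoidance by lock ordering, plus re-entrancy.
// Assumes every Account has a unique id.
public class Account {
    private final int id;
    private int balance; // guarded by "this"

    public Account(int id, int balance) {
        this.id = id;
        this.balance = balance;
    }

    public synchronized int getBalance() {
        return balance;
    }

    public synchronized void deposit(int amount) {
        balance += amount;
    }

    public synchronized void withdraw(int amount) {
        balance -= amount;
    }

    public static void transfer(Account from, Account to, int amount) {
        // Always lock the lower id first, whatever the transfer direction.
        Account first = from.id < to.id ? from : to;
        Account second = from.id < to.id ? to : from;
        synchronized (first) {
            synchronized (second) {
                // Re-entrancy: withdraw() and deposit() are synchronized too,
                // but this thread already holds both monitors, so it re-enters
                // them instead of blocking on itself.
                from.withdraw(amount);
                to.deposit(amount);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Account a = new Account(1, 1_000);
        Account b = new Account(2, 1_000);
        // Opposite-direction transfers: the pattern that can deadlock when
        // each thread simply locks its own "from" account first.
        Thread t1 = new Thread(() -> { for (int i = 0; i < 100_000; i++) transfer(a, b, 1); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < 100_000; i++) transfer(b, a, 1); });
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(a.getBalance() + b.getBalance()); // 2000
    }
}
```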

Synchronize All The Methods!

No! There are other ways of making sure your memory is up to date. The volatile modifier can be added to a field's declaration:

private volatile String mVolatileString;

The volatile keyword ensures that every write to that variable is visible to all threads that subsequently read it. It does not make compound operations atomic: count++ on a volatile field is still a separate read and write, and two threads can interleave between them. Some blogs claim that declaring a field volatile is equivalent to surrounding all of the code that touches it with synchronized blocks; this is not the case. Before Java 5 the specification gave volatile only weak ordering guarantees, but since the Java 5 memory model (JSR-133) a write to a volatile variable happens-before every subsequent read of that same variable. This means that all the memory changes a thread made before writing the volatile variable are visible to any thread that later reads that variable and sees the write. As with synchronized and its monitor, these changes are only guaranteed to be seen by threads that access the same volatile variable, so which volatile variable a thread reads determines which changes it is guaranteed to see.
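The happens-before rule for volatile can be illustrated with a sketch (VolatilePublish is a hypothetical class): an ordinary field is written first and a volatile flag second, so a reader that sees the flag set is also guaranteed to see the ordinary write. This pattern relies on a single write of the flag; for a shared counter, volatile is not enough, and you would use synchronized or java.util.concurrent.atomic.AtomicInteger instead.

```java
// Hypothetical demo: publishing a plain field through a volatile flag.
public class VolatilePublish {
    private int data;               // plain field, published via "ready"
    private volatile boolean ready; // the volatile "gate"

    public void publish(int value) {
        data = value; // 1. ordinary write
        ready = true; // 2. volatile write: makes step 1 visible along with it
    }

    // Returns the published value, or null if nothing has been published yet.
    public Integer tryRead() {
        if (ready) {      // volatile read
            return data;  // guaranteed to see the value written before ready = true
        }
        return null;
    }

    public static void main(String[] args) throws InterruptedException {
        VolatilePublish box = new VolatilePublish();
        Thread reader = new Thread(() -> {
            Integer seen;
            while ((seen = box.tryRead()) == null) {
                // spin until the writer publishes
            }
            System.out.println("reader saw " + seen); // reader saw 42
        });
        reader.start();
        box.publish(42);
        reader.join();
    }
}
```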

So Much Overhead

The excuse I hear most often is “I didn't synchronize that because it would cost me too much in overhead”. This is usually a false assumption. In older JVMs synchronization was fairly expensive, but in modern JVMs acquiring an uncontended lock and creating happens-before relationships is cheap; the real cost comes from contention, when many threads compete for the same monitor.

Managing Memory Updates Yourself

For those of you who believe that synchronized and volatile are keywords made up by Java fan boys to diminish the true Java expert, you are in luck. You can also make memory consistent using the join() method: every action of a thread happens-before another thread's successful return from join() on it. However, I do not recommend join() as your go-to happens-before mechanism unless your workflow already requires one thread to wait for another to finish, because join() only publishes a thread's results once that thread has terminated.
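A short sketch of join() as a happens-before mechanism (JoinVisibility and sumInWorker are hypothetical names): the worker writes a plain static field, and the caller reads it only after join() returns.

```java
// Hypothetical demo: join() makes a worker's plain writes visible.
public class JoinVisibility {
    private static long sum; // plain static field, no volatile, no synchronized

    public static long sumInWorker(int n) throws InterruptedException {
        Thread worker = new Thread(() -> {
            long s = 0;
            for (int i = 1; i <= n; i++) {
                s += i;
            }
            sum = s; // plain write in the worker thread
        });
        worker.start(); // start() also orders the caller's earlier writes before the worker
        worker.join();  // every action of the worker happens-before this returns
        return sum;     // plain read, made safe by join()
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(sumInWorker(1_000)); // 500500
    }
}
```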

Thread Safe Classes

You may have noticed that the only difference between certain classes is that one is thread safe and the other is not. Thread safe means not only that concurrent access is safe in terms of atomicity, but also that you always read the most up-to-date information, regardless of which thread you are in.

Concurrent Classes

Classes such as ConcurrentHashMap are not only thread safe but also optimized for concurrent access. They offer atomic compound operations such as putIfAbsent(K key, V value). A custom implementation would require you to lock around a traditional HashMap or Hashtable, check whether the key is absent, put the value in and then release the lock. With an already thread safe collection such as Hashtable you take at least twice as many locks, because get() and put() each acquire the table's lock internally and you still need your own lock around both calls to make the check-then-put atomic. Concurrent classes are designed to scale better than anything you could achieve by locking thread safe or non thread safe collections yourself. I highly recommend using them.
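For comparison, here is a sketch of both approaches side by side (PutIfAbsentDemo and its methods are hypothetical names, and it assumes null is never stored as a value). The hand-rolled version must hold one lock across the check and the put; ConcurrentHashMap.putIfAbsent does the same check-then-put as a single atomic operation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical demo: hand-rolled check-then-put versus putIfAbsent.
public class PutIfAbsentDemo {
    private final Map<String, String> plainMap = new HashMap<>();
    private final ConcurrentMap<String, String> concurrentMap = new ConcurrentHashMap<>();

    // Hand-rolled: the check and the put must happen under ONE lock, otherwise
    // two threads can both see "absent" and both insert.
    // Returns null if this call inserted, otherwise the existing value.
    public String registerManually(String key, String value) {
        synchronized (plainMap) {
            String existing = plainMap.get(key);
            if (existing == null) {
                plainMap.put(key, value);
            }
            return existing;
        }
    }

    // Concurrent version: the same contract, done atomically by the map itself.
    public String register(String key, String value) {
        return concurrentMap.putIfAbsent(key, value);
    }

    public static void main(String[] args) {
        PutIfAbsentDemo demo = new PutIfAbsentDemo();
        System.out.println(demo.register("user", "first"));          // null
        System.out.println(demo.register("user", "second"));         // first
        System.out.println(demo.registerManually("user", "first"));  // null
        System.out.println(demo.registerManually("user", "second")); // first
    }
}
```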

Because wait() and notify() are hard to use correctly, java.util.concurrent provides classes such as BlockingQueue, whose take() method blocks until an element is available in the queue. Also look at CountDownLatch, Semaphore, CyclicBarrier and Exchanger.
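A minimal producer/consumer sketch with an ArrayBlockingQueue (the class name, the queue capacity of 10 and the -1 "poison pill" that stops the consumer are all assumptions of this example): put() blocks while the queue is full and take() blocks while it is empty, so neither side touches wait() or notify().

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical demo: producer/consumer without wait()/notify().
public class ProducerConsumer {
    private static final int POISON_PILL = -1; // sentinel that tells the consumer to stop

    public static long consumeSum(int count) throws InterruptedException {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(10);
        long[] total = new long[1]; // written by the consumer, read after join()

        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    int value = queue.take(); // blocks until an element is available
                    if (value == POISON_PILL) {
                        break;
                    }
                    total[0] += value;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the interrupt flag
            }
        });
        consumer.start();

        for (int i = 1; i <= count; i++) {
            queue.put(i); // blocks while the queue is full
        }
        queue.put(POISON_PILL);
        consumer.join(); // also makes total[0] visible to this thread
        return total[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(consumeSum(100)); // 5050
    }
}
```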

Note: for measuring elapsed time, use System.nanoTime() instead of System.currentTimeMillis().
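A small sketch of timing a task this way (ElapsedTime and elapsedMillis are hypothetical names). nanoTime() is only meaningful as the difference between two calls, which is exactly what measuring elapsed time needs; currentTimeMillis() reports wall-clock time, which can jump if the system clock is adjusted.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper: time a task with System.nanoTime().
public class ElapsedTime {
    // Runs the task and returns roughly how long it took, in milliseconds.
    public static long elapsedMillis(Runnable task) {
        long start = System.nanoTime(); // only meaningful relative to another nanoTime() value
        task.run();
        long elapsedNanos = System.nanoTime() - start;
        return TimeUnit.NANOSECONDS.toMillis(elapsedNanos);
    }

    public static void main(String[] args) {
        long ms = elapsedMillis(() -> {
            try {
                Thread.sleep(50); // the work being timed
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println("took about " + ms + " ms");
    }
}
```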

Immutable Classes

Immutable classes are thread safe because their state is written only once, during construction. To make a class immutable, make all of its member variables private and final, so that they can only be assigned in the constructor, and make the class itself final so that subclasses cannot add mutable state. When you want to change a value, create a new instance with the desired changes and return it instead.

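public final class MyImmutableClass {
	private final String mImmutableString;
	
	public MyImmutableClass(String string){
		mImmutableString = string;
	}

	public String getString(){
		return mImmutableString;
	}

	// "Modifying" returns a new instance; this one never changes.
	public MyImmutableClass append(char c){
		return new MyImmutableClass(mImmutableString + c);
	}
}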

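This works because the Java Memory Model gives final fields a special guarantee: once a constructor finishes, any thread that obtains a reference to the object sees the correctly initialized values of its final fields without any further synchronization, provided the constructor does not let this escape to other threads. Constructors cannot be declared synchronized, and synchronized blocks inside a constructor are rarely useful, because no other thread can normally reach the object while it is still being constructed. People will complain about the overhead of calling a getter instead of reading the variable directly, but seriously, is your application so sensitive that the overhead of a get method will ruin it? (The JIT compiler typically inlines trivial getters anyway.)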

Executors

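java.util.Timer, TimerTask and similar hand-rolled thread management mechanisms are no longer recommended. A Timer runs all of its tasks on a single thread, and an uncaught exception in one task stops the whole Timer. Use the Executor framework in java.util.concurrent instead: ExecutorService implementations are thread safe and designed for efficiency and stability, and ScheduledThreadPoolExecutor replaces Timer for scheduled work. Do not bother launching threads yourself.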
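A minimal sketch of the executor approach (ExecutorDemo and sumOfSquares are hypothetical names, and the pool size of 4 is arbitrary): tasks are submitted to a fixed thread pool as Callables, and each result is collected through a Future. Per the java.util.concurrent documentation, everything a task does happens-before the corresponding Future.get() returns, so no extra synchronization is needed to read the results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical demo: submit work to a pool instead of creating Threads.
public class ExecutorDemo {
    public static long sumOfSquares(int n) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int i = 1; i <= n; i++) {
                final long x = i;
                futures.add(pool.submit(() -> x * x)); // a Callable<Long>
            }
            long total = 0;
            for (Future<Long> f : futures) {
                total += f.get(); // blocks until that task has finished
            }
            return total;
        } finally {
            pool.shutdown(); // stop accepting tasks and let the pool threads exit
            pool.awaitTermination(10, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sumOfSquares(10)); // 385
    }
}
```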

Singletons

A lazily initialized singleton needs a synchronized accessor method (constructors themselves cannot be synchronized), as follows:

public class MySingleton {
	private static Object object; // must be static: it is used from a static method
	public static synchronized Object getObject(){
		if (object == null)
			object = new Object();
		return object;
	}
}

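Better yet, use static and final so that the JVM creates the object when the class is initialized; class initialization is guaranteed to happen only once and to be thread safe: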

public class MySingleton {
	private static final Object object = new Object();
	public static Object getObject(){
		return object;
	}
}
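If you want the object created lazily without synchronizing every call, one further option, not one of the two versions above, is the lazy initialization holder class idiom described in Effective Java. The sketch below (HolderSingleton is a hypothetical name) relies on the same guarantee as the static final version: the JVM initializes the nested Holder class once, on first use, and does so thread safely.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demo: lazy initialization holder class idiom.
public class HolderSingleton {
    private HolderSingleton() {
        // private: nobody else can create instances
    }

    // Holder is not initialized until getInstance() first touches it,
    // and the JVM initializes a class exactly once, under its own lock.
    private static class Holder {
        static final HolderSingleton INSTANCE = new HolderSingleton();
    }

    public static HolderSingleton getInstance() {
        return Holder.INSTANCE; // no synchronized needed on this path
    }

    public static void main(String[] args) throws InterruptedException {
        Set<HolderSingleton> seen = ConcurrentHashMap.newKeySet(); // thread-safe set
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < 100; i++) {
            pool.execute(() -> seen.add(getInstance()));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println(seen.size()); // 1
    }
}
```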

When Will Java Thread Programming Be Safe Again?

The time is now. Think carefully about which variables you use to share information between threads, and either wrap them in synchronized methods or declare them volatile (depending on what kind of work you want to optimize). Remember that your object is a monitor; for synchronized static methods, MyClass.class is the monitor. Never expose member variables directly! Make all your member variables private and your models immutable. Only use Threads, Tasks, sleep, interrupt and join if you really know what you are doing, or if you are not passing information between threads via shared memory. Such communication should go through sockets, databases, files or thread safe objects. Use object oriented design: member variables should only live inside their class, and all access to them should be controlled through methods.

To avoid concurrency issues, be careful when passing lists between classes, and don't be afraid to return new ArrayList<>(someOtherList);. Most of the time the objects in these lists are models that can be made immutable, so the copy is effectively as good as a deep copy. This matters because return someOtherList; gives the caller a direct reference to your internal object and lets them change the inner workings of your class outside of the thread safe environment you may have created.
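A minimal sketch of this advice (NameRegistry is a hypothetical class): the internal list is private, every access goes through a synchronized method, and getNames() returns a copy made while holding the lock. The elements are Strings, which are immutable, so the shallow copy is as safe as a deep copy.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: never hand out a reference to internal mutable state.
public class NameRegistry {
    private final List<String> names = new ArrayList<>(); // guarded by "this"

    public synchronized void add(String name) {
        names.add(name);
    }

    public synchronized List<String> getNames() {
        return new ArrayList<>(names); // copy made while holding the lock
    }

    public static void main(String[] args) {
        NameRegistry registry = new NameRegistry();
        registry.add("alice");
        List<String> copy = registry.getNames();
        copy.add("mallory");                     // changes only the caller's copy
        System.out.println(registry.getNames()); // [alice]
    }
}
```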

Conclusions and Guidelines

If you are not sure how your threads behave in memory, identify the variables you will be sharing, make them private and use synchronized on the methods that read and write them. This will rarely make your application noticeably slower, and it guarantees that those variables are accessed safely. Do not launch Threads yourself; use an ExecutorService. If you want several threads to access a member variable and you don't care in what order they write to it (i.e. you don't need compound read-modify-write operations to be atomic and want to avoid the overhead of synchronization), use the volatile keyword. If you are not sure how to use synchronized, use the concurrent classes; they are optimized and thread safe. These simple rules go a long way toward keeping your programs correct.

References

  • Inside the Java 2 Virtual Machine, Second Edition, by Bill Venners, 1999, McGraw-Hill
  • Java in a Nutshell: A Desktop Quick Reference, 5th Edition, by David Flanagan, 2005, O'Reilly
  • Effective Java, Second Edition, by Joshua Bloch, 2008, Sun Microsystems Inc.
  • Lesson: Concurrency (The Java Tutorials), 2012, Oracle and/or its affiliates