Thursday, April 21, 2011

Deceiving yourself in Scala

I found a cool way to trip myself in Scala today. This is what I meant:
private val lockByKey = new java.util.concurrent.ConcurrentHashMap[Any, Object]
...
locks.get(key)
This is what I actually typed:
private def lockByKey = new java.util.concurrent.ConcurrentHashMap[Any, Object]
...
locks.get(key)
Because I put 'def lockByKey' instead of 'val lockByKey' I created myself a function that creates a new ConcurrentHashMap every time it gets called instead of a single instance. Scala allows leaving out brackets in many cases so code like lockByKey.get(...) compiled fine, but as every access was creating a new map things didn't work entirely correctly.

Thankfully unit tests of basic functionality (eg ability to get the same value back twice) caught that something was awry and I was able to identfy that I had cleverly created a factory function rather than a static instance. Thank god for unit tests ;)

Dynamic lock allocation

Suppose you had some code that manipulated things in the filesystem and you wanted to ensure only one thread worked on any given file at a time. Various parts of the code may construct a File object and they then desire to synchronize access to it. However, each part of the code may have a different instance of the File so while the instances are equal (.equals would return true in Java) they are not the same instance (== would not return true in Java. Visually:
We could just declare a lock and synchronize on that, but that means that regardless of how many different resources we have only one thread can use any of them. It would be handy if we could say (pseudo-code):

val resource = new SomeResourceOrOther("""blah""")
   val lock = lockFor(resource )
   lock synchronized {
      //do stuff w/exclusive access to resource
   }

By acquiring a different lock for each resource we ensure optimal concurrency. This is somewhat similar to how structures like java.util.concurrent.ConcurrentHashMap partition their data into many segments and allocate a key per segement, thus potentially allowing as many threads as there are segments to safely operate on the data structure concurrently. We can build such a structure fairly easily:

object Locks {
 private val lockByKey = new java.util.concurrent.ConcurrentHashMap[Any, Object] 
 @tailrec
 def lockFor(key: Any): Object = {
  val lock = lockByKey.get(key)
  if (lock == null) {
   lockByKey.putIfAbsent(key, new Object())
   lockFor(key)
  } else {
   lock
  }
 }

...

class ConsumerOfLocks {
 val resource = new String("Maybe its a string?") //contrived means of insuring different instance!
 Locks.lockFor(resource) synchronized {
  //wherein I have exclusive access to resource
 }
}


Obviously the resource could be a file/directory or just about anything else. This structure can be particularly useful if you have multiple dynamic local resources you want to synchronize access to. If you move to a lock that is distributed in some manner then the same structure works across multiple nodes. For example, I have used a logically similar structure to ensure that only one node runs a given batch process.

This structure works best if the same set of resources are used consistently. If over time we are continually creating and locking new resources then we have to introduce additional complexity to ensure the lock map doesn't simply grow forever.

Source samples are for Scala 2.8.1.