Thursday, September 23, 2010

Not the timezones you are looking for

Probably by pure coincidence, I have recently been asked several times about Java Date values "changing" when in fact the only difference was how the value was formatted for display.

For example, why did the date change here?

2010-09-23 17:27:54,565 DEBUG [SomeClass] start date Thu Sep 09 00:00:00 UTC 2010 
...
2010-09-23 10:44:24,728 DEBUG [SomeClass] start date Wed Sep 08 17:00:00 PDT 2010

At a glance it looks like the date isn't the same: it isn't the same day, hour, or timezone!

The answer is the first line of the Date class javadoc:
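"The class Date represents a specific instant in time, with millisecond precision." (see the java.util.Date javadoc)
Note that it does NOT have a timezone. A timezone is applied only when formatting the date: Date.toString() uses the JVM's default timezone, so the same instant logged by a JVM defaulting to UTC and by one defaulting to Pacific time prints differently. We can even prove the date hasn't changed: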

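import static org.junit.Assert.assertEquals;

import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

import org.junit.Test;


public class DateTzTest {
 @Test
 public void dateCompare() throws ParseException {
  // Same layout as Date.toString(). HH is the 24-hour clock; hh is the 12-hour clock and only accepts 17 in lenient mode.
  // An explicit English locale keeps "Thu"/"Sep" parseable regardless of the machine's default locale.
  DateFormat df = new SimpleDateFormat("EEE MMM dd HH:mm:ss z yyyy", Locale.US);
  Date d1 =  df.parse("Thu Sep 09 00:00:00 UTC 2010");
  Date d2 =  df.parse("Wed Sep 08 17:00:00 PDT 2010");
  
  // Both strings describe the same instant, so the millisecond values are equal.
  assertEquals(d1.getTime(), d2.getTime());
 }
}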
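Going the other direction makes the same point: one Date, formatted with two different timezones, produces the two log lines from the top of the post. The following is a minimal sketch; the class name DateFormatTzDemo, the helper format, and the choice of America/Los_Angeles as the Pacific zone are my own assumptions for illustration.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class DateFormatTzDemo {
    // Formats the given instant the way Date.toString() lays it out,
    // but in an explicitly chosen timezone instead of the JVM default.
    static String format(Date date, String zoneId) {
        SimpleDateFormat df = new SimpleDateFormat("EEE MMM dd HH:mm:ss z yyyy", Locale.US);
        df.setTimeZone(TimeZone.getTimeZone(zoneId));
        return df.format(date);
    }

    public static void main(String[] args) {
        // 2010-09-09 00:00:00 UTC expressed as milliseconds since the epoch.
        Date instant = new Date(1283990400000L);

        // The Date object never changes; only the formatter's timezone does.
        System.out.println(format(instant, "UTC"));
        System.out.println(format(instant, "America/Los_Angeles"));
        System.out.println(instant.getTime());
    }
}
```

The first line shows midnight on Thursday the 9th and the second shows 17:00 on Wednesday the 8th, yet getTime() is the same value throughout.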

Wednesday, September 15, 2010

Make it CamelCase: removing sequences of capitals leveraging regex lookahead/lookbehind

I ran into a novel (in that it isn't something I need to do very often) problem today: the need to convert names that might have sequences of capitals to CamelCase with no consecutive capitals in Java. For example:

BEFORE => AFTER
MyGoodName => MyGoodName
MYGoodName => MyGoodName
MyGOODName => MyGoodName
MyGoodNAME => MyGoodName
EndOfIT => EndOfIt
EndOfItALL => EndOfItAll
UUIDIsACOOLType => UuidIsAcoolType

It's easy enough to find such sequences using a regular expression but how to correctly replace them was slightly less clear. So, starting with the easy part, to find such sequences we can use a pattern that says "a capital, followed by some more capitals, ending with either another capital or the end of the string":

//1+ CAPS preceded by a CAP and ending in either another CAP or end-of-string. 
String expr = "[A-Z][A-Z]+([A-Z]|$)";

The middle set of capitals - the [A-Z]+ - are the ones we'd want to change to lowercase. So ... how to match and replace? Probably we'll want to put that into a group so we can easily use it in a replace:

//Same pattern, with the middle run of CAPS captured as group 1.
String expr = "[A-Z]([A-Z]+)([A-Z]|$)";

The normal replaceAll style APIs on Matcher do not appear to easily allow you to swap in a modified version of a group. Luckily the appendReplacement and appendTail APIs do allow this. There is a great example in the javadoc.

So ... we should be able to do something like this:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTest {
 public static void main(String[] argv) {
  //1+ CAPS preceded by a CAP and ending in another CAP, or (the | is not parenthesized here) the bare end-of-string.
  String expr = "[A-Z][A-Z]+[A-Z]|$";
  String[] testStrings = { "MyGoodName", "MYGoodName", "MyGOODName", "MyGoodNAME", "EndOfIT", "EndOfItALL", "UUIDIsACOOLType" };
  
  Pattern pattern = Pattern.compile(expr);
  for (String testString : testStrings) {
   Matcher matcher = pattern.matcher(testString);
   
   StringBuffer sb = new StringBuffer();
   while (matcher.find()) {
    matcher.appendReplacement(sb, matcher.group().toLowerCase());    
   }
   matcher.appendTail(sb);
   System.out.printf("%1$s => %2$s\n", testString, sb.toString());
  }
 }    
}

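Unfortunately this doesn't work: each match includes the leading capital and the trailing capital, and we lowercase the whole match. EndOfIT escapes untouched only because a run of three capitals is needed before the first alternative matches. We end up with the following output: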

MyGoodName => MyGoodName
MYGoodName => mygoodName
MyGOODName => Mygoodname
MyGoodNAME => MyGoodname
EndOfIT => EndOfIT
EndOfItALL => EndOfItall
UUIDIsACOOLType => uuidisacooltype

We could monkey around with trying to replace only a specific group based on its start/end indices or some such nonsense, but it would really be much nicer to match only the consecutive caps, with the leading cap and the trailing cap or end-of-string not actually being considered part of the match. Luckily Java supports lookaround in regular expressions. We can revise our program to use lookbehind for the start and lookahead for the end, so that only the desired bit is the match:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTest {
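 public static void main(String[] argv) {
  //1+ CAPS preceded by a CAP (lookbehind) and followed by another CAP or end-of-string (lookahead).
  //The lookarounds are zero-width, so only the CAPS in between are part of the match.
  String expr = "(?<=[A-Z])[A-Z]+(?=[A-Z]|$)";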
  String[] testStrings = { "MyGoodName", "MYGoodName", "MyGOODName", "MyGoodNAME", "EndOfIT", "EndOfItALL", "UUIDIsACOOLType" };
  
  Pattern pattern = Pattern.compile(expr);
  for (String testString : testStrings) {
   Matcher matcher = pattern.matcher(testString);
   
   StringBuffer sb = new StringBuffer();
   while (matcher.find()) {
    matcher.appendReplacement(sb, matcher.group().toLowerCase());    
   }
   matcher.appendTail(sb);
   System.out.printf("%1$s => %2$s\n", testString, sb.toString());
  }
 }    
}

This will finally produce the desired output:
MyGoodName => MyGoodName
MYGoodName => MyGoodName
MyGOODName => MyGoodName
MyGoodNAME => MyGoodName
EndOfIT => EndOfIt
EndOfItALL => EndOfItAll
UUIDIsACOOLType => UuidIsAcoolType

Both the lookahead/behind and the appendReplacement/Tail are very handy but relatively rarely used in my experience.
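For reuse, the same technique can be wrapped in a small helper. This is only a sketch: the class name CapRuns and the method lowerInnerCaps are names I made up, and using Locale.ROOT for the lowercasing is my own addition so that the result does not depend on the JVM's default locale.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CapRuns {
    // Capitals preceded by a capital and followed by a capital or end-of-string.
    // Both lookarounds are zero-width, so only the inner capitals are matched.
    private static final Pattern INNER_CAPS = Pattern.compile("(?<=[A-Z])[A-Z]+(?=[A-Z]|$)");

    /** Lowercases every run matched by INNER_CAPS and leaves everything else untouched. */
    public static String lowerInnerCaps(String name) {
        Matcher matcher = INNER_CAPS.matcher(name);
        // StringBuffer, because appendReplacement accepts a StringBuilder only from Java 9 on.
        StringBuffer sb = new StringBuffer();
        while (matcher.find()) {
            // The replacement contains only lowercase letters, so no $ or \ escaping is needed.
            matcher.appendReplacement(sb, matcher.group().toLowerCase(Locale.ROOT));
        }
        // Copy whatever follows the last match.
        matcher.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(lowerInnerCaps("UUIDIsACOOLType")); // UuidIsAcoolType
        System.out.println(lowerInnerCaps("EndOfItALL"));      // EndOfItAll
    }
}
```

Compiling the pattern once as a constant avoids recompiling it on every call.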

Friday, September 10, 2010

The leaking of the JAXB

A novel memory leak was discovered in some of our software today. Apparently JAXBContext will create new dynamic classes each time newInstance is called. This means if you naively implemented a JAXB-based XML translation, similar to the example below, you would create new classes endlessly and run out of memory:

// Anti-pattern: a fresh JAXBContext is built on every call.
StringWriter sw = new StringWriter();
JAXBContext jaxbContext = JAXBContext.newInstance(myObject.getClass());
Marshaller marshal = jaxbContext.createMarshaller();
marshal.marshal(myObject, sw);

The unfortunate thing is that this is exactly what one would come up with after skimming the examples in the javadoc.

The recommended solution on the internets is to cache the JAXBContext, including re-using it on multiple threads. This is a bit novel as the documentation for the JAXBContext doesn't indicate this will happen and it also doesn't indicate the JAXBContext is thread-safe. To discover that you have to go into specific implementation information, such as https://jaxb.dev.java.net/guide/Performance_and_thread_safety.html.

These details vary by implementation, so one might legitimately argue the Java SE documentation is correct. It's really just aggravating to discover that the implementation is built in a way so likely to cause clients to leak. This strikes me as a great example of a very bad implementation of an API, especially given that the implementation could easily maintain a lookup itself and avoid re-creating the dynamic classes, and with them the leak.
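The kind of lookup I have in mind is tiny. Below is a minimal sketch of a per-class cache, written generically so it compiles without a JAXB implementation on the classpath; the class name PerClassCache and its factory parameter are hypothetical. In real code the factory would wrap JAXBContext.newInstance(type), converting the checked JAXBException into an unchecked one. Whether the cached context may then be shared across threads still depends on the JAXB implementation in use.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

public class PerClassCache<C> {
    private final ConcurrentMap<Class<?>, C> cache = new ConcurrentHashMap<>();
    private final Function<Class<?>, C> factory;

    // factory builds the expensive object for a class, e.g. (assumption) a lambda
    // that calls JAXBContext.newInstance(type) and wraps JAXBException.
    public PerClassCache(Function<Class<?>, C> factory) {
        this.factory = factory;
    }

    // Returns the cached value for the class, invoking the factory only on the first request.
    public C get(Class<?> type) {
        return cache.computeIfAbsent(type, factory);
    }

    public static void main(String[] args) {
        // Stand-in factory: a real one would create a JAXBContext instead of a String.
        PerClassCache<String> contexts = new PerClassCache<String>(type -> "context for " + type.getName());
        String first = contexts.get(String.class);
        String second = contexts.get(String.class);
        System.out.println(first == second); // true: the second call reuses the cached instance
    }
}
```

With something like this in place, the marshalling code asks the cache for its context instead of calling newInstance on every request.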