I ran into an unusual problem today (unusual in that it isn't something I need to do very often): converting names that might contain sequences of capitals to CamelCase with no consecutive capitals, in Java. For example:
BEFORE => AFTER
MyGoodName => MyGoodName
MYGoodName => MyGoodName
MyGOODName => MyGoodName
MyGoodNAME => MyGoodName
EndOfIT => EndOfIt
EndOfItALL => EndOfItAll
UUIDIsACOOLType => UuidIsAcoolType
It's easy enough to find such sequences using a regular expression but how to correctly replace them was slightly less clear. So, starting with the easy part, to find such sequences we can use a pattern that says "a capital, followed by some more capitals, ending with either another capital or the end of the string":
// 1+ CAPS preceded by a CAP and ending in either another CAP or end-of-string.
String expr = "[A-Z][A-Z]+([A-Z]|$)";
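As a quick check, the pattern can be run against a couple of the sample names to see exactly what it finds. This is only a minimal sketch; the class and method names (FindCapsRuns, findRuns) are made up for illustration. Note that each match includes the capital on either side of the middle run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindCapsRuns {
    // Same expression as above: a CAP, 1+ CAPS, then another CAP or end-of-string.
    private static final Pattern RUNS = Pattern.compile("[A-Z][A-Z]+([A-Z]|$)");

    // Collects every substring the pattern matches, in order of appearance.
    static List<String> findRuns(String input) {
        List<String> runs = new ArrayList<>();
        Matcher matcher = RUNS.matcher(input);
        while (matcher.find()) {
            runs.add(matcher.group());
        }
        return runs;
    }

    public static void main(String[] args) {
        System.out.println(findRuns("UUIDIsACOOLType")); // [UUIDI, ACOOLT]
        System.out.println(findRuns("MyGoodName"));      // []
    }
}
```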
The middle set of capitals - the [A-Z]+ - are the ones we'd want to change to lowercase. So ... how to match and replace? Probably we'll want to put that into a group so we can easily use it in a replace:
// 1+ CAPS (captured as group 1) preceded by a CAP and ending in either another CAP or end-of-string.
String expr = "[A-Z]([A-Z]+)([A-Z]|$)";
The normal replaceAll-style APIs on Matcher do not appear to easily allow you to swap in a modified version of a group. Luckily, the appendReplacement and appendTail APIs do allow this. There is a great example in the javadoc.
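To show the general shape of the appendReplacement/appendTail loop on its own, here is a small sketch unrelated to the capitals problem: it replaces every run of digits with twice its value. The class and method names (AppendReplacementSketch, doubleNumbers) are invented for this example, and it assumes the numbers fit in an int.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AppendReplacementSketch {
    // Replaces each run of digits with double its numeric value.
    static String doubleNumbers(String input) {
        Matcher matcher = Pattern.compile("\\d+").matcher(input);
        StringBuffer sb = new StringBuffer();
        while (matcher.find()) {
            // The replacement is computed from the text of the current match.
            int doubled = Integer.parseInt(matcher.group()) * 2;
            // quoteReplacement keeps '$' and '\' from being treated specially;
            // not strictly needed for digits, but a safe habit for computed text.
            matcher.appendReplacement(sb, Matcher.quoteReplacement(String.valueOf(doubled)));
        }
        // Copy whatever follows the last match.
        matcher.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(doubleNumbers("3 apples and 12 pears")); // 6 apples and 24 pears
    }
}
```

Each call to appendReplacement copies the text between the previous match and the current one, then appends the computed replacement; appendTail finishes the job.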
So ... we should be able to do something like this:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTest {
    public static void main(String[] argv) {
        // 1+ CAPS (group 1) preceded by a CAP and ending in either another CAP or end-of-string.
        String expr = "[A-Z]([A-Z]+)([A-Z]|$)";
        String[] testStrings = { "MyGoodName", "MYGoodName", "MyGOODName", "MyGoodNAME", "EndOfIT", "EndOfItALL", "UUIDIsACOOLType" };
        Pattern pattern = Pattern.compile(expr);
        for (String testString : testStrings) {
            Matcher matcher = pattern.matcher(testString);
            StringBuffer sb = new StringBuffer();
            while (matcher.find()) {
                // group() with no argument is the whole match, not just the middle caps.
                matcher.appendReplacement(sb, matcher.group().toLowerCase());
            }
            matcher.appendTail(sb);
            System.out.printf("%1$s => %2$s\n", testString, sb.toString());
        }
    }
}
Unfortunately this doesn't work: what we replace is the entire match, including the leading capital and any trailing capital, so the whole run is lowercased and we end up with the following output:
MyGoodName => MyGoodName
MYGoodName => mygoodName
MyGOODName => Mygoodname
MyGoodNAME => MyGoodname
EndOfIT => EndOfit
EndOfItALL => EndOfItall
UUIDIsACOOLType => uuidisacooltype
We could monkey around with trying to replace only a specific group based on its start/end indices or some such nonsense, but it would really be much nicer to match only the consecutive caps, with the leading cap and the trailing cap or end-of-string not actually being considered a part of the match. Luckily, Java supports lookaround in regular expressions. We can revise our program to use a lookbehind for the start and a lookahead for the end, so that only the desired bit is the match:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTest {
    public static void main(String[] argv) {
        // 1+ CAPS preceded by a CAP (lookbehind) and followed by either another CAP or end-of-string (lookahead).
        String expr = "(?<=[A-Z])[A-Z]+(?=[A-Z]|$)";
        String[] testStrings = { "MyGoodName", "MYGoodName", "MyGOODName", "MyGoodNAME", "EndOfIT", "EndOfItALL", "UUIDIsACOOLType" };
        Pattern pattern = Pattern.compile(expr);
        for (String testString : testStrings) {
            Matcher matcher = pattern.matcher(testString);
            StringBuffer sb = new StringBuffer();
            while (matcher.find()) {
                // The match is now only the middle caps, so lowercasing all of it is safe.
                matcher.appendReplacement(sb, matcher.group().toLowerCase());
            }
            matcher.appendTail(sb);
            System.out.printf("%1$s => %2$s\n", testString, sb.toString());
        }
    }
}
This will finally produce the desired output:
MyGoodName => MyGoodName
MYGoodName => MyGoodName
MyGOODName => MyGoodName
MyGoodNAME => MyGoodName
EndOfIT => EndOfIt
EndOfItALL => EndOfItAll
UUIDIsACOOLType => UuidIsAcoolType
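To see why the lookaround version behaves differently, it can help to print where each match starts and ends. The sketch below does that with the same expression; the class and method names (LookaroundSpans, describeMatches) are made up for illustration. The lookbehind and lookahead are zero-width, so the neighbouring capitals are checked but never become part of the match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundSpans {
    // Same lookaround expression as in the program above.
    private static final Pattern INNER_CAPS =
            Pattern.compile("(?<=[A-Z])[A-Z]+(?=[A-Z]|$)");

    // Describes each match as text[start,end), using the matcher's own indices.
    static List<String> describeMatches(String input) {
        List<String> spans = new ArrayList<>();
        Matcher matcher = INNER_CAPS.matcher(input);
        while (matcher.find()) {
            spans.add(matcher.group() + "[" + matcher.start() + "," + matcher.end() + ")");
        }
        return spans;
    }

    public static void main(String[] args) {
        System.out.println(describeMatches("UUIDIsACOOLType")); // [UID[1,4), COOL[7,11)]
        System.out.println(describeMatches("EndOfIT"));         // [T[6,7)]
    }
}
```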
Both the lookahead/behind and the appendReplacement/Tail are very handy but relatively rarely used in my experience.
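As an aside, and assuming Java 9 or later is available, Matcher also has a replaceAll overload that takes a function from the match to its replacement, which can stand in for the appendReplacement/appendTail loop. A minimal sketch with the same lookaround expression follows; the class and method names (FunctionalReplace, collapseCaps) are hypothetical.

```java
import java.util.Locale;
import java.util.regex.Pattern;

class FunctionalReplace {
    // Same lookaround expression as in the program above.
    private static final Pattern INNER_CAPS =
            Pattern.compile("(?<=[A-Z])[A-Z]+(?=[A-Z]|$)");

    // Lowercases each inner run of capitals via Matcher.replaceAll(Function) (Java 9+).
    static String collapseCaps(String name) {
        return INNER_CAPS.matcher(name)
                .replaceAll(match -> match.group().toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(collapseCaps("UUIDIsACOOLType")); // UuidIsAcoolType
        System.out.println(collapseCaps("EndOfItALL"));      // EndOfItAll
    }
}
```

The string returned by the function is still interpreted as a replacement string, so text that might contain '$' or '\' would need Matcher.quoteReplacement; lowercase letters are safe as they are.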