BEFORE => AFTER
MyGoodName => MyGoodName
MYGoodName => MyGoodName
MyGOODName => MyGoodName
MyGoodNAME => MyGoodName
EndOfIT => EndOfIt
EndOfItALL => EndOfItAll
UUIDIsACOOLType => UuidIsAcoolType
It's easy enough to find such sequences with a regular expression, but how to replace them correctly was slightly less clear. So, starting with the easy part: to find such sequences we can use a pattern that says "a capital, followed by one or more capitals, ending with either another capital or the end of the string":
// 1+ CAPS preceded by a CAP and ending in either another CAP or end-of-string.
String expr = "[A-Z][A-Z]+([A-Z]|$)";
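To see what that pattern actually finds, here's a minimal sketch that runs it with find() over the test strings. The class and method names are my own, not from any library. Note that each match includes the leading capital and the trailing capital, not just the run in between.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindCapsRuns {
    // A capital, one or more capitals, then another capital or end-of-string.
    private static final Pattern RUN = Pattern.compile("[A-Z][A-Z]+([A-Z]|$)");

    // Collects the full text of every match, in order of appearance.
    static List<String> runs(String input) {
        List<String> found = new ArrayList<>();
        Matcher matcher = RUN.matcher(input);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String[] testStrings = { "MyGoodName", "MYGoodName", "MyGOODName", "MyGoodNAME",
                "EndOfIT", "EndOfItALL", "UUIDIsACOOLType" };
        for (String s : testStrings) {
            // e.g. UUIDIsACOOLType -> [UUIDI, ACOOLT]
            System.out.printf("%s -> %s%n", s, runs(s));
        }
    }
}
```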
The middle set of capitals (the [A-Z]+) is the part we want to change to lowercase. So ... how to match and replace? We'll probably want to put that into a group so we can easily use it in a replace:
// Group 1 is the middle run of CAPS; group 2 is the trailing CAP or end-of-string.
String expr = "[A-Z]([A-Z]+)([A-Z]|$)";
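A quick sketch of pulling out just the middle run with group(1), using the grouped pattern [A-Z]([A-Z]+)([A-Z]|$). The class and method names are hypothetical. When the run ends at end-of-string, group 2 matches the empty string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MiddleRunGroup {
    // Group 1: the middle run of capitals. Group 2: the trailing capital, or "" at end-of-string.
    private static final Pattern RUN = Pattern.compile("[A-Z]([A-Z]+)([A-Z]|$)");

    // Collects group 1 of every match, i.e. only the capitals we'd want to lowercase.
    static List<String> middleRuns(String input) {
        List<String> found = new ArrayList<>();
        Matcher matcher = RUN.matcher(input);
        while (matcher.find()) {
            found.add(matcher.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(middleRuns("UUIDIsACOOLType")); // [UID, COOL]
        System.out.println(middleRuns("MyGoodNAME"));      // [AME]
    }
}
```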
The string-based replaceAll-style APIs on Matcher don't make it easy to swap in a modified version of a group: the replacement string can refer to a group as $1, but there's no way to, say, lowercase it on the way in. Luckily the appendReplacement and appendTail APIs do allow this, and there is a great example in the Matcher javadoc.
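For reference, the general shape of an appendReplacement/appendTail loop looks something like this. It's a made-up example (the <word> tag format and the class name are hypothetical) that replaces each tag with its word upper-cased, so what goes in is a modified group rather than a fixed string.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AppendReplacementDemo {
    // Hypothetical format: a lowercase word wrapped in angle brackets, captured in group 1.
    private static final Pattern TAG = Pattern.compile("<([a-z]+)>");

    static String upperTags(String input) {
        Matcher matcher = TAG.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (matcher.find()) {
            // Compute the replacement from the group however we like.
            String replacement = matcher.group(1).toUpperCase(Locale.ROOT);
            // appendReplacement copies the text since the previous match, then the replacement.
            // quoteReplacement stops any '$' or '\' in the computed text being read as group syntax.
            matcher.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        // appendTail copies whatever follows the last match.
        matcher.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(upperTags("say <hello> to <world>!")); // say HELLO to WORLD!
    }
}
```

Only the matched spans change; everything between and after them is copied through untouched, which is exactly what we need for the capitals problem.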
So ... we should be able to do something like this:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexTest {
    public static void main(String[] argv) {
        // 1+ CAPS preceded by a CAP and ending in either another CAP or end-of-string.
        // The parentheses matter: without them, |$ would be an alternative to the whole pattern.
        String expr = "[A-Z][A-Z]+([A-Z]|$)";
        String[] testStrings = { "MyGoodName", "MYGoodName", "MyGOODName", "MyGoodNAME",
                "EndOfIT", "EndOfItALL", "UUIDIsACOOLType" };
        Pattern pattern = Pattern.compile(expr);
        for (String testString : testStrings) {
            Matcher matcher = pattern.matcher(testString);
            StringBuffer sb = new StringBuffer();
            while (matcher.find()) {
                // Lowercases the WHOLE match, including the leading and trailing caps.
                matcher.appendReplacement(sb, matcher.group().toLowerCase());
            }
            matcher.appendTail(sb);
            System.out.printf("%1$s => %2$s%n", testString, sb.toString());
        }
    }
}
Unfortunately this doesn't work: the match is the entire sequence, including the leading and trailing capitals, so those get lowercased too and we end up with the following output:
MyGoodName => MyGoodName
MYGoodName => mygoodName
MyGOODName => Mygoodname
MyGoodNAME => MyGoodname
EndOfIT => EndOfit
EndOfItALL => EndOfItall
UUIDIsACOOLType => uuidisacooltype
We could monkey around with trying to replace only a specific group based on its start/end indices, or some such nonsense, but it would be much nicer to match only the consecutive caps, with the leading cap and the trailing cap or end-of-string not considered part of the match at all. Luckily Java supports lookaround in regular expressions: a lookbehind (?<=...) or lookahead (?=...) checks its pattern without consuming any characters. We can revise our program to use lookbehind for the start and lookahead for the end, so that only the desired bit is the match:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexTest {
    public static void main(String[] argv) {
        // 1+ CAPS preceded by a CAP and ending in either another CAP or end-of-string.
        // The lookbehind and lookahead are checked but not consumed, so only the middle run is matched.
        String expr = "(?<=[A-Z])[A-Z]+(?=[A-Z]|$)";
        String[] testStrings = { "MyGoodName", "MYGoodName", "MyGOODName", "MyGoodNAME",
                "EndOfIT", "EndOfItALL", "UUIDIsACOOLType" };
        Pattern pattern = Pattern.compile(expr);
        for (String testString : testStrings) {
            Matcher matcher = pattern.matcher(testString);
            StringBuffer sb = new StringBuffer();
            while (matcher.find()) {
                matcher.appendReplacement(sb, matcher.group().toLowerCase());
            }
            matcher.appendTail(sb);
            System.out.printf("%1$s => %2$s%n", testString, sb.toString());
        }
    }
}
This will finally produce the desired output:
MyGoodName => MyGoodName
MYGoodName => MyGoodName
MyGOODName => MyGoodName
MyGoodNAME => MyGoodName
EndOfIT => EndOfIt
EndOfItALL => EndOfItAll
UUIDIsACOOLType => UuidIsAcoolType
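To make the difference concrete, this small sketch (the class and method names are hypothetical) prints each match of the lookaround pattern along with its start/end offsets. The flanking capitals are checked but not consumed, so they sit outside the reported spans.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaroundSpans {
    // Same pattern as the final program: lookbehind/lookahead assert the neighbours
    // without making them part of the match.
    private static final Pattern RUN = Pattern.compile("(?<=[A-Z])[A-Z]+(?=[A-Z]|$)");

    // Describes each match as text[start,end), with end exclusive.
    static List<String> spans(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = RUN.matcher(input);
        while (m.find()) {
            out.add(m.group() + "[" + m.start() + "," + m.end() + ")");
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(spans("UUIDIsACOOLType")); // [UID[1,4), COOL[7,11)]
    }
}
```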
Both lookahead/lookbehind and appendReplacement/appendTail are very handy, but in my experience they are relatively rarely used.
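If you're on Java 9 or later, Matcher also has a replaceAll(Function<MatchResult, String>) overload that runs the same find/append loop for you. Here's a sketch of the lookaround version using it, assuming Java 9+ (the class and method names are hypothetical). I've also used toLowerCase(Locale.ROOT), since the default-locale toLowerCase() turns "I" into a dotless "ı" under a Turkish locale.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceAllFunctionSketch {
    // Same lookaround pattern as the final program above.
    private static final Pattern RUN = Pattern.compile("(?<=[A-Z])[A-Z]+(?=[A-Z]|$)");

    static String fixCaps(String input) {
        // Requires Java 9+. The function's result is still treated as a replacement
        // string ($ and \ are special), so quote it to be safe.
        return RUN.matcher(input)
                .replaceAll(mr -> Matcher.quoteReplacement(mr.group().toLowerCase(Locale.ROOT)));
    }

    public static void main(String[] args) {
        String[] testStrings = { "MyGoodName", "MYGoodName", "MyGOODName", "MyGoodNAME",
                "EndOfIT", "EndOfItALL", "UUIDIsACOOLType" };
        for (String s : testStrings) {
            System.out.printf("%s => %s%n", s, fixCaps(s));
        }
    }
}
```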