Monday, January 31, 2011

How to Use dom4j XPath with XML Namespaces

Noting this because I can't for the life of me remember it; every time I need to do it I have to Google it. Perhaps that's because it's not tremendously intuitive, at least to my way of thinking.

Suppose we want to select the filter named terracotta from the XML below using an XPath expression along the lines of //filter[filter-name/text()='terracotta']. You'll need commons-io on the classpath for FileUtils if you want to do it *exactly* as shown:
<?xml version="1.0" encoding="UTF-8"?>

<web-app version="2.4" xmlns="http://java.sun.com/xml/ns/j2ee"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:schemaLocation="http://java.sun.com/xml/ns/j2ee http://java.sun.com/xml/ns/j2ee/web-app_2_4.xsd">

 <context-param>
  <param-name>javax.servlet.jsp.jstl.fmt.localizationContext</param-name>
  <param-value>messages</param-value>
 </context-param>

 <servlet>
  <servlet-name>default</servlet-name>
  <servlet-class>org.mortbay.jetty.servlet.DefaultServlet</servlet-class>
  <init-param>
   <param-name>dirAllowed</param-name>
   <param-value>false</param-value>
  </init-param>

  <load-on-startup>0</load-on-startup>
 </servlet>
  
  <!-- filter for terracotta session mgmt. See http://www.terracotta.org/documentation/ga/web-sessions-install.html -->
  <filter> <!-- <== WE WANT TO SELECT THIS NODE -->
   <filter-name>terracotta</filter-name>
   <filter-class>org.terracotta.session.TerracottaJetty61xSessionFilter</filter-class>
   <init-param>
    <param-name>tcConfigUrl</param-name>
    <param-value>$TerracottaServerList</param-value>
   </init-param>
  </filter> 

  <!-- ...omitted...-->
</web-app>  
Our first draft of the code might look something like this:
Document dom = DocumentHelper.parseText(FileUtils.readFileToString(webXmlFile));
Node tcFilterCfg = dom.selectSingleNode("//filter[filter-name/text()='terracotta']");
Unfortunately this gets us null, because the document uses a default namespace (xmlns="http://java.sun.com/xml/ns/j2ee"). In XPath 1.0 an unprefixed name like filter only matches elements that are in no namespace, so it never matches the namespaced filter elements in web.xml. To select the node properly we have to build an XPath object that knows about namespaces and then write our query to explicitly indicate which namespace we mean (yes, we could probably set a default, but that's sloppy). The result is this nice, clean, intuitive block of garbage:
import java.util.HashMap;
import java.util.Map;
import org.apache.commons.io.FileUtils;
import org.dom4j.Document;
import org.dom4j.DocumentHelper;
import org.dom4j.Node;
import org.dom4j.XPath;

Document dom = DocumentHelper.parseText(FileUtils.readFileToString(webXmlFile));
Map<String, String> namespaceUris = new HashMap<String, String>();
namespaceUris.put("j2ee", "http://java.sun.com/xml/ns/j2ee");

XPath xPath = DocumentHelper.createXPath("//j2ee:filter[j2ee:filter-name/text()='terracotta']");
xPath.setNamespaceURIs(namespaceUris);

Node tcFilterCfg = xPath.selectSingleNode(dom);

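Note that we use the Map to tell the XPath that j2ee:something means the element something in the http://java.sun.com/xml/ns/j2ee namespace. The prefix j2ee is our own choice; it doesn't have to appear anywhere in the document, since only the namespace URI has to match. (Also note that dom4j's XPath support relies on jaxen, so that needs to be on the classpath too.)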

The code winds up quite different from what my brain is thinking about doing ("select that one!" versus "alias j2ee to such-and-such a namespace, then select the filter from j2ee"), which is probably why I can never quite remember this.

Boo-urns!
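For comparison, here's a minimal sketch of the same lookup using the JDK's built-in javax.xml.xpath API instead of dom4j, so no dom4j, jaxen, or commons-io is required. It assumes a cut-down web.xml inlined as a string, and the class and method names (WebXmlFilterLookup, parse, select) are made up for illustration. The NamespaceContext plays the role of our Map, binding the prefix j2ee to the namespace URI. Note that DocumentBuilderFactory is not namespace-aware by default, so we have to switch that on. It also shows XPath 1.0's local-name() function, which matches on the element name alone; that dodges the prefix dance, but it's arguably just as sloppy as setting a default namespace.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class WebXmlFilterLookup {
    static final String J2EE_NS = "http://java.sun.com/xml/ns/j2ee";

    // Binds our chosen prefix "j2ee" (web.xml itself never uses it) to the namespace URI.
    static final NamespaceContext J2EE = new NamespaceContext() {
        @Override
        public String getNamespaceURI(String prefix) {
            return "j2ee".equals(prefix) ? J2EE_NS : XMLConstants.NULL_NS_URI;
        }

        @Override
        public String getPrefix(String namespaceURI) {
            return J2EE_NS.equals(namespaceURI) ? "j2ee" : null;
        }

        @Override
        public Iterator<String> getPrefixes(String namespaceURI) {
            return J2EE_NS.equals(namespaceURI)
                    ? Collections.singletonList("j2ee").iterator()
                    : Collections.<String>emptyIterator();
        }
    };

    // Parses xml with namespace awareness switched on (the factory default is off).
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Evaluates expr against doc (optionally with a namespace context); null if nothing matches.
    static Node select(Document doc, String expr, NamespaceContext ctx) {
        try {
            XPath xPath = XPathFactory.newInstance().newXPath();
            if (ctx != null) {
                xPath.setNamespaceContext(ctx);
            }
            return (Node) xPath.evaluate(expr, doc, XPathConstants.NODE);
        } catch (Exception e) {
            throw new IllegalStateException("XPath failed: " + expr, e);
        }
    }

    public static void main(String[] args) {
        String webXml = "<web-app version=\"2.4\" xmlns=\"" + J2EE_NS + "\">"
                + "<filter><filter-name>terracotta</filter-name></filter></web-app>";
        Document doc = parse(webXml);
        // Unprefixed names only match elements in no namespace, so this finds nothing.
        System.out.println(select(doc, "//filter[filter-name/text()='terracotta']", null));
        // Prefixed query, with the prefix mapped to the namespace URI: found.
        System.out.println(select(doc,
                "//j2ee:filter[j2ee:filter-name/text()='terracotta']", J2EE).getLocalName());
        // Namespace-agnostic escape hatch: match on local-name() only.
        System.out.println(select(doc,
                "//*[local-name()='filter'][*[local-name()='filter-name']='terracotta']", null)
                .getLocalName());
    }
}
```

Running main prints null for the unprefixed query and filter for the other two.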

Erlang Concurrent Echo Server

Following on from the simplest echo server in Erlang vs. a Java equivalent (see here), I decided to upgrade to a basic concurrent server. The idea is to have N threads that can each read/echo on a socket. When idle, workers must somehow await the arrival of a new connection. This means we either share a listening socket among threads or have one thread loop on accept and dole out the resulting connections to handler threads. The former probably fits Erlang better; the latter seems the more likely shape for an OO implementation.
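For contrast, here's a rough Java sketch of the second approach, the one I'd expect in an OO implementation: a single acceptor thread loops on accept() and hands each connection to a fixed pool of handler threads. It assumes blocking java.io sockets, uses a java.util.concurrent fixed thread pool to stand in for the N workers, and keeps this blog's usual (i.e. minimal) error handling; the class name AcceptorPoolEchoServer is made up.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AcceptorPoolEchoServer {
    private final ServerSocket listenSocket;
    private final ExecutorService handlers;

    // Port 0 asks the OS for any free port; port() reports which one we got.
    public AcceptorPoolEchoServer(int port, int workerCount) {
        try {
            listenSocket = new ServerSocket(port);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        handlers = Executors.newFixedThreadPool(workerCount);
    }

    public int port() {
        return listenSocket.getLocalPort();
    }

    // The acceptor loop: the only thread calling accept(); each new connection
    // is queued for the next free handler thread.
    public void acceptLoop() {
        while (!listenSocket.isClosed()) {
            try {
                Socket socket = listenSocket.accept();
                handlers.execute(() -> echo(socket));
            } catch (IOException e) {
                // accept() fails once the listen socket is closed; the loop test then exits
            }
        }
    }

    // Handler: read/echo until the client hangs up, then close the socket.
    private static void echo(Socket socket) {
        try (Socket s = socket) {
            InputStream in = s.getInputStream();
            OutputStream out = s.getOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
                out.flush();
            }
        } catch (IOException e) {
            System.out.println("Sad panda: " + e.getMessage());
        }
    }

    // Stops accepting; handlers finish their current connections and then exit.
    public void shutdown() {
        try {
            listenSocket.close();
        } catch (IOException ignored) {
            // nothing useful to do here
        }
        handlers.shutdown();
    }

    public static void main(String[] args) {
        new AcceptorPoolEchoServer(8888, 10).acceptLoop();
    }
}
```

The pool's work queue does the "doling out": if all N handlers are busy, accepted connections wait in the queue until a handler frees up.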

A little research suggests that in Erlang it is safe to have many processes all calling gen_tcp:accept on a single listen socket; when a connection comes in, one of the processes unblocks from accept, receiving a shiny new socket! This makes life relatively easy: we essentially want a program that starts N processes, each of which runs the exact same logic as the simplest echo server we wrote earlier. In newb-Erlang this winds up looking like this:

%% blocking echo server, take 2
%% based on http://www.erlang.org/doc/man/gen_tcp.html#examples

-module(echoConcurrent).

-export([server/2]).

%% happy path; yay guards!
%% server will open a listen socket on Port and boot WorkerCount workers to accept/echo on it
server(WorkerCount, Port) when 
  is_integer(WorkerCount), WorkerCount > 0, WorkerCount < 1000, 
  is_integer(Port), Port > 0, Port =< 65535 ->
  
  io:format("~p is lord and master; open the listen socket and release the gerbils!~n", [self()]),
  
  case gen_tcp:listen(Port, [{active, false}, {packet, 0}]) of
    {ok, ListenSocket} -> 
      spawnServers(WorkerCount, ListenSocket),
      ok;
    {error, Reason} -> {error, Reason}
  end;

%% bad args to server; show a message and return an error
server(WorkerCount, Port) ->
  io:format("Must provide worker count between 1 and 999, port between 1 and 65535.~n"),
  io:format("WorkerCount='~p', Port='~p' is invalid~n", [WorkerCount, Port]),
  {error, badarg}.
  
%% spawning 0 servers is relatively easy  
spawnServers(0, _) -> ok;  
%% to spawn Count servers on ListenSocket you just spawn 1 and recurse for Count-1
spawnServers(Count, ListenSocket) ->
  spawn(fun() -> acceptEchoLoop(ListenSocket) end),
  %% alternatively spawn(?MODULE, acceptEchoLoop, [ListenSocket]), but that requires exporting acceptEchoLoop/1
  spawnServers(Count-1, ListenSocket).

%% The heart of our server: Our core worker function
%% Accepts an incoming connection, blocking until one shows up, then read/echoes
%% until that connection goes away or errors, then ... does it all again!  
acceptEchoLoop(ListenSocket) ->
  io:format("Gerbil ~p is waiting for someone to talk to~n", [self()]),
  {ok, Socket} = gen_tcp:accept(ListenSocket),
  %% Show the address of client & the port assigned to our new connection
  case inet:peername(Socket) of
    {ok, {Address, Port}} ->
      io:format("Gerbil ~p will chat with ~p:~p~n", [self(), Address, Port]);
    {error, Reason} ->
      io:format("peername failed :( reason=~p~n", [Reason])
  end,
  receiveAndEcho(Socket),
  ok = gen_tcp:close(Socket),
  acceptEchoLoop(ListenSocket).
  
%% read/echo raw data from a socket, print it blindly, and echo it back  
receiveAndEcho(Socket) ->
  %% block waiting for data...
  case gen_tcp:recv(Socket, 0, 60 * 1000) of
    {ok, Packet} ->
      io:format("Gerbil ~p to recv'd ~p; echoing!!~n", [self(), Packet]),
      gen_tcp:send(Socket, Packet),
      receiveAndEcho(Socket);
    {error, Reason} ->
      io:format("Sad Gerbil ~p: ~p~n", [self(), Reason])
  end.  

If we start the server on port 8888 with 10 workers (echoConcurrent:server(10, 8888).) and open a couple of telnet localhost 8888 connections, each one is picked up by a different process in the Erlang server, as the server's console output shows:
<0.30.0> is lord and master; open the listen socket and release the gerbils!
Gerbil <0.36.0> is waiting for someone to talk to
Gerbil <0.37.0> is waiting for someone to talk to
Gerbil <0.38.0> is waiting for someone to talk to
Gerbil <0.39.0> is waiting for someone to talk to
Gerbil <0.40.0> is waiting for someone to talk to
Gerbil <0.41.0> is waiting for someone to talk to
Gerbil <0.42.0> is waiting for someone to talk to
Gerbil <0.43.0> is waiting for someone to talk to
Gerbil <0.44.0> is waiting for someone to talk to
Gerbil <0.45.0> is waiting for someone to talk to
ok
2> Gerbil <0.36.0> will chat with {127,0,0,1}:11067
2> Gerbil <0.36.0> to recv'd "h"; echoing!!
2> Gerbil <0.36.0> to recv'd "e"; echoing!!
2> Gerbil <0.36.0> to recv'd "l"; echoing!!
2> Gerbil <0.36.0> to recv'd "l"; echoing!!
2> Gerbil <0.36.0> to recv'd "o"; echoing!!
2> Gerbil <0.36.0> to recv'd " "; echoing!!
2> Gerbil <0.36.0> to recv'd "w"; echoing!!
2> Gerbil <0.36.0> to recv'd "o"; echoing!!
2> Gerbil <0.36.0> to recv'd "r"; echoing!!
2> Gerbil <0.36.0> to recv'd "l"; echoing!!
2> Gerbil <0.36.0> to recv'd "d"; echoing!!
2> Gerbil <0.37.0> will chat with {127,0,0,1}:11068
2> Gerbil <0.37.0> to recv'd "w"; echoing!!
2> Gerbil <0.37.0> to recv'd "a"; echoing!!
2> Gerbil <0.37.0> to recv'd "s"; echoing!!
2> Gerbil <0.37.0> to recv'd "s"; echoing!!
2> Gerbil <0.37.0> to recv'd "z"; echoing!!
2> Gerbil <0.36.0> to recv'd "a"; echoing!!
2> Gerbil <0.37.0> to recv'd "b"; echoing!!
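Note that Erlang process <0.30.0> (the shell process that called server/2) booted 10 additional processes, each of which blocks in gen_tcp:accept inside acceptEchoLoop. When a telnet session connects and sends "hello world", the <0.36.0> process takes the connection. When another telnet session connects, the <0.37.0> process picks it up. The interleaving near the end (<0.36.0> receiving "a" between two of <0.37.0>'s characters) shows both connections being served at the same time, so we can handle many connections concurrently.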
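For comparison, here's a rough Java mirror of what the Erlang version does: N "gerbil" threads all blocking in accept() on one shared ServerSocket, each looping accept, read/echo, close, just like acceptEchoLoop. It's a sketch assuming plain threads and blocking java.io sockets; my understanding is that concurrent accept() calls on one java.net.ServerSocket are fine (each connection goes to exactly one caller), but treat that as an assumption to check on your JDK. Class and thread names are made up.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class SharedAcceptEchoServer {
    // Starts workerCount threads that all accept() on the same listen socket,
    // like the Erlang gerbils all calling gen_tcp:accept on one ListenSocket.
    public static ServerSocket start(int port, int workerCount) throws IOException {
        ServerSocket listenSocket = new ServerSocket(port);
        for (int i = 0; i < workerCount; i++) {
            Thread gerbil = new Thread(() -> acceptEchoLoop(listenSocket), "gerbil-" + i);
            gerbil.setDaemon(true); // idle workers shouldn't keep the JVM alive on their own
            gerbil.start();
        }
        return listenSocket;
    }

    // Accept one connection, echo until the client goes away, close it, repeat.
    private static void acceptEchoLoop(ServerSocket listenSocket) {
        while (!listenSocket.isClosed()) {
            try (Socket socket = listenSocket.accept()) {
                System.out.printf("%s will chat with %s%n",
                        Thread.currentThread().getName(), socket.getRemoteSocketAddress());
                InputStream in = socket.getInputStream();
                OutputStream out = socket.getOutputStream();
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                    out.flush();
                }
            } catch (IOException e) {
                // closing the listen socket makes accept() throw; only complain otherwise
                if (!listenSocket.isClosed()) {
                    System.out.println("Sad gerbil: " + e.getMessage());
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        start(8888, 10);
        Thread.sleep(Long.MAX_VALUE); // main just waits; the daemon gerbils do the work
    }
}
```

The big difference is cost: each gerbil here is an OS thread with its own stack, so you wouldn't casually spin up one per client the way you can with Erlang processes.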

Disclaimer: I have used the things I describe as complicated below "for real" (e.g., in production), but I have not used Erlang in production, so the next paragraph is partly speculation.

In "normal" server coding, each thread or process started to handle communication with a single client is quite expensive, so we find ourselves doing complex, fiddly multiplexing of many clients onto a single processing thread where possible, using I/O completion ports, Java NIO, thread pools, and so on. As Erlang gives us a very cheap process (an Erlang process, not an OS process), we can afford to spawn a new process for each client, and this may lend itself to a simpler and thus less error-prone concurrent programming model.

Wednesday, January 26, 2011

Erlang vs Java: Simple Echo Server

For no particularly good reason, one of my favorite test programs for getting familiar with a new programming language is a dead simple (read: improper error handling) TCP/IP server that reads bytes from the client and echoes them back. E.g.:
  1. listen
  2. accept
  3. Until the socket goes away
    1. recv
    2. send [what we recv'd]
  4. Go to step 2 :)
I'm playing with Erlang lately to expand my coding horizons a little in the functional direction, so let's see what this looks like in newbie-Erlang:
%% blocking echo server, take 1

-module(echo).

-export([server/1]).

%% happy path
server(Port) when is_integer(Port), Port > 0, Port =< 65535 ->
  io:format("Initializing echo on port ~w~n", [Port]),
  {ok, ListenSocket} = gen_tcp:listen(Port, [binary, {packet, 0}, {active, false}]),
  listenLoop(ListenSocket);
%% bad server args  
server(Port) -> 
  io:format("Invalid port specification; must be int, 1<=port<=65535. '~p' is invalid~n", [Port]).
  
listenLoop(ListenSocket) ->
  io:format("Blocking on accept...~n"),
  {ok, Socket} = gen_tcp:accept(ListenSocket), %%block waiting for connection
  
  %% Show the address of client & the port assigned to our new connection
  case inet:peername(Socket) of
    {ok, {Address, Port}} ->
      io:format("hello ~p on ~p~n", [Address, Port]);
    {error, Reason} ->
      io:format("peername failed :( ~p~n", [Reason])
  end,
  receiveAndEcho(Socket),
  io:format("Gracefully closing ur socket!~n"),
  ok = gen_tcp:close(Socket),
  listenLoop(ListenSocket).
  
receiveAndEcho(Socket) ->
  %% block waiting for data...
  case gen_tcp:recv(Socket, 0, 60 * 1000) of
    {ok, Packet} ->
      io:format("recv'd ~p!~n", [Packet]),
      gen_tcp:send(Socket, Packet),
      receiveAndEcho(Socket);
    {error, Reason} ->
      io:format("~p~n", [Reason])
  end.
This is about as simple as we can go. Compiled and launched from the Erlang shell with something like c(echo). followed by echo:server(8888)., our server boots up, listens on the port specified, blocks waiting to accept an incoming connection, echoes what it reads from the new socket until something goes awry with the read/echo, and then goes back to listening. After starting our server we can do exciting things like telnet localhost 8888 and see our characters echo back to us!

This seems trivial but there are actually a couple of cool things about this little snippet.

First cool thing: our server(Port) function uses a guard (the when ... clause) to ensure a valid port, i.e. an integer between 1 and 65535, opens a listen socket with gen_tcp:listen, and launches our core listening loop.

Second cool thing: our listening loop is implemented via recursion:
listenLoop(ListenSocket) ->
  ...stuff...,
  listenLoop(ListenSocket).
At a glance this looks like a stack overflow disaster in waiting: listenLoop doesn't loop, it just calls listenLoop again!! This is actually OK because Erlang does last call optimization (tail-call elimination). That is, we are NOT piling up stack frames indefinitely here; if the last thing a function does is call another function (or itself), the current stack frame is dropped (or reused) rather than having a new one pushed on top of it. So server calls listenLoop and the stack is [server][listenLoop]. When listenLoop calls itself, its frame is dropped (or potentially reused; it doesn't really matter), so the stack is still [server][listenLoop], NOT [server][listenLoop][listenLoop]. (Strictly, server's own call to listenLoop is also a last call, so even the [server] frame goes away; what matters is that the depth never grows.) Further details here.
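By way of contrast, Java (HotSpot, at least as far as I know) does not eliminate tail calls, so a method that "loops" by calling itself really does pile up frames; that's why the Java version below uses for (;;). A small sketch with made-up names; the depth of 10,000,000 is just an arbitrary number far beyond what a default thread stack holds:

```java
public class TailCallDemo {
    // Shaped like listenLoop: the recursive call is the very last thing we do,
    // yet Java still pushes a new stack frame for every call.
    static long countDownRecursive(long n) {
        if (n == 0) {
            return 0;
        }
        return countDownRecursive(n - 1);
    }

    // What Java wants instead: an explicit loop with constant stack usage.
    static long countDownLoop(long n) {
        while (n != 0) {
            n--;
        }
        return n;
    }

    // True if the recursive version ran out of stack at this depth.
    static boolean recursionOverflows(long depth) {
        try {
            countDownRecursive(depth);
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        long depth = 10_000_000L;
        System.out.println("loop finished with: " + countDownLoop(depth));
        System.out.println("recursion overflowed: " + recursionOverflows(depth));
    }
}
```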

Third cool thing, I think the Erlang version reads better than the rough equivalent in Java:
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;


public class SimplestEchoServer {
 public static void main(String[] argv) throws Exception {
  int port = parsePort(argv);
  if (0 != port) {
   System.out.printf("Initializing echo on port %s\n", port);
   server(port);
  }    
 }
 
 //that's right, we're pretty much not error handling!
 private static void server(int port) throws Exception {
  ServerSocket listenSocket = new ServerSocket(port);
  for (;;) {
   Socket socket = null;
   try {
    socket = listenSocket.accept();
    InputStream is = socket.getInputStream();
    OutputStream os = socket.getOutputStream();
    int b;
    while ((b = is.read()) != -1) {
     os.write(b);
    }    
    os.flush();
   } catch (Exception e) {
    System.out.println("Sad panda: " + e.getMessage());    
   }
   closeQuietly(socket);
  }
 }

 private static void closeQuietly(Socket socket) { 
  if (null == socket || socket.isClosed()) {
   return;
  }
  try {   
   socket.close();   
  } catch (IOException ioe) {
   System.out.println("Failed to close socket; keep on truckin': " + ioe.getMessage());
  }
 }

 private static int parsePort(String[] argv) {
  int port = 0;
  String rawPort = "";
  if (argv.length == 1) {   
   rawPort = argv[0];
   try {
    port = Integer.parseInt(rawPort);
   } catch (NumberFormatException nfe) {
    port = 0; 
   } 
  }
  if (port < 1 || port > 65535) {
   printArgs(rawPort);
   port = 0; // tell main() not to start the server
  }   
  return port;
 }

 private static void printArgs(String arg) {
  System.out.printf("Invalid port specification; must be int, 1<=port<=65535. '%s' is invalid\n", arg);
 }
}

Making it multi-threaded, perhaps less reliant on blocking, and so on would make both versions much uglier.

I'm not sure I want to rewrite my Java applications in Erlang anytime soon, but I do think it would be splendiferous if Java stole some of Erlang's spiffy features! As someone else said, "I do hope more languages will borrow concepts from Erlang. It's not the end all be all, but it's a damn good step" (ref).

Erlang Guards for Happy Pandas

As a longtime developer in C-like languages (mostly Java lately), I find guard clauses (see here) tremendously useful. I vastly prefer a few simple checks at the top, each returning or throwing as appropriate, to arrow code (see here).

In the odd free minute I have been playing with Erlang and one of the features I am quite taken with is guards (ref manual, 7.24). For example, suppose you want to do something only if an argument is an int within a specific range. In Java one might do something like this:
private void doStuff(int arg) {
  if (arg < 1 || arg > 65535) {
    throw new IllegalArgumentException("port must be between 1 and 65535; " + arg + " is invalid.");
  }
  //do stuff
}
That looks kind of normal but what if there was a way to better split out the error handling from the happy path? Erlang guards offer exactly this functionality:
server(Port) when is_integer(Port), Port > 0, Port =< 65535 ->
  io:format("Initializing echo on port ~w~n", [Port]);
%%look at me; I'm a completely different overload-esque function clause for the 'error' case
server(Port) -> 
  io:format("Invalid port specification; must be int, 1<=port<=65535. '~p' is invalid~n", [Port]).
The 'when ...' part specifies the conditions under which that clause runs. If the guard fails, Erlang moves on and tries the next clause, so in many cases we can trivially split the happy path from the sad panda branches.

I want this in Java damnit!
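Java has no guards as such, but the clause-picking part can be faked. Here's a minimal sketch of a hypothetical Guarded helper (not a real library; it needs Java 8+ lambdas) that tries (guard, body) pairs in the order they were added and falls through to a catch-all, roughly the way Erlang picks the first clause whose guard succeeds:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical helper: an ordered list of guarded clauses, like an Erlang function's clauses.
public class Guarded<T, R> {
    private final List<Predicate<T>> guards = new ArrayList<>();
    private final List<Function<T, R>> bodies = new ArrayList<>();

    // Adds a clause; clauses are tried in the order they were added.
    public Guarded<T, R> when(Predicate<T> guard, Function<T, R> body) {
        guards.add(guard);
        bodies.add(body);
        return this;
    }

    // Builds the function: run the first clause whose guard passes, else the fallback.
    public Function<T, R> otherwise(Function<T, R> fallback) {
        return arg -> {
            for (int i = 0; i < guards.size(); i++) {
                if (guards.get(i).test(arg)) {
                    return bodies.get(i).apply(arg);
                }
            }
            return fallback.apply(arg);
        };
    }

    public static void main(String[] args) {
        Function<Object, String> server = new Guarded<Object, String>()
                .when(p -> p instanceof Integer && (Integer) p > 0 && (Integer) p <= 65535,
                      p -> "Initializing echo on port " + p)
                .otherwise(p -> "Invalid port specification; must be int, 1<=port<=65535. '"
                        + p + "' is invalid");

        System.out.println(server.apply(8888));
        System.out.println(server.apply(70000));
        System.out.println(server.apply("eighty"));
    }
}
```

It's clunkier than the real thing (no pattern matching, casts inside the guard), which rather proves the point.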

Wednesday, January 5, 2011

Simple VisualVM gotchas with local processes

VisualVM is pretty awesome. However, for me it had a couple of slightly non-obvious gotchas the very first time I tried to use it to monitor local processes (ironically, monitoring remote processes has always worked just as advertised). First, it only automagically 'sees' processes using the same runtime environment; second, my copy seems to freeze if it tries to connect to a process blocked on reading stdin (e.g. something like System.in.read()).


Problem 1: It only 'sees' processes using the same runtime environment
The documentation claims that "When you start Java VisualVM, a node for each local Java application is displayed under the Local node in the Applications window". As such I found it surprising when I launched a local Java process, started VisualVM to monitor it, and did not see my application listed under the Local node. WTF! 


It turns out that VisualVM only lists applications started using the same runtime. For example, if you start an application using C:\Program Files\Java\jre1.6.0_07\bin\java.exe and start VisualVM using C:\Program Files\Java\jdk1.6.0_07\bin\jvisualvm.exe, it does NOT list your application. If you run your application using the copy of java.exe or javaw.exe from C:\Program Files\Java\jdk1.6.0_07\bin\ (the same runtime VisualVM is using), then all will be well.


This is simple enough once you know, and bloody aggravating until you figure out what's going on!
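If you're not sure which runtime a process is really on, one low-tech check is to have it print the java.home system property and compare that with where jvisualvm.exe lives. A minimal sketch (the class name WhichRuntime is made up). Note that for a Java 6 JDK, java.home points at the JDK's jre subdirectory, e.g. C:\Program Files\Java\jdk1.6.0_07\jre, rather than the JDK root. The RuntimeMXBean name is usually pid@hostname on HotSpot, which helps match the process to what VisualVM lists, but that format isn't guaranteed.

```java
import java.lang.management.ManagementFactory;

public class WhichRuntime {
    // Install directory of the runtime this JVM was launched from.
    static String javaHome() {
        return System.getProperty("java.home");
    }

    // Usually "pid@hostname" on HotSpot; the exact format is not guaranteed.
    static String jvmName() {
        return ManagementFactory.getRuntimeMXBean().getName();
    }

    public static void main(String[] args) {
        System.out.println("java.home    = " + javaHome());
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("jvm name     = " + jvmName());
    }
}
```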


Problem 1.1: Eclipse might not use the same runtime environment as VisualVM
I typically use Eclipse to edit Java code and on occasion I find it helpful to launch a process from Eclipse and then monitor it with VisualVM. Naturally this doesn't work at all if Eclipse uses a different runtime environment than VisualVM. This section notes how to correct this problem.


If you start a process from within Eclipse, it may be set (under Window>Preferences>Java>Installed JREs in Eclipse 3.6) to use the default JRE rather than the JDK.


If the JDK is not listed, we need to add it: click Add..., choose Standard VM as the JRE type, and enter the JRE home directory, e.g. C:\Program Files\Java\jdk1.6.0_07. The JDK should then show up in the Installed JREs list (tick it if you want it to be the workspace default).


Next we need to make sure the project uses the appropriate system library. Right-click the project, choose Properties, then Java Build Path; on the Libraries tab you should see the JRE System Library. If it is not set to the same JDK we will launch VisualVM from, we'll need to change it; for example, it might be set to use a JRE rather than the JDK.


In this case, change it to the JDK by removing the JRE entry, clicking Add Library, choosing JRE System Library, and selecting the JRE to use: either Workspace default JRE (if you set the JDK as the default) or Alternate JRE, explicitly specifying the JDK for the project.


Once the project is set to use the same JDK that is providing VisualVM, you should find VisualVM automagically detects your process. Hooray!


Problem 2: It will freeze connecting to a process blocked on stdin
Having corrected problem 1, we can now see our process under the Applications node.




However, with the process blocked on a read of stdin, double-clicking it put VisualVM into a seemingly indefinite wait (seemingly, because I terminated it after 60s), displaying its helpful progress bar at the bottom right.


Unfortunately I don't know why this should be. If the process is blocked some other way (most simply in Thread.sleep()), VisualVM seems to connect fine; it just hates to bother a process waiting for user input.


This occurs consistently for me on Windows Server 2003 SP2, Java 1.6.0_23.
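If you want to poke at this yourself, here's a minimal sketch of a target program (BlockingTarget is a made-up name) that either blocks on System.in.read(), the case that hung VisualVM for me, or sits in Thread.sleep(), the case that connected fine. Run it with stdin or sleep as the argument and then try connecting; I haven't pinned down the cause, so this is only a repro harness, not an explanation.

```java
import java.io.IOException;

public class BlockingTarget {
    // Blocks the calling thread in one of two ways so you can compare how VisualVM connects.
    static void block(String mode, long sleepMillis) throws IOException, InterruptedException {
        if ("stdin".equals(mode)) {
            System.out.println("Blocking on System.in.read(); press Enter to exit");
            System.in.read();
        } else {
            System.out.println("Sleeping for " + sleepMillis + " ms");
            Thread.sleep(sleepMillis);
        }
    }

    public static void main(String[] args) throws Exception {
        String mode = args.length > 0 ? args[0] : "stdin";
        block(mode, 10 * 60 * 1000L); // ten minutes is plenty of time to attach
    }
}
```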