So, how do we track a standard deviation from a series of samples? We could keep all samples in memory but that sounds bad; ideally we'd just like a few variables used to keep an running count of the stat. Luckily Knuth solved this problem for us, and even more luckily a statistician has written sample code in C# for us at http://www.johndcook.com/standard_deviation.html.
In Java this winds up looking remarkably similar:
//class public class RunningStat { private int m_n; private double m_oldM; private double m_newM; private double m_oldS; private double m_newS; public RunningStat() { m_n = 0; } public void clear() { m_n = 0; } public void addSample(double sample) { m_n++; // See Knuth TAOCP vol 2, 3rd edition, page 232 if (m_n == 1) { m_oldM = m_newM = sample; m_oldS = 0.0; } else { m_newM = m_oldM + (sample - m_oldM)/m_n; m_newS = m_oldS + (sample - m_oldM)*(sample - m_newM); // set up for next iteration m_oldM = m_newM; m_oldS = m_newS; } } public int getNumSamples() { return m_n; } public double getMean() { return (m_n > 0) ? m_newM : 0.0; } public double getVariance() { return ( (m_n > 1) ? m_newS/(m_n - 1) : 0.0 ); } public double getStdDev() { return Math.sqrt(getVariance()); } } //usage RunningStat timeStat = new RunningStat(); ... long time = System.nanoTime(); //do the thing we are tracking... time = (System.nanoTime() - time) / (1000*1000); //ms timeStat.addSample(time);
This does have some limitations for a typical Java web application, in particular it isn't threadsafe. We can simply slap synchronized on it, using a private lock to avoid unwanted publicity of our synchronization primatives:
public class RunningStat { private int m_n; private double m_oldM; private double m_newM; private double m_oldS; private double m_newS; private Object m_lock = new Object(); public RunningStat() { m_n = 0; } public void clear() { synchronized(m_lock) { m_n = 0; } } public void addSample(double sample) { synchronized(m_lock) { m_n++; // See Knuth TAOCP vol 2, 3rd edition, page 232 if (m_n == 1) { m_oldM = m_newM = sample; m_oldS = 0.0; } else { m_newM = m_oldM + (sample - m_oldM)/m_n; m_newS = m_oldS + (sample - m_oldM)*(sample - m_newM); // set up for next iteration m_oldM = m_newM; m_oldS = m_newS; } } } public int getNumSamples() { synchronized(m_lock) { return m_n; } } public double getMean() { synchronized(m_lock) { return (m_n > 0) ? m_newM : 0.0; } } public double getVariance() { synchronized(m_lock) { return ( (m_n > 1) ? m_newS/(m_n - 1) : 0.0 ); } } public double getStdDev() { synchronized(m_lock) { return Math.sqrt(getVariance()); } } }This leaves open the question of how we actually distribute references to the stat. For example, it could be that anytime a given handler executes we want to update a stat, and we want to grab that stat to use in a periodic stat logger, and we want to grab it for our view statistics page. All these locations need to get a reference to the same instance of RunningStat. If we use something like Spring we can trivially inject the same reference. If not we may wish to manage a stats registry ourselves. A simple implementation using ConcurrentHashMap might look something like this:
import java.util.Collection; import java.util.concurrent.ConcurrentHashMap; import java.util.concurrent.ConcurrentMap; public class StatisticsRegistry { private static final StatisticsRegistry instance = new StatisticsRegistry(); private final ConcurrentMapClients can request a stat by name and clients like a stat logger or viewer can grab either all stats or specific ones as needed. As the same instance is always returned it is safe for clients to write code to retain RunningStat references.statById = new ConcurrentHashMap (); private final Object writeLock = new Object(); public static StatisticsRegistry getInstance() { return instance; } public RunningStat getNamedStatistic(String name) { /** * Usually the stat will exist; avoid extra sync ops on write lock when possible */ RunningStat stat = statById.get(name); if (null == stat) { synchronized(writeLock) { //someone else could have just inserted it; if so putIfAbsent will return the value they put in RunningStat existing = statById.putIfAbsent(name, new RunningStat()); if (existing != null) { stat = existing; } } } return stat; } public Collection getAllStatistics() { return statById.values(); } }