Friday, September 23, 2011

Tracking a running standard deviation

It is fairly common to want to track statistics on your software as it runs. Sometimes it makes sense to scrape logs and aggregate a massive set of samples. Other times you really want "live" numbers; eg running statistics. Most commonly, arithmetic mean and standard deviation. Tracking arithmetic mean alone is relatively easy but it can be very misleading. I frequently see claims of "good performance" based on mean alone which completely break down when it turns out that if we consider standard deviation a significant percentage of our population is actually experiencing something much, much worse than the mean.

So, how do we track a standard deviation from a series of samples? We could keep all samples in memory but that sounds bad; ideally we'd just like a few variables used to keep an running count of the stat. Luckily Knuth solved this problem for us, and even more luckily a statistician has written sample code in C# for us at http://www.johndcook.com/standard_deviation.html.

In Java this winds up looking remarkably similar:
//class
public class RunningStat {
	private int m_n;
	private double m_oldM;
	private double m_newM;
	private double m_oldS;
	private double m_newS;
	
	public RunningStat() {
		m_n = 0;
	}
	
	public void clear() { m_n = 0; }
	
	public void addSample(double sample) {
        m_n++;

        // See Knuth TAOCP vol 2, 3rd edition, page 232
        if (m_n == 1)
        {
            m_oldM = m_newM = sample;
            m_oldS = 0.0;
        }
        else
        {
            m_newM = m_oldM + (sample - m_oldM)/m_n;
            m_newS = m_oldS + (sample - m_oldM)*(sample - m_newM);

            // set up for next iteration
            m_oldM = m_newM; 
            m_oldS = m_newS;
        }
	}
	
	public int getNumSamples() { return m_n; }
	public double getMean() { return (m_n > 0) ? m_newM : 0.0; }
	public double getVariance() { return ( (m_n > 1) ? m_newS/(m_n - 1) : 0.0 ); }
	public double getStdDev() { return Math.sqrt(getVariance()); }	
}

//usage
RunningStat timeStat = new RunningStat();		
...
long time = System.nanoTime();			
//do the thing we are tracking...						
time = (System.nanoTime() - time) / (1000*1000); //ms
			
timeStat.addSample(time); 

This does have some limitations for a typical Java web application, in particular it isn't threadsafe. We can simply slap synchronized on it, using a private lock to avoid unwanted publicity of our synchronization primatives:
public class RunningStat {
	private int m_n;
	private double m_oldM;
	private double m_newM;
	private double m_oldS;
	private double m_newS;
	
	private Object m_lock = new Object();
	
	public RunningStat() {
		m_n = 0;
	}
	
	public void clear() {
		synchronized(m_lock) {
			m_n = 0;
		}
	}
	
	public void addSample(double sample) {
		synchronized(m_lock) {
	        m_n++;
	
	        // See Knuth TAOCP vol 2, 3rd edition, page 232
	        if (m_n == 1)
	        {
	            m_oldM = m_newM = sample;
	            m_oldS = 0.0;
	        }
	        else
	        {
	            m_newM = m_oldM + (sample - m_oldM)/m_n;
	            m_newS = m_oldS + (sample - m_oldM)*(sample - m_newM);
	
	            // set up for next iteration
	            m_oldM = m_newM; 
	            m_oldS = m_newS;
	        }
		}
	}
	
	public int getNumSamples() { synchronized(m_lock) { return m_n; } }
	public double getMean() { synchronized(m_lock) { return (m_n > 0) ? m_newM : 0.0; } }
	public double getVariance() { synchronized(m_lock) { return ( (m_n > 1) ? m_newS/(m_n - 1) : 0.0 ); } }
	public double getStdDev() { synchronized(m_lock) { return Math.sqrt(getVariance()); } }	
}
This leaves open the question of how we actually distribute references to the stat. For example, it could be that anytime a given handler executes we want to update a stat, and we want to grab that stat to use in a periodic stat logger, and we want to grab it for our view statistics page. All these locations need to get a reference to the same instance of RunningStat. If we use something like Spring we can trivially inject the same reference. If not we may wish to manage a stats registry ourselves. A simple implementation using ConcurrentHashMap might look something like this:
import java.util.Collection;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class StatisticsRegistry {
	private static final StatisticsRegistry instance = new StatisticsRegistry();				
	
	private final ConcurrentMap statById = new ConcurrentHashMap(); 
	private final Object writeLock = new Object();
	
	public static StatisticsRegistry getInstance() { return instance; }
	
	public RunningStat getNamedStatistic(String name) {
		/**
		 * Usually the stat will exist; avoid extra sync ops on write lock when possible
		 */
		RunningStat stat = statById.get(name);
		
		if (null == stat) {
			synchronized(writeLock) {
				//someone else could have just inserted it; if so putIfAbsent will return the value they put in
				RunningStat existing = statById.putIfAbsent(name, new RunningStat());
				if (existing != null) {
					stat = existing;
				}
			}
		}
		
		return stat;
	}
		
	public Collection getAllStatistics() {
		return statById.values();
	}
}
Clients can request a stat by name and clients like a stat logger or viewer can grab either all stats or specific ones as needed. As the same instance is always returned it is safe for clients to write code to retain RunningStat references.

10 comments:

navya said...

I have read your blog its very attractive and impressive. I like it your blog.

Java Online Training Java EE Online Training Java EE Online Training Java 8 online training Core Java 8 online training

Java Online Training from India Java Online Training from India Core Java Training Online Core Java Training Online Java Training InstitutesJava Training Institutes

htop said...

thanks for sharing this information
azure training in chennai
best devops training in chennai
best hadoop training in chennai
best hadoop training in omr
hadoop training in sholinganallur

i Digital Academy said...

Nice post!! Thanks for sharing...
SEO Services Bangalore

wwwnortoncomsetup.com said...

I'm so glad and enjoyed your BLOG, It is very informative on the subject or topic, and Thanks For Sharing this post. I have something to share here norton.com/setup

thenortonsetup.com said...

Thank you for sharing this blog, It is an amazing blog, I am really impressed by your blog, It’s really useful norton.com/setup
www.norton.com/setup

avg-retail.com said...

I found this is an informative blog and also very useful and knowledgeable. I would like to thank you for the efforts you have made in writing this blog www.avg.com/retail

www.avg.com/activate

avg.com/activate

avg.com/retail

Norton Product Key said...

There are many blogs I have read But when I read Your Blogs I have found such useful information, fresh content with such amazing editing everything is superb in your blog, Thank you so much for sharing this useful and informative information with us. Wish you all the best for upcoming comments. And I have also few informative links which I am going share here. wwww.norton.com/setup enter product key
www.norton.com/setup, Norton product key, Norton Setup, norton.com/setup

john said...

Great Article android based projects

Java Training in Chennai

Project Center in Chennai

Java Training in Chennai

projects for cse

The Angular Training covers a wide range of topics including Components, Angular Directives, Angular Services, Pipes, security fundamentals, Routing, and Angular programmability. The new Angular TRaining will lay the foundation you need to specialise in Single Page Application developer. Angular Training

www.webhealthmart.com said...

Nice post, you give the clear idea of the topic and I subscribed your blog, I have some links to share here

Buy Ambien Online

Order Ambien Online

Buy Ambien 10mg Online



xanax bars for sale


fastest Modafinil delivery


ativan 2mg buy online


ativan 1mg tablet buy online


www.buymodapills.com said...

I'm so glad and enjoyed your BLOG, It is very informative on the subject or topic, and Thanks For Sharing this post. I have something to share here
buy modalert online

buy artvigil online

buy waklert online

buy modvigil online


buy vilafinil online

Post a Comment