2009-06-30

A C++ class for histogramming data

Another recurring issue; How to effectively create a histogram of some data? A histogram is also called a frequency distribution.

I have created a small class in C++ that provides a histogram and can count the frequency of data values within a range. The code is supplied below for your amusement. If you have any idea on how to improve on the code, please let me know in a comment. Of course, the standard template library could be used, e.g. std::vector<unsigned int>, if you like to have an even more simple class. But I like code that have no includes :-)

Update 2011-06-17: I added code to keep track of overflows, i.e. counting the number off calls to the Add method with an argument that would not fit in the histogram range.

Update 2011-08-03: Added track of underflows, similar to the overflow addition 2011-06-17.

// A histogram class.
// The Histogram object can keep a tally of values
// within a range, the range is arranged into some
// number of bins specified during construction.
// Any allocation of a Histogram object may throw
// a bad_alloc exception.
// Dedicated to the public domain.
// Jansson Consulting
// 2009-06-30, updated 2011-06-17 and 2011-08-03
class Histogram
{
   public:
      // Construct a histogram that can count
      // within a range of values.
      // All bins of the histogram are set to zero.
      Histogram(
            const double& Start,
            const double& End,
            const unsigned int& nBins):
         Start(Start),
         nBins_by_interval(nBins/(End-Start)),
         nBins(nBins),
         freq( new unsigned int[nBins] ),
         overflow(0U),
         underflow(0U)
      {
         for(unsigned int i(0); i < nBins; ++i)
            freq[i] = 0U;
      }
      // Construct a histogram by copying another one.
      Histogram(const Histogram& other):
         Start(other.Start),
         nBins_by_interval(other.nBins_by_interval),
         nBins(other.nBins),
         freq( new unsigned int[nBins] ),
         overflow(other.overflow),
         underflow(other.underflow)
      {
         for(unsigned int i(0); i < nBins; ++i)
            freq[i] = other.freq[i];
      }
      // Deallocate the memory that was allocated for
      // the tallied counts.
      ~Histogram() {delete[] freq;}
      // Set this histogram equal to another.
      Histogram& operator=(const Histogram& other)
      {
         if( this != &other )
         {
            Start = other.Start;
            nBins_by_interval = other.nBins_by_interval;
            if( nBins != other.nBins )
            {
               nBins = other.nBins;
               delete[] freq;
               freq = new unsigned int[nBins];
            }
            for(unsigned int i(0); i < nBins; ++i)
               freq[i] = other.freq[i];
            overflow = other.overflow;
            underflow = other.underflow;
         }
         return *this;
      }
      // Increase the count for the bin that holds a
      // value that is in range for this histogram or
      // the under-/overflow count if it is not in range.
      void Add(const double& x)
      {
         if( x < Start )
            underflow++;
         else
         {
            const unsigned int i(
                  static_cast<unsigned int>(
                     (x-Start)*nBins_by_interval) );
            if( i < nBins ) freq[i]++;
            else overflow++;
         }
      }
      // Get the sum of all counts in the histogram.
      unsigned int GetTotalCount() const
      {
         unsigned int c(0U);
         for( unsigned int i(0); i < nBins; ++i )
            c += freq[i];
         return c;
      }
      // Get the overflow count.
      unsigned int GetOverflow() const
      {
         return overflow;
      }
      // Get the underflow count.
      unsigned int GetUnderflow() const
      {
         return underflow;
      }
   private:
      double Start,nBins_by_interval;
      unsigned int nBins;
      unsigned int* freq;
      unsigned int overflow;
      unsigned int underflow;
};

2 kommentarer:

  1. This class will break for data points that occur at the upper limit of your data range

    ReplyDelete
  2. How would it break? The code always checks the bin to be within the array.

    ReplyDelete