2012-04-07

Radioactive decay from a Monte Carlo perspective

We know from the laws of physics that the radioactive decay of an isotope is a random process. The probability of decay per unit time is constant, independent of time, and is given by the decay rate λ. This implies that the probability for an isotope to decay in a sufficiently small time period dt is given by λdt (when λdt << 1). There are cases for which the decay rate is not constant but they will not be elaborated here.

Suppose that we have N0 atoms of an isotope with decay rate λ at the beginning of time. The mathematical formula that describes the number of atoms present at a later time is given by

However, we will try to arrive at the same exponential decay form by using a Monte Carlo technique.

Assuming that we are looking at the number of atoms in sufficiently small time steps so that λdt << 1. The probability that one of these atoms decay within each time step is λdt so we can test if an atom have decay or not by comparing a random number generated from a uniform distribution [0,1) by this probability. This comparison is performed for all atoms, in each time step. Let us write this in pseudo code:

number of atoms = N
for each time step dt:
   for each atom:
      R = random number between 0 and 1
      if( R < lambda * dt ): N = N-1
   Save N for this time step.

So, if we generate a random number that is smaller than λdt, we say that it decays. On the average, after many samples of this kind, the fraction of decayed atoms will converge to the true fraction that would have decayed if we infinitely many atoms.

Let us look at 137Cs. It has a half life of 11018.3 ± 9.5 days, according to NIST. Its decay constant is ln(2)·T-1½ ≈ 22.977·10-3 per year. Suppose we start with 1000 atoms of 137Cs. Running the code above, coded in Javascript with a time step of one year, we get the following number of remaining atoms for the first few years:


We can plot the number of remaining atoms as a function of time:

(View only this post to see the plot.)
Time [years]

You can compare the number of remaining atoms with the theoretical exponential decay function. The number of atoms calculated with the Monte Carlo technique is in better agreement with reality since nature is probabilistic as demonstrated above. At one half life, at about 30 years for 137Cs, we have about half of the atoms remaining.

2012-03-21

Parallelize standard C++ programs using GNU g++

Nowadays, most new computers are shipped with several CPU cores. The trend seem to be going to many or even multi core systems. So, writing code that executes in parallel is becoming more and more important if you want to harness the full potential of your platform.

The GNU compiler for C++ (g++) have at least since version 4.3 been able to parallelize several algorithms from the standard C++ library using its buit-in implementation of OpenMP. As of version 4.6.3, the following algorithms or functions can be parallelized:

From <numeric>:
std::accumulate
std::adjacent_difference
std::inner_product
std::partial_sum
From <algorithm>:
std::adjacent_find
std::count
std::count_if
std::equal
std::find
std::find_if
std::find_first_of
std::for_each
std::generate
std::generate_n
std::lexicographical_compare
std::mismatch
std::search
std::search_n
std::transform
std::replace
std::replace_if
std::max_element
std::merge
std::min_element
std::nth_element
std::partial_sort
std::partition
std::random_shuffle
std::set_union
std::set_intersection
std::set_symmetric_difference
std::set_difference
std::sort
std::stable_sort
std::unique_copy

To test this functionality, I wrote a simple programs that generate many random numbers in a vector followed by a sort. You can find the program below. To turn on the parallelizing feature of GNU g++, the following options need to be supplied to the compiler:

-fopenmp -D_GLIBCXX_PARALLEL -march=native

The -march is needed since atomic operations must be used. On a Mac, the following compiler options could be used (-fast includes -mcpu):

-fopenmp -D_GLIBCXX_PARALLEL -fast

Here's the standard C++ program:

#include <algorithm>
#include <cstdlib>
#include <ctime>
#include <iostream>
#include <stdexcept>
#include <vector>
      
int main()
{     
   using namespace std;
   try
   {
      srand(time(NULL));
   
      vector<int> v(100000000);
      cout << "Generating " << v.size() << " random numbers.\n";
      generate(v.begin(),v.end(),rand);

      cout << "Sorting.\n";
      sort(v.begin(),v.end());

      return EXIT_SUCCESS;
   }
   catch(const exception& e)
   {
      cerr << e.what() << '\n';
      return EXIT_FAILURE;
   }
}

Here is a CmakeLists.txt file for CMake to generate build scripts for this program:

cmake_minimum_required (VERSION 2.6)
project ("Test of GNU g++ parallel")

if(CMAKE_COMPILER_IS_GNUCC)
   execute_process(COMMAND ${CMAKE_C_COMPILER} -dumpversion
         OUTPUT_VARIABLE GCC_VERSION)
   if (GCC_VERSION VERSION_GREATER 4.3
         OR GCC_VERSION VERSION_EQUAL 4.3)
      message(STATUS "GNU gcc version >= 4.3")
      set(CMAKE_CXX_FLAGS
            "-fopenmp -D_GLIBCXX_PARALLEL -march=native")
      # On a Mac:
      #set(CMAKE_CXX_FLAGS "-fopenmp -D_GLIBCXX_PARALLEL -fast")
      # This works as well:
      #add_definitions(-D_GLIBCXX_PARALLEL -fopenmp -fast)
   else()
      message(STATUS "GNU gcc version < 4.3.")
   endif()
endif()

# The program is written in 'test.cpp'.   
add_executable(test test.cpp)

If you have access to a more-than-one core machine, then you can relatively easily compile and run the program both with and without parallelizing the code to see the effect.

Happy execution!

2012-02-20

SQL: Cumulative histogram from histogram

Quite often you are given the task of constructing a cumulative histogram from frequency distributions. One reason for such a task might be that you want to sample random numbers from the original histogram.

Sampling a random number between 0 and 1, the normalized cumulative histogram could be used together with a search algorithm to find the corresponding bin. The search could be made quite efficient since the cumulative distribution is monotonically increasing.

Suppose we have defined the following table in a database, i.e. a histogram with weights (or counts) sorted into bins.

CREATE TABLE histogram(bin INTEGER, weight INTEGER);

The following two SQL statements will both produce the original histogram and the cumulative histogram.

SELECT h1.bin AS Bin,
       h1.weight AS Histogram,
       ( SELECT SUM(h2.weight)
         FROM histogram h2
         WHERE h2.bin<=h1.bin ) AS Cumulative
 FROM histogram h1;
SELECT h1.bin AS Bin,
       h1.weight AS Histogram,
       SUM(h2.weight) AS Cumulative
 FROM histogram h1
 LEFT OUTER JOIN histogram h2 ON h2.bin<=h1.bin
 GROUP BY h1.bin;

Here is some sample data:

BEGIN TRANSACTION;
INSERT INTO "histogram" VALUES(1,10);
INSERT INTO "histogram" VALUES(2,20);
INSERT INTO "histogram" VALUES(3,25);
INSERT INTO "histogram" VALUES(4,5);
INSERT INTO "histogram" VALUES(5,15);
INSERT INTO "histogram" VALUES(6,1);
INSERT INTO "histogram" VALUES(7,4);
INSERT INTO "histogram" VALUES(8,10);
INSERT INTO "histogram" VALUES(9,15);
INSERT INTO "histogram" VALUES(10,100);
COMMIT;

Executing the above statements using SQLite as the database engine will produce the follow output:

Bin         Histogram   Cumulative
----------  ----------  ----------
1           10          10        
2           20          30        
3           25          55        
4           5           60        
5           15          75        
6           1           76        
7           4           80        
8           10          90        
9           15          105       
10          100         205