Home / normal distribution pdf / Normal distribution - UCL HEP Group

Normal distribution - UCL HEP Group - normal distribution pdf

Random numbers and simulation

Please read all details of the task before beginning the exercise.

In this exercise we shall make use of a random number generator in Excel to simulate simple processes – an exponential decay and a random walk. Before doing so we shall make a small excursion into statistics by looking at some properties of a random number distribution.

Random numbers

The Excel function RAND() generates “random numbers” on the range 0 to 1. However as they are produced on a digital machine using a “formula”, they are really pseudo-random numbers. If the same “seed” is used in the formula, the same sequence of random numbers is obtained. However, the algorithm used is supposed to produce numbers which satisfy the statistical properties of true random numbers. These mathematical theorems have the familiar “as [pic]” in them. Time and space limitations of Excel prevent us doing more that demonstrating a few features with samples of a few thousand numbers.

Open a blank spreadsheet. Fill column A with 10000 random numbers in the interval -1 to 1. Use the Tools/Data Analysis/ Random Number Generator dialogue box with
• Number of variables = 1
• Number of random numbers = 10000
• Distribution = Uniform
• Parameters between -1 and 1
• Random number seed = any integer value
• Output range = select cell A1.
Use the AVERAGE and STDEV functions to find the average and standard deviation of this sample. In the spreadsheet insert the values to be expected on theoretical grounds for these two quantities and their uncertainties (errors). (See the Data Analysis note 1B40_DA_Lecture-2 if you are not sure of the theoretical answers.)

The distribution that was generated was a “uniform” one in the range -1 to 1. This means that there should be equal numbers of numbers in equal-sized intervals. Since our sample was only 10000 this will only be approximately true.
Produce a histogram of the number of “random numbers” in the range -1 to 1 in intervals of 0.1. Use the FREQUENCY function method as in the Normal distribution exercise. (Note that since the bin values will be -1, -0.9, -0.8. …0.9, 1.0, the way Excel treats these values as the upper value of the intervals, there will be no entries in the bin labelled -1.0 – there being no numbers in the sample less than or equal to -1.) Verify that the total number of entries is 10000.

Central limit theorem

We can use the random numbers that have been generated to illustrate an important theorem in statistics – the central limit theorem – which is the saving grace for many physicists!

When physicists measure a quantity they often assume that the probability distribution of the quantity is a Gaussian (Normal distribution), even when there is little or no evidence of this fact. (The number of counts per unit time from a radioactive source follow a Poisson distribution, but even a Poisson distribution approximates to a Gaussian as the number of counts get large.) With this assumption a standard deviation [pic]is determined with the (unsubstantiated) implication that 66.8% of the measurements would lie within [pic] of the mean. One justification as to why this approach is reasonable comes from the Central Limit Theorem, proved in some statistics books. If we make repeated measurements of a quantity the distribution of the measurements may have a variety of shapes, many of which are not Gaussian. The Central limit theorem shows that if we make several measurements and determine their mean, and repeat the process to obtain many estimates of the mean, then the distribution of these means will approximate to a Gaussian as the number of determinations becomes large.

We can consider the 10000 random numbers generated earlier as representing 100 samples each of 100 measurements of a quantity that has a uniform distribution. We will look at how the means of these samples are distributed. According to the Central limit theorem, the means themselves should follow a Gaussian distribution, even though the quantities themselves (the random numbers) are uniformly distributed.
• Calculate the average value of successive groups of 100 random numbers, i.e. those in cells A1:A100, A101:A200, A201:A300 … A9901:A10000.
• Plot a histogram of these means in the range -0.2 to 0.2 in intervals of 0.025. The plot should look “Gaussian” like!
Note: It can be tedious to have to type formulae like “=AVERAGE(A1:A101)” a 100 times to calculate the average of the 100 samples. The tedium can be reduced by using a textual formula and indirect addressing. This is illustrated by the table below.
|Column B |C |D |E |F |
|Row 33 |1 |100 |="A"&C33&":A"&D33 |=AVERAGE(INDIRECT(E33)) |
|Row 34 |101 |200 |="A"&C34&":A"&D34 |=AVERAGE(INDIRECT(E34)) |
|… |… |… |… |… |
|Row 132 |9901 |10000 |="A"&C132&":A"&D132 |=AVERAGE(INDIRECT(E132)) |

• In rows 33 and 34 of column C type the numbers 1, 101 respectively. Extend these to cover the range 1, 101, 201 … 9901. By a similar method establish the numbers 100, 200, 300 … 10000 in the adjacent column. In the first row of the next column of this block enter the textual formula ="A"&C33&":A"&D33. In the next column enter the formula =AVERAGE(INDIRECT(E33)).
• Copy these formulae through all 100 rows to get the averages.
The spreadsheet should look similar to the table above, unless you have used different rows and columns of your spreadsheet.

Or you can do it the hard way!

Exponential decay: radioactivity

If a sample of radioactive nuclei is studied it is found that the activity – number of decays per unit time – is proportional to the number of undecayed radioactive nuclei. This leads to the well know exponential decay law,
where[pic]is the number of nuclei remaining at time t and λ is the decay rate. This results in a probability distribution
This exponential decay distribution may be modelled using the random number generator. The uniform distribution of the random numbers may be mapped onto the exponential distribution by using
where RN is a random number and τ is the mean lifetime.

Use a second sheet in the workbook.
• Generate in some column 5000 decay times t according to the above formula. In Excel use “=-meanlife*LN(RAND())” where meanlife is the name referencing some cell in the spreadsheet. Choose a value for meanlife of about 4.
• The RAND function will generate new random numbers every time the worksheet is recalculated, which may slow down your work. To avoid this, you can switch off automatic recalculation in the Tools/Options/Calculation menu. If you do this, you need to press F9 when you want to recalculate the formulas in the spreadsheet.
• Using the frequency function obtain the number of decay times in the time range 0 – 20 in intervals of 0.2 time units.
• Produce X-Y plots of
o the number of decays
o the natural logarithm of the number of decays
as a function of time.
• Determine a value for the decay constant from these graphs
• Estimate the half-life of the radioactive nuclei - the time for the number of nuclei to fall to half its initial value.

Random walk

The second process you will simulate is what’s called a random walk. Suppose a very small particle, only slightly larger in size than a molecule of air, is allowed to fall freely under gravity. Its trajectory is recorded using a digital camera. The horizontal (x) position of the particle is located for each of 8000 frames and stored as a set of picture element values (pixels) relative to the x-position in the first frame. A theoretical analysis of this problem suggests that the average x-position of the particle over long periods of time is zero. However, the probability of finding the particle at a particular x-position should follow a normal (Gaussian) distribution centred on x = 0, with a variance which increases linearly with time (the frame number). Thus the standard deviation should increase as the square root of the time. This example is a one-dimensional version of a random walk process, sometimes called the “Drunkard’s Walk”.
Simulate this process in a third sheet of the workbook.
• Fill column A with 8000 rows of random numbers in the interval -1 to 1. Use the Tools/Data Analysis/Random Number Generation dialogue box.
• In cell B1 enter the value 0 (the initial position of the particle).
• In cell B2 enter the formula =B1+A1. Copy this formula into cells B3:B8000. Thus column B contains the positions of the particle at successive intervals of time.
• Plot the x-position of the particle (column B) against the row number (time). This will show the random walk.

You have generated just one sample of a random walk. The position of yours at the end of the walk may well not be zero. Look at the graphs of your neighbours. If they have used a different seed in the random number generation they will have a different set of plots. Perhaps the sample formed from the data of several of your neighbours’ results would average to zero displacement.

You need, therefore, to generate several independent random walks of different time durations to see the statistical prediction that the variance of the distribution increases linearly with time.

Use another worksheet in the workbook.
• In columns A to G generate 10000 random numbers per column. (Use tools/Data Analysis/Random Number Generation, with Number of variables = 7, Number of random numbers = 10000 … Output range = A1. This produces more than the total of random numbers needed, but it is the quickest way to generate those that are needed. (Even quicker if you turn off automatic calculation in the spreadsheet via the Tool/Options menu, but do remember to turn it back on again!)
• In cells H1 to N1 put zero.
• In cell H2 put = H1+A1
• Copy this formula through cells H2 to N10000. Then the data in the columns H to N represent the positions of the particle for 7 random walks of 10000 steps.
The data in each column are independent, so if we reset the zero of displacement at regular intervals we may obtain from this data an even larger number of random walks. For example we may take data in H1 to H100 as a random walk of 100 steps. By subtracting the value in H100 from the next 100 rows we can take H101 to H200 as another random walk of 100 steps starting once again with zero displacement. We next subtract H200 from the next 100 rows etc. This process can be repeated till we have used up all the x-positions in column H. In this case we get 100 samples of random walks each with 100 steps. A similar process can be applied to each of the other columns, but with a different number of steps in the random walk thereby generating a large number of independent samples of random walks.

In this example you will generate 16 samples of a random walk with a fixed number of steps, and then repeat it for 7 choices of the number of steps. The process of doing so may seem tedious, but we can again make it easier by use of textual formulae and indirect addressing as in the first exercise. The table below shows the formulae that can be used.
The formulae in columns P, Q and R should be copied through columns S to AJ.
(Note the selective use of the $ in the addressing is important!)
The quantity N in row 2 corresponds to the number of steps in the random walk. A suitable choice for the sequence is 36, 81, 144, 225, 324, 441 and 576. (Other values may be chosen as you like, except that the value may not exceed 625.)

| |Col O |Column P |Column Q |Column R |Column S |Column T |Column U |
|Row | | | | | | | |
|1 | |H | | |I | | |
|2 |N |36 |0 | |81 |0 | |
|3 |1 |=P$1&P$2*$O3 |=INDIRECT(P3) |=Q3-Q2 |=S$1&S$2*$O3 |=INDIRECT(S3) |=T3-T2 |
|4 |2 |=P$1&P$2*$O4 |=INDIRECT(P4) |=Q4-Q3 |=S$1&S$2*$O4 |=INDIRECT(S4) |=T4-T3 |
|5 |3 |=P$1&P$2*$O5 |=INDIRECT(P5) |=Q5-Q4 |=S$1&S$2*$O5 |=INDIRECT(S5) |=T5-T4 |
|6 |4 |=P$1&P$2*$O6 |=INDIRECT(P6) |=Q6-Q5 |=S$1&S$2*$O6 |=INDIRECT(S6) |=T6-T5 |
|7 |5 |=P$1&P$2*$O7 |=INDIRECT(P7) |=Q7-Q6 |=S$1&S$2*$O7 |=INDIRECT(S7) |=T7-T6 |
|8 |6 |=P$1&P$2*$O8 |=INDIRECT(P8) |=Q8-Q7 |=S$1&S$2*$O8 |=INDIRECT(S8) |=T8-T7 |
|9 |7 |=P$1&P$2*$O9 |=INDIRECT(P9) |=Q9-Q8 |=S$1&S$2*$O9 |=INDIRECT(S9) |=T9-T8 |
|10 |8 |=P$1&P$2*$O10 |=INDIRECT(P10) |=Q10-Q9 |=S$1&S$2*$O10 |=INDIRECT(S10) |=T10-T9 |
|12 |9 |=P$1&P$2*$O11 |=INDIRECT(P11) |=Q11-Q10 |=S$1&S$2*$O11 |=INDIRECT(S11) |=T11-T10 |
|12 |10 |=P$1&P$2*$O12 |=INDIRECT(P12) |=Q12-Q11 |=S$1&S$2*$O12 |=INDIRECT(S12) |=T12-T11 |
|13 |11 |=P$1&P$2*$O13 |=INDIRECT(P13) |=Q13-Q12 |=S$1&S$2*$O13 |=INDIRECT(S13) |=T13-T12 |
|14 |12 |=P$1&P$2*$O14 |=INDIRECT(P14) |=Q14-Q13 |=S$1&S$2*$O14 |=INDIRECT(S14) |=T14-T13 |
|15 |13 |=P$1&P$2*$O15 |=INDIRECT(P15) |=Q15-Q14 |=S$1&S$2*$O15 |=INDIRECT(S15) |=T15-T14 |
|16 |14 |=P$1&P$2*$O16 |=INDIRECT(P16) |=Q16-Q15 |=S$1&S$2*$O16 |=INDIRECT(S16) |=T16-T15 |
|17 |15 |=P$1&P$2*$O17 |=INDIRECT(P17) |=Q17-Q16 |=S$1&S$2*$O17 |=INDIRECT(S17) |=T17-T16 |
|18 |16 |=P$1&P$2*$O18 |=INDIRECT(P18) |=Q18-Q17 |=S$1&S$2*$O18 |=INDIRECT(S18) |=T18-T17 |
|19 |mean | | |=AVERAGE(R3:R18) | | |=AVERAGE(U3:U18) |
|20 |st.dev | | |=STDEV(R3:R18) | | |=STDEV(U3:U18) |
|21 |error | | |=R20/SQRT(2*($O$18-1)) | | |=U20/SQRT(2*($O$18-1)) |
|22 |sqrt(time| | |=SQRT(P2) | | |=SQRT(S2) |
| |) | | | | | | |

• Plot the standard deviations (row 20) against the square root of the time intervals (row 22).
• Take the data for the errors bars from row 21 ([pic] ) - the uncertainty on the standard deviation.)
• Add a trendline with equation and R2.
• For the adventurous student – what should be the predicted value for the slope of the trendline?
You can change the values of N in row 2 and see the changes that occur if you use random walks with other choices of the number of steps.

How to find normal distribution? Abstract. Predicting the radiation dose‒toxicity relationship is important for local tumor control and patients’ quality of life. Introduction. ... Results. ... Discussion. ... Methods. ... Acknowledgements. ... Author information. ... Ethics declarations. ... Additional information. ... Rights and permissions. ... More items...