

Analytical Equations-Discrete Data

This numerical experiment is performed by collecting a finite number of spacing values and comparing the distribution of these spacings to an analytic equation. The most straightforward way to do this is to use all of the data to create a discrete cumulative distribution function (CDF), and to compare percentiles of this function to the same percentiles computed from the analytic equations. As a simple demonstration of this procedure, 20 normally distributed random numbers with a mean of zero and a variance of one were generated by a computer program [9] and sorted from smallest to largest. These data, labeled x, are shown in the first column of Table 1. The adjacent column contains the rank of each sorted x value. A rank of 10 signifies that 10 of the 20 values are less than or equal to that value of x. The percentile of each x value is calculated by dividing its rank by the total number of variates, 20. This percentile, or relative rank, is labeled y and is shown in the last column of the table. It represents the numerically determined CDF of the x values, and is plotted in Fig. 2. Since this normal distribution is symmetric about zero, both its mean and its 50th percentile are zero. From Table 1, the 50th percentile of the data is approximately -0.104, which differs from the true value of zero. The error is due to the small sample size. Repeating this experiment of 20 random variates and averaging the results would yield a more accurate estimate of the 50th percentile.

  

x Rank y
-1.0730320 1 0.05
-0.8731335 2 0.10
-0.8409334 3 0.15
-0.7056352 4 0.20
-0.5254872 5 0.25
-0.4553703 6 0.30
-0.2430273 7 0.35
-0.1967312 8 0.40
-0.1494667 9 0.45
-0.1038428 10 0.50
0.0321516 11 0.55
0.0340721 12 0.60
0.2339296 13 0.65
0.2635231 14 0.70
0.2697059 15 0.75
0.4185625 16 0.80
0.5730767 17 0.85
0.8981169 18 0.90
1.1908520 19 0.95
1.5878820 20 1.00


Table 1: A sorted list of 20 random normal deviates x with mean zero and variance one, their rank, and their associated cumulative probability y.
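
The rank and percentile columns of Table 1 can be reproduced with a few lines of code. The following is a minimal sketch in Python, not the program used for the original calculation; the function names empirical_cdf and percentile_from_cdf are hypothetical, and reading the percentile as the smallest x whose y is at least p is an assumption chosen to match how the 50th percentile is read from the table.

```python
# Sorted deviates copied from the x column of Table 1.
x = [
    -1.0730320, -0.8731335, -0.8409334, -0.7056352, -0.5254872,
    -0.4553703, -0.2430273, -0.1967312, -0.1494667, -0.1038428,
     0.0321516,  0.0340721,  0.2339296,  0.2635231,  0.2697059,
     0.4185625,  0.5730767,  0.8981169,  1.1908520,  1.5878820,
]


def empirical_cdf(values):
    """Return the sorted values and y = rank / n for each of them."""
    xs = sorted(values)
    n = len(xs)
    y = [(i + 1) / n for i in range(n)]  # rank runs from 1 to n
    return xs, y


def percentile_from_cdf(xs, y, p):
    """Return the smallest sorted value whose cumulative probability is >= p.

    Assumption: no interpolation between neighbouring points.
    """
    for xi, yi in zip(xs, y):
        if yi >= p - 1e-12:  # small tolerance for floating-point round-off
            return xi
    return xs[-1]


xs, y = empirical_cdf(x)
median_estimate = percentile_from_cdf(xs, y, 0.50)
print(median_estimate)  # the rank-10 entry of Table 1
```

With the Table 1 data this selects the rank-10 entry, -0.1038428, which is the value quoted in the text as approximately -0.104.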

  

Figure 2: The cumulative distribution function for the 20 normal random deviates shown in Table 1.

Another way to increase accuracy is to increase the number of random numbers. Fig. 3 shows the CDF created from a single experiment of 1000 normally distributed random deviates, again with mean zero and variance one. From these data, one could estimate either percentiles of the distribution or the probability density function (PDF) of the data. The 50th percentile of the discrete CDF is 0.018, which is closer to the true value of zero than the estimate obtained from the 20 variates. An estimate of the PDF for these data was calculated by first extracting every 40th value in the CDF data in order to reduce noise. A one-sided finite difference [10] algorithm was then used to calculate the slope at these points (the derivative of the CDF), and the resulting PDF values are shown as filled circles in Fig. 3. For comparison, the true Gaussian PDF is also shown in the figure. Even after smoothing the CDF by selecting every 40th value, the resulting PDF is still quite noisy. Therefore, a comparison between measured data and analytic estimates is best done by estimating percentiles from the CDF.

  

Figure 3: The cumulative distribution function (CDF) for the 1000 normal random deviates and the estimated probability density function (solid circles) for every 40th deviate. The true Gaussian probability density function is also given as a reference.
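
The PDF estimate described above can be sketched as follows. This is an illustration under stated assumptions rather than the original calculation: Python's random.gauss stands in for the generator of reference [9], the seed is arbitrary, the one-sided difference is taken as a forward difference between consecutive retained points, and the names estimate_pdf and gaussian_pdf are hypothetical. The numbers it prints will therefore differ from those plotted in Fig. 3.

```python
import math
import random


def estimate_pdf(sorted_x, stride):
    """Estimate the PDF as the slope of the empirical CDF.

    Keeps every stride-th sorted value, then applies a forward
    (one-sided) difference between consecutive retained points.
    """
    n = len(sorted_x)
    idx = range(stride - 1, n, stride)       # ranks stride, 2*stride, ...
    xs = [sorted_x[i] for i in idx]
    ys = [(i + 1) / n for i in idx]          # y = rank / n
    points = []
    for k in range(len(xs) - 1):
        dx = xs[k + 1] - xs[k]
        if dx > 0:                           # skip exact ties
            points.append((xs[k], (ys[k + 1] - ys[k]) / dx))
    return points


def gaussian_pdf(x):
    """True PDF of a normal distribution with mean zero and variance one."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


rng = random.Random(12345)                   # arbitrary seed (assumption)
deviates = sorted(rng.gauss(0.0, 1.0) for _ in range(1000))

median_estimate = deviates[499]              # rank 500 of 1000, y = 0.50
pdf_points = estimate_pdf(deviates, 40)

print(f"50th percentile estimate: {median_estimate:.3f}")
for xk, fk in pdf_points:
    print(f"x={xk: .3f}  estimated={fk:.3f}  true={gaussian_pdf(xk):.3f}")
```

Comparing the estimated and true columns gives a sense of the scatter that remains in the differentiated CDF, while the 50th percentile is read directly from the sorted data without any differencing.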

