
This numerical experiment is performed by collecting a finite number of spacing
values and comparing this distribution of spacings to an analytic
equation. The most straightforward way to do this is to use all of the
data to create a discrete cumulative distribution function for the data, and
compare percentiles of this function to the same percentiles computed for the
analytic equations. As a simple demonstration of this procedure,
20 normally distributed random numbers with a mean of zero and a variance of
one were generated by a computer program[9]
and sorted from smallest to largest. These
data, labeled *x*, are shown in the first column of Table 1.
The adjacent column contains the rank of the sorted *x* values. A rank
of 10 signifies that 10 of the 20 values are equal to or less than *x*.
The percentiles of the *x* values are calculated by
dividing the rank by the total number of variates, 20. This percentile,
or relative rank, is labeled *y* and is shown in the last column of the
table. This value represents the numerically determined CDF of the *x* values,
and is plotted in Fig. 2.
Since the normal distribution is symmetric about zero, both the mean
and the 50th percentile (the median) are zero.
From Table 1,
the 50th percentile of the data is approximately -0.104, which differs
from the true value of zero. The error is due to the small sample size.
Repeating this experiment of 20 random variates and averaging the results
would yield a more accurate estimate of the 50th percentile.

| *x* | Rank | *y* |
|---|---|---|
| -1.0730320 | 1 | 0.05 |
| -0.8731335 | 2 | 0.10 |
| -0.8409334 | 3 | 0.15 |
| -0.7056352 | 4 | 0.20 |
| -0.5254872 | 5 | 0.25 |
| -0.4553703 | 6 | 0.30 |
| -0.2430273 | 7 | 0.35 |
| -0.1967312 | 8 | 0.40 |
| -0.1494667 | 9 | 0.45 |
| -0.1038428 | 10 | 0.50 |
| 0.0321516 | 11 | 0.55 |
| 0.0340721 | 12 | 0.60 |
| 0.2339296 | 13 | 0.65 |
| 0.2635231 | 14 | 0.70 |
| 0.2697059 | 15 | 0.75 |
| 0.4185625 | 16 | 0.80 |
| 0.5730767 | 17 | 0.85 |
| 0.8981169 | 18 | 0.90 |
| 1.1908520 | 19 | 0.95 |
| 1.5878820 | 20 | 1.00 |

**Table 1:** A sorted list of 20 random normal deviates *x* with mean zero and
variance one, their rank, and their associated cumulative probability
*y*.

**Figure 2:** The cumulative distribution function for the 20 normal
random deviates shown in Table 1.
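
The procedure behind Table 1 can be sketched in a few lines of Python. This is a minimal illustration, not the program cited in the text: the function names `empirical_cdf`, `percentile`, and `normal_cdf` are hypothetical, and the percentile is taken here as the smallest *x* whose relative rank reaches the requested level, which is only one of several common conventions.

```python
import math

# x values copied from Table 1 (already sorted in the source).
x = [
    -1.0730320, -0.8731335, -0.8409334, -0.7056352, -0.5254872,
    -0.4553703, -0.2430273, -0.1967312, -0.1494667, -0.1038428,
    0.0321516, 0.0340721, 0.2339296, 0.2635231, 0.2697059,
    0.4185625, 0.5730767, 0.8981169, 1.1908520, 1.5878820,
]


def empirical_cdf(values):
    """Return sorted values, their ranks, and relative ranks y = rank / n."""
    xs = sorted(values)
    n = len(xs)
    ranks = list(range(1, n + 1))
    y = [r / n for r in ranks]
    return xs, ranks, y


def percentile(xs, y, p):
    """Smallest x whose relative rank is at least p (assumed convention)."""
    for xi, yi in zip(xs, y):
        if yi >= p - 1e-12:  # small tolerance for floating-point rounding
            return xi
    return xs[-1]


def normal_cdf(v):
    """Analytic CDF of the normal distribution with mean 0 and variance 1."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))


xs, ranks, y = empirical_cdf(x)
median_estimate = percentile(xs, y, 0.50)

# Compare the numerical CDF with the analytic CDF at each data point.
for xi, ri, yi in zip(xs, ranks, y):
    print(f"{xi:10.7f}  {ri:2d}  {yi:4.2f}  analytic={normal_cdf(xi):.3f}")
print("estimated 50th percentile:", median_estimate)
```

The last printed line reports the value with rank 10, which is the -0.104 quoted in the text.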

Another way to increase accuracy
is to increase the number of random numbers. Fig. 3
shows the CDF created from a single experiment of 1000 normally distributed
random deviates, again with mean zero and variance one. From these
data, one could either estimate percentiles of the distribution, or
estimate the PDF of the data.
The 50th percentile of the discrete CDF is 0.018, which is closer to
the true value of zero than the estimate from the 20 variates.
An estimate of the PDF for these data was calculated by first
extracting every 40th value in the CDF data in order to reduce
noise. A one-sided finite difference[10]
algorithm was used to calculate the slope at these points (the derivative
of the CDF), and these values for the probability density function (PDF)
are shown as filled circles in
Fig. 3. For comparison, the true Gaussian PDF is
also shown in the figure. Even after smoothing the CDF by selecting
every 40th value, the resulting PDF is still quite noisy.
Therefore, a comparison
between measured data and analytic estimates
is best done through estimating percentiles using the CDF.
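
The PDF estimate described above can be reproduced approximately with the following sketch. Several details are assumptions rather than the method of the original experiment: the random number generator and seed, the choice of a forward difference as the one-sided scheme, the assignment of each slope to the left-hand point, and the hypothetical names `empirical_cdf`, `pdf_from_cdf`, and `gaussian_pdf`. The numbers it produces will therefore differ from those plotted in Fig. 3.

```python
import math
import random


def gaussian_pdf(v):
    """True PDF of the normal distribution with mean 0 and variance 1."""
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)


def empirical_cdf(values):
    """Return sorted values and their relative ranks y = rank / n."""
    xs = sorted(values)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]


def pdf_from_cdf(xs, y, stride):
    """Estimate the PDF as the slope of the thinned CDF.

    Every `stride`-th CDF point is kept, and a forward (one-sided)
    difference between consecutive kept points gives the slope.
    """
    sx = xs[stride - 1::stride]
    sy = y[stride - 1::stride]
    points = []
    for k in range(len(sx) - 1):
        slope = (sy[k + 1] - sy[k]) / (sx[k + 1] - sx[k])
        points.append((sx[k], slope))
    return points


rng = random.Random(12345)  # arbitrary seed, chosen only for repeatability
samples = [rng.gauss(0.0, 1.0) for _ in range(1000)]

xs, y = empirical_cdf(samples)
median_estimate = xs[len(xs) // 2 - 1]  # value with relative rank 0.5
pdf_points = pdf_from_cdf(xs, y, 40)

# Largest deviation of the estimated PDF from the true Gaussian PDF.
worst = max(abs(p - gaussian_pdf(v)) for v, p in pdf_points)

print("estimated 50th percentile:", median_estimate)
for v, p in pdf_points:
    print(f"x={v:8.4f}  estimated={p:.4f}  true={gaussian_pdf(v):.4f}")
print("largest PDF error:", worst)
```

Comparing the single percentile with the column of slope estimates shows the point made in the text: differencing amplifies the sampling noise in the CDF, whereas a percentile is read directly from it.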

**Figure 3:** The cumulative distribution function (CDF)
for the 1000 normal random
deviates and
the estimated probability density function (solid circles) for every
40th deviate.
The true Gaussian probability density function is also
given as a reference.