# How to plot normal distribution

You have the right idea. This can be done systematically, comprehensively, and with relatively simple calculations. A graph of the results is called a *normal probability plot* (or sometimes a P-P plot). From it you can see *much* more detail than appears in other graphical representations, especially histograms, and with a little practice you can even learn to determine ways to re-express your data to make them closer to Normal in situations where that is warranted.

Here is an example:

Data are in column A (and named Data). The rest is all calculation, although you can control the "hinge rank" value used to fit a reference line to the plot.

This plot is a scatterplot comparing the data to values that would be attained by numbers drawn independently from a standard Normal distribution. When the points line up along the diagonal, they are close to Normal; horizontal departures (along the data axis) indicate departures from normality. In this example the points are remarkably close to the reference line; the largest departure occurs at the highest value, which is about $1.5$ units to the left of the line. Thus we see at a glance that these data are very close to Normally distributed but perhaps have a slightly "light" right tail. This is perfectly fine for applying a t-test.

The comparison values on the vertical axis are computed in two steps. First, each data value is ranked from $1$ through $n$, the amount of data (shown in the Count field in cell F2). These ranks are proportionally converted to values in the range $0$ to $1$. A good formula to use is $\left(\text{rank} - 1/6\right) / \left(n + 2/3\right)$. Second, these values are converted to standard Normal values via the NormSInv function. The results appear in the Normal Score column. The plot at the right is an XY scatterplot of Normal Score against the data. (In some references you will see the transpose of this plot, which perhaps is more natural, but Excel prefers to place the leftmost column on the horizontal axis and the rightmost column on the vertical axis, so I have let it do what it prefers.)
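Outside of Excel, the same two steps can be sketched in a few lines of Python. This is only an illustration under some assumptions: it uses the plotting position $(\text{rank} - 1/6)/(n + 2/3)$ (other conventions exist), it takes the standard library's `NormalDist().inv_cdf` as the analogue of NormSInv, and it ranks by sorted position, so tied values receive consecutive ranks rather than a shared one. The function name `normal_scores` and the sample values are hypothetical.

```python
from statistics import NormalDist


def normal_scores(data):
    """Return (sorted data, Normal scores) for a normal probability plot."""
    n = len(data)
    ordered = sorted(data)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    scores = []
    for rank in range(1, n + 1):
        # Step 1: convert the rank to a proportion strictly between 0 and 1.
        p = (rank - 1 / 6) / (n + 2 / 3)
        # Step 2: convert the proportion to a standard Normal quantile.
        scores.append(std_normal.inv_cdf(p))
    return ordered, scores


# Made-up sample values, for illustration only.
data = [4.1, 7.3, 5.0, 2.2, 6.4]
xs, zs = normal_scores(data)
for x, z in zip(xs, zs):
    print(f"{x:6.2f}  {z:7.3f}")
```

Plotting `zs` against `xs` as a scatterplot gives the same orientation as the spreadsheet: data on the horizontal axis, Normal scores on the vertical axis.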

(As you can see, I simulated these data with independent random draws from a Normal distribution with mean $5$ and standard deviation $2$. It is therefore no surprise that the probability plot looks so nice.) There really are only two formulas to type in, which you propagate downward to match the data: they appear in cells B2:C2 and rely on the Count value computed in cell F2. That's really all there is to it, apart from the plotting.

The rest of this sheet is not necessary but it's helpful for judging the plot: it provides a robust estimate of a reference line. This is done by picking two points equally far in from the left and right of the plot and connecting them with a line. In the example these points are the third lowest and third highest, as determined by the $3$ in the Hinge Rank cell, F3. As a bonus, when the line is written as data = intercept + slope × Normal score, its slope and intercept are robust estimates of the standard deviation and mean of the data, respectively.
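The hinge-rank construction can be sketched as follows. This is a minimal illustration, not the spreadsheet's actual formulas (which are not shown here): the function name `hinge_line` is hypothetical, the plotting position $(\text{rank} - 1/6)/(n + 2/3)$ is assumed, and the line is expressed as data = intercept + slope × Normal score so that the slope estimates the standard deviation and the intercept estimates the mean.

```python
import random
from statistics import NormalDist


def hinge_line(data, hinge_rank=3):
    """Line through the hinge_rank-th lowest and highest points.

    Returns (slope, intercept) for data = intercept + slope * score.
    """
    n = len(data)
    if not 1 <= hinge_rank <= n // 2:
        raise ValueError("hinge_rank must select two distinct points")
    ordered = sorted(data)
    inv = NormalDist().inv_cdf
    lo, hi = hinge_rank, n + 1 - hinge_rank  # ranks of the two hinge points
    z_lo = inv((lo - 1 / 6) / (n + 2 / 3))
    z_hi = inv((hi - 1 / 6) / (n + 2 / 3))
    x_lo, x_hi = ordered[lo - 1], ordered[hi - 1]
    slope = (x_hi - x_lo) / (z_hi - z_lo)  # robust estimate of the SD
    intercept = x_lo - slope * z_lo        # robust estimate of the mean
    return slope, intercept


# Simulated data resembling the example: mean 5, standard deviation 2.
random.seed(1)
sample = [random.gauss(5, 2) for _ in range(20)]
print(hinge_line(sample, hinge_rank=3))
```

Because only the two hinge points enter the calculation, values more extreme than the hinge rank (such as a single wild outlier) have no effect on the fitted line.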

To plot the reference line, two extreme points are computed and added to the plot: their calculation occurs in columns I:J, labeled X and Y.
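One way to obtain two such endpoints is sketched below. It assumes the reference line is known in the form data = intercept + slope × Normal score and evaluates it at the Normal scores of the lowest and highest ranks; the spreadsheet may compute its endpoints differently. The function name `reference_endpoints` and the plotting position $(\text{rank} - 1/6)/(n + 2/3)$ are assumptions.

```python
from statistics import NormalDist


def reference_endpoints(n, slope, intercept):
    """Return two (X, Y) points spanning the plot for the reference line.

    X is on the data axis, Y is on the Normal score axis.
    """
    inv = NormalDist().inv_cdf
    z_min = inv((1 - 1 / 6) / (n + 2 / 3))  # Normal score of the lowest rank
    z_max = inv((n - 1 / 6) / (n + 2 / 3))  # Normal score of the highest rank
    return [
        (intercept + slope * z_min, z_min),
        (intercept + slope * z_max, z_max),
    ]


# Illustrative values: 20 observations, line with SD 2 and mean 5.
for x, y in reference_endpoints(20, slope=2.0, intercept=5.0):
    print(f"X = {x:.3f}, Y = {y:.3f}")
```

Adding these two points to the scatterplot as a second series, joined by a line and drawn without markers, produces the reference line.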

Source: stats.stackexchange.com
