The log transformation is one of the most useful transformations in data analysis. It is used as a transformation to normality and as a variance stabilizing transformation. A log transformation is often used as part of exploratory data analysis in order to visualize (and later model) data that ranges over several orders of magnitude. Common examples include data on income, revenue, populations of cities, sizes of things, weights of things, and so forth.
In many cases, the variable of interest is positive and the log transformation is immediately applicable. However, some quantities (for example, profit) might contain a few negative values. How do you handle negative values if you want to log-transform the data?
Solution 1: Translate, then Transform
A common technique for handling negative values is to add a constant value to the data prior to applying the log transform. The transformation is therefore log(Y+a ) where a is the constant. Some people like to choose a so that min(Y+a ) is a very small positive
number (like 0.001). Others choose a so that min(Y+a ) = 1. For the latter choice, you can show that a = b – min(Y ), where b is either a small number or is 1.
In the SAS/IML language, this transformation is easily programmed in a single statement. The following example uses b=1 and calls the LOG10 function, but you can call LOG, the natural logarithm function, if you prefer.
Solution 2: Use Missing Values
The preceding statements initially define LogY to be a vector of missing values. The LOC function finds the indices of Y for which Y is positive. If at least one such index is found, those positive values are transformed and overwrite the missing values. A missing value remains in LogY for any element for which Y is negative.
You can use the previous technique for other functions that have restricted domains. For example, the same technique applies to the SQRT function and to inverse trigonometric functions such as ARSIN and ARCOS.