How Censoring is Shown in Real World Survival Curves
It is useful to see when patients are censored, to get a feeling for the reliability of the curve over time. Survival curves often have a tick mark at each point where a patient was censored. Alternatively, they may show the number remaining "at risk" at several points. The number at risk at any point in the survival curve is the number of patients who are still alive and whose follow-up extends at least that far into the curve. Unfortunately, all too many survival curves show neither tick marks nor the number at risk, so it may be impossible to get a feel for the reliability of the curve, especially towards the end. If the paper does give a minimum follow-up time, at least you know no patients were censored earlier than that.
Example of a Censored Curve with Tick Marks
This Group of Patients Has a Minimum Follow-Up of a Little Over a Year
The Mathematical Details
Last, and probably least, it's time for the math! The good news is that the Kaplan-Meier calculation is relatively straightforward. First, suppose that after two years, the survival curve has reached 60%. Suppose that during the third year 10% of the surviving patients die. Then, at the start of the fourth year you can calculate that 90% of 60% = 54% of patients are still alive. If during the fourth year, again 10% of the patients who made it to the start of the year die, then at the end of the fourth year 90% of 54% = 48.6% will be alive. To generalize, if the time that the curve covers is broken up into intervals, then the percentage surviving at the start of any interval is equal to the probabilities of surviving each of the preceding intervals, all multiplied together. In the example, the probability of survival at the start of the fifth year (the end of the fourth) is 0.6 * 0.9 * 0.9 = 0.486 (or 48.6%).
Of course, if at the start of the trial there are 100 patients and 48 are still alive at the start of the 5th year, you can obviously calculate the survival at that point in time as simply 48/100 * 100% = 48%! The above calculation using the probability of surviving all preceding time intervals seems to be rather the long way around! Rest assured there is a method in the madness.
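The product-of-intervals arithmetic from the four-year example can be checked with a few lines of Python. This is only a sketch of the arithmetic, not a survival analysis; the variable names are made up for illustration.

```python
# Survival reached at the two-year mark, followed by the chance of
# surviving year three and year four (10% of survivors die in each).
yearly_survival = [0.60, 0.90, 0.90]

# Running product: the level of the curve at the end of each interval.
curve = []
level = 1.0
for p in yearly_survival:
    level *= p
    curve.append(level)

print([round(x, 3) for x in curve])  # [0.6, 0.54, 0.486]
```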
Remember that the aim here is to find a way to account for censored patients, that is, to "remove" them from the curve at the time their follow-up ends. With the product of intervals method it turns out to be simple. When a patient is censored, the number of patients "at risk" is reduced by one. When a patient dies, the survival for the interval ending with that death is calculated according to the number remaining at risk at the time of death. For example, if 25 patients are alive at the start of the interval and two patients are censored during the interval, then 23 patients were at risk at the time of death (25 minus the 2 censored patients). The chance of surviving the interval is therefore 22/23. If two patients died at the same time, the chance of surviving the interval would be 21/23.
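The interval calculation with censoring can be written as a one-line formula: survivors divided by the number at risk at the time of death. The sketch below uses a hypothetical helper, `interval_survival`, and exact fractions so the results match the 22/23 and 21/23 in the text.

```python
from fractions import Fraction

def interval_survival(alive_at_start, censored, deaths):
    """Chance of surviving an interval that ends with one or more deaths."""
    # Censored patients leave the "at risk" pool before the death is counted.
    at_risk = alive_at_start - censored
    return Fraction(at_risk - deaths, at_risk)

print(interval_survival(25, 2, 1))  # 22/23
print(interval_survival(25, 2, 2))  # 21/23
```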
Now that we can calculate the probability of surviving each of the intervals taking the censoring into account, the calculation of the entire survival curve using the product of intervals method I started with is simple. To get the level of the curve at the end of each interval, multiply together the chances of surviving that interval and all the preceding ones. Connect these points like a staircase and you have your typical survival curve.
Now that you've seen how the calculation works, you can see why censoring patients results in slightly larger steps from then on. In the example, we reduced the number of patients at risk during an interval from 25 to 23 to account for two censored patients. If there had been no censored patients in the interval, the death would have caused the curve to step down by 1/25 of its current height. But because of the censoring, the curve stepped down by 1/23 instead. 1/23 is a little bigger than 1/25, so the step is a little bigger than it would have been. You can also see that if the number of patients still at risk is reasonably large, the effect on the step size will be quite small, as in this case, and you won't be able to pick it up just by looking at the curve.
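To put numbers on the step sizes, here is a small sketch comparing the drop with and without the two censored patients. The helper `relative_step` is hypothetical, and the 250-patient case is an invented illustration of the "reasonably large" situation, not data from any trial.

```python
from fractions import Fraction

def relative_step(at_risk, deaths=1):
    """Drop caused by the deaths, as a fraction of the curve's current height."""
    return Fraction(deaths, at_risk)

without_censoring = relative_step(25)
with_censoring = relative_step(23)

print(float(without_censoring))            # 0.04
print(round(float(with_censoring), 4))     # 0.0435
print(with_censoring - without_censoring)  # 2/575

# Hypothetical larger group: 250 at risk, two of them censored.
print(relative_step(248) - relative_step(250))  # 1/31000
```

The extra drop shrinks quickly as the number at risk grows, which is why the effect is hard to see by eye on a large trial.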
To keep things simple, this example will concern only seven patients; almost any real-world survival curve would have more. To begin, suppose the survival times of these seven patients (sorted by length) are:
1, 2+, 3+, 4, 5+, 10, 12+
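The seven survival times above can be run through the product of intervals method in a short Python sketch. It assumes that a "+" marks a censored patient (the usual convention) and that a death tied with a censoring is counted first; the function name `kaplan_meier` and the data layout are illustrative choices, and the time units are whatever the list uses.

```python
from fractions import Fraction

# (time, died) pairs; died=False marks a censored patient ("+" in the list).
patients = [(1, True), (2, False), (3, False), (4, True),
            (5, False), (10, True), (12, False)]

def kaplan_meier(observations):
    """Return (time, survival) at each time where the curve steps down."""
    at_risk = len(observations)
    survival = Fraction(1)
    steps = []
    for time in sorted({t for t, _ in observations}):
        deaths = sum(1 for t, died in observations if t == time and died)
        censored = sum(1 for t, died in observations if t == time and not died)
        if deaths:
            # Multiply in the chance of surviving the interval ending here.
            survival *= Fraction(at_risk - deaths, at_risk)
            steps.append((time, survival))
        # Everyone who died or was censored at this time leaves the pool.
        at_risk -= deaths + censored
    return steps

for time, s in kaplan_meier(patients):
    print(time, s, round(float(s), 3))
# 1 6/7 0.857
# 4 9/14 0.643
# 10 9/28 0.321
```

The curve only steps down at the three deaths; the four censored patients never cause a step, they just shrink the number at risk for the deaths that follow.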