Standard Computer Programs in Statistical Analysis of Survival in Childhood Lymphoblastic Leukemia

A material comprising all children in Sweden with acute lymphoblastic leukemia diagnosed in the years 1973-80 was analysed statistically. The total number of children was 505. Studies were made of 38 different variables, using frequency tables, cross tables, life table studies (1) and linear regression analysis according to Cox's method (2, 4). Chi-square tests and log rank tests were included in the methods. The combination of life table studies and linear regression analysis proved to be of value in assessing the significance of different parameters and treatment programs with regard to prognosis. The aim of this paper is to present a method for analysis of a patient material with use of standard computer programs. The results of the total analysis will be published elsewhere (3).


INTRODUCTION
Acute lymphoblastic leukemia (ALL) is a malignant disease which can occur in children of all ages. With regard to age, white blood cell count (WBC) at diagnosis and the presence or absence of central nervous system (CNS) involvement and/or of a mediastinal tumor at diagnosis, the children were classified as suffering from "high-risk leukemia" or "standard-risk leukemia" (3).
The children first received induction treatment for six weeks, and if this was successful they were classified as being in complete remission. When remission was not achieved, the children died as a result of the disease and/or the treatment.

After remission, prophylactic radiation of the CNS was given, followed by maintenance therapy. Therapy was discontinued after three years in complete continuous remission (CCR).
Relapses of the disease may occur during therapy or after discontinuation of therapy, in the bone marrow, CNS, testes, or other organs, or in a combination of these locations. Following relapse, a second remission may be induced and the child may survive, or new relapses may terminate life. Death may also occur during a remission period from causes other than the disease, e.g. infection.
All analysed possible outcomes of the disease are presented schematically in Figure 1. For abbreviations, see text. [...] also those who have been treated for a shorter time than three years.

MATERIAL AND ANALYTICAL PROCEDURES
In the years 1973-80, acute lymphoblastic leukemia was diagnosed in 505 children in Sweden. For these children, 38 clinical variables, for which information was taken from the medical records, were analysed. These 38 variables were divided into four groups.

The data set
In order to minimize coding errors, the data set was examined thoroughly in three steps: (1) the data set was printed and compared with the medical records, (2) frequency tables were used to check for missing values and outliers, and (3) cross tabulation was done to check that categorical responses were correctly classified.
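The second and third checking steps can be illustrated with a small example. The following is a minimal sketch in Python, not the programs used in the study; the records, the field names and the allowed codes are invented for illustration only.

```python
from collections import Counter

# Hypothetical records: field names and values are invented for this sketch.
records = [
    {"sex": "F", "risk": "standard", "wbc": 4.2},
    {"sex": "M", "risk": "high", "wbc": 120.0},
    {"sex": "M", "risk": "standard", "wbc": None},  # missing value
    {"sex": "X", "risk": "high", "wbc": 15.0},      # coding error
]

def frequency_table(rows, field):
    """Count how often each value of `field` occurs."""
    return Counter(row[field] for row in rows)

def cross_table(rows, field_a, field_b):
    """Count each combination of values of two fields."""
    return Counter((row[field_a], row[field_b]) for row in rows)

def missing_values(rows, field):
    """Indices of records where `field` is missing (coded as None here)."""
    return [i for i, row in enumerate(rows) if row[field] is None]

def invalid_codes(rows, field, allowed):
    """Indices of records whose value is not among the allowed codes."""
    return [i for i, row in enumerate(rows) if row[field] not in allowed]

sex_freq = frequency_table(records, "sex")
sex_by_risk = cross_table(records, "sex", "risk")
missing_wbc = missing_values(records, "wbc")
bad_sex = invalid_codes(records, "sex", {"F", "M"})
```

Records flagged in this way would then be compared with the medical records, as in the first step.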

Life tables and survival functions
In the commonly used method, with for example 5-year survival, information about patients participating in the study for a shorter time than five years would not be utilized. The proportion of patients surviving 5 years would in this case be:

$$p_5 = \frac{\text{Number of patients alive after five years in the study}}{\text{Number of patients participating in the study for at least five years}}$$

The life table technique, on the other hand, utilizes more information by computing this proportion as a cumulative proportion of surviving children. In principle this can be written as follows:

$$P_5 = p_1 \cdot p_2 \cdot p_3 \cdot p_4 \cdot p_5$$

where $p_1$ is the proportion surviving one year, $p_2$ the proportion surviving two years provided that the patients survived the first year, and so on (each factor in the product is thus a conditional proportion, unlike the crude proportion above). This technique also provides a good idea of the course of the disease. The problem with different starting and follow-up times is solved by rescaling the time variables so that all the patients start at time 0.
The end point can be one of the following: 1) Dead (response), i.e. died during CCR or relapsed. 2) Withdrawn, i.e. alive in CCR at the end of the study (close date). 3) Lost, i.e. patients lost at follow-up.
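The cumulative proportion surviving can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the BMDP program: withdrawn and lost patients are counted as exposed for half of the interval in which they leave (the usual actuarial adjustment, which the text does not spell out), the cumulative proportion is reported at the end of each interval, and the example times are invented.

```python
def life_table(times, events, width, n_intervals):
    """Actuarial life table.

    times  : time in CCR (e.g. months), all patients starting at time 0
    events : True for a response (death or relapse),
             False for withdrawn or lost patients
    """
    rows = []
    at_risk = len(times)
    cum_surv = 1.0
    for i in range(n_intervals):
        start, end = i * width, (i + 1) * width
        in_interval = [e for t, e in zip(times, events) if start <= t < end]
        responses = sum(1 for e in in_interval if e)
        censored = len(in_interval) - responses
        # Assumption: censored patients contribute half an interval of exposure.
        exposed = at_risk - censored / 2.0
        q = responses / exposed if exposed > 0 else 0.0
        cum_surv *= 1.0 - q  # product of conditional proportions surviving
        rows.append({"start": start, "entered": at_risk,
                     "responses": responses, "censored": censored,
                     "q": q, "cum_surv": cum_surv})
        at_risk -= responses + censored
    return rows

# Invented example: eight patients, 12-month intervals.
times = [3, 8, 14, 14, 20, 30, 30, 40]
events = [True, True, False, True, True, False, False, False]
table = life_table(times, events, width=12, n_intervals=3)
```

Patients followed for less than the full period still contribute to the intervals they pass through, which is the extra information the life table technique uses.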
The hazard and the density functions are two ways of getting an idea of which parametric models may describe the survival time.
The hazard function (failure rate), $\lambda_i$, and the density function, $f_i$, are computed for each interval of the life table; the density is in fact an absolute instantaneous rate of death or relapse. The standard errors computed for the survival, hazard and density functions are used for computing confidence intervals and performing tests.
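As an illustration, the hazard and density can be computed per interval with the standard actuarial estimates, $\lambda_i = 2q_i / (h_i(1 + p_i))$ and $f_i = P_i q_i / h_i$. That these are exactly the estimates printed by the program is an assumption of this sketch; the function name and the example numbers are hypothetical.

```python
def hazard_and_density(q, width, cum_surv_start):
    """Actuarial hazard and density estimates for one life table interval.

    q              : conditional proportion responding in the interval
    width          : width of the interval (h_i)
    cum_surv_start : cumulative proportion surviving to the start (P_i)
    """
    p = 1.0 - q
    hazard = 2.0 * q / (width * (1.0 + p))   # rate per unit time, at mid-interval
    density = cum_surv_start * q / width     # probability of response per unit time
    return hazard, density

# Invented example: first 12-month interval, one quarter of the patients respond.
hazard, density = hazard_and_density(q=0.25, width=12, cum_surv_start=1.0)
```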
Tables 1 and 2 and Figure 2 show the computer print out of the life table and survival analysis from the program BMDP, P1L, 1977 (1). The most important function value in Table 1 is the CUMULATIVE SURVIVAL, which forms the basis of the survival curves in Figure 2. The table also gives the median estimate in the material, i.e. the time in months when half the patients have responded.

RESULTS
Table 2 gives a summary of the analyses presented in Table 1 for female and male patients separately. The test statistics in Table 2 represent the results of two non-parametric rank tests for comparison of the cumulative survival functions. The low p values indicate a difference between the two survival functions. Fig. 2 is a graphical illustration of the cumulative proportions of females and males surviving in CCR as shown in Table 1.
By using grouping variables, in this case sex, and comparing the times to response for different values of the grouping variables, good information on prognostic factors such as sex, age and WBC is obtained. A further possibility is to carry out the analysis for two or more grouping variables, e.g. duration of remission for different risk groups of female and male patients.
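To make the comparison of two survival functions concrete, the following is a minimal sketch of a two-group log rank (Mantel-Cox) test in Python. It is only one kind of rank test and is not claimed to reproduce the two statistics printed in Table 2; the example times are invented.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log rank test; returns (chi-square with 1 d.f., p value)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)  # at risk, group A
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)  # at risk, group B
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        d_b = sum(1 for u, e, g in data if u == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:  # hypergeometric variance; zero when only one patient is left
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    chi2 = (observed_a - expected_a) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square, 1 d.f.
    return chi2, p_value

# Invented example: group A responds early, group B late.
chi2, p_value = logrank_test([1, 2, 3, 4], [True] * 4,
                             [10, 11, 12, 13], [True] * 4)
```

A low p value, as in this invented example, indicates a difference between the two survival functions.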

Fig. 2 Plot of the cumulative proportions of females (F) and males (M) surviving in CCR versus time in CCR in months.
The PHGLM Procedure (2, 4)
The Cox proportional hazard linear model for one dependent variable can determine the "best" variable to be added to a model explaining time in CCR (TCCR), i.e. the variation in TCCR will be explained by a set of explanatory variables. But as these variables sometimes explain the same variation (are correlated with each other), the strength of each variable in explaining TCCR is obtained given that the other variables are in the model.
Table 3 is the computer print out taken from the last step in the PHGLM Procedure, SAS SUPPLEMENTAL LIBRARY USER'S GUIDE, 1980 (4). In the print out, BETA is comparable with the parameters in a multiple linear regression model. CHI-SQUARE is a measure of the strength of the variable and the P value is the level of significance for the variable in the model. The D value gives a measure of the contribution of the variables to explaining the variation in TCCR.
The solution answers the question of which variables are the most important of those affecting duration in CCR, and also gives a measure of the strength of these variables. Explanations to the Table: The variable CNS is not included in the model because it does not contribute enough to the explanation. The higher the D value of a variable, the stronger the influence of this variable on the duration of CCR.
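The estimation behind a BETA and a CHI-SQUARE value can be sketched for the simplest case of a single explanatory variable. The following Python example maximizes the Cox partial likelihood by Newton-Raphson and reports a Wald chi-square; it is an illustration under stated assumptions, not the PHGLM procedure: it handles one covariate only, treats tied times in the Breslow manner, does not compute the D value, and uses invented data. Whether the printed CHI-SQUARE is a Wald statistic is not stated in the text.

```python
import math

def score_and_information(beta, times, events, x):
    """First derivative and negative second derivative of the
    Cox partial log-likelihood for one covariate."""
    score = 0.0
    info = 0.0
    for i, (t_i, is_event) in enumerate(zip(times, events)):
        if not is_event:
            continue
        risk_set = [j for j, t_j in enumerate(times) if t_j >= t_i]
        weights = [math.exp(beta * x[j]) for j in risk_set]
        s0 = sum(weights)
        s1 = sum(w * x[j] for w, j in zip(weights, risk_set))
        s2 = sum(w * x[j] ** 2 for w, j in zip(weights, risk_set))
        mean_x = s1 / s0
        score += x[i] - mean_x           # observed minus expected covariate
        info += s2 / s0 - mean_x ** 2    # weighted variance in the risk set
    return score, info

def fit_cox_one_covariate(times, events, x, max_iter=50, tol=1e-9):
    """Return (beta, Wald chi-square) for a single covariate."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = score_and_information(beta, times, events, x)
        if info <= 0.0:
            break
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    _, info = score_and_information(beta, times, events, x)
    return beta, beta ** 2 * info

# Invented example: x = 1 marks a hypothetical high-risk criterion.
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [True] * 8
high_risk = [1, 1, 0, 1, 0, 1, 0, 0]
beta, wald_chi2 = fit_cox_one_covariate(times, events, high_risk)
```

A positive BETA here means that patients with the criterion tend to respond earlier, i.e. have a shorter time in CCR.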

COMMENTS
The aim of this communication is to demonstrate in a practical way how we have used standard computer programs in the evaluation of the influence of different clinical parameters on the outcome of a malignant disease.
The most important factor in this kind of analysis is the quality of the selected material. This must be as complete as possible and selection should be avoided. If there is selection, its consequences must be analysed separately. Selection always implies a risk of irrelevant correlations, which can lead to wrong conclusions concerning the material. In our case there is no known selection, as the material includes all known cases of ALL in children in Sweden during the period in question. No child was lost at follow-up, which gives important strength to the material.
Frequency tables and cross tables analyse the material with regard to the distribution of different variables, e.g. age, sex, risk group, location of relapse, etc. The variables can be plotted against each other in a desired way. For instance, the relation between duration of CCR and age or sex can easily be determined, but the tables are difficult to read and the results are not easy to evaluate.
Life table analyses (1) offer better possibilities than frequency tables and cross tables of studying variables affecting the duration of CCR versus clinical parameters and different treatment programs. The life table method gives a graphical illustration of time in CCR against parameters such as age, WBC, treatment programs and so on. It also permits mutual comparisons of subgroups in the material, e.g. "standard risk patients" against "high risk patients" with regard to sex or age. These analyses will yield variables explicitly describing the duration of CCR. The problem is that in one individual patient, different parameters often interact with regard to the outcome of the disease. It may thus be difficult to estimate the effect of a single parameter. We have used a linear regression analysis as described by Cox (2) to solve this problem. This method yields a ranking of the variables with regard to their influence on the outcome of the disease (Table 3).
Thus we have evaluated the strength of various "high risk criteria" in childhood lymphoblastic leukemia.

Fig. 1 Possible outcomes of leukemia in children.
The hazard function (failure rate), $\lambda_i$, is defined as:

$$\lambda_i = \frac{2 q_i}{h_i (1 + p_i)}$$

where $q_i$ = probability of dying in interval $i$, $p_i = 1 - q_i$, and $h_i$ = the width of the i'th interval. The density function (probability of death or relapse per unit time), $f_i$, is defined as:

$$f_i = \frac{P_i q_i}{h_i}$$

where $P_i$ = the estimate of the cumulative proportion surviving to the beginning of the i'th interval. The density is sometimes called the curve of death.

1. Identification variables at diagnosis
Name, month and year of birth, age, sex, hospital, home county, municipality and parish, date of diagnosis, presence or absence of CNS leukemia or mediastinal tumor, WBC, immunological classification, risk group, dominating symptom at diagnosis.
[...]
[...] = Died during CCR, i.e. length of time from achieved remission to death during CCR.
DREL-ONTHER = Died after RELapsing ON THERAPY, i.e. length of time from achieved remission to death for children relapsing during therapy.
DREL-OFFTHER = Died after RELapsing OFF THERAPY, i.e. length of time from achieved remission to death for children relapsing after discontinuation of therapy.
4. Other variables
REL1 = Location of first relapse during therapy.
REL2 = Location of second relapse during therapy.
REL-OFFTHER = Location of first relapse after discontinuation of therapy.
CDCCR = Cause of death during CCR (e.g. infection).
TREL1-REL2 = Length of time in months between first and second relapse.
Measurements on the 38 variables for the 505 children constituted the data set.
Table 1. Example of survival analysis for female patients with achieved remission (computer print out).
LIFE TABLE AND SURVIVAL ANALYSIS. TIME VARIABLE IS TIDICCR. GROUPING VARIABLE IS KON. LEVEL IS F:

Table 2. Table summarizing the survival analyses. Test statistics for comparing the proportions of females and males in CCR.
[...] = Number of patients who have responded, i.e. died in CCR or relapsed. CENSORED = Number of patients withdrawn, i.e. the number of patients in CCR at close date.

Table 3. Summary of the PHGLM Procedure (computer print out).