The aim is to review the basic concepts of the survival function, distribution function, probability density function, hazard function, cumulative hazard function and mean residual lifetime. The focus is on the most common parametric models, such as the Exponential, Weibull, Lognormal and Gamma models. Censoring, the likelihood function, truncation and competing risks are also covered.
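The survival (or reliability) function, S(t) = P(T > t), is the probability that an individual survives beyond time t. The distribution function, F(t) = P(T ≤ t) = 1 - S(t), is the probability that an individual dies at or before time t. For a continuous lifetime with probability density function f, the hazard function h(t) = f(t)/S(t) is the instantaneous failure rate at time t among individuals still alive at t, the cumulative hazard function is H(t) = -log S(t), and the mean residual lifetime mrl(t) = E[T - t | T > t] is the expected remaining lifetime of an individual who has survived to t. Types of failure, according to the shape of the hazard function: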
- Increasing hazard function: Populations subject to natural aging or wear. The distribution is called IFR (Increasing Failure Rate);
- Constant hazard function: Populations with no aging. The resulting distribution is the exponential;
- Decreasing hazard function: Populations with a high risk of failure very early in life, whose individuals become stronger with time; for example, some solid-state electronic components, or patients after a transplant. The distribution is called DFR (Decreasing Failure Rate). A DFR pattern is found at the beginning of the life of any living being;
- Hazard function with bathtub shape: decreasing at the beginning, constant during a long period of time and increasing at the end of life. Appropriate as a model for populations that are followed from birth. Much mortality data follows this kind of curve: early deaths are due to childhood diseases, later on the failure rate remains stable, and finally the failure rate increases as the population ages;
- Hazard function with hump shape: increasing at the beginning and decreasing after a period of time. Appropriate as a survival model after surgery, since at the beginning there is a high risk of death due to infections and possible hemorrhages, and this risk decreases as the patient recovers.
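The relationships among S(t), h(t), H(t) and the mean residual lifetime, and the IFR / constant / DFR classification, can be checked numerically with the Weibull model. The sketch below is illustrative only: it assumes the parametrization S(t) = exp(-(t/λ)^k) with shape k and scale λ (other texts parametrize the Weibull differently), and all function names are hypothetical. The mean residual lifetime is approximated with the trapezoidal rule over a finite horizon, which assumes S(t) is negligible beyond that horizon.

```python
import math

# Hypothetical helpers for a Weibull lifetime T with shape k and scale lam,
# assuming S(t) = exp(-(t/lam)**k); k = 1 gives the exponential model.

def weibull_survival(t, k, lam):
    """S(t) = P(T > t)."""
    return math.exp(-((t / lam) ** k))

def weibull_hazard(t, k, lam):
    """h(t) = f(t) / S(t) = (k/lam) * (t/lam)**(k - 1), for t > 0."""
    return (k / lam) * (t / lam) ** (k - 1)

def cumulative_hazard(t, k, lam):
    """H(t) = -log S(t), valid for any continuous lifetime distribution."""
    return -math.log(weibull_survival(t, k, lam))

def mean_residual_life(t, k, lam, horizon=None, n=20000):
    """mrl(t) = (integral of S(u) from t to infinity) / S(t).

    The integral is truncated at t + horizon (default 50 * lam) and computed
    with the trapezoidal rule; this assumes S is negligible beyond that point.
    """
    if horizon is None:
        horizon = 50.0 * lam
    step = horizon / n
    values = [weibull_survival(t + i * step, k, lam) for i in range(n + 1)]
    integral = step * (sum(values) - 0.5 * (values[0] + values[-1]))
    return integral / weibull_survival(t, k, lam)

def hazard_trend(k, lam=1.0, times=(0.5, 1.0, 2.0)):
    """Classify the hazard as IFR, DFR or constant on a few time points."""
    h = [weibull_hazard(t, k, lam) for t in times]
    if all(b > a for a, b in zip(h, h[1:])):
        return "IFR"
    if all(b < a for a, b in zip(h, h[1:])):
        return "DFR"
    return "constant"

for shape in (0.5, 1.0, 2.0):
    print(shape, hazard_trend(shape))
```

For k = 1 the hazard is constant and the mean residual lifetime equals λ at every t, which is the memoryless property of the exponential model; for k > 1 (IFR) the mean residual lifetime decreases with t.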
In the literature on failure times, some parametric models have been used repeatedly. The exponential and Weibull models, for example, are commonly used because their tail probabilities have simple closed-form expressions, which makes their survival and hazard functions simple. The lognormal and gamma models, although less convenient computationally (their survival functions have no closed form and require the normal distribution function or the incomplete gamma function), are also often applied. The most commonly used models in survival analysis are the Exponential, Weibull, Log-normal, Gamma, Log-logistic and Gompertz. To decide whether any of the distribution families studied is appropriate for our problem and our data, we can take into account the following points:
- its technical convenience for statistical inference,
- the reasonable simplicity of the expressions of its survival or hazard function,
- the shape of its hazard function and whether it is plausible for the problem at hand,
- the value of the coefficient of variation (standard deviation divided by the mean) compared with 1, its value in the exponential model, as an indicator of exponentiality,
- the skewness, taking into account that it equals 2 in the exponential model, 0 in the normal model and 2/√k in the gamma family with shape parameter k,
- the behaviour of the survival function for large values of time, and
- the possible connections with a failure model.
On the other hand, it is important to realize that in some cases there will not be enough data to validate the chosen law. In those cases, the behaviour of the model becomes very important for early values of time (for example, in industrial applications when studying warranty periods) and for large values of time (in many medical applications we are more interested in the right tail of the distribution, which corresponds to long survival times).
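The coefficient-of-variation and skewness criteria listed above can be computed from closed-form moments. This is a hedged sketch using only the Python standard library; the function names are hypothetical, the Weibull and gamma shapes are both called k (the scale cancels out of both ratios), and the lognormal is parametrized by the standard deviation σ of log T. For an exponential model (Weibull or gamma with k = 1) it should give CV = 1 and skewness = 2.

```python
import math
import statistics

def weibull_cv_skewness(k):
    """CV and skewness of a Weibull with shape k (independent of the scale)."""
    m1 = math.gamma(1 + 1 / k)      # E[T] / scale
    m2 = math.gamma(1 + 2 / k)      # E[T^2] / scale^2
    m3 = math.gamma(1 + 3 / k)      # E[T^3] / scale^3
    var = m2 - m1 ** 2
    sd = math.sqrt(var)
    # third central moment: E[T^3] - 3*mean*var - mean^3
    skew = (m3 - 3 * m1 * var - m1 ** 3) / sd ** 3
    return sd / m1, skew

def gamma_cv_skewness(k):
    """Gamma with shape k: CV = 1/sqrt(k), skewness = 2/sqrt(k)."""
    return 1 / math.sqrt(k), 2 / math.sqrt(k)

def lognormal_cv_skewness(sigma):
    """Lognormal with log-scale standard deviation sigma."""
    w = math.exp(sigma ** 2)
    cv = math.sqrt(w - 1)
    return cv, (w + 2) * cv

def sample_cv(times):
    """Sample CV of fully observed (uncensored) times; censoring biases it."""
    return statistics.stdev(times) / statistics.mean(times)

for label, (cv, skew) in [
    ("Weibull k=2", weibull_cv_skewness(2.0)),
    ("Gamma k=4", gamma_cv_skewness(4.0)),
    ("Lognormal sigma=0.5", lognormal_cv_skewness(0.5)),
]:
    print(f"{label}: CV={cv:.3f}, skewness={skew:.3f}")
```

A sample CV clearly below 1 is consistent with a Weibull or gamma model with k > 1 rather than the exponential; with censored data the naive sample CV is biased, so it is only a rough screening tool.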
Data are collected within a time window; events occurring outside this window are not observed. An individual's time-to-event can either be observed, giving an exact observation, or not observed, giving a censored observation. We speak of censoring when we only know that the event occurred within a certain interval of time. Censored data can be right-censored (Type I, Type II, random), left-censored, interval-censored or doubly-censored. The likelihood function is written as the product of the contributions of the individuals: an exact observation at t contributes f(t), a right-censored one S(t), a left-censored one 1 - S(t), and an interval-censored one on (l, r] contributes S(l) - S(r).
Truncation is different: the truncation condition filters out certain individuals, so that the investigator is not even aware of their existence. Hence, any inference is conditional on that condition:
- It can be on the left, when individuals enter the study only once they have reached a certain age (these are known as delayed entries);
- It can be on the right, when only individuals who have experienced the event are observed; that is, only patients who fail are included in the study, as in a mortality study based only on death certificates.
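Under an exponential model with rate λ, the contributions of exact and right-censored observations have simple forms, and left truncation (delayed entry at time v) is handled by conditioning on survival to v, that is, dividing each contribution by S(v). The hedged sketch below uses made-up toy data and hypothetical function names: an event at t contributes log λ - λ(t - v), a right-censored time contributes -λ(t - v), and maximizing gives λ̂ = (number of events) / (total time at risk). Left- and interval-censored observations are not handled here.

```python
import math

def exp_loglik(lam, times, events, entries=None):
    """Log-likelihood of an exponential model with rate lam.

    times:   observed time t_i (event or right-censoring time)
    events:  1 if the event was observed at t_i, 0 if right-censored
    entries: left-truncation (delayed entry) times v_i; 0 means no truncation
    """
    if entries is None:
        entries = [0.0] * len(times)
    loglik = 0.0
    for t, d, v in zip(times, events, entries):
        # event:    log f(t) - log S(v) = log(lam) - lam * (t - v)
        # censored: log S(t) - log S(v) = -lam * (t - v)
        loglik += d * math.log(lam) - lam * (t - v)
    return loglik

def exp_mle(times, events, entries=None):
    """Closed-form MLE: observed events divided by total time at risk."""
    if entries is None:
        entries = [0.0] * len(times)
    time_at_risk = sum(t - v for t, v in zip(times, entries))
    return sum(events) / time_at_risk

# Toy data (not from any study): four individuals, one right-censored.
times = [2.0, 3.0, 5.0, 7.0]
events = [1, 0, 1, 1]
entries = [1.0, 0.0, 2.0, 0.0]   # first and third individuals enter late

print(exp_mle(times, events))            # ignores the delayed entries
print(exp_mle(times, events, entries))   # conditions on survival to entry
```

Ignoring the delayed entries counts time before entry as time at risk, during which no event could have been observed, so the rate is underestimated (3/17 instead of 3/14 with these toy data).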
Sometimes individuals are also at risk of other events that prevent the event of interest from being observed:
- Failures due to other causes (competing risks analysis), e.g., death from causes other than the one of interest, or a secondary failure that makes a machine inactive;
- Survival is then modelled via the cause-specific hazard functions. A few examples are provided.
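As an illustration (not one of the lecture's own examples), assume K causes with constant cause-specific hazards λ_1, ..., λ_K. Then the overall survival is S(t) = exp(-Σ λ_j t) and the cumulative incidence of cause j is F_j(t) = ∫_0^t λ_j S(u) du = (λ_j / Σ λ)(1 - exp(-Σ λ t)). The hedged sketch below, with hypothetical function names and rates, also computes 1 - exp(-λ_j t), the naive estimate obtained by treating the other causes as censoring, which overstates the probability of failing from cause j.

```python
import math

def overall_survival(t, hazards):
    """S(t) = exp(-(sum of cause-specific hazards) * t) for constant hazards."""
    return math.exp(-sum(hazards) * t)

def cumulative_incidence(t, hazards, cause):
    """F_j(t) = integral_0^t h_j(u) S(u) du, closed form for constant hazards."""
    total = sum(hazards)
    return hazards[cause] / total * (1.0 - math.exp(-total * t))

def naive_incidence(t, hazards, cause):
    """1 - exp(-H_j(t)): treats the competing causes as censoring."""
    return 1.0 - math.exp(-hazards[cause] * t)

# Hypothetical rates: cause 0 (failure of interest), cause 1 (other causes).
rates = [0.1, 0.3]
t = 5.0
cifs = [cumulative_incidence(t, rates, j) for j in range(len(rates))]
print("S(t) =", overall_survival(t, rates))
print("CIFs =", cifs)
print("naive for cause 0 =", naive_incidence(t, rates, 0))
```

The cumulative incidences and S(t) add up to 1 at every t. With these rates, the naive value for cause 0 at t = 5 is about 0.39, while its cumulative incidence is only about 0.22.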
Prof Carles Serrat, Universitat Politècnica de Catalunya
Survival Analysis and Discrete Event Simulation applied to Structural Reliability