
Statistics for Biology and Health Series Editors K. Dietz, M. Gail, K. Krickeberg, J. Samet, A. Tsiatis

Springer New York Berlin Heidelberg Hong Kong London Milan Paris Tokyo

SURVIVAL ANALYSIS

Techniques for Censored and Truncated Data

Second Edition

John P. Klein Medical College of Wisconsin

Melvin L. Moeschberger The Ohio State University Medical Center

With 97 Illustrations

Springer

John P. Klein, Division of Biostatistics, Medical College of Wisconsin, Milwaukee, WI 53226, USA

Melvin L. Moeschberger, School of Public Health, Division of Epidemiology and Biometrics, The Ohio State University Medical Center, Columbus, OH 43210, USA

Series Editors

K. Dietz, Institut für Medizinische Biometrie, Universität Tübingen, Westbahnhofstrasse 55, D-72070 Tübingen, Germany

M. Gail, National Cancer Institute, Rockville, MD 20892, USA

K. Krickeberg, Le Chatelet, F-63270 Manglieu, France

J. Samet, School of Public Health, Department of Epidemiology, Johns Hopkins University, 615 Wolfe St., Baltimore, MD 21205-2103, USA

A. Tsiatis, Department of Statistics, North Carolina State University, Raleigh, NC 27695, USA

Library of Congress Cataloging-in-Publication Data Klein, John P., 1950–

Survival analysis : techniques for censored and truncated data / John P. Klein, Melvin L. Moeschberger. — 2nd ed.

p. cm. — (Statistics for biology and health) Includes bibliographical references and index. ISBN 0-387-95399-X (alk. paper)

1. Survival analysis (Biometry) I. Moeschberger, Melvin L. II. Title. III. Series. R853.S7 K535 2003 610′.7′27–dc21 2002026667

ISBN 0-387-95399-X Printed on acid-free paper.

© 2003, 1997 Springer-Verlag New York, Inc. All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer-Verlag New York, Inc., 175 Fifth Avenue, New York, NY 10010, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not especially identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed in the United States of America.

9 8 7 6 5 4 3 2 1 SPIN 10858633

www.springer-ny.com

Springer-Verlag New York Berlin Heidelberg

A member of BertelsmannSpringer Science+Business Media GmbH

Preface

The second edition contains some new material as well as solutions to the odd-numbered revised exercises. New material consists of a discussion of summary statistics for competing risks probabilities in Chapter 2 and the estimation process for these probabilities in Chapter 4. A new section on tests of the equality of survival curves at a fixed point in time is added in Chapter 7. In Chapter 8 an expanded discussion is presented on how to code covariates and a new section on discretizing a continuous covariate is added. A new section on Lin and Ying's additive hazards regression model is presented in Chapter 10. We now proceed to a general discussion of the usefulness of this book incorporating the new material with that of the first edition.

A problem frequently faced by applied statisticians is the analysis of time to event data. Examples of such data arise in diverse fields such as medicine, biology, public health, epidemiology, engineering, economics and demography. While the statistical tools we shall present are applicable to all these disciplines, our focus is on applications of the techniques to biology and medicine. Here interest is, for example, on analyzing data on the time to death from a certain cause, duration of response to treatment, time to recurrence of a disease, time to development of a disease, or simply time to death.

The analysis of survival experiments is complicated by issues of censoring, where an individual's life length is known to occur only in a certain period of time, and by truncation, where individuals enter the study only if they survive a sufficient length of time or individuals are included in the study only if the event has occurred by a given date. The use of counting process methodology has, in recent years, allowed for substantial advances in the statistical theory to account for censoring and truncation in survival experiments. The book by Andersen et al. (1993) provides an excellent survey of the mathematics of this theory. In this book we shall attempt to make these complex methods more accessible to applied researchers without an advanced mathematical background by presenting the essence of the statistical methods and illustrating these results in an applied framework. Our emphasis is on applying these techniques, as well as classical techniques not based on the counting process theory, to data rather than on the theoretical development of these tools. Practical suggestions for implementing the various methods are set off in a series of practical notes at the end of each section. Technical details of the derivation of these techniques (which are helpful to the understanding of concepts, though not essential to using the methods of this book) are sketched in a series of theoretical notes at the end of each section or are separated into their own sections. Some more advanced topics, for which some additional mathematical sophistication is needed for their understanding or for which standard software is not available, are given in separate chapters or sections. These notes and advanced topics can be skipped without a loss of continuity.

We envision two complementary uses for this book. The first is as a reference book for investigators who find the need to analyze censored or truncated lifetime data. The second use is as a textbook for a graduate level course in survival analysis. The minimum prerequisite for such a course is a traditional course in statistical methodology. The material included in this book comes from our experience in teaching such a course for master's level biostatistics students at The Ohio State University and at the Medical College of Wisconsin, as well as from our experience in consulting with investigators from The Ohio State University, The University of Missouri, The Medical College of Wisconsin, The Oak Ridge National Laboratory, The National Center for Toxicological Research, and The International Bone Marrow Transplant Registry.

The book is divided into thirteen chapters that can be grouped into five major themes. The first theme introduces the reader to basic concepts and terminology. It consists of the first three chapters, which deal with examples of typical data sets one may encounter in biomedical applications of this methodology, a discussion of the basic parameters to which inference is to be made, and a detailed discussion of censoring and truncation. New to the second edition is Section 2.7, which presents a discussion of summary statistics for competing risks probabilities. Section 3.6 gives a brief introduction to counting processes and is included for those individuals with a minimal background in this area who wish to have a conceptual understanding of this methodology. This section can be omitted without jeopardizing the reader's understanding of later sections of the book.


The second major theme is the estimation of summary survival statistics based on censored and/or truncated data. Chapter 4 discusses estimation of the survival function, the cumulative hazard rate, and measures of centrality such as the median and the mean. The construction of pointwise confidence intervals and confidence bands is presented. Here we focus on right-censored as well as left-truncated survival data, since this type of data is most frequently encountered in applications. New to the second edition is a section dealing with estimation of competing risks probabilities. In Chapter 5 the estimation schemes are extended to other types of survival data. Here methods for double and interval censoring, right truncation, and grouped data are presented. Chapter 6 presents some additional selected topics in univariate estimation, including the construction of smoothed estimators of the hazard function, methods for adjusting survival estimates for a known standard mortality, and Bayesian survival methods.

The third theme is hypothesis testing. Chapter 7 presents one-, two-, and more than two-sample tests based on comparing the integrated difference between the observed and expected hazard rate. These tests include the log rank test and the generalized Wilcoxon test. Tests for trend and stratified tests are also discussed. Also discussed are Renyi tests, which are based on sequential evaluation of these test statistics and have greater power to detect crossing hazard rates. This chapter also presents censored-data analogs of some other classical tests, such as the Cramer–von Mises test, the t test, and median tests. New to this second edition is a section on tests of the equality of survival curves at a fixed point in time.

The fourth theme, and perhaps the one most applicable to applied work, is regression analysis for censored and/or truncated data. Chapter 8 presents a detailed discussion of the proportional hazards model used most commonly in medical applications. New sections in this second edition include an expanded discussion of how to code covariates and a section on discretizing a continuous covariate. Chapter 9 discusses recent advances in the methodology that allow this model to be applied to left-truncated data, provide the investigator with new regression diagnostics, suggest improved point and interval estimates of the predicted survival function, and make techniques for handling time-dependent covariates (including tests of the proportionality assumption) and the synthesis of intermediate events in an analysis more accessible.

Chapter 10 presents recent work on the nonparametric additive hazard regression model of Aalen (1989) and a new section on Lin and Ying's (1994) additive hazards regression models. One of these models may be the model of choice in situations where the proportional hazards model or a suitable modification of it is not applicable. Chapter 11 discusses a variety of residual plots one can make to check the fit of the Cox proportional hazards regression models. Chapter 12 discusses parametric models for the regression problem. Models presented include those available in most standard computer packages. Techniques for assessing the fit of these parametric models are also discussed.

The final theme is multivariate models for survival data. In Chapter 13, tests for association between event times, adjusted for covariates, are given. An introduction to estimation in a frailty or random effect model is presented. An alternative approach to adjusting for association between some individuals based on an analysis of an independent working model is also discussed.

There should be ample material in this book for a one- or two-semester course for graduate students. A basic one-semester or one-quarter course would cover the following sections:

Chapter 2
Chapter 3, Sections 1–5
Chapter 4
Chapter 7, Sections 1–6, 8
Chapter 8
Chapter 9, Sections 1–4
Chapter 11
Chapter 12

In such a course the outlines of theoretical development of the techniques, in the theoretical notes, would be omitted. Depending on the length of the course and the interest of the instructor, these details could be added if the material in Section 3.6 were covered, or additional topics from the remaining chapters could be added to this skeleton outline. Applied exercises are provided at the end of the chapters. Solutions to odd-numbered exercises are new to the second edition. The data used in the examples and in most of the exercises is available from us at our Web site, which is accessible through the Springer Web site at http://www.springer-ny.com or http://www.biostat.mcw.edu/homepgs/klein/book.html.

John P. Klein, Milwaukee, Wisconsin
Melvin L. Moeschberger, Columbus, Ohio

Contents

Preface

Chapter 1 — Examples of Survival Data

1.1 Introduction

1.2 Remission Duration from a Clinical Trial for Acute Leukemia

1.3 Bone Marrow Transplantation for Leukemia

1.4 Times to Infection of Kidney Dialysis Patients

1.5 Times to Death for a Breast-Cancer Trial

1.6 Times to Infection for Burn Patients

1.7 Death Times of Kidney Transplant Patients

1.8 Death Times of Male Laryngeal Cancer Patients

1.9 Autologous and Allogeneic Bone Marrow Transplants

1.10 Bone Marrow Transplants for Hodgkin's and Non-Hodgkin's Lymphoma

1.11 Times to Death for Patients with Cancer of the Tongue

1.12 Times to Reinfection for Patients with Sexually Transmitted Diseases

1.13 Time to Hospitalized Pneumonia in Young Children

1.14 Times to Weaning of Breast-Fed Newborns

1.15 Death Times of Psychiatric Patients

1.16 Death Times of Elderly Residents of a Retirement Community

1.17 Time to First Use of Marijuana

1.18 Time to Cosmetic Deterioration of Breast Cancer Patients

1.19 Time to AIDS

Chapter 2 — Basic Quantities and Models

2.1 Introduction

2.2 The Survival Function

2.3 The Hazard Function

2.4 The Mean Residual Life Function and Median Life

2.5 Common Parametric Models for Survival Data

2.6 Regression Models for Survival Data

2.7 Models for Competing Risks

2.8 Exercises

Chapter 3 — Censoring and Truncation

3.1 Introduction

3.2 Right Censoring

3.3 Left or Interval Censoring

3.4 Truncation

3.5 Likelihood Construction for Censored and Truncated Data

3.6 Counting Processes

3.7 Exercises

Chapter 4 — Nonparametric Estimation of Basic Quantities for Right-Censored and Left-Truncated Data

4.1 Introduction

4.2 Estimators of the Survival and Cumulative Hazard Functions for Right-Censored Data

4.3 Pointwise Confidence Intervals for the Survival Function

4.4 Confidence Bands for the Survival Function

4.5 Point and Interval Estimates of the Mean and Median Survival Time

4.6 Estimators of the Survival Function for Left-Truncated and Right-Censored Data

4.7 Summary Curves for Competing Risks

4.8 Exercises

Chapter 5 — Estimation of Basic Quantities for Other Sampling Schemes

5.1 Introduction

5.2 Estimation of the Survival Function for Left, Double, and Interval Censoring

5.3 Estimation of the Survival Function for Right-Truncated Data

5.4 Estimation of Survival in the Cohort Life Table

5.5 Exercises

Chapter 6 — Topics in Univariate Estimation

6.1 Introduction

6.2 Estimating the Hazard Function

6.3 Estimation of Excess Mortality

6.4 Bayesian Nonparametric Methods

6.5 Exercises

Chapter 7 — Hypothesis Testing

7.1 Introduction

7.2 One-Sample Tests

7.3 Tests for Two or More Samples

7.4 Tests for Trend

7.5 Stratified Tests

7.6 Renyi Type Tests

7.7 Other Two-Sample Tests

7.8 Test Based on Differences in Outcome at a Fixed Point in Time

7.9 Exercises

Chapter 8 — Semiparametric Proportional Hazards Regression with Fixed Covariates

8.1 Introduction

8.2 Coding Covariates

8.3 Partial Likelihoods for Distinct-Event Time Data

8.4 Partial Likelihoods When Ties Are Present

8.5 Local Tests

8.6 Discretizing a Continuous Covariate

8.7 Model Building Using the Proportional Hazards Model

8.8 Estimation of the Survival Function

8.9 Exercises

Chapter 9 — Refinements of the Semiparametric Proportional Hazards Model

9.1 Introduction

9.2 Time-Dependent Covariates

9.3 Stratified Proportional Hazards Models

9.4 Left Truncation

9.5 Synthesis of Time-varying Effects (Multistate Modeling)

9.6 Exercises

Chapter 10 — Additive Hazards Regression Models

10.1 Introduction

10.2 Aalen's Nonparametric, Additive Hazard Model

10.3 Lin and Ying's Additive Hazards Model

10.4 Exercises

Chapter 11 — Regression Diagnostics

11.1 Introduction

11.2 Cox–Snell Residuals for Assessing the Fit of a Cox Model

11.3 Determining the Functional Form of a Covariate: Martingale Residuals

11.4 Graphical Checks of the Proportional Hazards Assumption

11.5 Deviance Residuals

11.6 Checking the Influence of Individual Observations

11.7 Exercises

Chapter 12 — Inference for Parametric Regression Models

12.1 Introduction

12.2 Weibull Distribution

12.3 Log Logistic Distribution

12.4 Other Parametric Models

12.5 Diagnostic Methods for Parametric Models

12.6 Exercises

Chapter 13 — Multivariate Survival Analysis

13.1 Introduction

13.2 Score Test for Association

13.3 Estimation for the Gamma Frailty Model

13.4 Marginal Model for Multivariate Survival

13.5 Exercises

Appendix A — Numerical Techniques for Maximization

A.1 Univariate Methods

A.2 Multivariate Methods

Appendix B — Large-Sample Tests Based on Likelihood Theory

Appendix C — Statistical Tables

C.1 Standard Normal Survival Function $P[Z \geq z]$

C.2 Upper Percentiles of a Chi-Square Distribution

C.3a Confidence Coefficients $c_{10}(a_L, a_U)$ for 90% EP Confidence Bands

C.3b Confidence Coefficients $c_{05}(a_L, a_U)$ for 95% EP Confidence Bands

C.3c Confidence Coefficients $c_{01}(a_L, a_U)$ for 99% EP Confidence Bands

C.4a Confidence Coefficients $k_{10}(a_L, a_U)$ for 90% Hall–Wellner Confidence Bands

C.4b Confidence Coefficients $k_{05}(a_L, a_U)$ for 95% Hall–Wellner Confidence Bands

C.4c Confidence Coefficients $k_{01}(a_L, a_U)$ for 99% Hall–Wellner Confidence Bands

C.5 Survival Function of the Supremum of the Absolute Value of a Standard Brownian Motion Process over the Range 0 to 1

C.6 Survival Function of $W = \int_0^1 [B(t)]^2\,dt$, where $B(t)$ is a Standard Brownian Motion

C.7 Upper Percentiles of $R = \int_0^k |B^o(u)|\,du$, where $B^o(u)$ is a Brownian Bridge

Appendix D — Data on 137 Bone Marrow Transplant Patients

Appendix E — Selected Solutions to Exercises

Bibliography

Author Index

Subject Index

1 Examples of Survival Data

1.1 Introduction

The problem of analyzing time to event data arises in a number of applied fields, such as medicine, biology, public health, epidemiology, engineering, economics, and demography. Although the statistical tools we shall present are applicable to all these disciplines, our focus is on applying the techniques to biology and medicine. In this chapter, we present some examples drawn from these fields that are used throughout the text to illustrate the statistical techniques we shall describe.

A common feature of these data sets is that they contain either censored or truncated observations. Censored data arises when an individual's life length is known to occur only in a certain period of time. Possible censoring schemes are right censoring, where all that is known is that the individual is still alive at a given time; left censoring, where all that is known is that the individual has experienced the event of interest prior to the start of the study; and interval censoring, where the only information is that the event occurs within some interval. Truncation schemes are left truncation, where only individuals who survive a sufficient time are included in the sample, and right truncation, where only individuals who have experienced the event by a specified time are included in the sample. The issues of censoring and truncation are defined more carefully in Chapter 3.
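The censoring and truncation schemes just listed differ in what each observation contributes to a likelihood. The sketch below is a minimal illustration, not the book's code, and assumes purely for concreteness an exponential model with rate `lam`: an exact event time contributes the density f(t), a right-censored time S(t), a left-censored time 1 - S(t), an interval-censored time S(l) - S(r), and left truncation at an entry time v conditions the contribution on survival past v by dividing by S(v). The helper names and the string labels for observation types are hypothetical.

```python
import math


def exp_survival(t, lam):
    """S(t) = exp(-lam * t) for an exponential model (an illustrative assumption)."""
    return math.exp(-lam * t)


def exp_density(t, lam):
    """f(t) = lam * exp(-lam * t)."""
    return lam * math.exp(-lam * t)


def log_lik_contribution(kind, lam, t=None, left=None, right=None, entry=0.0):
    """Log-likelihood contribution of one observation (hypothetical helper).

    kind  : 'exact', 'right', 'left', or 'interval'
    entry : left-truncation (entry) time v; 0.0 means no truncation.
    Each contribution is P(observed outcome and T > v) / S(v).
    """
    if kind == "exact":            # event observed at t (t > entry)
        num = exp_density(t, lam)
    elif kind == "right":          # still event-free at t
        num = exp_survival(t, lam)
    elif kind == "left":           # event occurred after entry but before t
        num = exp_survival(entry, lam) - exp_survival(t, lam)
    elif kind == "interval":       # event in (left, right], assuming left >= entry
        num = exp_survival(left, lam) - exp_survival(right, lam)
    else:
        raise ValueError(f"unknown observation type: {kind}")
    # Conditioning on survival past the entry time handles left truncation.
    return math.log(num) - math.log(exp_survival(entry, lam))


if __name__ == "__main__":
    lam = 0.5
    total = (log_lik_contribution("exact", lam, t=2.0)
             + log_lik_contribution("right", lam, t=3.0)
             + log_lik_contribution("left", lam, t=1.0)
             + log_lik_contribution("interval", lam, left=1.0, right=2.5)
             + log_lik_contribution("exact", lam, t=4.0, entry=1.5))
    print(f"total log-likelihood at lam={lam}: {total:.4f}")
```

Summing these contributions over a sample and maximizing over `lam` would give a maximum likelihood estimate under this assumed model; with no truncation (`entry=0.0`) the left-censored term reduces to 1 - S(t).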


1.2 Remission Duration from a Clinical Trial for Acute Leukemia

Freireich et al. (1963) report the results of a clinical trial of the drug 6-mercaptopurine (6-MP) versus a placebo in 42 children with acute leukemia. The trial was conducted at 11 American hospitals. Patients were selected who had a complete or partial remission of their leukemia induced by treatment with the drug prednisone. (A complete or partial remission means that either most or all signs of disease had disappeared from the bone marrow.) The trial was conducted by matching pairs of patients at a given hospital by remission status (complete or partial) and randomizing within the pair to either a 6-MP or placebo maintenance therapy. Patients were followed until their leukemia returned (relapse) or until the end of the study; times are recorded in months. The data is reported in Table 1.1.

TABLE 1.1 Remission duration of 6-MP versus placebo in children with acute leukemia

Pair | Remission Status at Randomization | Time to Relapse for Placebo Patients | Time to Relapse for 6-MP Patients
1 | Partial Remission | 1 | 10
2 | Complete Remission | 22 | 7
3 | Complete Remission | 3 | 32+
4 | Complete Remission | 12 | 23
5 | Complete Remission | 8 | 22
6 | Partial Remission | 17 | 6
7 | Complete Remission | 2 | 16
8 | Complete Remission | 11 | 34+
9 | Complete Remission | 8 | 32+
10 | Complete Remission | 12 | 25+
11 | Complete Remission | 2 | 11+
12 | Partial Remission | 5 | 20+
13 | Complete Remission | 4 | 19+
14 | Complete Remission | 15 | 6
15 | Complete Remission | 8 | 17+
16 | Partial Remission | 23 | 35+
17 | Partial Remission | 5 | 6
18 | Complete Remission | 11 | 13
19 | Complete Remission | 4 | 9+
20 | Complete Remission | 1 | 6+
21 | Complete Remission | 8 | 10+

+ Censored observation


This data set is used in Chapter 4 to illustrate the calculation of the estimated probability of survival using the product-limit estimator, the calculation of the Nelson–Aalen estimator of the cumulative hazard function, and the calculation of the mean survival time, along with their standard errors. It is further used in section 6.4 to estimate the survival function using Bayesian approaches. Matched-pairs tests for differences in treatment efficacy are performed using the stratified log rank test in section 7.5 and the stratified proportional hazards model in section 9.3.
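The product-limit and Nelson–Aalen calculations mentioned above can be sketched in a few lines. This is a minimal illustration, not the book's code: it uses the 6-MP arm of Table 1.1 as transcribed here, treats observations censored at an event time as still at risk at that time, and the function name `km_and_nelson_aalen` is hypothetical.

```python
from collections import Counter

# 6-MP arm of Table 1.1 in pair order: (time in months, 1 = relapse, 0 = censored)
SIX_MP = [(10, 1), (7, 1), (32, 0), (23, 1), (22, 1), (6, 1), (16, 1), (34, 0),
          (32, 0), (25, 0), (11, 0), (20, 0), (19, 0), (6, 1), (17, 0), (35, 0),
          (6, 1), (13, 1), (9, 0), (6, 0), (10, 0)]


def km_and_nelson_aalen(data):
    """Return (t, d, Y, S_hat(t), H_hat(t)) at each distinct event time.

    Y counts subjects with time >= t, so anyone censored at t is still at risk at t.
    """
    deaths = Counter(t for t, e in data if e == 1)
    rows = []
    surv, cumhaz = 1.0, 0.0
    for t in sorted(deaths):
        at_risk = sum(1 for u, _ in data if u >= t)
        d = deaths[t]
        surv *= 1.0 - d / at_risk   # product-limit step
        cumhaz += d / at_risk       # Nelson-Aalen increment
        rows.append((t, d, at_risk, surv, cumhaz))
    return rows


if __name__ == "__main__":
    for t, d, y, s, h in km_and_nelson_aalen(SIX_MP):
        print(f"t={t:2d}  d={d}  Y={y:2d}  S={s:.3f}  H={h:.4f}")
```

For the 6-MP arm this reproduces the familiar estimates, for example S(6) = 18/21 ≈ 0.857 and S(23) ≈ 0.448, with the cumulative hazard reaching about 0.752 at 23 months.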

1.3 Bone Marrow Transplantation for Leukemia

Bone marrow transplants are a standard treatment for acute leukemia. Recovery following bone marrow transplantation is a complex process. Prognosis for recovery may depend on risk factors known at the time of transplantation, such as patient and/or donor age and sex, the stage of initial disease, the time from diagnosis to transplantation, etc. The final prognosis may change as the patient's posttransplantation history develops with the occurrence of events at random times during the recovery process, such as development of acute or chronic graft-versus-host disease (GVHD), return of the platelet count to normal levels, return of granulocytes to normal levels, or development of infections. Transplantation can be considered a failure when a patient's leukemia returns (relapse) or when he or she dies while in remission (treatment-related death).

Figure 1.1 shows a simplified diagram of a patient's recovery process based on two intermediate events that may occur in the recovery process. These intermediate events are the possible development of acute GVHD, which typically occurs within the first 100 days following transplantation, and the recovery of the platelet count to a self-sustaining level $\geq 40 \times 10^{9}/\mathrm{l}$ (called platelet recovery in the sequel). Immediately following transplantation, patients have depressed platelet counts and are free of acute GVHD. At some point, they may develop acute GVHD or have their platelets recover, at which time their prognosis (probabilities of treatment-related death or relapse at some future time) may change. These events may occur in any order, or a patient may die or relapse without either of these events occurring. Patients may then experience the other event, which again modifies their prognosis, or they may die or relapse.
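The recovery process just described is a small multistate model. One hedged way to encode the simplified diagram is as a map of allowed transitions. The state labels below, and the choice to treat relapse and death in remission as absorbing treatment-failure states, are assumptions made for this sketch; the helper names are hypothetical.

```python
# Simplified recovery process after transplant (state labels are illustrative).
# Relapse and death in remission are treated as absorbing treatment-failure states.
TRANSITIONS = {
    "transplant": {"acute GVHD", "platelet recovery", "relapse", "death in remission"},
    "acute GVHD": {"acute GVHD + platelet recovery", "relapse", "death in remission"},
    "platelet recovery": {"acute GVHD + platelet recovery", "relapse", "death in remission"},
    "acute GVHD + platelet recovery": {"relapse", "death in remission"},
    "relapse": set(),
    "death in remission": set(),
}


def is_valid_path(path):
    """True if `path` starts at transplant and every step is an allowed transition."""
    if not path or path[0] != "transplant":
        return False
    return all(nxt in TRANSITIONS.get(cur, set()) for cur, nxt in zip(path, path[1:]))


def is_absorbing(state):
    """True for known states with no outgoing transitions (treatment failure)."""
    return state in TRANSITIONS and not TRANSITIONS[state]


if __name__ == "__main__":
    history = ["transplant", "acute GVHD", "acute GVHD + platelet recovery", "relapse"]
    print(is_valid_path(history), is_absorbing(history[-1]))
```

A patient history that ends in a non-absorbing state would correspond to follow-up ending (censoring) in that state.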

To illustrate this process we consider a multicenter trial of patients prepared for transplantation with a radiation-free conditioning regimen.


[Figure 1.1 Recovery Process from a Bone Marrow Transplant. Diagram with states: transplant, acute GVHD, platelet recovery, acute GVHD and platelet recovery, relapse, death.]

Details of the study are found in Copelan et al. (1991). The preparative regimen used in this study of allogeneic marrow transplants for patients with acute myelocytic leukemia (AML) and acute lymphoblastic leukemia (ALL) was a combination of 16 mg/kg of oral Busulfan (BU) and 120 mg/kg of intravenous cyclophosphamide (Cy). A total of 137 patients (99 AML, 38 ALL) were treated at one of four hospitals: 76 at The Ohio State University Hospitals (OSU) in Columbus; 21 at Hahnemann University (HU) in Philadelphia; 23 at St. Vincent's Hospital (SVH) in Sydney, Australia; and 17 at Alfred Hospital (AH) in Melbourne. The study consists of transplants conducted at these institutions from March 1, 1984, to June 30, 1989. The maximum follow-up was 7 years. There were 42 patients who relapsed and 41 who died while in remission. Twenty-six patients had an episode of acute GVHD, and 17 patients either relapsed or died in remission without their platelets returning to normal levels.

Several potential risk factors were measured at the time of transplantation. For each disease, patients were grouped into risk categories based on their status at the time of transplantation. These categories were as follows: ALL (38 patients), AML low-risk first remission (54 patients), and AML high-risk second remission or untreated first relapse (15 patients) or second or greater relapse or never in remission (30 patients). Other risk factors measured at the time of transplantation included recipient and donor gender (80 and 88 males, respectively), recipient and donor cytomegalovirus (CMV) immune status (68 and 58 positive, respectively), recipient and donor age (ranges 7–52 and 2–56, respectively), waiting time from diagnosis to transplantation (range 0.8–87.2 months, mean 19.7 months), and, for AML patients, their French-American-British (FAB) classification based on standard morphological criteria. AML patients with an FAB classification of M4 or M5 (45/99 patients) were considered to have a possible elevated risk of relapse or treatment-related death. Finally, patients at the two hospitals in Australia (SVH and AH) were given a graft-versus-host prophylaxis combining methotrexate (MTX) with cyclosporine and possibly methylprednisolone. Patients at the other hospitals were not given methotrexate but rather a combination of cyclosporine and methylprednisolone. The data is presented in Table D.1 of Appendix D.

This data set is used throughout the book to illustrate the methods presented. In Chapter 4, it is used to illustrate the product-limit estimator of the survival function and the Nelson–Aalen estimator of the cumulative hazard rate of treatment failure. Based on these statistics, pointwise confidence intervals and confidence bands for the survival function are constructed. The data is also used to illustrate point and interval estimation of summary survival parameters, such as the mean and median time to treatment failure in this chapter.
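As a hedged sketch of the pointwise-interval and median calculations referred to here: the code uses Greenwood's variance formula, $\widehat{\operatorname{Var}}[\hat S(t)] = \hat S(t)^2 \sum_{t_i \le t} d_i / [Y_i (Y_i - d_i)]$, with the plain linear interval $\hat S(t) \pm 1.96\,\mathrm{SE}$ clipped to [0, 1], and takes the median as the smallest event time at which the estimate falls to 0.5 or below. Confidence bands and transformed intervals are not shown. Because the transplant data of Appendix D are not reproduced here, the sketch runs on the 6-MP arm of Table 1.1; the function names are hypothetical.

```python
import math
from collections import Counter

# 6-MP arm of Table 1.1: times in months, 1 = relapse, 0 = censored
TIMES = [10, 7, 32, 23, 22, 6, 16, 34, 32, 25, 11, 20, 19, 6, 17, 35, 6, 13, 9, 6, 10]
EVENTS = [1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0]


def km_greenwood(times, events, z=1.96):
    """Rows (t, S(t), SE, lower, upper) at each distinct event time.

    SE uses Greenwood's formula; the interval is the linear one, clipped to [0, 1].
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    surv, gw_sum, rows = 1.0, 0.0, []
    for t in sorted(deaths):
        n = sum(1 for u in times if u >= t)   # number at risk at t
        d = deaths[t]
        surv *= 1.0 - d / n
        if n > d:  # Greenwood term is undefined once the estimate reaches 0
            gw_sum += d / (n * (n - d))
        se = surv * math.sqrt(gw_sum)
        rows.append((t, surv, se, max(0.0, surv - z * se), min(1.0, surv + z * se)))
    return rows


def km_median(rows):
    """Smallest event time with S(t) <= 0.5, or None if the curve never gets there."""
    for t, s, *_ in rows:
        if s <= 0.5:
            return t
    return None


if __name__ == "__main__":
    rows = km_greenwood(TIMES, EVENTS)
    for t, s, se, lo, hi in rows:
        print(f"t={t:2d}  S={s:.3f}  SE={se:.4f}  95% CI=({lo:.3f}, {hi:.3f})")
    print("median:", km_median(rows))
```

For these data the estimated median is 23 months, and the Greenwood standard error at 6 months is about 0.076.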

This data set is also used in Chapter 4 to illustrate summary probabilities for competing risks. In this example the competing risks, where the occurrence of one event precludes the occurrence of the other, are relapse and death.
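Summary curves for competing risks are commonly cumulative incidence functions. Below is a minimal sketch of the standard nonparametric estimator, $\widehat{CI}_k(t) = \sum_{t_j \le t} \hat S(t_j-)\, d_{kj}/Y_j$, where $\hat S$ is the product-limit estimate for failure from any cause, $d_{kj}$ is the number of cause-$k$ failures at $t_j$, and $Y_j$ is the number at risk. Unlike one minus a Kaplan–Meier curve that treats the competing event as censoring, the estimates for all causes add up to one minus the all-cause survival estimate. The cause coding (1 = relapse, 2 = death in remission, 0 = censored) and the small data set in the usage block are hypothetical, not the transplant data.

```python
from collections import defaultdict


def cumulative_incidence(times, causes, cause):
    """Nonparametric cumulative incidence of one cause with competing risks.

    causes: 0 = censored; 1, 2, ... = cause of failure (coding is hypothetical).
    Returns (t, CI(t)) at each time where any failure occurs.
    """
    failures = defaultdict(lambda: defaultdict(int))  # time -> cause -> count
    for t, c in zip(times, causes):
        if c != 0:
            failures[t][c] += 1
    surv_before, ci, out = 1.0, 0.0, []
    for t in sorted(failures):
        n = sum(1 for u in times if u >= t)        # number at risk at t
        d_all = sum(failures[t].values())           # failures from any cause at t
        ci += surv_before * failures[t].get(cause, 0) / n
        surv_before *= 1.0 - d_all / n              # all-cause product-limit survival
        out.append((t, ci))
    return out


if __name__ == "__main__":
    # Hypothetical data: 1 = relapse, 2 = death in remission, 0 = censored
    times = [2, 3, 3, 5, 6, 8, 9, 11, 12, 15]
    causes = [1, 2, 0, 1, 1, 2, 0, 1, 2, 0]
    for t, ci in cumulative_incidence(times, causes, cause=1):
        print(t, round(ci, 3))
```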

In section 6.2, the data set is used to illustrate the construction of estimates of the hazard rate. These estimates are based on smoothing the crude estimates of the hazard rate obtained from the jumps of the Nelson–Aalen estimator found in Chapter 4 using a weighted average of these estimates in a small interval about the time of interest. The weights are chosen using a kernel weighting function.
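A minimal sketch of this kind of kernel smoothing of Nelson–Aalen increments, $\hat h(t) = b^{-1} \sum_i K\big((t - t_i)/b\big)\,\Delta \hat H(t_i)$, using the Epanechnikov kernel $K(x) = 0.75(1 - x^2)$ on $[-1, 1]$. The bandwidth `b` is chosen by hand, no boundary correction near the ends of follow-up is applied, and the example uses the 6-MP jumps from Table 1.1 rather than the transplant data; the function names are hypothetical.

```python
# Nelson-Aalen increments d_i / Y_i for the 6-MP arm of Table 1.1 (times in months)
JUMP_TIMES = [6, 7, 10, 13, 16, 22, 23]
JUMPS = [3 / 21, 1 / 17, 1 / 15, 1 / 12, 1 / 11, 1 / 7, 1 / 6]


def epanechnikov(x):
    """Epanechnikov kernel: 0.75 * (1 - x^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - x * x) if abs(x) <= 1.0 else 0.0


def smoothed_hazard(t, jump_times, jumps, bandwidth):
    """Kernel-smoothed hazard (1/b) * sum_i K((t - t_i) / b) * dH(t_i).

    No boundary correction is applied, so estimates within one bandwidth
    of the ends of the follow-up range may be biased.
    """
    return sum(epanechnikov((t - ti) / bandwidth) * dh
               for ti, dh in zip(jump_times, jumps)) / bandwidth


if __name__ == "__main__":
    for t in (10, 12, 14, 16):
        print(t, round(smoothed_hazard(t, JUMP_TIMES, JUMPS, bandwidth=5.0), 4))
```

A larger bandwidth averages more jumps and gives a smoother but more biased curve; a smaller one tracks individual jumps more closely.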

In Chapter 7, this data is used to illustrate tests for the equality of K survival curves. Both stratified and unstratified tests are discussed.
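For two groups, the simplest of these tests is the log-rank test. The sketch below is a minimal, unweighted version, not the book's code: at each distinct event time it accumulates observed and expected events in group 1 and the hypergeometric variance, then forms $(O_1 - E_1)^2 / V$. It is run on the two arms of Table 1.1 as an unpaired comparison, an assumption made only for illustration since the transplant data of Appendix D are not reproduced here; the book's own matched-pairs analysis of Table 1.1 uses a stratified test.

```python
# Table 1.1, ignoring the pairing. Times in months; 1 = relapse, 0 = censored.
PLACEBO_T = [1, 22, 3, 12, 8, 17, 2, 11, 8, 12, 2, 5, 4, 15, 8, 23, 5, 11, 4, 1, 8]
SIX_MP_T = [10, 7, 32, 23, 22, 6, 16, 34, 32, 25, 11, 20, 19, 6, 17, 35, 6, 13, 9, 6, 10]
SIX_MP_E = [1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0]


def logrank(times, events, groups):
    """Unweighted two-sample log-rank test.

    groups holds 0/1 labels. Returns (observed, expected, variance, chi_square)
    for group 1; chi_square is referred to a chi-square distribution with 1 df.
    """
    data = list(zip(times, events, groups))
    event_times = sorted({t for t, e, _ in data if e})
    observed = expected = var = 0.0
    for t in event_times:
        n = sum(1 for u, _, _ in data if u >= t)                    # at risk overall
        n1 = sum(1 for u, _, g in data if u >= t and g == 1)        # at risk in group 1
        d = sum(1 for u, e, _ in data if u == t and e)              # events at t
        d1 = sum(1 for u, e, g in data if u == t and e and g == 1)  # group-1 events at t
        observed += d1
        expected += d * n1 / n
        if n > 1:  # hypergeometric variance term
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return observed, expected, var, (observed - expected) ** 2 / var


if __name__ == "__main__":
    times = PLACEBO_T + SIX_MP_T
    events = [1] * len(PLACEBO_T) + SIX_MP_E
    groups = [0] * len(PLACEBO_T) + [1] * len(SIX_MP_T)
    o, e, v, chi2 = logrank(times, events, groups)
    print(f"6-MP: observed {o:.0f}, expected {e:.2f}, chi-square {chi2:.2f}")
```

Run on Table 1.1 this gives 9 observed versus about 19.25 expected relapses in the 6-MP arm and a chi-square statistic of about 16.8 on 1 degree of freedom.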

In Chapter 8, the data is used to illustrate tests of the equality of K hazard rates adjusted for possible fixed-time confounders. A proportional hazards model is used to make this adjustment. Model building for this problem is illustrated. In Chapter 9, the models found in Chapter 8 are further refined to include covariates whose values change over time and to allow for stratified regression models. In Chapter 11, regression diagnostics for these models are presented.
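A minimal sketch of fitting a proportional hazards model with a single fixed covariate by Newton–Raphson on the partial likelihood, using Breslow's approximation for tied event times (one of several tie-handling methods). To stay self-contained it uses the Table 1.1 data with a hypothetical placebo indicator (1 = placebo, 0 = 6-MP) as the covariate rather than the transplant covariates, and the function names are illustrative.

```python
import math

# Table 1.1, ignoring the pairing. Times in months; 1 = relapse, 0 = censored.
PLACEBO_T = [1, 22, 3, 12, 8, 17, 2, 11, 8, 12, 2, 5, 4, 15, 8, 23, 5, 11, 4, 1, 8]
SIX_MP_T = [10, 7, 32, 23, 22, 6, 16, 34, 32, 25, 11, 20, 19, 6, 17, 35, 6, 13, 9, 6, 10]
SIX_MP_E = [1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0]


def breslow_score_info(beta, times, events, z):
    """Score U(beta) and information I(beta) of the Cox partial log-likelihood
    with one fixed covariate, using Breslow's approximation for ties."""
    score = info = 0.0
    for t in sorted({ti for ti, ei in zip(times, events) if ei}):
        risk_z = [zi for ti, zi in zip(times, z) if ti >= t]     # risk set at t
        w = [math.exp(beta * zi) for zi in risk_z]
        s0 = sum(w)
        s1 = sum(wi * zi for wi, zi in zip(w, risk_z))
        s2 = sum(wi * zi * zi for wi, zi in zip(w, risk_z))
        dead_z = [zi for ti, ei, zi in zip(times, events, z) if ti == t and ei]
        d = len(dead_z)
        score += sum(dead_z) - d * s1 / s0
        info += d * (s2 / s0 - (s1 / s0) ** 2)
    return score, info


def fit_cox_breslow(times, events, z, tol=1e-10, max_iter=50):
    """Newton-Raphson maximization of the Breslow partial likelihood from beta = 0."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = breslow_score_info(beta, times, events, z)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta


if __name__ == "__main__":
    times = PLACEBO_T + SIX_MP_T
    events = [1] * len(PLACEBO_T) + SIX_MP_E
    z = [1] * len(PLACEBO_T) + [0] * len(SIX_MP_T)   # 1 = placebo (hypothetical coding)
    beta = fit_cox_breslow(times, events, z)
    print(f"beta = {beta:.3f}, hazard ratio = {math.exp(beta):.2f}")
```

A positive estimate means a higher estimated relapse hazard for placebo patients, with $\exp(\hat\beta)$ the estimated hazard ratio. At $\beta = 0$ the score equals observed minus expected placebo relapses, the same quantity that drives the log-rank statistic.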

1.4 Times to Infection of Kidney Dialysis Patients

In a study (Nahman et al., 1992) designed to assess the time to first exit-site infection (in months) in patients with renal insufficiency, 43 patients utilized a surgically placed catheter (Group 1), and 76 patients utilized a percutaneous placement of their catheter (Group 2). Cutaneous exit-site infection was defined as a painful cutaneous exit site and positive cultures, or peritonitis, defined as the presence of clinical symptoms, elevated peritoneal dialytic fluid, elevated white blood cell count (100 white blood cells/μl with >50% neutrophils), and positive peritoneal dialytic fluid cultures. The data appears in Table 1.2.

TABLE 1.2 Times to infection (in months) of kidney dialysis patients with different catheterization procedures

Surgically Placed Catheter

Infection Times: 1.5, 3.5, 4.5, 4.5, 5.5, 8.5, 8.5, 9.5, 10.5, 11.5, 15.5, 16.5, 18.5, 23.5, 26.5

Censored Observations: 2.5, 2.5, 3.5, 3.5, 3.5, 4.5, 5.5, 6.5, 6.5, 7.5, 7.5, 7.5, 7.5, 8.5, 9.5, 10.5, 11.5, 12.5, 12.5, 13.5, 14.5, 14.5, 21.5, 21.5, 22.5, 22.5, 25.5, 27.5

Percutaneously Placed Catheter

Infection Times: 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 2.5, 2.5, 3.5, 6.5, 15.5

Censored Observations: 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 1.5, 1.5, 1.5, 1.5, 2.5, 2.5, 2.5, 2.5, 2.5, 3.5, 3.5, 3.5, 3.5, 3.5, 4.5, 4.5, 4.5, 5.5, 5.5, 5.5, 5.5, 5.5, 6.5, 7.5, 7.5, 7.5, 8.5, 8.5, 8.5, 9.5, 9.5, 10.5, 10.5, 10.5, 11.5, 11.5, 12.5, 12.5, 12.5, 12.5, 14.5, 14.5, 16.5, 16.5, 18.5, 19.5, 19.5, 19.5, 20.5, 22.5, 24.5, 25.5, 26.5, 26.5, 28.5

The data is used in section 7.3 to illustrate how the inference about the equality of two survival curves, based on a two-sample weighted log-rank test, depends on the choice of the weight function. In section 7.7, it is used to illustrate the two-sample Cramer–von Mises test for censored data. In the context of the proportional hazards model, this data is used in Chapter 8 to illustrate the different methods of constructing the partial likelihoods and the subsequent testing of equality of the survival curves when there are ties present. Testing for proportional hazards is illustrated in section 9.2. The test reveals that a proportional hazards assumption for this data is not correct. A model with a time-varying covariate effect is more appropriate, and in that section the optimal cutoff for “early” and “late” covariate effect on survival is found.

1.5 Times to Death for a Breast-Cancer Trial

In a study (Sedmak et al., 1989) designed to determine if female breast cancer patients, originally classified as lymph node negative by stan- dard light microscopy (SLM), could be more accurately classified by im- munohistochemical (IH) examination of their lymph nodes with an an- ticytokeratin monoclonal antibody cocktail, identical sections of lymph nodes were sequentially examined by SLM and IH. The significance of this study is that 16% of patients with negative axillary lymph nodes, by standard pathological examination, develop recurrent disease within 10 years. Forty-five female breast-cancer patients with negative axillary lymph nodes and a minimum 10-year follow-up were selected from The Ohio State University Hospitals Cancer Registry. Of the 45 patients, 9 were immunoperoxidase positive, and the remaining 36 remained neg- ative. Survival times (in months) for both groups of patients are given in Table 1.3 (� denotes a censored observation).

TABLE 1.3 Times to death (in months) for breast cancer patients with different immunohistochemical responses

Immunoperoxidase Negative: 19, 25, 30, 34, 37, 46, 47, 51, 56, 57, 61, 66, 67, 74, 78, 86, 122+, 123+, 130+, 130+, 133+, 134+, 136+, 141+, 143+, 148+, 151+, 152+, 153+, 154+, 156+, 162+, 164+, 165+, 182+, 189+
Immunoperoxidase Positive: 22, 23, 38, 42, 73, 77, 89, 115, 144+

+ Censored observation

This data is used to show the construction of the likelihood function and in calculating a two-sample test based on proportional hazards with no ties with right-censored data in Chapter 8. It is also used in Chapter 10 to illustrate the least-squares estimation methodology in the context of the additive hazards model. In that chapter, we also used this data to illustrate estimation for an additive model with constant excess risk over time.
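Consider two groups with no tied death times, as in Table 1.3. There, the score test from the proportional hazards model with a single group indicator coincides with the two-sample log-rank test. The following is a minimal Python sketch of that test. It is a hand-rolled illustration, not the book's code, and the names `as_records` and `log_rank` are hypothetical. Each record is a `(time, delta)` pair, with `delta = 0` marking a "+" (censored) time.

```python
import math

# Table 1.3 as (months, delta) pairs: delta = 1 for a death, 0 for a censored (+) time.
NEG_DEATHS = [19, 25, 30, 34, 37, 46, 47, 51, 56, 57, 61, 66, 67, 74, 78, 86]
NEG_CENSORED = [122, 123, 130, 130, 133, 134, 136, 141, 143, 148, 151, 152, 153,
                154, 156, 162, 164, 165, 182, 189]
POS_DEATHS = [22, 23, 38, 42, 73, 77, 89, 115]
POS_CENSORED = [144]


def as_records(deaths, censored):
    return [(t, 1) for t in deaths] + [(t, 0) for t in censored]


def log_rank(group1, group2):
    """Two-sample log-rank test for right-censored (time, delta) records.

    Returns (O1 - E1, variance, chi-square statistic on 1 degree of freedom).
    """
    pooled = [(t, d, 1) for t, d in group1] + [(t, d, 2) for t, d in group2]
    death_times = sorted({t for t, d, _ in pooled if d == 1})
    o_minus_e, variance = 0.0, 0.0
    for t in death_times:
        n = sum(1 for s, _, _ in pooled if s >= t)                       # total at risk
        n1 = sum(1 for s, _, g in pooled if s >= t and g == 1)           # group 1 at risk
        d = sum(1 for s, e, _ in pooled if s == t and e == 1)            # deaths at t
        d1 = sum(1 for s, e, g in pooled if s == t and e == 1 and g == 1)
        o_minus_e += d1 - d * n1 / n                                     # observed - expected
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)  # hypergeometric variance
    return o_minus_e, variance, o_minus_e ** 2 / variance


negative = as_records(NEG_DEATHS, NEG_CENSORED)
positive = as_records(POS_DEATHS, POS_CENSORED)
o_minus_e, variance, chi_square = log_rank(negative, positive)
# Upper-tail p-value of a chi-square(1) variable: P(Z^2 > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_square / 2))
print(f"O - E = {o_minus_e:.3f}, V = {variance:.3f}, chi2 = {chi_square:.3f}, p = {p_value:.4f}")
```

The risk set at each death time counts everyone whose recorded time is at least that time. Subjects censored after the last death, such as most of the immunoperoxidase-negative group, therefore stay in every risk set.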

8 Chapter 1 Examples of Survival Data

1.6 Times to Infection for Burn Patients

In a study (Ichida et al., 1993) to evaluate a protocol change in disinfectant practices in a large midwestern university medical center, 154 patient records and charts were reviewed. Infection of a burn wound is a common complication resulting in extended hospital stays and in the death of severely burned patients. Control of infection remains a prominent component of burn management. The purpose of this study is to compare a routine bathing care method (initial surface decontamination with 10% povidone-iodine followed with regular bathing with Dial soap) with a body-cleansing method using 4% chlorhexidine gluconate. Medical records of patients treated during the 18-month study period provided information on burn wound infections and other medical information. The time until staphylococcus infection was recorded (in days), along with an indicator variable: whether or not an infection had occurred. Other fixed covariates recorded were gender (22% female), race (88% white), severity of the burn as measured by percentage of total surface area of body burned (average of 24.7%, range 2–95%), burn site (45% on head, 23% on buttocks, 84% on trunk, 41% on upper legs, 31% on lower legs, or 29% in respiratory tract), and type of burn (6% chemical, 12% scald, 7% electric, or 75% flame). Two time-dependent covariates were recorded, namely, time to excision and time to prophylactic antibiotic treatment administered, along with the two corresponding indicator variables, namely, whether the patient’s wound had been excised (64%) and whether the patient had been treated with an antibiotic sometime during the course of the study (41%). Eighty-four patients were in the group which received the new bathing solution, chlorhexidine, and 70 patients served as the historical control group which received the routine bathing care, povidone-iodine. The data is available on the authors’ web site and is used in the exercises.

1.7 Death Times of Kidney Transplant Patients

Data on the time to death of 863 kidney transplant patients is available on the authors’ web site. All patients had their transplant performed at The Ohio State University Transplant Center during the period 1982–1992. The maximum follow-up time for this study was 9.47 years. Patients were censored if they moved from Columbus (lost to follow-up) or if they were alive on June 30, 1992. In the sample, there were 432 white males, 92 black males, 280 white females, and 59 black females. Patient ages at transplant ranged from 9.5 months to 74.5 years with a mean age of 42.8 years. Seventy-three (16.9%) of the white males, 14 (15.2%) of the black males, 39 (13.9%) of the white females and 14 (23.7%) of the black females died prior to the end of the study.
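The group sizes and death percentages quoted above can be cross-checked with a few lines of arithmetic. The sketch below does this in Python. The names `COUNTS` and `death_percentages` are illustrative only, and the counts are exactly those stated in the text.

```python
# Counts quoted in section 1.7: group -> (patients, deaths before the end of the study).
COUNTS = {
    "white males": (432, 73),
    "black males": (92, 14),
    "white females": (280, 39),
    "black females": (59, 14),
}


def death_percentages(counts):
    """Return {group: percentage who died}, rounded to one decimal place."""
    return {group: round(100 * died / n, 1) for group, (n, died) in counts.items()}


total_patients = sum(n for n, _ in COUNTS.values())
percentages = death_percentages(COUNTS)
print(total_patients, percentages)
```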

In Chapter 6, the problem of estimating the hazard rate, using a kernel smoothing procedure, is discussed. In particular, the effect of changing the bandwidth and the choice of a kernel are considered. In Chapter 8 this data is also used to illustrate methods for discretizing a continuous covariate.

1.8 Death Times of Male Laryngeal Cancer Patients

Kardaun (1983) reports data on 90 males diagnosed with cancer of the larynx during the period 1970–1978 at a Dutch hospital. Times recorded are the intervals (in years) between first treatment and either death or the end of the study (January 1, 1983). Also recorded are the patient’s age at the time of diagnosis, the year of diagnosis, and the stage of the patient’s cancer. The four stages of disease in the study were based on the T.N.M. (primary tumor (T), nodal involvement (N) and distant metastasis (M) grading) classification used by the American Joint Committee for Cancer Staging (1972). The four groups are Stage I, T1N0M0 with 33 patients; Stage II, T2N0M0 with 17 patients; Stage III, T3N0M0 and TxN1M0 (x = 1, 2, or 3), with 27 patients; and Stage IV, all other TNM combinations except TIS, with 13 patients. The stages are ordered from least serious to most serious. The data is available on the authors’ web site.
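The stage definitions above are a small rule set, and they can be written out as a function. The following Python sketch assumes a hypothetical encoding: the T category is a string (`"1"` to `"4"`, or `"IS"`) and the N and M categories are integers. The function name `larynx_stage` is illustrative. The text excludes TIS from Stage IV, so the sketch simply rejects it.

```python
# Patients per study stage as quoted in section 1.8.
STAGE_COUNTS = {"I": 33, "II": 17, "III": 27, "IV": 13}


def larynx_stage(t, n, m):
    """Map a TNM triple to the four study stages (hypothetical encoding).

    t: primary-tumor category as a string ("1", "2", "3", "4", or "IS");
    n, m: nodal-involvement and distant-metastasis categories as integers.
    """
    if t.upper() == "IS":
        raise ValueError("TIS is not assigned to any of the four study stages")
    if n == 0 and m == 0 and t in ("1", "2", "3"):
        return {"1": "I", "2": "II", "3": "III"}[t]   # T1N0M0, T2N0M0, T3N0M0
    if n == 1 and m == 0 and t in ("1", "2", "3"):
        return "III"                                  # TxN1M0 with x = 1, 2, or 3
    return "IV"                                       # all other TNM combinations


for triple in [("1", 0, 0), ("2", 1, 0), ("4", 0, 0), ("3", 0, 1)]:
    print(triple, "->", larynx_stage(*triple))
```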

In section 7.4, the data is used to illustrate a test for trend to confirm the hypothesis that the higher the stage the greater the chance of dying. In Chapter 8, a global test for the effect of stage on survival is performed in the context of the proportional hazards model, and local tests are illustrated, after an adjustment for the patient’s age. An analysis of variance (ANOVA) table is presented to summarize the effects of stage and age on survival. Contrasts are used to test the hypothesis that linear combinations of stage effects are zero. The construction of confidence intervals for different linear combinations of the stage effects is illustrated. The concept of an interaction in a proportional hazards regression model is illustrated through a stage by age interaction factor. The survival curve is estimated for each stage based on the Cox proportional hazards model.


This data is also used in Chapter 10 to illustrate estimation methodology in the additive hazards model. In Chapter 12, this data set is used to illustrate the fit of parametric models, using the accelerated failure-time model. The goodness of fit of these models is also discussed. The log-logistic model is used in section 12.5 to illustrate using deviance residuals.

1.9 Autologous and Allogeneic Bone Marrow Transplants

The data in Table 1.4 is a sample of 101 patients with advanced acute myelogenous leukemia reported to the International Bone Marrow Transplant Registry. Fifty-one of these patients had received an autologous (auto) bone marrow transplant in which, after high doses of chemotherapy, their own marrow was reinfused to replace their destroyed immune system. Fifty patients had an allogeneic (allo) bone marrow transplant where marrow from an HLA (Histocompatibility Leukocyte Antigen)-matched sibling was used to replenish their immune systems.

An important question in bone marrow transplantation is the comparison of the effectiveness of these two methods of transplant as measured by the length of patients’ leukemia-free survival, the length of time they are alive, and how long they remain free of disease after their transplants. In Chapter 7, this comparison is made using a weighted log-rank test, and a censored data version of the median test and the t-test.

TABLE 1.4 Leukemia-free survival times (in months) for Autologous and Allogeneic Transplants

The leukemia-free survival times for the 50 allo transplant patients were 0.030, 0.493, 0.855, 1.184, 1.283, 1.480, 1.776, 2.138, 2.500, 2.763, 2.993, 3.224, 3.421, 4.178, 4.441+, 5.691, 5.855+, 6.941+, 6.941, 7.993+, 8.882, 8.882, 9.145+, 11.480, 11.513, 12.105+, 12.796, 12.993+, 13.849+, 16.612+, 17.138+, 20.066, 20.329+, 22.368+, 26.776+, 28.717+, 28.717+, 32.928+, 33.783+, 34.211+, 34.770+, 39.539+, 41.118+, 45.033+, 46.053+, 46.941+, 48.289+, 57.401+, 58.322+, 60.625+; and, for the 51 auto patients, 0.658, 0.822, 1.414, 2.500, 3.322, 3.816, 4.737, 4.836+, 4.934, 5.033, 5.757, 5.855, 5.987, 6.151, 6.217, 6.447+, 8.651, 8.717, 9.441+, 10.329, 11.480, 12.007, 12.007+, 12.237, 12.401+, 13.059+, 14.474+, 15.000+, 15.461, 15.757, 16.480, 16.711, 17.204+, 17.237, 17.303+, 17.664+, 18.092, 18.092+, 18.750+, 20.625+, 23.158, 27.730+, 31.184+, 32.434+, 35.921+, 42.237+, 44.638+, 46.480+, 47.467+, 48.322+, 56.086.

As usual, + denotes a censored observation.
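Table 1.4 marks censoring with a trailing "+". Before any analysis, that notation has to be turned into `(time, delta)` pairs. The following Python sketch parses the allo column this way. The helper name `parse_plus_notation` is hypothetical, and the string is transcribed from the table.

```python
ALLO = """0.030, 0.493, 0.855, 1.184, 1.283, 1.480, 1.776, 2.138, 2.500, 2.763,
2.993, 3.224, 3.421, 4.178, 4.441+, 5.691, 5.855+, 6.941+, 6.941, 7.993+,
8.882, 8.882, 9.145+, 11.480, 11.513, 12.105+, 12.796, 12.993+, 13.849+, 16.612+,
17.138+, 20.066, 20.329+, 22.368+, 26.776+, 28.717+, 28.717+, 32.928+, 33.783+, 34.211+,
34.770+, 39.539+, 41.118+, 45.033+, 46.053+, 46.941+, 48.289+, 57.401+, 58.322+, 60.625+"""


def parse_plus_notation(text):
    """Turn '4.178, 4.441+' into [(4.178, 1), (4.441, 0)]: a trailing + marks censoring."""
    records = []
    for token in text.replace("\n", " ").split(","):
        token = token.strip()
        if not token:
            continue
        if token.endswith("+"):
            records.append((float(token[:-1]), 0))   # censored: delta = 0
        else:
            records.append((float(token), 1))        # event observed: delta = 1
    return records


allo = parse_plus_notation(ALLO)
events = sum(delta for _, delta in allo)
print(f"{len(allo)} allo patients, {events} events, {len(allo) - events} censored")
```

Ties such as `6.941+` and `6.941` survive parsing as two separate records. Later estimators then decide how to order a censoring tied with an event.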

This data is used in Chapter 11 to illustrate graphical methods for checking model assumptions following a proportional hazards regression analysis. In section 11.3, the martingale residuals are used to check overall model fit. In section 11.4, score residuals are used to check the proportional hazards assumption on disease-free survival for type of transplant. In section 11.5, the use of deviance residuals is illustrated for checking for outliers and, in section 11.6, the influence of individual observations is examined graphically.

In Chapter 12, this data set is used to illustrate the fit of parametric models using the accelerated failure-time model. The goodness of fit of these models is also discussed. Diagnostic plots for checking the fit of a parametric regression model using this data set are illustrated in section 12.5.

1.10 Bone Marrow Transplants for Hodgkin’s and Non-Hodgkin’s Lymphoma

The data in Table 1.5 was collected on 43 bone marrow transplant patients at The Ohio State University Bone Marrow Transplant Unit. Details of this study can be found in Avalos et al. (1993). All patients had either Hodgkin’s disease (HOD) or non-Hodgkin’s lymphoma (NHL) and were given either an allogeneic (Allo) transplant from an HLA-matched sibling donor or an autogeneic (Auto) transplant; i.e., their own marrow was cleansed and returned to them after a high dose of chemotherapy. Also included are two possible explanatory variables, Karnofsky score at transplant and the waiting time in months from diagnosis to transplant. Of interest is a test of the null hypothesis of no difference in the leukemia-free survival rate between patients given an Allo or Auto transplant, adjusting for the patient’s disease state. This test, which requires stratification on the patient’s disease, is presented in section 7.5. We also use this data in section 11.3 to illustrate how the martingale residual can be used to determine the functional form of a covariate. The data, in Table 1.5, consists of the time on study for each patient, Ti, and the event indicator δi = 1 if dead or relapsed, 0 otherwise; and two covariates, Z1, the pretransplant Karnofsky score, and Z2, the waiting time to transplant.


TABLE 1.5 Times to death or relapse (in days) for patients with bone marrow transplants for Hodgkin’s and non-Hodgkin’s lymphoma

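Group | Ti | δi | Z1 | Z2
Allo NHL | 28 | 1 | 90 | 24
Allo NHL | 32 | 1 | 30 | 7
Allo NHL | 49 | 1 | 40 | 8
Allo NHL | 84 | 1 | 60 | 10
Allo NHL | 357 | 1 | 70 | 42
Allo NHL | 933 | 0 | 90 | 9
Allo NHL | 1078 | 0 | 100 | 16
Allo NHL | 1183 | 0 | 90 | 16
Allo NHL | 1560 | 0 | 80 | 20
Allo NHL | 2114 | 0 | 80 | 27
Allo NHL | 2144 | 0 | 90 | 5
Auto NHL | 42 | 1 | 80 | 19
Auto NHL | 53 | 1 | 90 | 17
Auto NHL | 57 | 1 | 30 | 9
Auto NHL | 63 | 1 | 60 | 13
Auto NHL | 81 | 1 | 50 | 12
Auto NHL | 140 | 1 | 100 | 11
Auto NHL | 176 | 1 | 80 | 38
Auto NHL | 210 | 0 | 90 | 16
Auto NHL | 252 | 1 | 90 | 21
Auto NHL | 476 | 0 | 90 | 24
Auto NHL | 524 | 1 | 90 | 39
Auto NHL | 1037 | 0 | 90 | 84
Allo HOD | 2 | 1 | 20 | 34
Allo HOD | 4 | 1 | 50 | 28
Allo HOD | 72 | 1 | 80 | 59
Allo HOD | 77 | 1 | 60 | 102
Allo HOD | 79 | 1 | 70 | 71
Auto HOD | 30 | 1 | 90 | 73
Auto HOD | 36 | 1 | 80 | 61
Auto HOD | 41 | 1 | 70 | 34
Auto HOD | 52 | 1 | 60 | 18
Auto HOD | 62 | 1 | 90 | 40
Auto HOD | 108 | 1 | 70 | 65
Auto HOD | 132 | 1 | 60 | 17
Auto HOD | 180 | 0 | 100 | 61
Auto HOD | 307 | 0 | 100 | 24
Auto HOD | 406 | 0 | 100 | 48
Auto HOD | 446 | 0 | 100 | 52
Auto HOD | 484 | 0 | 90 | 84
Auto HOD | 748 | 0 | 90 | 171
Auto HOD | 1290 | 0 | 90 | 20
Auto HOD | 1345 | 0 | 80 | 98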

TABLE 1.6 Death times (in weeks) of patients with cancer of the tongue

Aneuploid Tumors
Death Times: 1, 3, 3, 4, 10, 13, 13, 16, 16, 24, 26, 27, 28, 30, 30, 32, 41, 51, 65, 67, 70, 72, 73, 77, 91, 93, 96, 100, 104, 157, 167
Censored Observations: 61, 74, 79, 80, 81, 87, 87, 88, 89, 93, 97, 101, 104, 108, 109, 120, 131, 150, 231, 240, 400

Diploid Tumors
Death Times: 1, 3, 4, 5, 5, 8, 12, 13, 18, 23, 26, 27, 30, 42, 56, 62, 69, 104, 104, 112, 129, 181
Censored Observations: 8, 67, 76, 104, 176, 231

1.11 Times to Death for Patients with Cancer of the Tongue

A study was conducted on the effects of ploidy on the prognosis of patients with cancers of the mouth. Patients were selected who had a paraffin-embedded sample of the cancerous tissue taken at the time of surgery. Follow-up survival data was obtained on each patient. The tissue samples were examined using a flow cytometer to determine if the tumor had an aneuploid (abnormal) or diploid (normal) DNA profile using a technique discussed in Sickle–Santanello et al. (1988). The data in Table 1.6 is on patients with cancer of the tongue. Times are in weeks.

The data is used in exercises.
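A natural first exercise with Table 1.6 is the product-limit (Kaplan–Meier) estimate of the survival function for one ploidy group. The following Python sketch computes it for the diploid tumors. The function name `kaplan_meier` is illustrative. The sketch assumes the usual convention that a death tied with a censoring (here at weeks 8 and 104) is counted first, so the censored patient is still in the risk set.

```python
DIPLOID_DEATHS = [1, 3, 4, 5, 5, 8, 12, 13, 18, 23, 26, 27, 30, 42, 56, 62, 69,
                  104, 104, 112, 129, 181]
DIPLOID_CENSORED = [8, 67, 76, 104, 176, 231]


def kaplan_meier(records):
    """Product-limit estimate from (time, delta) pairs, delta = 1 for a death.

    Returns [(t, S(t))] at each distinct death time. A censored time tied with a
    death time is kept in the risk set at that time (deaths precede censorings).
    """
    curve, surv = [], 1.0
    for t in sorted({u for u, d in records if d == 1}):
        at_risk = sum(1 for u, _ in records if u >= t)
        deaths = sum(1 for u, d in records if u == t and d == 1)
        surv *= 1 - deaths / at_risk
        curve.append((t, surv))
    return curve


diploid = [(t, 1) for t in DIPLOID_DEATHS] + [(t, 0) for t in DIPLOID_CENSORED]
for week, s in kaplan_meier(diploid):
    print(f"S({week}) = {s:.4f}")
```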

1.12 Times to Reinfection for Patients with Sexually Transmitted Diseases

A major problem in certain subpopulations is the occurrence of sexually transmitted diseases (STD). Even if one ignores the lethal effects of the Acquired Immune Deficiency Syndrome (AIDS), other sexually transmitted diseases still have a significant impact on the morbidity of the community. Two of these sexually transmitted diseases are the focus of this investigation: gonorrhea and chlamydia. These diseases are of special interest because they are often asymptomatic in the female, and, if left untreated, can lead to complications including sterility.

Both of these diseases can be easily prevented and effectively treated. Therefore, it is a mystery why the incidence of these diseases remains high in several subpopulations. One theory is that a core group of individuals experiences reinfections, thereby serving as a natural reservoir of the disease organism and transferring the disease to uninfected individuals.

The purpose of this study is to identify those factors which are related to time until reinfection by either gonorrhea or chlamydia, given an initial infection of gonorrhea or chlamydia. A sample of 877 individuals with an initial diagnosis of gonorrhea or chlamydia was followed for reinfection. In addition to the primary outcome variable just stated, an indicator variable which indicates whether a reinfection occurred was recorded. Demographic variables recorded were race (33% white, 67% black), marital status (7% divorced/separated, 3% married and 90% single), age of patient at initial infection (average age is 20.6 years with a range of 13–48 years), years of schooling (average of 11.4 years with a range of 6–18 years), and type of initial infection (16% gonorrhea, 45% chlamydia and 39% both gonorrhea and chlamydia). Behavioral factors recorded at the examination, when the initial infection was diagnosed, were number of partners in the last 30 days (average is 1.27 with a range of 0–19), oral sex within past 12 months (33%), rectal sex within past 12 months (6%), and condom use (6% always, 58% sometimes, and 36% never). Symptoms noticed at time of initial infection were presence of abdominal pain (14%), sign of discharge (46%), sign of dysuria (13%), sign of itch (19%), sign of lesion (3%), sign of rash (3%), and sign of lymph involvement (1%). If the factors related to a greater risk of reinfection can be identified, then interventions could be targeted to those individuals who are at greatest risk for reinfection. This, in turn, should reduce the size of the core group and, thereby, reduce the incidence of the diseases. The data for this study is available on our web site.

This data is used in the exercises.

1.13 Time to Hospitalized Pneumonia in Young Children

Data gathered from 3,470 annual personal interviews conducted for the National Longitudinal Survey of Youth (NLSY, 1995) from 1979 through 1986 was used to study whether the mother’s feeding choice (breast feeding vs. never breast fed) protected the infant against hospitalized pneumonia in the first year of life. Information obtained about the child included whether it had a normal birthweight, as defined by weighing at least 5.5 pounds (36%), race (56% white, 28% black, and 16% other), number of siblings (range 0–6), and age at which the child was hospitalized for pneumonia, along with an indicator variable as to whether the child was hospitalized. Demographic characteristics of the mother included age (average is 21.64 years with a range of 14–29 years), years of schooling (average of 11.4 years), region of the country (15% Northeast, 25% North central, 40% South, and 20% West), poverty (92%), and urban environment (76%). Health behavior measures during pregnancy, such as alcohol use (36%) and cigarette use (34%), were also recorded. The data for this study is available on our web site.

This data is used in the exercises.

1.14 Times to Weaning of Breast-Fed Newborns

The National Longitudinal Survey of Youth is a stratified random sample which was begun in 1979. Youths, aged 14 to 21 in 1979, have been interviewed yearly through 1988. Beginning in 1983, females in the survey were asked about any pregnancies that have occurred since they were last interviewed (pregnancies before 1983 were also documented). Questions regarding breast feeding are included in the questionnaire.

This data set consists of the information from 927 first-born children to mothers who chose to breast feed their children and who have complete information for all the variables of interest. The sample was restricted to children born after 1978 and whose gestation age was between 20 and 45 weeks. The year of birth restriction was included in an attempt to eliminate recall problems.

The response variable in the data set is duration of breast feeding in weeks, followed by an indicator of whether the breast feeding was completed (i.e., the infant is weaned). Explanatory variables for breast-feeding duration include race of mother (1 if white, 2 if black, 3 if other); poverty status indicator (1 if mother in poverty); smoking status of mother (1 if smoking at birth of child); alcohol-drinking status of mother (1 if drinking at birth of child); age of mother at child’s birth, year of child’s birth, education of mother (in years); and lack of prenatal care status (1 if mother sought prenatal care after third month or never sought prenatal care, 0 if mother sought prenatal care in first three months of pregnancy). The complete data for this study is available on our web site.

This data is used in section 5.4 to illustrate the construction of the cohort life table. In Chapter 8, it is used to show how to build a model where predicting the outcome is the main purpose, i.e., interest is in finding factors which contribute to the distribution of the time to weaning.

1.15 Death Times of Psychiatric Patients

Woolson (1981) has reported survival data on 26 psychiatric inpatients admitted to the University of Iowa hospitals during the years 1935–1948. This sample is part of a larger study of psychiatric inpatients discussed by Tsuang and Woolson (1977). Data for each patient consists of age at first admission to the hospital, sex, number of years of follow-up (years from admission to death or censoring) and patient status at the follow-up time. The data is given in Table 1.7. In section 6.3, the estimate of the relative mortality function and cumulative excess mortality of these patients, compared to the standard mortality rates of residents of Iowa in 1959, is considered. In section 7.2, this data is used to illustrate one-sample hypothesis tests. Here, a comparison of the survival experience of these 26 patients is made to the standard mortality of residents of Iowa to determine if psychiatric patients tend to have shorter lifetimes. It is used in Chapter 9 to illustrate left truncation in the context of proportional hazards models.


TABLE 1.7 Survival data for psychiatric inpatients

Gender | Age at Admission | Time of Follow-up (years)
Female | 51 | 1
Female | 58 | 1
Female | 55 | 2
Female | 28 | 22
Male | 21 | 30+
Male | 19 | 28
Female | 25 | 32
Female | 48 | 11
Female | 47 | 14
Female | 25 | 36+
Female | 31 | 31+
Male | 24 | 33+
Male | 25 | 33+
Female | 30 | 37+
Female | 33 | 35+
Male | 36 | 25
Male | 30 | 31+
Male | 41 | 22
Female | 43 | 26
Female | 45 | 24
Female | 35 | 35+
Male | 29 | 34+
Male | 35 | 30+
Male | 32 | 35
Female | 36 | 40
Male | 32 | 39+

+ Censored observation
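Section 1.15 notes that this sample is used to illustrate left truncation. On the age scale, each patient comes under observation at the age of admission and leaves at age plus follow-up. A product-limit estimate must therefore count a patient in the risk set only after entry. The following Python sketch applies this to Table 1.7. Two points are assumptions of the sketch, not the book's definitions: the risk-set convention `entry < t <= exit` (some texts include `t = entry`), and the helper name `left_truncated_product_limit`. The result estimates survival conditional on reaching the entry ages, and the risk sets at the oldest ages are tiny, so the tail is very unstable.

```python
# Table 1.7 as (sex, age at admission, years of follow-up, died?).
PATIENTS = [
    ("F", 51, 1, True), ("F", 58, 1, True), ("F", 55, 2, True), ("F", 28, 22, True),
    ("M", 21, 30, False), ("M", 19, 28, True), ("F", 25, 32, True), ("F", 48, 11, True),
    ("F", 47, 14, True), ("F", 25, 36, False), ("F", 31, 31, False), ("M", 24, 33, False),
    ("M", 25, 33, False), ("F", 30, 37, False), ("F", 33, 35, False), ("M", 36, 25, True),
    ("M", 30, 31, False), ("M", 41, 22, True), ("F", 43, 26, True), ("F", 45, 24, True),
    ("F", 35, 35, False), ("M", 29, 34, False), ("M", 35, 30, False), ("M", 32, 35, True),
    ("F", 36, 40, True), ("M", 32, 39, False),
]


def left_truncated_product_limit(records):
    """records: (entry_age, exit_age, died). Returns [(age, S(age))] at each death age.

    Assumed risk-set convention: a subject is at risk at age t if entry < t <= exit.
    """
    curve, surv = [], 1.0
    for t in sorted({x for _, x, d in records if d}):
        at_risk = sum(1 for e, x, _ in records if e < t <= x)
        deaths = sum(1 for _, x, d in records if d and x == t)
        surv *= 1 - deaths / at_risk
        curve.append((t, surv))
    return curve


records = [(age, age + years, died) for _, age, years, died in PATIENTS]
curve = left_truncated_product_limit(records)
for age, s in curve:
    print(f"S({age}) = {s:.4f}")
```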

1.16 Death Times of Elderly Residents of a Retirement Community

Channing House is a retirement center located in Palo Alto, California. Data on ages at death of 462 individuals (97 males and 365 females) who were in residence during the period January 1964 to July 1975 has been reported by Hyde (1980). A distinctive feature of these individuals was that all were covered by a health care program provided by the center which allowed for easy access to medical care without any additional financial burden to the resident. The age in months when members of the community died or left the center and the age when individuals entered the community are available on the authors’ web site.

The life lengths in this data set are left truncated because an individual must survive to a sufficient age to enter the retirement community. Individuals who die at an early age are excluded from the study. Ignoring this left truncation leads to the problem of length-biased sampling. The concept of left truncation and the bias induced into the estimation process by ignoring it is discussed in section 3.4.

This data will be used in section 4.6 to illustrate how one estimates the conditional survival function for left-truncated data. The data is used in section 7.3 to illustrate the comparison of two samples (male and female), when there is left truncation and right censoring employing the log-rank test, and in Chapter 9 employing the Cox proportional hazards model.

1.17 Time to First Use of Marijuana

Turnbull and Weiss (1978) report part of a study conducted at the Stanford-Palo Alto Peer Counseling Program (see Hamburg et al. [1975] for details of the study). In this study, 191 California high school boys were asked, “When did you first use marijuana?” The answers were the exact ages (uncensored observations); “I never used it,” which are right-censored observations at the boys’ current ages; or “I have used it but can not recall just when the first time was,” which is a left-censored observation (see section 3.3). Notice that a left-censored observation tells us only that the event has occurred prior to the boy’s current age. The data is in Table 1.8.

TABLE 1.8 Marijuana use in high school boys

Age | Number of Exact Observations | Number Who Have Yet to Smoke Marijuana | Number Who Have Started Smoking at an Earlier Age
10 | 4 | 0 | 0
11 | 12 | 0 | 0
12 | 19 | 2 | 0
13 | 24 | 15 | 1
14 | 20 | 24 | 2
15 | 13 | 18 | 3
16 | 3 | 14 | 2
17 | 1 | 6 | 3
18 | 0 | 0 | 1
>18 | 4 | 0 | 0
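Each row of Table 1.8 mixes three kinds of observation: exact, right-censored, and left-censored. Mixing them is what makes this data doubly censored. The following Python sketch expands the counts into one record per boy and checks that the three kinds add up to the 191 boys in the study. The names `TABLE_1_8` and `expand` are hypothetical. Ages are kept as strings so that the ">18" row survives.

```python
# Table 1.8: age -> (exact ages of first use, "never used" [right-censored],
#                    "used, but cannot recall when" [left-censored]).
TABLE_1_8 = {
    "10": (4, 0, 0), "11": (12, 0, 0), "12": (19, 2, 0), "13": (24, 15, 1),
    "14": (20, 24, 2), "15": (13, 18, 3), "16": (3, 14, 2), "17": (1, 6, 3),
    "18": (0, 0, 1), ">18": (4, 0, 0),
}


def expand(table):
    """Expand the counts into one (age, kind) record per boy."""
    records = []
    for age, (exact, right, left) in table.items():
        records += [(age, "exact")] * exact
        records += [(age, "right")] * right
        records += [(age, "left")] * left
    return records


records = expand(TABLE_1_8)
kinds = {k: sum(1 for _, kind in records if kind == k) for k in ("exact", "right", "left")}
print(len(records), kinds)
```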

This data is used in section 5.2 to illustrate the calculation of the survival function for both left- and right-censored data, commonly referred to as doubly censored data.

1.18 Time to Cosmetic Deterioration of Breast Cancer Patients

Beadle et al. (1984a and b) report a retrospective study carried out to compare the cosmetic effects of radiotherapy alone versus radiotherapy and adjuvant chemotherapy on women with early breast cancer. The use of an excision biopsy, followed by radiation therapy, has been suggested as an alternative to mastectomy. This therapy preserves the breast and, hence, has the benefit of an improved cosmetic effect. The use of adjuvant chemotherapy is often indicated to prevent recurrence of the cancer, but there is some clinical evidence that it enhances the effects of radiation on normal tissue, thus, offsetting the cosmetic benefit of this procedure.

To compare the two treatment regimes, a retrospective study of 46 radiation only and 48 radiation plus chemotherapy patients was made. Patients were observed initially every 4–6 months, but, as their recovery progressed, the interval between visits lengthened. At each visit, the clinician recorded a measure of breast retraction on a three-point scale (none, moderate, severe). The event of interest was the time to first appearance of moderate or severe breast retraction. Because patients were observed only at these random times, the exact time of breast retraction is known only to fall in the interval between visits. This type of data is called interval-censored data (see section 3.3).

The data for the two groups is shown in Table 1.9. The data consists of the interval, in months, in which deterioration occurred or the last time a patient was seen without deterioration having yet occurred (right-censored observations). This data is used in section 5.2 to illustrate the computation of an estimate of the survival function based on interval-censored data.

TABLE 1.9 Time to cosmetic deterioration (in months) in breast cancer patients with two treatment regimens

Radiotherapy only: (0, 7]; (0, 8]; (0, 5]; (4, 11]; (5, 12]; (5, 11]; (6, 10]; (7, 16]; (7, 14]; (11, 15]; (11, 18]; >15; >17; (17, 25]; (17, 25]; >18; (19, 35]; (18, 26]; >22; >24; >24; (25, 37]; (26, 40]; (27, 34]; >32; >33; >34; (36, 44]; (36, 48]; >36; >36; (37, 44]; >37; >37; >37; >38; >40; >45; >46; >46; >46; >46; >46; >46; >46; >46.

Radiotherapy and Chemotherapy: (0, 22]; (0, 5]; (4, 9]; (4, 8]; (5, 8]; (8, 12]; (8, 21]; (10, 35]; (10, 17]; (11, 13]; >11; (11, 17]; >11; (11, 20]; (12, 20]; >13; (13, 39]; >13; >13; (14, 17]; (14, 19]; (15, 22]; (16, 24]; (16, 20]; (16, 24]; (16, 60]; (17, 27]; (17, 23]; (17, 26]; (18, 25]; (18, 24]; (19, 32]; >21; (22, 32]; >23; (24, 31]; (24, 30]; (30, 34]; (30, 36]; >31; >32; (33, 40]; >34; >34; >35; (35, 39]; (44, 48]; >48.

(a, b]: interval in which deterioration took place; >a: no deterioration by the last visit at month a (right-censored).
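With interval-censored data, the nonparametric estimate of the survival function is usually computed by a self-consistency (EM) iteration. Each subject's unit of probability is spread over the candidate event times inside its interval, in proportion to the current estimate. The following Python sketch applies such an iteration to the radiotherapy-only group. It is a simplified illustration, not the book's algorithm. Its simplifying assumption is that candidate mass sits on every whole month up to the largest finite right endpoint, plus one point at infinity for mass beyond the last visit, rather than on Turnbull's innermost intervals. The transcription writes right-censored records as ">a", and all function names are hypothetical.

```python
import math
import re

# Radiotherapy-only group of Table 1.9: "(a, b]" = deterioration in (a, b] months;
# ">a" = no deterioration by the last visit at month a (right-censored).
RADIOTHERAPY_ONLY = """(0, 7]; (0, 8]; (0, 5]; (4, 11]; (5, 12]; (5, 11]; (6, 10]; (7, 16];
(7, 14]; (11, 15]; (11, 18]; >15; >17; (17, 25]; (17, 25]; >18; (19, 35]; (18, 26]; >22;
>24; >24; (25, 37]; (26, 40]; (27, 34]; >32; >33; >34; (36, 44]; (36, 48]; >36; >36;
(37, 44]; >37; >37; >37; >38; >40; >45; >46; >46; >46; >46; >46; >46; >46; >46"""


def parse_intervals(text):
    """Return (left, right) pairs; right = math.inf for right-censored records."""
    pairs = []
    for token in text.replace("\n", " ").split(";"):
        token = token.strip()
        match = re.fullmatch(r"\((\d+),\s*(\d+)\]", token)
        if match:
            pairs.append((float(match.group(1)), float(match.group(2))))
        elif token.startswith(">"):
            pairs.append((float(token[1:]), math.inf))
        else:
            raise ValueError(f"unrecognized token: {token!r}")
    return pairs


def self_consistency(intervals, tol=1e-8, max_iter=2000):
    """EM iteration for the event-time distribution on a whole-month grid (assumption)."""
    top = int(max(r for _, r in intervals if r < math.inf))
    support = [float(m) for m in range(1, top + 1)] + [math.inf]
    # For each subject, the indices of candidate points inside its interval (l, r].
    members = [[j for j, s in enumerate(support) if l < s <= r] for l, r in intervals]
    n = len(intervals)
    p = [1.0 / len(support)] * len(support)
    for _ in range(max_iter):
        new_p = [0.0] * len(support)
        for idx in members:
            mass = sum(p[j] for j in idx)           # current probability of this interval
            for j in idx:
                new_p[j] += p[j] / (mass * n)       # share this subject's 1/n in proportion
        change = max(abs(a - b) for a, b in zip(new_p, p))
        p = new_p
        if change < tol:
            break
    return support, p


def survival_at(support, p, t):
    """Estimated S(t): total mass at candidate points strictly beyond t."""
    return sum(q for s, q in zip(support, p) if s > t)


intervals = parse_intervals(RADIOTHERAPY_ONLY)
support, p = self_consistency(intervals)
for month in (6, 12, 24, 36, 48):
    print(f"S({month}) = {survival_at(support, p, month):.3f}")
```

Only the survival values at interval endpoints are pinned down by the data. How mass is spread within an innermost interval is not identified, so intermediate values should not be over-read.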

1.19 Time to AIDS

Lagakos et al. (1988) report data on the infection and induction times for 258 adults and 37 children who were infected with the AIDS virus and developed AIDS by June 30, 1986. The data consists of the time in years, measured from April 1, 1978, when adults were infected by the virus from a contaminated blood transfusion, and the waiting time to development of AIDS, measured from the date of infection. For the pediatric population, children were infected in utero or at birth, and the infection time is the number of years from April 1, 1978 to birth. The data is in Table 1.10.

In this sampling scheme, only individuals who have developed AIDS prior to the end of the study period are included in the study. Infected individuals who have yet to develop AIDS are not included in the sample. This type of data is called right-truncated data (see section 3.4). Estimation of the survival function for this right-truncated data is discussed in section 5.3.


TABLE 1.10 Induction times (in years) for AIDS in adults and children

Infection Time | Adult Induction Times | Child Induction Times
0.00 | 5 |
0.25 | 6.75 |
0.75 | 5, 5, 7.25 |
1.00 | 4.25, 5.75, 6.25, 6.5 | 5.5
1.25 | 4, 4.25, 4.75, 5.75 |
1.50 | 2.75, 3.75, 5, 5.5, 6.5 | 2.25
1.75 | 2.75, 3, 5.25, 5.25 |
2.00 | 2.25, 3, 4, 4.5, 4.75, 5, 5.25, 5.25, 5.5, 5.5, 6 |
2.25 | 3, 5.5 | 3
2.50 | 2.25, 2.25, 2.25, 2.25, 2.5, 2.75, 3, 3.25, 3.25, 4, 4, 4 |
2.75 | 1.25, 1.5, 2.5, 3, 3, 3.25, 3.75, 4.5, 4.5, 5, 5, 5.25, 5.25, 5.25, 5.25, 5.25 | 1
3.00 | 2, 3.25, 3.5, 3.75, 4, 4, 4.25, 4.25, 4.25, 4.75, 4.75, 4.75, 5 | 1.75
3.25 | 1.25, 1.75, 2, 2, 2.75, 3, 3, 3.5, 3.5, 4.25, 4.5 |
3.50 | 1.25, 2.25, 2.25, 2.5, 2.75, 2.75, 3, 3.25, 3.5, 3.5, 4, 4, 4.25, 4.5, 4.5 | 0.75
3.75 | 1.25, 1.75, 1.75, 2, 2.75, 3, 3, 3, 4, 4.25, 4.25 | 0.75, 1, 2.75, 3, 3.5, 4.25
4.00 | 1, 1.5, 1.5, 2, 2.25, 2.75, 3.5, 3.75, 3.75, 4 | 1
4.25 | 1.25, 1.5, 1.5, 2, 2, 2, 2.25, 2.5, 2.5, 2.5, 3, 3.5, 3.5 | 1.75
4.50 | 1, 1.5, 1.5, 1.5, 1.75, 2.25, 2.25, 2.5, 2.5, 2.5, 2.5, 2.75, 2.75, 2.75, 2.75, 3, 3, 3, 3.25, 3.25 | 3.25
4.75 | 1, 1.5, 1.5, 1.5, 1.75, 1.75, 2, 2.25, 2.75, 3, 3, 3.25, 3.25, 3.25, 3.25, 3.25, 3.25 | 1, 2.25
5.00 | 0.5, 1.5, 1.5, 1.75, 2, 2.25, 2.25, 2.25, 2.5, 2.5, 3, 3, 3 | 0.5, 0.75, 1.5, 2.5
5.25 | 0.25, 0.25, 0.75, 0.75, 0.75, 1, 1, 1.25, 1.25, 1.5, 1.5, 1.5, 1.5, 2.25, 2.25, 2.5, 2.5, 2.75 | 0.25, 1, 1.5
5.50 | 1, 1, 1, 1.25, 1.25, 1.75, 2, 2.25, 2.25, 2.5 | 0.5, 1.5, 2.5
5.75 | 0.25, 0.75, 1, 1.5, 1.5, 1.5, 2, 2, 2.25 | 1.75
6.00 | 0.5, 0.75, 0.75, 0.75, 1, 1, 1, 1.25, 1.25, 1.5, 1.5, 1.75, 1.75, 1.75, 2 | 0.5, 1.25
6.25 | 0.75, 1, 1.25, 1.75, 1.75 | 0.5, 1.25
6.50 | 0.25, 0.25, 0.75, 1, 1.25, 1.5 | 0.75
6.75 | 0.75, 0.75, 0.75, 1, 1.25, 1.25, 1.25 | 0.5, 0.75
7.00 | 0.75 | 0.75
7.25 | 0.25 | 0.25

2 Basic Quantities and Models

2.1 Introduction

In this chapter we consider the basic parameters used in modeling survival data. We shall define these quantities and show how they are interrelated in sections 2.2–2.4. In section 2.5 some common parametric models are discussed. The important application of regression to survival analysis is covered in section 2.6, where both parametric and semiparametric models are presented. Models for competing risks are discussed in section 2.7.

Let X be the time until some specified event. This event may be death, the appearance of a tumor, the development of some disease, recurrence of a disease, equipment breakdown, cessation of breast feeding, and so forth. Furthermore, the event may be a good event, such as remission after some treatment, conception, cessation of smoking, and so forth. More precisely, in this chapter, X is a nonnegative random variable from a homogeneous population. Four functions characterize the distribution of X, namely, the survival function, which is the probability of an individual surviving to time x; the hazard rate (function), sometimes termed risk function, which is the chance an individual of age x experiences the event in the next instant in time; the probability density (or probability mass) function, which is the unconditional probability of the event’s occurring at time x; and the mean residual life at time x, which is the mean time to the event of interest, given the event has not occurred at x. If we know any one of these four functions, then the other three can be uniquely determined. In practice, these four functions, along with another useful quantity, the cumulative hazard function, are used to illustrate different aspects of the distribution of X. In the competing risk context, the cause-specific hazard rate, which is the rate at which subjects who have yet to experience any of the competing risks are experiencing the ith competing cause of failure, is often used. This quantity and other competing risk quantities are discussed in detail in section 2.7. In Chapters 4–6, we shall see how these functions are estimated and how inferences are drawn about the survival (or failure) distribution.

2.2 The Survival Function

The basic quantity employed to describe time-to-event phenomena is the survival function, the probability of an individual surviving beyond time x (experiencing the event after time x). It is defined as

$$S(x) = \Pr(X > x). \qquad (2.2.1)$$

In the context of equipment or manufactured item failures, $S(x)$ is referred to as the reliability function. If X is a continuous random variable, then $S(x)$ is a continuous, strictly decreasing function.

When X is a continuous random variable, the survival function is the complement of the cumulative distribution function, that is, $S(x) = 1 - F(x)$, where $F(x) = \Pr(X \le x)$. Also, the survival function is the integral of the probability density function, $f(x)$, that is,

$$S(x) = \Pr(X > x) = \int_x^{\infty} f(t)\,dt. \qquad (2.2.2)$$

Thus,

$$f(x) = -\frac{dS(x)}{dx}.$$

Note that $f(x)\,dx$ may be thought of as the “approximate” probability that the event will occur at time $x$ and that $f(x)$ is a nonnegative function with the area under $f(x)$ being equal to one.
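The two relations above, $S(x) = \int_x^{\infty} f(t)\,dt$ and $f(x) = -dS(x)/dx$, can be checked numerically for a concrete distribution. The following Python sketch uses the Weibull form from Example 2.1. It borrows $\lambda = 0.00208$, $\alpha = 3$ from that example, and the evaluation point $x = 5$ is an arbitrary choice. The integral is approximated by Simpson's rule, with the infinite upper limit replaced by 60, where $S$ is negligible. The derivative is approximated by a central difference. The function names are illustrative.

```python
import math


def weibull_survival(x, lam, alpha):
    """S(x) = exp(-lam * x**alpha)."""
    return math.exp(-lam * x ** alpha)


def weibull_density(x, lam, alpha):
    """f(x) = alpha * lam * x**(alpha - 1) * exp(-lam * x**alpha), i.e. -dS/dx."""
    return alpha * lam * x ** (alpha - 1) * math.exp(-lam * x ** alpha)


def simpson(func, a, b, steps=20_000):
    """Composite Simpson's rule on [a, b]; steps is forced to be even."""
    steps += steps % 2
    h = (b - a) / steps
    total = func(a) + func(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3


lam, alpha, x = 0.00208, 3.0, 5.0   # parameters borrowed from Example 2.1

# (2.2.2): tail integral of f; S(60) = exp(-0.00208 * 60**3) is negligible.
tail = simpson(lambda t: weibull_density(t, lam, alpha), x, 60.0)

# f(x) = -dS(x)/dx via a central difference.
h = 1e-5
slope = (weibull_survival(x + h, lam, alpha) - weibull_survival(x - h, lam, alpha)) / (2 * h)

print(weibull_survival(x, lam, alpha), tail)     # the two should agree
print(weibull_density(x, lam, alpha), -slope)    # the two should agree
```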

EXAMPLE 2.1 The survival function for the Weibull distribution, discussed in more detail in section 2.5, is $S(x) = \exp(-\lambda x^{\alpha})$, $\lambda > 0$, $\alpha > 0$. The exponential distribution is a special case of the Weibull distribution when $\alpha = 1$. Survival curves with a common median of 6.93 are exhibited in Figure 2.1 for $\lambda = 0.26328, \alpha = 0.5$; $\lambda = 0.1, \alpha = 1$; and $\lambda = 0.00208, \alpha = 3$.
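The common median can be confirmed directly. Setting $\exp(-\lambda m^{\alpha}) = 0.5$ gives $m = (\ln 2 / \lambda)^{1/\alpha}$. The short Python sketch below evaluates this for the three parameter pairs in the example. The function name `weibull_median` is illustrative.

```python
import math


def weibull_median(lam, alpha):
    # Solve exp(-lam * m**alpha) = 0.5  =>  m = (ln 2 / lam) ** (1 / alpha)
    return (math.log(2) / lam) ** (1 / alpha)


PARAMS = [(0.26328, 0.5), (0.1, 1.0), (0.00208, 3.0)]   # (lambda, alpha) from Example 2.1
for lam, alpha in PARAMS:
    m = weibull_median(lam, alpha)
    print(f"lambda = {lam}, alpha = {alpha}: median = {m:.4f}, S(median) = {math.exp(-lam * m ** alpha):.4f}")
```

Each median agrees with 6.93 to two decimals. The three curves nonetheless have very different shapes, because $\alpha$ controls whether the hazard decreases, stays constant, or increases.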


Figure 2.1 Weibull survival functions for $\alpha = 0.5$, $\lambda = 0.26328$; $\alpha = 1.0$, $\lambda = 0.1$; and $\alpha = 3.0$, $\lambda = 0.00208$. [Plot: Survival Probability versus Time.]

Many types of survival curves can be shown, but the important point to note is that they all have the same basic properties. They are monotone, nonincreasing functions equal to one at zero and tending to zero as time approaches infinity. Their rate of decline, of course, varies according to the risk of experiencing the event at time x, but it is difficult to determine the essence of a failure pattern by simply looking at the survival curve. Nevertheless, this quantity continues to be a popular description of survival in the applied literature and can be very useful in comparing two or more mortality patterns. Next, we present one more survival curve, which will be discussed at greater length in the next section.
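The basic properties above can be checked numerically on a finite time grid. In this sketch, `is_valid_survival_curve` is a hypothetical helper, and “zero as time approaches infinity” is replaced by a small threshold at the largest grid point, which only approximates the limit.

```python
import math


def is_valid_survival_curve(S, grid, tail_tol=1e-6, tol=1e-12):
    """Check, on a finite increasing grid starting at 0, the basic properties
    of a survival function: S(0) = 1, values in [0, 1], nonincreasing, and
    close to zero at the last grid point (a finite stand-in for x -> infinity)."""
    values = [S(x) for x in grid]
    starts_at_one = abs(values[0] - 1.0) <= tol
    in_unit_interval = all(0.0 <= v <= 1.0 for v in values)
    nonincreasing = all(later <= earlier + tol
                        for earlier, later in zip(values, values[1:]))
    vanishes = values[-1] <= tail_tol
    return starts_at_one and in_unit_interval and nonincreasing and vanishes


GRID = [float(t) for t in range(0, 10_001)]  # times 0, 1, ..., 10000

# The three Weibull curves of Example 2.1 pass the check.
for lam, alpha in [(0.26328, 0.5), (0.1, 1.0), (0.00208, 3.0)]:
    print(lam, alpha,
          is_valid_survival_curve(lambda x: math.exp(-lam * x ** alpha), GRID))

# A curve that levels off at 0.5 never approaches zero, so it fails.
print(is_valid_survival_curve(lambda x: 0.5 + 0.5 * math.exp(-x), GRID))
```

The grid extends to 10000 because the $\alpha = 0.5$ curve decays slowly; a shorter grid would wrongly flag it as not vanishing.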

EXAMPLE 2.2 The U.S. Department of Health and Human Services publishes yearly survival curves for all causes of mortality in the United States and each of the fifty states by race and sex in its Vital Statistics of the United States Series. In Table 2.1, we present the overall survival probabilities for males and females, by race, taken from the 1990 report (U.S. Department of Health and Human Services, 1990). Figure 2.2 shows the survival curves and allows a visual comparison of the curves. We can see that white females have the best survival probability, white males and black females are comparable in their survival probabilities, and black males have the worst survival.

TABLE 2.1 Survival Functions of the U.S. Population by Race and Sex in 1989

| Age | White Male | White Female | Black Male | Black Female |
|---|---|---|---|---|
| 0 | 1.00000 | 1.00000 | 1.00000 | 1.00000 |
| 1 | 0.99092 | 0.99285 | 0.97996 | 0.98283 |
| 2 | 0.99024 | 0.99232 | 0.97881 | 0.98193 |
| 3 | 0.98975 | 0.99192 | 0.97792 | 0.98119 |
| 4 | 0.98937 | 0.99160 | 0.97722 | 0.98059 |
| 5 | 0.98905 | 0.99134 | 0.97664 | 0.98011 |
| 6 | 0.98877 | 0.99111 | 0.97615 | 0.97972 |
| 7 | 0.98850 | 0.99091 | 0.97571 | 0.97941 |
| 8 | 0.98825 | 0.99073 | 0.97532 | 0.97915 |
| 9 | 0.98802 | 0.99056 | 0.97499 | 0.97892 |
| 10 | 0.98782 | 0.99041 | 0.97472 | 0.97870 |
| 11 | 0.98765 | 0.99028 | 0.97449 | 0.97847 |
| 12 | 0.98748 | 0.99015 | 0.97425 | 0.97823 |
| 13 | 0.98724 | 0.98999 | 0.97392 | 0.97796 |
| 14 | 0.98686 | 0.98977 | 0.97339 | 0.97767 |
| 15 | 0.98628 | 0.98948 | 0.97258 | 0.97735 |
| 16 | 0.98547 | 0.98909 | 0.97145 | 0.97699 |
| 17 | 0.98445 | 0.98862 | 0.97002 | 0.97658 |
| 18 | 0.98326 | 0.98809 | 0.96829 | 0.97612 |
| 19 | 0.98197 | 0.98755 | 0.96628 | 0.97559 |
| 20 | 0.98063 | 0.98703 | 0.96403 | 0.97498 |
| 21 | 0.97924 | 0.98654 | 0.96151 | 0.97429 |
| 22 | 0.97780 | 0.98607 | 0.95873 | 0.97352 |
| 23 | 0.97633 | 0.98561 | 0.95575 | 0.97267 |
| 24 | 0.97483 | 0.98514 | 0.95267 | 0.97174 |
| 25 | 0.97332 | 0.98466 | 0.94954 | 0.97074 |
| 26 | 0.97181 | 0.98416 | 0.94639 | 0.96967 |
| 27 | 0.97029 | 0.98365 | 0.94319 | 0.96852 |
| 28 | 0.96876 | 0.98312 | 0.93989 | 0.96728 |
| 29 | 0.96719 | 0.98257 | 0.93642 | 0.96594 |
| 30 | 0.96557 | 0.98199 | 0.93273 | 0.96448 |
| 31 | 0.96390 | 0.98138 | 0.92881 | 0.96289 |
| 32 | 0.96217 | 0.98073 | 0.92466 | 0.96118 |
| 33 | 0.96038 | 0.98005 | 0.92024 | 0.95934 |
| 34 | 0.95852 | 0.97933 | 0.91551 | 0.95740 |
| 35 | 0.95659 | 0.97858 | 0.91044 | 0.95336 |
| 36 | 0.95457 | 0.97779 | 0.90501 | 0.95321 |
| 37 | 0.95245 | 0.97696 | 0.89922 | 0.95095 |
| 38 | 0.95024 | 0.97607 | 0.89312 | 0.94855 |
| 39 | 0.94794 | 0.97510 | 0.88677 | 0.94598 |
| 40 | 0.94555 | 0.97404 | 0.88021 | 0.94321 |
| 41 | 0.94307 | 0.97287 | 0.87344 | 0.94023 |
| 42 | 0.94047 | 0.97158 | 0.86643 | 0.93703 |
| 43 | 0.93771 | 0.97016 | 0.85917 | 0.93361 |
| 44 | 0.93477 | 0.96862 | 0.85163 | 0.92998 |
| 45 | 0.93161 | 0.96694 | 0.84377 | 0.92612 |
| 46 | 0.92820 | 0.96511 | 0.83559 | 0.92202 |
| 47 | 0.92450 | 0.96311 | 0.82707 | 0.91765 |
| 48 | 0.92050 | 0.96091 | 0.81814 | 0.91300 |
| 49 | 0.91617 | 0.95847 | 0.80871 | 0.90804 |
| 50 | 0.91148 | 0.95575 | 0.79870 | 0.90275 |
| 51 | 0.90639 | 0.95273 | 0.78808 | 0.89709 |
| 52 | 0.90086 | 0.94938 | 0.77685 | 0.89103 |
| 53 | 0.89480 | 0.94568 | 0.76503 | 0.88453 |
| 54 | 0.88810 | 0.94161 | 0.75268 | 0.87754 |
| 55 | 0.88068 | 0.93713 | 0.73983 | 0.87000 |
| 56 | 0.87250 | 0.93222 | 0.72649 | 0.86190 |
| 57 | 0.86352 | 0.92684 | 0.71262 | 0.85321 |
| 58 | 0.85370 | 0.92096 | 0.69817 | 0.84381 |
| 59 | 0.84299 | 0.91455 | 0.68308 | 0.83358 |
| 60 | 0.83135 | 0.90756 | 0.66730 | 0.82243 |
| 61 | 0.81873 | 0.89995 | 0.65083 | 0.81029 |
| 62 | 0.80511 | 0.89169 | 0.63368 | 0.79719 |
| 63 | 0.79052 | 0.88275 | 0.61584 | 0.78323 |
| 64 | 0.77501 | 0.87312 | 0.59732 | 0.76858 |
| 65 | 0.75860 | 0.86278 | 0.57813 | 0.75330 |
| 66 | 0.74131 | 0.85169 | 0.55829 | 0.73748 |
| 67 | 0.72309 | 0.83980 | 0.53783 | 0.72104 |
| 68 | 0.70383 | 0.82702 | 0.51679 | 0.70393 |
| 69 | 0.68339 | 0.81324 | 0.49520 | 0.68604 |
| 70 | 0.66166 | 0.79839 | 0.47312 | 0.66730 |
| 71 | 0.63865 | 0.78420 | 0.45058 | 0.64769 |
| 72 | 0.61441 | 0.76522 | 0.42765 | 0.62723 |
| 73 | 0.58897 | 0.74682 | 0.40442 | 0.60591 |
| 74 | 0.56238 | 0.72716 | 0.38100 | 0.58375 |
| 75 | 0.53470 | 0.70619 | 0.35749 | 0.56074 |
| 76 | 0.50601 | 0.68387 | 0.33397 | 0.53689 |
| 77 | 0.47641 | 0.66014 | 0.31050 | 0.51219 |
| 78 | 0.44604 | 0.63494 | 0.28713 | 0.48663 |
| 79 | 0.41503 | 0.60822 | 0.26391 | 0.46020 |
| 80 | 0.38355 | 0.57991 | 0.24091 | 0.43291 |
| 81 | 0.35178 | 0.54997 | 0.21819 | 0.40475 |
| 82 | 0.31991 | 0.51835 | 0.19583 | 0.37573 |
| 83 | 0.28816 | 0.48502 | 0.17392 | 0.34588 |
| 84 | 0.25677 | 0.44993 | 0.15257 | 0.31522 |
| 85 | 0.22599 | 0.41306 | 0.13191 | 0.28378 |
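One way to quantify the visual comparison is to read an approximate median lifetime off each tabulated curve: the median lies between the last tabulated age with $S(x) \ge 0.5$ and the first with $S(x) < 0.5$. The sketch below copies the Table 2.1 rows for ages 68 through 83 (a range in which every group crosses 0.5); `first_age_below` is a hypothetical helper.

```python
# Rows of Table 2.1 for ages 68-83.
# Column order: white male, white female, black male, black female.
TABLE_EXCERPT = {
    68: (0.70383, 0.82702, 0.51679, 0.70393),
    69: (0.68339, 0.81324, 0.49520, 0.68604),
    70: (0.66166, 0.79839, 0.47312, 0.66730),
    71: (0.63865, 0.78420, 0.45058, 0.64769),
    72: (0.61441, 0.76522, 0.42765, 0.62723),
    73: (0.58897, 0.74682, 0.40442, 0.60591),
    74: (0.56238, 0.72716, 0.38100, 0.58375),
    75: (0.53470, 0.70619, 0.35749, 0.56074),
    76: (0.50601, 0.68387, 0.33397, 0.53689),
    77: (0.47641, 0.66014, 0.31050, 0.51219),
    78: (0.44604, 0.63494, 0.28713, 0.48663),
    79: (0.41503, 0.60822, 0.26391, 0.46020),
    80: (0.38355, 0.57991, 0.24091, 0.43291),
    81: (0.35178, 0.54997, 0.21819, 0.40475),
    82: (0.31991, 0.51835, 0.19583, 0.37573),
    83: (0.28816, 0.48502, 0.17392, 0.34588),
}
GROUPS = ("white male", "white female", "black male", "black female")


def first_age_below(table, column, level=0.5):
    """Smallest tabulated age x with S(x) < level, or None if there is none."""
    for age in sorted(table):
        if table[age][column] < level:
            return age
    return None


for j, group in enumerate(GROUPS):
    age = first_age_below(TABLE_EXCERPT, j)
    # Every group is still above 0.5 at age 68, so age - 1 is in the excerpt.
    print(f"{group}: median lifetime between ages {age - 1} and {age}")
```

The crossing ages are 77 for white males, 83 for white females, 69 for black males, and 78 for black females, consistent with the ordering described in Example 2.2.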

Figure 2.2 Survival functions for all-cause mortality for the U.S. population in 1989: white males, white females, black males, and black females. [Plot: Survival Probability versus Age in Years.]

When X is a discrete random variable, different techniques are required. Discrete random variables in survival analyses arise from rounding off measurements, from grouping failure times into intervals, or when lifetimes refer to an integral number of units. Suppose that X can take on values $x_j$, $j = 1, 2, \ldots$, with probability mass function (p.m.f.) $p(x_j) = \Pr(X = x_j)$, $j = 1, 2, \ldots$, where $x_1 < x_2 < \cdots$.

The survival function for a discrete random variable X is given by

$$ S(x) = \Pr(X > x) = \sum_{x_j > x} p(x_j). \tag{2.2.3} $$
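Equation (2.2.3) translates directly into code. This sketch (`discrete_survival` is a hypothetical name) stores the p.m.f. as a dictionary from support points $x_j$ to probabilities and uses exact fractions; the discrete uniform p.m.f. of Example 2.3 serves as the test case.

```python
from fractions import Fraction


def discrete_survival(pmf, x):
    """Eq. (2.2.3): S(x) = Pr(X > x), the sum of p(x_j) over support points x_j > x.

    `pmf` maps each support point x_j to its probability p(x_j).
    """
    return sum((p for xj, p in pmf.items() if xj > x), Fraction(0))


# Discrete uniform lifetime on {1, 2, 3}, each point with probability 1/3.
uniform3 = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}

for x in (0, 0.5, 1, 1.5, 2, 2.5, 3, 4):
    print(x, discrete_survival(uniform3, x))
```

The result is 1 on [0, 1), 2/3 on [1, 2), 1/3 on [2, 3), and 0 from 3 on. Because the inequality $x_j > x$ is strict, the function drops at each support point and is right-continuous.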

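EXAMPLE 2.3 Consider, for pedagogical purposes, the lifetime X, which has the p.m.f. $p(x_j) = \Pr(X = j) = 1/3$, $j = 1, 2, 3$, a simple discrete uniform distribution. The corresponding survival function, plotted in Figure 2.3, is expressed by

$$ S(x) = \Pr(X > x) = \sum_{x_j > x} p(x_j) = \begin{cases} 1 & \text{if } 0 \le x < 1, \\ 2/3 & \text{if } 1 \le x < 2, \\ 1/3 & \text{if } 2 \le x < 3, \\ 0 & \text{if } x \ge 3. \end{cases} $$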
