Dietrich S. Investigation of the machine learning method Random Survival Forest as an exploratory analysis tool for the identification of variables associated with disease risks in complex survival data

Technical University of Berlin, 2016. — 116 p.
Nowadays, societies worldwide are confronted with an epidemic increase in the incidence and prevalence of chronic diseases, with dramatic consequences for affected individuals and considerable expenditure for health care systems. Many of the frequent chronic diseases, e.g. cardiovascular diseases, diabetes mellitus type 2 (T2D) and dementia, are caused and promoted by a variety of cellular, molecular and metabolic conditions as well as environmental factors (1-7). Modern technologies and methods enable the exploratory acquisition of possible disease triggers by generating complex data that capture the entire spectrum of genes, RNAs, proteins and metabolites (8-11). Furthermore, recent epidemiological studies record detailed information on anthropometric markers, diet and environmental determinants, which further increases the analysable data volume (12-15). However, the statistical analysis of such complex data to identify disease-associated markers is a daunting challenge. In general, complex data consist of a variety of highly correlated variables, causing problems of multiple testing and multicollinearity when regression methods are used to identify disease markers (16-19). To address these well-known problems of statistical testing, several new variable selection methods have been developed (20-23). A promising method for the statistical analysis of right-censored survival data is the machine learning method Random Survival Forest (RSF) (23). However, applications of RSF in epidemiological studies with complex data to identify biological markers promoting the development of chronic diseases are still rare. Against this background, the present thesis aimed to examine the applicability of RSF for survival analysis of complex data in the European Prospective Investigation into Cancer and Nutrition (EPIC)-Potsdam study.
An RSF backward selection algorithm was developed for the purpose of variable selection and applied to identify metabolites associated with incident T2D and food groups associated with incident hypertension in the EPIC-Potsdam study.