Intego Group’s Senior Data Scientist, Andrey Rekalo, spoke on July 10, 2018 at the 29th International Biometric Conference in Barcelona, Spain.
The topic of Andrey’s presentation was Knowledge Extraction with Topology-based Clinical Data Mining.
Clinical data mining refers to the application of data mining methods to clinical data.
While many computational techniques focus on univariate relationships between a specific clinical outcome and a few predictive variables, there is a lack of data integration and visualization tools that can improve our understanding of an entire dataset. Examining clinical data with a focus on a single outcome, in isolation from other factors, may lead to an incomplete or even misleading view of increasingly complex data.
In this paper, we describe a novel topology-based clinical data mining (TCDM) methodology to discover multivariate patterns in clinical trial outcomes. Our approach leverages the benefits of three independent tools:
Multiple Outcomes Analysis
Nonparametric Statistics
Topological Data Analysis
TCDM allows researchers to construct comprehensive topological maps of complex data without first having to develop a model or hypothesis.
A topological map provides a compressed, visual representation of a multidimensional set of interrelated clinical outcomes. Such maps help identify and explore subgroups of patients with similar responses within a diverse study population.
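To make the idea of a topological map more concrete, the sketch below builds a small graph in the style of the Mapper construction, a common technique in topological data analysis. The abstract does not say which algorithm TCDM uses, so the choice of Mapper, the function names (`mapper_graph`, `single_linkage`, `count_components`), the parameters, and the toy data are all assumptions made purely for illustration. Each point stands for a hypothetical patient described by two outcome values; each node of the resulting graph is a cluster of similar patients, and two nodes are joined when they share patients.

```python
from itertools import combinations
from math import dist


def single_linkage(indices, points, eps):
    """Split `indices` into clusters by chaining points at distance <= eps."""
    clusters, unvisited = [], set(indices)
    while unvisited:
        seed = unvisited.pop()
        cluster, frontier = {seed}, [seed]
        while frontier:
            i = frontier.pop()
            near = {j for j in unvisited if dist(points[i], points[j]) <= eps}
            unvisited -= near
            cluster |= near
            frontier.extend(near)
        clusters.append(frozenset(cluster))
    return clusters


def mapper_graph(points, lens, n_intervals=4, overlap=0.25, eps=1.0):
    """Mapper-style graph: cover the lens range with overlapping intervals,
    cluster the points falling in each interval, link clusters sharing points."""
    values = [lens(p) for p in points]
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_intervals
    nodes = []
    for k in range(n_intervals):
        a = lo + k * width - overlap * width
        b = lo + (k + 1) * width + overlap * width
        members = [i for i, v in enumerate(values) if a <= v <= b]
        nodes.extend(single_linkage(members, points, eps))
    edges = {(u, v) for u, v in combinations(range(len(nodes)), 2)
             if nodes[u] & nodes[v]}
    return nodes, edges


def count_components(n_nodes, edges):
    """Count connected components of the graph with a small union-find."""
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(x) for x in range(n_nodes)})


# Hypothetical data: two groups of patients with clearly different second outcome.
patients = [(x / 2, 0.0) for x in range(5)] + [(x / 2, 10.0) for x in range(5)]
nodes, edges = mapper_graph(patients, lens=lambda p: p[0], eps=0.6)
n_subgroups = count_components(len(nodes), edges)
```

In this toy setting the graph separates into one connected piece per patient group, which is the kind of subgroup structure a topological map is meant to expose. A real analysis would need a carefully chosen lens function, distance measure, and clustering method.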
The well-established techniques of nonparametric statistical analysis are used to find the predictive variables, e.g. patients’ demographic characteristics or medical history, associated with the subgroups.
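As an illustration of how a nonparametric method can relate a predictive variable to a subgroup, the sketch below runs a two-sided permutation test on the difference in mean age between two subgroups. The abstract does not name the specific tests used in TCDM, so the permutation test, the function name `permutation_test`, and the age values are assumptions chosen only for the example; the data are invented and do not come from any study.

```python
import random


def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns an approximate p-value: the share of random relabelings whose
    absolute mean difference is at least as large as the observed one.
    """
    def mean(xs):
        return sum(xs) / len(xs)

    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical ages of patients in two subgroups found on a topological map.
ages_subgroup_1 = [34, 38, 41, 36, 39, 35, 40, 37]
ages_subgroup_2 = [61, 58, 65, 63, 59, 62, 60, 64]
p_value = permutation_test(ages_subgroup_1, ages_subgroup_2)
```

A small p-value suggests that age differs between the two subgroups more than random relabeling would explain. The test makes no assumption about the shape of the age distribution, which is the main appeal of nonparametric methods in this setting.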
The TCDM methodology was adopted to develop a prototype of a software platform that provides a computational environment in which researchers can perform data mining experiments on clinical datasets. We successfully applied the TCDM approach to several publicly available clinical studies.
Standard statistical tools are typically used to confirm (or refute) the hypotheses generated by an investigator and, hence, rely on the researcher’s ability to develop solid hypotheses. However, in the case of clinical trial datasets, the number of possible hypotheses to explore is very large, and it can be very difficult to select the most valuable ones. TCDM provides an integrated approach to data analysis and visualization which facilitates the extraction of new knowledge from clinical datasets.