Machine Learning Thesis Proposal

  • Ph.D. Student
  • Machine Learning Department
  • Carnegie Mellon University

Probabilistic Single Cell Lineage Tracing

Cell lineage tracing is a long-standing open problem in biology. Technologies that can profile single cells have been introduced in the last decade, and studies using them attempt to construct lineage relationships either from time-series single-cell RNA sequencing (scRNA-Seq) data or by using artificial mutations to mark cells. The former rely on pseudo-time ordering, which suffers from shortcomings that can impact accuracy. The latter often apply phylogeny-based methods, which may lead to unnecessary branching that separates cells at the same biological stage. Additionally, there is currently no method for combining single-cell lineage trees of the same organism into a single consensus tree.

In this thesis, we present a Continuous-State Hidden Markov Model (CSHMM) for reconstructing continuous single-cell trajectories from time-series scRNA-Seq data. We then extend the model with regulatory information (CSHMM-TF) to improve lineage tracing. In addition, we propose another probabilistic method for reconstructing single-cell lineage trees from both mutation and scRNA-Seq data, and present preliminary results. As part of this thesis, we also plan to develop a method for constructing a general consensus tree from multiple cell lineage trees based on our probabilistic model. Finally, we apply CSHMM to a new dataset and show that it can reconstruct lineage relationships and provide novel insights into lung development.

Thesis Committee:
Ziv Bar-Joseph (Chair)
Jian Ma
Roni Rosenfeld
Darrell Kotton (Boston University)
