I completed my PhD at the BC Cancer Agency in Dr. Steve Jones' group. During this time, I worked on developing machine learning methods for cancer diagnosis. Besides my thesis research, I also collaborated with pathologists [1, 2], oncologists [3, 4], and research scientists [5, 6, 7] on an assortment of precision oncology projects covering machine learning in healthcare, genomic analysis of rare cancer types, and treatment outcomes from precision oncology efforts.
Metastatic disease accounts for 90% of cancer-associated deaths, and identifying the site of origin of these cancers is an important first step in treatment. Cancer diagnosis usually comes from examining the pathology of the tissue itself. However, pathology diagnostics can be confounded by complex presentation and by variation in pathologist experience, leading to misdiagnosis rates of up to 70% in certain disease etiologies.
My project was motivated by a case study from 2013, in which transcriptomic analysis (measuring the expression of genes in the cancer) led to a re-alignment of the diagnosis for a patient with advanced vulvar cancer.
In my thesis research, I showed that we can extend this concept to many other treatment-resistant cancers, using molecular measurements like DNA and RNA sequencing to provide a diagnosis. I developed and validated SCOPE, a computational cancer diagnosis tool that uses expression measurements of all the genes in a cancer. We showed that this machine-learning-based tool works well for the most part (it gets really confused in certain liver biopsies!). Recent findings from this approach show promise for providing biological context in rare cancer types. I am currently involved in tangential projects that have since arisen from this work, focusing more on subtyping and relevance to disease outcomes.
Even more exciting (at least for me), we found that this method can be used to learn which biological changes are important for an individual cancer (manuscript in prep). We compared these automatically identified changes with what expert computational biologists found by manually examining the patient's tumour genome, and found a significant overlap. This approach automates the way we use sequencing data for diagnosing and understanding cancers, and expands our ability to understand rare and understudied cancers.
As a child of two educators, my worldview is centered around sharing knowledge and deriving happiness from seeing others get excited about science. During my PhD years, I published collaborative science communication articles [1, 2, 3] and contributed to Science Borealis, a blog about Canadian science [4].
The supportive environment afforded to me during graduate school also allowed me to explore my teaching interests properly. For over 4 years of my PhD, I led the re-design of an introductory bioinformatics seminar course. Besides this, I assisted in the assessment re-design for a first-year graduate-level statistics course and taught various seminars for graduate students interested in programming and data analysis (check out my teaching materials if you're interested!).
During my PhD years, I was fortunate enough to collaborate with Martin Krzywinski and Dr. Naomi Altman on the Points of Significance series in Nature Methods. In our three-article collaboration, we talked about Markov models, trying to distill the main concepts and takeaways into short primers that could be of use to a life sciences researcher with minimal background in mathematics.
As a wide-eyed undergraduate, I was (and still am) fascinated by the ability to use sequencing data to understand disease mechanisms. My first proper taste of bioinformatics analysis, aka mucking with computer tools to do cool shit, came as an undergrad researcher in Dr. Ryan Morin's group at SFU. I started out with a clear-enough research plan focused on circulating tumour DNA, but as these things invariably go for a beginner bioinformatician, I instead spent the bulk of my time in the Morin lab learning loads about using next-generation sequencing (NGS) data for cancer research, integrating different NGS data to understand a cancer type, and developing analysis pipelines that don't break if there is a version update for a single tool (no regrets!). I was also peripherally involved in ongoing work in the lab that aimed to make genomic analysis accessible to wet-lab folks in bioinformatics.
Exome sequencing allows us to sequence ~1% of the total genome. Circulating tumour DNA (ct-DNA), in turn, can comprise <1% of the total DNA found in the bloodstream. Can we use exome sequencing to detect ct-DNA in patients where we have the genomic profile of the actual tumour for context?
The proposed plan was to use pre-existing copy number variant callers to detect copy number variants in bulk sequencing data of the tumour genome, and see if we could detect a similar 'scale-up' in the number of reads in these regions from the ct-DNA sequencing. I never got around to the project itself, and a lot of cool research has happened in this space since then, but I'm still holding out hope of participating in some circulating tumour DNA research projects one day!
Utility of machine learning approaches for cancer diagnosis and analysis from RNA sequencing
Joint bachelors in Computing Science, Molecular Biology and Biochemistry