Supervising the Unsupervised: Maximizing Biological Impact in Cellular Imaging
Recorded On: 02/05/2018
The exciting challenge of imaging data is the sheer number of ways to recognize and retrieve meaningful content: while some turn to the ever-growing algorithmic tool-shed of machine learning, others use a priori knowledge of the biology at hand to arrive at the answer. Because a balance between the two is paramount, we implemented a hybrid workflow to re-analyse compound data from a phenotypic COPD screen. Allowing biological subject matter expertise to guide data-driven decisions, and vice versa, we used a combination of knowledge-based, supervised, and unsupervised methods to deconvolute patient-derived macrophages into patient-specific subpopulations. At this level of granularity, we could discern previously masked effects of compounds on healthy and diseased cells, both in their physical properties and in their population makeup. These differences proved key to understanding the underlying phenotypic changes. Avoiding “black box” algorithms in favour of those that could be interrogated by biological and data scientists alike led to faster and more relevant analysis cycles, and helped cement a “marriage” between statistical significance and biological relevance. Here, we discuss the analytical methodologies used to achieve this.
Degree in mathematics and computational biology from Cambridge University. Started at GSK in October 2016.