Applying 'NewSQL' technologies to scientific data to enable self-guided data discovery and analysis

Recorded On: 02/05/2018

Scientific data organization and analysis remain a significant impediment to drug discovery, particularly in late-stage animal studies, despite years of effort and ongoing “data lake” projects. Recent shifts toward heavier use of outsourced research have further fragmented data standards, increased reliance on ad hoc reports, and yielded single-use data. We have developed a novel approach to data curation and aggregation that enables scientists to self-serve scientific data regardless of their originating source and deposit those data into self-guided, open-ended analysis. Our approach relies on NoSQL database technologies to connect to existing structured data sources (such as internally developed data lakes) or ad hoc sources such as folders of Excel spreadsheets. All results, regardless of source, are indexed into a common data shape that drives performance and ensures a consistent user experience. Discovered results are presented through RESTful web services or a “NewSQL” front-end. Over the past year, we have refined this approach through a collaborative program with a large drug discovery company. In this presentation, we will describe the motivation for our approach, show the results, and provide metrics for how much this novel approach speeds data discovery and utilization.

Daniel Weaver

PerkinElmer

Dr. Daniel C. Weaver is a Senior Product Manager for Research Informatics at PerkinElmer Informatics. Prior to joining PerkinElmer, Dr. Weaver was the Director of Scientific Computing at Array Biopharma, Inc. in Boulder, Colorado, where he led all aspects of scientific software development and acquisition. Over the course of the last decade, Dr. Weaver’s team delivered systems to support scientific endeavors ranging from target identification through drug discovery and into clinical development and translational medicine. In a previous life, Dr. Weaver was the Lead Scientist for Gene Expression Analysis at Genomica. He received his doctorate in developmental genetics from the University of Colorado, Boulder, under the direction of Dr. William B. Wood, where he studied patterning in early development.
