Email address: email@example.com
A.B. 1976, University of California, Berkeley
Ph.D. 1981, Stanford University
W. L. Hubbell Professor of Chemistry
Hall-Fischer Professor of Chemistry
Technology development lies at the heart of the Smith Lab. We endeavor to anticipate and then create the new analytical tools needed by the biological researchers of tomorrow. Throughout our history, from DNA sequencing to advanced proteomics, we have always pushed the cutting edge. Our credo is to turn science fiction into science: imagine something important that cannot yet be done, conceive of how to do it, and make it happen. To solve these challenges, we apply a broad array of techniques, including wet-bench biochemistry and cell biology, analytical chemistry, separations, instrument development, instrument software control, bioinformatics, data mining, and machine learning. Students trained in our lab develop skills at the bench and on the computer, working at the interface between biology and technology. Students also participate in collaborations with world experts in fields such as cancer, infectious disease, and diabetes.
Specific Research Areas
Biological Mass Spectrometry
Mass spectrometry is a core technology for many of the group’s projects. We are interested in developing new methods for preparing, fractionating, separating, and introducing samples into the mass spectrometer, to improve sensitivity for the most challenging analytical problems. We are creating new software control methods to improve the efficiency of mass spectrometer operation and to improve proteomic coverage. We place great emphasis on data analysis. We have developed robust software programs for bottom-up and top-down discovery proteomics, PTM discovery, label-free quantification, O-glycopeptide discovery and localization, and data visualization.
Proteoforms and the Proteoform Pipeline
Our group introduced the concept of the proteoform to the proteomics community. The proteoform is a specific form of a protein with its posttranslational modifications. The proteoform is the actual biological actor in complex systems. Each distinct proteoform can have a unique biological function. There are several thousand different proteoforms in a typical complex biological sample. Identifying all of them presents one of the most difficult current challenges for proteomics. The field’s very best attempts achieve only about 10% coverage of the proteoforms in a sample, far less than what is possible at the peptide analysis level (referred to as “bottom-up” proteomics). Pursuit of complete proteoform coverage is a focus of our lab. To that end, we have created the Proteoform Pipeline, which touches every aspect of proteoform analysis, from sample preparation to data analysis and everything in between. We have dozens of specific active research projects available with impact throughout the pipeline, and we will be pursuing this work for the next several years.
Bioinformatics and Data Mining
The data analysis phase of our mass spectrometry work looks at every aspect of the proteome. We have developed proteogenomic tools to reveal protein sequence alterations unique to the organism, including unknown amino acid substitutions and splice variants. We have tools to reveal a broad spectrum of unknown PTMs including glycosylations. And we have tools for quantitative data analysis and visualization. We are interested in all aspects of biological big data with an emphasis on improving the depth, the specificity, and the accuracy of reported results. A particular interest is expanding the predominant narrow view of proteomics, which is based only on mass spectrometric analysis, to include other data sources such as genomic and transcriptomic datasets, sample (patient)-specific databases, and network/pathway relationships. An integrating concept in our work is the proteoform family, the set of all of the various forms of a protein derived from a particular gene. We see future proteomics as being based upon the proteoform family concept; for example, 20,000 human proteoform families for the ~20,000 protein-coding genes in the genome.
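The proteoform family concept can be pictured as a simple grouping of proteoforms by their gene of origin. The sketch below is illustrative only and is not the lab's software: the `Proteoform` class, the `group_into_families` helper, the gene names, and the unmodified masses are all hypothetical. It also assumes the gene of each proteoform is already known, which is not the case for real intact-mass data. The PTM mass shifts are standard monoisotopic values.

```python
from collections import defaultdict
from dataclasses import dataclass

# Standard monoisotopic mass shifts (in daltons) for a few common PTMs.
PTM_MASS_SHIFT = {
    "phospho": 79.96633,
    "acetyl": 42.01057,
    "methyl": 14.01565,
}


@dataclass(frozen=True)
class Proteoform:
    """Hypothetical record: one specific form of a protein with its PTMs."""
    gene: str
    unmodified_mass: float  # made-up illustrative value, in daltons
    ptms: tuple = ()

    @property
    def mass(self) -> float:
        # Intact mass = unmodified mass plus the shift of each PTM carried.
        return self.unmodified_mass + sum(PTM_MASS_SHIFT[p] for p in self.ptms)


def group_into_families(proteoforms):
    """Group proteoforms into families keyed by the gene they derive from."""
    families = defaultdict(list)
    for pf in proteoforms:
        families[pf.gene].append(pf)
    return dict(families)


# Hypothetical example data: three forms from one gene, one from another.
proteoforms = [
    Proteoform("GENE_A", 10000.0),
    Proteoform("GENE_A", 10000.0, ("phospho",)),
    Proteoform("GENE_A", 10000.0, ("acetyl", "methyl")),
    Proteoform("GENE_B", 15000.0),
]
families = group_into_families(proteoforms)
```

Here `families` maps each gene to the list of proteoforms derived from it, so the number of keys corresponds to the number of proteoform families in the example data.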
We have a long-term interest in conceiving, creating, and validating powerful new tools for the discovery, identification, and quantification of human and viral RNAs and for the comprehensive proteomic analysis of their protein interactomes. Our tools will have potential to advance a wide spectrum of biological analyses. This area involves significant cell biological and biochemical laboratory work including cell culture, tissue analysis, RNA-capture, multiplexing and mass spectrometry. There is also a significant informatics component including proteogenomics, PTM discovery, quantification and proteoform identification.
Awards and Honors
| Fellow, American Association for the Advancement of Science | 2010 |
| Pittsburgh Analytical Chemistry Award | 2010 |
| American Chemical Society Award in Chemical Instrumentation | 2005 |
| Association of Biomolecular Resource Facilities Award for Development of Automated DNA Sequencing | 1997 |
| Member, Faculty of 1000 | 2009 |
Selected Publications
Proteoform: a single term describing protein complexity. Nature Methods. 2013;10:186-187.
Discovery of Chromatin-Associated Proteins via Sequence-Specific Capture and Mass Spectrometric Protein Identification in Saccharomyces cerevisiae. Journal of Proteome Research. 2014;13:3810-3825.
Enzymatic Fabrication of High-Density RNA Arrays. Angewandte Chemie-International Edition. 2014;53:13514-13517.
Global Post-Translational Modification Discovery. Journal of Proteome Research. 2017;16:1383-1390.
Elucidating Proteoform Families from Proteoform Intact-Mass and Lysine-Count Measurements. Journal of Proteome Research. 2016;15:1213-1221.