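Invited Speakers
As of August 10, 2018, the CNCP-2018 organizing committee has invited 25 scholars from China and abroad to give conference talks (listed by the stroke count of their names):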
Quantitative proteomic and kinomic analysis of hepatocellular carcinoma tissues by SWATH-MS reveals complex reprogramming of cell metabolic pathways
Abstract: Hepatocellular carcinoma (HCC) is one of the most common cancers worldwide and remains one of the most prevalent and lethal malignancies because of its low early-diagnosis rate and poor prognosis. Understanding the molecular pathogenesis of HCC will benefit the management of this disease. Protein kinases are highly tractable targets for the treatment of many cancers, including HCC, because of their essential role in tumor cell proliferation and survival. Here, we present a quantitative proteomic and kinomic study of HCC tissues using SWATH mass spectrometry (SWATH-MS). In the proteomic study, 4216 proteins were reliably quantified and 338 were differentially expressed between HCC tissues and adjacent non-tumorous tissues, of which 191 were up-regulated and 147 were down-regulated. To maximize kinome identification and quantification coverage, multiplexed kinase inhibitor beads were used to enrich kinases. In total, 93 kinases were significantly changed between HCC tissue and adjacent non-tumorous tissue. Functional analysis of these differential proteins and kinases revealed complex reprogramming of cell metabolic and signaling pathways in HCC. Integrating the proteome and kinome alterations with overall survival analysis helped to understand the pathogenesis of HCC and to find candidate therapeutic targets.
Key words: quantitative proteomics, quantitative kinomics, SWATH-MS, kinase inhibitor beads, hepatocellular carcinoma
Selenium-encoded chemical proteomics
Abstract: Selenium is an indispensable trace element for human health. Its dominant form in the human body is selenocysteine, which serves as a critical active-site residue in selenoproteins that regulate redox balance. Both selenium excess and selenium deficiency are implicated in severe diseases. However, it remains challenging to profile selenoproteins with traditional shotgun proteomics tools because of their low abundance and versatile activity states. We have developed a computational program that detects the characteristic isotope envelope of selenium-containing peptides in complex proteomic data and uses that information to guide the proteomic analysis in a targeted mode. We showed that the method can dramatically improve the sensitivity of detecting natural selenoproteins in cellular and tissue proteomes. Our selenium-encoded chemical proteomic strategy will be a novel tool and a valuable resource for in-depth proteomic analysis of selenoproteins and other selenium-containing biomolecules, which will aid functional studies of selenoproteins and their implications in human health.
Key words: chemical proteomics; selenium; selenoprotein; activity-based probes; selenocysteine
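As a rough illustration of what detecting a selenium isotope envelope can look like, the following minimal Python sketch compares an observed peak pattern with the natural isotope distribution of selenium by cosine similarity. It is not the program described in the abstract: the function names, the similarity threshold, and the simplification of ignoring the C/H/N/O isotope contributions and the charge state are all assumptions made for illustration.

```python
import math

# Natural isotopic abundances of selenium (nominal mass -> fraction).
SE_ISOTOPES = {74: 0.0089, 76: 0.0937, 77: 0.0763,
               78: 0.2377, 80: 0.4961, 82: 0.0873}

def se_template(offsets=range(-6, 3)):
    """Relative abundances at nominal-mass offsets from the 80Se peak."""
    return [SE_ISOTOPES.get(80 + d, 0.0) for d in offsets]

def cosine(a, b):
    """Cosine similarity between two equal-length intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def looks_like_selenium(intensities, threshold=0.9):
    """intensities: peak heights at offsets -6..+2 Da around the base peak.

    Assumes a singly charged, already centroided pattern (hypothetical input
    format); the threshold is an arbitrary illustrative value.
    """
    return cosine(intensities, se_template()) >= threshold
```

A real implementation would convolve the selenium pattern with the peptide's elemental isotope distribution and account for the 1/z peak spacing of multiply charged ions.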
Systematic survey of PRMT interactome reveals key roles of arginine methylation in global regulation of translation and splicing
Abstract: Arginine methylation, catalyzed by members of the protein arginine methyltransferase (PRMT) family, is increasingly recognized as a widespread post-translational modification in humans. Thousands of proteins have been found to contain methylated arginine, and the functional consequences of several examples have been studied. However, a systematic understanding of the catalytic network of each PRMT is lacking, which limits the global understanding of the biological roles of arginine methylation. Here we conducted a systematic identification of interacting proteins for all human PRMTs, and the resulting PRMT interactomes overlap significantly with the proteins known to contain arginine methylation. We further studied the substrate specificity of each PRMT by identifying conserved motifs around the putative methylated arginine, and found high similarity among the putative methylation motifs of different PRMTs. Our results suggest that arginine methylation is highly enriched in RNA-binding proteins with functions in RNA splicing and translation. Consistently, inhibition of several PRMTs leads to global alteration of alternative splicing and inhibition of translation, as judged by RNA-seq and polysome profiling. Collectively, this study provides a global landscape of PRMT substrates and reveals new functions of arginine methylation in the regulation of RNA processing and translation.
Characterization of oligomeric and nonnative proteins using hybrid mass spectrometric approaches
Abstract: Protein denaturation and oligomerization have attracted significant interest because of their pivotal roles in human diseases and drug safety. To understand the molecular mechanisms of these processes, the various non-native protein conformers and oligomers involved at different stages should be characterized in detail. However, the heterogeneity and transient nature of the co-existing species pose a considerable challenge. Here we developed hybrid approaches that allow characterization of such systems at multiple levels. These approaches include determination of protein-binding stoichiometries and complex abundances using native mass spectrometry (MS), evaluation of protein thermostability using temperature-controlled MS, conformer-specific characterization using hydrogen/deuterium exchange (HDX) MS, and disulfide mapping using top-down MS.
Key words: native MS; temperature-controlled MS; top-down; hydrogen/deuterium exchange (HDX)
Firmiana: towards a one-stop proteomic cloud platform for data processing and analysis
Abstract: The big bang of data produced by next-generation proteomics requires a powerful, integrated, online data processing and analysis platform able to host millions of datasets and analyze hundreds to thousands of experiments simultaneously. However, such a platform has not yet been developed. In this correspondence, we report Firmiana (V1.0) (www.firmiana.org), a one-stop proteomic data processing and integrated omics analysis cloud platform that allows scientists to deposit MS raw files, perform proteome identification/quantification online, carry out bioinformatics analyses, extract knowledge, and visualize results using a biologist-friendly web interface in an automated, high-throughput fashion and without the need for programming expertise. As a big data platform, Firmiana 1.0 can connect to other repositories for data migration and to achieve multi-omics integration in the life sciences. The high-Firmiana transferring, and the finished modules have exhibited significant increase in speed and capability in the HPC environment. To date, Firmiana 1.0 has hosted, processed, and presented over 794M spectra from 18,337 experimental datasets (approximately 31,223 MS files), covering 39,795 gene products from 10 species. We envision that Firmiana 1.0 and the subsequent Firmiana 2.0, to be released on the HPC, will provide an online resource for the proteomics and biology community to take advantage of proteomics in the big data era, bridging proteomics to biology and medicine.
Quantifying the Genetic Control of Disease Proteotypes by Data Independent Acquisition
Abstract: The central dogma describes the flow of genetic information from DNA to RNA and protein. However, it makes no statement about the quantitative relationship between transcript and protein concentrations, which is shaped by complex post-transcriptional mechanisms. Gene dosage imbalance is a general working hypothesis for studying genetic diseases, such as human syndromes, because DNA copy number variations (CNVs) are causal for many genetic diseases. However, recent important studies have shown that many CNVs that result in mRNA changes only weakly influence the corresponding protein abundances. Herein we show by proteome-wide abundance, post-translational modification, and turnover analysis that CNVs substantially remodel the proteome. The sensitive, quantitatively accurate, and highly reproducible Data Independent Acquisition (DIA) methods, such as SWATH mass spectrometry, will be discussed as the main approach used in our studies. In the presentation I will discuss specific instances of CNVs that vary in the extent and magnitude of affected loci and in disease phenotypes. Using a cancer cell lineage model, we discovered organelle-specific proteome remodeling and buffering of protein abundance by protein complex stoichiometry, mediated by the adaptation of protein turnover rates. Our results indicate a striking degree of genomic variability, the rapid evolution of genomic variability in culture, and a complex translation of genomic variability into expressed molecular and phenotypic variability. By associating quantitative proteotype and phenotype measurements, we identified protein patterns that explained the varying response of the different cell lines studied to bacterial infection. Our data further highlight a proteomic understanding of “gene dosage imbalance” in genetic disorders and cancer cells.
Key words: Genetics, Copy number variation, Data Independent Acquisition, Proteotype
Proteome reveals highly activated protein synthesis and energy metabolism in hypopharyngeal glands of nurse bees enhance secretory performance of royal jelly
Abstract: Royal jelly (RJ) is a proteinaceous secretion of the hypopharyngeal glands (HGs) in the head of honeybee workers. A stock of honeybees, high-RJ bees (RJBs), selected from Italian bees (ITBs) since the 1980s in China, can produce 10-fold more RJ than ITBs. How the HGs of RJBs achieve this stronger performance in RJ production remains elusive. The time-resolved HG proteome during the development of worker bees was investigated and compared between ITBs and RJBs, and the functions of proteins related to RJ secretion were biologically confirmed. The deepest HG proteome coverage to date (2,701 protein groups) suggests that the HGs of both stocks develop similar proteome settings to cement young gland growth, produce RJ in nurse bees (NBs), and secrete convertase to ripen nectar into honey in forager bees (FBs). In particular, in NBs the HGs develop voluminous and numerous ovoid acini that are vital for RJ secretion. Moreover, in the HGs of NBs, actin in the cytoskeleton of secretory cells is well developed to provide a stabilizing framework for the canalicular membrane system during high exocytic activity. Notably, in RJB NBs, pathways related to protein synthesis and energy metabolism are functionally induced to boost RJ secretion compared with ITBs. This is evidenced by biochemical verification of the high abundance of ribosomal proteins in RJB NBs relative to ITBs. Furthermore, the strong expression of RpL13, RpS4, RpS26, RpL28, malate dehydrogenase, UTP-glucose-1-phosphate uridylyltransferase, citrate synthase, Lamin Dm0, and hexamerin 110 in behavior-manipulated NBs demonstrates their critical roles in regulating RJ secretory activity in RJB NBs. Our data provide a novel understanding of the regulatory mechanism by which the HGs increase the RJ output of RJBs.
Key words: Royal jelly; Hypopharyngeal glands; Honeybees; Proteome
Native Top-Down Mass Spectrometry Meets Structural Biology: Enabling Tools to Link Sequence, Structure and Function of Macromolecular Protein Complexes
Abstract: Mass spectrometry (MS) has become a crucial technique for the analysis of protein complexes. Native MS has traditionally examined protein subunit arrangements, while proteomics MS has focused on sequence identification. These two techniques are usually performed separately, without harvesting the synergies between them. Here we describe the development of an integrated native MS and top-down proteomics method using Fourier transform ion cyclotron resonance (FTICR) to analyze macromolecular protein complexes in a single experiment. We address the challenges of employing top-down MS to directly fragment large macromolecular complexes in their native state in the gas phase, and we demonstrate with several large complexes that the technique can directly acquire information ranging from sequence to higher-order structure. We then summarize the unique functionalities of different activation/dissociation techniques. The platform expands the ability of MS to integrate proteomics and structural biology to provide insights into protein structure, function and regulation.
Key words: native top-down MS, FTICR, protein complexes
Discovery of Unexpected Protein PTMs by Quantitative Chemoproteomics
Abstract: We have developed a generalized, quantitative chemoproteomic platform that is broadly applicable to the analysis of bioorthogonal-chemically engineered PTMs. A key feature of this method is the use of light and heavy-labeled azido-biotin reagents with a photocleavable linker, which provides a means not only to site-specifically and quantitatively compare the abundances of the bioorthogonal-chemically engineered protein PTMs, but also to minimize the false discovery rate of identification. In combination with a blind search tool such as TagRecon, which enables the identification of all possible mass shifts on a detected peptide sequence, our chemoproteomic platform can also be applied to discover unexpected PTMs labeled by activity-based ‘clickable’ probes. We have successfully used this strategy to characterize several previously unknown PTMs, including pyrrole adduction derived from 4-oxo-2-nonenal (an endogenous lipid electrophile) and N-terminal formylation of protein degradants. We foresee that, in combination with new activity-based probes, our chemoproteomics-based strategy can be used to discover more unexpected PTMs with certain functional groups.
Key words: Chemoproteomics; Click chemistry; PTM; Blind search
Advancing Mass Spectrometry-based Large-Cohort Proteomics for Precision Medicine – An International Cancer Moonshot Multiple Site Study
Abstract: To successfully elevate discovery proteomics to translational research in the pipeline of precision medicine, large-cohort studies are essential for the discovery and verification of protein biomarkers. Apart from sensitivity and specificity, reproducibly and reliably quantifying large numbers of proteins in different laboratories remains a challenge. To address this challenge, we present a high-throughput and streamlined analytical workflow using high resolution MS1-based quantitative data-independent acquisition (HRMS1-DIA) mass spectrometry. The HRMS1-DIA workflow is standardized with well-defined experimental steps and systematically applied to a set of test samples. Our approach is to increase the chromatographic and mass spectral resolution by utilizing high resolution accurate mass MS1 data for quantitation and interspersed DIA for qualitative analysis, spiking in quality control peptides, and creating in-depth spectral libraries for each proteome used in the experiment. Robust and reproducible chromatographic separation using a 60-minute capillary flow gradient enables high-throughput analysis. Besides setting the FDR at 1%, a roll-up statistic strategy was applied to improve the quantitation precision. Robust and straightforward SOPs were created for the HRMS1-DIA workflow to define the breadth of instrumental aspects such as chromatographic retention time stability, ionization spray stability, and product ion distribution overlap with a common spectral library. The study was benchmarked across multiple Cancer Moonshot sites worldwide using identical instrument platforms, procedures, and software, and the workflow was demonstrated to be stable in a 24/7 operation mode for 7 consecutive days. To ensure the reliability of the results, a QC sample was defined and routinely applied in the study. The resulting data were processed individually and combined to evaluate proteome coverage and quantitative capabilities.
Reported metrics used to evaluate workflow performance include sub-proteome coverage and differential expression analysis, inter-day data reproducibility, and comparative data overlap among all the laboratories. In our initial data, at 1% FDR, > 5,000 proteins from > 40,000 peptides of the QC sample, as well as > 7,000 proteins from > 50,000 peptides of the mixed proteome sample, are consistently detected and reliably quantified across all sites. The ratios of the three mixed proteomes accurately reflect the expected values.
Key words: Cancer moonshot multi-site study, Large-cohort study, High Resolution MS1 based Data-independent acquisition (HRMS1-DIA), Robustness, Throughput, Reproducibility, Scalability.
Glycotopes and Protein Glycosylation Analysis at Omics level - Less is More
Abstract: A majority of the key events taking place at the cell surface are either directly or indirectly mediated by glycosylation. In mammals, a limited range of sialylated, fucosylated and/or sulfated terminal glycotopes are carried on the termini of glycans extending from specific sites of membrane glycoproteins. Over the last two decades, advances in mass spectrometry have enabled glycosylation analysis at the omics level, but such global undertakings often lack specific details of glycobiological relevance. Precision at the level of defining individual glycotopes and glycoforms is compromised by a misguided quest for numbers. Given the non-template-encoded nature of glycosylation, it is arguable whether one needs to delineate every single glycoform and glycomic entity. We propose instead to develop and implement workflows that focus on in-depth analysis of glycotopes and specific target glycoproteins. How the different mass spectrometry techniques currently available can be utilized judiciously and in concert will be demonstrated in this presentation using specific case examples. These include complementary modes of fragmentation that can be acquired in parallel or in a product-ion-dependent manner with increasing speed and sensitivity, the high resolution accurate mass capability at not only the MS1 but also the MS2/MS3 level, and the requisite informatics solutions. At all stages, we abide by the principle that what is less complicated is often better understood and more appreciated than what is more complicated. Homing in on the few glycotopes and making such analysis amenable to all non-experts at high throughput confers better mileage than aiming to unravel the full glycomic complexity to no avail.
PTM-Invariant Peptide Identification
Abstract: Identifying peptides with unlimited post-translational modifications (PTMs) is a challenging task in the database-searching framework. The search space expands very quickly with the number of PTMs considered, because traditional database searching methods need to score all possible theoretical spectra with different PTM patterns. Two approaches try to address this problem. The first uses alignment-based algorithms to avoid enumerating all PTM patterns; tools such as MS-Alignment and MODa use this approach. The second compares each experimental spectrum with the theoretical spectra of the unmodified peptide sequences within a large precursor mass window; tools such as MSFragger and Open-pFind use this approach. Although these tools use brilliant algorithms to achieve remarkable performance, some issues remain unaddressed. We proposed a PTM-invariant peptide identification (PIPI) concept for identifying peptides with unlimited PTMs. We transform the PTM-variant spectra and peptide sequences into PTM-invariant ones with a novel coding method. After such a transformation, we can perform a fast and simple database search as if there were no PTMs and still get the correct results. Finally, we used data sets to demonstrate the power of our method.
Key words: open PTM identification, database search, peptide identification
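The second approach mentioned above (matching against unmodified peptides within a large precursor mass window) can be sketched in a few lines of Python. This is only an illustration of the candidate-selection step, not the algorithm of MSFragger, Open-pFind, or PIPI: the function names and the ±500 Da window are assumptions, and spectrum scoring is omitted entirely.

```python
# Monoisotopic amino acid residue masses in Da (standard values).
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056

def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE[aa] for aa in seq) + WATER

def open_search_candidates(precursor_mass, peptides, window=500.0):
    """Return (peptide, delta_mass) pairs within the precursor mass window.

    delta_mass is the unexplained mass difference, which an open search
    would later try to interpret as one or more modifications.
    """
    hits = []
    for pep in peptides:
        delta = precursor_mass - peptide_mass(pep)
        if abs(delta) <= window:
            hits.append((pep, round(delta, 4)))
    return hits
```

For example, a precursor that is about 79.966 Da heavier than the unmodified peptide would come back with a delta mass consistent with phosphorylation.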
Genome mining and structural characterization of sactipeptides, a class of ribosomally synthesized and posttranslationally modified natural products
Abstract: Ribosomally synthesized and posttranslationally modified peptides (RiPPs) are a major class of natural products revealed by the genome sequencing efforts of the past decades. These compounds are produced in all three domains of life and possess vast structural diversity. Sactipeptides are a small but growing class of RiPPs that contain highly unusual thioether bonds bridging the sulfur of a cysteine residue with the alpha-carbon of an acceptor amino acid. These peptides are stable at high temperature and resistant to proteolysis, and many of them exhibit potent activities against various multi-drug-resistant bacteria, making them promising candidates for antibiotic development. However, structural characterization of sactipeptides is challenging because of their complicated ring systems. Here we report genome mining of a series of novel sactipeptides from a Bacillus strain. The ring structures of these sactipeptides were revealed by an algorithm named HSEE, which is based on the automated analysis of multiple high resolution mass spectra and the evaluation of a collection of predicted hypothetical structures. The resulting structures were further validated by chemical derivatization studies.
A Computer Scientist's Journey of Opening up OpenSWATH
Abstract: Not knowing what a peptide is, two years ago we started a journey of trying to optimize, for both quality and performance, a state-of-the-art peptide detection software, OpenSWATH. As in most interdisciplinary collaborations, the most interesting (and "painful") part of the project is bridging the gap between two different languages (computer science and biology) and dealing with the complicated structure of modern data-driven biology software. In this talk, I will introduce our recent effort to simplify the mammoth to a "minimal core". By modeling the whole peptide detection process as a compressive sensing problem, we are able to achieve detection quality almost comparable to that of OpenSWATH with fewer than 200 lines of MATLAB code. The simplicity brings unique opportunities for performance optimization and analysis of the system behavior. For example, under this framework, GPU-based acceleration becomes simple, and we are able to rigorously analyze the sensitivity of the detection quality with respect to lossy data reduction (e.g., quantization) of the input. We hope that this simple baseline can form the starting point and common basis of future collaborations between the computer science and the proteomics communities.
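To give a flavor of what a compressive sensing formulation involves, here is a minimal pure-Python sketch of sparse recovery by iterative soft thresholding (ISTA). It is not the speaker's MATLAB code, and the abstract does not say which solver is used: the choice of ISTA, the function names, and the idea of treating the columns of A as library signals and y as the measured signal are all assumptions for illustration.

```python
def objective(A, y, x, lam):
    """0.5 * ||A x - y||^2 + lam * ||x||_1"""
    r = [sum(a * xj for a, xj in zip(row, x)) - yi for row, yi in zip(A, y)]
    return 0.5 * sum(v * v for v in r) + lam * sum(abs(v) for v in x)

def ista(A, y, lam=0.01, iters=500):
    """Minimise the objective above by iterative soft thresholding."""
    m, n = len(A), len(A[0])
    # Step size 1/L, with L bounded above by the squared Frobenius norm of A.
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * n
    for _ in range(iters):
        # Residual A x - y and gradient A^T (A x - y) of the quadratic term.
        r = [sum(A[i][j] * x[j] for j in range(n)) - y[i] for i in range(m)]
        g = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        z = [x[j] - step * g[j] for j in range(n)]
        # Soft thresholding shrinks small coefficients to exactly zero.
        x = [max(abs(v) - step * lam, 0.0) * (1 if v > 0 else -1) for v in z]
    return x
```

The L1 penalty is what encodes the sparsity assumption, namely that only a few library entries contribute to any given measured signal.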
Series of H37Rv-specific novel genes revealed by proteogenomics for rapid and accurate diagnosis of Mycobacterium tuberculosis complex in clinic
Abstract: Tuberculosis (TB) remains a major global health threat, and there is an urgent need for rapid, culture-free biomarkers that detect TB quickly and accurately. Mycobacterium tuberculosis (MTB) antigens in patients can provide direct evidence of TB, but commonly used antigens have not been adopted as a diagnostic method owing to their homology with antigens of many nontuberculous mycobacteria (NTM). MTB-specific antigens would make up for that deficit. Here, we refined the H37Rv annotation by proteogenomics based on two public H37Rv MS datasets and our own, covering different growth periods, protein sources, sample separations, digestion strategies, and so on. In total, 50,820 peptides were identified, mapping to 3,201 (79.67%) of the 4,018 annotated protein-coding genes in H37Rv (TubercuList, 2016). After strict filtering of novel peptides by spectrum quality and peptide synthesis, we found 28 novel gene-coding N-terminal regions and confirmed 22 novel genes. We found that half of the novel genes were specific to the Mycobacterium tuberculosis complex (MTBC), as verified by PCR amplification and comparative genomics. This study describes a rational approach to searching for and verifying novel H37Rv genes, and highlights the importance of accurate diagnosis of MTBC in the clinic and of selecting MTB-specific antigens.
Key words: Proteogenomics; Mycobacterium tuberculosis complex; Diagnosis; Specificity
Aspirin Reprograms Acetylome in Mouse
Abstract: Aspirin (acetylsalicylic acid, ASA) is one of the most widely prescribed medications and is used to relieve pain, fever, and inflammation. Recent studies have revealed new benefits of aspirin, including a reduced risk of heart attack and stroke, anti-cancer activity, and life extension. Despite these profound effects, the mechanism of action of aspirin remains to be determined. Here, we used deuterium-labeled aspirin together with mass spectrometry-based acetylome measurement, termed “DAcMS”, to unravel the landscape of in vivo protein acetylation induced by aspirin. We present an in-depth atlas of the aspirin-induced acetylome that reveals novel mechanistic insights into the biological roles of aspirin.
High Efficiency Characterization of Protein Glycosylation by Combining Glycosite and Site-specific Glycoform Analysis
Abstract: There is increasing interest in investigating the alterations of glycosylation in malignant diseases. Yet the high micro-heterogeneity of protein glycosylation still makes comprehensive analysis of glycosylation challenging. In our work, de-glycopeptides were employed to enhance the identification efficiency of intact glycopeptides. In this approach, the intact glycopeptides were enriched using HILIC and then divided into two parts at a certain ratio. One part was treated with PNGase F to remove the glycans, and the resulting de-glycopeptides were mixed with the other part of intact glycopeptides for LC-MS/MS analysis. HCD in stepped NCE mode was applied to efficiently fragment both de-glycopeptides and intact glycopeptides, so that fragment-rich MS2 spectra of both were acquired in the same LC-MS/MS run. The two types of spectra were then matched according to their Y0 ions, and the intact glycopeptides could be identified with high efficiency. Using this strategy, the glycosites and site-specific glycoforms of human liver tissues were characterized simultaneously in HCC and adjacent normal tissues. In total, 4491 site-specific glycoforms attached to 786 glycosites could be quantified from these two samples, including many significantly changed glycosites and site-specific glycoforms. Additionally, this method was used to evaluate the aberrations of sialylated glycosylation between HCC and healthy serum samples. Several significantly changed glycoforms attached to immune proteins were obtained, and a small cohort study with 19 individual serum samples (10 HCC patients and 9 healthy volunteers) was performed using PRM-based targeted analysis. Interestingly, several glycoforms remained significant, with p-values as low as 4.7 × 10^-5.
This platform provides a powerful tool for comprehensive analysis of glycosylation, which will have broad applications in glycoproteomics analysis.
Key words: Glycosylation, micro-heterogeneity, glycosites, site-specific glycoforms.
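The Y0-based pairing described above can be illustrated with a small Python sketch. It rests on two facts stated here as assumptions about the setup rather than as the authors' implementation: the Y0 ion is taken to be the singly charged peptide backbone without glycan, and PNGase F is taken to leave an Asn-to-Asp conversion (+0.984016 Da) at each former glycosite. The function name, the ppm tolerance, and the input format are hypothetical.

```python
PROTON = 1.007276        # mass of a proton in Da
DEAMIDATION = 0.984016   # Asn -> Asp mass shift left by PNGase F, in Da

def match_by_y0(y0_mz, deglyco_masses, n_sites=1, tol_ppm=10.0):
    """Indices of de-glycopeptide neutral masses consistent with a Y0 ion.

    y0_mz: m/z of a singly charged Y0 ion from an intact glycopeptide spectrum.
    deglyco_masses: neutral precursor masses of de-glycopeptide spectra.
    """
    backbone = y0_mz - PROTON                    # neutral peptide backbone
    expected = backbone + n_sites * DEAMIDATION  # mass after PNGase F
    return [i for i, m in enumerate(deglyco_masses)
            if abs(m - expected) / expected * 1e6 <= tol_ppm]
```

A matched de-glycopeptide identification then supplies the peptide sequence, and the remaining mass of the intact glycopeptide can be assigned to the glycan.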
Demonstrating the relationship of epi-proteomics, whole-proteomics, and phospho-proteomics
Abstract: For the same samples, we can generate multi-omics data, e.g. epi-proteomics (histones), whole-proteomics, and phospho-proteomics. Generally, research is done within each individual type of proteomics. What if we build networks between these multi-omics data? One breakthrough point is to look at the changes in histone modifications, the changes in the regulated genes related to those histone modifications, and the changes in phosphopeptides from these regulated genes. We can imagine that the data size will grow very fast, e.g. the three omics mentioned above, seven time points in each omics, and triplicates for each time point (3 × 7 × 3 = 63). After data acquisition, the most important work is to correctly identify and quantify each dataset and to build up the networks between these multi-omics data. Once the workflow is finished, the relationship between histone modifications and regulated genes will become much clearer, and it can then be used to guide drug treatment. The workflow will therefore build a bridge connecting basic research and clinical application.
Key words: combination, epi-proteomics, whole-proteomics, phospho-proteomics, post-translational modifications
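The experimental design quoted in the abstract (three omics layers, seven time points, triplicates) can be enumerated directly, which makes the count of 63 acquisitions explicit. The time-point labels below are placeholders, since the abstract does not name them.

```python
from itertools import product

omics = ["epi-proteomics", "whole-proteomics", "phospho-proteomics"]
time_points = [f"T{i}" for i in range(1, 8)]  # seven time points (placeholder labels)
replicates = [1, 2, 3]                        # triplicates

# One entry per acquisition: (omics layer, time point, replicate).
runs = list(product(omics, time_points, replicates))
print(len(runs))  # 63
```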
New sample preparation methods for “bottom-up” proteomic analysis
Abstract: Sample preparation is a critical step in ensuring the sensitivity, accuracy, precision and throughput of both qualitative and quantitative proteome analysis. Bottom-up proteomics typically employs a multistep sample preparation workflow, which is subject to lengthy manual operation, sample loss and contamination. To solve these problems, we developed a hollow fiber membrane-aided fully automated sample treatment (FAST) method, by which proteomic samples can be denatured, reduced, desalted and digested within 6-20 min via a “one-stop” service. Through the on-line combination of FAST with nano-liquid chromatography-electrospray ionization tandem mass spectrometry (nLC-ESI-MS/MS), we further established a fully integrated platform for high-throughput proteome quantification. Furthermore, efficient extraction and processing of trace proteins from complex matrices is also a scientific issue of widespread concern in proteomics research. To meet these requirements, we developed a new proteomic sample preparation method called solid-phase alkylation (SPA). In this method, detergent-solubilized proteins are first reduced by tris(2-carboxyethyl)phosphine hydrochloride (TCEP) and then covalently bound to iodoacetic acid brushes grafted on silica microspheres, which serve as both protein extraction sorbent and micro-reactor. The detergent and other interferences are then completely removed by extensive washing of the proteins conjugated on the solid support with methanol and buffer solution, followed by highly efficient on-bead protein digestion. Compared with conventional liquid-phase sample preparation protocols, the solid-phase method adds no additional sample preparation steps and exhibits many advantages, allowing single-run analyses of trace samples with high throughput and deep proteome coverage.
Key words: Bottom-up Proteomics, Sample preparation, Fully automated sample treatment, Chemical immobilization, Trace samples
Ten computational challenges in SWATH/DIA-based high-throughput proteomics
Abstract: SWATH/DIA mass spectrometry enables generation of proteomics data sets at increasing pace, leading to a series of computational challenges, which are currently partially relieved by improving the computational pipelines developed initially for shotgun and SRM/MRM data sets. On the other hand, computer science is rapidly progressing. Data science and machine learning are advancing social, astronomical, and biomedical sciences. Data sets including microscopic images and recently genomic data are being analyzed by advanced computational approaches. To date, application of advanced computer science technologies to proteomics data analysis is still sparse, if any. Here, we try to build a bridge that translates the ten most pressing computational challenges of SWATH/DIA-based proteomics into computer science language and provide benchmark data sets to promote the advance of both fields.
Promise and Challenges in Developing Proteins in Extracellular Vesicles as Biomarkers
Abstract: The state of protein modification can be a key determinant of cellular physiology, for example in early-stage cancer. Here we demonstrate, for the first time, a strategy to isolate and identify glyco- and phospho-proteins in extracellular vesicles (EVs) from human plasma as potential markers to differentiate disease from healthy states. We identified thousands of peptides with post-translational modifications (PTMs) in EVs isolated from small volumes of plasma samples. Using label-free quantitative proteomics, we identified proteins with PTMs in plasma EVs that are significantly higher in patients diagnosed with breast cancer than in healthy controls. Several novel biomarkers were validated in individual patients using Parallel Reaction Monitoring for targeted quantitation. I will discuss the feasibility of developing proteins with PTMs in plasma EVs as disease biomarkers that may transform cancer screening and monitoring, and the enormous challenges on the way to the promised land.
Native Protein Identification via In Cell Mass Spectrometry: Challenges and Opportunities
Abstract: To achieve single-cell mass spectrometry, our group has worked on the fundamental theory of ion suppression, the development of novel ambient ionization mass spectrometry, and its application to living-cell analysis. The major contribution was to first overcome both the matrix effect and the ion suppression effect of the conventional electrospray process, so that the advantages of ambient ionization methods were introduced into this more commonly used technique. Using induced high voltages, inductive electrospray was developed to meet the requirements of high sensitivity, high selectivity, and high throughput for mass spectrometry. Finally, in-cell mass spectrometry has enabled direct metabolite and protein measurements in single living cells. We are now exploring the possibility of performing proteomics on single living cells, which might preserve information on protein-protein and protein-ligand interactions in their native state in living cells.
Key words: Single cell MS, Protein status in cell, Single cell proteomics
A journey to the unbiased label-free protein quantification by predicting peptide quantitative factors
Abstract: Mass spectrometry (MS) has become a prominent choice for large-scale absolute protein quantification, but its quantification accuracy still has substantial room for improvement. A crucial issue is the bias between the peptide MS intensity and the actual peptide abundance, i.e., the fact that peptides with equal abundance may have different MS intensities. This bias is mainly caused by the diverse physicochemical properties of peptides. Here, we investigated MS intensity measurement errors systematically and quantitatively using the natural properties of isotopic distributions. Then, we propose an algorithm for label-free absolute protein quantification, LFAQ, which can correct the biased MS intensities by using the predicted peptide quantitative factors for all identified peptides. When validated on datasets produced by different MS instruments and data acquisition modes, LFAQ presented accuracy and precision superior to those of existing methods. In particular, it reduced the quantification error by an average of 46% for low-abundance proteins. The advantages of LFAQ were further confirmed using the data from published papers.
Key words: mass spectrometry, label-free quantification, algorithm, quantitative proteomics
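The correction idea described in this abstract can be illustrated with a minimal sketch. It assumes that each identified peptide already has a predicted quantitative factor, that the bias is removed by dividing the measured intensity by that factor, and that the protein-level value is the mean of the corrected peptide intensities. The function name, the division step, and the averaging are illustrative assumptions, not the LFAQ implementation.

```python
from statistics import mean


def corrected_protein_intensity(peptide_intensities, q_factors):
    """Average peptide intensities after dividing each by its predicted factor.

    Both arguments are hypothetical inputs: measured MS intensities and the
    predicted quantitative factors for the peptides of one protein.
    """
    if len(peptide_intensities) != len(q_factors):
        raise ValueError("need one quantitative factor per peptide")
    # Assumed correction: intensity / factor; skip non-positive factors.
    corrected = [i / q for i, q in zip(peptide_intensities, q_factors) if q > 0]
    if not corrected:
        raise ValueError("no peptide with a usable quantitative factor")
    return mean(corrected)


# Two peptides of equal abundance that respond differently in the instrument.
value = corrected_protein_intensity([200.0, 50.0], [2.0, 0.5])
```

In this toy case both peptides are corrected to the same value, which is the behaviour the abstract describes for peptides of equal abundance.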
pGlycoNovo: a database-free algorithm for large-scale identification of intact glycopeptides
Abstract: Database search algorithms for the identification of site-specific glycopeptides have improved considerably in recent years. In a database search, the glycan part of a glycopeptide is identified using a user-defined glycan composition list or a glycan structure database such as GlycomeDB, and the peptide part is identified using a protein sequence database. However, glycans are built by glycosyltransferases in a non-template-driven fashion, and the incompleteness of glycan databases has been reported in many publications. Hence, a de novo sequencing algorithm for identifying the glycan part is needed. For an MS/MS spectrum, pGlycoNovo first searches and scores all possible glycan candidates with the de novo sequencing algorithm; a dynamic programming approach is used to speed up this scoring step. pGlycoNovo then selects the top-k ranked glycan candidates and searches for the corresponding peptide parts. The peptides are pre-indexed by their masses in pGlycoNovo, so a peptide can be accessed by “precursor mass – glycan mass” within O(1) time. After the peptide parts are searched, the glycan and peptide combinations of the spectrum are identified by fine-tuning the glycopeptide score. N-glycopeptide samples from five mouse tissues (liver, brain, lung, heart, and kidney) were enriched by HILIC and then analyzed individually on the Orbitrap Fusion. Compared with the results of pGlyco 2.0, pGlycoNovo covered ~80% of the glycopeptide-spectrum matches with ~99% consistency at a fine-tuned glycan match score. In N-glycopeptide samples from C. elegans and A. thaliana, pGlycoNovo identified hundreds of N-glycopeptides, showing that it is suitable for identifying N-glycopeptides from different species. We also showed that pGlycoNovo can be used to identify modifications on the glycans in fission yeast samples. pGlycoNovo also performed well in identifying O-glycopeptide compositions: with the de novo algorithm, the glycan composition of an O-glycopeptide can be identified quite well, but the glycosylation sites are not easy to determine from HCD-MS/MS spectra.
Key words: glycopeptide, glycan de novo, algorithm
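The constant-time peptide lookup mentioned in this abstract can be sketched as a hash table keyed by binned mass. This is only one possible realization and not the pGlycoNovo code: the bin width, the tolerance, the function names, and the peptide and glycan masses below are all illustrative assumptions.

```python
import math
from collections import defaultdict

BIN_WIDTH = 0.01  # Da; assumed bin size for the mass index


def build_mass_index(peptides):
    """Index (sequence, mass) pairs by integer mass bin."""
    index = defaultdict(list)
    for sequence, mass in peptides:
        index[math.floor(mass / BIN_WIDTH)].append((sequence, mass))
    return index


def lookup_peptides(index, precursor_mass, glycan_mass, tol=0.01):
    """Return sequences whose mass matches precursor mass minus glycan mass."""
    target = precursor_mass - glycan_mass
    first_bin = math.floor((target - tol) / BIN_WIDTH)
    last_bin = math.floor((target + tol) / BIN_WIDTH)
    hits = []
    # Only a fixed, small number of bins is visited, independent of index size.
    for b in range(first_bin, last_bin + 1):
        for sequence, mass in index.get(b, ()):
            if abs(mass - target) <= tol:
                hits.append(sequence)
    return hits


# Hypothetical peptides and masses, for illustration only.
index = build_mass_index([("PEPTIDEK", 927.4549), ("NGTSK", 506.2500)])
matches = lookup_peptides(index, precursor_mass=2143.8778, glycan_mass=1216.4229)
```

Because the number of bins inspected depends only on the tolerance and the bin width, the lookup cost does not grow with the number of indexed peptides, which is the property the abstract refers to.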
A comprehensive investigation of data dependent acquisition workflow for deep coverage of proteome
Abstract: From 2016 to 2017, we made a comprehensive investigation of the data-dependent acquisition (DDA) workflow for deep coverage of the proteome based on shotgun liquid chromatography-tandem mass spectrometry (LC-MS/MS). In 2018, we performed a standard sample acquisition study to identify LC-MS/MS parameters that lead to more identifications. We distributed the same HeLa samples to about 20 Chinese laboratories, which submitted data sets acquired on different LC-MS platforms. We evaluated the correlation between MS2 scanning capacity utilization and scan speed, and interpreted how the MS parameters affected DDA performance in order to provide a suite of reasonable DDA settings. Following the resulting optimization suggestions, several laboratories identified almost 100% more proteins on the same LC-MS platforms. Our centralized analysis shows that improving chromatography and maximizing utilization of MS/MS capacity can significantly improve sampling depth and identifications in proteomics.
Key words: Quality Control, Data Dependent Acquisition
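The abstract does not define MS2 scanning capacity utilization. One plausible reading, assumed here purely for illustration, is the ratio of MS2 scans actually acquired to the number the instrument could have acquired at its maximum MS2 scan rate over the same acquisition time. The function name and all numbers below are hypothetical, and the sketch ignores the time spent on MS1 scans.

```python
def ms2_utilization(n_ms2_scans, acquisition_minutes, max_ms2_per_second):
    """Fraction of the theoretical MS2 scan capacity that was actually used.

    Assumed definition: acquired MS2 scans divided by
    (acquisition time in seconds * maximum MS2 scans per second).
    """
    capacity = acquisition_minutes * 60.0 * max_ms2_per_second
    if capacity <= 0:
        raise ValueError("acquisition time and scan rate must be positive")
    return n_ms2_scans / capacity


# Hypothetical run: 90-minute gradient, 20 MS2 scans/s maximum, 54,000 MS2 scans.
utilization = ms2_utilization(54000, 90, 20)
```

Under this assumed definition, a low value would suggest that the precursor selection settings, rather than the scan speed, limit how many MS2 spectra are acquired.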