

Mass spectrometry (MS) has accelerated the field of proteomics by enabling the high-throughput identification and abundance measurement of hundreds to thousands of proteins per experiment (1). The most common workflow for tandem mass spectrometry (MS/MS) (2) generally begins with the isolation of proteins from an original sample, digestion of the proteins into peptides with an enzyme such as trypsin, and separation of the peptides into multiple fractions to reduce the complexity in each fraction. Each fraction is then subjected to liquid chromatography (LC), ionized, and injected into the mass spectrometer. Individual species of peptide ions are isolated and fragmented to generate a fragment ion spectrum, which may then be identified via software. In nearly all of these high-throughput workflows, extensive analysis with software is required in order to translate the mass spectra into peptide identifications and perform abundance measurements (3). A wide variety of software tools is available to assist with this analysis, including open-source software as well as proprietary and commercial products. As a result of efforts to enable the movement of complex data types among analysis tools and the sharing of data and results with others in the community (4), a wide variety of data formats has emerged.

These formats may be broadly separated into open formats and proprietary formats. Open formats enable improved data sharing by allowing the data to be read by a variety of software tools without licensing restrictions. Open formats can be further separated into three categories: official standards, de facto standards, and other formats. Official standards are approved by a standards body, typically after a formal process of review and refinement, whereas de facto standards lack any official approval but are widely used by a large number of software tools and generally accepted as being a preferred mechanism of data exchange. Formats not falling into either of these two categories are simply referred to as other formats in this review.

Each of the major instrument vendors uses its own proprietary formats, continually updating the formats to support new features of their instruments. Open formats are generally created by the developers of analysis software and databases in order to enable the exchange of data between tools. Some formats have been developed by a single lab and are oriented around that lab's software, whereas other formats have emerged after a long process of collaborative development by a diverse group of contributors, often under the organization of a standards development group. The largest and most active standards development group in MS proteomics is the Human Proteome Organization (HUPO) Proteomics Standards Initiative (PSI) (5). The PSI aims to bring together representatives from commercial instrument manufacturers, software vendors, journal editors, and academic software developers and users to create common exchange formats and minimum information specifications that are then rigorously reviewed and approved as PSI standards.

Here we present an overview of the formats in common use in MS proteomics by popular software tools. The many formats cannot be described in great detail, but they are described very briefly, and relevant references or URLs are provided. Of course, many less common formats, especially simple tab-separated-value (TSV) formats of endless variety, cannot be covered here. Following the overview of all the formats covered here, a brief discussion of topics related to the application and evolution of these formats is presented.

There are relatively few common formats (i.e., those not specific to one vendor) that focus specifically on information prior to MS. There are broadly two categories of such formats: one that describes sample handling, and one that contains formats for mass spectrometer target input.

The PSI has developed a pair of formats for pre-MS sample handling, gelML (6) and sepML (7). The gelML standard format is specifically designed to encode information related to one-dimensional and two-dimensional gel electrophoresis prior to MS. It can encode basic sample origin information, how the gel is prepared, and details about how the gel bands or spots are excised from the gels. The two-dimensional gel spots are annotated with coordinates, shapes, density, and identifier information that can be referenced later in the MS analysis. As with most PSI formats, gelML is designed to work with an accompanying controlled vocabulary (CV) called PSI-SEP.
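
The kind of spot annotation that gelML is described as carrying (coordinates, shape, density, and an identifier that MS results can refer back to) can be pictured with a small record type. The following is only an illustrative Python sketch: the class `GelSpot`, its field names, and the helpers `index_spots` and `link_spectra` are hypothetical, and they do not reproduce the gelML schema or any PSI-SEP terms.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class GelSpot:
    """Hypothetical record for one 2D gel spot (not the gelML schema)."""
    spot_id: str    # identifier that later MS results can point back to
    x: float        # horizontal position of the spot on the gel image
    y: float        # vertical position of the spot on the gel image
    shape: str      # assumed free-text shape label, e.g. "circle"
    density: float  # measured spot density, arbitrary units


def index_spots(spots: List[GelSpot]) -> Dict[str, GelSpot]:
    """Build an identifier-to-spot lookup, rejecting duplicate identifiers."""
    index: Dict[str, GelSpot] = {}
    for spot in spots:
        if spot.spot_id in index:
            raise ValueError(f"duplicate spot identifier: {spot.spot_id}")
        index[spot.spot_id] = spot
    return index


def link_spectra(spot_index: Dict[str, GelSpot],
                 spectrum_to_spot: Dict[str, str]) -> Dict[str, GelSpot]:
    """Resolve each spectrum's spot identifier to the annotated spot.

    Raises KeyError if a spectrum refers to an unknown spot identifier.
    """
    return {spectrum: spot_index[spot_id]
            for spectrum, spot_id in spectrum_to_spot.items()}
```

Keeping the identifier as the only link between the gel record and the spectra mirrors the role the text assigns to it: the MS side needs to store nothing about the gel beyond the spot identifier.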
