Mass Spectrometry

Untargeted and Fully Automated Analysis of Multi-Selective Data

  • © Smith, C., et al. Therapeutic Drug Monitoring, 27 (6), 747-751. (2005)
  • Fig. 1: Color-coded retention time deviation of different GC/MS chromatograms from cancer cell extracts.
  • Fig. 2: Cloud plot of signal differences obtained from cancer cell extracts after treatment of cells with pyruvate (0.44 g/L) or glucose (1 g/L).
Jeevan Sharma1 and Claudia Birkemeyer1
Analyzing complex samples with hyphenated mass spectrometry methods commonly requires tedious preparation and powerful data processing programs to obtain a meaningful result. XCMS [1] can be very helpful here: it is a free online platform for vendor-independent, high-throughput, automated processing of datasets that performs peak picking, alignment and integration competitively. Moreover, XCMS allows you to share, visualize and statistically analyze your data with tools such as heat maps, principal component analysis and many more.
Automated Untargeted Data Processing Is More Efficient than Targeted Analysis
Modern “omic” techniques deal with identification and quantitation of analytes present in a large number of complex samples; they anticipate detection and quantitation of a broad range of analytes with different physical and chemical properties in a wide range of concentrations. Therefore, the field of “omics” is dominated by methods applying two very powerful detection techniques, namely nuclear magnetic resonance (NMR) and mass spectrometry (MS), often coupled to high performance separation techniques such as gas (GC) or liquid chromatography (LC) [2]. Here, the superior resolution of the separation method combined with the specificity of MS or, less often, NMR detection can be employed to decipher a multicomponent mixture, and large data sets are generated by such coupled methods. 
In contrast to the challenge of analyzing all components, some of them unknown, of a large number of samples in an untargeted, non-selective way, a known, targeted set of analytes in a sample is commonly analyzed using pre-assembled methods. Here, lists of target peaks are created using different untargeted analytical tools of vendor-supplied software; these lists are afterwards used to pick the known peaks in a chromatogram and quantify them based on their characteristic mass-to-charge ratio, m/z. This approach is usually suitable for evaluating a limited number of compounds but can become very tedious when the number of analytes rises considerably or many unknown compounds need to be evaluated.
Consequently, the so-called untargeted approach is preferred for the analysis of multi-component mixtures [3].

Here, peak picking is based on non-selective peak finding: all compounds found, unknown and known, are searched for in a given data set and subsequently quantified. Several commercial and shareware programs are available to follow this approach in an automated fashion, promising a considerable reduction in workload for such data sets. Platforms such as XCMS allow for such global profiling of the analytes in a mixture and, moreover, provide useful tools for visualizing the large amount of data [4].
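To make the idea of non-selective peak finding concrete, the following minimal Python sketch picks every peak in a single ion trace that rises above a noise threshold and integrates it with the trapezoidal rule. It is an illustration only and not the algorithm XCMS uses (see [8] for that); the function names, the fixed threshold and the example trace are assumptions made for this sketch.

```python
def pick_peaks(times, intensities, threshold):
    """Return (apex_time, area) for every contiguous region above threshold.

    Hypothetical helper for illustration; no prior target list is needed.
    """
    peaks = []
    start = None
    for i, y in enumerate(intensities):
        if y > threshold and start is None:
            start = i                      # signal rises above the noise
        elif y <= threshold and start is not None:
            peaks.append(_summarize(times, intensities, start, i - 1))
            start = None                   # signal has dropped back to noise
    if start is not None:                  # peak still open at end of trace
        peaks.append(_summarize(times, intensities, start, len(intensities) - 1))
    return peaks


def _summarize(times, intensities, lo, hi):
    """Apex time and trapezoidal area of the region lo..hi (inclusive)."""
    apex = max(range(lo, hi + 1), key=lambda i: intensities[i])
    area = sum(
        0.5 * (intensities[i] + intensities[i + 1]) * (times[i + 1] - times[i])
        for i in range(lo, hi)
    )
    return times[apex], area


# Made-up trace with two peaks; times in seconds, intensities in counts.
trace_times = list(range(11))
trace_counts = [0, 0, 5, 10, 5, 0, 0, 4, 8, 4, 0]
found = pick_peaks(trace_times, trace_counts, threshold=1)
```

Because every region above the threshold is reported, known and unknown compounds are treated alike; a real implementation would additionally have to handle baseline drift, overlapping peaks and noise estimation.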

Peak Identification, Integration and Retention Time Alignment
The online version of XCMS (Scripps Centre for Metabolomics and Mass Spectrometry, USA) is a freely available platform for untargeted analysis of hyphenated mass spectrometry data; it is easy to apply and does not require any expertise in programming [5]. Although XCMS was originally created for LC/MS, it is increasingly applied to GC/MS data as well. Moreover, while vendor-supplied software is often restricted to a very limited number of raw data formats, XCMS supports a range of file formats and thus provides unified, high-throughput data handling.
Data evaluation can be started after uploading the LC/MS or GC/MS files to the online server in one of the following formats: netCDF, mzXML, mzData and Agilent .d folders [5]. Analysis parameters can be selected as default parameter sets typical for certain instrument combinations from the drop-down menu of the parameter field, or set as appropriate and saved as a customized set. Parameters include, for example, the retention time format for data output (s/min), the allowed retention time deviation, the ionization mode (polarity), mass resolution (FWHM) and accuracy (ppm), and the statistical tests to choose from (t-test, Mann-Whitney, Wilcoxon signed rank, post-hoc analysis) with their corresponding thresholds. After data processing, an e-mail notification is sent to the user, and the results can be viewed online (using the "View results" tab) and/or downloaded in .zip format.
As a first result, XCMS corrects deviations in the retention time of a peak among the chromatograms by peak alignment (fig. 1), i.e. aligning the retention time of a particular peak to a representative value [4]. 
While the retention time deviation within a sample batch is usually within 6 s, it may easily exceed the retention time window of 30 s commonly allowed in targeted methods when data sets analyzed on different days are compared, possibly even after column maintenance. Deviations in RT of this magnitude can hamper peak recognition and would need to be corrected manually [6], a tedious and time-consuming procedure when processing a large number of target analytes. Instead, after XCMS processing, automatically aligned and integrated data can be downloaded from the website and conveniently used for further processing.
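The principle of aligning retention times to a representative value can be sketched in a few lines of Python. The sketch assumes that the same features have already been matched across runs and corrects each run by one constant offset, which is a deliberate simplification; XCMS itself is described in [4, 8] and is not reproduced here. Run names, feature labels and retention times are made up.

```python
from statistics import median


def align_runs(runs):
    """Shift each run so its peaks line up with the median retention time.

    runs: {run_name: {feature_id: retention_time_in_seconds}}
    Returns (aligned_runs, offsets). Hypothetical helper for illustration.
    """
    # Only features observed in every run can serve as landmarks.
    shared = set.intersection(*(set(rts) for rts in runs.values()))
    # Representative value per feature: median RT across all runs.
    reference = {f: median(rts[f] for rts in runs.values()) for f in shared}
    # One constant offset per run: median deviation from the reference.
    offsets = {
        name: median(rts[f] - reference[f] for f in shared)
        for name, rts in runs.items()
    }
    aligned = {
        name: {f: rt - offsets[name] for f, rt in rts.items()}
        for name, rts in runs.items()
    }
    return aligned, offsets


# Made-up example: the third run was measured after column maintenance.
example_runs = {
    "day1": {"a": 100.0, "b": 200.0},
    "day2": {"a": 103.0, "b": 203.0},
    "day3": {"a": 135.0, "b": 235.0},
}
aligned_runs, run_offsets = align_runs(example_runs)
```

In this example the third run deviates by 32 s from the representative value and would fall outside a 30 s window in a targeted method, whereas after alignment all three runs report the same retention time for each feature.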
Although automated data processing often suffers from impaired peak recognition and produces much higher signal variance and noise [3], the ratios of peak areas obtained after XCMS processing were within a tolerance of 10% of those obtained with the vendor-supplied software, Xcalibur, suggesting that XCMS performs competitively. However, not all selective mass traces analyzed by the targeted analysis were present in the dataset obtained after evaluation with XCMS using a standard parameter set, indicating that optimization of the parameters may be required.
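A comparison of this kind could be scripted as in the following sketch. It assumes that "ratio" refers to the area ratio of a target between two samples, computed once per software package, and that the targeted result serves as the reference; both are assumptions of this illustration, as are the function name, the target names and all numbers.

```python
def compare_ratios(vendor, untargeted, tolerance=0.10):
    """Compare sample A / sample B area ratios between two evaluations.

    Each argument maps a target name to (area in sample A, area in sample B).
    Returns (report, missing): report maps each shared target to
    (relative deviation, within_tolerance); missing lists targets that the
    untargeted evaluation did not return. Hypothetical helper for illustration.
    """
    missing = sorted(set(vendor) - set(untargeted))
    report = {}
    for name in sorted(set(vendor) & set(untargeted)):
        ratio_vendor = vendor[name][0] / vendor[name][1]
        ratio_untargeted = untargeted[name][0] / untargeted[name][1]
        deviation = abs(ratio_untargeted - ratio_vendor) / ratio_vendor
        report[name] = (deviation, deviation <= tolerance)
    return report, missing


# Made-up peak areas, not measured data.
vendor_areas = {
    "alanine": (1000.0, 500.0),
    "lactate": (400.0, 200.0),
    "citrate": (300.0, 100.0),
}
untargeted_areas = {
    "alanine": (2100.0, 1000.0),
    "lactate": (600.0, 200.0),
}
ratio_report, missing_targets = compare_ratios(vendor_areas, untargeted_areas)
```

Targets listed as missing correspond to mass traces that the untargeted evaluation did not pick up with the chosen parameter set.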
Visualization of Data Distribution and Differential Display of Signals in Large Data Sets
In addition to the common features of vendor software such as peak identification and integration, XCMS offers an array of different formats to overview large data sets and assess the structure of the data. For example, cloud plots are “differential feature plots”, a very useful application of XCMS for visualizing the differences between two datasets after untargeted signal recognition [7]. They highlight signals that are “dysregulated” by a certain ratio and significance. Figure 2 shows the cloud plot obtained after XCMS evaluation of pyruvate- and glucose-treated cells. 2335 features (unique combinations of RT and m/z) differing between the two samples were detected with a p-value ≤ 0.01 and a fold change ≥ 1.5. The total ion chromatogram (TIC) is shown in the background for comparison.
The m/z of analytes which are higher in one sample are shown in green, while those whose signal intensity was higher in the other sample are displayed in red; the intensity of the color shade corresponds to the p-value and the size of the dots to the log fold change. Cloud plots are very clear and easily comprehensible presentations of non-targeted evaluation, enabling a first glance at potential differences between two samples.
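The selection behind such a plot can be sketched as follows in Python. The sketch takes p-values as given instead of computing them, uses the thresholds quoted above as defaults, and assumes a base-2 logarithm for the fold change and positive mean intensities; the field names and example features are hypothetical.

```python
import math


def select_dysregulated(features, p_max=0.01, fold_min=1.5):
    """Keep features that pass both the p-value and the fold-change threshold.

    Each feature is a dict with keys rt, mz, mean_a, mean_b and p, where
    mean_a and mean_b are positive mean intensities in the two samples.
    Hypothetical helper for illustration.
    """
    selected = []
    for f in features:
        up = f["mean_b"] >= f["mean_a"]
        # Fold change is expressed as a value >= 1 in either direction.
        fold = f["mean_b"] / f["mean_a"] if up else f["mean_a"] / f["mean_b"]
        if f["p"] <= p_max and fold >= fold_min:
            selected.append({
                "rt": f["rt"],
                "mz": f["mz"],
                "direction": "up" if up else "down",  # decides the color
                "size": math.log2(fold),              # dot size
                "shade": -math.log10(f["p"]),         # color intensity
            })
    return selected


# Made-up features, not taken from Fig. 2.
example_features = [
    {"rt": 300.0, "mz": 174.1, "mean_a": 100.0, "mean_b": 400.0, "p": 0.001},
    {"rt": 420.0, "mz": 217.1, "mean_a": 100.0, "mean_b": 120.0, "p": 0.001},
    {"rt": 510.0, "mz": 147.1, "mean_a": 900.0, "mean_b": 300.0, "p": 0.2},
    {"rt": 615.0, "mz": 73.0, "mean_a": 600.0, "mean_b": 200.0, "p": 0.01},
]
dots = select_dysregulated(example_features)
```

Only the first and the last example feature pass both thresholds: the second changes too little, and the third is not significant.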
The online platform XCMS provides many advantages compared to the traditionally used vendor-supplied software. In addition to a result table including all detected m/z and the corresponding peak areas, it provides very useful tools for exploratory analysis of large datasets and allows for interactive viewing of tables and features such as cloud plots, heat maps, PCA, mirror plots, multidimensional scaling plots and others [4]. In conclusion, XCMS is a simple-to-use and extremely useful toolbox which is increasingly used for data visualization and for untargeted analysis of data obtained from different methods of hyphenated mass spectrometry. The algorithmic details of XCMS were published by Smith et al. [8].
1 University of Leipzig, Institute of Analytical Chemistry, Leipzig, Germany
Claudia Birkemeyer
University of Leipzig
Institute of Analytical Chemistry
Leipzig, Germany
XCMS – Free online platform:

[1] Benton, H.P., et al.: Analytical Chemistry 87 (2), 884–891 (2015). doi: 10.1021/ac5025649
[2] Gowda, G.A. and Djukovic, D.: Molecular Biology 1198, 3–12 (2014). doi: 10.1007/978-1-4939-1258-2_1
[3] Birkemeyer, C., et al.: Chemical Senses, 1–11 (2016). doi: 10.1093/chemse/bjw056
[4] Gowda, H., et al.: Analytical Chemistry 86 (14), 6931–6939 (2014). doi: 10.1021/ac500734c
[5] Tautenhahn, R., et al.: Analytical Chemistry 84 (11), 5035–5039 (2012). doi: 10.1021/ac300698c
[6] Skelton, D.: Dissertation: Investigating mammalian cellular metabolism using 13C-glucose and GC-MS, The Florida State University (2011).
[7] Patti, G.J., et al.: Analytical Chemistry 85 (2), 798–804 (2013). doi: 10.1021/ac3029745
[8] Smith, C.A., et al.: Analytical Chemistry 78 (3), 779–787 (2006). doi: 10.1021/ac051437y
