Multivariate and Multiway Calibrations
What Analytical Chemists Would Ask From Aladdin’s Lamp
- Fig. 1: Hierarchy of data structures and nomenclature. For a single sample, univariate data are scalars (zeroth-order), multivariate data are vectors (first-order) or higher-order arrays, and multiway data are matrices (second-order), three-dimensional arrays (third-order) and higher-dimensional arrays (not shown). For a sample set, the corresponding data arrays are named one-, two-, three- and four-way arrays, respectively.
- Fig. 2: Univariate chromatogram for a mixture of polycyclic aromatic hydrocarbons, recorded according to the official protocol, with fluorescence detection at a single wavelength and a solvent gradient. FLT, fluoranthene; PYR, pyrene; CHR, chrysene; BaA, benzo[a]anthracene; BbF, benzo[b]fluoranthene; BeP, benzo[e]pyrene; BjF, benzo[j]fluoranthene; BkF, benzo[k]fluoranthene; BaP, benzo[a]pyrene; DBA, dibenz[a,h]anthracene; BgP, benzo[g,h,i]perylene; IcP, indeno[1,2,3-cd]pyrene. The blue trace corresponds to the analytes; BeP and BjF are interferents.
- Fig. 3: Top: three-dimensional landscape of an elution time-fluorescence emission wavelength data matrix, recorded for a mixture of similar composition to the sample of Figure 2, but under isocratic conditions and in a much shorter time. Bottom: pure analyte chromatograms obtained by multiway calibration of elution time-fluorescence emission wavelength matrices. Individual analytes could be determined even when interferents were present in the test samples. Analyte acronyms as in Figure 2. Reprinted with permission from Bortolato et al., Anal. Chem. 2009, 81, 8074-8084. Copyright 2009 American Chemical Society.
The holy grail of chemical analysis is the monitoring of sample properties or concentrations of selected substances, remotely, non-invasively, automatically, and avoiding the use of solvents or specific reagents. This can be achieved by measuring certain optical signals, e.g., near infrared or Raman spectra, provided they are processed by multivariate calibration models, appropriately trained with a large and diverse basis set of reference samples. More complex signals obtained by hyphenated chromatography or matrix fluorescence spectroscopy provide an even more revolutionary bonus: quantitating analytes in complex interfering samples by calibrating with a handful of pure standards.
Classical analysis is a straightforward two-step process: (1) calibration using pure analyte standards, which yields the calibration model (slope and intercept of a calibration line), followed by (2) analyte quantitation in unknown samples, by interpolating the sample signal on the calibration line. This classical (univariate) approach requires that the analyte be the only substance producing a signal, or that: (1) all interferences are removed prior to the analysis (extraction, masking, distillation), (2) the analyte reacts specifically so that only its product produces a signal, (3) the analyte and interferences are physically separated (chromatography), etc.
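The two-step univariate procedure can be sketched in a few lines of Python. This is only an illustration: the concentrations and signals are invented, and `predict_concentration` is a hypothetical helper name, not part of any official protocol.

```python
import numpy as np

# Hypothetical data for pure analyte standards (made-up numbers):
# known concentrations and the instrument signal measured for each.
conc_std = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
signal_std = np.array([0.1, 2.1, 4.1, 6.1, 8.1])

# Step 1, calibration: fit signal = slope * concentration + intercept.
slope, intercept = np.polyfit(conc_std, signal_std, deg=1)

def predict_concentration(signal):
    """Step 2, quantitation: interpolate a sample signal on the calibration line."""
    return (signal - intercept) / slope

# Signal of an unknown sample (also made up).
unknown_conc = predict_concentration(5.1)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, concentration={unknown_conc:.3f}")
```

The prediction is only meaningful if the analyte is the sole source of signal; any interferent adding to the measured value biases the result directly.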
What is Multivariate Calibration?
In comparison with this approach, multivariate calibration may seem like a mythical animal. When building calibration models with spectra, such as near infrared (NIR) or Raman, there is no need to separate analytes from interferences, and measurements can be done without dissolving or grinding the sample. Just pointing a handheld NIR spectrometer the size of a smartphone at a sample may provide useful analytical results in a matter of seconds. Examples that are today routine in most industrial laboratories or field activities may read like chapters of a science fiction book: fat, protein, moisture and starch can be measured directly on intact oil seeds, simultaneously, instantaneously and without organic solvents; organoleptic properties of foodstuffs (wine, coffee, beer, meat, olive oil), textile properties and plant species can be assessed without human intervention.
Many other examples abound in various industrial fields.
NIR cameras have introduced a new dimension to this scenario: the spatial one. Today it is possible to measure NIR or Raman spectra in each pixel of a material surface, so that a data table is collected with spectro-spatial structure. These data are called hyperspectral, and allow one, among other things, to monitor the spatial distribution of chemical species on a surface. Applications include the study of the homogeneity of pharmaceuticals in solid tablets or pellets, the distribution of chlorophyll and other plant components in crop fields, etc., all made in a remote, non-invasive and automatic manner.
Where is it used?
The approach is not restricted to the digital processing of spectra. Other instrumental signals are slowly entering the analytical scene, whether optical (fluorescence, mid/far infrared, laser induced breakdown, nuclear magnetic resonance) or electrical (sensor arrays, impedance spectroscopy, voltammetry). Other scientific fields make use of similar approaches: bird species can be automatically classified from their sounds, musical emotions and moods can be predicted from audio recordings, and business cycles can be forecast in econometric studies. It is a world open to scientific exploration as never before.
Exciting as it may seem, however, multivariate spectral calibration is not the analytical panacea. Infrared spectral sensitivity is rather low, so that detecting traces of analytes is not an easy task, and some analyte signals are difficult to measure (e.g., metallic elements), meaning that chromatography and atomic spectroscopy are still needed. Moreover, to be able to cope with the presence of interferents, the mathematical multivariate models need to be properly trained. This means that a large and diverse reference sample set is required for model building, usually involving hundreds or thousands of samples, for which nominal analyte values or target properties must be previously measured by classical techniques. Furthermore, the model performance needs to be monitored over time, because nothing prevents future samples from containing new interferents that were not present in the calibration set. Should this occur, the model must be re-calibrated, adding the new interferents to the database. These undesirable features can be overcome, however, by going one step further in the number of mathematical dimensions of the measured data.
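Both the strength and the weakness described above can be reproduced with simulated spectra. The article does not name a specific regression algorithm, so the sketch below uses a plain minimum-norm least-squares regression as a stand-in. The Gaussian band shapes, concentrations and variable names are all assumptions made for illustration.

```python
import numpy as np

def band(axis, center, width=4.0):
    """Unit-height Gaussian band used as a simulated pure spectrum."""
    return np.exp(-((axis - center) ** 2) / (2.0 * width ** 2))

wavelengths = np.arange(30.0)
s_analyte = band(wavelengths, 10.0)
s_interferent = band(wavelengths, 20.0)      # present in the calibration set
s_new_interferent = band(wavelengths, 8.0)   # never seen during calibration

# Calibration set: 20 mixtures in which analyte AND interferent vary.
rng = np.random.default_rng(0)
conc = rng.uniform(0.0, 2.0, size=(20, 2))
X_cal = conc @ np.vstack([s_analyte, s_interferent])  # one spectrum per row
y_cal = conc[:, 0]                                    # reference analyte values

# Regression vector b such that spectrum @ b estimates the analyte
# concentration (minimum-norm least squares; rcond discards the
# numerically null directions of this noise-free, rank-2 data set).
b = np.linalg.lstsq(X_cal, y_cal, rcond=1e-8)[0]

# Test sample containing only components seen in calibration.
x_known = 0.8 * s_analyte + 1.7 * s_interferent
pred_known = x_known @ b

# Same sample plus an interferent absent from the calibration set.
x_unexpected = x_known + 1.0 * s_new_interferent
pred_new_interferent = x_unexpected @ b

print(f"true 0.8 -> predicted {pred_known:.3f} (calibrated interferent)")
print(f"true 0.8 -> predicted {pred_new_interferent:.3f} (uncalibrated interferent)")
```

In this simulation the first prediction recovers the analyte concentration because the interferent was represented in the training set, while the second is biased by the unmodelled component. Real data add noise and usually call for a regularised or latent-variable model rather than this bare regression.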
The Mathematical View
From a mathematical perspective, spectra can be viewed as vectors (lists of numbers, one below the other). However, more complex data can be measured and processed, and this is when we move to the multiway calibration field. For example, a liquid chromatograph hyphenated to a diode array detector is able to measure a data table for a given sample, i.e., a data matrix. Its columns are UV-visible spectra, each of them collected at a different elution time. In an analogous fashion, a gas chromatograph with mass spectral detection can measure a data table whose columns are mass spectra. One could also employ a fluorescence spectrophotometer to measure an excitation-emission fluorescence matrix for each sample. Its columns are emission spectra, each of them collected at a different excitation wavelength. Scanning emission spectra at various excitation wavelengths is today possible in a matter of seconds using modern fast-scanning spectrofluorimeters. Although less popular in industrial laboratories, fluorescence matrix spectroscopy has played a major role in developing multiway calibration, and is highly appreciated in scientific research. In principle, the complexity and number of data modes (the independent directions of a data array) can be increased, and some developments have been described that process three- and even four-dimensional instrumental data per sample. However, these higher-order multiway protocols are still in their infancy regarding their mathematical understanding and potential analytical advantages.
Multiway calibration and its outstanding properties are even more fabulous than those of multivariate spectral calibration. Just to give a common example, by measuring chromatographic-spectral matrices or data tables, you may be able to quantitate analytes in complex samples without the need for a large training set. Simply prepare a few pure analyte standards, measure their data matrices, join these data with those for the unknown sample, and let a multiway calibration model mathematically separate the analyte contribution from those of the interferences. In other words, you do not need to worry about sample pre-treatment or clean-up, baseline resolution of every analyte peak or background corrections. Chromatographic protocols become simpler, isocratic, cheaper, faster and, perhaps more importantly, greener. The approach has been called chroMATHography, a nice play on words proposed by a famous chemometrician. A more technical name for this property is the second-order advantage.
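The nomenclature of data orders and ways maps directly onto array shapes. The following sketch uses NumPy arrays filled with zeros; the sizes and the helper name `stack_samples` are hypothetical, and the matrix orientation (spectra as columns) follows the description above.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
n_samples, n_wavelengths, n_times = 5, 30, 40

# Data measured for ONE sample, classified by order.
per_sample = {
    "univariate (zeroth-order)": np.float64(0.0),                   # a single number
    "multivariate (first-order)": np.zeros(n_wavelengths),          # one spectrum
    "multiway (second-order)": np.zeros((n_wavelengths, n_times)),  # columns are spectra
}

def stack_samples(data, n):
    """Stack the data of n samples along a new leading 'sample' mode."""
    return np.stack([data] * n, axis=0)

ways = {}
for name, data in per_sample.items():
    data_set = stack_samples(data, n_samples)
    ways[name] = data_set.ndim
    print(f"{name}: order {np.ndim(data)} per sample, "
          f"{data_set.ndim}-way array of shape {data_set.shape} for the set")
```

Each additional instrumental mode adds one axis per sample, and collecting several samples adds one more, which is why second-order data for a sample set form a three-way array.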
A Successful Application
A case worth mentioning here is the determination of polycyclic aromatic hydrocarbons (PAHs) in aqueous samples. PAHs are substances of environmental concern, many of them suspected to be carcinogenic to humans. The official liquid chromatographic protocol involves fluorescence detection and a mobile phase with a solvent gradient, taking ca. 40 min to achieve baseline resolution. However, it is not free from potential interferents, and some non-regulated PAHs may coelute with the analytes. If liquid chromatographic matrices are measured with spectral (instead of single-wavelength) fluorescence detection, the job can be done under isocratic conditions and in less than 5 min. In our lab, ten highly co-eluting PAHs have been resolved and quantitated at sub-ppb levels in aqueous samples using this methodology, even in the presence of uncalibrated interferents in unknown specimens. The resolution of the ten individual analytes, and their digital separation from the interferents, was possible by applying a powerful data processing algorithm known as multivariate curve resolution-alternating least-squares (MCR-ALS), which is based on the so-called bilinear model for matrix chromatographic data. Without going into further details, bilinear means that the data matrix can be conceived as the product of two separate matrices, one of them containing the pure component chromatograms and the other the associated spectra. After the decomposition phase, the ‘virtual’ pure chromatograms can be employed in a classical manner to produce a calibration line, on which the test sample signal is interpolated to yield the concentration of a specific analyte.
Fluorescence spectroscopy itself is also able to yield data matrices, by collecting emission spectra at a number of excitation wavelengths. The technique has allowed researchers to develop protocols for many different analytes in very complex natural or industrial samples. In fact, the first report showing that the second-order advantage was possible, published in 1975, described the determination of a polycyclic aromatic hydrocarbon in the presence of other fluorescent congeners, calibrating only with pure analyte solutions. No one suspected at that time that multiway calibration would be so revolutionary to analytical chemistry. A nice sequel to this work was recently developed in our lab: the determination of four PAHs on a nylon membrane attached to a rotating disk, which was left in contact with aqueous test solutions for a few minutes. Fluorescence matrices were then read directly on the membrane. Thanks to the pre-concentration properties of nylon, the protocol allowed the quantitation of individual analytes with detection limits in the range from 20 to 100 ng L−1, i.e., 20 to 100 parts per trillion!
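The bilinear model and the alternating least-squares idea can be illustrated on simulated, noise-free data. The sketch below is a heavily simplified stand-in and not the MCR-ALS implementation used in the work described: non-negativity is imposed by simple clipping, the interferent is forced to be absent from the standards, and the initial spectra are picked by a crude similarity rule. All profiles, concentrations and names are assumptions. Matrices are arranged here as time x wavelength, so that D = C Sᵀ with chromatograms in C and spectra in S.

```python
import numpy as np

def gaussian(axis, center, width):
    """Unit-height Gaussian used for simulated chromatograms and spectra."""
    return np.exp(-((axis - center) ** 2) / (2.0 * width ** 2))

# Simulated pure profiles (hypothetical): two overlapping peaks and spectra.
times, wavelengths = np.arange(60.0), np.arange(40.0)
elution = [gaussian(times, 25.0, 3.0), gaussian(times, 32.0, 3.0)]
spectra = [gaussian(wavelengths, 12.0, 5.0), gaussian(wavelengths, 22.0, 5.0)]

def run_matrix(c_analyte, c_interferent):
    """Bilinear time x wavelength matrix for one chromatographic run."""
    return (c_analyte * np.outer(elution[0], spectra[0])
            + c_interferent * np.outer(elution[1], spectra[1]))

std_conc = np.array([1.0, 2.0, 3.0])       # pure analyte standards
true_test_conc = 1.5                       # analyte in the test sample
runs = [run_matrix(c, 0.0) for c in std_conc] + [run_matrix(true_test_conc, 2.0)]
D = np.vstack(runs)                        # join the runs along the time mode
n_t, n_std_rows = times.size, std_conc.size * times.size

# Initial spectra: the analyte from a standard, the interferent from the
# sufficiently intense test-sample spectrum least similar to the analyte.
test = runs[-1]
s_analyte0 = runs[0].sum(axis=0)
norms = np.linalg.norm(test, axis=1)
candidates = np.where(norms >= 0.05 * norms.max())[0]
cosines = (test[candidates] @ s_analyte0) / (norms[candidates] * np.linalg.norm(s_analyte0))
S = np.column_stack([s_analyte0, test[candidates[np.argmin(cosines)]]])

for _ in range(50):
    # Chromatograms given spectra, then clipped to be non-negative.
    C = np.clip(np.linalg.lstsq(S, D.T, rcond=None)[0].T, 0.0, None)
    C[:n_std_rows, 1] = 0.0                # standards contain no interferent
    # Spectra given chromatograms, then clipped to be non-negative.
    S = np.clip(np.linalg.lstsq(C, D, rcond=None)[0].T, 0.0, None)

lack_of_fit = np.linalg.norm(D - C @ S.T) / np.linalg.norm(D)

# Pseudo-univariate calibration with the resolved analyte peak areas.
areas = C[:, 0].reshape(len(runs), n_t).sum(axis=1)
slope, intercept = np.polyfit(std_conc, areas[:-1], 1)
predicted_conc = (areas[-1] - intercept) / slope
print(f"lack of fit: {lack_of_fit:.2e}")
print(f"analyte in test sample: true {true_test_conc}, predicted {predicted_conc:.3f}")
```

The interferent is never calibrated, yet the analyte area in the test sample lands on the calibration line built from pure standards. Real applications deal with noise, retention-time shifts and more components, and typically rely on proper constrained least squares and further constraints rather than clipping.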
Why is there no Widespread Adoption?
Even with all these almost unbelievably useful properties, there are no crowds of analytical chemists knocking on the door of multiway calibration. This implies that considerable work needs to be done on communication between chemometricians and analytical end users. Regrettably, many valuable resources for the analytical community are buried in highly specific journals devoted to pure chemometrics. The communication issue was addressed at a recent meeting (Topics in Chemometrics, TIC, Szeged, Hungary, May 2019), where one researcher suggested that chemometricians should be out there, offering their digital products to chemists, rather than trying to solve problems that do not exist, or waiting until chemists come to them. Chemometricians may have the answer to the chemist’s problem, so talk to each other!
Alejandro C. Olivieri1
1 Departamento de Química Analítica, (IQUIR-CONICET), Rosario, Argentina
Professor Alejandro C. Olivieri
Universidad Nacional de Rosario
Instituto de Química de Rosario
 Bortolato SA; Arancibia JA; Escandar GM; “Non-trilinear chromatographic time retention-fluorescence emission data coupled to chemometric algorithms for the simultaneous determination of 10 polycyclic aromatic hydrocarbons in the presence of interferences” Anal. Chem. 2009, 81, 8074-8084. DOI: 10.1021/ac901272b.
 Cañas A; Richter P; Escandar GM; “Chemometrics-assisted excitation-emission fluorescence spectroscopy on nylon-attached rotating disks. Simultaneous determination of polycyclic aromatic hydrocarbons in the presence of interferences” Anal. Chim. Acta 2014, 852, 105-111. DOI: 10.1016/j.aca.2014.09.040.
 Olivieri AC; Escandar GM; “Practical three-way calibration” Elsevier, Waltham, US, 2014. DOI: 10.1016/B978-0-12-410408-2.00001-6.
 Smilde A; Bro R; Geladi P; “Multi-way analysis: applications in the chemical sciences” Wiley, Chichester, 2004. DOI: 10.1002/0470012110.
 Murphy KR; Stedmon CA; Graeber D; Bro R; “Fluorescence spectroscopy and multi-way techniques. PARAFAC” Anal. Methods 2013, 5, 6557-6566. DOI: 10.1039/C3AY41160E.