Multivariate Calibration Models: Process Analysis and Control
- Fig. 1: Ammonia concentrations predicted by 4 PLS models developed using mid-infrared spectroscopic data.
- Prof. Ian Marison's research group at the Laboratory of Integrated Bioprocessing (LiB) at Dublin City University and the National Institute for Bioprocessing Research and Training (NIBRT)
Alongside the huge technological advances of the last 20 years has come the development of powerful instrumentation capable of in-depth process analysis and control. However, the use of such instruments introduces the need for mathematical techniques capable of interpreting the information they generate and providing reliable results. Multivariate calibration models are frequently used to express large volumes of data in simpler, more manageable forms.
PAT Applications in Bioprocessing
In 2002 the FDA announced its new initiative, "Pharmaceutical cGMPs for the 21st Century", the aim of which was to support and promote risk-based and science-based approaches in regulatory decision making [1]. Process Analytical Technology (PAT) is a guiding principle of this initiative. PAT embraces and develops new and existing technologies and exploits them to manage or control a process. To achieve such tight control, instrumentation and systems must be developed that allow accurate, reliable measurements in real time. In bioprocessing, this need translates into a search for instrumentation and techniques capable of monitoring and controlling various process steps, from initial media composition to substrate depletion, product formation and biomass growth rates. Much of the work to date in these areas has been done using infrared techniques [2, 3], which produce multivariate data sets. The useful information embedded within these data sets must be extracted if it is to be used as part of a control strategy, and the development of multivariate calibration models is the key to unlocking this information.
What is a Multivariate Calibration Model?
Calibrations may be divided into two types: univariate and multivariate. In a univariate calibration, a change in the variable to be measured is directly reflected by a change in a single sensor response, and the two can be correlated. In a mercury thermometer, for example, an increase in temperature increases the volume of the mercury and hence the height of the liquid in the glass tubing, from which the temperature increase can be read. A multivariate calibration is much more complex.
In this case a large number of independent variables are generated, all of which relate to one predicted dependent variable. For example, absorbance values (independent variables) over a range of wavenumbers in the mid-infrared region can be used to predict the concentration (dependent variable) of a particular component. The calibration model relates these variables to one another.
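As an illustration, a linear calibration of this kind is often written in matrix form. The notation below is an assumption for this sketch and does not appear in the original text: X holds the mean-centred absorbances of n calibration samples at p wavenumbers, y holds the corresponding mean-centred reference concentrations, b is the vector of regression coefficients and e is the residual.

```latex
\mathbf{y} = \mathbf{X}\,\mathbf{b} + \mathbf{e},
\qquad
\hat{y}_{\mathrm{new}} = \mathbf{x}_{\mathrm{new}}^{\top}\,\hat{\mathbf{b}}
```

In spectroscopy p is usually much larger than n and neighbouring wavenumbers are strongly correlated, so b generally cannot be estimated reliably by ordinary least squares on the raw absorbances.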
Chemometric (multivariate analysis) techniques are used to extract this information and, in the case above, to establish correlations between concentration and absorbance. They are often used as data reduction techniques, since chemometric analysis allows multivariate data to be transformed into a much smaller number of variables in which the important information is retained and system noise is reduced. A number of chemometric techniques can be employed, depending on the information required.
Principal Component Analysis (PCA) is often used for exploratory analysis and pattern recognition. PCA is an unsupervised technique: it uses only the chemical information provided by the samples themselves, and the model does not rely on class labels or reference values from a training set. In this way PCA can identify differences and similarities between samples, but cannot specify why they exist, i.e. it cannot determine to which class or group the samples belong. In bioprocessing, PCA can be used as a qualification technique for raw materials or products where differences between samples may be quite subtle.
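The idea can be illustrated with a minimal Python sketch, assuming NumPy is available. The data are simulated and every name in it (the wavenumber axis, the two Gaussian bands, `simulate_lot`, the two "lots") is hypothetical. PCA is computed here via a singular value decomposition of the mean-centred spectra, and the lot labels are used only afterwards to inspect the scores.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical wavenumber axis and two Gaussian absorbance bands.
wavenumbers = np.linspace(1000.0, 1800.0, 200)
main_band = np.exp(-0.5 * ((wavenumbers - 1300.0) / 40.0) ** 2)
minor_band = np.exp(-0.5 * ((wavenumbers - 1550.0) / 30.0) ** 2)


def simulate_lot(minor_level, n_samples):
    """Simulated spectra: main band + scaled minor band + random noise."""
    noise = rng.normal(0.0, 0.005, size=(n_samples, wavenumbers.size))
    return main_band + minor_level * minor_band + noise


# Two simulated raw-material lots that differ only in the small minor band.
spectra = np.vstack([simulate_lot(0.0, 10), simulate_lot(0.1, 10)])

# PCA via singular value decomposition of the mean-centred data.
centred = spectra - spectra.mean(axis=0)
U, s, Vt = np.linalg.svd(centred, full_matrices=False)
scores = U * s                       # sample coordinates on each component
loadings = Vt                        # each row: one component vs. wavenumber
explained = s**2 / np.sum(s**2)      # fraction of variance per component

# The labels are used only here, to inspect the result; PCA never saw them.
pc1_lot_a = scores[:10, 0]
pc1_lot_b = scores[10:, 0]
print("variance explained by PC1 to PC3:", np.round(explained[:3], 3))
print("PC1 range, lot A:", pc1_lot_a.min(), pc1_lot_a.max())
print("PC1 range, lot B:", pc1_lot_b.min(), pc1_lot_b.max())
```

In this simulation the two lots fall into separate ranges of PC1 scores, but the decomposition itself says nothing about which lot is acceptable; that interpretation has to come from the analyst or from a supervised method.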
Discriminant analysis techniques can be used to assign samples to a particular class. Models are developed for each class and are then applied to the samples of interest in order to establish to which class, if any, the samples belong. In bioprocessing, such models may be used in media acceptance testing or product qualification. Soft Independent Modeling of Class Analogy (SIMCA) is a discriminant analysis technique in which a PCA model is developed for each class. Partial Least Squares-Discriminant Analysis (PLS-DA) uses the class information in the model development and is therefore a supervised technique. Partial Least Squares (PLS) regression can also be used to correlate spectral absorbance data with concentration, i.e. to quantify a component, and PLS models can therefore be used as predictive tools for in-process monitoring.
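To make the quantitative case concrete, the following is a minimal sketch of PLS regression for a single response (often called PLS1), written with a NIPALS-style deflation in plain NumPy. It is an illustration under stated assumptions, not the procedure used to build the models in Fig. 1: the spectra are simulated from two overlapping Gaussian bands, and `pls1_fit`, `pls1_predict` and `simulate` are hypothetical helper names.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model; return (coefficients, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean          # centred spectra and response
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f                        # weights: covariance with y
        w /= np.linalg.norm(w)
        t = E @ w                          # scores for this latent variable
        tt = t @ t
        p = E.T @ t / tt                   # spectral loadings
        q_k = f @ t / tt                   # response loading
        E = E - np.outer(t, p)             # deflate spectra
        f = f - q_k * t                    # deflate response
        W.append(w)
        P.append(p)
        q.append(q_k)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, y_mean - x_mean @ coef


def pls1_predict(X, coef, intercept):
    return X @ coef + intercept


rng = np.random.default_rng(1)
wavenumbers = np.linspace(1000.0, 1800.0, 120)   # hypothetical axis
analyte_band = np.exp(-0.5 * ((wavenumbers - 1350.0) / 50.0) ** 2)
interferent_band = np.exp(-0.5 * ((wavenumbers - 1450.0) / 50.0) ** 2)


def simulate(n_samples):
    """Simulated mixture spectra and the analyte concentration."""
    analyte = rng.uniform(0.0, 10.0, n_samples)
    interferent = rng.uniform(0.0, 10.0, n_samples)
    spectra = (np.outer(analyte, analyte_band)
               + np.outer(interferent, interferent_band)
               + rng.normal(0.0, 0.01, (n_samples, wavenumbers.size)))
    return spectra, analyte


X_cal, y_cal = simulate(30)    # calibration set
X_val, y_val = simulate(15)    # separately simulated validation set
coef, intercept = pls1_fit(X_cal, y_cal, n_components=2)
y_pred = pls1_predict(X_val, coef, intercept)
rmsep = float(np.sqrt(np.mean((y_pred - y_val) ** 2)))
print(f"RMSEP on the validation samples: {rmsep:.4f}")
```

Two latent variables are used because the simulated mixtures contain exactly two absorbing components; with real spectra the number has to be chosen from the data.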
Making a Quantitative Model
The efficacy of a calibration model is largely dependent on the calibration data set used in its development. The data set must be representative of the process stage to be monitored. For example, when monitoring substrate concentration during the course of a cell culture, the calibration set must include the maximum and minimum expected concentrations in addition to a broad range of intermediate concentrations. A detailed but manageable experimental design must be employed. This design should take into account not only the full concentration range of the component of interest, but also other influencing factors, such as other components present in the culture media and the temperature at which the culture will be carried out. It is typical when developing a calibration model to carry out some form of cross-validation. However, a separate validation set should also be generated to test the model. This is a direct test of the accuracy of the model, as it is applied to an independent data set that is completely separate from the data set used in the development of the model.
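A design of this kind can be sketched in a few lines of Python using only the standard library. The factor names and levels below are hypothetical placeholders, and a full factorial layout is only one possible choice of design. The validation runs are generated separately, at conditions inside the calibrated ranges.

```python
import random
from itertools import product

# Hypothetical factor levels; real ranges depend on the process.
substrate_g_per_l = [0.0, 2.5, 5.0, 7.5, 10.0]   # component of interest
coanalyte_g_per_l = [0.0, 1.0, 2.0]              # another media component
temperature_c = [35.0, 37.0]                     # culture temperature

# Full factorial calibration design: every combination of levels,
# so the minimum and maximum of each factor are always included.
calibration_design = list(
    product(substrate_g_per_l, coanalyte_g_per_l, temperature_c)
)

# Independent validation runs drawn at random inside the calibrated ranges.
rng = random.Random(0)
validation_design = [
    (
        rng.uniform(min(substrate_g_per_l), max(substrate_g_per_l)),
        rng.uniform(min(coanalyte_g_per_l), max(coanalyte_g_per_l)),
        rng.choice(temperature_c),
    )
    for _ in range(8)
]

print(f"{len(calibration_design)} calibration runs, "
      f"{len(validation_design)} validation runs")
for substrate, coanalyte, temperature in validation_design:
    print(f"  {substrate:5.2f} g/L, {coanalyte:4.2f} g/L, {temperature:.0f} C")
```

With five, three and two levels the design has 30 calibration runs. Fractional or other reduced designs are commonly used when a full factorial becomes unmanageable.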
Care must be taken not to make the model too specific to the calibration data set, as happens when too many chemometric-transformed variables are used and noise is included in the model. This is known as overfitting. An overfitted model will predict the concentration of the component of interest with great accuracy in samples belonging to the calibration set, but will predict poorly for samples outside it. Conversely, it is important to make sure that all of the relevant information has been used in the model development; if it has not, the accuracy of the predicted values will fall below an acceptable level. This is known as underfitting. A good calibration model will be developed from a calibration set determined by a careful experimental design and will perform well when applied to an independent validation sample set.
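The trade-off can be illustrated by comparing the calibration error with a leave-one-out cross-validation error as the number of transformed variables grows. To keep the sketch short it uses principal component regression (PCR) rather than PLS, and simulated three-component spectra. All names (`pcr_predict`, `rmse`, the band positions, the noise level) are assumptions made for illustration.

```python
import numpy as np


def pcr_predict(X_cal, y_cal, X_new, k):
    """Principal component regression with k components."""
    x_mean, y_mean = X_cal.mean(axis=0), y_cal.mean()
    _, _, Vt = np.linalg.svd(X_cal - x_mean, full_matrices=False)
    V = Vt[:k].T                                   # first k loadings
    T = (X_cal - x_mean) @ V                       # calibration scores
    coef, *_ = np.linalg.lstsq(T, y_cal - y_mean, rcond=None)
    return y_mean + (X_new - x_mean) @ V @ coef


def rmse(predicted, reference):
    diff = np.asarray(predicted) - np.asarray(reference)
    return float(np.sqrt(np.mean(diff ** 2)))


# Simulated spectra of three overlapping components plus noise.
rng = np.random.default_rng(2)
axis = np.linspace(0.0, 1.0, 80)
bands = np.stack([np.exp(-0.5 * ((axis - c) / 0.08) ** 2)
                  for c in (0.35, 0.50, 0.65)])
conc = rng.uniform(0.0, 10.0, (24, 3))
X = conc @ bands + rng.normal(0.0, 0.05, (24, 80))
y = conc[:, 0]                                     # component of interest

max_k = 8
rmsec, rmsecv = [], []
for k in range(1, max_k + 1):
    # Calibration error: the model predicts the samples it was built on.
    rmsec.append(rmse(pcr_predict(X, y, X, k), y))
    # Leave-one-out cross-validation: each sample is predicted by a
    # model built without it.
    left_out = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        left_out.append(pcr_predict(X[keep], y[keep], X[i:i + 1], k)[0])
    rmsecv.append(rmse(left_out, y))

best_k = int(np.argmin(rmsecv)) + 1
for k in range(1, max_k + 1):
    print(f"k={k}: RMSEC={rmsec[k - 1]:.3f}  RMSECV={rmsecv[k - 1]:.3f}")
print("components with lowest cross-validation error:", best_k)
```

The calibration error can only decrease as components are added, so it cannot reveal overfitting on its own; the cross-validated error, and ultimately the error on an independent validation set, is the more honest guide to how many components to keep.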
It is possible to generate a wealth of information at any stage of a bioprocess using the various types of instrumentation currently available. In the case of multivariate data, however, the strength and reliability of the information obtained is largely dependent on the calibration model used to translate the data into meaningful results. This highlights the importance of the multivariate calibration model and its role in bioprocess monitoring. A precise and accurate model can readily harness untapped process information that would otherwise be much more difficult to obtain, or could not be determined at all.
[1] FDA: Pharmaceutical cGMPs for the 21st Century - A Risk-Based Approach, Final Report (2004).
[2] Kornmann et al.: J. Biotechnol. 113, 231-245 (2004).
[3] Cervera A. E. et al.: Biotechnol. Prog. 25, 1561-1581 (2009).