Application of Fourier transform infrared spectroscopy and orthogonal projections to latent structures/partial least squares regression for estimation of procyanidins average degree of polymerisation.

Fourier transform infrared (FTIR) spectroscopy has been emphasised as a widespread technique for the rapid assessment of food components. In this work, procyanidins were extracted with methanol and acetone/water from the seeds of white and red grape varieties. Fractionation by graded methanol/chloroform precipitations yielded 26 samples, which were characterised using thiolysis as a pre-treatment followed by HPLC-UV and MS detection. The average degree of polymerisation (DPn) of the procyanidins in the samples ranged from 2 to 11 flavan-3-ol residues. FTIR spectroscopy in the 1800-700 cm⁻¹ wavenumber region allowed a partial least squares (PLS1) regression model with 8 latent variables (LVs) to be built for the estimation of the DPn, giving an RMSECV of 11.7%, with an R² of 0.91 and an RMSEP of 2.58. The application of orthogonal projections to latent structures (O-PLS1) clarified the interpretation of the regression model vectors. Moreover, the O-PLS procedure removed 88% of the spectral variation not correlated with the DPn, making it possible to relate the increase of the absorbance peaks at 1203 and 1099 cm⁻¹ to the increase of the DPn, owing to the higher proportion of substitutions in the aromatic rings of the polymerised procyanidin molecules.


Introduction
Grape seeds are a rich source of flavan-3-ols. These structures are present as mixtures of monomers together with oligomeric and polymeric procyanidins, mainly composed of residues of (−)-epicatechin, (+)-catechin, and (−)-epicatechin-O-gallate (Fig. 1) [1,2]. In grape seed tissues, although procyanidins have only been accurately quantified for oligomers up to tetramers, they have been shown to represent more than 70% of the total flavan-3-ols [3]. Procyanidins with different degrees of polymerisation have different properties: they confer different organoleptic properties on food [4,5] and show different absorption behaviour in the gastrointestinal tract [6]. The average degree of polymerisation (DPn) of procyanidins is thus a useful parameter for evaluating the type of procyanidins present in a sample. Because the majority of procyanidin-rich samples contain molecules with different degrees of polymerisation [7], fractionation steps can be used to improve their homogeneity.
To calculate the DPn, the polymers must undergo acid-catalysed degradation in the presence of a nucleophilic agent, as in thiolysis, which releases distinct monomers corresponding to the terminal and extension units of the polymer [1,8]. These compounds are then usually characterised by reversed-phase HPLC with UV detection at 280 nm [9-12]. More recently, advanced techniques based on mass spectrometry have become very effective in qualitative analysis. These highly sensitive methods can characterise procyanidins in complex matrices, allowing the direct identification of molecules with different degrees of polymerisation [13-16]. However, analyses using HPLC-UV are both time consuming and expensive, and advanced techniques such as HPLC-MS require highly specialised equipment. Versatile, cheap, and rapid analyses are therefore desirable and of great interest for routine and generalised use.
Infrared spectroscopy has been a very useful tool for the rapid evaluation of the procyanidin composition of a given sample [17,18]. Nevertheless, the signals that carry the potentially useful information are complex. Chemometric methods allow useful information to be extracted from the large and complex data sets generated. Within the vast field of chemometrics, multivariate regression methods have been widely used to provide better insight into the systems and to build calibration and prediction models. Partial least squares (PLS) regression is one of the most widely used models [19-22]. The PLS algorithm is based on a bilinear model, in which the information contained in the X data matrix is projected onto a small number of latent variables known as PLS components. The Y data matrix is actively used in estimating the latent variables, ensuring that the first components are the most relevant for predicting the dependent Y variables. The interpretation of the relationships between the X and Y data is thereby reduced to the relationships between a smaller number of PLS components [23,24].
Multivariate regression by PLS has been extensively used in chemometrics, where it has found a wide field of applications. It has been proposed and implemented for the routine analysis of a large number of parameters in the wine industry [17,25,26]. It has also been used for the identification, characterisation, and quantification of polysaccharides and proteins [27-29]. The estimation of procyanidin DPn in the range of 2-6 residues using the PLS algorithm was recently attempted in dried red wine samples previously purified by C18 solid-phase extraction [30].
In this work, regression models based on the FTIR spectral region between 1800 and 700 cm⁻¹, using PLS1 and O-PLS1, were assayed on freeze-dried procyanidin extracts from grape seeds. Molecular features of the procyanidins related to their absorbance characteristics in the mid-infrared region were identified and correlated with the procyanidin DPn.

Materials
Methanol, ethyl acetate, n-hexane, and acetone were obtained from Sigma-Aldrich Co. Other reagents were of analytical grade or higher purity.

Plant material
Seeds were obtained from grapes (Vitis vinifera L.) of the white variety 'Chardonnay' at technological maturity, in the Bairrada Appellation, from an experimental vineyard (Estação Vitivinícola da Bairrada, Anadia, Portugal), during transfer of the musts for fermentation. A mixture of the red grape varieties 'Touriga Nacional', 'Touriga Francesa', and 'Tinta Roriz' was also obtained from the wine producers of Adega Cooperativa de Pinhel (Pinhel, Portugal). The remaining pulp and skins were separated from the grape seeds by decantation and sieving (pore size <2.8 mm diameter). The seeds were then submitted to several wash cycles with water (200 g L⁻¹) under gentle stirring with a magnetic bar at 4 °C for a minimum of 3 days, with two water exchanges a day, until the turbidity reached a constant minimum, ensuring that no adherent tissues remained. The purified seeds were then washed with ethanol, air dried at room temperature, and stored at 4 °C until further analysis.

Procyanidin crude extracts (PCE)
Seeds were immersed in liquid nitrogen, milled in a domestic coffee mill, and sieved (pore size <0.75 mm diameter). The extraction methodology was adapted from Guyot et al. [12], as described by Cardoso et al. [31]. The seed powder was extracted three times with n-hexane to remove the lipids. It was then treated three times with methanol containing 5% acetic acid to extract the phenolic compounds. The methanol extracts were filtered through a G3 sintered glass filter, combined, concentrated under vacuum at 40 °C with several additions of water to ensure the complete removal of methanol and acetic acid, frozen, and freeze-dried, to give the methanol procyanidin crude extracts (PCE). The residue from the methanolic treatment was extracted three times with an acetone/water solution containing 5% acetic acid, and the acetone was eliminated as described for methanol. The aqueous solution was frozen and freeze-dried to obtain the acetone/water PCE.

Scheme 1. Extraction and fractionation of grape seed flavan-3-ols. SPE: solid phase extraction; SCP: sequential chloroform precipitations.

Procyanidins fractionation
The methanol PCE and the acetone/water PCE from white and red grape seeds were fractionated according to the graded methanol/chloroform precipitations proposed by Saucier et al. [9], as summarised in Scheme 1. The PCE powder (10 g L⁻¹) was dissolved in water containing 5% acetic acid and the undissolved material (F0) was removed by centrifugation (Centrifuge 3K30, Sigma, St. Louis, MO, USA). The supernatant was then submitted three times to a liquid-liquid extraction with ethyl acetate, using a water/ethyl acetate ratio of 6:4 (v/v), resulting in an organic phase (F1) and an aqueous phase (F2). The F1 solution (PCE organic phase) was concentrated, loaded onto a C18 solid-phase extraction column (SPE-C18, Supelco-Discovery, 5 g), and eluted with diethyl ether followed by methanol, giving fractions F1.1 and F1.2, respectively. The F2 solution (PCE aqueous phase) was evaporated to dryness and redissolved in methanol (10 g L⁻¹). The undissolved material was removed by centrifugation and the supernatant was submitted to successive additions of chloroform until a new precipitate formed. The precipitate was then collected by centrifugation, dissolved in water, rotary-evaporated with several additions of water to completely remove the organic solvents, frozen, and freeze-dried. The material still soluble in chloroform after the last precipitation was named "SN" and was recovered as described for the precipitated fractions. The composition of the fractions containing polymeric procyanidins is given in Table 1.

Thiolysis and HPLC
Thiolysis was carried out according to the methodology described by Naczk and Shahidi [10] and HPLC analysis followed the conditions described by Peng et al. [32]. The HPLC apparatus was from PerkinElmer (series 200), with a UV-vis detector (785A UV-VIS Detector). Samples were loaded at 30 °C onto a C18 column (LichroCart 250-4 Superspher 100 RP-18) equipped with a C18 guard cartridge with the same packing material, equilibrated with 0.2% (v/v) formic acid (eluent A). Phenolic compounds were eluted with a gradient of eluent B (82% (v/v) acetonitrile and 0.04% (v/v) formic acid): from 0% to 15% B in the first 15 min; 15% to 16% from 15 to 40 min; 16% to 17% from 40 to 45 min; 17% to 43% from 45 to 48 min; 43% to 52% from 48 to 49 min; held isocratic at 52% from 49 to 56 min; reduced from 52% to 43% from 56 to 57 min; reduced from 43% to 17% from 57 to 58 min; and reduced from 17% to 0% from 58 to 60 min. Samples were loaded at least in duplicate. Peaks were detected at 280 nm, and the monomers and procyanidin B2 dimer were identified by comparison of their retention times with standards. The epicatechin thioderivative was identified by comparison with the retention time of the products of procyanidin B2 dimer after thiolysis; the catechin and epicatechin-O-gallate thioderivatives were identified by their retention times and abundance, and confirmed by analysis of their mass spectra using LC-MS (Waters Alliance 2690) as described by Passos et al. [14]. The average degree of polymerisation (DPn) was calculated as the ratio of the total area of all flavan-3-ol units (thioether adducts plus terminal units) to the sum of the areas of catechin, epicatechin, and epicatechin-O-gallate corresponding to terminal units. The estimated DPn had an uncertainty of ±0.5.
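The DPn ratio described above can be illustrated with a minimal sketch. The function name and the peak areas are hypothetical and are not taken from Table 1; the sketch also assumes that all peaks have equal response factors, which is a simplification.

```python
def average_dp(terminal_areas, extension_areas):
    """Return DPn = (extension + terminal units) / terminal units.

    Both arguments map a unit name to its HPLC peak area at 280 nm.
    Assumes equal response factors for all peaks (a simplification).
    """
    terminal = sum(terminal_areas.values())
    extension = sum(extension_areas.values())
    if terminal <= 0:
        raise ValueError("terminal-unit area must be positive")
    return (terminal + extension) / terminal


# Hypothetical peak areas (arbitrary units), not taken from Table 1.
terminal = {"catechin": 60.0, "epicatechin": 30.0, "epicatechin-O-gallate": 10.0}
extension = {
    "catechin thioether": 50.0,
    "epicatechin thioether": 300.0,
    "epicatechin-O-gallate thioether": 50.0,
}
dpn = average_dp(terminal, extension)
print(f"DPn = {dpn:.1f}")  # DPn = 5.0
```

A sample containing only monomers has no thioether adducts, so the same function returns a DPn of 1.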
The calibration curves for the estimation of phenolic compounds were obtained using (+)-catechin, (−)-epicatechin, (−)-epicatechin-O-gallate, and procyanidin B2 dimer, in the concentration range of 0.005-0.5 g L⁻¹. The phenolics in the fractions were quantified by comparing the chromatographic areas obtained after thiolytic degradation of the samples with the respective calibration curve. As thioderivative standards were not available, the thioderivatives were quantified using the calibration curves of the respective monomers, on the basis of their similar response factors to the corresponding monomeric units [13].
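As an illustration of quantification against a calibration curve, the sketch below fits a straight line to standards and inverts it for an unknown peak area. The standard concentrations span the 0.005-0.5 g L⁻¹ range stated above, but the areas, the linear detector response, and the function name are assumptions made for the example.

```python
import numpy as np

# Hypothetical standards: concentration (g/L) and peak area (arbitrary units).
conc = np.array([0.005, 0.05, 0.1, 0.25, 0.5])
area = 2000.0 * conc + 5.0  # idealised, noise-free detector response

# Fit area = slope * concentration + intercept by least squares.
slope, intercept = np.polyfit(conc, area, 1)


def concentration_from_area(peak_area):
    """Invert the calibration line; reject values outside the calibrated range."""
    c = (peak_area - intercept) / slope
    if not (conc.min() <= c <= conc.max()):
        raise ValueError("peak area falls outside the calibrated range")
    return c


print(round(concentration_from_area(405.0), 3))  # 0.2
```

Rejecting areas outside the calibrated range is a design choice of the sketch; it avoids extrapolating the line beyond the standards.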

FTIR spectroscopy and multivariate analyses
The FTIR spectra of each fraction presented in Table 1 were obtained using a Golden Gate single reflection diamond ATR system in a Bruker IFS-55 spectrometer with a deuterated triglycine sulfate (DTGS) detector. The spectra were recorded in absorbance mode from 4000 to 400 cm⁻¹ (mid-infrared region) at a resolution of 8 cm⁻¹. Five analytical replicate spectra (128 co-added scans) were collected for each sample. The measured spectra were transferred in JCAMP-DX format into the data analysis software developed at the Institut National Agronomique Paris-Grignon in collaboration with the University of Aveiro [33]. The multivariate calibration was applied to the 1800-700 cm⁻¹ region and, owing to amplification effects between spectra, the spectra were pre-processed using the standard normal variate (SNV) transformation. Fig. 2 shows the spectral region used before and after the SNV correction.
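The SNV pre-processing step can be sketched as follows: each spectrum is centred on its own mean and divided by its own standard deviation. The wavenumber axis and the two synthetic spectra are hypothetical and serve only to show that SNV removes a gain and offset difference between otherwise identical spectra.

```python
import numpy as np


def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) by its own mean and SD."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


# Hypothetical wavenumber axis and spectra, for illustration only.
wavenumbers = np.linspace(4000.0, 400.0, 901)
band = np.exp(-(((wavenumbers - 1100.0) / 150.0) ** 2))
raw = np.vstack([band, 2.5 * band + 0.3])  # same shape, different gain and offset

# Keep the 1800-700 cm^-1 region used for calibration, then apply SNV.
region = (wavenumbers <= 1800.0) & (wavenumbers >= 700.0)
corrected = snv(raw[:, region])

print(np.allclose(corrected[0], corrected[1]))  # True
```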

Calibration model framework
To build the calibration model for the quantification of the DPn, a Monte Carlo cross-validation [34] framework was used. The dataset was split into a calibration (learning) set and a validation (external) set to assess the predictive power of the DPn model. Table 1 includes the 26 samples used for calibration and the four PCE samples (the crude extracts, which contain heterogeneous material) used for prediction. The splitting process consisted of sorting the DPn values and then randomly selecting 40% of the samples, with the replicates of a sample treated as a single sample, as the validation set. The remaining 60% of the samples, drawn with replacement, were used as the calibration set. This procedure was repeated over several iterations: 200 regression models were built and, for each one, the "optimal" model dimensionality, given by the number of LVs and the corresponding RMSECV value, was recorded. This gave the distribution profile of how often a given "optimal" LV/RMSECV pair was used to build a predictive model. The model complexity was selected on the basis of the most frequent LV/RMSECV pair. The selected model dimensionality was then used to predict the parameters of interest for the external set, with the error expressed as the root mean square error of prediction (RMSEP).
Since this approach is computationally very demanding when using PLS1 [20], the Principal Component Transform PLS1 (PCT-PLS1) [23] was used instead to build the calibration models, in order to accelerate the Monte Carlo cross-validation process.
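The selection of model dimensionality by repeated random splitting can be sketched as below. This is a simplified illustration and not the software used in this work: it substitutes a plain NIPALS PLS1 for PCT-PLS1, omits the DPn sorting, the replicate handling, and the sampling with replacement, and runs on synthetic data. All function and variable names are hypothetical.

```python
from collections import Counter

import numpy as np


def pls1_coefficients(X, y, n_lv):
    """NIPALS PLS1 on centred data; returns the regression vector b (y ~ X @ b)."""
    X = np.array(X, dtype=float)  # copies, so the caller's arrays are not deflated
    y = np.array(y, dtype=float)
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = X.T @ y
        w /= np.linalg.norm(w)
        t = X @ w
        tt = t @ t
        p = X.T @ t / tt
        c = (y @ t) / tt
        X -= np.outer(t, p)
        y -= c * t
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.column_stack(W), np.column_stack(P), np.array(q)
    return W @ np.linalg.solve(P.T @ W, q)


def monte_carlo_lv_counts(X, y, max_lv=8, n_iter=200, val_fraction=0.4, seed=0):
    """Count how often each number of LVs gives the lowest validation RMSE."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    n_val = int(round(val_fraction * n))
    counts = Counter()
    for _ in range(n_iter):
        order = rng.permutation(n)
        val, cal = order[:n_val], order[n_val:]
        x_mean, y_mean = X[cal].mean(axis=0), y[cal].mean()
        errors = []
        for k in range(1, max_lv + 1):
            b = pls1_coefficients(X[cal] - x_mean, y[cal] - y_mean, k)
            pred = (X[val] - x_mean) @ b + y_mean
            errors.append(np.sqrt(np.mean((pred - y[val]) ** 2)))
        counts[int(np.argmin(errors)) + 1] += 1
    return counts


# Synthetic data: 26 "samples", 60 "wavenumbers", three underlying factors.
rng = np.random.default_rng(1)
scores = rng.normal(size=(26, 3))
X = scores @ rng.normal(size=(3, 60)) + 0.05 * rng.normal(size=(26, 60))
y = 6.0 + 2.0 * scores[:, 0] - scores[:, 1] + 0.1 * rng.normal(size=26)

counts = monte_carlo_lv_counts(X, y)
best_lv, frequency = counts.most_common(1)[0]
print(f"most frequent optimum: {best_lv} LVs ({frequency} of 200 splits)")
```

Choosing the most frequent optimum over many splits, rather than the optimum of a single split, reduces the dependence of the chosen dimensionality on one particular partition of a small data set.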

Characterisation of grape seed procyanidin fractions by HPLC
The procyanidin dataset used in this study is composed of 26 samples purified from the methanol and acetone/water procyanidin crude extracts (PCE) from seeds of white and red grapes. The flavan-3-ol content varies from 8.6% (WM-F0) to 84.1% (WM-F1), with an average of 36.1% and a standard deviation of 17.8%, thus covering a wide range and dispersion of procyanidin concentrations in the samples (Table 1). These flavan-3-ols are composed of the monomers catechin, epicatechin, and epicatechin-O-gallate, and of procyanidins (oligomers and polymers) built from these same three constituent units. In particular, the procyanidin crude extracts (Table 1, PCE) have a DPn between 4.2 and 7.4. Fractionation in chloroform/methanol solutions enlarged the data set, giving fractions containing procyanidins with a DPn ranging from 1 (SN material) to 10.8. According to the data in Table 1, increasing the percentage of chloroform yields precipitated procyanidin fractions with lower DPn (Table 1, F2 extracts).
For the majority of the samples, catechin is the main terminal unit, whereas epicatechin occurs as the main extension unit. The epicatechin-O-gallate unit accounts for approximately 4.5-32.5% of the total procyanidin residues.

Characterisation of procyanidins by FTIR
The characteristic wavenumbers of phenolic compounds are associated with the presence of an OH band between 3600 and 3200 cm⁻¹. They also show aromatic, ester, alcohol, and ether bands in the region between 1800 and 700 cm⁻¹. All spectra have a similar profile (Fig. 2) and, according to the literature [35,36], changes in the aromatic ring bands of the procyanidins are expected to occur in this spectral region.

Calibration models for estimation of procyanidin DPn

Calibration model for estimation of procyanidin DPn using PCT-PLS1
Using the 1800-700 cm⁻¹ region of all the FTIR spectra of the fractionated procyanidin samples, excluding the crude extracts, a PLS1 regression procedure was applied to estimate their DPn. A calibration model with 8 latent variables (LVs), obtained using an internal cross-validation (leave-5-out) procedure, was found to have predictive power. The relative root mean square error of cross-validation (rRMSECV) was 11.7%, with a coefficient of determination (R²) of 0.91 and a root mean square error of prediction (RMSEP) of 2.58.
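The figures of merit quoted for the model can be computed from reference and predicted DPn values as in the sketch below. The normalisation used for the relative error (division by the mean reference value) is an assumption, since the text does not state which one was applied; the data and function names are hypothetical.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean square error between reference and predicted values."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def relative_rmse_percent(y_true, y_pred):
    """RMSE as a percentage of the mean reference value (assumed normalisation)."""
    return 100.0 * rmse(y_true, y_pred) / float(np.mean(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Hypothetical reference (thiolysis/HPLC) and predicted (FTIR) DPn values.
reference = [2.0, 4.0, 6.0, 8.0, 10.0]
predicted = [2.5, 3.5, 6.5, 7.5, 10.5]

print(rmse(reference, predicted))       # 0.5
print(r_squared(reference, predicted))  # 0.96875
print(round(relative_rmse_percent(reference, predicted), 2))
```

The same functions give RMSECV when the predictions come from cross-validation and RMSEP when they come from the external validation set.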
However, the relatively high number of LVs in this model makes the pp (predictive loadings) vector profiles difficult to interpret. In fact, the y variance explained by this PLS1 regression model shows some irregularity at LV2 (Table 2), suggesting that some systematic variations present in the spectra are not related to the y variability. In such cases, one approach is to remove from the model all spectral variation orthogonal to the factor of interest (DPn). One such method is O-PLS (orthogonal projections to latent structures) [37]. This procedure was therefore applied with the aim of improving the interpretation of the PLS1 regression model by removing orthogonal artefacts not related to the DPn profile. It should be noted, however, that the O-PLS method does not improve the robustness of a calibration model.

Calibration model for estimation of procyanidin DPn using O-PLS
O-PLS was used within the Monte Carlo cross-validation procedure (similar to the one described in the Calibration model framework section) to remove from the spectra the systematic variation orthogonal to the DPn values. The Monte Carlo cross-validation indicated that seven O-PLS components should be removed from the regression model, resulting in a 1 LV model for calibration and interpretation purposes. This procedure removed 88% of the non-correlated variation present in the FTIR spectra. The regression model obtained presented an rRMSECV of 8.6%, with an R² of 0.95 and an RMSEP of 2.58 (Fig. 3a), which shows that it is not significantly different from the PLS1 model with 8 LVs.

Validation of O-PLS model for estimation of procyanidin DPn
To validate the regression model obtained for the estimation of the procyanidin DPn using the O-PLS1 model (Fig. 3), the procyanidin crude extracts (PCE) were used. The results show that similar values were obtained using the FTIR/O-PLS1 calibration curve and the thiolysis/HPLC procedure. This evidence is all the more important considering that the PCE extracts represent a more complex matrix than the fractionated extracts, which suggests that this approach can be a useful tool for the estimation of DPn in non-purified extracts.

The structure of a dimer, such as the one represented in Fig. 1, comprises two flavan-3-ol residues (one terminal and one extension unit) bonded by an interflavanic linkage. Each flavan-3-ol residue is composed of two aromatic rings (A and B) and a non-aromatic ring (C). The hydroxyl groups on ring A occur at C5 and C7, resulting in a 1,3-disubstitution (meta-substitution), whereas on ring B the hydroxyl groups occur at C3′ and C4′, forming a 1,2-disubstitution of the aromatic ring (ortho-substitution). The interflavanic linkage between the two units of the dimer gives rise to an extra C7-C8 ortho-substitution on aromatic ring A of the terminal unit of the procyanidin. In total, a dimeric procyanidin (DP = 2), formed by two flavan-3-ol units, has one meta-substitution and one ortho-substitution per unit plus one ortho-substitution due to the bond formed between the two units. In general, a procyanidin formed by n flavan-3-ol units (DP = n) has n + (n − 1) 1,2-disubstitutions and n 1,3-disubstitutions. As a consequence, the ratio of 1,2-disubstitutions to 1,3-disubstitutions can be expressed as (2n − 1)/n, that is, 2 − 1/n, which rises from 1.5 for a dimer towards 2 as n increases.

Fig. 4 shows the relationship between the solubility of procyanidins in methanol/chloroform solutions and their average degree of polymerisation.
The material that precipitates in solutions of approximately 40% chloroform tends to have a DPn of 8-10, the material that precipitates in solutions of about 60% chloroform tends to have a DPn of 6-7, and the samples that precipitate in solutions of around 80% chloroform tend to have a DPn of 3-5. The samples that do not precipitate in these solutions contain mainly flavan-3-ol monomers or, at most, procyanidin dimers (data not shown). Graded precipitation in methanol/chloroform solutions is a simple and quick method for evaluating the DPn of the procyanidins in a fraction, as well as a rough but suitable methodology for recovering fractions enriched in procyanidins with a defined degree of polymerisation, which can be used, for example, in industry. In this work, the precipitation methodology was an important step in obtaining samples for a suitable calibration/validation data set.

Concluding remarks
The need to test fast and cheap but reliable methodologies for the estimation of the DPn of procyanidins has been the basis of the present work. The difference in the solubility of the procyanidins in methanol/chloroform solutions was shown to be useful for this purpose. This fractionation methodology also provided the number of representative samples required to build an FTIR model for the determination of the DPn of procyanidins. Using the resulting calibration model, it is possible to estimate the DPn of grape seed procyanidins and to assign the major positive changes in the pp vector at 1203 and 1099 cm⁻¹ to the increase of aromatic substitutions in the polymerised molecules.
Although the methodology has not yet been tested on raw materials other than grape seeds, it is expected that it can be successfully applied to other procyanidin sources.