A Novel PLS2-Based Algorithm for Imputing Missing Values in Foodomics/Metabolomics Studies with multiple response variables

Research output: Contribution to conference › Poster › Research › peer-review

Standard

A Novel PLS2-Based Algorithm for Imputing Missing Values in Foodomics/Metabolomics Studies with multiple response variables. / Tashk, Ashkan; Engelsen, Søren Balling; Khakimov, Bekzod; Steenstrup Pedersen, Kim; Sørensen, Klavs Martin; kwr854, kwr854.

2023. Poster session presented at Food Analytics Conference, Copenhagen, Denmark.

Harvard

Tashk, A, Engelsen, SB, Khakimov, B, Steenstrup Pedersen, K, Sørensen, KM & kwr854, K 2023, 'A Novel PLS2-Based Algorithm for Imputing Missing Values in Foodomics/Metabolomics Studies with multiple response variables', Food Analytics Conference, Copenhagen, Denmark, 15/11/2023.

APA

Tashk, A., Engelsen, S. B., Khakimov, B., Steenstrup Pedersen, K., Sørensen, K. M., & kwr854, K. (2023). A Novel PLS2-Based Algorithm for Imputing Missing Values in Foodomics/Metabolomics Studies with multiple response variables. Poster session presented at Food Analytics Conference, Copenhagen, Denmark.

Vancouver

Tashk A, Engelsen SB, Khakimov B, Steenstrup Pedersen K, Sørensen KM, kwr854 K. A Novel PLS2-Based Algorithm for Imputing Missing Values in Foodomics/Metabolomics Studies with multiple response variables. 2023. Poster session presented at Food Analytics Conference, Copenhagen, Denmark.

Author

Tashk, Ashkan ; Engelsen, Søren Balling ; Khakimov, Bekzod ; Steenstrup Pedersen, Kim ; Sørensen, Klavs Martin ; kwr854, kwr854. / A Novel PLS2-Based Algorithm for Imputing Missing Values in Foodomics/Metabolomics Studies with multiple response variables. Poster session presented at Food Analytics Conference, Copenhagen, Denmark.

Bibtex

@conference{72abb2ba54cf4a079cd07c68c561f62a,

title = "A Novel PLS2-Based Algorithm for Imputing Missing Values in Foodomics/Metabolomics Studies with multiple response variables",

abstract = "Missing values are a frequent problem in data analytics studies, especially when a calibration model must be built [1]. This study introduces a novel method that uses the PLS2 algorithm [2, 3] to impute missing values in ultracentrifugation (UC) measurements of lipoprotein (LP) subfractions in human plasma. LP subfractions are essential biomarkers of food-related diseases such as obesity and cardiovascular diseases. They are categorized into four main fractions based on density and size: very low-density (VLDL), intermediate-density (IDL), low-density (LDL), and high-density (HDL) LPs. The proposed algorithm leverages proton (1H) nuclear magnetic resonance (NMR) spectroscopy of LPs in human blood plasma, a promising analytical method for rapid quantification of LP subfractions [5]. The robust and reliable NMR spectral data (p=1500) serve as the input X variables, while the UC variables (p=65) are used as the response variables. The UC variables are prone to measurement errors, which are occasionally treated as missing values. The proposed imputation method is iterative and consists of several stages. First, the samples are stratified by the number of missing values in each sample. Subsequently, bootstrapped cross-validation is applied using PLS2 modeling. The PLS2 models, with their associated numbers of latent variables (LVs), whose root mean squared error of cross-validation (RMSECV) falls within a confidence interval of the RMSECV distribution, i.e., [µ-σ, µ], are extracted. Next, the weighted mean of the predicted values from the extracted PLS2 models is calculated. The imputation process then progresses to the next stratification level until all samples are included and all missing values are imputed.
Finally, the whole procedure is repeated until all the imputed values converge. Comparative analysis reveals that the proposed PLS2-based imputation method outperforms other commonly used imputation strategies such as iterative PCA [6] and missMDA [7]. The algorithm demonstrates superior performance in accurately imputing missing values within UC measurements of LP subfractions, enabling researchers to obtain more reliable and accurate results. In conclusion, this study highlights the significance of addressing missing values in chemometrics studies. The proposed algorithm, which combines PLS2 modeling and NMR spectroscopy, offers a robust approach for imputing missing values within UC measurements of LP subfractions. Initial results indicate that the method outperforms traditional imputation strategies and may contribute to enriching and improving lipoprotein prediction models based on NMR spectroscopy.",

author = "Ashkan Tashk and Engelsen, {S{\o}ren Balling} and Bekzod Khakimov and {Steenstrup Pedersen}, Kim and S{\o}rensen, {Klavs Martin} and kwr854 kwr854",

year = "2023",

language = "English",

note = "Food Analytics Conference ; Conference date: 15-11-2023",

url = "https://food.ku.dk/english/calender/events/food-analytics-conference-2023/",

}
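
The stratification, model-selection, and averaging steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the PLS2 fitting and bootstrapped cross-validation are omitted (the RMSECV values and predictions are taken as given), the population standard deviation is assumed for σ, and the inverse-RMSECV weighting is an assumption because the abstract does not say how the weights are chosen.

```python
import math
from statistics import mean, pstdev


def stratify_by_missing(rows):
    """Group sample indices by the number of missing response values.

    A value counts as missing if it is None or NaN. The result maps
    missing-count -> list of sample indices, ordered by missing-count.
    """
    strata = {}
    for i, row in enumerate(rows):
        n_missing = sum(
            1 for v in row
            if v is None or (isinstance(v, float) and math.isnan(v))
        )
        strata.setdefault(n_missing, []).append(i)
    return dict(sorted(strata.items()))


def select_models(rmsecvs):
    """Return indices of models whose RMSECV lies in [mu - sigma, mu].

    mu and sigma are the mean and (assumed: population) standard
    deviation of the RMSECV values across candidate models.
    """
    mu = mean(rmsecvs)
    sigma = pstdev(rmsecvs)
    return [i for i, r in enumerate(rmsecvs) if mu - sigma <= r <= mu]


def weighted_prediction(predictions, rmsecvs, selected):
    """Weighted mean of the selected models' predictions for one value.

    Assumption: weights are proportional to 1 / RMSECV, so models with
    lower cross-validation error contribute more.
    """
    weights = [1.0 / rmsecvs[i] for i in selected]
    total = sum(weights)
    return sum(w * predictions[i] for w, i in zip(weights, selected)) / total


# Hypothetical RMSECVs and predictions from five candidate PLS2 models.
example_rmsecvs = [0.8, 1.0, 1.2, 2.0, 0.9]
example_predictions = [10.0, 12.0, 11.0, 20.0, 11.0]
chosen = select_models(example_rmsecvs)
imputed = weighted_prediction(example_predictions, example_rmsecvs, chosen)
```

In a full pipeline, these helpers would be called once per stratification level, and the whole pass repeated until the imputed values stop changing between iterations.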

RIS

TY - CONF

T1 - A Novel PLS2-Based Algorithm for Imputing Missing Values in Foodomics/Metabolomics Studies with multiple response variables

AU - Tashk, Ashkan

AU - Engelsen, Søren Balling

AU - Khakimov, Bekzod

AU - Steenstrup Pedersen, Kim

AU - Sørensen, Klavs Martin

AU - kwr854, kwr854

PY - 2023

Y1 - 2023

N2 - Missing values are a frequent problem in data analytics studies, especially when a calibration model must be built [1]. This study introduces a novel method that uses the PLS2 algorithm [2, 3] to impute missing values in ultracentrifugation (UC) measurements of lipoprotein (LP) subfractions in human plasma. LP subfractions are essential biomarkers of food-related diseases such as obesity and cardiovascular diseases. They are categorized into four main fractions based on density and size: very low-density (VLDL), intermediate-density (IDL), low-density (LDL), and high-density (HDL) LPs. The proposed algorithm leverages proton (1H) nuclear magnetic resonance (NMR) spectroscopy of LPs in human blood plasma, a promising analytical method for rapid quantification of LP subfractions [5]. The robust and reliable NMR spectral data (p=1500) serve as the input X variables, while the UC variables (p=65) are used as the response variables. The UC variables are prone to measurement errors, which are occasionally treated as missing values. The proposed imputation method is iterative and consists of several stages. First, the samples are stratified by the number of missing values in each sample. Subsequently, bootstrapped cross-validation is applied using PLS2 modeling. The PLS2 models, with their associated numbers of latent variables (LVs), whose root mean squared error of cross-validation (RMSECV) falls within a confidence interval of the RMSECV distribution, i.e., [µ-σ, µ], are extracted. Next, the weighted mean of the predicted values from the extracted PLS2 models is calculated. The imputation process then progresses to the next stratification level until all samples are included and all missing values are imputed.
Finally, the whole procedure is repeated until all the imputed values converge. Comparative analysis reveals that the proposed PLS2-based imputation method outperforms other commonly used imputation strategies such as iterative PCA [6] and missMDA [7]. The algorithm demonstrates superior performance in accurately imputing missing values within UC measurements of LP subfractions, enabling researchers to obtain more reliable and accurate results. In conclusion, this study highlights the significance of addressing missing values in chemometrics studies. The proposed algorithm, which combines PLS2 modeling and NMR spectroscopy, offers a robust approach for imputing missing values within UC measurements of LP subfractions. Initial results indicate that the method outperforms traditional imputation strategies and may contribute to enriching and improving lipoprotein prediction models based on NMR spectroscopy.

M3 - Poster

T2 - Food Analytics Conference

Y2 - 15 November 2023

ER -

ID: 375015102

Department of Computer Science