Binary Logistic Regression and Linear Discriminant Analyses in Evaluating Laboratory Factors Associated with COVID-19: A Comparison of Two Statistical Methods
RESEARCH ARTICLE
P: 20-20
January 2022

Mediterr J Infect Microb Antimicrob 2022;11(1):20-20
1. Baqiyatallah University of Medical Sciences, Students’ Research Committee, Tehran, Iran
2. Baqiyatallah University of Medical Sciences, Neuroscience Research Center; Baqiyatallah University of Medical Sciences, School of Medicine, Department of Physiology and Medical Physics, Tehran, Iran
3. Baqiyatallah University of Medical Sciences, Applied Virology Research Center, Tehran, Iran
4. Baqiyatallah University of Medical Sciences, Lifestyle Institute, Health Research Center, Tehran, Iran

Summary

Introduction: Few studies have been conducted to construct a reliable predictive model for the differential diagnosis of severe and non-severe Coronavirus disease-2019 (COVID-19) in the early stages of the disease. This study aimed to compare the accuracy of two statistical classification methods, linear discriminant analysis (LDA) and binary logistic regression (BLR), in predicting COVID-19 severity using single laboratory data and calculated indexes such as the neutrophil-to-lymphocyte ratio (NLR) and systemic immune-inflammation index (SII).
Materials and Methods: We investigated 109 patients with confirmed COVID-19 pneumonia. Epidemiological, demographic, clinical, laboratory, and outcome data were obtained, and the patients were classified into two groups: mild group (42 patients) and severe group (67 patients).
Results: A comparison of the clinical data in the severe and non-severe groups showed significant differences in SpO2 and respiratory rate. In addition, significant differences in NLR, SII, white blood cell count, neutrophil count, mean corpuscular volume, mean corpuscular hemoglobin, lymphocyte count, erythrocyte sedimentation rate, lactate dehydrogenase, and blood urea nitrogen were found between the two groups. Moreover, the difference between the LDA and BLR models was small, and LDA was more appropriate for a smaller sample size.
Conclusion: Our predictive models could help clinicians to identify patients at risk of severe COVID-19. Such prediction can be performed with a simple blood test. LDA and BLR can be used to effectively classify patients with severe and non-severe COVID-19, even with violation of the normality assumption.

Keywords:
Severe COVID-19, linear discriminant analysis, binary logistic regression, blood test data

Introduction

The novel Coronavirus, named by the World Health Organization as Severe acute respiratory syndrome-Coronavirus-2, has spread worldwide[1]. Coronavirus disease-2019 (COVID-19) is an infectious disease with a high incidence that affects people differently and poses a threat to people's life and health. Of patients with novel Coronavirus infections, approximately 81% were mild, 14% were severe, and 5% were critical cases[2], and severe illness often led to death, according to the available investigations[2, 3]. Critically ill patients have a high mortality and poor prognosis. Therefore, the early prediction of moderate or severe acute respiratory syndrome (ARS) is vital and can help clinicians to reduce the mortality rate[2, 4, 5].

The neutrophil-to-lymphocyte ratio (NLR) and systemic immune-inflammation index (SII), as indicators of inflammation and immune response, are calculated from a routine blood test[6, 7]. The NLR is known as a risk factor for mortality from infectious diseases, malignancies, intracerebral hemorrhage, and dermatomyositis[6, 8]. The SII is a prognostic factor in some malignancies such as breast cancer, hepatocellular carcinoma, and esophageal squamous cell carcinoma[7, 9, 10]. It has been demonstrated that severe and critical ARS cases tend to have higher neutrophil counts and lower lymphocyte counts. Several studies suggest that NLR and SII are two independent predictors of COVID-19 progression in the early stages of the disease[3, 6, 7]. In addition, several single laboratory and clinical markers, including C-reactive protein, lymphocyte and neutrophil counts, creatine phosphokinase, erythrocyte sedimentation rate (ESR), urea, and creatinine, have been tested in patients with COVID-19 and can also be used for predicting ARS severity[4, 11-14].

Thus far, studies on COVID-19 have focused on the epidemiology of the disease, clinical characteristics of patients, and the risk factors associated with mortality during hospitalization in critical COVID-19 cases. However, few studies have been conducted to predict progression among patients in the early stages of the disease, and these studies were based on calculated indexes or single blood test factors[4, 13, 15]. This study aimed to compare the accuracy of two statistical classification methods, binary logistic regression (BLR) and linear discriminant analysis (LDA), in predicting COVID-19 severity with single laboratory and clinical data or indexes calculated from blood tests (NLR and SII). Predicting the severity of a disease is a binary classification problem, and which statistical method to use for fitting the data is a frequent question for researchers[16]. The two methods differ in their basic principles. While BLR makes no assumptions about the distribution of the explanatory data, LDA was developed for normally distributed explanatory variables. It is therefore reasonable to expect better results from LDA when the normality assumptions are met, whereas BLR is more appropriate in all other situations[17]. Because of the increasing interest in choosing between BLR and LDA for biological data analysis, these two widely applicable statistical classifiers were used for baseline prediction in this study.

Methods

Data Collection

The participants of the present study were 109 patients diagnosed with COVID-19 pneumonia [confirmed by computed tomography (CT) and reverse transcriptase-polymerase chain reaction (RT-PCR)] in Baqiyatallah Hospital, Tehran, Iran, between February 20, 2020 and June 9, 2020. Epidemiological, demographic, clinical, laboratory, and outcome data were obtained from the Baqiyatallah laboratory computer system, electronic medical records, and interviews with patients. The patients were then divided into two groups according to the severity of the disease: a mild group (42 patients) and a severe group (67 patients).

Patients were classified as having severe COVID-19 if they had clinical signs of pneumonia with SpO2 <90% on room air or were treated in the intensive care unit; the remaining patients were classified as non-severe.

The study proposal was approved by the Research Ethics Committee of Baqiyatallah University of Medical Sciences, Tehran, Iran (code: IR.BMSU.RETECH.REC.1399.094). All participants signed a written informed consent form.

Statistical Analysis

Statistical analyses were performed using IBM Statistical Package for the Social Sciences Statistics Software (version 26; IBM, New York, USA). Quantitative data were presented as mean±standard error. A p value of <0.05 was considered statistically significant. ANOVA and the t-test were used to compare groups and the means of two groups, respectively. For classification, we compared BLR analysis with LDA, and the receiver operating characteristic (ROC) curve was plotted for each model. In addition, we examined the predictive ability of two different sets of independent factors: the calculated laboratory indexes NLR and SII (defined as neutrophil count × platelet count / lymphocyte count), and single hematologic factors (selected based on ANOVA).
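
As an illustration of the two calculated indexes, the following minimal Python sketch computes NLR (neutrophil count divided by lymphocyte count) and SII (as defined above). The function names and the example counts are hypothetical and are not taken from the study data; the counts are assumed to share the same unit.

```python
def nlr(neutrophils: float, lymphocytes: float) -> float:
    """Neutrophil-to-lymphocyte ratio: neutrophil count / lymphocyte count."""
    if lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    return neutrophils / lymphocytes


def sii(neutrophils: float, platelets: float, lymphocytes: float) -> float:
    """Systemic immune-inflammation index: neutrophil * platelet / lymphocyte."""
    if lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    return neutrophils * platelets / lymphocytes


# Hypothetical counts (same unit for all three), not patient data.
print(nlr(6.0, 1.5))         # 4.0
print(sii(6.0, 200.0, 1.5))  # 800.0
```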

Binary Logistic Regression Analysis

Binary logistic regression is a type of regression used to predict the probability of the presence or absence of a particular disease, characteristic, or condition. A logistic regression model predicts a dependent variable by analyzing its relationship with one or more independent variables, and the predicted probability must lie between 0 and 1.
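
The way a fitted logistic model turns predictor values into a probability between 0 and 1 can be sketched as follows. This is a generic illustration of the logistic function, not the model fitted in this study; the intercept, coefficients, and the 0.5 cutoff are assumptions chosen only for demonstration.

```python
import math


def predicted_probability(intercept, coefficients, values):
    """Probability of the outcome under a logistic model.

    z is the linear predictor; the logistic function maps it into [0, 1].
    The two branches avoid overflow for large |z|.
    """
    z = intercept + sum(b * x for b, x in zip(coefficients, values))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def classify(probability, cutoff=0.5):
    """Assign class 1 (e.g. severe) when the probability reaches the cutoff."""
    return 1 if probability >= cutoff else 0


# Hypothetical intercept and coefficients for two predictors.
p = predicted_probability(-2.0, [0.3, 0.001], [4.0, 800.0])
print(p, classify(p))
```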

Linear Discriminant Analysis

Discriminant analysis focuses on the association between multiple independent variables and a categorical dependent variable by forming a composite of the independent variables. This type of multivariate analysis can determine the extent to which the composite variable discriminates between two or more pre-existing groups of subjects, in addition to deriving a classification model for predicting the group membership of new observations. The simplest type of discriminant analysis is when the dependent variable has two groups. In this case, a linear discriminant function that passes through the means of the two groups (centroids) can be used to discriminate subjects between the two groups.
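
For the two-group case, one common formulation is Fisher's linear discriminant with a pooled within-group covariance matrix. The sketch below implements it for two predictors in plain Python. It assumes equal prior probabilities (the cutoff lies midway between the two centroids) and uses made-up values; it is not intended to reproduce the SPSS output of this study.

```python
def mean_vector(rows):
    """Centroid of a group of (x1, x2) observations."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def pooled_covariance(group0, group1):
    """Pooled within-group 2x2 covariance matrix."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows in (group0, group1):
        m = mean_vector(rows)
        for r in rows:
            d = [r[0] - m[0], r[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    dof = len(group0) + len(group1) - 2
    return [[s[i][j] / dof for j in range(2)] for i in range(2)]


def fit_lda(group0, group1):
    """Return discriminant weights w and the cutoff midway between centroids."""
    m0, m1 = mean_vector(group0), mean_vector(group1)
    s = pooled_covariance(group0, group1)
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    mid = [(m0[0] + m1[0]) / 2, (m0[1] + m1[1]) / 2]
    cutoff = w[0] * mid[0] + w[1] * mid[1]
    return w, cutoff


def predict(w, cutoff, x):
    """Class 1 if the discriminant score exceeds the cutoff, else class 0."""
    return 1 if w[0] * x[0] + w[1] * x[1] > cutoff else 0


# Hypothetical (neutrophil, lymphocyte) pairs, not patient data.
mild = [(2.0, 2.0), (3.0, 1.5), (2.5, 2.5), (3.5, 2.0)]
severe = [(6.0, 1.0), (7.0, 0.8), (6.5, 1.2), (7.5, 0.6)]
w, cutoff = fit_lda(mild, severe)
print(predict(w, cutoff, (7.0, 0.9)), predict(w, cutoff, (2.5, 2.0)))
```

With unequal group sizes, as in this study, a prior-adjusted cutoff could be used instead of the midpoint; that refinement is omitted here for brevity.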

Receiver Operating Characteristic Curve

For each model, we plotted the corresponding ROC curve. A ROC curve graphically displays sensitivity and 100% minus specificity (false positive rate) at several cutoff points. By plotting the ROC curves for two models on the same axes, we are able to determine which test is better for classification, namely, that test whose curve encloses the larger area beneath it.
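
The construction of a ROC curve and its area can be sketched in a few lines of Python. Each distinct score is used as a cutoff, giving one (false positive rate, sensitivity) point, and the area is obtained with the trapezoidal rule. The labels and scores below are invented for illustration only.

```python
def roc_points(labels, scores):
    """(false positive rate, sensitivity) pairs, one per distinct cutoff."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical outcomes (1 = severe) and model scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(labels, scores)
print(auc(points))  # 8/9, about 0.889
```

An AUC of 0.5 corresponds to a classifier that does no better than chance, and 1.0 to perfect separation, so two models can be compared by the areas their curves enclose.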

Results

Demographics Data of Patients with Mild and Severe COVID-19

The present study included 109 patients with confirmed COVID-19 pneumonia (confirmed by CT and RT-PCR) who were admitted to Baqiyatallah Hospital, Tehran, Iran, between February 20, 2020 and June 9, 2020. These patients were divided into two groups according to their disease severity (mild or severe). Table 1 shows the demographic characteristics of the 109 patients. There were no significant differences in sex, age, and body mass index (BMI) between patients with severe or mild COVID-19. Patients in the severe group had lower SpO2 and were more likely to have comorbidities (Table 1). Other characteristics such as respiratory rate and blood pressure showed no significant difference between the two groups.

Table 1: Demographics and baseline characteristics of patients with COVID-19

Laboratory Findings of Patients with Mild and Severe COVID-19

The patients with severe COVID-19 had higher white blood cell and neutrophil counts, mean corpuscular volume (MCV), and mean corpuscular hemoglobin (MCH) than those with mild COVID-19, with statistically significant differences (p<0.05) (Table 2). In contrast, lymphocyte count was significantly reduced in patients with severe COVID-19. Compared with the mild group, ESR, lactate dehydrogenase (LDH), and blood urea nitrogen were significantly increased in the severe group (Table 2). The calculated indexes NLR and SII also differed significantly between patients with mild and severe COVID-19 (Table 2).

Table 2: Laboratory findings of patients with COVID-19

Mathematical Analysis

From the laboratory parameters, five single hematologic factors were extracted for the first classification: platelet count, MCV, MCH, neutrophil count, and lymphocyte count. These were compared with two calculated indexes, NLR and SII, which formed the second classification. Both sets of variables were used in the discriminant and logistic regression analyses to determine the role of single and calculated hematologic data in the prediction of disease severity in COVID-19 patients.

Analysis by Binary Logistic Regression

BLR showed that the two sets of predictors performed similarly: the overall correct prediction rate was 65.1% for the calculated indexes and 67.0% for the hematologic single factors (Table 3). Moreover, we observed that the Goodness-of-Fit test (Hosmer & Lemeshow) was higher in the hematologic single-factor classification.
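
The overall correct prediction rate is the share of patients whose predicted group matches their observed group. The sketch below shows the arithmetic. The cell counts of the classification table are hypothetical: they were chosen only so that the row totals match the 67 severe and 42 mild patients, and they are not the values reported in Table 3.

```python
def overall_correct_rate(tp, tn, fp, fn):
    """Percentage of correctly classified cases in a 2x2 classification table."""
    total = tp + tn + fp + fn
    return 100.0 * (tp + tn) / total


# Hypothetical cells: 55 + 12 = 67 severe patients, 16 + 26 = 42 mild patients.
rate = overall_correct_rate(tp=55, tn=16, fp=26, fn=12)
print(round(rate, 1))  # 65.1
```

With 109 patients, a rate of 65.1% is consistent with 71 correct classifications and 67.0% with 73; these counts are inferred from the percentages and are not stated in the source.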

Table 3: Binary logistic regression (comparing systemic immune-inflammation index & neutrophil-to-lymphocyte ratio as a calculated index with hematology single-factor)

The role of predictors in explaining the outcome, using BLR, is reported in Tables 4 and 5.

Table 4: The role of predictors in explaining the outcome using binary logistic regression: report of systemic immune-inflammation index & neutrophil-to-lymphocyte ratio

Table 5: The role of predictors in explaining the outcome using binary logistic regression: report of hematology single-factor

The Wald statistic indicates the importance of each variable in predicting the dependent variable; a higher value indicates a more important predictor. Accordingly, SII in the first model (Table 4) and neutrophil and platelet counts in the second model (Table 5) were the more important predictors. In addition, the odds ratio (OR) reported for each predictor quantifies its association with the presence of the outcome.
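
For a single coefficient, the Wald statistic and the OR follow directly from the estimated coefficient B and its standard error: Wald = (B / SE)^2 and OR = exp(B). The sketch below illustrates this relationship; the values of B and SE are hypothetical and do not come from Tables 4 or 5, and the 1.96 multiplier assumes a 95% normal-approximation interval.

```python
import math


def wald_statistic(b, se):
    """Wald chi-square for one coefficient: (B / SE) squared."""
    return (b / se) ** 2


def odds_ratio(b):
    """Odds ratio for a one-unit increase in the predictor: exp(B)."""
    return math.exp(b)


def odds_ratio_ci(b, se, z=1.96):
    """Approximate 95% confidence interval for the odds ratio."""
    return math.exp(b - z * se), math.exp(b + z * se)


# Hypothetical coefficient and standard error.
b, se = 0.5, 0.25
print(wald_statistic(b, se))  # 4.0
print(odds_ratio(b))
print(odds_ratio_ci(b, se))
```

An OR above 1 means the odds of the outcome rise as the predictor increases, and an interval that excludes 1 corresponds to a significant Wald test at the matching level.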

Analysis by Linear Discriminant Analysis

As with the BLR technique, the LDA method showed that the two sets of predictors performed similarly: the overall correct prediction rate was 67.9% for the calculated indexes and 68.8% for the hematologic single factors (Table 6). Moreover, we observed that the Goodness-of-Fit test (Hosmer & Lemeshow) was higher in the hematologic single-factor classification.

Table 6: Linear Discriminant Analysis (comparing systemic immune-inflammation index & neutrophil-to-lymphocyte ratio as a calculated index with hematology single-factor)

The standardized discriminant function coefficients indicate the relative importance of the independent variables in predicting the dependent variable. In contrast to the BLR method, NLR was more important in LDA (Table 7). The large absolute value of the coefficient for neutrophils showed the greater discriminating ability of this factor (Table 8).

Table 7: The role of predictors in explaining the outcome using linear discriminant analysis report of: systemic immune-inflammation index & neutrophil-to-lymphocyte ratio

Table 8: The role of predictors in explaining the outcome using linear discriminant analysis report of: hematology single-factor

Analysis by Receiver Operating Characteristic Curve

As shown in Figure 1, the ROC curves of the aforementioned models clearly indicate that the logistic model is similar to the discriminant analysis model. In addition, Table 9, which presents the area under the ROC curve (AUC), shows no difference between the two techniques used in this study.

Table 9: Area under the curve

Figure 1: ROC curve comparing the potential of different variables for predicting severe COVID-19. A) Binary logistic regression; B) Linear discriminant analysis
ROC: Receiver operating characteristic, COVID-19: Coronavirus disease-2019

Discussion

The current study provided demographic, laboratory, clinical, complication, and outcome data of hospitalized patients with non-severe and severe COVID-19 in Baqiyatallah Hospital, Tehran, Iran. More than 50% of the patients in the current study were classified as severe cases, which is consistent with the study by Li et al.[18]. In accordance with other studies[19, 20], our results showed that COVID-19 patients with comorbidities such as hypertension, coronary heart disease, diabetes, and chronic obstructive lung disease are more vulnerable to severe disease. Thus, given the increased mortality rate in severe patients, early identification is vital to increase the efficiency of COVID-19 treatment and reduce mortality.

Previous studies demonstrated that an increase in neutrophil counts and a decrease in lymphocyte counts were correlated with COVID-19 severity[21, 22]. The current study indicated that white blood cell count, MCV, MCH, neutrophil count, lymphocyte count, ESR, blood urea nitrogen, and LDH were significantly different between severe and non-severe patients with COVID-19. Therefore, neutrophil and lymphocyte counts, MCV, MCH, and platelets were selected as the single hematologic predictors of severity in this study and compared with NLR and SII as calculated hematologic factors. The present study compared logistic regression and LDA to investigate the accuracy of the applied classifications and to make the choice between the methods easier. The methods do not differ in their functional forms. It seems that the advantages of LDA or BLR depend on the sample size and the normality of the variables[16, 17, 23]. Our results showed that logistic regression and discriminant analysis converged to similar results. Both methods estimated the same statistically significant coefficients. This finding is consistent with other studies that performed BLR and LDA to examine the effects of sample size. Pohar et al.[17] revealed that correct classification was achieved for a sample size of more than 50. However, they concluded that correct classification is sensitive to the assumption of normality, and that the LDA model performed better with a sufficiently large sample size. This report was confirmed by Musa et al.[16], who demonstrated that the differences between the two methods are negligible for a sample size of more than 100. ROC curve analysis and the AUC are considered further helpful measures for evaluating the quality of LDA and BLR; in the present study, the ROC curves showed only a small difference in AUC between LDA and BLR. These results are in agreement with other studies which showed that the AUC was similar for both models[23].

As mentioned above, in the LDA and BLR analyses, single and calculated hematologic factors were associated with COVID-19 severity. The results of the LDA and BLR analyses revealed a small difference in overall correct prediction between the two sets of predictors, although the rate was higher for the single hematologic factors (Tables 3, 6). Our data suggest that neutrophil count, lymphocyte count, MCV, MCH, and platelets should be considered when evaluating severity in patients with the novel Coronavirus, especially the differences in neutrophils between the severe and non-severe groups (Tables 5, 8). Although the role of the single-factor predictors in explaining the outcome was consistent between LDA and BLR, we observed a dissimilarity in the role of the calculated hematologic factors between the two methods. In contrast to the BLR method, NLR was more important in LDA (Tables 4, 7). Regarding the basic principles of the two methods, LDA was developed for normally distributed variables[17, 23]; thus, it is expected to be more appropriate when the normality assumptions are met.

The limitation of this study is the small sample size. Therefore, future studies should be performed with a larger number of participants.

Conclusion

Our prediction models could help clinicians identify patients at risk of severe COVID-19 early, and this prediction can be made using a simple blood test. LDA and BLR can be used to effectively classify patients with severe and non-severe COVID-19, even with violation of the normality assumption.

Acknowledgment

The authors would like to thank the Clinical Research Development Unit of Baqiyatallah Hospital for all their support and guidance while carrying out this study.

Ethics

Ethics Committee Approval: The study proposal was approved by the Research Ethics Committee of Baqiyatallah University of Medical Sciences, Tehran, Iran (code: IR.BMSU.RETECH.REC.1399.094, date: 19.04.2020).

Informed Consent: A consent form was completed by all participants.

Peer-review: Externally and internally peer-reviewed.

Authorship Contributions

Surgical and Medical Practices: M.M.P., T.M., Z.E.N., Z.S., Concept: M.M.P., M.M.A., T.M., Z.S., Design: F.B., Z.R., Z.S., Data Collection or Processing: M.M.A., F.B., H.E.G., Z.R., Analysis or Interpretation: M.S., M.M.A., Literature Search: M.S., F.B., H.E.G., T.M., Writing: M.M.P., M.S., H.E.G., E.N.

Conflict of Interest: No conflict of interest was declared by the authors.

Financial Disclosure: The authors declared that this study received no financial support.

References

1
World Health Organization. Naming the coronavirus disease (COVID-19) and the virus that causes it. WHO: World Health Organization; 2020. Available from: https://www.who.int/emergencies/diseases/novel-coronavirus-2019/technical-guidance/naming-the-coronavirus-disease-(covid-2019)-and-the-virus-that-causes-it
2
National Institutes of Health. Overview of COVID-19: Epidemiology, Clinical Presentation, and Transmission. 2020. In: COVID-19 Treatment Guidelines [Internet]. Available from: https://www.covid19treatmentguidelines.nih.gov
3
Ikuyama Y, Wada Y, Tateishi K, Kitaguchi Y, Yasuo M, Ushiki A, Urushihata K, Yamamoto H, Kamijo H, Mita A, Imamura H, Hanaoka M. Successful recovery from critical COVID-19 pneumonia with extracorporeal membrane oxygenation: A case report. Respir Med Case Rep. 2020;30:101113.
4
Gao Y, Li T, Han M, Li X, Wu D, Xu Y, Zhu Y, Liu Y, Wang X, Wang L. Diagnostic utility of clinical laboratory data determinations for patients with the severe COVID‐19. J Med Virol 2020;92:791-6.
5
Gong J, Ou J, Qiu X, Jie Y, Chen Y, Yuan L, Cao J, Tan M, Xu W, Zheng F, Shi Y, Hu B. A Tool to Early Predict Severe Corona Virus Disease 2019 (COVID-19): A Multicenter Study using the Risk Nomogram in Wuhan and Guangdong, China. medRxiv 2020. Available from: https://www.medrxiv.org/content/10.1101/2020.03.17.20037515v2
6
Liu Y, Du X, Chen J, Jin Y, Peng L, Wang HHX, Luo M, Chen L, Zhao Y. Neutrophil-to-lymphocyte ratio as an independent risk factor for mortality in hospitalized patients with COVID-19. J Infect. 2020;81:6-12.
7
Li H, Huang JB, Pan W, Zhang CT, Chang XY, Yang B. Systemic Immune-Inflammatory Index predicts prognosis of patients with COVID-19: a retrospective study. 2020:1-30.
8
Qin C, Zhou L, Hu Z, Zhang S, Yang S, Tao Y, Xie C, Ma K, Shang K, Wang W, Tian DS. Dysregulation of immune response in patients with coronavirus 2019 (COVID-19) in Wuhan, China. Clin Infect Dis 2020;71:762-8.
9
Tang JN, Goyal H, Yu S, Luo H. Prognostic value of systemic immune-inflammation index (SII) in cancers: a systematic review and meta-analysis. J Lab Precis Med 2018;3:29.
10
Fu H, Zheng J, Cai J, Zeng K, Yao J, Chen L, Li H, Zhang J, Zhang Y, Zhao H, Yang Y. Systemic immune-inflammation index (SII) is useful to predict survival outcomes in patients after liver transplantation for hepatocellular carcinoma within Hangzhou criteria. Cell Physiol Biochem. 2018;47:293-301.
11
Velavan TP, Meyer CG. Mild versus severe COVID-19: laboratory markers. Int J Infect Dis. 2020;95:304-7.
12
Dayarathna S, Jeewandara C, Gomes L, Somathilaka G, Jayathilaka D, Vimalachandran V, Wijewickrama A, Narangoda E, Idampitiya D, Ogg GS, Malavige GN. Similarities and differences between the ‘cytokine storms’ in acute dengue and COVID-19. Sci Rep. 2020;10:19839.
13
Lu Y, Huang Z, Wang M, Tang K, Wang S, Gao P, Xie J, Wang T, Zhao J. Clinical Characteristics and Predictors of Mortality for Young Adults with Severe COVID-19: A Retrospective Study. 2021;20:3.
14
Kumar A, Arora A, Sharma P, Anikhindi SA, Bansal N, Singla V, Khare S, Srivastava A. Clinical features of COVID-19 and factors associated with severe clinical course: a systematic review and meta-analysis. SSRN. 2020:3536166.
15
Ji M, Yuan L, Shen W, Lv J, Li Y, Chen J, Zhu C, Liu B, Liang Z, Lin Q, Xie W, Li M, Chen Z, Lu X, Ding Y, An P, Zhu S, Gao M, Ni H, Hu L, Shi G, Shi L, Dong W. A predictive model for disease progression in non-severely ill patients with coronavirus disease 2019. Eur Respir J. 2020;56:2001234.
16
Musa AB, Alsir Alkhidir Abedalraheem A, T. Ibrahim M, Hamad H, Mohamed Ahmed Shaheen S. Divergence and Similarity of the Binary Logistic Regression and Linear Discriminant Analysis Models in Evaluating Factors Associated with Bluetongue Virus in Cattle. International Journal of Statistics and Applications. 2019;9:180-5.
17
Pohar M, Blas M, Turk S. Comparison of logistic regression and linear discriminant analysis: a simulation study. Metodoloski Zvezki. 2004;1:143-61.
18
Li X, Xu S, Yu M, Wang K, Tao Y, Zhou Y, Shi J, Zhou M, Wu B, Yang Z, Zhang C, Yue J, Zhang Z, Renz H, Liu X, Xie J, Xie M, Zhao J. Risk factors for severity and mortality in adult COVID-19 inpatients in Wuhan. J Allergy Clin Immunol. 2020;146:110-8.
19
Wang B, Li R, Lu Z, Huang Y. Does comorbidity increase the risk of patients with COVID-19: evidence from meta-analysis. Aging (Albany NY). 2020;12:6049-57.
20
Zhou F, Yu T, Du R, Fan G, Liu Y, Liu Z, Xiang J, Wang Y, Song B, Gu X, Guan L, Wei Y, Li H, Wu X, Xu J, Tu S, Zhang Y, Chen H, Cao B. Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study. Lancet. 2020;395:1054-62.
21
Wu F, Zhao S, Yu B, Chen YM, Wang W, Song ZG, Hu Y, Tao ZW, Tian JH, Pei YY, Yuan ML, Zhang YL, Dai FH, Liu Y, Wang QM, Zheng JJ, Xu L, Holmes EC, Zhang YZ. A new coronavirus associated with human respiratory disease in China. Nature. 2020;579:265-9.
22
Zhou P, Yang XL, Wang XG, Hu B, Zhang L, Zhang W, Si HR, Zhu Y, Li B, Huang CL, Chen HD, Chen J, Luo Y, Guo H, Jiang RD, Liu MQ, Chen Y, Shen XR, Wang X, Zheng XS, Zhao K, Chen QJ, Deng F, Liu LL, Yan B, Zhan FX, Wang YY, Xiao GF, Shi ZL. A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature. 2020;579:270-3.
23
Shayan Z, Mohammad Gholi Mezerji N, Shayan L, Naseri P. Prediction of depression in cancer patients with different classification criteria, linear discriminant analysis versus logistic regression. Global J Health Sci 2015;8:41-6.