Table 2 Classification performance based on vectorization method.

From: Relevant Word Order Vectorization for Improved Natural Language Processing in Electronic Health Records

| Method | ER+ F1 | ER+ AUC | ER− F1 | ER− AUC | PR+ F1 | PR+ AUC | PR− F1 | PR− AUC | HER2+ F1 | HER2+ AUC | HER2− F1 | HER2− AUC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| RWOV-NN | 0.95 | 0.95 | 0.82 | 0.96 | 0.93 | 0.96 | 0.86 | 0.95 | 0.68 | 0.89 | 0.91 | 0.86 |
| RWOV-SVM | 0.90 | 0.88 | 0.70 | 0.88 | 0.76 | 0.72 | 0.58 | 0.71 | 0.53 | 0.77 | 0.84 | 0.77 |
| SVM(1,2) | 0.92 | 0.94 | 0.69 | 0.94 | 0.91 | 0.95 | 0.81 | 0.95 | 0.31 | 0.67 | 0.87 | 0.68 |
| SVM(2,2) | 0.93 | 0.94 | 0.72 | 0.94 | 0.91 | 0.96 | 0.82 | 0.96 | 0.37 | 0.73 | 0.90 | 0.74 |
| SVM(1,3) | 0.94 | 0.94 | 0.73 | 0.94 | 0.90 | 0.95 | 0.80 | 0.95 | 0.32 | 0.69 | 0.89 | 0.70 |
| SVM(2,3) | 0.94 | 0.94 | 0.75 | 0.94 | 0.91 | 0.95 | 0.82 | 0.95 | 0.40 | 0.72 | 0.91 | 0.74 |
| SVM(3,3) | 0.93 | 0.92 | 0.71 | 0.92 | 0.90 | 0.92 | 0.79 | 0.92 | 0.35 | 0.71 | 0.91 | 0.72 |
| SVM-W2V | 0.71 | 0.64 | 0.40 | 0.64 | 0.67 | 0.69 | 0.57 | 0.69 | 0.20 | 0.45 | 0.60 | 0.43 |
| NN(1,2) | 0.93 | 0.95 | 0.71 | 0.94 | 0.90 | 0.95 | 0.82 | 0.95 | 0.35 | 0.81 | 0.91 | 0.79 |
| NN(2,2) | 0.93 | 0.93 | 0.72 | 0.94 | 0.92 | 0.96 | 0.81 | 0.95 | 0.39 | 0.80 | 0.91 | 0.80 |
| NN(1,3) | 0.94 | 0.94 | 0.72 | 0.95 | 0.91 | 0.95 | 0.81 | 0.95 | 0.30 | 0.79 | 0.91 | 0.79 |
| NN(2,3) | 0.93 | 0.92 | 0.70 | 0.93 | 0.92 | 0.94 | 0.80 | 0.95 | 0.33 | 0.78 | 0.90 | 0.78 |
| NN(3,3) | 0.93 | 0.92 | 0.69 | 0.91 | 0.89 | 0.92 | 0.79 | 0.92 | 0.20 | 0.76 | 0.90 | 0.76 |
| NN-W2V | 0.85 | 0.76 | 0.40 | 0.75 | 0.81 | 0.75 | 0.63 | 0.78 | 0.07 | 0.43 | 0.87 | 0.41 |

  1. RWOV had the most consistent performance across classification tasks. The best-performing method for each metric on each task is shown in bold in the table. In only one case (PR− AUC) did RWOV-NN not have the best performance, and even there it was very close to the top performer.
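
The footnote's claim can be checked directly against the values in Table 2. The following is a minimal Python sketch and is not part of the original article: the names `COLUMNS`, `SCORES` and `columns_not_led_by` are hypothetical, ties are counted as "best", and the task labels use an ASCII hyphen in place of the minus sign.

```python
# Scores transcribed from Table 2. Column order: F1 then AUC for each task.
# Task labels use an ASCII "-" instead of the typographic minus (assumption
# made for convenience only).
COLUMNS = [
    f"{task} {metric}"
    for task in ("ER+", "ER-", "PR+", "PR-", "HER2+", "HER2-")
    for metric in ("F1", "AUC")
]

SCORES = {
    "RWOV-NN":  [0.95, 0.95, 0.82, 0.96, 0.93, 0.96, 0.86, 0.95, 0.68, 0.89, 0.91, 0.86],
    "RWOV-SVM": [0.90, 0.88, 0.70, 0.88, 0.76, 0.72, 0.58, 0.71, 0.53, 0.77, 0.84, 0.77],
    "SVM(1,2)": [0.92, 0.94, 0.69, 0.94, 0.91, 0.95, 0.81, 0.95, 0.31, 0.67, 0.87, 0.68],
    "SVM(2,2)": [0.93, 0.94, 0.72, 0.94, 0.91, 0.96, 0.82, 0.96, 0.37, 0.73, 0.90, 0.74],
    "SVM(1,3)": [0.94, 0.94, 0.73, 0.94, 0.90, 0.95, 0.80, 0.95, 0.32, 0.69, 0.89, 0.70],
    "SVM(2,3)": [0.94, 0.94, 0.75, 0.94, 0.91, 0.95, 0.82, 0.95, 0.40, 0.72, 0.91, 0.74],
    "SVM(3,3)": [0.93, 0.92, 0.71, 0.92, 0.90, 0.92, 0.79, 0.92, 0.35, 0.71, 0.91, 0.72],
    "SVM-W2V":  [0.71, 0.64, 0.40, 0.64, 0.67, 0.69, 0.57, 0.69, 0.20, 0.45, 0.60, 0.43],
    "NN(1,2)":  [0.93, 0.95, 0.71, 0.94, 0.90, 0.95, 0.82, 0.95, 0.35, 0.81, 0.91, 0.79],
    "NN(2,2)":  [0.93, 0.93, 0.72, 0.94, 0.92, 0.96, 0.81, 0.95, 0.39, 0.80, 0.91, 0.80],
    "NN(1,3)":  [0.94, 0.94, 0.72, 0.95, 0.91, 0.95, 0.81, 0.95, 0.30, 0.79, 0.91, 0.79],
    "NN(2,3)":  [0.93, 0.92, 0.70, 0.93, 0.92, 0.94, 0.80, 0.95, 0.33, 0.78, 0.90, 0.78],
    "NN(3,3)":  [0.93, 0.92, 0.69, 0.91, 0.89, 0.92, 0.79, 0.92, 0.20, 0.76, 0.90, 0.76],
    "NN-W2V":   [0.85, 0.76, 0.40, 0.75, 0.81, 0.75, 0.63, 0.78, 0.07, 0.43, 0.87, 0.41],
}


def columns_not_led_by(method, scores=SCORES, columns=COLUMNS):
    """List (column, best score, method's score) wherever another method is strictly higher."""
    trailing = []
    for i, column in enumerate(columns):
        best = max(row[i] for row in scores.values())
        if scores[method][i] < best:
            trailing.append((column, best, scores[method][i]))
    return trailing


print(columns_not_led_by("RWOV-NN"))
```

Under these assumptions the only column returned for RWOV-NN is PR- AUC, where SVM(2,2) reaches 0.96 against 0.95, which agrees with the footnote.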