Figure 1

Schematic representation of the machine learning models used in this paper for case/control classification. Colored circles represent the input variables \(x_i\), and the yellow square represents the output prediction \(Y\). The small wave symbol represents a sigmoid function, which transforms a quantitative value into a probability of disease association. All formulas are approximate and are meant only to convey the general idea of each model. (A) Logistic Regression: the prediction is obtained by applying a sigmoid function to a weighted sum of the inputs. (B) Dense Neural Networks: these can be seen as multiple stacked logistic regressions. Here we show a simplified network with one hidden layer of 3 neurons and an output layer of 1 neuron. Each neuron outputs the sigmoid of the weighted sum of its inputs. (C) Gradient Boosting on Decision Trees: the prediction is given by the sigmoid of the sum of the output leaves of hundreds of decision trees (\(\eta \) being the learning rate).
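
The three prediction rules described in this caption can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation used in the paper: the function names, the bias terms, the one-split "stump" trees, and the choice to apply \(\eta \) as a multiplier on the summed leaf values are all hypothetical, in the same spirit as the approximate formulas in the figure.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def logistic_regression(x, w, b=0.0):
    """(A) Sigmoid of a weighted sum of the inputs (bias b is an assumption)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def dense_network(x, hidden_w, hidden_b, out_w, out_b=0.0):
    """(B) One hidden layer; each neuron is itself a small logistic regression."""
    hidden = [logistic_regression(x, w, b) for w, b in zip(hidden_w, hidden_b)]
    return logistic_regression(hidden, out_w, out_b)


def stump(feature, threshold, left, right):
    """Hypothetical one-split tree: returns one of two leaf values."""
    return lambda x: left if x[feature] <= threshold else right


def boosted_trees(x, trees, eta):
    """(C) Sigmoid of the learning-rate-scaled sum of the trees' leaf outputs."""
    return sigmoid(eta * sum(tree(x) for tree in trees))


if __name__ == "__main__":
    x = [1.0, 1.0]
    print(logistic_regression(x, w=[0.5, -0.5]))
    print(dense_network(x,
                        hidden_w=[[0.1, 0.2], [0.3, -0.1], [-0.2, 0.4]],
                        hidden_b=[0.0, 0.0, 0.0],
                        out_w=[0.5, -0.5, 0.25]))
    trees = [stump(0, 0.5, -1.0, 1.0), stump(1, 0.0, 2.0, -2.0)]
    print(boosted_trees(x, trees, eta=0.1))
```

In this sketch the dense network reuses `logistic_regression` for every neuron, which mirrors the "stacked logistic regressions" reading of panel (B), while the boosted model differs only in what is summed before the sigmoid: leaf values selected by each tree rather than weighted inputs.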