Fig. 2: Precision and recall as a function of the number of model predictions validated.

From: A framework to evaluate machine learning crystal stability predictions

A typical discovery campaign will rank hypothetical materials by model-predicted hull distance from most to least stable and validate the most stable predictions first. A higher fraction of correct stable predictions corresponds to higher precision, and fewer overlooked stable materials corresponds to higher recall. Precision at any point is computed only from the materials validated so far, whereas cumulative recall requires knowing the total number of truly stable materials upfront. Models such as eqV2 S DeNS and Orb MPtrj perform better in exhaustive discovery campaigns that screen a large share of the candidate pool; others, such as CHGNet, do better when only a small percentage of the materials predicted to be most stable is validated. UIPs offer notably improved precision on shorter campaigns of ~20,000 or fewer validated materials, as they are less prone to false-positive predictions among the most stable candidates.
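To make the cumulative calculation concrete, here is a minimal Python sketch (the function name `cumulative_precision_recall` and the synthetic data are illustrative, not taken from the paper's code). It sorts candidates by predicted hull distance and tracks precision and recall after each validation step: precision at step k uses only the k materials validated so far, whereas recall divides by the total count of truly stable materials, matching the distinction drawn in the caption.

```python
import numpy as np

def cumulative_precision_recall(pred_hull_dist, is_stable):
    """Cumulative precision and recall when validating candidates in
    rank order of model-predicted hull distance (most stable first).

    Hypothetical helper for illustration; not the paper's implementation.
    """
    order = np.argsort(pred_hull_dist)              # lowest predicted hull distance first
    hits = np.cumsum(np.asarray(is_stable)[order])  # true positives among the first k picks
    k = np.arange(1, len(order) + 1)
    precision = hits / k                            # share of validated picks that are stable
    recall = hits / np.count_nonzero(is_stable)     # share of all stable materials recovered
    return precision, recall

# Toy usage with synthetic data (illustrative only)
rng = np.random.default_rng(42)
pred_dist = rng.normal(0.05, 0.1, size=10_000)             # fake predicted hull distances (eV/atom)
true_stable = pred_dist + rng.normal(0, 0.05, 10_000) < 0  # fake noisy ground-truth labels
prec, rec = cumulative_precision_recall(pred_dist, true_stable)
```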