Fig. 2: Dataset information and performance of FastEI.

a Molecule classes predicted by ClassyFire for the test set. b UMAP visualization of the ECFPs of 240,000 molecules randomly selected from f-CHEMBL and 232,826 molecules from the training set. c Spectrum matching time of FastEI and WCS on libraries of different sizes. d Contribution of Word2vec embeddings and HNSW to FastEI (WCS: weighted binning + cosine similarity; EC: embeddings + cosine similarity; BH: weighted binning + HNSW; FastEI: embeddings + HNSW). e Performance of FastEI and WCS on the test set in terms of recall rate at different top-x levels. Source data are provided as a Source Data file.