Fig. 1: Flowchart of FastEI and WCS methods.

First, a large-scale in-silico library is generated from the NIST and ChEMBL datasets. To query a spectrum, FastEI (top branch) uses Word2vec to transform it into a spectral embedding. The spectral embeddings are indexed with a Hierarchical Navigable Small World (HNSW) graph, which is used to retrieve similar candidates from the library. In contrast, classical binning methods (bottom branch) divide each spectrum into m/z bins and compare the resulting vector with those of the library spectra, most commonly using the weighted cosine similarity (WCS) as the similarity measure.
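The binning branch can be illustrated with a minimal Python sketch. It assumes unit-width m/z bins, the bin-centre m/z as the mass weight, and the weighting w_i = m_i^1 * I_i^0.5; these exponents, the helper names (bin_spectrum, weight, wcs, search) and the toy spectra are hypothetical and are not FastEI's settings. The library search is an exhaustive scan whose cost grows linearly with library size, whereas an HNSW index supports approximate nearest-neighbour search over embeddings.

```python
import math
from typing import Dict, List, Sequence, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)

# Illustrative weighting exponents (assumptions, not FastEI's settings).
INTENSITY_POWER = 0.5
MZ_POWER = 1.0


def bin_spectrum(peaks: Sequence[Peak], bin_width: float = 1.0,
                 max_mz: float = 1000.0) -> List[float]:
    """Sum peak intensities into fixed-width m/z bins."""
    n_bins = int(math.ceil(max_mz / bin_width))
    vec = [0.0] * n_bins
    for mz, intensity in peaks:
        idx = int(mz // bin_width)
        if 0 <= idx < n_bins:  # peaks outside [0, max_mz) are dropped
            vec[idx] += intensity
    return vec


def weight(vec: Sequence[float], bin_width: float = 1.0) -> List[float]:
    """Apply w_i = m_i**MZ_POWER * I_i**INTENSITY_POWER, m_i = bin centre."""
    return [((i + 0.5) * bin_width) ** MZ_POWER * inten ** INTENSITY_POWER
            for i, inten in enumerate(vec)]


def wcs(query: Sequence[Peak], ref: Sequence[Peak],
        bin_width: float = 1.0) -> float:
    """Weighted cosine similarity between two binned, weighted spectra."""
    q = weight(bin_spectrum(query, bin_width), bin_width)
    r = weight(bin_spectrum(ref, bin_width), bin_width)
    dot = sum(a * b for a, b in zip(q, r))
    norm = math.sqrt(sum(a * a for a in q)) * math.sqrt(sum(b * b for b in r))
    return dot / norm if norm > 0 else 0.0


def search(query: Sequence[Peak], library: Dict[str, Sequence[Peak]],
           top_k: int = 5) -> List[Tuple[str, float]]:
    """Exhaustive library search: score every entry, return the top_k."""
    scores = [(name, wcs(query, spec)) for name, spec in library.items()]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top_k]


# Hypothetical toy library and query spectrum.
library = {
    "A": [(41.0, 30.0), (43.0, 100.0), (58.0, 20.0)],
    "B": [(77.0, 50.0), (105.0, 100.0)],
}
query = [(41.0, 28.0), (43.0, 100.0), (58.0, 25.0)]
print(search(query, library, top_k=2))  # "A" ranks first; "B" shares no bins and scores 0.0
```

Rescoring every library entry per query is what makes this approach slow on large in-silico libraries; the weighting exponents are the main tuning knobs and would need to be matched to whatever WCS variant is actually being compared.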