Fig. 1: Flowchart of FastEI and WCS methods. | Nature Communications

From: Ultra-fast and accurate electron ionization mass spectrum matching for compound identification with million-scale in-silico library

First, a large-scale in-silico library is generated from the NIST and ChEMBL datasets. To query a spectrum, FastEI (top branch) uses Word2vec to transform it into a spectral embedding. These embeddings are indexed with a Hierarchical Navigable Small World (HNSW) graph, which is used to retrieve similar candidates from the library. In comparison, classical binning methods (bottom branch) divide the spectra into bins and compare them with the library spectra, most commonly using the weighted cosine similarity (WCS) as the similarity measure.
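The bottom branch can be illustrated with a short sketch that bins spectra and ranks library entries by weighted cosine similarity using a brute-force scan. This is a minimal sketch under stated assumptions, not the implementation behind Fig. 1. The 1-Da bin width is an assumption. The weighting exponents (intensity^0.6 and m/z^3, a choice often used for EI spectra) are also assumptions. The helper names `bin_spectrum`, `weighted_cosine_similarity` and `search_library` are hypothetical, and the toy spectra are invented for illustration.

```python
import math

# Assumed weighting exponents (hypothetical choice for this sketch):
# each binned peak is weighted as intensity ** 0.6 * (m/z) ** 3.
INTENSITY_POWER = 0.6
MZ_POWER = 3.0


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins (nearest bin index)."""
    bins = {}
    for mz, intensity in peaks:
        idx = int(round(mz / bin_width))
        bins[idx] = bins.get(idx, 0.0) + intensity
    return bins


def weighted_cosine_similarity(query, reference, bin_width=1.0):
    """Cosine similarity of two binned spectra after m/z and intensity weighting."""
    q = bin_spectrum(query, bin_width)
    r = bin_spectrum(reference, bin_width)

    def weight(idx, intensity):
        mz = idx * bin_width  # bin position used as the m/z value
        return (intensity ** INTENSITY_POWER) * (mz ** MZ_POWER)

    wq = {i: weight(i, v) for i, v in q.items()}
    wr = {i: weight(i, v) for i, v in r.items()}

    # Only bins present in both spectra contribute to the dot product.
    dot = sum(wq[i] * wr[i] for i in wq.keys() & wr.keys())
    norm_q = math.sqrt(sum(v * v for v in wq.values()))
    norm_r = math.sqrt(sum(v * v for v in wr.values()))
    if norm_q == 0 or norm_r == 0:
        return 0.0
    return dot / (norm_q * norm_r)


def search_library(query, library, top_k=3):
    """Brute-force scan: score every library spectrum and return the top_k hits."""
    scores = [(name, weighted_cosine_similarity(query, spec))
              for name, spec in library.items()]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top_k]


# Hypothetical toy spectra as (m/z, relative intensity) pairs; not real NIST data.
library = {
    "compound_A": [(43.0, 100.0), (58.0, 40.0), (71.0, 15.0)],
    "compound_B": [(43.0, 100.0), (57.0, 80.0), (85.0, 20.0)],
    "compound_C": [(91.0, 100.0), (65.0, 12.0)],
}
query = [(43.0, 95.0), (58.0, 42.0), (71.0, 14.0)]

for name, score in search_library(query, library):
    print(f"{name}: {score:.4f}")
```

The scan compares the query against every library spectrum, so its cost grows linearly with library size. For a million-scale library, this is the step that the embedding-plus-HNSW branch replaces with an approximate nearest-neighbour lookup.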