Fig. 5: Word2vec and HNSW.

From: Ultra-fast and accurate electron ionization mass spectrum matching for compound identification with million-scale in-silico library

a Transforming the spectra into low-dimensional spectral embeddings using the Word2vec model. The Word2vec model is trained on the fragments in the mass spectra. Each fragment is represented by a word that encodes its m/z position rounded to a defined integer precision (Peak@xx). For example, a fragment at m/z 89 translates into the word “Peak@89”. b Approximate nearest-neighbor search based on the hierarchical navigable small-world (HNSW) graph. The top layer is the entry point and contains only the longest links. Moving down the layers, the links become shorter and more numerous. The search proceeds as follows: traverse the edges of the current layer, greedily moving to the vertex nearest to the query until a local minimum is reached; then descend to the next layer and repeat, until the nearest neighbors of the query vector are found in the bottom layer (layer 0).
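
The two steps described in the caption (encoding fragments as peak words for Word2vec in panel a, and querying an HNSW index over the resulting embeddings in panel b) can be sketched in code. The sketch below is illustrative only: the choice of gensim and hnswlib, the intensity-weighted pooling of peak-word vectors, and all parameter values (vector_size, M, ef_construction, ef) are assumptions and are not specified by this caption.

# Minimal, illustrative sketch of panels (a) and (b).
# Library choices (gensim for Word2vec, hnswlib for HNSW) and all
# parameter values are assumptions; the paper's implementation may differ.
import numpy as np
from gensim.models import Word2Vec
import hnswlib


def spectrum_to_words(peaks):
    """Encode each (m/z, intensity) peak as a 'Peak@<integer m/z>' word."""
    return [f"Peak@{int(round(mz))}" for mz, _ in peaks]


# Toy library of spectra: lists of (m/z, intensity) pairs.
library = [
    [(89.0, 999.0), (63.1, 310.0), (39.0, 150.0)],
    [(91.0, 999.0), (65.0, 420.0), (39.1, 200.0)],
]
corpus = [spectrum_to_words(s) for s in library]

# (a) Train Word2vec on the peak words (skip-gram; sizes are placeholders).
w2v = Word2Vec(corpus, vector_size=32, window=10, min_count=1, sg=1, epochs=50)


def embed_spectrum(peaks, model):
    """Pool peak-word vectors into one spectral embedding.
    Intensity-weighted averaging is an assumed pooling scheme."""
    weights = np.array([inten for _, inten in peaks], dtype=np.float32)
    vecs = np.stack([model.wv[w] for w in spectrum_to_words(peaks)])
    return (vecs * weights[:, None]).sum(axis=0) / weights.sum()


embeddings = np.stack([embed_spectrum(s, w2v) for s in library])

# (b) Build an HNSW index over the embeddings and run the greedy,
# layer-by-layer nearest-neighbor search for a query spectrum.
dim = embeddings.shape[1]
index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=len(library), ef_construction=200, M=16)
index.add_items(embeddings, np.arange(len(library)))
index.set_ef(50)  # search-time breadth/accuracy trade-off

query = embed_spectrum([(89.2, 999.0), (63.0, 280.0)], w2v)
labels, distances = index.knn_query(query, k=2)
print(labels, distances)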
