Fig. 5: Word2vec and HNSW.

a Transforming mass spectra into low-dimensional spectral embeddings with the Word2vec model. The model is trained on the fragments in the mass spectra, where each fragment is represented by a word encoding its m/z position at a defined integer precision (Peak@xx). For example, a fragment at m/z 89 translates into the word “Peak@89”. b Approximate nearest-neighbor search based on the hierarchical navigable small-world (HNSW) graph. The top layer is the entry point and contains only the longest links; moving down the layers, the links become shorter and more numerous. The search proceeds as follows: within each layer, edges are traversed by greedily moving to the nearest vertex until a local minimum is reached; the search then descends to the next layer and repeats this process until the nearest neighbors of the query vector are found in the bottom layer (layer 0).
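The tokenization in panel a can be sketched as follows. The Peak@xx naming and integer precision follow the caption; the function name, the rounding convention, and the example m/z values are illustrative assumptions, not the authors' exact preprocessing code.

```python
def peaks_to_words(mz_values):
    """Convert fragment m/z values into 'Peak@xx' tokens.

    Each fragment becomes a word encoding its m/z position rounded to
    integer precision, as in the caption (m/z 89 -> 'Peak@89').
    Rounding to the nearest integer is an assumption here.
    """
    return [f"Peak@{int(round(mz))}" for mz in mz_values]

# One spectrum becomes one "sentence" of such words, on which a
# Word2vec model (e.g. via gensim) could then be trained.
spectrum = [89.04, 130.07, 175.12]
print(peaks_to_words(spectrum))  # ['Peak@89', 'Peak@130', 'Peak@175']
```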
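The layered greedy descent in panel b can be sketched as below. This is a toy illustration of the search step only: the hand-built layers, vectors, and entry point are assumptions, and a real HNSW index (e.g. hnswlib) assigns vertices to layers probabilistically and handles insertion as well.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def greedy_search(layers, vectors, query, entry):
    """Greedy HNSW-style descent.

    layers: adjacency dicts ordered top layer first, layer 0 last.
    Within each layer, move to the closest neighbor until a local
    minimum is reached, then drop to the next (denser) layer.
    """
    current = entry
    for graph in layers:
        improved = True
        while improved:
            improved = False
            for nb in graph.get(current, []):
                if sq_dist(vectors[nb], query) < sq_dist(vectors[current], query):
                    current, improved = nb, True
    return current  # local minimum in layer 0 = approximate nearest neighbor

# Toy 1-D-style example: a sparse top layer with one long link,
# and a dense bottom layer (layer 0) with short links.
vectors = {0: (0, 0), 1: (5, 0), 2: (9, 0), 3: (10, 0)}
top = {0: [2], 2: [0]}
layer0 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(greedy_search([top, layer0], vectors, (10, 0), entry=0))  # 3
```

The top layer's long link jumps the search from vertex 0 to 2 in one hop; layer 0 then refines the result to vertex 3, the true nearest neighbor of the query.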