Introduction

Lasso peptides (LaPs) are a class of ribosomally synthesized and post-translationally modified peptides (RiPPs) defined by their threaded structure, formally a [1]rotaxane1. This structure consists of a macrolactam ring formed by an isopeptide bond between the N-terminal α-amino group and the side-chain carboxyl group of an aspartate or glutamate residue2. The ring encircles the C-terminal tail of the peptide, forming a distinct right-handed lariat knot-like structure3, with the residues immediately above and below the plane of the ring referred to as the “upper plug” and “lower plug”, respectively (Fig. S1). The term “plug” does not imply a mechanical role in holding the ring in place. These plug residues can come from a wide variety of amino acids (17 out of 20 for the upper plug and 14 out of 20 for the lower plug, Fig. S2). The “plug positions” indicate the two residues between which the ring is located in the PDB structure, rather than the two residues that mechanically lock the peptide; the sterically bulky residues responsible for preventing the ring from unthreading can be located a few amino acids away from the plugs. This mechanically interlocked architecture imparts LaPs with high thermal stability and resistance to protease degradation. LaPs exhibit biological relevance as antibiotics4,5,6,7, enzyme inhibitors8,9, and receptor antagonists10,11. Lasso peptides also demonstrate chemical functions as dynamic covalent materials12 and thermally actuated switches13,14.

Despite sharing the lasso structure, LaPs demonstrate considerable structural variability due to different ring, loop, and tail sizes as well as highly variable sequences. Our knowledge of the full structural diversity of LaPs is highly limited, hindering the prioritized discovery of functional LaPs for applications. As of March 15, 2024, only 47 unique LaP structures had been experimentally determined and deposited in the Protein Data Bank (PDB) (Table S1). In contrast, over 4000 putative unique LaP sequences have been identified through genome-mining algorithms such as RODEO15,16,17 (Supplementary Data 1), presenting a huge knowledge gap for three-dimensional (3D) structural prediction and characterization.

Given that LaPs are relatively short and possess irregular scaffolds, AlphaFold218 and ESMFold19 fail to model the lariat knot-like topology of LaPs (Fig. S3). AlphaFold3 (AF3)20, however, shows capability in predicting lasso peptide structures, as evidenced by its ability to reproduce the 3D structures of 79% of LaPs in the PDB (e.g., stlassin, Fig. 1A, B and Table S1). This is likely due to AF3’s improved framework, which reduces hallucinations and enhances the accuracy of coordinate generation20. Despite this improvement, AF3 exhibits poor generalizability. Tested with a manually curated dataset of 12 LaPs with known structural annotations reported in the literature but undeposited in the PDB21,22,23,24,25,26,27,28 (except for capistruin, Table S2), AF3 failed for 10 out of 12 LaPs (83%, Fig. 1C, D, and Table S2). Of the remaining two, AF3 correctly predicted the lariat-knot topology of capistruin and mycetohabin-15, although the plug annotation for the latter is incorrect (Table S2, Fig. 1C, D). For a more challenging dataset, we curated 40 randomly selected RODEO-mined LaP sequences with unknown annotations and structures. In this test, AF3 predicted only three sequences to adopt a lariat-knot topology (success rate: 8%, Fig. 1E, F, and Table S3), and their annotation accuracy remains experimentally undetermined. In most cases, AF3 incorrectly predicts LaPs to adopt linear (LaPTest1), cyclic (LaPTest2), branched cyclic (LaPTest3), and helical (LaPTest4–LaPTest5) structures (Fig. 1E and Table S4). The low success rate of AF3 in predicting lasso-fold proteins not present in the PDB highlights its difficulty with extrapolation, similar to challenges observed when predicting fold-switched protein conformations29, posing a major roadblock to the community30,31,32.

Fig. 1: AF3 performance in predicting LaP structures.
figure 1

A AF3 predictions on known LaP structures, with experimentally determined structures displayed in each group; RMSDs were calculated on Cα atoms. The structure ribbon is colored from blue at the N-terminus to red at the C-terminus. Isopeptide isoC—isoN distances were measured between the N atom of the N-terminal residue and the C atom of the side-chain carboxyl group of the isopeptide-donating residue (Asp or Glu at the 7th/8th/9th position). The dashed frames indicate the classification color pattern used in panel (B). B Pie chart showing the performance of AF3 on PDB-deposited LaPs, with predicted structures classified into three categories. A predicted structure is classified as having a “lasso fold” if wrapping is observed with a C–N distance of less than 4.0 Å. C AF3 predictions for 12 LaPs with known plug positions but not deposited in the PDB. D Pie chart showing the performance of AF3 on LaPs with known plug positions but without deposited PDB structures, with predicted structures classified into three categories using the same color pattern as in panel (A). E AF3 predictions for LaPs with unknown structures. The illustrated LaPs were randomly selected from the RODEO dataset, employing stratified sampling based on total sequence length. A more comprehensive test involving 40 randomly selected lasso peptide sequences is shown in Table S3. Lasso scores refer to the likelihood that a peptide sequence is classified as a lasso peptide, as predicted by RODEO. The sequences for the representative LaPs are available in Table S4. F Pie chart showing the structure prediction results corresponding to Table S3, with predicted structures classified into two categories (lasso shape and non-lasso shape), as the true plug positions are unknown.

Here we developed LassoPred for lasso peptide structure prediction. LassoPred adopts an annotator-constructor architecture, in which the annotator predicts up to three sets of sequence annotations and the constructor converts each predicted annotation into a 3D lariat-like structure. LassoPred’s generalizability was evaluated using a “blind test” consisting of 12 LaPs (Fig. 1C, D). These sequences have less than 60% sequence identity or similarity to any of the LaP sequences used in the training and test sets. Using LassoPred, we built 3D structures for 4749 distinct LaP sequences identified through RODEO genome-mining analyses and curated on March 15, 2024 (Supplementary Data 1). To allow public access to the database and the prediction tool, we set up a web interface (https://lassopred.accre.vanderbilt.edu/). Besides advancing knowledge of lasso peptide structural diversity, LassoPred will facilitate NMR solution structure determination and the discovery of functional lasso peptides for therapeutic and industrial applications, such as treatments for infectious diseases and enzyme inhibitors.

Results

LassoPred is designed to translate lasso peptide sequences into 3D structures. It comprises two modules: an annotator that predicts up to three distinct sets of sequence region annotations (i.e., lengths of the ring, loop, and tail) for an input sequence, and a constructor that builds 3D LaP structures based on the predicted sequence annotations (Fig. 2). Given an input sequence, the annotator first decomposes the sequence into overlapping dipeptide fragments and then leverages two support vector classifiers, an isopeptide classifier and a plug classifier, to identify the dipeptide fragments that contain the isopeptide-donating residue and the plug residues, respectively, eventually generating up to three sets of possible sequence annotations that share the same ring length but differ in loop/tail length. Each set of sequence annotations is then converted into a 3D LaP structure by the constructor, which consists of upgraded modules from LassoHTP33 that build the lariat-like LaP scaffold, generate LaP mutants, and optimize LaP structures with molecular mechanics (see Methods).

Fig. 2: The design architecture of LassoPred.
figure 2

Taking a lasso peptide sequence as input, LassoPred first uses its annotator to predict up to three sets of sequence annotations (ring, loop, tail length), and then employs its constructor to transform each set of annotations into a 3D structure. The annotator consists of an isopeptide classifier and a plug classifier, each trained as a machine learning classifier using 47 lasso peptide PDB structures. The constructor includes a scaffold construction module to build an all-glycine lasso peptide scaffold template matching the annotated ring, loop, and tail lengths, a mutant generation module to create a 3D lasso peptide structure matching the input sequence, and an optimization module to refine the generated structures using a molecular mechanics force field.

Development of LassoPred’s annotator

The primary challenge in building the LassoPred annotator was the small dataset, as only 47 lasso peptide structures are available in the PDB (Table S5). This small dataset poses a significant problem for the generalizability of the model due to potential bias in test set performance and the risk of overfitting. To systematically address these issues, we adopted three strategies. First, we employed data augmentation through peptide fragmentation, enriching the dataset by breaking down each LaP sequence into smaller peptide fragments for model training. The isopeptide and plug classifiers were trained separately to assign each fragment to its correct sequence region and subsequently reconstruct the isopeptide and plug positions from the predicted fragments. Second, we used transfer learning by incorporating pre-trained embeddings from protein language and structure prediction models (ESM2 and AF218,19), which introduce evolutionary information into the model features. Third, to minimize performance bias and overfitting, we used repeated holdout validation to evaluate all performance metrics34. This involves conducting 100 training/test set splits at a 4:1 ratio with stratified sampling to ensure that the test set maintains a distribution of ring and loop lengths consistent with the overall dataset (Supplementary Data 2). Machine learning (ML) algorithms and their hyperparameters were selected based on the average model performance across these splits. These methods are designed to enhance the robustness of the model and reduce the impact of the small dataset on generalizability. Below, we discuss the implementation and benchmarking of these strategies to develop the isopeptide and plug classifiers underlying the LassoPred annotator.
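A minimal sketch of this repeated holdout protocol is given below. It is illustrative only: the file name, column names, and the use of ring length as the stratification variable are our assumptions (the actual protocol also balances loop-length distributions), not the released LassoPred code.

```python
# Repeated holdout: 100 stratified 4:1 train/test splits of the 47 PDB-curated LaPs.
import pandas as pd
from sklearn.model_selection import train_test_split

laps = pd.read_csv("lap_annotations.csv")     # hypothetical table: one row per LaP with ring/loop lengths

splits = []
for seed in range(100):                       # 100 repeated holdout splits
    train, test = train_test_split(
        laps,
        test_size=0.2,                        # 4:1 training-to-test ratio
        stratify=laps["ring_length"],         # keep ring-length distribution consistent
        random_state=seed,
    )
    splits.append((train, test))              # performance metrics are averaged over all splits
```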

To develop the isopeptide classifier, we first converted each sequence of N amino acids into N overlapping dipeptide fragments, resulting in a total of 905 fragments. We then labeled each dipeptide fragment as ‘0’, ‘1’, or ‘2’, where ‘1’ indicates residues within the ring, ‘0’ marks the ring-loop boundary (i.e., the fragment contains the isopeptide-donating residue), and ‘2’ corresponds to residues in the loop or tail. Since the isopeptide-donating residue must be a Glu or Asp located at the 7th, 8th, or 9th position (prior knowledge derived from all known lasso peptide structures), we used only the peptide fragments containing residues between the 6th and 10th positions of each LaP sequence for training the isopeptide classifier (Fig. 3A). Using dipeptide fragments and the ESM2 L33 embedding, the isopeptide classifier achieves strong predictive performance in classifying the peptide fragments into one of the three categories, featuring a ROC AUC of 0.97 ± 0.03, a fragment classification accuracy of 0.91 ± 0.06, and a fragment classification F1 score of 0.90 ± 0.08 across 100 splits. The reconstruction process determines the isopeptide position by identifying the dipeptide fragment with the highest likelihood of being labeled as ‘0’ (the ring-loop boundary). As a result, the isopeptide classifier achieves nearly perfect accuracy in identifying the isopeptide-donating residue (accuracy: 1.00 ± 0.02), with only 1 out of 100 splits failing to reach 100% accuracy. The high accuracy of the isopeptide classifier is expected because the isopeptide position can be identified merely by finding a Glu or Asp at the 7th, 8th, or 9th position in 42 out of the 47 LaP sequences curated from the PDB, giving a baseline accuracy of 0.89.
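To make the fragmentation and labeling scheme concrete, a minimal sketch is shown below. It is illustrative Python, not the released LassoPred code; the function names and the treatment of the two donor-containing fragments are our assumptions.

```python
# Overlapping-dipeptide fragmentation and 0/1/2 labeling for the isopeptide classifier.
def fragment(seq):
    """Split an N-residue sequence into N overlapping dipeptides.
    A dummy residue 'B' is appended so the C-terminal residue also yields a fragment."""
    padded = seq + "B"
    return [padded[i:i + 2] for i in range(len(seq))]

def label_isopeptide_fragments(seq, iso_pos):
    """Label each dipeptide: 1 = both residues inside the ring, 0 = fragment contains the
    isopeptide-donating Asp/Glu at 1-indexed position iso_pos, 2 = loop or tail."""
    labels = []
    for i in range(1, len(seq) + 1):       # fragment i covers residues i and i + 1
        if i + 1 < iso_pos:
            labels.append(1)
        elif i <= iso_pos <= i + 1:
            labels.append(0)
        else:
            labels.append(2)
    return labels

# Example: microcin J25 (21 aa) with the isopeptide donor Glu at position 8
mj25 = "GGAGHVPEYFVGIGTPISFYG"
frags, labels = fragment(mj25), label_isopeptide_fragments(mj25, iso_pos=8)
```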

Fig. 3: Performance of LassoPred’s annotator.
figure 3

A Microcin J25 as an example of sequence splitting into three parts: ring (cyan), loop (yellow), and tail (pink). The sequence is split into overlapping dipeptides, and fragment categories are shown in the right column with “NA” for boundaries. The cartoon and stick model highlight isopeptide and plug residues using the same color scheme as the sequence. B Comparison of plug prediction performance using various sequence featurization methods through repeated holdout validation. Details of featurization are in Table S10. Using dipeptide fragmentation on ring-truncated sequences, each of the 100 splits was tested using random forest classifier (RFC), K-neighbors classifier (KNC), gradient boosting classifier (GBC), and support vector classifier (SVC) models. Models were tuned by grid search; the best-performing model was used. Box plots represent the accuracy distribution (n = 100 splits): center line = median; box = 25th–75th percentiles. Summary values are labeled; if the mean overlaps with a quartile, it is shown in parentheses. C, D Distribution of the sequence length and loop length, respectively, for the entire dataset (47 LaPs, grey) and the selected split (10 LaPs, orange). E, F ROC curves for the isopeptide and plug classifier, respectively. For isopeptide classification, Classes 0, 1, and 2 correspond to the isopeptide boundary, ring, and loop/tail; for plug classification, they represent the plug boundary, loop, and tail. G Data splitting test for plug prediction accuracy using ring-truncated dipeptide fragmentation, ESM2 L33 embedding, and the SVC model with optimized hyperparameters on the selected holdout set. For each splitting ratio, Top 1 and Top 3 accuracy were assessed via repeated holdout validation of 100 splits, applying the corresponding training-to-test ratio and stratified sampling. Accuracy is represented as mean ± standard error (SE) based on 100 splits (n = 100). H Model performance comparison of the original and clean dataset using ring-truncated dipeptide fragmentation, ESM2 L33 embedding, and the SVC model with optimized hyperparameters on the selected holdout set. The clean dataset (sequence similarity <80%) includes 36 LaP sequences. Performance metrics for both datasets were evaluated using 100 repeated holdout splits with a 4:1 train–test ratio and stratified sampling. Performance values are shown as mean ± standard deviation (SD).

Developing the plug classifier is much more challenging due to the wider variety of amino acid types that can act as the plug (17 out of 20 for the upper plug and 14 out of 20 for the lower plug, Fig. S2) and the broader distribution of loop lengths (3–20 amino acids) observed among the 47 LaP structures in the PDB. Similar to the isopeptide classifier, we labeled each peptide fragment as ‘0’, ‘1’, or ‘2’, where ‘1’ indicates fragments within the loop, ‘0’ marks the loop-tail boundary (considering both upper and lower plug residues), and ‘2’ corresponds to residues within the tail (Fig. 3A). Given the complexity of developing the plug classifier, we began by benchmarking several fragmentation strategies: dipeptide, tripeptide, and tetrapeptide fragmentation, as well as dipeptide and tripeptide fragmentation on ring-truncated sequences. Each benchmark employs ESM embeddings and a grid search over combinations of machine learning algorithms and hyperparameters (Table S6), with model performance evaluated across 100 splits (Table S7). Each fragmentation strategy corresponds to a specific approach for reconstructing the plug positions from the predicted fragments (detailed in Note S1 and Tables S8 and S9). Ultimately, ring-truncated dipeptide fragmentation demonstrates the best predictive performance in identifying the correct plug position from one prediction (top 1 accuracy: 0.60 ± 0.15) or three predictions (top 3 accuracy: 0.85 ± 0.10, Table S7). Although longer fragments, such as tetrapeptides, improve the accuracy of categorizing fragments into the three classes (Class 0, 1, 2), featuring a ROC AUC score of 0.96 ± 0.02 compared to 0.91 ± 0.04 for ring-truncated dipeptide fragments (Table S7), their performance in reconstructing the plug position is weaker (top 3 accuracy: 0.79 ± 0.12 vs. 0.85 ± 0.10, Table S7). This is likely due to information loss when attempting to reconstruct the plug position from multiple fragments predicted with a label of ‘0’. Additionally, compared to plain dipeptide fragmentation, ring-truncated dipeptide fragmentation shows better plug prediction accuracy (top 3 accuracy: 0.85 ± 0.10 vs. 0.80 ± 0.10, Table S7) because it removes the ring sequence predicted by the isopeptide classifier and focuses the model on the loop and tail regions, minimizing noise caused by irrelevant sequence information.

With the ring-truncated dipeptide fragmentation strategy, we tested eight feature engineering approaches based on embeddings from ESM2, AF2, and SaProt (a model that combines residue sequences with 3D structural information through a structure-aware vocabulary35). For each test, we applied four ML models, Random Forest Classifier (RFC), K-Neighbors Classifier (KNC), Gradient Boosting Classifier (GBC), and Support Vector Classifier (SVC), identical to those used in the fragmentation test. The hyperparameters for each model were optimized through grid search (Table S6), and the model that achieves the highest plug annotation accuracy (i.e., top 1–3 accuracy) is used in the performance statistics (Fig. 3B). Based on repeated holdout testing across 100 splits, the 33rd layer of the “ESM2_t33_650M” model (ESM2 L33) shows the best performance, achieving an overall ROC AUC of 0.91 ± 0.04, a top 3 accuracy of 0.85 ± 0.10, and a top 3 F1 score of 0.91 ± 0.06 (Table S10).
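As an illustration of how per-residue ESM2 L33 features can be obtained, a sketch using the public fair-esm package and the esm2_t33_650M_UR50D checkpoint is shown below; the averaging of the two residue embeddings into one dipeptide-fragment feature is our assumption, not necessarily the pooling used in LassoPred.

```python
import torch
import esm

# load the 650 M-parameter ESM2 model and its tokenizer
model, alphabet = esm.pretrained.esm2_t33_650M_UR50D()
batch_converter = alphabet.get_batch_converter()
model.eval()

seq = "GGAGHVPEYFVGIGTPISFYG"                       # microcin J25 core as an example
_, _, tokens = batch_converter([("lap", seq)])
with torch.no_grad():
    out = model(tokens, repr_layers=[33])           # request layer-33 representations
emb = out["representations"][33][0, 1:len(seq) + 1] # drop BOS/EOS tokens -> (L, 1280) per-residue features

# one possible fragment feature: average the embeddings of the two residues of each dipeptide
frag_features = [(emb[i] + emb[i + 1]) / 2 for i in range(len(seq) - 1)]
```

Because the embedding of each residue depends on the whole sequence context, identical dipeptides from different LaPs receive different feature vectors, as noted in the Methods.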

To minimize bias, we selected the ML algorithm and hyperparameters for the plug classifier on the split showing the minimal deviation from the mean performance across all metrics over the 100 splits modeled using ring-truncated dipeptide fragmentation and ESM2 L33 embedding. Specifically, this refers to the split with the minimum absolute error from the mean across various performance metrics, including fragment classification accuracy, fragment classification F1 score, one-vs-rest weighted ROC AUC, ROC AUC for Class 0, Class 1, and Class 2, as well as top 1, top 2, and top 3 accuracy (Table S10). On this split, the support vector classifier, with hyperparameters optimized by grid search, demonstrates the best predictive accuracy (Tables S11 and S12), largely owing to its margin-based approach and adaptability to high-dimensional embeddings. The test set of this selected split exhibits a distribution similar to the overall dataset in terms of loop length (test set: 5.8 ± 2.4 aa vs. overall: 6.4 ± 3.5 aa, Fig. 3D) and total length (18.5 ± 2.8 aa vs. 19.3 ± 3.9 aa, Fig. 3C).

Based on this test set (10 LaPs, Table S11), the plug classifier demonstrates strong performance in classifying the dipeptide fragments into the correct sequence region (i.e., Class 0, 1, and 2). This is reflected in its class 0 ROC-AUC score of 0.87 (Fig. 3F), a one-vs-rest weighted ROC-AUC score of 0.90 (Table S13), fragment classification accuracy of 0.81 (Table S13), and fragment classification F1 score of 0.81 (Table S13). In annotating the correct plug position, the plug classifier achieves accuracies of 0.60, 0.80, and 0.90 for the top 1, 2, and 3 predictions, respectively (Table S13). In the test set, the only LaP that is missed in the top 3 predictions is benenodin-1, which exists in two isomeric states (PDB IDs: 5TJ1 and 6B5W), with one state placed in the training set and the other in the test set. On the other hand, the isopeptide classifier exhibits perfect accuracy in identifying the isopeptide-donating residue (accuracy: 1.00) and strong performance in classifying dipeptide fragments into the three regions, with a class 0 ROC-AUC score of 1.00, a one-vs-rest weighted ROC-AUC score of 0.98, and a fragment classification accuracy and F1 score of 0.90 (Fig. 3E, Table S13). Using the top predicted ___location for the isopeptide-donating residue (the isopeptide classifier) and the top three predicted locations for the upper plug (the plug classifier), the LassoPred annotator generates up to three sets of sequence annotations per input. For LaPs with shorter sequences (e.g., those under 15 amino acids), the annotator may produce only one or two sets of sequence annotations.

To assess the model’s stability, we performed a data splitting test using ring-truncated dipeptide fragmentation, ESM2 L33 embedding, and the SVC model (Fig. 3G and Table S14). Under each training/test splitting ratio, the performance metrics were evaluated using repeated holdout validation across 100 splits. As the training set size decreased from 83% (39 LaPs) to 74% (35 LaPs), and further to 49% (23 LaPs), the model’s Top 3, 2, and 1 accuracies remained stable, ranging from 0.86 ± 0.11 to 0.83 ± 0.08 for Top 3 accuracy, from 0.72 ± 0.08 to 0.76 ± 0.14 for Top 2 accuracy, and from 0.58 ± 0.15 to 0.57 ± 0.09 for Top 1 accuracy (Table S14). This shows that even with a reduced percentage of training data, the model’s predictive performance remains stable.

Notably, the current dataset includes sequences that differ by only a few mutations, reflecting the limited sequence variation of lasso peptides. These similar sequences were included to ensure consistent plug position predictions despite mutations, though this could raise a concern about potential data leakage. To address this concern, we tested the model on a cleaned dataset of 36 LaPs with <80% sequence similarity, determined by multiple sequence alignment (calculated as the fraction of matched residues in global alignments). Using ring-truncated dipeptide fragmentation, ESM2 L33 embedding, the SVC model, and a 4:1 training/test split (29 in the training set and 7 in the test set), the model’s performance remains similar for the top 3 prediction accuracy (original: 0.85 ± 0.10 vs. clean: 0.86 ± 0.13) and the one-versus-rest-weighted ROC AUC (original: 0.92 ± 0.04 vs. clean: 0.88 ± 0.05), but drops for the Top 1 (original: 0.63 ± 0.13 vs. clean: 0.49 ± 0.18) and Top 2 accuracy (original: 0.75 ± 0.13 vs. clean: 0.69 ± 0.19, Fig. 3H and Table S15). This shows that the presence of mutant sequences in the dataset does not affect the model’s ability to achieve at least one correct annotation within its top 3 predictions.

Finally, to rigorously assess the generalizability of the annotator, we conducted a “blind test” using a curated set of 12 distinct LaPs from the literature5,21,22,23,24,36 (Table 1). Their sequence annotations have been confirmed by solution NMR spectroscopy or biochemical assays, but they have not been deposited in the PDB, except for capistruin, which has a co-crystallized structure with RNA polymerase. This blind test is particularly challenging, as all LaP sequences exhibit low similarity (≤0.65) and identity (≤0.46) with the dataset of 47 LaPs curated from the PDB. In this test, LassoPred’s annotator achieves top 1, 2, and 3 prediction accuracies of 0.58, 0.67, and 0.92, respectively, consistent with the performance observed on the holdout test set (Table 1). The isopeptide classifier predicts all isopeptide-donating residues with 100% accuracy, so the annotator’s prediction accuracy depends solely on the performance of the plug classifier. One notable example is caulonodin VI, the longest LaP in the test set with 19 amino acid residues and multiple bulky residues in the loop/tail region (e.g., Lys, Arg, Gln, and Tyr) that could potentially serve as plugs. Despite this complexity, LassoPred accurately identifies the correct plug position on its first guess. Additionally, the annotator gives an identical set of predicted annotations for RES-701-1 and RES-701-3, indicating the insensitivity of the prediction results to point mutations. The predictive performance of LassoPred in the “blind test” highlights the robustness and generalizability of the model beyond the training data and the holdout test, supporting its ability to deliver reliable sequence annotations, which are crucial for the constructor to build accurate 3D structures.

Table 1 Performance of annotation prediction on the blind set

Development of LassoPred’s constructor

Using predicted sequence annotations (ring, loop, and tail lengths) as input, the constructor first builds an all-glycine LaP scaffold matching the given annotations, then mutates side chains to match the input sequence, and finally optimizes the structure with a molecular mechanics force field in AMBER. Users can optionally include MD sampling to generate a conformational ensemble, allowing for clustering analysis. To develop LassoPred’s constructor, we substantially upgraded several core modules of LassoHTP. First, we expanded the scaffold library to accommodate a wider range of loop lengths, increasing from the original 3–20 to 3–50 amino acid residues (Fig. S4), enabling the construction of more diverse LaP structures. Second, we replaced the scaffold creation engine to accelerate molecular model building, reducing construction time from approximately 2 hours using steered MD in LassoHTP to less than 10 minutes with SWISS-MODEL37. Third, we implemented critical functionalities that do not exist in LassoHTP, including left-handed lasso peptide construction, PyMOL-interfaced mutant generation, and conformational clustering of MD trajectories (described in the Methods section). Finally, to enhance the robustness of the constructor, the force field file for the isopeptide bond, including atomic labels and parameters, was reformatted to ensure full compatibility with the canonical amino acids in the scaffold.

By integrating the annotator and constructor, LassoPred works as an automated pipeline in Python or Shell. For an input lasso peptide sequence, the script first checks whether the sequence is eligible for prediction by verifying three conditions: whether a potential isopeptide-donating residue (Asp or Glu) exists at the 7th, 8th, or 9th position of the sequence, whether the sequence consists of recognizable amino acid letters, and whether the length is appropriate (i.e., total length ≥ 12 aa, and the distance from the last amino acid to the first possible isopeptide-donating residue is at least 5 aa, given that the minimum loop and tail lengths are 3 and 2, respectively). If all conditions are met, LassoPred leverages the annotator to predict up to three sets of annotations and the constructor to build a 3D LaP structure for each set of annotations. The total duration for sequence annotation, structural construction, and structural optimization is typically less than 5 minutes on an EVGA GeForce RTX 2080 GPU, allowing efficient sequence-to-structure conversion for lasso peptide design and engineering. MD sampling is optional and can be customized according to local computing resources and specific needs.
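A minimal sketch of such an eligibility pre-check is shown below; the function and variable names are ours and the exact checks in the released pipeline may differ.

```python
# Pre-check mirroring the three eligibility conditions described above.
VALID_AA = set("ACDEFGHIKLMNPQRSTVWY")

def is_eligible(seq):
    seq = seq.upper()
    # 1. recognizable amino-acid letters only
    if not set(seq) <= VALID_AA:
        return False
    # 2. a potential isopeptide donor (Asp/Glu) at position 7, 8, or 9 (1-indexed)
    donor_positions = [i for i in (7, 8, 9) if i <= len(seq) and seq[i - 1] in "DE"]
    if not donor_positions:
        return False
    # 3. length checks: total length >= 12 aa, and at least 5 residues remain after
    #    the earliest possible donor (minimum loop of 3 + minimum tail of 2)
    if len(seq) < 12 or len(seq) - min(donor_positions) < 5:
        return False
    return True
```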

Assessment of LassoPred for structural determination

To assess LassoPred’s accuracy for structural determination, we evaluated how closely LassoPred’s constructed 3D structures match their respective PDB structures in the test set of the selected split (Table S11). LassoPred identifies the correct annotation among its top 3 predictions for 9 out of 10 LaPs (Table S16), failing only for isomeric state 1 of benenodin-1 (PDB ID: 5TJ1), which converts to isomeric state 2 (PDB ID: 6B5W) at various temperatures13. The 10 LaPs in the test set underwent structural construction and optimization using LassoPred, with 9 built from their correct annotations and testV7 (benenodin-1) built from its top 1 predicted annotation. Referenced to the PDB structures, 8 optimized LaP structures have a Cα RMSD value lower than 4.0 Å, a numerical cutoff used to judge the quality of predicted structures38. This 4.0 Å threshold is further justified by a 30 ns classical MD simulation of the linear core peptide of microcin J25 (random coil state), during which none of the sampled snapshots fall within 4 Å RMSD of the PDB reference (Fig. S5C).

The average RMSD of the 9 optimized LaP structures shown in Fig. 4A, all of which have correct annotations, is 3.2 ± 1.0 Å over all Cα atoms and 1.0 ± 0.6 Å over the Cα atoms of the local interlocked structural moiety, which consists of the isopeptide-donating residue and plug residues along with their adjacent residues (Table S17). testV7 was excluded from this calculation because all three of its predicted annotations are incorrect. However, even when including predicted structures with an incorrect annotation, the average RMSD is still 3.2 ± 0.9 Å over all Cα atoms and 1.1 ± 0.6 Å over the local, interlock-defining Cα atoms (Table S17), because LassoPred tends to select residues near the correct plug.
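For reference, Cα RMSD values of this kind can be computed with a standard Kabsch superposition; the sketch below (plain NumPy, names ours) returns the RMSD for any chosen subset of Cα atoms, for example all residues or only the interlock-defining ones. It is a generic illustration, not the analysis script used in this work.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) Cα coordinate arrays after optimal rigid-body superposition."""
    P = P - P.mean(axis=0)                       # center both coordinate sets
    Q = Q - Q.mean(axis=0)
    U, S, Vt = np.linalg.svd(P.T @ Q)            # covariance SVD (Kabsch algorithm)
    d = np.sign(np.linalg.det(U @ Vt))           # avoid improper rotations (reflections)
    R = U @ np.diag([1.0, 1.0, d]) @ Vt          # optimal rotation mapping P onto Q
    diff = P @ R - Q
    return np.sqrt((diff ** 2).sum() / len(P))

# usage: rmsd_all  = kabsch_rmsd(pred_ca, ref_ca)
#        rmsd_lock = kabsch_rmsd(pred_ca[idx], ref_ca[idx])  # idx = interlock-defining residues
```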

Fig. 4: Assessment of LassoPred’s constructor.
figure 4

A Prediction of LaP structures in the test set from the selected split (Table S11). All but testV7 have correct ring/loop/tail annotations predicted within their top 3 predictions. The illustrated structures correspond to the correct annotation for all LaPs except testV7; for testV7, the structure shown reflects the top prediction rather than the correct annotation. The 5 clustered structures (in gray) from 30 ns MD simulations and the optimized structures (in pink) were compared with the experimental structures (in blue), with the minimum-RMSD clustered structure highlighted in green and the RMSD value shown at the bottom. B Predicted structures for lasso peptide sequences using LassoPred. The structure ribbon is colored from blue at the N-terminus to red at the C-terminus. The lasso peptide sequences are the same as those shown in Fig. 1E. Isopeptide C—N distances were measured between the N atom of the first residue and the C atom of the carboxyl group in the isopeptide-donating residue. The sequences for the tested LaPs are available in Table S4.

Compared to the ring region, the loop and tail regions contribute more to the overall conformational uncertainty (1.2 Å for the ring, 1.4 Å for the loop, and 1.3 Å for the tail, Table S18). This is consistent with the observation that LaPs with a shorter tail give a smaller RMSD value (e.g., RMSD for testV1: 2.6 Å, testV5: 1.2 Å, and testV10: 3.1 Å). In contrast, testV3 (4.8 Å) and testV4 (4.4 Å) perform poorly due to their long and linear tails. With 30 ns of MD sampling, this linear-tail artifact can be partially removed without causing the threaded tail to shift or unthread. For example, the RMSD value for xanthomonin I (testV4) drops from 4.4 Å in the optimized structure to 3.1 Å for the representative structure from conformational clustering (Fig. 4A). Beyond the test set, we applied LassoPred to build 3D structures for the 5 uncharacterized LaPs shown in Fig. 1E. LassoPred folds all of them into a lariat-like structure with the C-terminus threaded through the ring (only the top predicted structure is shown, Fig. 4B). Unlike AlphaFold3, which lacks the functionality to form the isopeptide bond, LassoPred ensures that all isoC—isoN distances are within the covalent bond regime (<1.4 Å).

We further assessed the constructor’s performance on the 47 known LaP PDB structures, assuming correct sequence annotations for each. The results indicate that LassoPred-generated structures have an average RMSD of 3.4 ± 1.9 Å (Table 2). After MD sampling, the average RMSD across the 47 PDB structures drops to 3.0 ± 1.4 Å for the 30 ns trajectory and 3.0 ± 1.3 Å for the 100 ns trajectory (Table 2), with a high global distance test total score (GDT-TS)39 of 0.8. These results suggest that LassoPred is capable of generating accurate 3D structures even without extensive MD sampling. To further assist users, we provide the option to generate MD sampling scripts for their LaP structures, allowing additional refinement on local computing resources. These predicted structures can serve as starting points for downstream applications, such as the design of molecular switches13,14,40, docking for enzyme inhibitors3,41, and extensive sampling of LaP folding landscapes42,43,44,45.

Table 2 Performance evaluation of LassoPred’s constructor among 47 PDB structures, assuming correct sequence annotation for each

Lasso peptide structure prediction and database web app

Leveraging LassoPred, we predicted 3D structures for lasso peptides with undetermined structures. Using the 4749 unique LaP sequences previously identified by RODEO genome mining15,16 (Supplementary Data 1), we applied LassoPred to create an optimized structure for each predicted sequence annotation, yielding 13,866 LaP structures. Compared to the 47 existing PDB structures, which include LaPs from only three phyla (pseudomonadota: 59.6%, actinomycetota: 38.3%, and bacillota: 2.1%, Fig. 5A), the newly predicted structures from LassoPred increase phylogenetic diversity, spanning 21 phyla (Fig. 5B). These include 38.8% from pseudomonadota, 32.6% from actinomycetota, 15.7% from bacillota, and smaller proportions (each below 10%) from cyanobacteriota, bacteroidota, euryarchaeota, and others (Fig. 5C). Furthermore, the sequence length of these predicted structures ranges from 12 to 160 aa residues, significantly broader than the 14–33 aa range of existing PDB structures, representing a 7.8-fold expansion in range (Fig. 5D). The loop and tail lengths also expand to ranges of 3–50 and 2–51 aa residues, respectively (Fig. 5D). To highlight the sharp difference in structural scope, Fig. 5E, F displays relaxed LaP structures with the maximum tail length (i.e., 51 aa, LP_QOR62253_3) and the maximum loop length (i.e., 50 aa, LP_EDM37169_1). These structures represent the most populated conformational cluster from a 30 ns MD production run (Fig. 5E, F). We observed helical secondary structures in the notably extended regions, specifically in the tail region of the max-tail structure (positions 60–66, stabilized by a Gly63N---Thr66O hydrogen bond at 3.0 Å) and in the loop region of the max-loop structure (positions 45–48, stabilized by an Arg45N---Leu48O hydrogen bond at 3.5 Å). Such more globular domains have not been observed in existing lasso peptide structures, raising the interesting possibility that the ring and tail of lasso peptides can serve as stabilization motifs for larger globular protein domains.

Fig. 5: Database and prediction tools for lasso peptide structures.
figure 5

A Phylogenetic tree constructed from the amino acid sequences of 47 existing lasso peptide structures. Colors are assigned according to phylum, with the proportion of each phylum displayed at the bottom. B Phylogenetic tree of 4749 unique known lasso peptide sequences, which were first clustered into 680 sequences and then aligned to construct the tree; labels are color-coded by phylum. C Phylum distribution of the 4749 unique known lasso peptide sequences in the database, using the same color scheme as in (B). D Comparison of total length, loop length, and tail length between existing and predicted LaPs. Lengths exceeding 60 are omitted due to their scarcity. E, F Representative structures from the database with the maximum tail (E) and maximum loop (F), along with their respective proportions of secondary structure. The clustered structures from 30 ns MD simulations were taken as representative structures. Each structure is labeled by its database ID, with ring, loop, and tail lengths noted in brackets. G Representative features of the LassoPred web interface.

To allow public access, we developed a web interface for LassoPred, including a database and a prediction tool (Fig. 5G). The database, containing 13,866 optimized structures from 4749 LaP sequences, enables users to conduct comprehensive searches based on major characteristics such as phylum, precursor, leader, and core sequences. For each entry, a prediction summary provides up to three ranked ring/loop length pairs, along with the optimized structure, any available relaxed structures, and MD simulation files for each rank. The prediction tool allows users to submit tasks for predicting 3D lasso peptide structures from an input sequence, download results from our server, and receive updates via email. The result files contain content similar to each database entry. Although no production MD simulations are run on our server, users receive the essential input files to initiate an MD simulation. The whole process is expected to take less than 10 minutes. In summary, LassoPred provides an accessible and comprehensive lasso peptide structure prediction tool and database, assisting in the discovery of functional lasso peptides.

Discussion

LassoPred and its associated database can advance fundamental knowledge about lasso peptides, accelerate the discovery of new functional peptides, and inform the design of new tools for lasso peptide design and engineering, with the potential to shift the paradigm of lasso peptide research. The database expands the number of LaP structures from 47 to 4749. We acknowledge that the training dataset comprised only 47 lasso peptides, reflecting the relative scarcity of experimentally characterized LaP structures in the field. To date, the majority of lasso peptide structures have been solved using solution NMR techniques, with a smaller number determined by X-ray crystallography. Regardless of the technique, a campaign to determine a lasso peptide structure takes a minimum of several months of experimental work (sometimes extending to years). Moreover, success is not guaranteed; we have published our unsuccessful attempts at solving the structures of cellulonodin-226 and fuscanodin25, also known as fusilassin46. We have also attempted experimental validation of LassoPred by trying to solve the NMR structure of a novel lasso peptide that harbors antimicrobial activity. Although this peptide expressed well in a heterologous host, the 2D NMR experiments did not yield data of sufficient quality to determine the structure, despite varying the solvents, the acquisition temperature, and the NOE mixing times.

These challenges inherent in lasso peptide structure determination highlight a critical gap that LassoPred fills by providing large-scale in silico structural predictions beyond the small set of known structures. These predicted structures can be used in docking simulations or other in silico drug discovery efforts. The predictions also provide a basis for elucidating the sequence-structure-function relationships underlying the extraordinary thermostability of LaPs and their stability in solvents other than water14,47, inspiring the development of rational engineering strategies to tune lasso peptide properties48. The construction of lasso peptides with a non-native left-handed wrapping fold, as enabled by LassoPred, allows researchers to investigate the folding landscape of lasso peptides43,45 and the origin of lasso peptide wrapping handedness, thereby enhancing fundamental knowledge about lasso peptides.

LassoPred can aid in prioritizing new LaPs for discovery as antibiotics and self-assembled biomaterials. Notably, LaPs such as microcin J25 and capistruin are potent RNA polymerase (RNAP) inhibitors that bind to the RNAP secondary channel, blocking access of nucleotides to the RNAP active site and altering the folding of loops essential for RNAP’s catalytic activity8,49. While sequence patterns from known antimicrobial lasso peptides, such as the “two-tyrosine” motif, have been used alongside bioinformatics to guide the discovery of new lasso peptide antibiotics50,51, structure-based molecular modeling, such as docking and free energy perturbation, enables the in silico discovery of lasso peptides targeting other cellular components, such as cell wall synthases, the ribosome52, and membrane transporters53. This could lead to new antimicrobial LaPs for treating infections caused by life-threatening Gram-negative bacteria. The LassoPred database also allows researchers to locate new types of interlocked rotaxane switches like benenodin-113,54, as well as cysteine-containing lasso peptides as dynamic, covalently bonded, self-assembled biomaterials12 for medical and industrial uses55. Besides boosting the in silico discovery of functional LaPs, the LassoPred database hosts thousands of sequences and structures for building and optimizing predictive machine learning models, such as DeepLasso56. These data can also facilitate the construction of generative models for the de novo design of lasso peptides.

In closing, we discuss the technical limitations of LassoPred and how these issues can potentially be mitigated through joint computational and experimental efforts. First, one limiting factor of LassoPred’s performance lies in its moderate top 1 accuracy (~60%). Although this accuracy is significantly higher than the baseline probability of randomly guessing the correct plug position in an average-length lasso peptide (~11%, estimated using microcin J25, see Note S4), there is still room for improvement in the model’s predictive performance.
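As an illustrative back-of-the-envelope estimate consistent with this value: microcin J25 has 21 residues and an 8-residue ring, so under the constraints used by the annotator the upper plug may occupy any of the nine positions in the allowed window [Niso + 3, N − 2] (positions 11–19), and a random guess is therefore correct with a probability of roughly 1/9 ≈ 11%.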

Second, LassoPred is unable to accurately predict the temperature-dependent behavior of lasso peptides, a common limitation among current protein structure prediction tools. For example, in the case of benenodin-1, which adopts multiple isomeric states at higher temperatures, LassoPred can identify isomeric state 2 (PDB 6B5W, loop/tail length: 7/4) and a minor conformational state (loop/tail length: 5/6) within its top 3 predictions57. However, it is unable to predict changes in the population of these states at different temperatures. A viable solution to this issue is to employ multiscale molecular simulations, integrating quantum chemistry-based energy modeling with conformational entropy calculations42.

Third, LassoPred may generate lariat knot-like folds for any input lasso-like sequence, potentially leading to false positives. To mitigate this, we used validated LaP sequences from RODEO, which assesses the likelihood of a sequence being a true lasso peptide by considering its neighboring sequences, including the leader peptide, leader peptide-binding proteins, transporters, leader peptide hydrolases, lasso peptide cyclases, and isopeptide hydrolases. In addition, as in vitro lasso peptide construction17,25 and cyclase engineering technologies58 advance, most components of lasso peptides will become mutable, ultimately enabling the long-term goal of converting any arbitrary lasso-like sequence into a lasso peptide.

Fourth, the current version of LassoPred does not consider the diverse range of structural scaffolds arising from disulfide bond formation. Among the 4749 LaPs, we observed 380 LaPs containing 2 Cys, 44 containing 4 Cys, and 9 containing 6 Cys. Although they account for less than 10% of the dataset, disulfide bond-containing LaPs typically exhibit high thermostability and the potential to form mechanically interlocked structures and materials. The LaPs with 6 Cys would also extend to a new class of LaP beyond the known types (up to 4 Cys in existing LaP structures, Table S22). Based on the structures of these LaPs built by LassoPred, we will develop new LassoPred functions to construct all possible disulfide bond-containing LaPs (see Fig. S6 as an example). These structures will lead to new hypotheses for experimentally characterizing and understanding new classes of LaPs, specifically whether the disulfide linkages originate from the intrinsic conformational distribution of the peptide or from its interactions with enzymes during biosynthesis.

Last but not least, new modules should be developed to enhance LassoPred’s discovery capabilities, including a docking module to inform how LaPs interact with cellular protein targets for drug discovery, an enhanced conformational sampling and Markov state model analysis module to elucidate the key conformational populations underlying LaPs’ functions, artificial intelligence-based scoring functions to predict the impact of mutations on LaPs’ physical and pharmaceutical properties, and the option to incorporate 3D visualization of lasso peptide structures on the website.

Methods

Data curation for the training and test dataset

We collected all known structures of lasso peptides from the Protein Data Bank (PDB)59 (accessed on 04-01-2024), compiling a dataset of 50 lasso peptide structures (see Table S5). To prevent data leakage, we removed entries with identical sequences and annotations, resulting in 47 sequences. We manually annotated the isopeptide-donating residue, upper plug, lower plug, ring length, loop length, and tail length on each lasso peptide structure (see definition in Fig. S1). The resulting dataset comprises sequences and annotations for 47 lasso peptides (Table S5).

LassoPred’s annotator to annotate sequence regions

LassoPred’s annotator employs two machine learning classifiers, an isopeptide classifier and a plug classifier, to pinpoint the locations of the isopeptide (Niso) and upper plug residue (Nup), respectively, thereby deriving the lengths of the ring (Niso), loop (Nup – Niso), and tail (N – Nup) for a LaP sequence with N amino acids.
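A minimal sketch of how the three region lengths follow from the two predicted indices is shown below (1-indexed positions; function and variable names are illustrative).

```python
def region_lengths(n_iso, n_up, n_total):
    ring = n_iso              # residues 1 .. Niso, closed by the isopeptide bond
    loop = n_up - n_iso       # residues between the donor and the upper plug
    tail = n_total - n_up     # residues from the upper plug to the C-terminus
    return ring, loop, tail

# e.g., a 21-residue LaP with Niso = 8 and Nup = 13 -> ring 8, loop 5, tail 8
```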

Sequence fragmentation

Both sub-classifiers were trained and tested using data generated by splitting LaP sequences into consecutive dipeptide fragments (see benchmarks on fragmentation strategies, Tables S7–S9), where each fragment overlaps with its neighboring fragments by one amino acid. To ensure that the C-terminal residue can be labeled as a “tail” fragment, one dummy amino acid (denoted “B”) is appended. As such, each LaP sequence of N residues generates N fragments, leading to a total of 905 dipeptide fragments from the 47 PDB-curated sequences. Each fragment is labeled separately for the two sub-classifiers. For the isopeptide classifier, a dipeptide is labeled as 1 if both residues reside in the ring, 0 if it spans the boundary between the ring and loop (i.e., contains the isopeptide-donating residue), and 2 if both residues reside in the loop or tail. For the plug classifier, a dipeptide is labeled as 1 if both residues reside in the ring or loop, 0 if it spans the boundary between the loop and tail (i.e., contains the upper and lower plug residues), and 2 if both residues reside in the tail. The term “plug residue” does not imply a mechanistic role in holding the ring in place; the “plug position”, annotated as “0”, indicates the two residues between which the ring is located in the PDB structure, rather than the two residues that mechanically lock the peptide. We applied separate truncation strategies to reduce noise for the two sub-classifiers: for the isopeptide classifier, only the 6th–10th amino acids are retained, while for the plug classifier, only the loop and tail regions are kept. Both classifiers were trained separately to predict the probability of observing the isopeptide bond or upper plug residue within each dipeptide, thereby informing the ___location of the isopeptide (Niso) and upper plug residue (Nup) for sequence annotation.

Repeated holdout validation

To enhance the model’s robustness and reduce the impact of the small dataset on generalizability, we employed repeated holdout validation to evaluate model performance and to select model-building strategies and hyperparameters34. We conducted 100 training/test set splits at a 4:1 ratio with stratified sampling to ensure that the test set maintains a distribution of ring and loop lengths consistent with the overall dataset (Supplementary Data 2). Machine learning algorithms and their hyperparameters were selected based on the split with the minimum absolute error from the mean across various performance metrics, including fragment classification accuracy, fragment classification F1 score, one-vs-rest weighted ROC AUC, ROC AUC for Class 0, Class 1, and Class 2, as well as top 1, top 2, and top 3 accuracy (Table S11). To prevent multivalued mapping, the two isomeric states of benenodin-1 (PDB IDs: 5TJ1 and 6B5W)13 were placed in separate datasets (training vs. test). Based on the benchmarks of fragmentation (Tables S7–S9) and featurization (Table S10) strategies, the selected dataset was determined by the MAE of both the isopeptide and plug predictors (Table S11). The final dataset contains 148 dipeptides for training and 40 for testing for the isopeptide classifier, and 409 dipeptides for training and 103 for testing for the plug classifier.

Classifier training and testing

We trained the model on 37 sequences and tested it on 10 sequences. For featurization, we used the ESM2 language model to represent lasso peptide sequences, taking the output from layer 33 of the “ESM2_t33_650M” model, which produces per-residue features of dimension 1280 along the sequence length. Notably, the same dipeptide fragment from different sequences is assigned a different feature because ESM2 considers both the overall sequence context and the specific amino acids within that sequence. Additional feature benchmarks are provided in the SI (Table S10). Using ESM2 embedding features and the final selected dataset, we benchmarked various models, including K-Nearest Neighbors60, SVC61, Random Forest62, and Gradient Boosting Classifier63, assessing each model by its accuracy, F1 score, and ROC AUC64. This was achieved by searching over predefined parameter grids for each model with GridSearchCV from scikit-learn65 (Table S6). The support vector classifier was selected for validation on the test set (10 held-out sequences) due to its best-performing ROC AUC for both the isopeptide and upper plug predictors. The optimized hyperparameters include the regularization parameter C (0.1), the degree of the polynomial kernel function (3), gamma (‘scale’), the kernel type (‘linear’), and the random state (42).
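A minimal sketch of this model/hyperparameter search is shown below; the grids are abbreviated relative to Table S6, and the stand-in arrays replace the actual ESM2 L33 fragment embeddings and labels, so the snippet illustrates the workflow rather than reproducing the reported results.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.neighbors import KNeighborsClassifier

# stand-ins for the ESM2 L33 fragment features (n_fragments, 1280) and 0/1/2 labels
X_train = np.random.rand(409, 1280)
y_train = np.random.randint(0, 3, 409)

candidates = {
    "svc": (SVC(probability=True, random_state=42),
            {"C": [0.1, 1, 10], "kernel": ["linear", "rbf", "poly"], "gamma": ["scale"]}),
    "rfc": (RandomForestClassifier(random_state=42), {"n_estimators": [100, 300]}),
    "gbc": (GradientBoostingClassifier(random_state=42), {"n_estimators": [100, 300]}),
    "knc": (KNeighborsClassifier(), {"n_neighbors": [3, 5, 7]}),
}

best = {}
for name, (model, grid) in candidates.items():
    search = GridSearchCV(model, grid, scoring="roc_auc_ovr_weighted", cv=5)
    search.fit(X_train, y_train)
    best[name] = (search.best_score_, search.best_params_)   # compare models by ROC AUC
```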

Sequence region annotation

For each LaP sequence with unknown ring, loop, and tail lengths, LassoPred’s annotator employs its classifiers to infer the locations of the isopeptide (Niso) and upper plug residue (Nup), outputting one Niso value and up to three Nup values. LassoPred first employs the isopeptide bond predictor to assess the probability that each constituent dipeptide fragment contains the isopeptide-donating residue (labeled as “0”, described in the Sequence fragmentation section) and ranks the fragments by this probability. Other probability inference strategies are benchmarked in the SI (Tables S8 and S9). The presence of “0” for residues Nx and Nx + 1 in a dipeptide fragment indicates a potential isopeptide position at Nx + 1. LassoPred assigns the isopeptide position to the highest-ranked “0” dipeptide whose residue Nx + 1 is an Asp or Glu located at the 7th, 8th, or 9th amino acid position. Similarly, LassoPred employs the upper plug predictor to assess the probability of each dipeptide fragment containing the upper plug (labeled as “0”) and ranks the fragments by this probability. LassoPred then predicts three potential upper plug positions (Nup) from the top-ranking residue pairs that occur within the sequence range [Niso + 3, N − 2]. This range is based on the empirical observation that existing LaPs have a minimum loop length of 3 and a minimum tail length of 2. Consequently, for one input LaP sequence, LassoPred predicts up to three possible sets of sequence annotations.
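A simplified reconstruction of Niso and the top-3 Nup values from per-fragment class-0 probabilities is sketched below; the mapping of a boundary fragment (i, i + 1) to a candidate position at i + 1 follows the description above, while the function names and the exact tie-handling are our assumptions.

```python
def pick_isopeptide(seq, p0_iso):
    """p0_iso[i-1] = probability that the fragment covering residues i, i+1 is a '0' fragment."""
    ranked = sorted(range(1, len(seq) + 1), key=lambda i: p0_iso[i - 1], reverse=True)
    for i in ranked:
        cand = i + 1                              # boundary fragment (i, i+1) implies the donor at i+1
        if cand in (7, 8, 9) and cand <= len(seq) and seq[cand - 1] in "DE":
            return cand
    return None

def pick_upper_plugs(seq, p0_plug, n_iso, top=3):
    """Return up to `top` candidate upper-plug positions within [Niso + 3, N - 2]."""
    lo, hi = n_iso + 3, len(seq) - 2              # minimum loop of 3 aa, minimum tail of 2 aa
    ranked = sorted(range(1, len(seq) + 1), key=lambda i: p0_plug[i - 1], reverse=True)
    plugs = [i + 1 for i in ranked if lo <= i + 1 <= hi]
    return plugs[:top]
```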

LassoPred’s constructor to build 3D structures

LassoPred’s constructor upgrades LassoHTP33 by generating structures faster, using a larger scaffold library, enabling the construction of left-handed structures, and performing MD clustering (Note S3). The constructor comprises three modules: scaffold construction, mutant generation, and optimization. These work together to build 3D lasso peptide structures from the predicted sequence annotations (ring, loop, and tail lengths). The scaffold construction module uses an expanded scaffold library as the initial backbone for the ring and loop regions and uses PyMOL’s ‘fab’ function to build the tail region66. We used SWISS-MODEL37 to generate additional lasso peptide scaffolds based on known right-handed structures, reducing the time for each scaffold construction from 2 hours to 10 minutes compared to LassoHTP. We varied the loop length from 3 to 50 residues for ring sizes of 7, 8, or 9 by inserting alanines. All residues were then mutated to glycine to eliminate chirality, and the tail was truncated. Left-handed scaffolds were obtained by mirroring the right-handed ones in PyMOL. The mutant generation module then modifies the scaffold residues according to the input sequence using PyMOL. Finally, the optimization module refines the structure using AMBER22 pmemd.cuda67, with optional MD simulations and clustering analysis.
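Two of these PyMOL-based steps, building a tail segment with ‘fab’ and mirroring an all-glycine scaffold to obtain the left-handed fold, are sketched below. Object names, file names, and the example tail length are illustrative, not the actual constructor code.

```python
from pymol import cmd

tail_length = 9                                   # illustrative value taken from a predicted annotation

# build an extended all-glycine tail segment of the annotated length
cmd.fab("G" * tail_length, "tail_segment")

# mirror a right-handed all-glycine scaffold across the yz-plane to obtain a
# left-handed one; because every residue is glycine, the reflection does not
# introduce D-amino acids
cmd.load("scaffold_right.pdb", "scaffold")
cmd.alter_state(1, "scaffold", "x = -x")
cmd.save("scaffold_left.pdb", "scaffold")
```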

Molecular mechanics modeling

We generated force field parameters for the isopeptide bond involving non-standard residues (Glu or Asp) through a two-step restrained electrostatic potential (RESP) charge fitting approach68. Using the Antechamber package with Generalized Amber Force Field (GAFF) parameters69, we set the bond, angle, dihedral, and van der Waals parameters for these residues. We constructed the peptide force field using ff14SB70 and solvated the peptide in TIP3P water molecules71 using an octahedral box with a 10.0 Å buffer. Sodium or chloride ions were added to neutralize the system. The MD simulation protocol starts with energy minimization of up to 10,000 cycles, employing a 10.0 Å non-bonded interaction cutoff and applying harmonic positional restraints of 20 kcal/mol·Å² to the Cα atoms. Long-range electrostatics were treated using the particle mesh Ewald method. The SHAKE algorithm was applied to constrain bonds and angles involving hydrogen atoms72,73. The system was heated to 300 K over 40 picoseconds using a Langevin thermostat74 and a Berendsen barostat75, with restraints on the Cα, N, and C atoms; equilibrated for 1 ns in the NPT ensemble at 300 K and 1 atm with restraints on backbone atoms; equilibrated for another 1 ns in the NVT ensemble without restraints; and finally subjected to a 30 ns NPT production run with a 2 fs time step at 300 K and 1 atm. The resulting MD trajectory was clustered to identify 5 representative structures (Note S2). We performed root-mean-square deviation (RMSD), hydrogen bond, and radius of gyration (Rg) analyses using cpptraj76. Visualization of the structures was performed using PyMOL66.
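For readers less familiar with AMBER, a hedged sketch of a production-stage input file consistent with this protocol is shown below (written via Python only for illustration); the values mirror the settings described above (30 ns, 2 fs step, Langevin thermostat, 300 K, 1 atm), but the exact LassoPred input files may differ.

```python
production_mdin = """30 ns NPT production, 2 fs step, 300 K, 1 atm
 &cntrl
   imin=0, irest=1, ntx=5,
   nstlim=15000000, dt=0.002,        ! 15,000,000 x 2 fs = 30 ns
   ntc=2, ntf=2, cut=10.0,           ! SHAKE on H-bonds, 10 A cutoff
   ntb=2, ntp=1, taup=2.0,           ! constant pressure (1 atm)
   ntt=3, gamma_ln=2.0, temp0=300.0, ! Langevin thermostat at 300 K
   ntpr=5000, ntwx=5000,
 /
"""
with open("prod.in", "w") as fh:
    fh.write(production_mdin)
```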

Genome mining and database construction

The lasso peptide sequences were collected from previous genome-mining studies15,16,17. These studies obtained sequences either from genomic data analyses using tools such as RODEO and RRE-Finder or from experimental validations involving evolutionary covariance and biochemical assays, as of March 15, 2024, yielding 1315, 5193, and 7701 LaPs, respectively. Notably, the RODEO-derived sequences (n = 1315) are high-confidence because they include complete biosynthetic gene clusters, whereas RRE-Finder predictions (n = 5193) often lack supporting genes, and cell-free derived sequences (n = 7701) may be foldable but non-functional. We cleaned the data by deleting sequences with completely identical core sequences. We then filtered the validated sequences by removing duplicates (reducing the total from 14,209 to 5686 LaPs), excluding sequences whose 7th, 8th, and 9th amino acids are not capable of forming isopeptide bonds (i.e., not Asp or Glu), sequences with uncertain amino acids such as “X”, sequences too short to meet the minimum loop (3 aa) and tail (2 aa) length requirements, and sequences too long to predict accurately (predicted loop length exceeding 57 residues). This process resulted in a curated set of 4749 sequences. We developed a database of known lasso peptides to facilitate the analysis of lasso peptide structures and the mining of antimicrobial peptides. Each sequence is characterized by attributes such as Lasso Peptide Family, Name, Query GI, Peptidase, Phylum, Leader Sequence, Core Sequence, and other sequence information such as Sequence Length and Calculated Mass. The database’s structure table hosts 13,866 optimized structures with unique core sequences predicted by LassoPred. This table provides up to three predicted structures per sequence, along with their MD simulation files (*.prmtop, *.inpcrd, *.in). The initial structure was optimized with minimization (min*.pdb). The database is publicly accessible via a web interface (https://lassopred.accre.vanderbilt.edu/). Any changes to the access instructions will be posted on this page in the future.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.