Active phase discovery in heterogeneous catalysis via topology-guided sampling and machine learning

Zheng, Shisheng; Zhang, Xi-Ming; Liu, Heng-Su; Liang, Ge-Hao; Zhang, Si-Wang; Zhang, Wentao; Wang, Bingxu; Yang, Jingling; Jin, Xian’an; Pan, Feng; Li, Jian-Feng

doi:10.1038/s41467-025-57824-4

Download PDF

Article
Open access
Published: 14 March 2025

Active phase discovery in heterogeneous catalysis via topology-guided sampling and machine learning

Shisheng Zheng¹^na1,
Xi-Ming Zhang¹^na1,
Heng-Su Liu¹,
Ge-Hao Liang¹,
Si-Wang Zhang¹,
Wentao Zhang²,
Bingxu Wang²,
Jingling Yang¹,
Xian’an Jin¹,
Feng Pan ORCID: orcid.org/0000-0002-8216-1339² &
…
Jian-Feng Li ORCID: orcid.org/0000-0003-1598-6856^1,3

Nature Communications volume 16, Article number: 2542 (2025) Cite this article

4868 Accesses
3 Citations
1 Altmetric
Metrics details

Subjects

Abstract

Understanding active phases across interfaces, interphases, and even within the bulk under varying external conditions and environmental species is critical for advancing heterogeneous catalysis. Describing these phases through computational models faces the challenges in the generation and calculation of a vast array of atomic configurations. Here, we present a framework for the automatic and efficient exploration of active phases. This approach utilizes a topology-based algorithm leveraging persistent homology to systematically sample configurations across diverse coordination environments and material morphologies. Simultaneously, efficient machine learning force fields enable rapid computations. We demonstrate the effectiveness of this framework in two systems: hydrogen absorption in Pd, where hydrogen penetrates subsurface layers and the bulk, inducing a “hex” reconstruction critical for CO₂ electroreduction, explored through 50,000 sampled configurations; and the oxidation dynamics of Pt clusters, where oxygen incorporation renders the clusters less active during oxygen reduction reactions, investigated through 100,000 sampled configurations. In both cases, the predicted active phases and their impacts on catalytic mechanisms closely align with previous experimental observations, indicating that the proposed strategy can model complex catalytic systems and discovery active phases under specific environmental conditions.

Machine learning-assisted dual-atom sites design with interpretable descriptors unifying electrocatalytic reactions

Article Open access 17 September 2024

Accurate energy barriers for catalytic reaction pathways: an automatic training protocol for machine learning force fields

Article Open access 04 October 2023

Data efficient machine learning potentials for modeling catalytic reactivity via active learning and enhanced sampling

Article Open access 19 December 2024

Introduction

Heterogeneous catalysts are always subjected to specific working conditions, such as temperature, pH, and electrode potential, and interact with environmental species^1,2. This can result in changes ranging from a single atomic layer to several atomic layers, and may even lead to alterations in bulk composition (Fig. 1) e.g., the coverage effect of adsorbates on the surface^3,4,5,6,7, the subsurface incorporation of O, H, and other heteroatoms^8,9,10, and the degradation process^11,12,13. The resulting surface states and overall structures of catalysts under specific conditions are considered the active phases in catalytic reactions^14,15,16. They are typically dynamic and may undergo transitions as conditions change, which are intimately tied to the mechanistic insights and material design within this field¹⁷.

**Fig. 1: The overview of the persistent homology-based sampling algorithm (PH-SA) and overall framework for active phase discovery.**

The determination of active phase under ambient condition is, therefore, the very fundamental issue of heterocatalysis and essentially involves identifying the distribution of environmental species. Experimental methods for investigating the active phase are expensive and intricate, rendering it challenging to cover a broad range of experimental conditions^2,18. To obtain atomic-scale insights into operando active phases, the field frequently relies on computational simulations. A basic approach involves using chemical intuition to hypothesize possible microscopic structures and employing density functional theory (DFT) to perform energetic calculations^19,20, constructing phase diagrams under specific reaction environments. Nevertheless, human intuition cannot ensure adequate exploration of the phase space to identify the most thermodynamically favorable structures. Furthermore, the rapid escalation in the number of configurations to be explored will render DFT calculations computationally prohibitive. Machine learning force fields (MLFF), while maintaining accuracy, are significantly faster than DFT calculations and thus are increasingly becoming the preferred tools for exploring active phases^{2,21,22,23,24,25,26}. However, MLFF requires precise fitting of the potential energy surface, which necessitates exploring enough possible configurations^22,27. In this regard, developing efficient sampling algorithms that do not rely on chemical intuition has become a top priority, which not only facilitates the exploration of more potential configurations but also enables their rapid evaluation via effectively training MLFF.

Global optimization-based approaches such as basin hopping^28,29,30,31, evolutionary algorithms³² and random structure search³³ offer principled approaches to comprehensively navigating the ambiguity of active phase. However, these methods usually rely on skillful parameter adjustments and predefined conditions, and face challenges in exploring the entire configuration space and dealing with amorphous structures. The graph theory-based algorithms^34,35,36,37, which can enumerate configurations for a specific adsorbate coverage on the surface with graph isomorphism algorithms, even on an asymmetric one. Nevertheless, these methods can only study the adsorbate coverage effect on the surface because the graph representation is insensitive to three-dimensional information, making it unable to consider subsurface and bulk structure sampling. Other geometric-based methods^38,39 also have been developed for determining surface adsorption sites but still face difficulties when dealing with non-uniform materials or embedding sites in subsurface.

Topology, independent of metrics or coordinates, presents a novel approach that could potentially offer a comprehensive traversal of structural complexity. Persistent homology, an emerging technique in the field of topological data analysis, bridges the topology and real geometry by capturing geometric structures over various spatial scales through filtration and persistence⁴⁰. Through embedding geometric information into topological invariants, which are the properties of topological spaces that remain unchanged under specific continuous deformations, it allows the monitoring of the “birth,” “death,” and “persistence” of isolated components, loops, and cavities across all geometric scales using topological measurements. Topological persistence is usually represented by persistent barcodes, where different horizontal line segments or bars denote homology generators⁴¹. Persistent homology has been successfully employed to the feature representation for machine learning^42,43, molecular science^44,45, materials science^{46,47,48,49,50,51,52,53,54,55}, and computational biology^56,57. The successful application motivates us to explore its potential as a sampling algorithm due to its capability of characterizing material structures multidimensionally.

In this work, we introduce a topology-based automatic active phase exploration framework, enabling the thorough configuration sampling and efficient computation via MLFF. The core of this framework is a sampling algorithm (PH-SA) in which the persistent homology analysis is leveraged to detect the possible adsorption/embedding sites in space via a bottom-up approach. The PH-SA enables the exploration of interactions between surface, subsurface and even bulk phases with active species, without being limited by morphology and thus can be applied to periodical and amorphous structures. MLFF are then trained through transfer learning to enable rapid structural optimization of sampled configurations. Based on the energetic information, Pourbaix diagram is constructed to describe the response of active phase to external environmental conditions. We validated the effectiveness of the framework with two examples: the formation of Pd hydrides with slab models and the oxidation of Pt clusters in electrochemical conditions. The structure evolution process of these two systems was elucidated by screening 50,000 and 100,000 possible configurations, respectively. The predicted phase diagrams with varying external potentials and their intricate roles in shaping the mechanisms of CO₂ electroreduction and oxygen reduction reaction were discussed, demonstrating close alignment with experimental observations. Our algorithm can be easily applied to other heterogeneous catalytic structures of interest and pave the way for the realization of automatic active phase analysis under realistic conditions.

Results

The overview of PH-SA and the overall framework

The Fig. 1 is the overview of the topology-based automatic active phase exploration framework. In order to determine the active phase of the catalysts, the first step is to identify the sites in the material that can accommodate active species, which can range from the surface to the bulk. We leverage the persistent homology, an advanced algorithm in topology data analysis, to achieve this goal, namely PH-SA algorithm. The technical details of PH-SA are presented in the method section. Here, we briefly introduce its logic (Fig. 1a). Initially, we decompose a material structure into combinations of atom aggregates, consisting of two, three, and more atoms, with the underlying physical implication that active species typically interact with only a few surrounding atoms. The geometric characteristics of the atom aggregate (points cloud) can be capture over various spatial scales through a manipulation, namely filtration. Specifically, each point synchronously expands in the space with the filtration parameters being atomic diameter. The filtration process will generate the persistent barcodes with the evolution of different Betti numbers. For a given structure aggregate, the number of independent components, loops and cavities are topological invariants and they are referred to as β₀, β₁, and β₂, respectively. The “birth,” “death,” and “persistence” of β₀, β₁, and β₂ are recorded in the persistent barcodes. The “death” of the Betti numbers corresponds to a contact mode of spheres centered on points, and this contact point may represent a possible local adsorption or embedding site where active species may interact. This is because its approximately equidistant positioning relative to surrounding atoms effectively minimizes repulsive interactions, facilitating the formation of energetically favorable bonding configurations. The geometric coordinates of the contact point can be readily determined through fitting constrained by the topological information of the death points. By iterating over atom aggregates of varying sizes within the material structure, one can automatically identify plausible sites capable of accommodating active species.

The Fig. 2 illustrates the interaction sites determination of an octahedron, which can represent a typical local unit in the FCC crystal structure. The octahedron is firstly decomposed to the possible atom aggregates with two, three until six atoms. For each aggregate, persistent homology analysis is performed. Taking the six atoms one as an example (Fig. 2a), the behaviors of β₀, β₁ and β₂ are separately plotted with the horizontal axis being the filtration parameter ϵ (atomic diameter) from top to bottom. In the early stage, there are six bars in the β₀ subfigure, indicating the six isolated atoms. When the filtration parameter ϵ reaches ~2.5 Å, each ball, growing from the initially isolated atoms, overlaps with its closest four balls. Consequently, six bars of β₀ turn into one bar in β₂ subfigure, representing the emergence of an independent noncontractible quadrilateral structure (cavity). The contact point of two ball is actually the bridge site between two neighboring atoms and can be readily obtain from the record of barcodes. When the filtration parameter reaches ~3.75 Å, the β₂ bar disappears, leaving only a β₀ bar persistent due to six atoms fully contact and fuse into one large sphere. From this changing moment, the 6-fold site in the center can then be determined. In real material structures, sites are prone to deformation due to dynamic effects. In Fig. 2b, we use a deformed octahedron to illustrate the versatility of the PH-SA method. During the evolution of the barcode, the β₀ disappears successively, which is significantly different from the simultaneous disappearance of the undeformed octahedron. The bridge sites can still be derived from the position where each β₀ disappears, and the disappearance of the β₂ also corresponds to the possible site in the center. Unlike the regular octahedron, the deformed octahedron exhibits β₁ barcodes as the points expand, indicating the presence of loop microstructures at certain stages during the filtration process. Owing to its abstract nature, the filtration method can be applied to atom aggerates with any morphology. The analysis results of the square planar and its distorted derivative, which are simpler with only β₀ and β₁ barcodes but represent a typical local structure on the FCC (100) surface, are shown in Figure S1. Notably, while each Betti number’s death point can correspond to a potential active site, in practice, we focus solely on the last death point (excluding the persistent β₀) for each aggregate to avoid redundant computations. This shift stems from our methodology, which initiates with two atoms and progressively applies persistent homology to larger aggregates. Consequently, the structural information from earlier death points of larger aggregates is generally already captured in smaller aggregates with fewer atoms. Using the octahedral structure as an example (Fig. 2a), the bridge sites corresponding to the β₀ death points were already identified during the two-atom combination analysis. Instead of performing persistent homology directly on the entire structure, we adopt this bottom-up approach to avoid interactions within the whole system that might obscure topological features in the barcodes and the loss of corresponding interaction sites (Figure S2). By doing so, this strategy can achieve a balance between computational efficiency and identifying as many potential active sites as possible.

**Fig. 2: The illustration of persistent homology-based sampling algorithm (PH-SA).**

After the determination of all the adsorption/embedding sites of the material structure with persistent homology, we can then sample the possible configurations of the active phase (Fig. 1b). For the active species embedded in the bulk lattice, their initial positions can be directly placed on the identified interaction sites. For the interaction sites on the surface, the active species will adopt an adsorbed state with their positions determined by the hard sphere model. The close contacts of atoms in this enumeration process corresponding to very high repulsive energies will be rejected based on hard-sphere cut-offs based on their summation of covalent radius (Fig. 2). Graph isomorphism algorithm is applied to eliminate structures with the same graph connectivity to avoid repeated enumeration³⁴. Following the construction of structure ensembles, the MLFF is established to accelerate the structure optimization by transfer learning based on the DPA pre-trained model⁵⁸(Fig. 1c). Lastly, the energy variation trends and phase diagrams connected to external environmental conditions can be readily obtained to analyze the physicochemical properties of the system and facilitate comparisons with experimental results (Fig. 1d). The overall procedure is also summarized by flowchart in Figure S3.

Simulating complex heterogeneous catalytic systems with the framework

To demonstrate the general applicability of the PH-SA method for diverse material morphologies, we selected two of the most commonly used catalyst structural models, slab and cluster, for illustration⁵⁹. The periodic slab model is widely employed in theoretical studies of heterogeneous catalysis to examine the properties of specific exposed surfaces of catalysts^60,61. Its distinguishing feature lies in the fixed lattice parameters, which prevent deformation of the slab in the xy-plane during interactions with environmental species. The cluster model is another commonly used atomic structure, characterized by its amorphous nature^62,63. The interaction of the cluster with environmental species can induce structural deformation, which in turn increases the complexity of subsequent interactions with the surrounding species. Below, we will demonstrate the PH-SA and the overall framework in two specific systems, followed by the detailed discussion of the results.

The PdH_x system with slab model

The formation of palladium hydride is taken as the first example to demonstrate the PH-SA and the overall framework. The research on metal hydrides has attracted widespread attentions due to their potential applications in catalysis, energy storage, superconductivity and so on^64,65. In an electrochemical environment, Pd exhibits a remarkable tendency to intercalate hydrogen, leading to the formation of active PdH_x phase that extends seamlessly from the surface to several atomic layers beneath(Fig. 3a), which can significantly affect the performance of electrocatalytic reactions^{10,66,67,68,69}.

**Fig. 3: The PdH_x system with different H concentrations (x = 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, and 1).**

The 4 × 4 Pd (100) surface has been chosen as the model system to investigate the hydrogen phase diagram of Pd (Fig. 3a). Initially, the persistent homology is applied to Pd (100). By using the PH-SA, we identified a total of 1064 possible hydrogen adsorption/intercalation sites. The large number of sites indicates that the configurational space for hydrogen adsorption/insertion is vast. Subsequently, possible adsorption configurations were enumerated across ten gradients with H:Pd ratios of 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, and 1, with 5000 configurations for each gradient. We then extracted 50 structures each of the H:Pd ratio from 0.1 to 1, and subjected them to structure relaxation by DFT calculations. These structural data were integrated with the pre-trained DPA model⁵⁸ to construct a machine learning force field for PdH_x systems. The fine-tuned machine learning force field achieved an energy accuracy of 0.001 eV per atom and a force accuracy of 0.045 eV Å⁻¹ (Fig. 3b, c, more details in Methods).

We leveraged the constructed machine learning force field to optimize the structures of all 50,000 configurations and calculated the average hydrogen adsorption energy. Interestingly, the lowest adsorption energy of each concentration gradient does not vary monotonically, exhibiting a trend of first rising (H:Pd = 0.1 − 0.4), then falling(H:Pd = 0.5), and finally rising again(H:Pd = 0.6 − 1) (Figure S4). We further calculated the mixing energy of PdH_x using energy of pure Pd (100) and the most stable configuration of PdH₁ as references. It can be observed that at PdH_0.5, the structure indeed possesses the minimum mixing energy (Fig. 4a). At low hydrogen concentrations, the atoms preferentially occupy the surface sites. As hydrogens begin to infiltrate the bulk, they primarily settle between the second and third Pd layers rather than the sublayer, which can alleviate the atomic repulsion between internal and interfacial hydrogens and foster a smoother integration process of adsorption energy (Fig. 4b). Interestingly, with the increase of hydrogen concentration, the surface of Pd (100) gradually changes from the 4-fold hollow sites arrangement to the mixed arrangement of 3-fold hollow sites, similar to the atomic arrangement of the (111) crystal plane, and 4-fold hollow sites with the coordination number of surface Pd increasing at around H: Pd = 0.5 (Fig. 4c and Figure S5). The slight enhancement in adsorption strength at this juncture may stem from the inherent stability of the (111) crystal plane over the (100) crystal plane, offsetting the instability arising from interatomic repulsion. Nevertheless, as the H concentration further surges, the repulsive forces among hydrogen atoms inevitably undermine the adsorption intensity. When the H: Pd ratio reaches 1, the surface is completely converted into 3-fold hollow sites (Fig. 4c and Figure S5). This phenomenon is widely reported as the “hex” reconstruction in the FCC (100) crystal plane^70,71,72,73, and hydrogen can be a driving force in the electrochemical environment according to our results.

**Fig. 4: Results analysis of the PdH_x system with different H concentrations (x = 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, and 1).**

The formation of PdH_x active phases is believed to significantly promote the electroreduction of CO₂ to CO^66,67,68,69. We selected the configurations with the lowest adsorption energy of hydrogen from each gradient to perform free energy calculations, aiming to investigate the phase diagram of the PdH_x system under external pH and potential conditions. At pH = 7.3, Pd tends to accommodate 60% hydrogen at -0.1 V vs RHE, which is consistent with experimental findings in CO₂ electroreduction condition(Fig. 4d and Figure S6)⁶⁹. We have calculated the effects of different hydrogen concentrations on the adsorption energies of *COOH and *CO, observing an overall decrease in the adsorption energies of both *COOH and *CO as the H concentration rises (Fig. 4e). This trend can be attributed to hydrogen drawing away some electrons, thereby diminishing Pd’s capability to donate electrons to carbon-containing species and to bind them strongly. Since Pd is a catalyst that is susceptible to CO poisoning, weakening the adsorption of CO is expected to effectively enhance the reduction performance of CO₂, as evidenced by the free energy diagram calculations (Fig. 4f), exemplified by the case of PdH_0.6.

The PtO_x system with cluster model

Platinum nanoparticles (Pt NPs) hold paramount significance across a broad spectrum of electrocatalytic processes, encompassing the oxidation of small molecules⁷⁴, plasma-assisted catalysis⁷⁵, the oxygen evolution reaction⁷⁶, and the oxygen reduction reaction^77,78. In electrochemical environment, the Pt NPs are performed under oxygen-rich conditions, leading to the platinum surface and interior can be oxidized, which can significantly impact catalytic activity under ambient conditions⁷⁹. The evolution mechanism of Pt NPs oxidation process entails sufficient sampling and analysis of Pt-O active phases.

The icosahedral Pt₅₅ cluster with diameter at ~1 nm has been chosen as the model system to investigate the oxidization of Pt NPs (Fig. 5a). At begins, the persistent homology is applied to Pt₅₅. From the evolution of barcodes, we identified a total of 392 possible oxygen adsorption/intercalation sites. Given the amorphous nature of Pt NPs, we enumerated 10,000 possible PtO_x (x = 0.1 − 1 with the increasement of 0.1) configurations per gradient level, evaluated their energies via MLFF calculations, and selected the lowest 5% as seeds for subsequent gradient iterations. Persistent homology was used to pinpoint updated interaction sites in this process. This operation effectively accounts for the deformation of Pt NPs induced by O adsorption/intercalation, thereby providing a more realistic representation of the gradual oxidation process of Pt NPs under electrochemical conditions. It can be observed that as oxygens are inserted into the Pt cluster, the resulting PtO_x obviously generates more potential interaction sites (Fig. 5b) to accommodate oxygen atoms due to the significant change and expansion in the shape of the cluster, as we will discuss below. The utilization of persistent homology to determine the updated interaction sites further enhances the accuracy and robustness of the simulations, as it allows for a dynamic tracking of structural changes and interactions during the oxidation process. Ab-initio molecular dynamics were performed to generate data for fine-tuning the DPA pre-trained model (details in method sections). The fine-tuned MLFF achieves an energy accuracy of 0.008 eV per atom and a force accuracy of 0.21 eV Å⁻¹ (Fig. 5c, d, more details in Methods).

**Fig. 5: The PtO_x system with different O concentrations (x = 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, and 1).**

The overall change of oxygen adsorption energy shows a continuous increase with the increase of oxygen concentration, with a relatively drastic change at the beginning and then a relatively gentle change (Fig. 6a). Initially, oxygens tend to adsorb on the surface. As the increase of oxygen content, oxygen begins to embed into the interior of Pt particles with O: Pd=0.4. These internal oxygen atoms directly emerge near the mass centroid rather than in the vicinity of the surface region (Fig. 6b). This phenomenon is analogous to the PdH_x system and is readily comprehensible from a thermodynamic perspective. The mass centroid is farthest from the surface, thus can effectively avoid the energy penalty associated with the increasing O-O interatomic repulsion. Concurrently, the particles undergo a remarkable volumetric expansion, embracing the emergence of Pt-O components within their bulk (Figure S7). This expansion may also explain why the surge in adsorption energy plateaus at higher oxygen levels. With the escalation of oxygen concentration, the augmentation of Pt-O coordination number is obvious. Pt₅₅ attains Pt sites with 4-6 oxygen coordination, akin to the structural motifs found in Pt₃O₄ or PtO₂, at 60% oxygen content (Fig. 6c). Since higher oxidation states of Pt generally correspond to poorer oxygen reduction reaction performance^80,81,82, it can be predicted that the emergence of such basic units will seriously damage the catalytic activity of clusters. We further constructed the Pourbaix diagram to study the phase transition process under actual electrochemical conditions by correcting the zero-point energy and entropy of the lowest energy configuration at each gradient (Fig. 6d). The findings reveal that, under typical acidic oxygen reduction reaction conditions (pH = 1), Pt₅₅ attains full oxygen incorporation at approximately 0.6 V, implying diminished catalytic activity, given that oxygen reduction reaction typically thrives at elevated potentials, such as 0.9 V. This does align with experimental results, which suggested that ultra-small Pt clusters (< 2 nm) impede the oxygen reduction reaction⁸³.It is noted that we focused only on thermodynamic stability and kinetic processes can be considered for a complete picture of oxygen embedding.

**Fig. 6: Results analysis of the PtO_x system with different O concentrations (x = 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, and 1).**

Discussion

The core concept underlying our persistent homology-driven sampling approach rests on the premise that the interplay between active species and material structure is intimately tied to a select few neighboring atoms. As such, we embark on a bottom-up strategy to perform persistent homology analysis on microscopic structural units. From a mathematical perspective, the “barcode death” point represents a critical spatial ___location associated with the scale at which a topological feature, such as a ring or void, disappears during the thickening process. This point reflects a balance in distances to surrounding points, capturing the intrinsic geometry of the structure. This methodical process offers a robust and intuitive approach that not only elucidates the spatial arrangements of these motifs but also elegantly determines the plausible interaction sites. Owing to the abstract mathematical nature of the persistent homology method, it can analyze materials of any morphology. Upon the generation of potential interaction sites, one can easily generate a myriad of potential adsorption/intercalation configurations, guided solely by the basic constraint of hard-sphere radii. This strategy is advantageous for the traversal of local minima and the exploration of the global potential energy landscape, while circumventing the daunting task of navigating the vast, Cartesian-based grand canonical sampling space. Furthermore, the entire process can be highly automated, requiring minimal human intervention and manipulation.

We compared the PH-SA with the classical basin hopping algorithm^2,29,30,84 for the PdH_x and PtO_x systems. The generally lower average adsorption energy achieved by PH-SA highlights its superior ability to identify more stable structures (Figures S8, S9). Fundamentally, the PH-SA and the basin hopping method represent two distinct technical approaches. The PH-SA excels in rapidly identifying and enumerating potential interaction sites across materials with diverse morphologies, ensuring that the identified sites are geometrically plausible. This approach allows for an initial exploration of configuration distributions, followed by the selective refinement of configurations of interest through structural optimization. In contrast, the basin hopping method integrates configuration generation with structural optimization in a sequential manner. New configurations are generated by applying perturbations to the current optimized configurations, requiring each step to undergo optimization before proceeding. Without the optimization step, the reliance on random perturbations would easily generate invalid, high-energy configurations, making the process less efficient and less directed. The difference in logic also endows the PH-SA method with greater potential for computational efficiency, as verified in the Pd-H and Pt-O systems (Figure S10). In addition, when considering only the surface adsorption sites, the number of sites identified by the PH-SA method is consistent with that determined by previous graph-theory-based methods³⁶ in both systems (Figure S11), thereby demonstrating the robustness of the PH-SA method.

In complex systems, persistent homology can swiftly identify key structural features, while basin hopping may necessitate repeated global structural transformations and numerous iterations to explore the underlying state space. In this regard, the persistent homology approach minimizes heuristic bias since it remains largely data-driven, reducing dependency on initial configurations and optimization pathways. Furthermore, basin hopping relies heavily on stochastic processes with less explicit mathematical foundations. Whereas persistent homology is firmly rooted in well-defined mathematical principles, offering systematic and interpretable insights into structural arrangements. It is capable of handling complex structures of arbitrary dimensions and can adapt to amorphous, distorted, and deformed site distributions.

The primary objective of this work is to demonstrate the feasibility of the persistent homology algorithm for structural sampling of materials with arbitrary morphologies. The slab model was utilized to illustrate the scenario where the lattice is fixed, primarily suited for theoretical simulations in the field of heterocatalysis. Furthermore, the cluster model was employed to demonstrate cases where material structures undergo expansion and deformation. The hydrogen adsorption under reduction environment and oxygen adsorption under oxidation environment were selected as model process. The two catalyst structures and reaction process we demonstrated are both representative, providing strong evidence for the general applicability of the method. The number of atoms and involved elements in the adopted models are kept relatively small and simple. Exploring large-scale systems and multi-species interaction systems, such as the CO adsorption phase diagram on CuO_x in CO₂ electroreduction^85,86,87 and the lithium storage process of SiO_x in the electrochemical energy storages⁸⁸, two important and ongoing controversial issues in their respective fields, are also entirely feasible by leveraging the power of machine-learned force fields (MLFFs). Grand canonical Monte Carlo (GCMC) can be readily integrated into the overall framework to further optimize the construction of active phase diagrams³⁸. In fact, in the PtO_x system, our approach of using the top 5% lowest-energy configurations from each gradient to generate configurations for the next gradient like the typical operations in GCMC for enumerating configurations, albeit without the employment of the Metropolis criterion. Lastly, we employed transfer learning to construct the machine learning force field, given that we mainly focus on the trend analysis of energy distribution. For more complex multi-element systems and kinetic analysis, active learning can be integrated with our sampling algorithm to construct high-accuracy force fields on-the-fly.

In summary, we present a framework for automatic active phase exploration grounded in topological data analysis, empowering comprehensive configuration sampling and facilitating efficient computations harnessing the power of MLFF in periodical and amorphous structures. At the heart of this framework lies a sampling algorithm, PH-SA, which treats the material structure as points cloud and employs persistent homology analysis to discern potential adsorption/embedding sites through a methodical bottom-up approach. MLFF are then trained through transfer learning to enable rapid structural optimization of sampled configurations. Based on the energetic information, Pourbaix diagram is constructed to delineate the response of the active phase to varying external environmental conditions. The formation of Pd hydrides with slab models and the oxidation of Pt clusters under electrochemical conditions are elucidated and show excellent agreements with experimental observations in both material reconstruction and catalytic reaction activity tuning. Our algorithm can be easily generalized to diverse structures of interest, thereby pioneering the path towards the realization of automated active phase analysis under practical, real-world conditions.

Methods

Persistent homology and interaction sites determination

Persistent homology is used to identify potential active sites in atomic structures. We primarily utilize the filtration process to obtain the barcodes of atomic aggregates constituting the material structure, from which filtering parameters are extracted to locate the spatial coordinates of potential active sites. The mathematical foundation of persistent homology is provided in the supplementary information. The implementation of the persistent homology relies on the open-source package GUDHI⁸⁹. Here, we focus on describing the practical operational process.

Step one: enumerating atomic aggregates within the material structure. We start by selecting a central atom and constructing atomic aggregates consisting of two to ten atoms within a 5 Å radius via combinatorial enumeration. Each atom in the structure is used as the central atom in turn. The atomic indices and three-dimensional coordinates of each aggregate are recorded in this process, and duplicates (aggregates with the same atomic indices) are removed to avoid redundancy. The effect of threshold variation on site determination is tested in S12.

Step two: obtaining filtering parameters of the atomic aggregates. For each atomic aggregate, persistent homology is applied to generate persistent barcodes, containing β₀, β₁, and β₂ with their respective birth, death, and persistence times. These Betti numbers correspond to topological invariants of independent components, loops, and cavities. Intuitively, each death point in the barcode represents a potential active site in space, as shown in Fig. 2. Here, we focus on the last death point (excluding the persistent β₀) for each aggregate, as it typically marks the disappearance of the largest void or cavity formed by most of the atoms in the aggregate. In material structures, foreign atoms (e.g., hydrogen or oxygen) typically occupy these interstitial sites. Since our calculation begins with pairs of atoms, the structural information from earlier death points of larger aggregates is generally already captured in smaller aggregates with fewer atoms. For instance, the bridge sites on each edge of an octahedron are determined during the persistent homology analysis of two atoms (Fig. 2). Although this conclusion has not been rigorously proven, this approach can strike a balance between computational efficiency and the identification of as many potential active sites as possible. When performing persistent homology, it is also necessary to specify the type of complex used. For atomic aggregates consisting of four or fewer atoms, we employ the Alpha complex, while for aggregates with more than four atoms, we utilize the Vietoris-Rips complex. The advantages of this combined strategy are discussed in Figs. S13, S14.

Step three: determining the three-dimensional coordinates of potential active sites. We can obtain the filtering parameters of the last death point from persistent homology, but the corresponding spatial coordinates of the interaction sites cannot be directly acquired, even though their positions are intuitive, as illustrated in Fig. 2. This requires a simple fitting step as presented in Eqs. (1) and (2):

$$\mathop{\min }\limits_{\text{x}}\mathop{\sum }\limits_{\text{i}=1}^{\text{n}}{\text{w}}_{\text{i}}{({||}{\text{p}}_{\text{i}}-\text{x}{||}-\text{d})}^{2}$$

(1)

$${\text{w}}_{\text{i}}=\frac{{\text{r}}_{\text{i}}}{{\sum }_{\text{j}=1}^{\text{n}}{\text{r}}_{\text{j}}}$$

(2)

Here, p represents the three-dimensional coordinates of each atom within the atomic aggregate, recorded during enumeration. The w is the radius weight of different elements composing the atomic aggregate, r is the radius of each atom, i and j are the atom indices, x denotes the coordinates of interaction site to be determined, and d is the filtration parameter derived from the last death point in persistent homology. The underlying rationale of this formulation is as follows: the death point in persistent homology reflects the critical scale at which a ring and/or cavity is filled during the thickening process. This corresponding filtration parameter d can be interpreted as the maximum distance that the “expanding” spheres (centered at the points in the cloud) must travel from their origins to reach and fill this cavity. The d serves as a guiding value in our fitting formula, which aims to determine a geometric point x such that its distance to all surrounding atoms closely approximates d. By minimizing the cumulative squared error in the objective function, we can determine the spatial coordinates of the interaction site. In this process, we also considered that when the material structure contains different elements, the spatial coordinates of the potential sites should be adjusted based on the radii ratio of the different elements, in order to enhance their physical plausibility. By leveraging this interpretation, the fitting process geometrically approximates the topological information provided by the death point, establishing a connection between the topological scale and spatial geometry. While it is possible to directly optimize the function for each atomic aggregate to identify coordinate without using persistent homology, this approach would transform the optimization into a two-parameter problem. As a result, the computational time would increase significantly, as demonstrated in Figure S15. Therefore, using persistent homology to obtain the filtration parameters is crucial for improving computational efficiency and ensuring the identification of potential active sites in a more systematic and scalable manner.

The DFT calculations

The DFT calculations were performed using VASP (Vienna Ab-initio Simulation Package) with the projector-augmented-wave (PAW) scheme⁹⁰. The Perdew-Burke-Ernzerhof (PBE) functional was utilized to describe electron exchange-correlation interactions. Empirical Grimme’s D3 scheme was employed to account for van-der-Waals interactions⁹¹. For the PdH_x system, the structure optimization was performed for the construction of MLFF and the free energy diagram calculations. The cut-off energy set to 400 eV, and the convergence criteria for electronic structure 0.05 eV/Å with the 2 × 2 × 1 Monkhorst-Pack k-point grid. For the PtO_x system, the ab initio molecule simulation was performed to generate the data for the construction of MLFF due to its amorphous nature. The AIMD was carried out using a canonical system synthesis (NVT) and a Nosé-Hoover thermostat at a temperature of 300 K with Gamma-point. The free energy diagrams of CO₂ electroreduction process on Pd and Pd_0.6 were calculated by the well-known computational hydrogen electrode method⁹² with Eq. (3):

$$\text{G}={\text{E}}_{\text{DFT}}+{\text{E}}_{\text{ZPE}}-\text{TS}$$

(3)

where ${E}_{\text{DFT}}$ is the ground state energy calculated by DFT. The vibrational frequencies of adsorbed species were calculated to consider entropies (TS) and zero-point energies (${E}_{\text{ZPE}}$) in the Gibbs free energy within the harmonic approximations using the finite difference method.

The graph isomorphism to remove duplicate configurations

In the process of configuration enumeration, there may be enumerated configurations with the same chemical environment. Taking Pd as an example, there are many bridge sites on the surface of Pd, and the configurations generated by adsorbing an H on these bridge sites have similar chemical environments, and the energy differences will be very small. In order to avoid enumerating these configurations with repeated chemical environments, we use graph isomorphism to remove duplicates from the enumerated configurations. Specifically, the generated configuration is converted into an atomic graph by atomic connectivity based on atomic radii, and each graph is transformed three times by the Weisfeiler-Lehman (WL) transformation using the WL algorithm to obtain a WL graph⁹³. Finally, the isomorphism algorithm of networkx is used to determine whether the WL graph is isomorphic or not, so as to obtain the unique configuration under the current gradient.

The construction of MLFF based on DPA

DPA, a deep potential model with a gated attention mechanism, is a pre-trained machine learning force field based on the OC20 database, covering 56 elements and millions of atomic configurations⁵⁸. Fine-tuning DPA for specific tasks enables high-accuracy force fields more efficiently than training MLFF from scratch. The fine-tuning process for MLFFs leverages pre-trained weights and adapts them to target systems using high-quality DFT data. After featurizing the DFT data into a compatible format, a loss function based on discrepancies in predicted energies and forces is minimized through backpropagation. We employed global fine-tuning, updating all model parameters without freezing any components. The hyperparameter settings used in the fine-tuning process are detailed below. For both PdH_x and PtO_x systems, the same hyperparameters were used (Table S1), except for the number of training steps. Next, we present the DFT datasets and the results of force field fine-tuning for the PdH_x and PtO_x systems, respectively.

The construction of MLFF for Pd-H system. A Pd unit cell was cut in the (100) plane to form a 4$\times$4 supercell with four layers and a vacuum spacing of 13 Å was added to the end of the slab. A total of 1064 adsorption sites including the upper and lower surfaces as well as the inner part of the slab were generated by the persistent homology approach. For each of the 10 gradients from H: Pd=0.1 to 1, 5000 unique configurations were generated. We randomly selected fifty structures for each H: Pd concentration gradient from all structures to perform DFT structural optimization, resulting in 9,128 frames of structure-energy data. The Pd-H MLFF were constructed by fine tuning the DPA model with above DFT calculations and the pre-training weights using DPA-1-OC2M. The training, validation, and test sets were divided in the ratio of 85%:5%:10%. The fine-tuned process carried out in 100,000 steps. Using DPA-1-OC2M weights on the test set before fine-tuning, a root mean square error (RMSE) of 0.33 eV per atom for energy and 0.14 eV Å⁻¹ for force was obtained. After fine-tuning, the RMSE of energy comes to 0.001 eV per atom, and the RMSE of force is 0.045 eV Å⁻¹. The stability of the model is demonstrated by a 5-fold cross-validation.

The construction of MLFF for Pt-O system. Icosahedral Pt₁₃ and Pt₅₅ NPs were used to fine tune the DPA pre-trained model to construct the Pt-O MLFF. We randomly selected five configurations for each of the systems with O:Pt ratios of 0.1, 0.3, 0.5, 0.7, and 0.9, and conducted AIMD simulations with a 1 fs time step. In general, even relatively short AIMD simulation durations can provide an initial training dataset for MLFF, as we primarily focus on the structural-energy/force pairs at DFT accuracy obtained from each ionic step, which serve as input data for training the MLFF model^25,94,95,96. For the five Pt₁₃ structures, each structure was run for 2000 frames, and for the Pt₅₅ structures, each was run for 10000 frames, resulting in a total of 60,000 frames of structures. The Pt-O dataset was then used to fine-tune the DPA model. The dataset was divided into 85%:5%:10% for training, validation, and test. A 5-fold cross-validation was performed to confirm the robustness of the model. Before fine-tuning, these pre-trained weights were used to go for data testing, yielding an RMSE of 0.47 eV per atom for energy and 0.87 eV Å⁻¹ for force. After 200000 steps of fine tuning, the RMSE of energy is obtained as 0.008 eV per atom as well as the RMSE of force 0.21 eV Å⁻¹. The accuracy is similar with previous work²⁴. In this study, the MLFF is trained to rapidly optimize sampled configurations for trend analysis of energy distribution. For a more detailed investigation of the system’s dynamic processes, additional sampling can be incorporated into the training set. By combining this with active learning, the MLFF can be retrained from scratch to achieve a more generalized and accurate force field.

Energy hull, Phase diagram and free energy diagram

High-throughput calculations based on the MLFF are carried out to obtain the average adsorption energy with different concentration of active species according to equation as Eq. (4):

$${E}_{\text{ad}}({\text{Pd}}_{64}{\text{H}}_{x})=\frac{E({\text{Pd}}_{64}{\text{H}}_{x})}{x}-\frac{E({\text{Pd}}_{64})}{x}-\frac{1}{2}E({\text{H}}_{2})$$

(4)

Where the $E({\text{Pd}}_{64}{\text{H}}_{x})$, $E({\text{Pd}}_{64})$ and$\,E({\text{H}}_{2})$ are electronic energy of ${\text{Pd}}_{64}{\text{H}}_{x}$, pure Pd and H₂, respectively. The energy hull of PtO_x system is calculated in similar way with the energy of oxygen reference to H₂O and H₂⁹². For PdH_x system, we have also calculated the mixing energy to better exhibit the evolution trend as Eq. (5):

$${E}_{\text{Mixing}}=E({\text{Pd}}_{64}{\text{H}}_{x})-{xE}({\text{Pd}}_{64}{\text{H}}_{64})-(1-x)E({\text{Pd}}_{64})$$

(5)

Where the $E({\text{Pd}}_{64}{\text{H}}_{x})$, $E\,({\text{Pd}}_{64}{\text{H}}_{64})$ and $E({\text{Pd}}_{64})$ are electronic energy of ${\text{Pd}}_{64}{\text{H}}_{x}$, ${\text{Pd}}_{64}{\text{H}}_{64}$ with lowest adsorption energy and pure Pd, respectively.

After the obtaining of energy hull of the $\text{Pd}{\text{H}}_{x}$ system and the $\text{Pt}{\text{O}}_{x}$ system, we constructed the Pourbaix diagram for analysis the phase diagram under different environment conditions, specifically, the pH and potential. The computational hydrogen electrode scheme⁹² was used to build the Pourbaix diagram. The lowest adsorption energy case was used for the Gibbs free energy calculation with entropy and zero-point energy correction.

For $\text{Pd}{\text{H}}_{x}$ system, the free energy difference (ΔG) is calculated according to Eqs. (6) and (7) as follows:

$${\text{Pd}}_{64}{\text{H}}_{64}\leftrightarrow {\text{Pd}}_{64}{\text{H}}_{x}+(64-x)({\text{H}}^{+}+{\text{e}}^{-})$$

(6)

$$\Delta G={G}_{{\text{Pd}}_{64}{\text{H}}_{x}}-{G}_{{\text{Pd}}_{64}{\text{H}}_{64}}+\frac{(64-x)}{2}{G}_{{\text{H}}_{2}}-(64-x)(\text{e}U\text{+}0.059\text{pH})$$

(7)

The ${G}_{{\text{Pd}}_{64}{\text{H}}_{x}}$ and ${G}_{{\text{Pd}}_{64}{\text{H}}_{64}}$ are the free energy of ${\text{Pd}}_{64}{\text{H}}_{x}$ and ${\text{Pd}}_{64}{\text{H}}_{64}$, respectively. The pH was set to 7.3. U is the applied potential.

For the PtO_x system, the free energy difference (ΔG) is calculated according to Eqs. (8) and (9) as follows:

$$\text{Pt}+x{\text{H}}_{2}\text{O}-2x({\text{H}}^{+}+{\text{e}}^{-})\leftrightarrow \text{Pt}{\text{O}}_{x}$$

(8)

$$\Delta G\,={G}_{{\text{PtO}}_{x}}-{G}_{\text{Pt}}-2x(\text{e}U\text{+}0.059\text{pH})-x({G}_{{\text{H}}_{2}\text{O}}-{G}_{{\text{H}}_{2}})$$

(9)

The ${G}_{{\text{PtO}}_{x}}$ and the ${G}_{\text{Pt}}$ are the free energy of $\text{Pt}$O_x and $\text{Pt}$, respectively. pH and U are the pH value and applied potential, respectively.

Data availability

The sampled configurations from PH-SA, the DFT data for model training and mechanism analysis are available on Zenodo at https://doi.org/10.5281/zenodo.14779452⁹⁷. Source data are provided with this paper.

Code availability

The code used to generate results in the manuscript is available from: https://github.com/JFLigroup/PH-SA under MIT license⁹⁸.

References

Zhang, Z., Wei, Z., Sautet, P. & Alexandrova, A. N. Hydrogen-induced restructuring of a Cu(100) electrode in electroreduction conditions. J. Am. Chem. Soc. 144, 19284–19293 (2022).
Article CAS PubMed Google Scholar
Sumaria, V., Nguyen, L., Tao, F. F. & Sautet, P. Atomic-scale mechanism of platinum catalyst restructuring under a pressure of reactant gas. J. Am. Chem. Soc. 145, 392–401 (2023).
Article CAS PubMed Google Scholar
Wei, P. et al. Coverage-driven selectivity switch from ethylene to acetate in high-rate CO₂/CO electrolysis. Nat. Nanotechnol. 18, 299–306 (2023).
Article ADS CAS PubMed Google Scholar
Li, J. et al. Constraining co coverage on copper promotes high-efficiency ethylene electroproduction. Nat. Catal. 2, 1124–1131 (2019).
Article CAS Google Scholar
Jung, O., Jackson, M. N., Bisbey, R. P., Kogan, N. E. & Surendranath, Y. Innocent buffers reveal the intrinsic pH- and coverage-dependent kinetics of the hydrogen evolution reaction on noble metals. Joule 6, 476–493 (2022).
Article CAS Google Scholar
Ding, H. et al. Role of surface hydrogen coverage in C–C coupling process for CO₂ electroreduction on ni-based catalysts. ACS Catal. 14, 14330–14338 (2024).
Article CAS Google Scholar
Zheng, S., Ding, H., Yang, X., Li, S. & Pan, F. Automating discovery of electrochemical urea synthesis reaction paths via active learning and graph theory. CCS Chem. 12, 1–13 (2024).
Google Scholar
Zhou, Y. et al. Dopant-induced electron localization drives CO₂ reduction to C₂ hydrocarbons. Nat. Chem. 10, 974–980 (2018).
Article CAS PubMed Google Scholar
Zheng, M. et al. Electrocatalytic CO₂-to-C₂₊ with ampere-level current on heteroatom-engineered copper via tuning *CO intermediate coverage. J. Am. Chem. Soc. 144, 14936–14944 (2022).
Article CAS PubMed Google Scholar
Padavala, S. K. M. & Stoerzinger, K. A. Role of hydride formation in electrocatalysis for sustainable chemical transformations. ACS Catal. 13, 4544–4551 (2023).
Article CAS Google Scholar
Lian, Z., Dattila, F. & López, N. Stability and lifetime of diffusion-trapped oxygen in oxide-derived copper CO₂ reduction electrocatalysts. Nat. Catal. 7, 401–441 (2024).
Article CAS Google Scholar
Spori, C., Kwan, J. T. H., Bonakdarpour, A., Wilkinson, D. P. & Strasser, P. The stability challenges of oxygen evolving catalysts: Towards a common fundamental understanding and mitigation of catalyst degradation. Angew. Chem. Int. Ed. Engl. 56, 5994–6021 (2017).
Article PubMed Google Scholar
Bi, W., Sun, Q., Deng, Y. & Fuller, T. F. The effect of humidity and oxygen partial pressure on degradation of Pt/C catalyst in pem fuel cell. Electrochim. Acta. 54, 1826–1833 (2009).
Article CAS Google Scholar
Guo, Z., Wang, T., Xu, J., Cao, A. & Li, H. Surface coverage and reconstruction analyses bridge the correlation between structure and activity for electrocatalysis. Chem. Commun. 60, 14346–14359 (2024).
Article CAS Google Scholar
Bang, K. et al. Machine learning-enabled exploration of the electrochemical stability of real-scale metallic nanoparticles. Nat. Commun. 14, 3004 (2023).
Article ADS CAS PubMed PubMed Central Google Scholar
Baeumer, C. et al. Tuning electrochemically driven surface transformation in atomically flat LaNiO₃ thin films for enhanced water electrolysis. Nat. Mater. 20, 674–682 (2021).
Article ADS CAS PubMed Google Scholar
Seh, Z. W. et al. Combining theory and experiment in electrocatalysis: Insights into materials design. Science 355, eaad4998 (2017).
Article PubMed Google Scholar
Li, H., Jiao, Y., Davey, K. & Qiao, S. Z. Data-driven machine learning for understanding surface structures of heterogeneous catalysts. Angew. Chem. Int. Ed. Engl. 62, e202216383 (2023).
Article CAS PubMed Google Scholar
Zheng, S. et al. Multi-center cooperativity enables facile C–C coupling in electrochemical CO₂ reduction on a Ni₂P catalyst. ACS Catal 13, 2847–2856 (2023).
Article CAS Google Scholar
Zhang, Y.-J., Sethuraman, V., Michalsky, R. & Peterson, A. A. Competition between CO₂ reduction and H₂ evolution on transition-metal electrocatalysts. ACS Catal 4, 3742–3748 (2014).
Article CAS Google Scholar
Friederich, P., Hase, F., Proppe, J. & Aspuru-Guzik, A. Machine-learned potentials for next-generation matter simulations. Nat. Mater. 20, 750–761 (2021).
Article ADS CAS PubMed Google Scholar
Deringer, V. L., Caro, M. A. & Csanyi, G. Machine learning interatomic potentials as emerging tools for materials science. Adv. Mater. 31, e1902765 (2019).
Article PubMed Google Scholar
Liu, P. et al. Combining machine learning and many-body calculations: Coverage-dependent adsorption of CO on Rh(111). Phys. Rev. Lett. 130, 078001 (2023).
Article ADS CAS PubMed Google Scholar
Xu, J., Xie, W., Han, Y. & Hu, P. Atomistic insights into the oxidation of flat and stepped platinum surfaces using large-scale machine learning potential-based grand-canonical monte carlo. ACS Catal 12, 14812–14824 (2022).
Article CAS Google Scholar
Fu, F. et al. Unraveling the atomic‐scale mechanism of phase transformations and structural evolutions during (de)lithiation in si anodes. Adv. Funct. Mater. 33, 2303936 (2023).
Zhang, W. et al. Revealing morphology evolution of lithium dendrites by large‐scale simulation based on machine learning force field. Adv. Energy. Mater. 13, 2202892 (2022).
Behler, J. Four generations of high-dimensional neural network potentials. Chem. Rev. 121, 10037–10072 (2021).
Article ADS CAS PubMed Google Scholar
Wexler, R. B., Qiu, T. & Rappe, A. M. Automatic prediction of surface phase diagrams using ab initio grand canonical monte carlo. J. Phys. Chem. C 123, 2321–2328 (2019).
Article CAS Google Scholar
Wales, D. J. & Doye, J. P. K. Global optimization by basin-hopping and the lowest energy structures of lennard-jones clusters containing up to 110 atoms. J. Phys. Chem. A 101, 5111–5116 (1997).
Article CAS Google Scholar
Panosetti, C., Krautgasser, K., Palagin, D., Reuter, K. & Maurer, R. J. Global materials structure search with chemically motivated coordinates. Nano Lett 15, 8044–8048 (2015).
Article ADS CAS PubMed Google Scholar
Obersteiner, V., Scherbela, M., Hormann, L., Wegner, D. & Hofmann, O. T. Structure prediction for surface-induced phases of organic monolayers: Overcoming the combinatorial bottleneck. Nano Lett 17, 4453–4460 (2017).
Article ADS CAS PubMed PubMed Central Google Scholar
Bauer, M. N., Probert, M. I. J. & Panosetti, C. Systematic comparison of genetic algorithm and basin hopping approaches to the global optimization of Si(111) surface reconstructions. J. Phys. Chem. A 126, 3043–3056 (2022).
Article CAS PubMed PubMed Central Google Scholar
Schusteritsch, G. & Pickard, C. J. Predicting interface structures: From SrTiO₃ to graphene. Phys. Rev. B 90, 035424 (2014).
Article ADS CAS Google Scholar
Deshpande, S., Maxson, T. & Greeley, J. Graph theory approach to determine configurations of multidentate and high coverage adsorbates for heterogeneous catalysis. Npj Comput. Mater 6, 79 (2020).
Article ADS CAS Google Scholar
Ghanekar, P. G., Deshpande, S. & Greeley, J. Adsorbate chemical environment-based machine learning framework for heterogeneous catalysis. Nat. Commun. 13, 5788 (2022).
Article ADS CAS PubMed PubMed Central Google Scholar
Han, S., Lysgaard, S., Vegge, T. & Hansen, H. A. Rapid mapping of alloy surface phase diagrams via bayesian evolutionary multitasking. Npj Comput. Mater 9, 139 (2023).
Article ADS CAS Google Scholar
Boes, J. R., Mamun, O., Winther, K. & Bligaard, T. Graph theory approach to high-throughput surface adsorption structure generation. J. Phys. Chem. A 123, 2281–2285 (2019).
Article CAS PubMed Google Scholar
Du, X. et al. Machine-learning-accelerated simulations to enable automatic surface reconstruction. Nat. Comput. Sci. 3, 1034–1044 (2023).
Article PubMed Google Scholar
Ong, S. P. et al. Python materials genomics (pymatgen): A robust, open-source python library for materials analysis. Comput. Mater. Sci 68, 314–319 (2013).
Article CAS Google Scholar
Edelsbrunner, Letscher & Zomorodian Topological persistence and simplification. Discrete Comput. Geomet. 28, 511–533, 0179–5376 (2002).
Ghrist, R. Barcodes: The persistent topology of data. Bulletin American Mathematical Society 45, 61–75 (2008). 0273–0979.
Article MathSciNet Google Scholar
Raphael Reinauer, M. C. Nicolas Berkouk Persformer: A transformer architecture for topological machine learning. ArXiv preprint. arXiv:2112.15210 (2021).
Zia, A. et al. Topological deep learning: A review of an emerging paradigm. Artif. Intell. Rev. 57, 77 (2024).
Article Google Scholar
Townsend, J., Micucci, C. P., Hymel, J. H., Maroulas, V. & Vogiatzis, K. D. Representation of molecular structures with persistent homology for machine learning applications in chemistry. Nat. Commun. 11, 3230 (2020).
Article ADS CAS PubMed PubMed Central Google Scholar
Steinberg, L., Russo, J. & Frey, J. A new topological descriptor for water network structure. J. Cheminform 11, 48 (2019).
Article PubMed PubMed Central Google Scholar
Chen, D. et al. Path topology in molecular and materials sciences. J. Phys. Chem. Lett. 14, 954–964 (2023).
Article CAS PubMed PubMed Central Google Scholar
Chen, X. et al. Topology-based machine learning strategy for cluster structure prediction. J. Phys. Chem. Lett. 11, 4392–4401 (2020).
Article CAS PubMed PubMed Central Google Scholar
Jiang, Y. et al. Topological representations of crystalline compounds for the machine-learning prediction of materials properties. NPJ Comput Mater 7, 28 (2021).
Article ADS CAS PubMed PubMed Central Google Scholar
Zheng, S., Ding, H., Li, S., Chen, D. & Pan, F. Application of topology-based structure features for machine learning in materials science. Chinese J. Struc. Chem. 42, 100120 (2023).
Google Scholar
Hiraoka, Y. et al. Hierarchical structures of amorphous solids characterized by persistent homology. Proc. Natl. Acad. Sci. Usa. 113, 7035–7040 (2016).
Article ADS CAS PubMed PubMed Central Google Scholar
Sorensen, S. S., Biscio, C. A. N., Bauchy, M., Fajstrup, L. & Smedskjaer, M. M. Revealing hidden medium-range order in amorphous materials using topological data analysis. Sci. Adv. 6, eabc2320 (2020).
Article ADS PubMed PubMed Central Google Scholar
Minamitani, E., Shiga, T., Kashiwagi, M. & Obayashi, I. Topological descriptor of thermal conductivity in amorphous si. J. Chem. Phys. 156, 244502 (2022).
Article ADS CAS PubMed Google Scholar
Lee, Y. et al. Quantifying similarity of pore-geometry in nanoporous materials. Nat. Commun. 8, 15396 (2017).
Article ADS CAS PubMed PubMed Central Google Scholar
Krishnapriyan, A. S., Montoya, J., Haranczyk, M., Hummelshoj, J. & Morozov, D. Machine learning with persistent homology and chemical word embeddings improves prediction accuracy and interpretability in metal-organic frameworks. Sci. Rep. 11, 8888 (2021).
Article ADS CAS PubMed PubMed Central Google Scholar
Minamitani, E., Obayashi, I., Shimizu, K. & Watanabe, S. Persistent homology-based descriptor for machine-learning potential of amorphous structures. J. Chem. Phys. 159, 084101 (2023).
Article Google Scholar
Chen, D., Liu, J. & Wei, G.-W. Multiscale topology-enabled structure-to-sequence transformer for protein–ligand interaction predictions. Nat. Mach. Intell. 6, 799–810 (2024).
Article Google Scholar
Arango, A. S., Park, H. & Tajkhorshid, E. Topological learning approach to characterizing biological membranes. J. Chem. Inf. Model. 64, 5242–5252 (2024).
Article CAS PubMed Google Scholar
Zhang, D. et al. Pretraining of attention-based deep learning potential model for molecular simulation. Npj Comput. Mater 10, 94 (2024).
Article ADS Google Scholar
Chen, B. W. J., Xu, L. & Mavrikakis, M. Computational methods in heterogeneous catalysis. Chem. Rev. 121, 1007–1048 (2021).
Article CAS PubMed Google Scholar
Tran, K. & Ulissi, Z. W. Active learning across intermetallics to guide discovery of electrocatalysts for CO₂ reduction and H₂ evolution. Nat. Catal. 1, 696–703 (2018).
Article CAS Google Scholar
Wang, S. H., Pillai, H. S., Wang, S., Achenie, L. E. K. & Xin, H. Infusing theory into deep learning for interpretable reactivity prediction. Nat. Commun. 12, 5288 (2021).
Article ADS CAS PubMed PubMed Central Google Scholar
Zhang, Z., Masubuchi, T., Sautet, P., Anderson, S. L. & Alexandrova, A. N. Hydrogen evolution on electrode-supported Pt_n clusters: Ensemble of hydride states governs the size dependent reactivity. Angew. Chem. Int. Ed. Engl. 62, e202218210 (2023).
Article CAS PubMed Google Scholar
Li, X. et al. Advances in heterogeneous single-cluster catalysis. Nat Rev Chem 7, 754–767 (2023).
Article ADS CAS PubMed Google Scholar
Mizoguchi, H., Park, S. W. & Hosono, H. A view on formation gap in transition metal hydrides and its collapse. J. Am. Chem. Soc. 143, 11345–11348 (2021).
Article CAS PubMed Google Scholar
Pickard, C. J., Errea, I. & Eremets, M. I. Superconducting hydrides under pressure. Annu. Rev. Condens. Matter Phys 11, 57–76 (2020).
Article ADS CAS Google Scholar
Abdellah, A. M. et al. Impact of palladium/palladium hydride conversion on electrochemical CO₂ reduction via in-situ transmission electron microscopy and diffraction. Nat. Commun. 15, 938 (2024).
Article ADS CAS PubMed PubMed Central Google Scholar
Zhu, W., Kattel, S., Jiao, F. & Chen, J. G. Shape‐controlled CO₂ electrochemical reduction on nanosized Pd hydride cubes and octahedra. Adv. Energy. Mater. 9, 1802840 (2019).
Article Google Scholar
Chang, Q. et al. Boosting activity and selectivity of CO₂ electroreduction by pre-hydridizing pd nanocubes. Small 16, e2005305 (2020).
Article PubMed Google Scholar
Diercks, J. S. et al. Interplay between surface-adsorbed CO and bulk Pd hydride under CO₂-electroreduction conditions. ACS Catal 12, 10727–10741 (2022).
Article CAS Google Scholar
Bohnen, K. P. & Kolb, D. M. Charge- versus adsorbate-induced lifting of the Au(100)-(hex) reconstruction in an electrochemical environment. Surf. Sci. 407, L629–L632 (1998).
Article ADS CAS Google Scholar
Gunther, S., Kopatzki, E., Bartelt, M. C., Evans, J. W. & Behm, R. J. Anisotropy in nucleation and growth of two-dimensional islands during homoepitaxy on “hex” reconstructed Au(100). Phys. Rev. Lett. 73, 553–556 (1994).
Article ADS CAS PubMed Google Scholar
Blizanac, B. B. et al. Anion adsorption, co oxidation, and oxygen reduction reaction on a Au(100) surface: The ph effect. J. Phy. Chem. B 108, 625–634 (2003).
Article Google Scholar
Hammer, R., Meinel, K., Krahn, O. & Widdra, W. Surface reconstruction of Pt(001) quantitatively revisited. Phys. Rev. B 94, 195406 (2016).
Article Google Scholar
Sun, X. et al. Unravelling platinum nanoclusters as active sites to lower the catalyst loading for formaldehyde oxidation. Commun. Chem 2, 27 (2019).
Article Google Scholar
Puliyalil, H., Lasic Jurkovic, D., Dasireddy, V. & Likozar, B. A review of plasma-assisted catalytic conversion of gaseous carbon dioxide and methane into value-added platform chemicals and fuels. RSC Adv 8, 27481–27508 (2018).
Article ADS CAS PubMed PubMed Central Google Scholar
Suen, N. T. et al. Electrocatalysis for the oxygen evolution reaction: recent development and future perspectives. Chem. Soc. Rev. 46, 337–365 (2017).
Article CAS PubMed Google Scholar
Wang, C., Daimon, H., Onodera, T., Koda, T. & Sun, S. A general approach to the size- and shape-controlled synthesis of platinum nanoparticles and their catalytic reduction of oxygen. Angew. Chem. Int. Ed. Engl. 47, 3588–3591 (2008).
Article CAS PubMed Google Scholar
Brandes, B. A., Krishnan, Y., Buchauer, F. L., Hansen, H. A. & Hjelm, J. Unifying the ORR and OER with surface oxygen and extracting their intrinsic activities on platinum. Nat. Commun. 15, 7336 (2024).
Article CAS PubMed PubMed Central Google Scholar
Merte, L. R. et al. Electrochemical oxidation of size-selected Pt nanoparticles studied using in situ high-energy-resolution x-ray absorption spectroscopy. ACS Catal 2, 2371–2376 (2012).
Article CAS Google Scholar
Nagai, T., Murata, H. & Morimoto, Y. Analysis of the relation between oxidation state and ORR activity of pt by linear sweep voltammetry. ECS Transactions 33, 125–130 (2010).
Article ADS CAS Google Scholar
Xu, R. et al. Improving the orr performance by enhancing the pt oxidation resistance. J. Catal. 416, 311–321 (2022).
Article CAS Google Scholar
Subramanian, N. P., Greszler, T. A., Zhang, J., Gu, W. & Makharia, R. Pt-oxide coverage-dependent oxygen reduction reaction (ORR) kinetics. J. Electrochem. Soc. 159, B531–B540 (2012).
Article CAS Google Scholar
Shao, M., Peles, A. & Shoemaker, K. Electrocatalysis on platinum nanoparticles: Particle size effect on oxygen reduction reaction activity. Nano Lett 11, 3714–3719 (2011).
Article ADS CAS PubMed Google Scholar
Sumaria, V. & Sautet, P. CO organization at ambient pressure on stepped pt surfaces: First principles modeling accelerated by neural networks. Chem. Sci. 12, 15543–15555 (2021).
Article CAS PubMed PubMed Central Google Scholar
Moller, T. et al. Electrocatalytic CO₂ reduction on CuO_x nanocubes: Tracking the evolution of chemical state, geometric structure, and catalytic selectivity using operando spectroscopy. Angew. Chem. Int. Ed. Engl. 59, 17974–17983 (2020).
Article PubMed PubMed Central Google Scholar
Zang, Y. et al. Selective CO₂ electroreduction to ethanol over a carbon-coated CuO_x catalyst. Angew. Chem. Int. Ed. Engl. 61, e202209629 (2022).
Article CAS PubMed Google Scholar
Xiang, K. et al. Boosting CO₂ electroreduction towards C₂₊ products via CO* intermediate manipulation on copper-based catalysts. Environmental Science: Nano 9, 911–953 (2022).
CAS Google Scholar
Liu, Z. et al. Silicon oxides: A promising family of anode materials for lithium-ion batteries. Chem. Soc. Rev. 48, 285–309 (2019).
Article ADS CAS PubMed Google Scholar
Maria, C., Boissonnat, J.-D., Glisse, M. & Yvinec, M. in Mathematical Software – ICMS 2014. (eds. H. Hong & C. Yap) 167–174 (Springer Berlin Heidelberg, Berlin, Heidelberg; 2014).
Blochl, P. E. Projector augmented-wave method. Phys. Rev. B 50, 17953–17979 (1994).
Article ADS CAS Google Scholar
Grimme, S., Antony, J., Ehrlich, S. & Krieg, H. A consistent and accurate ab initio parametrization of density functional dispersion correction (DFT-D) for the 94 elements H-Pu. J. Chem. Phys. 132, 154104 (2010).
Article ADS PubMed Google Scholar
Nørskov, J. K. et al. Origin of the overpotential for oxygen reduction at a fuel-cell cathode. J. Phy. Chem. B 108, 17886–17892 (2004).
Article Google Scholar
Weisfeiler, B. & Leman, A. The reduction of a graph to canonical form and the algebra which appears therein. nti, Series 2, 12–16 (1968).
Google Scholar
Hou, P. et al. Unraveling the oxidation behaviors of Mxenes in aqueous systems by active-learning-potential molecular-dynamics simulation. Angew. Chem. Int. Ed. Engl. 62, e202304205 (2023).
Article CAS PubMed Google Scholar
Hou, P. et al. Proton-driven dynamic behavior of nanoconfined water in hydrophilic Mxene sheets. Angew. Chem. Int. Ed. Engl. 63, e202411849 (2024).
Article CAS PubMed Google Scholar
Mikkelsen, A. E. G., Schiotz, J., Vegge, T. & Jacobsen, K. W. Is the water/Pt(111) interface ordered at room temperature? J. Chem. Phys. 155, 224701 (2021).
Article ADS CAS PubMed Google Scholar
Zheng, S. et al. Data for “Active phase discovery in heterogeneous catalysis via topology-guided sampling and machine learning”. Zenodo, https://doi.org/10.5281/zenodo.14779452 (2025).
Zheng, S. et al. Active phase discovery in heterogeneous catalysis via topology-guided sampling and machine learning. Zenodo, https://doi.org/10.5281/zenodo.14777320 (2025).

Download references

Acknowledgements

This work was supported by grants from the National Natural Science Foundation of China (22402163-S.S.Z, 22021001-J.L., 21925404-J.L., T2293692-J.L., and 22361132532-J.L.), Natural Science Foundation of Xiamen, China (3502Z202472001-S.S.Z.), the National Key Research and Development Program of China (2022YFA1503803-J.L.), Shenzhen Key Laboratory of New Energy Resources Genome Preparation and Testing (ZDSYS201707281026184-F.P.), Guangdong Key Laboratory of Design and calculation of New Energy Materials (2017B030301013-F.P.), International joint Research Center for Electric Vehicle Power Battery and Materials (2015B01015-F.P.).

Author information

These authors contributed equally: Shisheng Zheng, Xi-Ming Zhang.

Authors and Affiliations

College of Energy, State Key Laboratory of Physical Chemistry of Solid Surfaces, iChEM, College of Chemistry and Chemical Engineering, College of Materials, College of Electronic Science and Engineering, College of Physical Science and Technology, Institute of Artificial Intelligence, School of Mathematical Sciences, Xiamen University, Xiamen, China
Shisheng Zheng, Xi-Ming Zhang, Heng-Su Liu, Ge-Hao Liang, Si-Wang Zhang, Jingling Yang, Xian’an Jin & Jian-Feng Li
School of Advanced Materials, Peking University, Shenzhen Graduate School, Shenzhen, China
Wentao Zhang, Bingxu Wang & Feng Pan
Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen, China
Jian-Feng Li

Authors

Shisheng Zheng
View author publications
Search author on:PubMed Google Scholar
Xi-Ming Zhang
View author publications
Search author on:PubMed Google Scholar
Heng-Su Liu
View author publications
Search author on:PubMed Google Scholar
Ge-Hao Liang
View author publications
Search author on:PubMed Google Scholar
Si-Wang Zhang
View author publications
Search author on:PubMed Google Scholar
Wentao Zhang
View author publications
Search author on:PubMed Google Scholar
Bingxu Wang
View author publications
Search author on:PubMed Google Scholar
Jingling Yang
View author publications
Search author on:PubMed Google Scholar
Xian’an Jin
View author publications
Search author on:PubMed Google Scholar
Feng Pan
View author publications
Search author on:PubMed Google Scholar
Jian-Feng Li
View author publications
Search author on:PubMed Google Scholar

Contributions

S.S.Z. and X.Z. contribute equally to this work. S.S.Z., J.L., and F.P. conceived this work. S.S.Z. and X.Z. wrote the codes and performed the DFT calculations. S.S.Z., J.L. and X.Z conducted the data analysis. S.S.Z. wrote the original manuscript. J.L., S.S.Z., and F.P. revised the manuscript. H.L., G.L., S.W.Z., W.Z., B.W., J.Y., X.J., and F.P. participated in the discussions.

Corresponding authors

Correspondence to Shisheng Zheng, Feng Pan or Jian-Feng Li.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Aurora Clark and the other anonymous reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Transparent Peer Review file

Source data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Reprints and permissions

About this article

Cite this article

Zheng, S., Zhang, XM., Liu, HS. et al. Active phase discovery in heterogeneous catalysis via topology-guided sampling and machine learning. Nat Commun 16, 2542 (2025). https://doi.org/10.1038/s41467-025-57824-4

Download citation

Received: 30 September 2024
Accepted: 28 February 2025
Published: 14 March 2025
DOI: https://doi.org/10.1038/s41467-025-57824-4

This article is cited by

Inverse design of catalytic active sites via interpretable topology-based deep generative models
- Bingxu Wang
- Shisheng Zheng
- Feng Pan
npj Computational Materials (2025)

Subjects

Abstract

Similar content being viewed by others

Machine learning-assisted dual-atom sites design with interpretable descriptors unifying electrocatalytic reactions

Accurate energy barriers for catalytic reaction pathways: an automatic training protocol for machine learning force fields

Data efficient machine learning potentials for modeling catalytic reactivity via active learning and enhanced sampling

Introduction

Results

The overview of PH-SA and the overall framework

Simulating complex heterogeneous catalytic systems with the framework

The PdHx system with slab model

The PtOx system with cluster model

Discussion

Methods

Persistent homology and interaction sites determination

The DFT calculations

The graph isomorphism to remove duplicate configurations

The construction of MLFF based on DPA

Energy hull, Phase diagram and free energy diagram

Data availability

Code availability

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding authors

Ethics declarations

Competing interests

Peer review

Peer review information

Additional information

Supplementary information

Supplementary Information

Transparent Peer Review file

Source data

Source data

Rights and permissions

About this article

Cite this article

Share this article

This article is cited by

Inverse design of catalytic active sites via interpretable topology-based deep generative models

Search

Quick links

The PdH_x system with slab model

The PtO_x system with cluster model