Introduction

The quantum nature of nuclei influences every material and chemical process. Resulting nuclear quantum effects (NQEs) can arise as zero-point energy,1 quantum tunneling2,3, and quantization of proton energy levels. NQEs are generally expected to be substantial under certain conditions, such as processes occurring at low temperature or involving light nuclei. They have been found to be consequential in genetic stability of DNA4,5, zeolite catalysts6,7,8,9, superconductor materials10,11, and enzymatic reactions12,13,14,15. NQEs also directly impact experimental studies that investigate or depend on isotope substitution16,17,18,19,20,21,22. Therefore, NQEs are important for many biological, chemical, and physical processes, even those containing heavier atoms or strong interactions6,10,23,24,25,26, yet their extent and significance remain not well characterized for many systems.

Nuclear quantum effects are often deduced through equilibrium or kinetic isotope effects. In classical mechanics, kinetic isotope effects are expected but depend solely on the mass differences between isotopes; deviations from such results highlight quantum mechanical phenomena-such as nuclear tunneling. This is common in enzymatic reactions13,27,28,29,30,31,32; for instance, isotope substitution can alter the oxidation rate of yeast alcohol dehydrogenase by up to 33%.15 In a very different scenario, the conductivity of deuterated versus normal phosphoric acid differs by orders of magnitude, while classical expectations suggest a ratio of ca. 1.433. Meanwhile, classical statistical thermodynamics does not predict isotope-dependent changes in equilibrium properties, yet isotope effects on such properties are widely observed34,35,36,37,38,39. For example, the solid-vapor isotope fractionation ratio between deuterated and normal water reaches values as high as 1.20835. Replacing H2O with D2O similarly impacts the structure and properties of biopolymer solutions40,41, despite D2O being chemically equivalent to H2O under classical assumptions. The temperature-dependence of intramolecular isotopic equilibria has even been used to estimate dinosaur body temperatures via isotope-ratio mass spectrometry measurements on fossilized sauropod teeth42. The presence of such isotope effects signals the relevance of NQEs but does not fully characterize them.

Path-integral molecular dynamics (PIMD) enables explicit treatment of NQEs and calculation of isotope effects. This is in contrast to conventional classical MD, which treats nuclei as point particles evolving on a potential energy surface. Meanwhile, electronic structure methods, like density functional theory, provide a quantum mechanical treatment of electronic degrees of freedom but generally rely on the Born-Oppenheimer approximation, under which the nuclei are held fixed or propagated classically. PIMD incorporates NQEs via a mathematical isomorphism, mapping each quantum nucleus to a classical ring polymer, thereby capturing quantum delocalization and thermal fluctuations by sampling in an extended phase space43. For water, PIMD has demonstrated how intramolecular zero-point motion and intermolecular tunneling affect properties like translational diffusion and orientational relaxation rates44,45. PIMD has also revealed the significance of NQEs in DNA base pairs, acetylene:ammonia co-crystals, and electrolyte transport in confined aqueous systems46,47,48. Nevertheless, in addition to the computational overhead of the extended phase space, PIMD can be practically limited by the availability of force fields that accurately represent the Born-Oppenheimer potential energy surface. Most conventional classical MD simulations typically use efficient force fields based on analytical equations. However, these force fields have parameters often calibrated to experimental data, implicitly including NQEs in uncontrolled ways, such that their use in PIMD risks “double counting” NQEs44,49. As a result, broad application of PIMD remains limited, and the role of NQEs across many systems and conditions is still largely unexplored.

Here, we characterize NQEs in 92 molecular liquids, spanning a wide organic chemical space, at ambient conditions. In particular, by comparing classical and path-integral MD simulations, we quantify NQEs on several thermodynamic properties including molar volume (vm), thermal expansion coefficient (αP), isothermal compressibility (κT), static dielectric constant (εr(0)), and enthalpy of vaporization (Δhvap). Equilibrium isotope effects on the same systems are also examined by simulating deuterated systems. This investigation spans a diverse set of chemistries-including amines, ethers, ketones, alcohols, alkanes, and more-and is enabled by the Topology Automated Force-Field Interactions (TAFFI) framework50, an efficient analytical force field parameterized solely from quantum chemical calculations at the ωB97X-D3/def2-TZVP level. Because TAFFI does not utilize experimental calibration, it is well-suited for conducting PIMD across the numerous systems studied. We find a meaningful influence of NQEs for every substance and highlight key chemical features that correlate with NQEs using unsupervised and supervised machine learning. This reveals competing effects between system stability and hydrogen-atom densities on NQEs, which particularly explains trends related to hydrogen-bonding groups and molecular branching. By combining physics-based simulation and data-driven analysis, this work provides a deeper understanding of NQEs in common liquid organic systems and the conditions under which explicit consideration of NQEs may be needed.

Results

Varied effects across molecules and properties

To understand how NQEs manifest across chemical space, we simulate 92 organic molecules at ambient conditions. Of the 92 molecules studied, 87 were previously simulated using classical MD with various force fields51, including TAFFI50. Benchmarking against experimental thermophysical properties showed that TAFFI compared favorably to other widely used empirical force fields in terms of reproducing experimental thermophysical properties across this set, which features amines, ethers, nitriles, ketones, alcohols, aldehydes, esters, amides, as well as some sulfur- and halogen-containing compounds. We further augmented this set with five n-alkanes (C9, C10, C11, C14, and C15) to have representation of nonpolar hydrocarbons. The extent of NQEs is then quantified via

$$\begin{array}{l}{\Delta }_{\lambda }(\%)=100\times \frac{{\lambda }^{{{{\rm{PI}}}}}-{\lambda }^{{{{\rm{cl}}}}}}{{\lambda }^{{{{\rm{PI}}}}}}\end{array}$$
(1)

where λcl and λPI are the properties calculated via classical and path-integral MD simulations.

Figure 1 summarizes the results, with Fig. 1A qualitatively organizing all systems based on influence on molar volume, vm. While it is clear and expected that heavy atoms like chlorine, bromine, and sulfur lead to smaller NQEs, other relativistic organizing principles based on molecular constitution are less evident. More quantitatively, NQEs affect all systems and properties (Fig. 1B). Accounting for NQEs markedly increases molar volume by up to 5.5% and also increases κT for most systems, with an average effect of nearly 8%. These systematic increases may signify weaker cohesive intermolecular interactions due to nuclear delocalization, in line with prior studies on water and other liquid organic systems52,53,54,55,56,57,58,59. In prior investigation featuring linear alkanes, the NQEs were found to decrease density by as much as 11%;56 for water, NQEs induce a much smaller effect on the density, but its direction is consistent with our observations56,60,61. For reference, trends in density are opposite but similar in magnitude to molar volume, as analytically Δρ = \(-\frac{{\Delta }_{{v}_{{{{\rm{m}}}}}}}{1-{\Delta }_{{v}_{{{{\rm{m}}}}}}}\approx -{\Delta }_{{v}_{{{{\rm{m}}}}}}\) (Supplementary Fig. 2). Quantum treatment of nuclei yields less consistent directional changes from classical results on the thermal expansion coefficient, enthalpy of vaporization and the static dielectric constant. For these properties, the average effect is small but also comparable to the statistical error in their calculation. These average effects are slightly positive for αP and negative for εr(0) and Δhvap. Overall, these results indicate that NQEs can significantly and detectably influence certain thermodynamic properties across diverse molecular chemistries.

Fig. 1: Impact on thermophysical properties.
figure 1

A Molecular structures of studied systems. Molecules are categorized into deciles based on \({\Delta }_{{v}_{{{{\rm{m}}}}}}\). Within each decile, higher vertical position indicates higher \({\Delta }_{{v}_{{{{\rm{m}}}}}}\). Atoms are colored as follows: carbon (gray), hydrogen (white), oxygen (red), nitrogen (blue), sulfur (yellow), chlorine (green), bromine (brown), iodine (purple). Structures were rendered with PyMOL (v3.1.0)82. The same data are shown with skeletal formula in Supplementary Fig. 1. B The magnitude of NQEs on molar volume, thermal expansion coefficient, isothermal compressibility, static dielectric constant, and enthalpy of vaporization of each system. C The calculated isotope effects on each property. The gray regions around y = 0 under each plot indicate the average standard error of the mean for the magnitude of NQEs on each property obtained from four independent simulations. The violin plots in (B, C) represent the distribution of Δλ and \({\Delta }_{{\lambda }^{{{{\rm{D\to H}}}}}}\) values, with the white dot representing the median, the black box representing the interquartile range, and the inner lines representing the 1.5× interquartile range. The spread along the x-axis represents the density obtained from Gaussian kernel density estimation. All displayed properties are calculated at 298.15 K and 1 atm. Source data are provided as a Source Data file.

For the original TAFFI framework, all bonds are fit to purely harmonic functions, but anharmonicity in the potential energy surface-especially in bond-stretching modes-has been identified as a key factor influencing the magnitude of NQEs44,62,63,64. For example, in water, anharmonicity in the O-H bond contributes to competing effects: intermolecular zero-point energy and tunneling weaken hydrogen bonding, while intramolecular zero-point motion enhances the dipole moment and strengthens interactions, resulting in relatively small net NQEs44.

To evaluate then whether the harmonic description in TAFFI may be responsible for the large NQEs relative to water, we re-parameterized bonds involving hydroxyls, amines, and thiols (29 molecules) with an anharmonic Morse potential44 based on expanded mode scans at ωB97X-D3/def2-TZVP level of theory. We then evaluated \({\Delta }_{{v}_{{{{\rm{m}}}}}}\) with this anharmonic bond description and compared it to the harmonic results (Supplementary Fig. 3). Overall, we find that anharmonicity in bond-stretching slightly suppresses NQEs, though the overall magnitudes remain comparable to the harmonic description. We speculate that this mitigating effect is weaker in the organic molecules studied than in water, likely due to smaller dipole moment changes and correspondingly less enhancement of intermolecular interactions following introduction of anharmonicity. Intuitively, as most of the molecules in this study are larger than water, they are expected to exhibit smaller overall changes in dipole moment due to anharmonicity, as local bond distortions contribute less to the total molecular polarization. Although anharmonicity appears to have little impact for the molecules and conditions studied, its role in other phases, molecules, or force fields warrants careful consideration.

Isotope effects versus nuclear quantum effects

Building on the preceding results that directly compare quantum and classical systems, we probe isotope effects of deuteration using

$$\begin{array}{l}{\Delta }_{{\lambda }^{{{{\rm{D\to H}}}}}}(\%)=100\times \frac{{\lambda }^{{{{\rm{PI}}}}}-{\lambda }^{{{{\rm{PI,D}}}}}}{{\lambda }^{{{{\rm{PI,D}}}}}}\end{array}$$
(2)

where λPI,D is the property for a fully-deuterated system simulated via PIMD; deuterated systems are often expected to approximate classical behavior. The quantity \({\Delta }_{{\lambda }^{{{{\rm{D\to H}}}}}}\) is equivalent to \(1-\frac{{\lambda }^{{{{\rm{H}}}}}}{{\lambda }^{{{{\rm{D}}}}}}\) where \(\frac{{\lambda }^{{{{\rm{H}}}}}}{{\lambda }^{{{{\rm{D}}}}}}\) is often noted as the (experimentally accessible) equilibrium isotope effect. Trends in \({\Delta }_{{\lambda }^{{{{\rm{D\to H}}}}}}\) (Fig. 1C) align directionally with Δλ (Fig. 1B) but with reduced magnitudes. However, beyond consistent shifts in molar volume, the effect of deuteration is often within the range of statistical error. This change is most notable for κT, where the average effect shifts from 7.8% when comparing to purely classical systems to 3.5% when comparing to quantum mechanical but deuterated ones. This highlights that deuteration does not fully replicate a purely classical treatment. A system with a negligible isotope effect may still display significant NQEs, which partially cancel between the isotopically normal and deuterated systems.

Key molecular determinants

We next identify key molecular features that influence the strength of NQEs in liquid systems. Given its consistent and statistically resolvable effects, in addition to vm being a fundamental thermophysical property relevant to equations of state and other material properties, our discussion focuses on \({\Delta }_{{v}_{{{{\rm{m}}}}}}\), while analyses of other properties are included in Supplementary Fig. 4.

To visualize how NQEs vary across a broad chemical space, we apply dimensionality reduction from a high-dimensional molecular feature space to two dimensions using unsupervised machine learning. Specifically, we use the Uniform Manifold Approximation and Projection (UMAP) algorithm65,66 on a dataset that includes our 92 investigated molecules and an extended set of 2874 molecules from the ChEMBL database67,68. Each molecule is initially represented by a 34-dimensional atom-type feature vector generated using the Merck Molecular Force Field (MMFF94)69, and then UMAP constructs a neighborhood graph from the collection of such vectors, models pairwise relationships probabilistically, and optimizes a low-dimensional embedding that preserves both local and global structure. The resulting collective variables, Z1 and Z2, are non-linear transformations of the high-dimensional feature values, which obfuscates facile interpretation. Nevertheless, Z1 and Z2 define a coordinate space that can be easily visualized and for which chemical similarity trends with distance.

Within the broader chemical context of the ChEMBL database, we find that the simulated substances indeed span a wide chemical space (Fig. 2A). In terms of readily identifiable trends, chemical structures displaying lower \({\Delta }_{{v}_{{{{\rm{m}}}}}}\) are seemingly clustered in the Z1-Z2 space in the vicinity of molecules with generally higher molecular weights, such as chloroform. In contrast, molecules with larger \({\Delta }_{{v}_{{{{\rm{m}}}}}}\) are distributed across the manifold, indicating that diverse molecular features may enhance NQEs. This is further supported by the fact that different functional groups can be associated with population-level effects on \({\Delta }_{{v}_{{{{\rm{m}}}}}}\) (Supplementary Fig. 6). For instance, molecules with amines tend to exhibit higher \({\Delta }_{{v}_{{{{\rm{m}}}}}}\), and those containing heavier atoms exhibit lower \({\Delta }_{{v}_{{{{\rm{m}}}}}}\). At the same time, there is significant variation of \({\Delta }_{{v}_{{{{\rm{m}}}}}}\) within each population of functional groups, such that attributing NQEs purely to functional groups is difficult. This highlights the necessity to consider additional factors to ultimately understand the manifestation of NQEs.

Fig. 2: Correlation of chemical features with the magnitude of NQEs.
figure 2

A A two-dimensional manifold organization of the 92 organic molecules (blue, larger markers) and 2874 small molecules (tan, smaller markers) obtained from the ChEMBL database. The coordinates Z1 and Z2 are derived by applying the Uniform Manifold Approximation and Projection (UMAP) algorithm to a 34-dimensional input vector based on MMFF94 atom typing69. Inset lines highlight representative molecules from different regions of the manifold. B Comparison of predicted versus simulated effect of NQEs on molar volume (\({\Delta }_{{v}_{{{{\rm{m}}}}}}\)). Predicted values are obtained from a data-driven model with three chemically interpretable input features: average atomic mass (mw), hydrogen density (\({n}_{{{{\rm{H}}}}}^{\,{\mbox{cl}}\,}\)), and thermal expansion coefficient (\({\alpha }_{{{{\rm{P}}}}}^{\,{\mbox{cl}}\,}\)); the latter two are determined from classical MD simulations. Error bars reflecting the standard error of the mean, determined using four independent simulations and from five different training and testing cycles for the data-driven model, are generally smaller than the symbol size. The markers (blue) use the same color scale as those in (A). C Impact of feature contributions based on Shapley Additive Explanations (SHAP) analysis. The position on the x-axis indicates the impact of each feature on the model output, and the marker color indicates feature value. Feature values (λi) are displayed following a Yeo-Johnson power transformation, denoted by ψ, to present data on similar scales83. For visual clarity, the SHAP data are shown over a restricted range; Supplementary Fig. 5 depicts the full observed range of data. Source data are provided as a Source Data file.

To gain insight into what factors correlate with NQEs, we create a data-driven model for \({\Delta }_{{v}_{{{{\rm{m}}}}}}\) from simple descriptors, with the rationale that an effective model would highlight the importance of those descriptors. We find that a model using just three inputs-average atomic mass (mw), hydrogen density (\({n}_{\,{{\rm{H}}}}^{{{\rm{cl}}}\,}\)), and thermal expansion coefficient (\({\alpha }_{P}^{\,{{\rm{cl}}}\,}\))-accurately predicts \({\Delta }_{{v}_{{{{\rm{m}}}}}}\) (Fig. 2B). The coefficient of determination over all predictions is R2 = 0.881 ± 0.002. Here, predictions are made as \({\hat{\Delta }}_{{v}_{{{{\rm{m}}}}}}={\hat{T}}_{{v}_{{{{\rm{m}}}}}}{\alpha }_{P}^{{{\rm{cl}}}\,}\), where \({\hat{T}}_{{v}_{{{{\rm{m}}}}}}={{\rm{RF}}}({m}_{w},{n}_{{{{\rm{H}}}}}^{{{\rm{cl}}}\,})\) is the output of a random forest (RF) regressor trained to predict a defined quantity \({T}_{{v}_{{{{\rm{m}}}}}}\equiv {({\alpha }_{P}^{{{{\rm{cl}}}}})}^{-1}{\Delta }_{{v}_{{{{\rm{m}}}}}}\), and the RF model is trained on all data except that of the molecule being predicted. We empirically find that this approach outperforms models formulated as \({\hat{\Delta }}_{{v}_{{{{\rm{m}}}}}}=\,{{\rm{RF}}}({m}_{w},{n}_{{{{\rm{H}}}}}^{{{{\rm{cl}}}}},{\alpha }_{P}^{{{\rm{cl}}}\,})\), likely due to the large variability in \({\alpha }_{P}^{\,{{\rm{cl}}}\,}\) across molecules and its nature as a response function, unlike the more intrinsic descriptors mw and \({n}_{\,{{\rm{H}}}}^{{{\rm{cl}}}\,}\). As a point of possible interest, we note that \({T}_{{v}_{{{{\rm{m}}}}}}\) has units of temperature and, to leading order, can be interpreted as the temperature shift required for a classical system to match the molar volume of its quantum counterpart (i.e., \({v}_{{{{\rm{m}}}}}^{\,{{\rm{cl}}}}(T+{T}_{{v}_{{{{\rm{m}}}}}})\approx {v}_{{{{\rm{m}}}}}^{{{\rm{PI}}}\,}(T)\)). This interpretation does not imply a true physical equivalence between the quantum system and a classical system at elevated temperature-a comparison that has been cautioned against as potentially misleading in prior work70. Across the molecules studied, \({T}_{{v}_{{{{\rm{m}}}}}}\) ranges from a few to several tens of degrees (Supplementary Fig. 7).

We use Shapley Additive Explanations (SHAP) analysis71 to attribute feature contributions to predictions made by our machine learning model. This technique assesses the impact of each feature by computing the difference in model output with and without its inclusion for all feature combinations. This analysis reveals some intuitive trends: abundance of hydrogens and greater \({n}_{\,{{\rm{H}}}}^{{{\rm{cl}}}\,}\) enhance \({\Delta }_{{v}_{{{{\rm{m}}}}}}\) (Fig. 2C). These effects reflect a general dependence on molecular composition, as systems with low mass-density or hydrogen-rich systems would generally imply greater sensitivity to NQEs due to the presence of lighter nuclei. However, \({\alpha }_{P}^{\,{{\rm{cl}}}\,}\) correlates positively with \({\Delta }_{{v}_{{{{\rm{m}}}}}}\). SHAP analysis also indicates that \({\alpha }_{P}^{\,{{\rm{cl}}}\,}\) and \({n}_{\,{{\rm{H}}}}^{{{\rm{cl}}}\,}\) features have greater impact on model output than mw, which has a non-monotonic relationship with \({\Delta }_{{v}_{{{{\rm{m}}}}}}\). Although variable importance was established within the context of 92 organic molecules, it is useful to consider how these principles translate to a well-studied system, like water. Using classical simulation data from the q-TIP4P/F water model44 as input, the data-driven model predicts \({\hat{\Delta }}_{{v}_{{{{\rm{m}}}}}}^{{{{\rm{RF}}}}}=0.47\%\), in good agreement with the simulation result \({\Delta }_{{v}_{{{{\rm{m}}}}}}=0.25\%\). Despite water being chemically distinct from the other molecules studied, the model captures the small impact on vm due to NQEs-an outcome largely driven by its low αP.

Overall, these results indicate a complex interplay between NQEs, fluid composition, and its stability that is yet predictable through simple properties readily computed with classical simulation.

Competing effects of system stability and hydrogen density

At first glance, the trends for nH and αP appear to reflect competing effects: higher nH suggests stronger intermolecular forces, which would typically reduce αP. To explore this trade-off, we leverage the dataset to create controlled comparisons by fixing either nH or αP and examining the effect of the other on \({\Delta }_{{v}_{{{{\rm{m}}}}}}\). This is accomplished through k-means clustering to group molecules with similar nH values and examining correlations of the groups with \({\Delta }_{{v}_{{{{\rm{m}}}}}}\) (Fig. 3). Within groups with similar nH, a decrease in fluid stability (indicated by higher αP) tends to enhance NQEs. Conversely, across groups with comparable stability (similar αP), an increase in nH (and thus more light nuclei) leads to stronger NQEs. Nevertheless, the overlapping αP ranges across different nH groups (with some weak negative correlation) suggest that both properties facilitate predicting the magnitude of NQEs.

Fig. 3: Analysis of hydrogen density and thermal expansion coefficient.
figure 3

Four groups of data are generated with k-means clustering algorithm on the respective hydrogen densities nH; the marker colors indicate being within the range specified by the color bar. The colored regions are distinguished by support vector machine (SVM) margin lines. Error bars (black) represent the standard error of the mean obtained from four independent simulations. For visual clarity of trends, the data are shown over a restricted range; Supplementary Fig. 8 depicts the full observed range of data. Source data are provided as a Source Data file.

Trends with hydrogen bonding and branching

To elucidate how \({\Delta }_{{v}_{{{{\rm{m}}}}}}\) correlates with molecular constitution, we systematically compare groups of molecules varying in hydrogen bonding (Fig. 4A) and branching (Fig. 4B) and examine how key descriptors, nH and αP, are affected.

Fig. 4: Impact of hydrogen bonding and branching on the extent of NQEs.
figure 4

A Hydrogen densities and thermal expansion coefficients for systems of various hydrogen-bonding groups and counts. Visualized molecules are labeled as (i) 1-bromobutane, (ii) 1-chlorobutane, (iii) butane-1-thiol, (iv) butan-1-ol, (v) butane-1,4-diol, (vi) pentane-1,5-diol, (vii) propane-1,2,3-triol for reference. B Comparison between four pairs including linear molecules and their branched analogues of chemically similar structures. The impact of branching on the thermal expansion coefficient and hydrogen density is shown on the y- and x-axis, respectively. The marker color intensity denotes the magnitude of NQEs, indicated by the color bars and arrow directions in each panel. Visualized molecules are labeled as (I) butan-1-amine, (II) 2-methylpropan-2-amine, (III) propan-1-amine, (IV) propan-2-amine, (V) pentan-1-ol, (VI) 2-methylbutan-2-ol, (VII) butan-1-ol, (VIII) 2-methylpropan-2-ol. It should be noted that while the two color bars have different bounds, the range of the bounds is equal. Error bars (black) represent the standard error of the mean obtained from four independent simulations. Source data are provided as a Source Data file.

In Fig. 4A, comparing 1-bromobutane (i), 1-chlorobutane (ii), butane-1-thiol (iii), and butan-1-ol (iv) highlights the positive impact of hydrogen bonding on nH and \({\Delta }_{{v}_{{{{\rm{m}}}}}}\). Among these molecules– which share the same number of heavy atoms, the same chemical topology, and comparable αP–the hydrogen-bonding thiol and alcohol exhibit higher nH. However, nH and \({\Delta }_{{v}_{{{{\rm{m}}}}}}\) do not scale monotonically with additional hydrogen-bonding groups. This is revealed by comparison of butan-1-ol (iv), butane-1,4-diol (v), pentane-1,5-diol (vi), and propane-1,2,3-triol (vii). Among these alcohols, increasing the number of hydrogen-bonding groups minimally affects hydrogen density but increases the stability of the fluid (decreases αP). Evaluation of different hydrogen-bonding groups reveals similar trends (Supplementary Fig. 9) where the balance of nH and αP elucidates the resulting \({\Delta }_{{v}_{{{{\rm{m}}}}}}\). For example, relative to alcohols with similar nH, amines display larger NQEs and associated larger αP.

In Fig. 4B, comparing multiple linear molecules and their branched analogues, we observe consistent impact on NQEs. In particular, the branched molecule exhibits larger \({\Delta }_{{v}_{{{{\rm{m}}}}}}\) than the linear one. Across all four pairs, branching reduces nH, but there is a coupled increase in αP. For these molecules, the net result is stronger NQEs in the branched molecules. We therefore suggest that NQEs, at least as manifested through \({\Delta }_{{v}_{{{{\rm{m}}}}}}\), are likely to be more substantial in branched molecules owing to relatively diminished fluid stability. This observation is reminiscent of an expected trend: boiling points for branched molecules are typically lower than those of linear molecules of comparable molecular weight and composition.

Detailed physics of linear versus branched systems

The conventional rationale for the trend involving boiling temperatures is that branched molecules have reduced surface area and less efficient packing, which weakens intermolecular forces and impacts the magnitude of NQEs. Intriguingly, the linear and branched molecules studied here exhibit comparable interaction patterns with respect to the number of hydrogen bonds, their strength, and average nearest-neighbor distances (Supplementary Fig. 10). This renders the importance of hydrogen-bonding groups on trends in αP, and thus their impact on \({\Delta }_{{v}_{{{{\rm{m}}}}}}\), unclear.

Resolving interaction patterns as a function of distance between neighboring molecules, however, illustrates key differences between butan-1-ol with its branched analogue, 2-methylpropan-2-ol (Fig. 5). The distribution of hydrogen bonds (Fig. 5A) shows that butan-1-ol forms bonds over a broad range of intermolecular distances, while methylpropan-2-ol, with less conformational flexibility, forms bonds only at separations of approximately 4.6 Å. A further distinction emerges in energetics of molecular interactions within the fluid (Fig. 5B): 2-methylpropan-2-ol exhibits a pronounced minimum in interaction energy near the distance associated with hydrogen bonding, while butan-1-ol shows steadily diminishing attractive interactions with distance. These nuanced changes in intermolecular interactions due to branching result in a pronounced increase in αP, which tends to enhance \({\Delta }_{{v}_{{{{\rm{m}}}}}}\). Notably, the minimum interaction energy for butan-1-ol occurs at a distance where few or no hydrogen bonds form. Additionally, butan-1-ol lacks a strong preference for relative molecular orientation, unlike 2-methylpropan-2-ol (Fig. 5C). These findings suggest that the cohesive energy of 2-methylpropan-2-ol relies more heavily on hydrogen bonding whereas butan-1-ol exhibits overall stronger dispersion forces. This comparison also serves to demonstrate the sensitivity of NQEs to subtle changes in nanoscale interactions.

Fig. 5: Effect of branching on intermolecular interactions through comparison of linear butan-1-ol with branched 2-methylpropan-2-ol.
figure 5

Ensemble averages of (A) the hydrogen-bond density, (B) pairwise intermolecular interaction energies, and (C) interaction orientations as a function of distance between molecules. The inset images of panel (C) illustrate calculations of the interaction distance and orientation; the renderings adjacent to the y-axis illustrate configurations consistent with the nearest tick marks. Error regions indicate the standard error obtained from four simulations for each bin. All molecular renderings were generated using the Visual Molecular Dynamics (VMD) software84 (version 1.9.4). Source data are provided as a Source Data file.

Discussion

Nuclear quantum effects (NQEs) are present in all physical systems, but their significance is rarely known beyond certain well-studied systems (e.g., water) or expected scenarios (e.g., low temperatures). In this study, we quantified the impact of NQEs at ambient conditions on various properties across 92 organic molecules spanning diverse chemical space. By comparing path-integral and classical simulations, we observed that NQEs tend to significantly and measurably enhance molar volumes and isothermal compressibilities. Effects on thermal expansion coefficients, dielectric constants, and enthalpies of vaporization lack clear general directional trends relative to the uncertainty of the calculations themselves, though individual systems can still be significantly impacted. Simulations of deuterated systems reveal similar but reduced effects, with isotope-induced changes generally within the statistical uncertainty, except for molar volume. This suggests deuteration often underestimates the full extent of NQEs, as a quantum treatment of deuterium still impacts properties.

We further identified physical properties that effectively correlate with the magnitude of NQEs. This was particularly demonstrated by a data-driven model with interpretable inputs of the average mass of composite atoms, hydrogen densities, and thermal expansion coefficients. We posit this as a pragmatic approach to anticipate the relevance of NQEs and the necessity of computationally intensive path-integral simulations. Beyond prediction, these property trends highlight key physical factors, some intuitive and others less so, affecting NQEs. An intriguing finding from the analysis is the competing effect between hydrogen density and fluid stability (linked to the thermal expansion coefficient), where both factors enhance NQEs but are somewhat inversely correlated.

By exploiting the chemical diversity of our dataset, molecular-level insights were obtained by examining sets of molecules with variations in hydrogen content, hydrogen-bonding capabilities, and comparisons between linear and branched topologies. Grouping data based on functional groups revealed that NQEs are generally more significant in amines compared to ethers and alcohols, for example. Broadly, systems with high hydrogen density (and therefore light nuclei) but weaker interactions, leading to lower stability, exhibit the largest NQEs. This explains a potentially unintuitive result that systems with multiple hydrogen-bonding groups may not necessarily exhibit strong NQEs because the hydrogen bonding tends to enhance fluid stability, reflected in a smaller thermal expansion coefficient. By contrast, molecular branching has a general tendency to both reduce fluid stability and hydrogen density. This study advances our understanding of NQEs in commonly used substances, guiding experimental and computational approaches and providing a framework for when explicit treatment of NQEs is necessary to capture essential physical phenomena.

Methods

General simulation protocols

Systems were modeled with the quantum chemistry-based Topology Automated Force-Field Interactions (TAFFI) framework50 with Waldman-Hagler mixing rules. The TAFFI framework was chosen for this study due to its efficiency as an analytical force field, its strict parameterization using quantum chemical calculations at the ωB97X-D3/def2-TZVP level, and the generally good agreement for a range of thermophysical properties, comparing values calculated from classical simulations and those reported experimentally50. Real-space non-bonded interactions were truncated at 14 Å, unless otherwise noted. Long-range electrostatics calculations used the Particle Mesh Ewald (PME) algorithm; tail corrections were applied for Lennard-Jones interactions beyond the real-space cutoff, unless otherwise noted. All PIMD simulations used a ring-polymer bead count of P = 32. All simulations used a timestep of 0.5 fs. Constant-temperature conditions of 298.15 K were achieved using a Langevin thermostat for classical simulations and the path-integral Langevin equation (PILE) thermostat72 for path-integral simulations; both utilize a friction coefficient of 1 ps−1. Constant-pressure conditions at 1 atm were achieved using a Monte Carlo barostat with an attempt frequency of 25 fs.

The force field in tandem with aforementioned settings enables accurate reproduction of liquid-phase experimental densities51 for 87 systems with available empirical data (Supplementary Fig. 11). Benchmarking of classical and PIMD simulation results also shows that path-integral treatment of most systems brings predicted molar volumes closer to experiment. To gain quantitative insight, we apply a one-tailed paired t-test on the differences with experiment and report p = 4.8 × 10−17 <0.05, indicating that PIMD results are systematically closer to experiment. These results, along with good alignment between force-field intramolecular normal-mode frequencies and DFT results, suggest that TAFFI can effectively represent and discriminate the chemically diverse set of systems in this study.

Property calculations and analyses were performed for four independent classical and path-integral MD simulations of each molecular system. Each simulation contained the minimum number of molecules to reach 5000 atoms, initialized with random positions and orientations within a simulation cell of V = 50 nm3. The initial configuration was subjected to energy minimization with an energy tolerance of 10 kJ/mol followed by 0.1 ns simulation in the microcanonical ensemble. The system was then equilibrated in the canonical ensemble at 400 K for 0.2 ns and cooled to 298.15 K over 0.8 ns with a linear temperature schedule. Subsequently, systems were equilibrated in the isothermal-isobaric ensemble at 1 atm for 3 ns followed by a production run of 10 ns, during which thermodynamic data were collected, and system configurations were saved every 10 ps. Simulations of q-TIP4P/F water systems followed a similar procedure with 1000 water molecules and with a real-space non-bonded cutoff distance of 10 Å. The above procedure was found to be insufficient to equilibrate C14 and C15 systems. For these, initial structures were generated using a 100 ns classical simulation in the isothermal-isobaric ensemble at 298.15 K and 1 atm, followed by 3 ns of additional equilibration and 10 ns production runs using either classical MD or PIMD. These protocols were sufficient to achieve adequate energy conservation and effectively converge the system density (see Supplementary Figs. 12 and 13).

All simulations were performed using the GPU implementation of the OpenMM 7.7.0 software package73. PIMD simulations were performed on NVIDIA A100 and P100 GPUs, with mean simulation times of 46.5 and 89.3 h, respectively, in contrast with classical MD simulations of 1.7 h on A100 GPUs. Total simulation time throughout the study was 55,383 GPU hours.

Gas-phase simulations

Single-molecule simulations of each system were used to calculate the gas-phase energy to obtain Δhvap. A molecule was placed in a large box with a volume of 125  nm3 and simulated in the microcanonical ensemble for 1 ns, followed by the canonical ensemble at 298.15 K for 0.4 ns using an Andersen thermostat with a collision probability of 10 ps−1. For these simulations, all pairwise interactions were evaluated in real-space and restricted to within 15 Å to ensure that there are no interactions with periodic images. Linear and angular momenta of each system were removed, the particle velocities were rescaled, and the potential energies were saved every 1 ps.

Analysis of system properties

Molar volumes were calculated using

$$\begin{array}{l}{v}_{{{{\rm{m}}}}}=\frac{M}{\langle \rho \rangle }\end{array}$$
(3)

where M is the molar mass of the molecule and 〈ρ〉 is the average (mass) density.

Thermal expansion coefficients, αP, were calculated using

$$\begin{array}{l}{\alpha }_{{{{\rm{P}}}}}=\frac{1}{\langle V\rangle }{\left(\frac{\partial V}{\partial T}\right)}_{P}\end{array}$$
(4)

where \({\left(\frac{\partial V}{\partial T}\right)}_{P}\) was numerically approximated using average volumes from simulations at 293.15, 298.15, and 303.15 K. The additional simulations at 293.15 and 303.15 K were performed using the equilibrated configurations from 298.15 K. The systems were equilibrated in the isothermal-isobaric ensemble at for 0.5 ns, followed by 1.5 ns production runs where the system volume was saved every 1 ps.

Isothermal compressibilities, κT, were calculated using the fluctuations in the system volume using

$$\begin{array}{l}{\kappa }_{{{{\rm{T}}}}}=\frac{\langle {V}^{2}\rangle -{\langle V\rangle }^{2}}{{{{{\rm{k}}}}}_{{{{\rm{B}}}}}T\langle V\rangle }\end{array}$$
(5)

where kB is the Boltzmann constant. Similarly, static dielectric constants, εr(0) were calculated using

$${\varepsilon }_{{{{\rm{r}}}}}(0)=\frac{1}{P}{\sum }_{i=1}^{P}1+\frac{4\pi }{3}\frac{\langle {{{{\bf{M}}}}}_{{{{\bf{i}}}}}^{2}\rangle -{\langle {{{{\bf{M}}}}}_{{{{\bf{i}}}}}\rangle }^{2}}{V{{{{\rm{k}}}}}_{{{{\rm{B}}}}}T}$$
(6)

where M is the total dipole moment of the simulation box (P = 1 for classical MD simulations).

Molar heats of vaporization, Δhvap, were calculated using

$$\begin{array}{l}\Delta {h}_{{{{\rm{vap}}}}}=\frac{1}{P}{\sum }_{i=1}^{P}{E}_{i}^{(g)}+{{{{\rm{k}}}}}_{{{{\rm{B}}}}}T-\frac{1}{P}{\sum }_{i=1}^{P}{E}_{i}^{(l)}\end{array}$$
(7)

where E(ν) is the total internal energy per mole (including both potential and kinetic energy) in phase ν, either liquid or gas. This expression assumes the ideal gas approximation and that the vapor-phase molar volume is much greater than that of the liquid. Often, the kinetic energy contributions in the gas and liquid phases are expected to cancel, allowing Δhvap to be approximated from potential energies alone51. However, in PIMD simulations, quantum kinetic energy-arising from harmonic coupling of systems in the extended phase space-can differ between phases. Therefore, we retain all energy contributions in the calculation for simplicity.

Unsupervised Learning

Unsupervised learning via Uniform Manifold Approximation and Projection (UMAP) dimension reduction was performed using the Python umap-learn package (version 0.5.3), with MMFF94 atom types to describe the 34-dimensional feature space. The MMFF94 atom types were obtained using the rdkit Python package (version 2022.9.3)74. The UMAP analysis includes additional small organic molecules obtained from the ChEMBL database. Molecules containing rings, net charge, fluorine atoms, or more than two double or triple bonds were excluded, resulting in 2874 molecules. For the UMAP hyperparameters, the analysis used 100 for the size of local neighborhood, 1.0 for the minimum distance between embedded points and Euclidean distances. The k-means clustering algorithm to categorize the systems into four groups based on their nH was generated using the scikit-learn Python package (version 1.0.2) with default hyperparameters75.

Supervised Learning

Supervised learning via random forest regression used the scikit-learn Python package. Model inputs included mw and nH from classical MD simulations. Model training and assessment used a leave-one-out split. Each molecule was tested by training the model on the other 91 molecules. This process was repeated for all 92 molecules. The reported R2 value and its associated error were calculated from five iterations with different random seeds. Each random forest model used 20 decision trees and required two samples per split. Shapley Additive Explanations (SHAP) analysis was performed using the shap Python package (version 0.44.1) on each of the three input parameters for the data-driven model.

The LIBSVM implementation of C-Support Vector Classification algorithm76,77 in scikit-learn Python package (version 1.0.2) was used to generate support vector machine boundary lines for four groups of molecules based on their nH. A linear kernel type and a regularization parameter of 1.0 were chosen as the algorithm hyperparameters.

Analysis of molecular interactions

All interaction analyses were performed and averaged over configurations sampled every 1 ns from four independent simulations of each molecule. The interaction distance, rCM, was defined as the distance between the centers of mass of two molecules. The array of interaction distances was discretized into 50 bins between 3.3 and 9 Å. Hydrogen-bond densities (nHB) were defined as the number of hydrogen bonds in each bin per volume. Hydrogen bonds were defined using geometric criteria with a cutoff distance of 3.6 Å between the donor and the acceptor and a cutoff angle of 150 degrees between the three participating atoms. These calculations were facilitated using the MDAnalysis Python library (version 2.1.0)78,79. Interaction energies, Eint, were computed as the sum of all pairwise non-bonded energy contributions between two molecules. Interaction orientations, \(\cos (\theta )\), were calculated based on the angle (θ) between the bond vectors adjoining the carbon to the oxygen on the alcohol.

Impact of force field

As previously discussed, TAFFI is used because it relies purely on quantum chemical calculations for its parameterization and computational efficiency; however, we acknowledge that the reported magnitude of NQEs could differ based on the underlying force field. To gain some insight into the sensitivity of NQEs, classical and PIMD simulations using two different force fields–the all-atom optimized potentials for liquid simulations (OPLS-AA) force field and the Open Force Field (OpenFF, version 2.0.0 with unconstrained bonds)–were also performed and analyzed for a subset of 11 molecules, following the procedures described in Methods.

Relative to the TAFFI results, OPLS-AA and OpenFF consistently find larger Δvm across all systems (Supplementary Fig. 15). It is important to note that OPLS-AA and OpenFF include empirical adjustments, potentially inherited from prior developments, making them strictly unsuitable for PIMD simulations. As a result, this analysis is merely suggestive rather than definitive regarding differences from alternative representations of the Born-Oppenheimer potential energy surface. However, the results do not indicate significant exaggeration of effect sizes from TAFFI. Future work could explore variations arising from entirely different interaction models, such as machine learning potentials.