Introduction

Defects have the potential to exert both beneficial and detrimental effects on the electronic, optical, mechanical, and magnetic properties of materials. For example, defect engineering plays a pivotal role in applications such as quantum computing1, energy harvesting2, and catalysis3. In contrast, the microelectronic industry has focused on eliminating defects due to their adverse impact on process yield, device performance, and scalability down to smaller feature sizes. In electronic devices, point defects cause charge trapping, which primarily manifests as flicker (1/f) noise4 on short time scales and as drift on longer time scales due to bias-temperature instabilities (BTI). However, when device dimensions are aggressively scaled such that only a handful of defects are present within the active device area, 1/f noise gradually appears as random telegraph noise (RTN)5. RTN is observed as stochastic temporal fluctuations in the output current or voltage, compromising device reliability and computational accuracy6,7,8,9,10,11,12,13. As a consequence, the incorporation of error-correcting circuits becomes inevitable, increasing the energy and area overhead14. While defects are considered undesirable and must be eliminated in traditional computing, they present intriguing possibilities in the emerging field of neuromorphic and bio-inspired computing. In this context, synthetically grown two-dimensional (2D) semiconductors, which naturally host a variety of defects, provide an excellent platform for both investigating and harnessing defects.

In this study, we utilize high-resolution atomistic imaging and low-temperature spectroscopy techniques to reveal various types of point defects present in metal-organic chemical vapor deposition (MOCVD) grown WSe2 films. Subsequently, using ab initio computational analysis and device-level modeling, we identify Se antisites (\({{{{\rm{Se}}}}}_{{{{\rm{W}}}}}\)) and W vacancies (\({{{{\rm{V}}}}}_{{{{\rm{W}}}}}\)) as plausible defect candidates contributing to the RTN observed in ultra-scaled WSe2 field-effect transistors (FETs), as both defect types generate hole trapping states with charge transition levels (CTLs) near the valence band edge. Next, we utilize RTN to develop a stochastic encoder for image encoding and show that medical Modified National Institute of Standards and Technology (MNIST) images, encoded into stochastic spike trains and fed into a pre-trained spiking neural network (SNN), yield significantly higher inference accuracy (>85%) than deterministic encoding (<55%), especially when the images are corrupted by substantial noise. In essence, we demonstrate that the dynamics of point defects in ultra-scaled WSe2 FETs can be exploited for noise-resilient information processing.

Results

Defects in monolayer WSe2

An MOCVD technique was used to grow large-area WSe2 films on 2-inch sapphire substrates using tungsten hexacarbonyl, W(CO)6 (99.99%, Sigma-Aldrich) and hydrogen selenide, H2Se (99.99%, Matheson) as the precursors (see the Methods section for details on the synthesis procedures and parameters). Scanning transmission electron microscopy (STEM) was used to investigate defects in the MOCVD-grown WSe2 film. A polymethyl methacrylate (PMMA)-assisted transfer technique was used to transfer the as-grown film from the sapphire growth substrate to a TEM grid. Figure 1a shows the structure of WSe2 viewed down its c-axis with atomic resolution high-angle annular dark field (HAADF)-STEM imaging at an 80 kV accelerating voltage. The film appears to have a crystalline 2H-WSe2 structure with several point defects that include Se vacancies (\({{{{\rm{V}}}}}_{{{{\rm{Se}}}}}\)), antisite defects with Se substituting W (\({{{{\rm{Se}}}}}_{{{{\rm{W}}}}}\)) and W substituting Se (\({{{{\rm{W}}}}}_{{{{\rm{Se}}}}}\)), and W vacancies (\({{{{\rm{V}}}}}_{{{{\rm{W}}}}}\)) as shown in Fig. 1b. Our observations are consistent with literature15 where \({{{{\rm{V}}}}}_{{{{\rm{Se}}}}}\), \({{{{\rm{V}}}}}_{{{{\rm{W}}}}}\), \({{{{\rm{Se}}}}}_{{{{\rm{W}}}}}\), and \({{{{\rm{W}}}}}_{{{{\rm{Se}}}}}\) have been identified as the most abundant defect types in monolayer WSe2. Figure 1c, d depicts the defect configuration obtained from structural optimizations with density functional theory (DFT) of an \({{{{\rm{Se}}}}}_{{{{\rm{W}}}}}\) in the neutral and positive charge states, respectively. The defect configuration is formed when a Se atom substitutes for a W atom in the WSe2 lattice. More information on the models and functions used for DFT simulations is provided in the Methods section. The electronic wavefunctions of the localized molecular orbitals at the defect site are drawn at an iso value of 0.05 \(e{{{\text{\AA }}}}^{-3}\). 
Here, blue bubbles represent the highest occupied molecular orbital (HOMO) of the neutral structure, while red bubbles depict the lowest unoccupied molecular orbital (LUMO) of the positively charged system. We find that this configuration is stable when the Se atom moves away from the W lattice position and stabilizes in the center of four adjacent Se atoms near the defect site. The HOMO of the neutral structure is localized around several W atoms of the defect site. An added hole localizes at the W atoms around the defect site, following small atomic relaxations. It is observed that both the \({{{{\rm{V}}}}}_{{{{\rm{W}}}}}\) and \({{{{\rm{Se}}}}}_{{{{\rm{W}}}}}\) could act as active hole trapping sites in WSe2, thereby accounting for the observed RTN due to hole trapping and detrapping at the defect site; this will be discussed further in the forthcoming section. The relaxed atomic configuration of a \({{{{\rm{V}}}}}_{{{{\rm{W}}}}}\) is depicted in Supplementary Fig. 1. For the neutral \({{{{\rm{V}}}}}_{{{{\rm{W}}}}}\), the HOMO is concentrated at the Se atoms surrounding the vacancy. When a hole is added to the system, it localizes on these Se atoms, following atomic relaxations. To confirm our assumption of hole trapping at \({{{{\rm{Se}}}}}_{{{{\rm{W}}}}}\) and \({{{{\rm{V}}}}}_{{{{\rm{W}}}}}\) sites as the most promising defect candidates, the thermodynamic CTLs for hole exchange at the defect sites are calculated as described in Supplementary Fig. 2. Compared to the Kohn-Sham levels of a defect from DFT calculations, which are obtained for a fixed atomic configuration, the CTL also accounts for the unavoidable energy change of the system upon charge trapping due to atomic relaxations and can thus be related to experimentally detected trap levels. 
The CTL is calculated by comparing the formation energies of the defect in the positive and neutral charge states16 and corresponds to the intersection point of two formation energy functions depending on the Fermi level. It is thus equal to the Fermi level at which the formation energies of the neutral and charged systems match and thus both charge states of the defect are equally stable. The CTLs and corresponding band diagrams are shown in Supplementary Fig. 2; relaxation energies for hole capture and release are provided in Supplementary Table 1 along with more comprehensive details regarding the calculations.
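
The crossing-point definition of the CTL described above can be illustrated with a minimal sketch. It assumes the usual linear dependence of the formation energy of a defect in charge state q on the Fermi level, with the Fermi level referenced to the valence band maximum. The function names and the formation energies are hypothetical placeholders chosen for illustration; they are not values from the DFT calculations.

```python
def formation_energy(e_form_at_vbm, charge, fermi_level):
    """Formation energy (eV) of a defect in charge state `charge`.

    `e_form_at_vbm` is the formation energy when the Fermi level sits at the
    valence band maximum (VBM); `fermi_level` is measured from the VBM in eV.
    """
    return e_form_at_vbm + charge * fermi_level


def charge_transition_level(e_form_neutral, e_form_positive_at_vbm):
    """Fermi level (eV above the VBM) where the 0 and +1 states are equally stable.

    The neutral line is flat and the +1 line has slope +1, so they cross at
    E_F = E_form(0) - E_form(+1 at the VBM).
    """
    return e_form_neutral - e_form_positive_at_vbm


# Hypothetical formation energies (eV), for illustration only.
E_NEUTRAL = 2.50
E_POSITIVE_AT_VBM = 2.40

ctl = charge_transition_level(E_NEUTRAL, E_POSITIVE_AT_VBM)
print(f"CTL(+/0) = {ctl:.2f} eV above the VBM")
```

Below the crossing point the positive (hole-occupied) state has the lower formation energy, and above it the neutral state is favored.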

Fig. 1: Defects in monolayer WSe2.

a STEM-HAADF image of MOCVD-grown large-area WSe2 film viewed down its c-axis. Scale bar, 2 nm. The STEM image reveals the presence of several point defects such as \({{{{\rm{V}}}}}_{{{{\rm{Se}}}}}\), \({{{{\rm{Se}}}}}_{{{{\rm{W}}}}}\), \({{{{\rm{W}}}}}_{{{{\rm{Se}}}}}\) and \({{{{\rm{V}}}}}_{{{{\rm{W}}}}}\) as shown in (b). Scale bar, 0.5 nm. Atomic structure of a selenium antisite in a WSe2 monolayer geometry optimized with DFT (c) in the neutral charge state with the HOMO depicted as blue bubbles at an isovalue of 0.05 e Å−3 and (d) in the positive charge state with the localized hole (LUMO) drawn as red bubbles. Se atoms are drawn in yellow and W atoms in pink. e Raman spectra obtained from WSe2 film post transfer onto the target substrate, showing the characteristic in-plane \({{{{\rm{E}}}}}_{2{{{\rm{g}}}}}^{1}\) mode at 250.06 cm−1 and the longitudinal-acoustic mode (2LA (M)) at 258.24 cm−1. The absence of the \({{{{\rm{B}}}}}_{2{{{\rm{g}}}}}\) peak at 310 cm−1, which is ascribed to interlayer interactions between the different layers of the WSe2 film, confirms the monolayer nature of the transferred WSe2 film. f Photoluminescence (PL) spectra measured at 300 K with characteristic A-excitonic peak at 1.66 eV. g PL spectra measured at 77 K exhibiting exciton, trion and an increased peak from defect-induced states.

Next, the WSe2 film possessing native point defects was transferred to a back-gate stack that consists of 25 nm atomic layer deposition (ALD) grown Al2O3 on sputter-deposited Pt/TiN on a p++-Si substrate via the PMMA-assisted wet-transfer process. Following transfer, Raman and photoluminescence (PL) spectroscopy were performed to gain further insights on the native defects present in the film as well as to assess the film quality and spatial uniformity. Figure 1e shows the Raman spectra obtained at 300 K; the characteristic in-plane \({{{{\rm{E}}}}}_{2{{{\rm{g}}}}}^{1}\) mode was observed at 250.06 cm−1 and the longitudinal-acoustic mode, \(2{{{\rm{LA}}}}({{{\rm{M}}}})\), was observed at 258.24 cm−1. The absence of the \({{{{\rm{B}}}}}_{2{{{\rm{g}}}}}\) peak at 310 cm−1, which is ascribed to interlayer interactions between the different layers of the WSe2, confirms the monolayer nature of the WSe2 film. Similarly, Fig. 1f shows the PL spectra measured at 300 K with a characteristic A-excitonic peak at 1.66 eV. However, when examining the PL spectrum obtained at 77 K, as depicted in Fig. 1g, we observe the presence of an elevated peak originating from defect-induced states in addition to the characteristic excitonic and trionic peaks. Supplementary Fig. 3a, b, respectively, show the PL spectrum at different temperatures and the normalized peak intensity of the defect-induced states as a function of temperature. The emergence of defect-induced states is commonly associated with the presence of point defects such as \({{{{\rm{V}}}}}_{{{{\rm{Se}}}}}\), \({{{{\rm{V}}}}}_{{{{\rm{W}}}}}\), and \({{{{\rm{Se}}}}}_{{{{\rm{W}}}}}\) as evident from prior studies17,18,19,20,21. Hence, the STEM images and accompanying low-temperature spectroscopy convincingly point to the presence of native point defects in MOCVD-grown monolayer WSe2. Supplementary Fig. 3c shows the spatial map for the defect-induced peak over a 25 µm \(\times\) 25 µm area at T = 77 K, confirming that the defects are uniformly distributed throughout the WSe2 film.

Observation of RTN in ultra-scaled WSe2 FETs

Following spectroscopic analysis of the transferred WSe2 film, electron beam (e-beam) lithography and SF6 plasma dry etching were used to define the WSe2 channel area. The source and drain contacts were then defined using another set of e-beam exposures. Finally, e-beam evaporation was performed to deposit 20 nm palladium (Pd) to serve as contacts to the WSe2 FETs. More details on device fabrication can be found in the Methods section and in our previous works22,23,24,25,26,27. Figure 2a, b, respectively, show the schematic and top-view scanning electron microscope (SEM) image of a representative back-gated monolayer WSe2 FET with 20 nm channel length (\({L}_{{{{\rm{ch}}}}}\)) and 500 nm channel width (\({W}_{{{{\rm{ch}}}}}\)). Supplementary Fig. 4 shows a zoomed-in SEM image confirming \({L}_{{{{\rm{ch}}}}}\) to be 20 nm. The dual-sweep transfer characteristics, i.e., \({I}_{{{{\rm{DS}}}}}\) as a function of the back-gate voltage, \({V}_{{{{\rm{BG}}}}}\), of this representative WSe2 FET, measured at \({V}_{{{{\rm{DS}}}}}\) = 1 V, are shown in Fig. 2c, d, for two different temperatures, \(T\) = 300 K and 15 K, respectively. Supplementary Fig. 5 shows the device statistics obtained from 30 WSe2 FETs showing the transfer characteristics and corresponding distributions for the field-effect mobility (\({\mu }_{{{{\rm{FE}}}},{{{\rm{p}}}}}\)), subthreshold slope, maximum ON-current, threshold voltage and interface trap density. Figure 2e, f illustrate the \({I}_{{{{\rm{DS}}}}}\) traces acquired at \(T\) = 300 K and 15 K for \({V}_{{{{\rm{BG}}}}}\) = −6 V and −3.5 V, respectively, with a sampling interval of \({\tau }_{{{{\rm{s}}}}}\) = 2 ms. It is important to note that the selection of different \({V}_{{{{\rm{BG}}}}}\) biases was deliberate to ensure similar \({I}_{{{{\rm{DS}}}}}\) values. Notably, at \(T\) = 300 K, there are no observable RTNs. 
However, at \(T\) = 15 K distinct RTN patterns become clearly evident because the majority of defects that contribute to the 1/f noise have frozen out28.

Fig. 2: Observation of RTN in ultra-scaled monolayer WSe2 FETs.

a Schematic and (b) scanning electron microscope (SEM) image of an ultra-scaled WSe2 FET with 25 nm atomic layer deposition grown Al2O3 as the gate dielectric, Pt/TiN/p++-Si as the back-gate electrode, Pd as the source/drain contact metal. The \({L}_{{{{\rm{ch}}}}}\) and \({W}_{{{{\rm{ch}}}}}\) were defined to be 20 nm and 500 nm, respectively. Scale bar, 200 nm. Dual-sweep transfer characteristics of the ultra-scaled WSe2 FET measured using \({V}_{{{{\rm{DS}}}}}\) = 1 V at different temperatures, c \({{{\rm{T}}}}\) = 300 K and d \({{{\rm{T}}}}\) = 15 K. Corresponding \({I}_{{{{\rm{DS}}}}}\) sampled every \({\tau }_{{{{\rm{s}}}}}\) = 2 ms at e \({{{{\rm{V}}}}}_{{{{\rm{BG}}}}}\) = −6 V and f \({V}_{{{{\rm{BG}}}}}\) = −3.5 V for \({{{\rm{T}}}}\) = 300 K and \({{{\rm{T}}}}\) = 15 K, respectively. RTN is absent at \({{{\rm{T}}}}\) = 300 K while distinct RTN signals are observed at \({{{\rm{T}}}}\) = 15 K. Power spectral density (PSD) obtained using the fast Fourier transform (FFT) of the measured \({I}_{{{{\rm{DS}}}}}\) at g \({{{\rm{T}}}}\) = 300 K and h \({{{\rm{T}}}}\) = 15 K. The presence of a single RTN signal results in a Lorentzian profile in the frequency domain, i.e., slope = \(1/{f}^{2}\), whereas the superposition of many RTN signals results in flicker noise in the frequency domain with a slope = 1/f.

To explain the above observation, we must note that RTN traces are seen in FETs primarily when a single dominant defect is engaged in the charge carrier trapping and detrapping processes, thus leading to a finite and discrete shift in the threshold voltage (\({V}_{{{{\rm{TH}}}}}\)) of the device, which manifests as discrete fluctuations in \({I}_{{{{\rm{DS}}}}}\)29,30. RTN traces are, therefore, more easily observable in scaled FETs since the number of defects scales with the channel area31. In addition, the impact of a single defect scales inversely with the channel area, making the impact of dominant defects more pronounced in scaled FETs. It must be noted that the impact of single defects on RTN follows an exponential distribution. The single defect regime typically consists of a few defects (~10), with the defects in the tail of the distribution affecting the observed RTN. Supplementary Fig. 6 shows the RTN traces obtained from several WSe2 FETs at \(T\) = 15 K with increasing channel area (\({A}_{{{{\rm{ch}}}}}\)) ranging from 0.01 \({{{{\rm{\mu }}}}{{{\rm{m}}}}}^{2}\) to 3 \({{{{\rm{\mu }}}}{{{\rm{m}}}}}^{2}\). As \({A}_{{{{\rm{ch}}}}}\) increases, the discrete nature of the RTN traces disappears because multiple RTN traces associated with different defects start to superimpose, underscoring the importance of fabricating ultra-scaled devices to observe and investigate the RTN phenomenon.
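
To give a feel for why channel area matters, the following back-of-the-envelope sketch estimates the threshold-voltage shift caused by one trapped charge under the simple charge-sheet approximation, in which the shift equals q divided by the product of the oxide capacitance per unit area and the channel area. This approximation, the relative permittivity of 9 assumed for Al2O3, and the 1 µm by 3 µm dimensions used to represent the 3 µm² device are assumptions made for illustration. Only the 25 nm oxide thickness, the 20 nm by 500 nm geometry and the two channel areas come from the text.

```python
Q_E = 1.602176634e-19      # elementary charge (C)
EPS0 = 8.8541878128e-12    # vacuum permittivity (F/m)


def single_charge_vth_shift(length_m, width_m, t_ox_m, eps_r):
    """Charge-sheet estimate of the threshold shift (V) from one trapped charge."""
    c_ox = EPS0 * eps_r / t_ox_m              # oxide capacitance per area (F/m^2)
    return Q_E / (c_ox * length_m * width_m)  # shift grows as the area shrinks


EPS_R_AL2O3 = 9.0   # assumed value, not given in the text
T_OX = 25e-9        # 25 nm Al2O3 from the text

dv_small = single_charge_vth_shift(20e-9, 500e-9, T_OX, EPS_R_AL2O3)  # 0.01 um^2
dv_large = single_charge_vth_shift(1e-6, 3e-6, T_OX, EPS_R_AL2O3)     # 3 um^2, hypothetical L and W

print(f"0.01 um^2 device: {dv_small * 1e3:.2f} mV per trapped charge")
print(f"3 um^2 device:    {dv_large * 1e6:.1f} uV per trapped charge")
```

In this estimate the shift per trapped charge is 300 times larger in the smallest device than in the largest one, in line with the loss of discrete RTN steps as the channel area grows.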

Temperature also plays a significant role in the observation of RTN as it supplies, via phonons, the energy required to overcome the barriers imposed by structural relaxation. At high temperatures, charge carriers can surmount large trapping barriers. Conversely, at lower temperatures these barriers are too large, making transitions only possible via nuclear tunneling between the configurations. This is typically much less likely than over-the-barrier reactions at higher temperatures, resulting in freezing out of oxide traps32. Thus, access is limited to defects with smaller relaxation energies, e.g., those in the crystalline semiconducting channel or defects at the dielectric/semiconductor interface with energy levels close to the band edges33. According to the standard non-radiative multiphonon model, the capture and emission of carriers by defect states involves both a tunneling event as well as a structural relaxation at the defect site, which determines the barriers and results in the characteristic time constants of the corresponding defect28,32,34. These discrete tunneling events manifest as RTN in the time domain and as a Lorentzian spectrum (slope = \(1/{f}^{2}\)) in the frequency domain. If the barriers are uniformly distributed in energy5, the summation of all RTN events, each with different characteristic time constants, gives rise to the universally observed 1/f noise spectra in most electronic devices (see Supplementary Fig. 7). This is evident from Fig. 2g, h, which show the power spectral density (PSD) plots obtained by taking the fast Fourier transform of the \({I}_{{{{\rm{DS}}}}}\) data shown in Fig. 2e, f. The Lorentzian frequency spectrum obtained for the RTN trace corresponding to \(T\) = 15 K indicates that only one defect state is dominant in our ultra-scaled WSe2 FETs at low temperature; at higher temperature, i.e., for \(T\) = 300 K, more defects are accessible, thus resulting in a distinct 1/f spectrum. 
Given the relatively large width of 500 nm, we therefore need to operate our devices at 15 K to access a single dominant defect. Supplementary Fig. 8a, b, respectively, show the \({I}_{{DS}}\) traces and corresponding PSD plots obtained at \(T\) = 15 K, 50 K, 100 K, 200 K, and 300 K. While RTN is observed at temperatures as high as \(T\) = 200 K, short-lived, spike-like transitions between the two states are specifically observed at \(T\) = 15 K. As we will discuss next, spike-like RTN is critical for the design of biomimetic afferent neurons and this distinctive signal pattern can only be achieved when there is a significant difference between the average capture and emission time constants, i.e., \(\bar{{\tau }_{{{{\rm{c}}}}}}\) and \(\bar{{\tau }_{{{{\rm{e}}}}}}\), associated with the defect state35.
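
The link between a single two-level fluctuator and a Lorentzian spectrum can be reproduced numerically. The sketch below is an illustration under stated assumptions, not the analysis pipeline used for Fig. 2. It simulates one trap as a discrete-time Markov chain sampled every 2 ms and estimates the PSD with a small pure-Python radix-2 FFT. The 50 ms time constants, the seed and the frequency bands are hypothetical choices.

```python
import cmath
import random


def simulate_rtn(n_samples, dt, tau_c, tau_e, rng):
    """Two-level RTN as a discrete-time Markov chain (valid for dt << tau_c, tau_e)."""
    p_capture, p_emit = dt / tau_c, dt / tau_e  # switching probability per sample
    state, trace = 0, []
    for _ in range(n_samples):
        u = rng.random()
        if state == 0 and u < p_capture:
            state = 1   # carrier captured
        elif state == 1 and u < p_emit:
            state = 0   # carrier emitted
        trace.append(state)
    return trace


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + tw, even[k] - tw
    return out


def periodogram(trace, dt):
    """One-sided PSD estimate of the mean-subtracted trace (DC bin dropped)."""
    n = len(trace)
    mean = sum(trace) / n
    spectrum = fft([v - mean for v in trace])
    freqs = [k / (n * dt) for k in range(1, n // 2)]
    psd = [2.0 * dt / n * abs(spectrum[k]) ** 2 for k in range(1, n // 2)]
    return freqs, psd


def band_mean(freqs, psd, f_lo, f_hi):
    """Average PSD over the band f_lo <= f < f_hi."""
    vals = [p for f, p in zip(freqs, psd) if f_lo <= f < f_hi]
    return sum(vals) / len(vals)


rng = random.Random(0)
DT = 2e-3  # 2 ms sampling interval, as in the text
trace = simulate_rtn(2 ** 14, DT, tau_c=0.05, tau_e=0.05, rng=rng)  # hypothetical time constants
freqs, psd = periodogram(trace, DT)
plateau = band_mean(freqs, psd, 0.0, 1.0)      # well below the corner frequency
rolloff = band_mean(freqs, psd, 200.0, 250.0)  # well above the corner frequency
print(f"plateau / roll-off power ratio: {plateau / rolloff:.0f}")
```

For these time constants the corner frequency, \((1/{\tau }_{{{{\rm{c}}}}}+1/{\tau }_{{{{\rm{e}}}}})/2\pi\), is about 6.4 Hz: the spectrum is flat below it and falls off steeply above it. Summing many such traces with widely spread time constants would move the spectrum toward the 1/f shape.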

RTN dynamics and defect correlation

In this section, our objective is to establish a correlation between the observed RTN in ultra-scaled WSe2 FETs and various point defects through experimental analysis and device modeling using technology computer aided design (TCAD). It is worth noting that RTN traces are typically measured at room temperature where the noise is dominated by oxide defects. Channel defects, on the other hand, have much faster time constants, typically ranging from picoseconds to microseconds, and therefore do not contribute significantly to the observed RTN. At lower temperatures, as employed in our study, oxide traps freeze out and defects in the channel are slowed down enough to move into our measurement window. Therefore, our initial task is to confirm that the RTN traces we observe at 15 K are indeed related to the WSe2 channel rather than the gate oxide. To achieve this, we acquired RTN traces at different gate biases (\({V}_{{{{\rm{BG}}}}}\) = −3.2, −3.4, −3.6, −3.8 and −4 V) at \(T\) = 15 K as shown in Fig. 3a. Figure 3b, c, respectively, show the normalized histogram plots for capture and emission time, i.e., \({\tau }_{{{{\rm{c}}}}}\) and \({\tau }_{{{{\rm{e}}}}}\), which denote the time spent in the two distinct \({I}_{{{{\rm{DS}}}}}\) states at \(T\) = 15 K. As expected, both \({\tau }_{{{{\rm{c}}}}}\) and \({\tau }_{{{{\rm{e}}}}}\) are found to be distributed exponentially. From the exponential fits, \(\bar{{\tau }_{{{{\rm{c}}}}}}\) and \(\bar{{\tau }_{{{{\rm{e}}}}}}\) can be extracted. Notably, Fig. 3d, e reveal that \(\bar{{\tau }_{{{{\rm{c}}}}}}\) and \(\bar{{\tau }_{{{{\rm{e}}}}}}\) are independent of \({V}_{{{{\rm{BG}}}}}\), suggesting that the origin of the observed single defects is within the monolayer WSe2 channel31. This assertion is substantiated by TCAD modeling where a qualitatively similar trend of gate-bias-independent capture and emission times is observed for charge traps within the WSe2 channel. 
Our modeling analysis further indicates that the defect possesses a trap energy (\({E}_{{{{\rm{T}}}}}\)) of approximately 100 meV above the valence band maximum (\({E}_{{{{\rm{V}}}}}\)) and a relaxation energy (\({E}_{{{{\rm{relax}}}}}\)) smaller than 150 meV. It should be noted that these CTLs and relaxation energies agree well with the values calculated for hole trapping at \({{{{\rm{Se}}}}}_{{{{\rm{W}}}}}\) and \({{{{\rm{V}}}}}_{{{{\rm{W}}}}}\) using DFT calculations (see Supplementary Fig. 2 and Supplementary Table 1). Additionally, the defect is found to be located near the drain contact within the channel. These findings provide robust evidence that the observed RTN results from defects present in the WSe2 channel. We also fabricated monolayer WSe2 FETs on 20 nm ALD-grown HZO, which acts as the back-gate dielectric. Supplementary Fig. 9 shows the presence of RTN in the \({I}_{{{{\rm{DS}}}}}\) trace at \(T\) = 15 K in these devices as well, further supporting the notion that the single defects contributing to the observation of RTNs in WSe2 FETs are associated with the channel material rather than the gate oxide.
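
The extraction of dwell times from a two-level trace can be sketched as follows. This is a simplified stand-in that segments the trace with a fixed threshold instead of the Canny step-detection algorithm mentioned in the Fig. 3 caption, and it uses the sample mean of the dwell times, which is the maximum-likelihood estimate of the mean of an exponential distribution, in place of a histogram fit. Which current level corresponds to capture and which to emission depends on the trap and carrier polarity, so the two lists are simply labeled low and high here. All names and the toy trace are hypothetical.

```python
def dwell_times(trace, dt, threshold):
    """Return (low_dwells, high_dwells) in seconds for a two-level trace.

    The first and last runs are dropped because their true durations
    are cut off by the edges of the measurement window.
    """
    runs = []
    current, length = trace[0] > threshold, 1
    for value in trace[1:]:
        state = value > threshold
        if state == current:
            length += 1
        else:
            runs.append((current, length * dt))
            current, length = state, 1
    runs = runs[1:]  # the final run was never appended; drop the first one too
    low = [t for is_high, t in runs if not is_high]
    high = [t for is_high, t in runs if is_high]
    return low, high


def mean_time_constant(dwells):
    """Sample mean of the dwell times (MLE of an exponential mean)."""
    return sum(dwells) / len(dwells)


# Toy trace in arbitrary units with a 1 s sampling interval, for illustration only.
toy_trace = [0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
low, high = dwell_times(toy_trace, dt=1.0, threshold=0.5)
print("low-state dwells:", low, "mean:", mean_time_constant(low))
print("high-state dwells:", high, "mean:", mean_time_constant(high))
```

As the Fig. 3 caption notes, time constants extracted this way are only trustworthy when they exceed roughly ten times the sampling interval.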

Fig. 3: Gate-bias independent RTN dynamics and defect correlation.

a RTN traces obtained for \({V}_{{{{\rm{BG}}}}}\) = −3.2, −3.4, −3.6, −3.8 and −4 V at \({{{\rm{T}}}}\) = 15 K. Normalized histogram plots for b \({\tau }_{{{{\rm{c}}}}}\) and c \({\tau }_{{{{\rm{e}}}}}\). Insets show the exponential fits for extracting \(\bar{{\tau }_{{{{\rm{e}}}}}}\) and \(\bar{{\tau }_{{{{\rm{c}}}}}}\). d \(\bar{{\tau }_{{{{\rm{e}}}}}}\) and \(\bar{{\tau }_{{{{\rm{c}}}}}}\) as a function of \({V}_{{{{\rm{BG}}}}}\) obtained through exponential fits to the experimental data. e \(\bar{{\tau }_{{{{\rm{e}}}}}}\) and \(\bar{{\tau }_{{{{\rm{c}}}}}}\) as a function of \({V}_{{{{\rm{BG}}}}}\) obtained using a Canny step detection algorithm. The dots represent the median values, while the whiskers show the maximum and minimum values, providing a range for each measurement. The purple lines indicate the sampling interval (\({\tau }_{{{{\rm{s}}}}}\) = 2 ms) and ten times the sampling interval, respectively. For accurate extraction of time constants, they must exceed approximately ten times the sampling time.

Next, we conduct ab initio investigations to gain insights into the possible types of point defects responsible for the transient, spike-like RTN observed in our devices. We theoretically analyze the hole trapping properties of \({{{{\rm{V}}}}}_{{{{\rm{Se}}}}}\), \({{{{\rm{V}}}}}_{{{{\rm{W}}}}}\) and \({{{{\rm{Se}}}}}_{{{{\rm{W}}}}}\) by employing DFT in conjunction with a hybrid functional. While \({{{{\rm{V}}}}}_{{{{\rm{Se}}}}}\) are abundant in WSe215, they do not offer a possible state for hole trapping, as a hole cannot localize at the \({{{{\rm{V}}}}}_{{{{\rm{Se}}}}}\) defect but rather creates a delocalized state inside the valence band36. \({{{{\rm{V}}}}}_{{{{\rm{Se}}}}}\) will therefore not be considered as a potential hole trap that leads to the observed RTNs. On the other hand, \({{{{\rm{V}}}}}_{{{{\rm{W}}}}}\) and \({{{{\rm{Se}}}}}_{{{{\rm{W}}}}}\) were experimentally detected in sufficiently high concentrations in WSe2 monolayers15 to be possible candidates for causing the detected RTN signal via hole trapping and detrapping at the defect site. Our calculations reveal that for both \({{{{\rm{V}}}}}_{{{{\rm{W}}}}}\) and \({{{{\rm{Se}}}}}_{{{{\rm{W}}}}}\), the hole trap level lies just above the valence band maximum of pristine bulk WSe2, with only small relaxation energies (23 – 196 meV) for both hole capture and release. These defect types can therefore be classified as hole traps and are probable defect candidates to explain the RTN signal measured in our devices.

Next, we confirm the high yield and reproducibility of the RTN dynamics obtained from ultra-scaled WSe2 FETs at low temperatures, which is critical for their practical implementation in neuromorphic applications. Supplementary Fig. 10a showcases RTN traces obtained from ten representative ultra-scaled WSe2 FETs that are located in different physical locations of the fabricated chip, measured at \(T\) = 15 K with \({L}_{{{{\rm{ch}}}}}\) = 20 nm and \({W}_{{{{\rm{ch}}}}}\) = 500 nm corresponding to an \({A}_{{{{\rm{ch}}}}}\) of 0.01 \({{{{\rm{\mu }}}}{{{\rm{m}}}}}^{2}\). Additionally, we captured RTN traces from these devices at higher temperatures of \({T}\) = 50, 100, 200 and 300 K. Supplementary Fig. 10b depicts the relationship between the yield of spike-like RTNs and temperature. To calculate yield, we only consider devices showing spike-like RTN with \(\bar{{\tau }_{{{{\rm{c}}}}}}\)/\(\bar{{\tau }_{{{{\rm{e}}}}}}\) > 100. As expected, the yield steadily declines with temperature, transitioning from 100% at 15 K to 0% at 300 K. This observation strongly suggests that for the relatively large devices used here, low-temperature measurements are essential for the reliable observation of RTNs with a high yield. Supplementary Fig. 11 shows the transfer characteristics of the ten representative ultra-scaled WSe2 FETs that yielded RTN, measured at 300 K and 15 K. In addition, Supplementary Fig. 12 illustrates the RTN traces corresponding to three cooling cycles for five devices, confirming the reproducibility of the phenomenon.

Realization of biomimetic afferent neurons

Afferent neurons are a fundamental component of the brain’s information processing system, facilitating the transformation of external stimuli received by sensory organs into stochastic electrical spikes. Supplementary Fig. 13a provides a schematic representation of a biological neural network (BNN); wherein input stimuli, such as images, are translated by visual afferent neurons into stochastic spike trains. These spike trains, with the average interspike interval (\({\bar{\tau }}_{{{{\rm{spike}}}}}\)) reflecting the corresponding intensity of light in each pixel, are then used for subsequent higher-order processing and inference within the brain. Stochastic spike encoding is a fundamental and essential element of neural information processing, empowering the brain to manage fluctuations and uncertainties in the environment, thereby enhancing the reliability of information processing37. This is highlighted using an example in Supplementary Fig. 13b, c. Although deterministic encoding provides a more precise representation of an image in noise-free conditions, stochastic encoding surpasses it in the presence of noise, allowing for the detection of finer features that deterministic methods struggle to discern. This emphasizes the importance of stochastic encoding for pattern identification amid background noise, without requiring extensive noise filtering. As a result, stochastic encoding finds valuable application in the domains of bio-medical imaging and information processing, where noise poses significant challenges. Furthermore, Supplementary Fig. 14 shows the noise-resilience of stochastic encoding over deterministic encoding for MNIST handwritten digits, thus illustrating the robustness of our approach.

To mimic the functioning of afferent neurons, it is important to first realize a mechanism to generate a stochastic spike train wherein \({\bar{\tau }}_{{{{\rm{spike}}}}}\) conveys information about the intensity or strength of the input stimuli. In other words, larger \({\bar{\tau }}_{{{{\rm{spike}}}}}\) encodes stronger inputs and vice versa. Note that the gate voltage independence of \(\bar{{\tau }_{{{{\rm{c}}}}}}\) and \(\bar{{\tau }_{{{{\rm{e}}}}}}\) does not allow for the use of \({V}_{{{{\rm{BG}}}}}\) as the input stimulus. However, when we use \({I}_{{{{\rm{DS}}}}}\) as the input variable at a constant \({V}_{{{{\rm{BG}}}}}\), RTN is observed in the output voltage, \({V}_{{{{\rm{DS}}}}}\). Here, \({\bar{\tau }}_{{{{\rm{spike}}}}}\) is found to be exponentially dependent on the magnitude of \({I}_{{{{\rm{DS}}}}}\) as illustrated in Fig. 4a for different \({I}_{{{{\rm{DS}}}}}\) ranging from 50 nA to 130 nA in steps of 20 nA. Figure 4b shows the corresponding normalized histogram plots for \({\tau }_{{{{\rm{spike}}}}}\), defined as the time elapsed between the occurrence of two consecutive spikes, for different \({I}_{{{{\rm{DS}}}}}\). From these distributions, \({\bar{\tau }}_{{{{\rm{spike}}}}}\) can be extracted using an exponential fit. Figure 4c shows \({\bar{\tau }}_{{{{\rm{spike}}}}}\) as a function of \({I}_{{{{\rm{DS}}}}}\) that enables the construction of biomimetic afferent neurons. We have formulated an empirical model using Eq. 1 to replicate the relationship between \({\bar{\tau }}_{{{{\rm{spike}}}}}\) and \({I}_{{{{\rm{DS}}}}}\).

$${\bar{\tau }}_{{{{\rm{spike}}}}}={\tau }_{0}\times \exp \left(-\frac{{I}_{0}}{{I}_{{{{\rm{DS}}}}}}\right)$$
(1)

where \({I}_{0}\) and \({\tau }_{0}\) are fitting parameters. This model serves as a bridge between experiments to obtain stochastic spike trains and simulation that will be discussed in the following section for accurate inference of noise-inflicted medical MNIST images using an SNN.
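
A minimal sketch of how Eq. 1 could be used in practice is given below. Taking the logarithm of Eq. 1 gives a straight line in \(1/{I}_{{{{\rm{DS}}}}}\), so \({\tau }_{0}\) and \({I}_{0}\) can be obtained by ordinary least squares, and a stochastic spike train can then be drawn with exponentially distributed interspike intervals, consistent with the exponential histograms of Fig. 4b. The parameter values, function names and seed are hypothetical; the actual fitted parameters are not stated in this section.

```python
import math
import random


def mean_spike_interval(i_ds, tau0, i0):
    """Eq. 1: mean interspike interval (s) for an input current i_ds (A)."""
    return tau0 * math.exp(-i0 / i_ds)


def fit_eq1(currents, mean_intervals):
    """Least-squares fit of ln(tau) = ln(tau0) - i0 / i_ds; returns (tau0, i0)."""
    xs = [1.0 / i for i in currents]
    ys = [math.log(t) for t in mean_intervals]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return math.exp(intercept), -slope


def spike_times(i_ds, tau0, i0, duration, rng):
    """Spike times (s) with exponentially distributed interspike intervals."""
    mean = mean_spike_interval(i_ds, tau0, i0)
    times, t = [], rng.expovariate(1.0 / mean)
    while t < duration:
        times.append(t)
        t += rng.expovariate(1.0 / mean)
    return times


# Hypothetical parameters, for illustration only.
TAU0, I0 = 1.0, 100e-9
currents = [50e-9, 70e-9, 90e-9, 110e-9, 130e-9]  # input currents from the text
synthetic = [mean_spike_interval(i, TAU0, I0) for i in currents]
tau0_fit, i0_fit = fit_eq1(currents, synthetic)
train = spike_times(130e-9, tau0_fit, i0_fit, duration=100.0, rng=random.Random(1))
print(f"tau0 = {tau0_fit:.3f} s, I0 = {i0_fit * 1e9:.1f} nA, spikes = {len(train)}")
```

In a pixel-encoding setting, each pixel intensity would first be mapped to an input current and then to a spike train in this way; that mapping is not specified here and would be a further assumption.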

Fig. 4: Biomimetic afferent neuron and stochastic spike encoding.

a Voltage RTN traces obtained from our biomimetic afferent neuron for different input \({I}_{{{{\rm{DS}}}}}\) ranging from 50 nA to 130 nA in steps of 20 nA. b Normalized histogram plots for \({\tau }_{{{{\rm{spike}}}}}\), defined as the time elapsed between the occurrence of two consecutive spikes, for different \({I}_{{{{\rm{DS}}}}}\). From these distributions, \({\bar{\tau }}_{{{{\rm{spike}}}}}\) can be extracted using an exponential fit. c \({\bar{\tau }}_{{{{\rm{spike}}}}}\) as a function of \({I}_{{{{\rm{DS}}}}}\) revealing a monotonic dependence that can serve the purpose of constructing biomimetic afferent neurons. d Schematic and e optical image of the resistive capacitive (RC) differentiator for spike digitization mounted on a printed circuit board. The differentiator consists of a resistor with \(R=1{{{\rm{M}}}}\Omega\) in series with a capacitor having \(C=0.01{{{\rm{\mu }}}}{{{\rm{F}}}}\). The voltage spikes from the afferent neuron (\({V}_{{{{\rm{DS}}}}}\)) obtained by inputting different values of \({I}_{{{{\rm{DS}}}}}\) at \({V}_{{{{\rm{BG}}}}}\) = −6 V are applied to the capacitor while the other end of the resistor is grounded by applying \({V}_{{{{\rm{GND}}}}}\) = 0 V. The output from the RC differentiator (\({V}_{{{{\rm{diff}}}}}\)) is then provided as input to an op-amp that serves as the comparator. A supply voltage (\({+V}_{{{{\rm{CC}}}}}\)) of 2 V is applied to the comparator while a reference voltage (\(-{V}_{{{{\rm{CC}}}}}\)) of 0 V is established to eliminate the negative voltage spikes generated at the differentiator’s output. Digital stochastic spikes (\({V}_{{{{\rm{C}}}}}\)) are obtained at the output of the comparator. f \({V}_{{{{\rm{diff}}}}}\) for the different voltage inputs shown in (a). g \({V}_{{{{\rm{C}}}}}\) obtained on passing the spike train obtained in (f) through a comparator. 
The spike digitization peripheral circuit enables the seamless generation of digital stochastic spike trains with constant output voltage levels for the hardware implementation of afferent neurons.

To confirm whether the spike trains are truly random, their autocorrelation function (ACF) was calculated for different \({I}_{{{{\rm{DS}}}}}\), as shown in Supplementary Fig. 15. The ACF lies in the interval [−1, 1], where values of −1 and 1 indicate anti-correlation and correlation, respectively, and a value of 0 indicates no correlation in the spike train. Clearly, the RTN traces do not show any long-term correlation, validating our claim that the encoded spike trains are truly random in nature. Therefore, the current-controlled RTN traces obtained due to single defects in our scaled monolayer WSe2 FETs at \(T\) = 15 K can be used as a biomimetic afferent neuron for rate-based stochastic information encoding and thereby accelerate the development of noise-immune stochastic spiking neural networks (SSNNs). While this work focuses on harnessing the dynamics of inherent growth defects in large-area grown WSe2, other forms of stochasticity in ultra-scaled 2D FETs such as the defect dynamics of oxide traps could also be harnessed for the construction of afferent neurons.
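
The randomness check described above can be sketched with a normalized sample autocorrelation. The example is illustrative only: it applies the ACF to synthetic, exponentially distributed interspike intervals and, for contrast, to a strictly alternating sequence. The choice of sequence on which the ACF is evaluated, the sample size and the seed are assumptions, not details of the analysis behind Supplementary Fig. 15.

```python
import random


def autocorrelation(x, max_lag):
    """Normalized sample ACF for lags 0..max_lag; each value lies in [-1, 1]."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    return [
        sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag)) / var
        for lag in range(max_lag + 1)
    ]


rng = random.Random(2)
# Synthetic interspike intervals with a hypothetical 0.5 s mean.
intervals = [rng.expovariate(1.0 / 0.5) for _ in range(5000)]
acf_random = autocorrelation(intervals, max_lag=10)

# A strictly alternating sequence is strongly anti-correlated at lag 1.
acf_periodic = autocorrelation([0, 1] * 50, max_lag=2)

print("random, lags 1-3:  ", [round(v, 3) for v in acf_random[1:4]])
print("periodic, lags 1-2:", [round(v, 3) for v in acf_periodic[1:3]])
```

For an uncorrelated sequence the ACF is 1 at lag 0 and scatters around 0 at all other lags, with a spread that shrinks as the number of samples grows.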

A key observation from the RTN traces in \({V}_{{{{\rm{DS}}}}}\) is that both \({\bar{\tau }}_{{{{\rm{spike}}}}}\) and the output voltage levels change with \({I}_{{{{\rm{DS}}}}}\). However, in the context of inference applications, it is preferable to maintain constant output voltage levels while allowing \({\bar{\tau }}_{{{{\rm{spike}}}}}\) to vary with the strength of the input stimulus. To accomplish this, a simple peripheral circuit composed of a differentiator followed by a comparator was prepared as shown in Fig. 4d, e. Figure 4f, g show the respective outputs from the differentiator and the comparator for five representative voltage RTN traces obtained from the WSe2-FET-based afferent neuron. As expected, following differentiation, the output voltage waveform encompasses both positive and negative values, which are subsequently rectified by the comparator. Additionally, the comparator shapes the waveform into spike trains with a constant output voltage level aligned with the comparator’s supply voltage, in this case 2 V. The reference voltage for the comparator was established at 0 V to eliminate the negative voltage spikes generated at the differentiator’s output.
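The action of the differentiator and comparator can be sketched numerically. The component values (\(R\) = 1 MΩ, \(C\) = 0.01 μF, giving a time constant \(RC\) = 10 ms), the 2 V supply, and the 0 V reference are taken from the text; the first-order discrete filter, the 1 ms sampling step, and the two-level input waveform are illustrative assumptions and not measured data.

```python
import numpy as np

R = 1e6          # resistor (ohm), from the text
C = 0.01e-6      # capacitor (F), from the text
V_CC = 2.0       # comparator supply voltage (V), from the text
V_REF = 0.0      # comparator reference voltage (V), from the text

def rc_differentiator(v_in, dt, r=R, c=C):
    """First-order discrete model of a series-C, shunt-R (high-pass) stage."""
    tau = r * c                      # 10 ms for the values above
    a = tau / (tau + dt)
    v_out = np.zeros_like(v_in, dtype=float)
    for n in range(1, len(v_in)):
        v_out[n] = a * (v_out[n - 1] + v_in[n] - v_in[n - 1])
    return v_out

def comparator(v_diff, v_ref=V_REF, v_high=V_CC):
    """Ideal comparator: v_high when the input exceeds v_ref, 0 V otherwise."""
    return np.where(v_diff > v_ref, v_high, 0.0)

dt = 1e-3                            # assumed sampling step of 1 ms
v_ds = np.full(500, 0.20)            # hypothetical lower RTN level (V)
v_ds[100:300] = 0.35                 # hypothetical upper RTN level (V)

v_diff = rc_differentiator(v_ds, dt)
v_c = comparator(v_diff)
print("RC time constant (s):", R * C)
print("output levels (V):", np.unique(v_c))
```

In this idealized, noise-free model the upward transition produces a positive \({V}_{{{{\rm{diff}}}}}\) and the downward transition a negative one, and the comparator output takes only the values 0 V and 2 V regardless of the input amplitude. Real pulse widths also depend on noise and comparator offset, which are not modeled here.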

Application of stochastic encoding in biomedical imaging

Finally, we demonstrate the importance of stochastic encoding in the context of biomedical imaging by utilizing the stochastic spike trains obtained from our WSe2-FET-based afferent neuron to classify medical MNIST images without and with added noise, as shown in Fig. 5a, b, respectively, using an SSNN. It is essential to acknowledge that data generated by biomedical applications are particularly susceptible to external noise. This susceptibility can arise from limitations in instrument resolution, calibration errors, movements of the subjects being studied, etc. Consequently, extensive post-processing of the acquired biomedical data is often required, which may obscure vital information necessary for accurate diagnosis. This inherent challenge motivated the selection of the medical MNIST dataset38 as the testbed for our demonstration. The medical MNIST dataset consists of 48,000 images, each 64 × 64 pixels, split among 6 different image classes corresponding to abdomen, breast, chest CT, chest X-ray, hand X-ray, and head CT, as shown in Fig. 5a.

Fig. 5: Noise-resilient inference of medical MNIST data set using stochastic SNN.

a Representative image from each class in the medical MNIST dataset corresponding to abdomen, breast, chest CT, chest X-ray, hand X-ray, and head CT. The dataset contains a total of 48,000 images, with each class containing 8,000 images of size 64 × 64 pixels. b Gaussian noise with standard deviations σ = 0, 0.2, 0.4, and 0.6, added to an example image from the chest X-ray class. c A fully connected, two-layered neural network with 4,096, 300, and 6 neurons in the input, hidden, and output layers, respectively, was used to classify the medical MNIST dataset. The network was trained on 40,000 images using a gradient descent algorithm with a learning rate of 0.0001 and a batch size of 1 for 150 epochs. d Training accuracy as a function of epoch. A training accuracy of ~90% was achieved. The ANN was subsequently converted to an SNN to test the inference accuracy using the remaining 8,000 images from the medical MNIST dataset. We obtained a testing accuracy of ~91%. e Inference accuracy as a function of σ for both deterministic and stochastic encoding. We found that the inference accuracy dropped from ~91% to 86.3% for stochastic spike encoding and from 87.3% to 53.4% for deterministic encoding.

Since SNNs require spike trains as inputs, all pixel intensities are converted into spike trains where the mean time between spikes (\({\bar{\tau }}_{{{{\rm{spike}}}}}\)) is inversely related to pixel brightness; brighter pixels result in shorter times between spikes, and darker ones in longer times. Given the extensive task of generating spike trains for the entire set of 64 × 64 × 48,000 pixel intensities across all images, we employed the empirical model from Eq. 1, which mimics the functionality of the afferent neuron based on the stochastic spike trains generated from our devices, to encode the pixel intensities. First, an artificial neural network (ANN) was trained to classify the medical MNIST images as shown in Fig. 5c. We adopted the approach proposed by Sengupta et al.39,40, where a trained ANN is converted into an SNN. This approach yields higher inference accuracy owing to near-lossless ANN-SNN conversion39. The ANN has 4096 input neurons, 300 hidden-layer neurons, and 6 output neurons, one for each class in the medical MNIST dataset. The 64 × 64 pixel medical MNIST images were converted to one-dimensional vectors of size 4096 × 1 and fed to the input layer. The network was trained using a gradient descent algorithm with a learning rate of 0.0001 for 150 epochs to achieve high convergence accuracy. Since the rectified linear unit (ReLU) function operates similarly to an integrate-and-fire (IF) neuron, it was employed as the activation function for the neurons in the hidden layer. 40,000 images from the medical MNIST dataset were used to train the ANN. Figure 5d shows the evolution of training accuracy to ~90% as a function of the training epoch. After training, the ANN was converted to an SNN, and the remaining 8000 images from the medical MNIST dataset were used to test the inference accuracy; a final testing accuracy of ~91% was obtained.
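The correspondence between ReLU and an IF neuron that underlies the ANN-to-SNN conversion can be illustrated with a short sketch. The IF model below (constant input drive, unit threshold, reset by subtraction, 1000 time steps) is a simplified assumption chosen for illustration and is not necessarily the exact neuron model used in refs. 39,40.

```python
def if_firing_rate(drive, n_steps=1000, threshold=1.0):
    """Firing rate (spikes per step) of an integrate-and-fire neuron under constant drive."""
    v = 0.0
    n_spikes = 0
    for _ in range(n_steps):
        v += drive                 # integrate the input
        if v >= threshold:         # fire when the threshold is reached
            v -= threshold         # reset by subtraction (assumed reset rule)
            n_spikes += 1
    return n_spikes / n_steps

def relu(x):
    return max(0.0, x)

drives = [-0.5, 0.0, 0.1, 0.25, 0.5]   # hypothetical input values
rates = [if_firing_rate(d) for d in drives]
for d, r in zip(drives, rates):
    print(f"drive = {d:+.2f}  IF rate = {r:.3f}  ReLU = {relu(d):.3f}")
```

For inputs between 0 and the threshold, the firing rate tracks the ReLU output, while negative inputs produce no spikes; this is the property that allows the weights of the trained ANN to be reused in the SNN.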

Next, we purposefully injected Gaussian noise with standard deviation (\(\sigma\)) up to 0.6 into the medical MNIST dataset to evaluate the robustness of an SSNN to noise. Figure 5b shows examples of noise-afflicted medical MNIST images and Fig. 5e shows the inference accuracy as a function of \(\sigma\) for both deterministic and stochastic encoding. In deterministic spike encoding, each pixel intensity of the image is translated directly into a spike train, where the timing between consecutive spikes is precisely determined by the pixel intensity. This method employs a linear mapping of the intensity values to the interspike interval, ensuring that higher intensities result in spikes occurring more frequently. In other words, more spikes occur for higher intensity values within a predefined time window. This eliminates any variability in the spike interval or the number of spikes that represents a specific intensity value. In stochastic spike encoding, however, a pixel with higher intensity will, on average, generate spikes at a higher rate than a pixel with lower intensity, but the precise moments at which these spikes occur can vary. This stochasticity introduces a level of randomness that can mimic the natural variability observed in biological neural systems.
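The difference between the two encoding schemes can be sketched as follows. The linear intensity-to-interval mapping and its limits (`TAU_MIN`, `TAU_MAX`, in arbitrary time units) are hypothetical placeholders and do not reproduce Eq. 1; the stochastic variant draws exponentially distributed interspike intervals, consistent with the exponential fit used to extract \({\bar{\tau }}_{{{{\rm{spike}}}}}\).

```python
import numpy as np

TAU_MIN, TAU_MAX = 1.0, 10.0   # hypothetical shortest/longest mean interval (arbitrary units)

def mean_interval(intensity):
    """Assumed linear map: intensity 1 -> TAU_MIN, intensity 0 -> TAU_MAX."""
    p = float(np.clip(intensity, 0.0, 1.0))
    return TAU_MAX - (TAU_MAX - TAU_MIN) * p

def deterministic_spike_times(intensity, window):
    """Spikes at exactly regular intervals set by the pixel intensity."""
    tau = mean_interval(intensity)
    return np.arange(tau, window + 1e-12, tau)

def stochastic_spike_times(intensity, window, rng):
    """Spikes with exponentially distributed intervals whose mean is set by the intensity."""
    tau = mean_interval(intensity)
    times = []
    t = rng.exponential(tau)
    while t <= window:
        times.append(t)
        t += rng.exponential(tau)
    return np.array(times)

rng = np.random.default_rng(1)
window = 1000.0
det_bright = deterministic_spike_times(0.9, window)
det_dark = deterministic_spike_times(0.1, window)
sto_bright = stochastic_spike_times(0.9, window, rng)
sto_dark = stochastic_spike_times(0.1, window, rng)
print("deterministic spike counts (bright, dark):", len(det_bright), len(det_dark))
print("stochastic spike counts (bright, dark):", len(sto_bright), len(sto_dark))
```

In both cases the brighter pixel yields more spikes within the window, but only the stochastic train shows spread in its interspike intervals.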

We ultimately found that the inference accuracy only dropped from ~91% to 86.3% for stochastic spike encoding in the presence of noise, whereas the inference accuracy dropped from 87.3% to 53.4% for deterministic encoding under the same conditions. Supplementary Fig. 16 shows similar results for the MNIST handwritten digits dataset41. Here, the inference accuracy dropped from 92.1% to 87.5% for stochastic spike encoding and from 87.8% to 66.7% for deterministic encoding. For practical applications, it is important to ensure the reproducibility of the spiking dynamics, keeping in mind non-idealities such as cycle-to-cycle and device-to-device variation. To this end, we also performed simulations that include these variations to understand their impact on inference accuracy. These results are shown in Supplementary Fig. 17. Notably, our results indicate that the performance of our model remains unaffected by cycle-to-cycle and device-to-device variations. The intrinsic noise tolerance of SNNs may be responsible for this stability, underscoring the potential of SNNs for deployment in real-world scenarios where variability is a given. Our findings demonstrate that WSe2-based stochastic spike encoders can effectively accelerate noise-tolerant inference using SNNs. While some recent studies have explored RTN for the generation of true random numbers (TRNs)42,43,44,45 and physically unclonable functions (PUFs)46,47, its utilization for neuromorphic applications appears to be a pioneering endeavor.

Discussion

In conclusion, this study emphasizes the dual role of defects in electronic devices, particularly in emerging nanomaterials such as 2D semiconductors. We have used a comprehensive approach including high-resolution atomistic imaging, DFT calculations, device modeling, low-temperature spectroscopy, and transport measurements to explore the impact of point defects on RTN dynamics. Notably, while defects have historically posed challenges, our research reveals a surprising and promising facet, demonstrating that defects in aggressively scaled 2D transistors can be harnessed for hardware acceleration of inference engines based on SSNNs with exceptional noise resilience. This underlines the untapped potential of defects for computational purposes. In essence, our investigation underscores the importance of understanding and leveraging intrinsic point defects in 2D materials.

Methods

Growth of WSe2 film

The growth of WSe2 thin films on c-plane sapphire substrates was carried out in a metal-organic chemical vapor deposition (MOCVD) system (https://doi.org/10.60551/znh3-mj13) equipped with a cold-wall horizontal reactor with an inductively heated graphite susceptor with gas-foil wafer rotation. Tungsten hexacarbonyl (W(CO)6) (99.99%, Sigma-Aldrich) was used as the metal precursor while hydrogen selenide (H2Se) was the chalcogen source with H2 as the carrier gas. The W(CO)6 powder was maintained inside a stainless-steel bubbler, where the temperature and pressure were held at 30 °C and 400 Torr, respectively. The synthesis of the WSe2 film is based on a multi-step process consisting of nucleation, ripening, and lateral growth steps. In general, the WSe2 sample was nucleated for 30 s at 850 °C, then ripened for 5 min at 850 °C and 5 min at 1000 °C, and then grown for 20 min at 1000 °C. During the lateral growth, the tungsten flow rate was set to \(3.8\times {10}^{-3}\) sccm and the chalcogen flow rate was set to 75 sccm while the reactor pressure was kept at 200 Torr. After growth, the substrate was cooled in H2Se to 300 °C to inhibit the decomposition of the obtained WSe2 films.

WSe2 film transfer to target substrates

To fabricate the WSe2 field-effect transistors, the WSe2 film was first transferred from the sapphire growth substrate to the global back-gated Al2O3/Pt/TiN/p++-Si substrate using a PMMA-assisted wet transfer process. First, the WSe2 film on the sapphire substrate was spin-coated with PMMA and baked at 150 °C for 2 min to ensure good PMMA/WSe2 adhesion. The corners of the spin-coated film were scratched using a razor blade, and the sample was immersed in a 2 M NaOH solution kept at 90 °C. Capillary action caused the NaOH to be preferentially drawn into the substrate/film interface due to the hydrophilic nature of sapphire and the hydrophobic nature of WSe2 and PMMA, separating the PMMA/WSe2 film from the sapphire substrate. The separated film was then fished from the NaOH solution using a clean glass slide and rinsed in three separate water baths for 15 min each before finally being transferred onto the target substrate. Subsequently, the substrate was baked at 50 °C and 70 °C for 10 min each to remove moisture and promote film adhesion, thus ensuring a pristine interface, before the PMMA was removed by overnight immersion in acetone and the film was cleaned with IPA.

WSe2 etching and channel definition

To define the channel regions for the WSe2 FETs, the substrate was spin-coated with PMMA and baked at 180 °C for 90 s. The resist was then exposed via e-beam and developed using a 1:1 mixture of 4-methyl-2-pentanone (MIBK) and IPA. The WSe2 film was subsequently etched using a sulfur hexafluoride (SF6) reactive ion etch chemistry at 5 °C for 12 s. Next, the sample was immersed in acetone overnight and cleaned with IPA to thoroughly remove the photoresist.

WSe2 ultra-scaled device fabrication

The ultra-scaled devices were fabricated using a two-step e-beam lithography process. In the first step, the scaled source-drain contact terminals were patterned, evaporated, and lifted off using a single-layer e-beam photoresist; in the second step, large contact pads shorted to the scaled source/drain terminals were patterned, evaporated, and lifted off for access and measurements. ZEP 520A 1:1 is used as the photoresist in the first step. Prior to resist spinning, the sample is dipped in Surpass 4 K for 60 s, rinsed in DI water, and baked at 100 °C for 1 min. This is done to improve the adhesion of the photoresist to the substrate, which contains exposed metal alignment markers. The sample is then spin-coated at 5000 RPM for 45 s, baked at 180 °C for 3 min, and exposed using e-beam lithography. The ultra-scaled patterns exposed in the first step are developed using a cold-develop process in the organic solvent n-amyl acetate chilled to -10 °C (3 min) and then rinsed in IPA for 60 s. After development, 20 nm of palladium is deposited using e-beam evaporation to serve as the contacts to the WSe2. The remaining metal on the photoresist is lifted off by immersion in acetone for 30 min. The sample is then immersed in Photo Resist Stripper 3000 (PRS 3000) heated to 60 °C on a hotplate for 15 min to completely remove the resist. Strong adhesion of the resist to the substrate can make it difficult to strip, thus requiring the use of a pipette or a purge bottle to purge with acetone or PRS 3000, depending on the solvent used. Following the resist removal, the sample is rinsed with IPA for 10 min. For the second step, the sample is spin-coated with MMA followed by A3 PMMA, and e-beam lithography is used to pattern the large access pads that overlap with the ultra-scaled source and drain contacts on the WSe2 channel. The sample is developed using a 1:1 mixture of MIBK/IPA for 60 s and pure IPA for 45 s. Next, 40 nm of palladium and 30 nm of gold are deposited using e-beam evaporation. Finally, a lift-off process is performed to remove the evaporated palladium/gold everywhere except the source/drain patterns by immersing the sample in acetone for 30 min, followed by an IPA rinse for another 10 min.

Metal evaporation process

Metal evaporation was done in a Ferrotec Temescal F-2000 evaporator with a standard fixture that allows for a distance of at least 50 cm between the substrate and the source crucible. Radiative damage is prevented during the ramp-up process by closing both the substrate and crucible shutters until a stable deposition rate is achieved. It is important to note that the shutters cannot fully protect the substrate, and metal atoms can still find alternative pathways to reach the substrate from the open sides. For our scaled device fabrication, we limit the metal contact thickness to 20 nm.

Electrical characterization

Electrical characterization of the fabricated devices at room temperature and at low temperature was performed using a Lake Shore CRX-VF probe station with a Keysight B1500A parameter analyzer.

Raman and PL characterization

Raman and PL spectra were collected using a Horiba LabRAM HR Evolution confocal Raman microscope with an excitation wavelength of 532 nm. The objective lens had a magnification of 100× and a numerical aperture of 0.9. A grating with a groove density of 1800 gr \({{{\rm{mm}}}}^{-1}\) was used for Raman and a grating with 300 gr \({{{\rm{mm}}}}^{-1}\) was used for PL. The low-temperature PL measurements were performed using a Linkam stage temperature control system. The stage was cooled to 77 K using liquid nitrogen.

Transmission electron microscopy

A PMMA-assisted wet transfer process was used to transfer the monolayer WSe2 film onto the Quantifoil® TEM substrate (658-200-CU-100, Ted Pella) for TEM characterization. HAADF-STEM was performed using an aberration-corrected ThermoFisher Titan3 G2 60–300 with monochromator and X-field emission gun source at an accelerating voltage of 80 kV. The convergence semi-angle used for STEM imaging was 30 mrad and the collection angle range of the HAADF detector was 42–244 mrad.

TCAD modeling

We describe the electrostatics and the current flow in the thin 2D FETs with dimensions on the micrometer scale using a drift-diffusion based TCAD model, namely the commercial version of the software package Minimos-NT48 by GTS. The electrostatics were calibrated to the measured transfer characteristics of the devices and subsequently we used the non-radiative multi-phonon (NMP) model to describe charge transfer of electrons from the channel to defects located at the interface of the channel to the gate oxide. In the on-state of 2D FETs the 2D channel is in accumulation and the surface potential shows negligible gate bias dependence. In turn, the trap levels of channel-related defects also become gate-bias independent in the on-state which leads to gate-bias independent capture and emission time constants in the NMP model.

Computational details

The Gaussian plane wave (GPW) method as implemented in the CP2K code49 was employed for all DFT calculations. Defect calculations were carried out in a monolayer WSe2 supercell containing 432 atoms with 40 Å of vacuum perpendicular to the monolayer to minimize spurious interactions with periodic images of the monolayer. We use the PBE0_TC_LRC hybrid functional50 with the default mixing parameter of 0.25 to accurately describe the localization of charge and the electronic interactions, including exchange and correlation effects. The electronic wave functions were described with double-zeta valence-polarized Goedecker–Teter–Hutter basis sets and auxiliary basis sets of type cFIT for calculations of the Hartree-Fock exchange. The systems were self-consistently relaxed down to a residual maximum force of 20 meV \({{{\text{\AA}}}}^{-1}\) for each atom with a convergence criterion for the total energy of 13.6 μeV. Cell parameters were relaxed with the hybrid functional to reduce the internal stress to <0.01 GPa. Finite size corrections of the total energy compensating for electrostatic potential offsets and spurious interactions due to periodic boundary conditions in charged supercells were carried out using the CoFFEE code51, which implements the FNV correction scheme52 for 2D systems.
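For readers who wish to compare the quoted convergence thresholds with settings expressed in atomic units, the following sketch performs the unit conversion. It uses CODATA 2018 values for the hartree and the Bohr radius; the variable names are hypothetical, and the units in which the thresholds were actually specified in the input files are not stated here.

```python
HARTREE_EV = 27.211386245988      # 1 hartree in eV (CODATA 2018)
BOHR_ANGSTROM = 0.529177210903    # 1 bohr in angstrom (CODATA 2018)
RYDBERG_EV = HARTREE_EV / 2.0     # 1 rydberg in eV

force_ev_per_ang = 20e-3          # 20 meV per angstrom, from the text
energy_ev = 13.6e-6               # 13.6 micro-eV, from the text

force_ha_per_bohr = force_ev_per_ang * BOHR_ANGSTROM / HARTREE_EV
energy_ha = energy_ev / HARTREE_EV
energy_ry = energy_ev / RYDBERG_EV

print(f"max force: {force_ha_per_bohr:.3e} hartree/bohr")
print(f"energy criterion: {energy_ha:.3e} hartree = {energy_ry:.3e} rydberg")
```

The energy criterion of 13.6 μeV corresponds to approximately \({10}^{-6}\) Ry, and the force threshold to roughly \(3.9\times {10}^{-4}\) hartree per bohr.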