Abstract
Efficiently capturing multidimensional signals containing spectral and temporal information is crucial for intelligent machine vision. Although in-sensor computing shows promise for efficient visual processing by reducing data transfer, its capability to compress temporal/spectral data is rarely reported. Here we demonstrate a programmable two-dimensional (2D) heterostructure-based optoelectronic sensor integrating sensing, memory, and computation for in-sensor data compression. Our 2D sensor captured and memorized/encoded optical signals, leading to in-device snapshot compression of dynamic videos and three-dimensional spectral data with a compression ratio of 8:1. The reconstruction quality, indicated by a peak signal-to-noise ratio value of 15.81 dB, is comparable to the 16.21 dB achieved through software. Meanwhile, the compressed action videos (in the form of 2D images) preserve all semantic information and can be accurately classified using in-sensor convolution without decompression, outperforming single-frame classification (93.18% vs 83.43%) and achieving accuracy on par with uncompressed all-frame videos (94.21%). Our 2D optoelectronic sensors promote the development of efficient intelligent vision systems at the edge.
Introduction
The rapid advancement of multidimensional visual information processing, encompassing spatial distribution, temporal evolution, phase and wavelength, amid the surge of the Internet of Things (IoT) and artificial intelligence (AI), has led to a growing demand for high-throughput data acquisition. This demand poses significant challenges to conventional signal processing, which is constrained by limited bandwidth1,2,3. Efficiently handling multidimensional data is crucial for edge AI to minimize energy consumption and system complexity without compromising performance. However, directly recording multidimensional visual information (e.g., high-speed video and hyperspectral data) using conventional full-frame sensors requires complicated modules, large storage capacity and substantial transmission bandwidth4,5. Addressing these limitations to simultaneously capture comprehensive photon information has become a significant challenge in multidimensional high-definition (HD) imaging6,7.
With the evolution of compressive sensing (CS) theory8,9, snapshot compressive imaging (SCI) technology has emerged as an effective solution to the abovementioned dilemma10,11,12,13. In SCI hardware implementations, continuous optical data (video frames or spectral channels) are first encoded via an optical modulator and then compressed/integrated directly into a single snapshot, which can be reconstructed into the original scene using advanced algorithms. In this way, SCI allows multidimensional data (≥ three-dimensional (3D)) to be captured with traditional two-dimensional (2D) image sensors without sacrificing spatiotemporal resolution10,11. Unlike full-frame sensors that generate millions of pixel elements, SCI compresses multiple mapped optical signals into the same pixel through multi-channel parallel acquisition, thereby enhancing photon utilization and sampling rates. This enables efficient imaging systems with low storage and bandwidth requirements as well as high optical throughput, making SCI highly promising for processing temporal, spectral–temporal, and spatial–temporal data1,4,12. Although event-based vision sensors (EVS) can also mitigate the issues of high bandwidth and storage demands through their sparse event data capture14, their performance is inherently constrained by a decrease in compression efficiency under static conditions and the inability to generate interpretable images directly without relying on computationally intensive iterative reconstruction algorithms15,16. In contrast, SCI utilizes spatiotemporal optical encoding to enable non-iterative reconstruction of full-scene information. This approach not only matches EVS in bandwidth efficiency in high-dynamic scenarios but also significantly enhances the effective information density in static scenes (Supplementary Table 1)1,12. The encoding process of SCI is the precondition for compressed imaging. Although various optical modulators have been developed, they are usually expensive and come with limitations. In video SCI, for example, the digital micromirror device (DMD) encoder relies on mechanical deflection, which limits its footprint and applicability in diverse scenarios17. While liquid crystal encoders offer better resolution, they suffer from pixel crosstalk and limited switching speed18. Moreover, the encoder and integrator of SCI are physically separated, which poses challenges for precise alignment19, and the sequential processing requires independent computing resources, leading to data inconsistency, transmission latency, and increased system complexity19,20,21,22. These issues present significant obstacles, especially as cutting-edge intelligent applications increasingly prioritize portability and efficiency.
Given the rising prominence of visual sensors based on 2D van der Waals (vdW) materials and their effectiveness across a range of electronic and optoelectronic applications20,23,24,25,26, constructing functionalized 2D heterojunctions is expected to further advance SCI technology. The ultra-thin nature of 2D materials enables rapid response to optoelectronic modulation, making them potential candidates for use as encoders. Additionally, 2D heterojunctions can integrate sensing, storage, and computing functions within a single device to eliminate data transfer between different modules, leading to reduced latency and power consumption20,27,28,29,30,31,32,33. To date, reported 2D visual sensors have been utilized to reduce processing loads by removing redundant information while retaining feature information21,22,34,35. However, lossless compression of visual information has not been realized. Achieving this could be particularly advantageous in scenarios that demand high acquisition efficiency, reconstruction accuracy, and processing precision, such as industrial inspection, security monitoring, and autonomous driving1,5,12. Therefore, integrating sensing, snapshot compression, and computing into a single sensor is a promising approach to functional integration. This strategy could enhance multidimensional data acquisition, simplify system complexity, and broaden application scenarios.
Here we report a 2D vdW programmable photoinduced memory sensor (PPMS) that offers simultaneous sensing and processing capabilities for highly compact SCI systems, particularly video and spectral SCI with 3D signals. Leveraging its rich optoelectronic dynamics, our PPMS can not only simulate the perception of dynamic motions but also encode, compress, and store optical signals using its programmable non-volatile conductance under electro-optical co-modulation. The sensor can compress 3D grayscale videos and hyperspectral images into 2D encodings with a compression ratio of 8:1 (from original size to compressed size), which can then be reconstructed with high fidelity, preserving distinguishable and reproducible characteristics of the original dataset. For video compression, the peak signal-to-noise ratio (PSNR) value achieved by the sensor (15.81 dB) is close to that achieved by software (16.21 dB). In addition, we demonstrate that the compressed data retain all semantic information and can be accurately classified by a convolutional neural network (CNN). This work provides inspiration for constructing compact and efficient optoelectronic sensors, promoting the development of edge intelligent vision systems in the future.
Results
In-sensor dynamic compressing and computing
Figure 1a illustrates the implementation of in-sensor compressing and downstream computing based on our 2D PPMS. The PPMS integrates incoming 2D frames modulated by electrical masks, effectively compressing 3D videos or hyperspectral data into 2D via SCI. The compressed pixels can either be reconstructed into the original 3D signal or classified using a CNN without the need for decompression. Notably, the use of a binarization strategy for the input data is primarily aimed at simplifying the representation of complex real-world optical field scenarios; this approach is discussed in further detail later. The realization of in-sensor compression depends on the unique photoinduced memory characteristics of our PPMS. As depicted in Fig. 1b, either optical or electrical stimuli alone result in an instant and volatile channel conductance change of the sensor, corresponding to the perception of input signals. Under electro-optical co-modulation, these changes become non-volatile and can be further integrated, yielding non-volatile multi-level channel conductance. This feature endows our sensor with two working modes, labeled Pij and Mij. As shown in Fig. 1c, Pij represents the perception of the optical signal by pixel(i,j), while Mij represents the perception and memorization of the sensed optical signal under electrical encoding. These unique optoelectronic behaviors lay the foundation for dynamic encoding and compression of 3D visual information.
a Schematic illustration of compressing dynamic binarized images using the PPMS (programmable photoinduced memory sensor) and of in-sensor computing, where dynamic Kobe motions (reproduced with permission from ref. 50) are pre-binarized as the inputs. b The relationship between input pulses (both electrical and optical stimuli) and output responses. c The dynamic functions enabled by the sensor.
The device architecture of our specifically designed PPMS is illustrated in Fig. 2a, f, consisting of a Se/h-BN/MoS2 vdW heterostructure. The single-crystalline trigonal selenium (t-Se) nanosheet is prepared via a modified physical vapor deposition (PVD) method and shows a leaf-like morphology (Fig. 2c)36,37. Its crystal structure is corroborated by high-resolution transmission electron microscopy (HRTEM), which shows that Se atoms covalently bond with each other in molecular chains along the c-axis ([0001] direction) and that these chains stack via vdW forces to constitute the Se nanosheet (see Supplementary Figs. 1–3 and Note 1 for details). The optical bandgaps of the MoS2 and Se nanosheets are characterized to be 1.82 eV and 1.87 eV, respectively (Fig. 2d). Figure 2e shows the Raman spectra of the vdW heterostructure, in which the typical characteristic peaks match well with previous reports36,37. The high quality of the prepared vdW heterostructure is demonstrated by the cross-sectional transmission electron microscope (TEM) image (Fig. 2g), where the clear and sharp boundaries between the layered materials indicate clean and flat interfaces. The composition is further evidenced by elemental energy-dispersive X-ray spectroscopy (EDS) mapping (Fig. 2h).
a Schematic illustration of the device architecture of the PPMS employing three terminals: Drain (D), Source (S), and Control Gate (CG), where Vcg modulates the channel conductivity. b Schematic demonstration of the memory mode and perception mode of the PPMS. Red/blue spheres denote electrons (e−)/holes (h+). Under illumination, positive/negative gate bias induces downward/upward band bending of MoS2, enabling carrier tunneling (indicated by black arrows) between the charge trapping layer (Se) and channel (MoS2) for the memory mode. In the absence of gate bias, rapid electron-hole recombination dominates under optical excitation without carrier tunneling, defining the perception mode. c Optical image of Se nanosheets with AFM (atomic force microscopy) measured thicknesses (40.2 nm and 39.6 nm) marked by orange lines (scale bar: 10 μm). d PL (photoluminescence) spectra and (e) Raman spectra of the Se/h-BN/MoS2 heterostructure, where the Raman peaks centered at ~237 and 140 cm−1 are attributed to the typical E1/A1 and E2 modes of t-Se (ref. 37), the peaks located at 383 and 408 cm−1 correspond to the in-plane (E2g) and out-of-plane (A1g) vibration modes of MoS2 (ref. 60), and the isolated peak at 1365 cm−1 is from the high-energy phonon (E2g) of h-BN (hexagonal boron nitride) (ref. 61). f False-color SEM (scanning electron microscopy) image (scale bar: 10 μm). g Cross-sectional TEM (transmission electron microscopy) image (scale bar: 5 nm) and corresponding (h) elemental EDS (energy dispersive spectroscopy) mapping of the Se/h-BN/MoS2 heterostructure (scale bar: 20 nm).
In stark contrast with traditional optoelectronic sensors with fixed charge storage density22,38,39,40, our 2D PPMS exhibits unique photoinduced memory behaviors, enabling two distinct operation modes as shown in Fig. 2b: (1) When the sensor is triggered by electrical pulses under optical illumination, it demonstrates photoinduced nonvolatile memory, and multi-level states can be achieved through consecutive electro-optical co-modulation. (2) When the sensor is exposed solely to optical or electrical stimuli, its conductance change remains volatile and is directly proportional to the stimuli intensity, owing to the limited carrier density in the Se layer. This configuration allows the sensor to detect stimuli but does not retain memory of them. Based on these features, the PPMS can be applied to perform dynamic encoding, allowing in-sensor compression of pixel values over time series for the realization of highly compact SCI.
Demonstration of working principle
The two-mode operation principle is explained in detail by sweeping the control gate (Vcg) of the sensor under illumination and dark conditions, respectively (see Supplementary Fig. 4 for the morphology characterization of a typical device). The large memory window recorded under illumination shown in Fig. 3a corroborates the photoinduced memory in our device, which can be ascribed to the increased charge density within the Se layer (see Supplementary Fig. 5a, b for detailed characterization)37,41. The memory window's dependence on Vcg under illumination is further characterized in Supplementary Fig. 5c, where the linear dependence of ∆V on the maximum control gate voltage (Vcg, max) demonstrates that the stored charge quantity can be effectively tuned by the control gate. It is worth mentioning that ∆V reaches 75 V at Vcg, max = 40 V, which corresponds to a stored charge density of 5.62 × 10¹² cm⁻² and is comparable to conventional flash memory devices33 (Supplementary Note 2).
a Comparison of the transfer curves measured in the dark and under illumination. Red and blue dashed arrows indicate the threshold voltage shift (∆V) and switching ratio of the sensor during electrical characterization. b Transfer curves measured under illumination at different effective laser powers (from 0.0175 nW to 0.125 nW). c Extracted negative and positive threshold voltages (Vthn and Vthp) under different laser powers. The ∆V approaches saturation at an optical power of ~0.035 nW, highlighted by the green dashed box. Current evolution triggered by positive and negative electrical pulses (d) without and (e) with illumination (P532 nm = 0.175 nW). f Current evolution triggered by polarity-dependent electrical pulses assisted by short-term illumination (P532 nm = 0.175 nW). The orange-shaded region represents the electro-optical co-modulation region, where electrical and optical stimuli collaboratively modulate the sensor response. The dynamic optical responses of the sensor are characterized when it is pre-configured to the (g) low-resistance state (state 1) and (h) high-resistance state (state 0) with the assistance of illumination (indicated by green shaded areas). The magnified views in panels (g) and (h) exhibit characteristic photodetection responses, demonstrating that the sensor operates in perception mode under optical excitation.
The optical power-dependent storage capacity of the PPMS is further explored (Supplementary Fig. 5d–f). Notably, the threshold voltages obtained from the positive and negative branches (i.e., Vthp and Vthn) quickly shift to the negative side under illumination even at an effective optical power as low as 0.0096 nW (i.e., Vth-Light − Vth-Dark < 0, as shown in Supplementary Fig. 5d), indicating the activation of hole trapping within the Se layer, while larger optical power leads to larger memory windows (Supplementary Fig. 5e, f). Figure 3b presents the variation of the illumination-induced storage capacity, with Vthp shifting from left to right as the laser power increases (i.e., Vthp-Light − Vthp-Dark > 0), corresponding to an increase of tunneling charge stored in the charge trapping layer. The variation in both Vthn and Vthp is further summarized in Fig. 3c, with a transition effective power of ~0.035 nW. This unique Vth shifting phenomenon can be attributed to the illumination-induced reduction of the electron tunneling barrier on the Se side (φe_Se) and the photosensitive nature of Se (Supplementary Note 3). In short, the finely adjustable charge storage capability underpins the compression of visual inputs. Besides, the conductance of the sensor can be tuned to distinct states by varied Vcg pulses (Supplementary Fig. 6a), proving its potential for analog storage. The unique optoelectronic behaviors of our sensor are further illustrated in the Ids–t tests (Fig. 3d–f). Without illumination, the channel conductance remains unchanged after electrical programming (+30 V/10 ms)/erasing (−30 V/10 ms) operations, while the introduction of illumination induces distinct and non-volatile OFF (high-resistance, denoted State-0) and ON (low-resistance, denoted State-1) states with an extinction ratio over 10⁶. Notably, the programmed states (state 0 or 1) are retained after removing the illumination (Fig. 3f), demonstrating the non-volatile memory mode. Decent photoinduced endurance and retention characteristics of our sensor are also verified (Supplementary Fig. 6b, c): no observable performance degradation occurs after >3000 periodic P/E (programming: +30 V/10 ms; erasing: −30 V/10 ms) operations, and an extinction ratio of >10⁵ triggered by ±40 V/10 ms pulses is maintained for more than 3000 s under illumination. In addition, the switching speed of the sensor can reach the time scale of hundreds of microseconds under illumination (Supplementary Fig. 6d, e).
The unique photoinduced memory function is closely related to the photosensitive Se nanoflake. In the dark, Se holds a negligible intrinsic carrier density, whereas the density increases significantly under illumination (on/off photocurrent ratio > 10², Supplementary Fig. 7c, d)37,41. The underlying mechanism is further revealed by the Fowler–Nordheim (FN) tunneling model (see Supplementary Figs. 8–9 and Supplementary Note 3 for details)42,43. In the dark, the calculated barrier height for electron tunneling (φe_Se) from the Se to the Au layer in the Se/h-BN (11 nm)/Au heterostructure is ~2.40 eV, whereas an additional, lower barrier of ~0.51 eV appears when the device is illuminated at a low effective power of P532 nm = 0.042 nW (Supplementary Fig. 8e). In this case, large numbers of electrons from the valence band (VB) or from deep trapping states in the band gap can be excited to the conduction band (CB) by absorbing incident photons38, resulting in a lower potential barrier and higher tunneling probability, and hence the pronounced photoinduced memory in the sensor.
Besides, the PPMS can also be configured to the perception mode. Figure 3g, h present the current evolution triggered by optical pulses when the device has been electrically tuned to the low-/high-resistance states, where only an instant photoresponse appears, rather than non-volatile positive or negative photoconductivity (PPC or NPC)44. In perception mode, the PPMS can serve as a photodetector. It is worth pointing out that the sensor presents ultra-high sensitivity towards visible light in this mode. As shown in Supplementary Fig. 10a–c, distinct photocurrent values and responsivities can be obtained at the erased state, and a remarkable photocurrent of Iph = 495 nA and responsivity of R = 7583 A/W are observed at state 1-6 under the illumination of a low-power laser (P532 nm = 0.065 nW) (Supplementary Fig. 10b, c). Se is a p-type semiconductor with long hole lifetimes, as confirmed in Supplementary Fig. 7a, b. The ultra-high photoresponsivity can be attributed to the extra electrostatic control of the channel induced by hole trapping and the strengthened photogating effect at the MoS2/h-BN surface45,46. Moreover, the increased hole density within the trapping layer further mitigates the Schottky barrier between the channel and the source/drain electrodes, leading to a higher carrier extraction efficiency47.
The mechanism behind the perception mode lies in the decreased contact potential difference (VCPD) between Se and MoS2 under illumination48, as revealed by the Kelvin probe force microscopy (KPFM) characterization (Supplementary Fig. 11). Although the trapped carriers create energy band bending in MoS2, the tunneling probability of the photogenerated carriers across the tunneling barrier remains low, which inhibits the memory behavior and leaves only the perception mode (the band diagrams of the sensor under the two modes are depicted in Fig. 2b). The above results fully demonstrate the unique characteristics of our PPMS with reconfigurable perception and memory modes. Noting that a sufficiently strong electric field can still provoke carrier tunneling in the dark, we map the extinction ratio change of the sensor under different P/E pulses in Supplementary Fig. 6f, where the two modes can be well distinguished when the sensor works under |Vcg| ≤ 30 V and pulse width ≤ 100 ms. The subsequent SCI implementation with this sensor is based on these experimental conditions.
SCI implementation based on dynamic electro-optical modulation
In conventional SCI systems, 3D data (e.g., dynamic visual signals) are first encoded frame-by-frame through a temporally varying mask (e.g., a DMD or spatial light modulator (SLM)) before being integrated by conventional image sensors (e.g., complementary metal oxide semiconductor (CMOS) or charge coupled device (CCD) sensors)4,12,19. In this way, the frames across the temporal or spectral dimension can be compressed into a single snapshot measurement, which can then be reconstructed losslessly or classified by machine learning models without decompression. However, the physical separation between the DMD/SLM mask, the CMOS/CCD sensor, and the integration/memory unit results in low energy efficiency, large latency, and high system complexity. In our demonstration, the photoinduced memory behaviors enable the integration of perception, encoding, and compression of dynamic scenes within a single device, minimizing power, latency, and hardware footprint.
The core function of our strategy relies on an optoelectronic AND gate (Fig. 4a), with an optical pulse input (IN1) and an electrical pulse input (IN2), whose output corresponds to the change of the sensor conductance state. The dynamic compression principle is designed as follows (Fig. 4a): only when optical sensing (IN1) and pixel encoding (IN2) act simultaneously does the sensor perceive and integrate the pixel value; in this case, the conductance of the optoelectronic sensor undergoes non-volatile changes based on the photoinduced memory effect, and these changes accumulate over time. Otherwise, the conductance state remains unchanged. In this way, the 2D PPMS physically implements SCI.
a Demonstration of the optoelectronic AND logic gate for sensing and compressing. b An algorithm flow chart for encoding and compressing dynamic images using the sensor array. Lij and Eij denote the optical signal and electrical signal applied to the sensor, respectively. Xk and Mk correspond to the pixel array of the kth frame (tk) and the kth mask, respectively. Cij corresponds to the compressed value at pixel(i,j), mapped to the drain-source current of the channel after 2D measurement, with the final output matrix denoted as Y (Ids, Y). c Current evolution in the sensor under different stimulation modes: electrical-only (blue bar), optical-only (green bar), and electro-optical co-stimulus (red bar). d The sensor can be finely programmed with electro-optical co-stimulus, showing a linear decrease across 19 levels. Inset: linear fit between the channel current in each conductance state and the pulse number. The enlarged electro-optical response of the sensor under different lasers: (e) 532 nm (0.018 nW/1 s), (f) 450 nm (0.0085 nW/1 s), and (g) 635 nm (0.018 nW/1 s). The blue dashed lines indicate the transient current variation triggered by electrical pulses.
To prove this concept, we carried out dynamic modulation (sole optical/electrical and electro-optical co-stimulus) of a sensor pre-tuned to the low-resistance state. Figure 4c shows that the conductance state is not changed permanently when triggered by electrical pulse trains alone (E: 25 V/100 ms, 0.25 Hz) or optical pulse trains alone (L: 532 nm, 0.036 nW/1 s, 0.5 Hz), while synergic electro-optical pulses effectively tune the sensor conductance state, manifesting the feasibility of reconfiguration between photo-detection and encoding-and-integration. More details and evidence supporting the integrated functions are provided in Supplementary Fig. 12 and Supplementary Note 4. Notably, the multiple conductance states modulated by electrical pulses under illumination (Supplementary Fig. 12d) present high linearity, which is crucial for the encoding and compression process. To further verify the possibility of in-sensor multi-frame compression, we shortened the width of the optical pulses to approach real-world conditions. As shown in Fig. 4d and e, multi-state conductance decrements over 19 current stages with remarkable uniformity can be observed (E: 25 V/100 ms, L: 0.018 nW/1 s, 0.125 Hz). According to the enlarged view in Fig. 4d, the non-volatile channel current variation with pulse number (N) can be fitted to a linear relationship with a decent R² = 0.9980. In this case, for the same optical intensity appearing in different frames, the uniform change in conductance guarantees consistency in transmitting pixel values (similar to the linear relationship between induced charge and light intensity in a CCD)49. This ensures high-quality implementation of the SCI operation, providing a foundation for fidelity in subsequent decompression. Additionally, we demonstrated synergic electro-optical modulation of the conductance state of the sensor under varying optical pulse intensities, while maintaining a constant electrical pulse intensity (25 V/100 ms). As shown in Supplementary Fig. 13a–c, the absolute value of the average current change \(\Delta {I}_{{{\rm{ds}}},{{\rm{Avg}}}}\) incrementally increases with the optical pulse intensity (i.e., 0.014 nW/1 s, 0.030 nW/1 s, and 0.045 nW/1 s). This increment confirms the controllable modulation of tunneling charges between Se and MoS2, as the variation of channel current with pulse number maintains robust linearity (Supplementary Fig. 13d–f). We then plotted the relationship between the average current change and the effective power (Supplementary Fig. 14), where the strong linear correlation between \(\Delta {I}_{{{\rm{ds}}},{{\rm{Avg}}}}\) and effective power (R² = 0.9868) demonstrates our device's capability to resolve continuous analog signals with high fidelity. This suggests its potential for precise analog signal compression, where input power levels can be directly mapped to quantized current states. The charge retention capability of the PPMS during the dynamic encoding process is also investigated (Supplementary Fig. 15), revealing distinct non-volatile memory behavior (retention time > 500 s for each electro-optical co-modulation) at varied optical intensities. This prevents the loss of inter-frame information, laying the foundation for high-quality image decompression.
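The linearity figures quoted here (e.g., R² = 0.9980) follow from an ordinary least-squares fit of the read-out current against the pulse number. Below is a minimal sketch of that analysis in Python; the current values are synthetic stand-ins for the measured data in Fig. 4d, not device data:

```python
import numpy as np

# Synthetic stand-in for the ~19 read-out current levels in Fig. 4d (nA)
pulse_number = np.arange(1, 20)
rng = np.random.default_rng(42)
i_ds = 100.0 - 4.8 * pulse_number + rng.normal(0.0, 0.2, pulse_number.size)

# Ordinary least-squares fit and coefficient of determination (R^2)
slope, intercept = np.polyfit(pulse_number, i_ds, 1)
residuals = i_ds - (slope * pulse_number + intercept)
r_squared = 1.0 - residuals.var() / i_ds.var()
print(f"slope = {slope:.3f} nA/pulse, R^2 = {r_squared:.4f}")
```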
Besides green light (532 nm), our PPMS responds in the same way to blue (450 nm, 0.0085 nW/1 s) and red light (635 nm, 0.018 nW/1 s). The results are provided in Fig. 4f, g and Supplementary Figs. 16 and 17, which also show decent electro-optical co-modulation with fitted R² values of 0.9905 and 0.9979 for the 450-nm and 635-nm lasers, respectively. This broadband co-modulation is the basis for 3D hyperspectral data compression using SCI. We also compared the PPMS with typical SCI encoders in terms of structure and performance. As shown in Supplementary Table 2 and Supplementary Fig. 18, the frame rate and compression ratio of the PPMS can reach 1000 fps and 40:1, respectively, matching the reported encoding strategies50,51,52. Although the PPMS shows a limited dynamic range, its potential can be further explored by improving measurement accuracy. For practical applications, the noise level of the device should be reduced to achieve high-fidelity pixel value transmission and compression (pushing the linearity R² closer to 1).
In addition, a 3 × 3 sensor array was fabricated (see Supplementary Fig. 19a–e and "Methods"). All nine devices manifest distinct photoinduced memory behaviors with considerable uniformity in ∆V (Supplementary Fig. 20). The device-to-device variation in the on/off current ratio (10³–10⁵, as shown in Supplementary Fig. 19f, g) arises from the inevitable damage introduced by the transfer process and remains within an acceptable range. The successful array fabrication provides a reference for future device scalability.
Video and spectral SCI based on the optoelectronic sensor
The schematic processes for video and spectral SCI using our sensor are depicted in Fig. 5a and c, respectively. The flowchart outlining the compression and decompression process is shown in Supplementary Fig. 21. Additionally, the detailed steps for the compression and reconstruction of video and spectral data are illustrated in Supplementary Figs. 22 and 23, respectively. A unified SCI algorithm using element-wise multiplication (⊙) can be mathematically written as1:

$$Y=\mathop{\sum }\limits_{k=1}^{n}{X}_{k}\odot {M}_{k}+E \qquad (1)$$
Where X denotes the 3D data cube (e.g., video or hyperspectral data) and is modulated by the random 3D mask M (e.g., the electrical co-modulation signal). k = 1,…,n signifies the kth frame (t) or spectral band (λ) of the data cube, Mk corresponds to the kth mask, and E is the measurement noise. Y is the 2D measurement realized by our sensor, i.e., the compression result. Different from traditional SCI systems where the compressed pixel value is derived from the induced charges (QInduced) within the sensor (e.g., CCD) through photoelectric conversion and analog-to-digital conversion, the pixel value in our PPMS system is derived from the tunneling charge (QTunnel) via electro-optical co-modulation (see Supplementary Note 5 for details) and can be fitted according to the FNT model42,43:

$${I}_{{\mathrm{FNT}}}=a{\left(\alpha {V}_{{\mathrm{cg}}}\right)}^{2}\exp \left(-\frac{b}{\alpha {V}_{{\mathrm{cg}}}}\right),\quad 0\le t\le T \qquad (2)$$

where a and b are constants determined by the tunneling barrier.
Where Vcg and T denote the intensity and width of the electrical pulse applied to the control gate or back gate, respectively. The proportionality constant α describes the voltage drop across the floating gate (FG) or charge trapping layer. The total tunneling charge is derived using the integral formula Eq. (3) and is converted into a change in the threshold voltage (∆VT). Note that the tunneling current (IFNT) induced by the control gate of the PPMS within the working range is negligible in the absence of optical illumination. Therefore, we can precisely control the quantity of charge tunneled in each pulse when T is held constant:

$${Q}_{{\mathrm{Tunnel}}}=\int_{0}^{T}{I}_{{\mathrm{FNT}}}\,{\mathrm{d}}t={I}_{{\mathrm{FNT}}}\left(\alpha {V}_{{\mathrm{cg}}}\right)\cdot T \qquad (3)$$
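As a rough numeric illustration of Eqs. (2)–(3), the sketch below evaluates the charge tunneled per pulse at constant pulse width T; the constants a, b, and α are placeholders, not extracted device parameters:

```python
import numpy as np

a, b, alpha = 1e-12, 60.0, 0.5   # hypothetical FN constants and voltage-drop factor
V_cg, T = 25.0, 0.1              # pulse amplitude (V) and width (s)

i_fnt = a * (alpha * V_cg) ** 2 * np.exp(-b / (alpha * V_cg))  # Eq. (2)
q_per_pulse = i_fnt * T                                        # Eq. (3) at constant T
for n in (1, 5, 10):
    print(n, n * q_per_pulse)   # total tunneling charge scales linearly with pulse count
```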
a Schematic illustration of video SCI process. b Comparison between the compression and the PnP-enabled reconstruction of dynamic frames from pre-binarized Kobe motion scenes (reproduced with permission from ref. 50) by the device and software, respectively. c Schematic illustration of spectral SCI process. d Comparison between the compression and PnP-enabled reconstruction of spectral frames from pre-binarized Bird scenes (reproduced with permission from ref. 54) by the device and software, respectively. Histogram summarizing the peak signal-to-noise ratio (PSNR) values in (e) video and (f) spectral SCI based on device (D) and software (S), respectively.
For a metal-oxide-semiconductor (MOS) transistor, the I–V characteristic in the linear region (low Vds) can be described as follows53:

$${I}_{{\mathrm{ds}}}=\mu {C}_{{\mathrm{ox}}}\frac{W}{L}\left({V}_{{\mathrm{GS}}}-{V}_{{\mathrm{T}}}\right){V}_{{\mathrm{ds}}} \qquad (4)$$
Where μ is the carrier mobility, Cox is the oxide capacitance per unit area, W is the channel width, L is the channel length, and VGS is the gate-source voltage. To account for the tunneling charge stored in the trapping layer, VGS is modified to an effective gate voltage (VGS, eff), expressed as:

$${V}_{{\mathrm{GS}},{\mathrm{eff}}}={V}_{{\mathrm{GS}}}-\Delta {V}_{{\mathrm{T}}}={V}_{{\mathrm{GS}}}-\frac{{Q}_{{\mathrm{Tunnel}}}}{{C}_{{\mathrm{ox}}}} \qquad (5)$$
Then Eq. (4) can be rewritten as:

$${I}_{{\mathrm{ds}}}=\mu {C}_{{\mathrm{ox}}}\frac{W}{L}\left({V}_{{\mathrm{GS}}}-\frac{{Q}_{{\mathrm{Tunnel}}}}{{C}_{{\mathrm{ox}}}}-{V}_{{\mathrm{T}}}\right){V}_{{\mathrm{ds}}} \qquad (6)$$
Therefore, the PPMS can achieve linear regulation of Ids through precise adjustment of the charges within the Se layer via electro-optical co-modulation, which aligns with our experimental observations. The threshold voltage variation can be transformed into grayscale values through analog-to-digital converter (ADC) conversion. We have also demonstrated the PPMS's ability to sense different light intensities, which enables the compression of grayscale maps (Supplementary Figs. 13 and 14). To simplify the computational load, we consider here only compression at a single light intensity and pre-binarize the input images. Specifically, the pixel values of the original image, which range from 0 to 255, are first normalized to a range of 0 to 1. A threshold value of 0.2 is then selected for binarization.
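The stated preprocessing (normalize 8-bit values to [0, 1], then threshold at 0.2) amounts to a one-line operation; a minimal sketch:

```python
import numpy as np

def binarize(frame_u8: np.ndarray, threshold: float = 0.2) -> np.ndarray:
    """Normalize 8-bit pixel values (0-255) to [0, 1], then binarize at the threshold."""
    return (frame_u8.astype(np.float32) / 255.0 > threshold).astype(np.uint8)
```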
Figures 4b and 5a illustrate the in-sensor video SCI, which integrates sensing, encoding, and integration/compression of the binarized video; the masks used are shown in Supplementary Fig. 24. During the encoding process, when both the pixel value (Xk (x, y, t)) of the binary high-speed scene and the corresponding mask value (Mk (x, y, t)) are "1", the electrical (mask) and optical (data cube) signals co-modulate the 2D sensor conductance. The encoded pixels, in the form of non-volatile conductance changes, accumulate over time (from frame t1 to tn). The resulting final conductance, representing the compressed pixel value (C), is proportional to the number of simultaneous optical and electrical pulses. Referring to Fig. 4d, the "fingerprint" current value associated with the number of synergetic electrical and optical pulses can be read and converted into the pixel value of the compressed 2D image (C = Ids, Y).
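For binary inputs, this encoding-and-integration step reduces to Eq. (1) with the optoelectronic AND gate. A minimal NumPy simulation of the compression follows; the random frames and masks below are placeholders for the Kobe frames and the masks in Supplementary Fig. 24:

```python
import numpy as np

def snapshot_compress(frames: np.ndarray, masks: np.ndarray) -> np.ndarray:
    """Simulate in-sensor video SCI per Eq. (1) for binary inputs.

    frames: binarized data cube X_k, shape (n, H, W) -- the optical input (IN1).
    masks:  binary mask cube M_k, shape (n, H, W) -- the electrical input (IN2).
    A pixel's conductance changes only when X_k and M_k are both 1 (the AND
    gate); the changes accumulate over the n frames into one 2D measurement Y.
    """
    return (frames & masks).sum(axis=0)

# Example: eight 256 x 256 binary frames compressed into one snapshot (8:1)
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(8, 256, 256), dtype=np.uint8)  # placeholder frames
M = rng.integers(0, 2, size=(8, 256, 256), dtype=np.uint8)  # placeholder masks
Y = snapshot_compress(X, M)  # entries in 0..8, read out as "fingerprint" currents
```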
Based on the optoelectronic AND logic gate enabled by our PPMS, we have extended the functionality of a single device to a sensor array (Supplementary Fig. 27a) to facilitate simulated compression of dynamic signals across multiple pixels. To evaluate the device performance, we first reconstruct the 2D images from the device compression to produce hardware-compressed reconstructed images. We simultaneously apply the same reconstruction algorithm to the 2D images after software compression to generate software-compressed reconstructed images. Additionally, we binarize multiple frames from the original video, using these binarized images as a reference for comparison. We then calculate the PSNR values for each frame of the reconstructed images from both device and software against the corresponding binarized original images (i.e., PSNR(DE_BI) and PSNR(SW_BI)) to evaluate the compression effectiveness. Figure 5b shows the compressed videos (below the purple box) and the reconstructed 3D video (under the red box). We use a set of eight video frames of Kobe dribbling, each with a pixel resolution of 256 × 256, for the SCI simulation. The left panel of Fig. 5b presents four selected frames before and after binarization (detailed enlarged images are shown in Supplementary Fig. 25), with the post-binarization frames still clearly depicting Kobe's shooting dynamics. These frames are then encoded and compressed into a single snapshot 2D measurement by our simulated optoelectronic sensor, as detailed in Supplementary Fig. 22, aligning with the software compression illustrated in the middle panel of Fig. 5b (see Supplementary Fig. 26 for further details). The compressed 2D snapshots are then used to reconstruct the original videos through the Plug-and-Play (PnP) algorithm50,54 (see Supplementary Note 5 for the reconstruction process). As shown in the right panel of Fig. 5b, the denoised and reconstructed video frames are highly consistent with the binarized frames, with preserved details. Quantitatively, our in-sensor SCI shows a decent average PSNR(DE_BI) value of 15.81 dB compared with the PSNR(SW_BI) of 16.21 dB (Fig. 5e), demonstrating that our in-sensor SCI can parallel software SCI performance.
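For reference, the PSNR metric used here compares each reconstructed frame against its binarized original. A minimal implementation, assuming frames normalized to [0, 1]:

```python
import numpy as np

def psnr(reference: np.ndarray, reconstruction: np.ndarray, peak: float = 1.0) -> float:
    """Peak signal-to-noise ratio (dB) between two same-sized frames in [0, peak]."""
    mse = np.mean((reference.astype(np.float64) - reconstruction.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```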
As the 2D PPMS features a broadband spectral response, it can also compress 3D hyperspectral data using SCI. The schematic illustration of spectral SCI is shown in Fig. 5c. The spectral scenes (Xk (x, y, λ)) are collected through the objective lens, followed by spatial coding and compression \(({X}_{{{\rm{k}}}}\odot {M}_{{{\rm{k}}}})\) of the binarized data cube. Similar to a conventional single-disperser coded aperture compressive spectral imaging (CASSI) system1,55, the coded scene first undergoes spectral dispersion through a dispersive element, such as a prism or a grating (see Supplementary Fig. 23 for the detailed process). Different from video compression, the 2D PPMS accumulates the encoded frames across different wavelengths, and the current value corresponding to the number of pulses can then be read out by referring to the linear conductance change shown in Fig. 4d.
Figure 5d shows the simulated compression and reconstruction of hyperspectral data, employing the Bird dataset comprising 24 spectral bands, each sized 1021 × 703 pixels, as the input54. For quantitative analysis, image data (400–500 nm) at ten continuous wavelengths (step 10 nm) were selected and binarized, as shown in the left panel of Fig. 5d. The spectral data were then encoded and compressed using both the PPMS and software. The reconstructed images obtained with the PnP algorithm50,54 are shown in the middle (compressed) and right (reconstructed) panels of Fig. 5d, and the spectral wavelength interval of 10 nm remains consistent before and after compression. As shown in Fig. 5f and Supplementary Table 3, the average PSNR for images reconstructed from software compression (21.49 dB) is higher than that for images reconstructed from device compression (16.73 dB), indicating that software compression achieves greater similarity to the original image. It is also noteworthy that the average Structural Similarity Index (SSIM) for images reconstructed from device-enabled compression (0.69) is close to that obtained from software-enabled compression (0.79). This suggests that, from a human visual perspective, the reconstruction quality is comparable between the two methods. The successful application of video and spectral SCI thus fully demonstrates the great potential of the 2D optoelectronic sensor in intelligent imaging systems. It should be noted, however, that practical SCI is much more complex and requires a proper balance between the decompression algorithm and the compression process to achieve high reconstruction quality.
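The SSIM values above can be reproduced with a standard library routine; a sketch assuming frames normalized to [0, 1] (scikit-image is used here for illustration, not necessarily the tooling behind the reported numbers):

```python
import numpy as np
from skimage.metrics import structural_similarity

def frame_ssim(reference: np.ndarray, reconstruction: np.ndarray) -> float:
    """SSIM between a reference frame and its reconstruction, both in [0, 1]."""
    return structural_similarity(reference.astype(np.float64),
                                 reconstruction.astype(np.float64),
                                 data_range=1.0)
```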
CNN classification of compressed images
Video classification is one of the most important tasks in machine vision. Usually, videos undergo compression before being stored or transmitted, and these data packets require decompression to restore explicit semantic information before downstream video classification56,57. Here we employ the compressed 2D images directly for video classification. This methodology significantly reduces the time and energy typically expended on decompression58.
To perform convolution right where the data are compressed, the sensor's channel current (Ids) should be linear with respect to the applied bias voltage (Vds), the latter representing the kernels. We examined the relationship between the channel current of the sensor and Vds (from ±1 mV to ±100 mV in steps of 1 mV) at the non-volatile low-resistance state. As revealed in Supplementary Fig. 28, the calculated conductance shows negligible variation, perfectly meeting our requirements.
The 2D optoelectronic sensor array, as depicted within the dotted box in Supplementary Fig. 27b, performs in-sensor convolution. The array initially encodes and integrates the video data using the array conductance Gml (m = 1, 2, 3, …, n; l = 1, 2, 3, …, n), which serves as the input feature map (Supplementary Fig. 27a). We then map different 3 × 3 convolution kernels onto bias voltage matrices (V11, V12, …, V33). Each voltage matrix is split into columns. At each time step, a voltage column vector (e.g., V11, V12, V13) biases the 2D optoelectronic sensor array for a vector-vector inner product, giving the output current vector I11, I12, I13. This is repeated with the remaining two voltage column vectors, producing I21, I22, I23 and I31, I32, I33, respectively. The currents are then summed over time. According to Ohm's law and Kirchhoff's law20,59, the summed currents, reading voltages (kernel), and conductances physically implement a vector-matrix multiplication49, or one stride of the convolution:

$${I}_{{\mathrm{out}}}=\mathop{\sum }\limits_{m=1}^{3}\mathop{\sum }\limits_{l=1}^{3}{G}_{ml}{V}_{ml} \qquad (7)$$
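A minimal sketch of one such convolution stride, with the conductance patch G and the voltage-mapped kernel V as 3 × 3 arrays (illustrative values, not measured array data):

```python
import numpy as np

def convolution_stride(G: np.ndarray, V: np.ndarray) -> float:
    """One stride of the in-sensor convolution (Eq. (7)).

    G: 3 x 3 patch of sensor conductances holding the compressed feature map.
    V: 3 x 3 kernel mapped onto bias voltages. One voltage column is applied
    per time step (Ohm's law gives each pixel current; Kirchhoff's law sums
    them on the shared line), and the currents are summed over time steps.
    """
    return float(sum(np.dot(G[:, col], V[:, col]) for col in range(G.shape[1])))

G = np.random.default_rng(1).uniform(1e-6, 5e-6, size=(3, 3))  # conductances (S)
V = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]]) * 0.01        # Laplacian kernel (V)
print(convolution_stride(G, V))                                # summed current (A)
```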
Figure 6a illustrates the simulated classification of SCI-compressed data recorded on the optoelectronic sensor array using in-device convolution. Here, we selected the Weizmann Human Action Dataset, which contains videos of ten different human actions (i.e., bend, jack, jump, etc.), for the classification experiment. First, all videos were subjected to background removal. The videos were then compressed using in-sensor video SCI, resulting in a single compressed frame containing the semantic information of the entire video (input case 1). For comparison, single-frame images selected from each video (input case 2) were classified. Figure 6c, d show the compression of two human action movies, "Run" and "Walk", and the resultant compressed images (for the dynamic movies, please refer to Supplementary Movies 1 and 2), respectively. The compressed images include information from all video frames (Fig. 6d), instead of the single-frame information in the control experiment (Fig. 6c). This feature is further evidenced in the 2D distribution maps (Fig. 6f, g) and confusion matrices (Fig. 6b, e), which clarify the classification results for both input cases 1 and 2.
a Schematic demonstration of the CNN architecture, comprising two convolutional layers and one fully connected layer. Two input cases are employed for image classification, namely Case 1: compressed frame, and Case 2: single frame. (b, e) Confusion matrices of the classification results using the single-frame (b) and compressed-frame (e) strategies with in-sensor convolution. Display of (c) dynamic frames and (d) compressed frames, each containing two human motions ("Run" and "Walk") that are difficult for machine vision to recognize. f, g The 2D distribution maps of motion features under input Case 1 (g) and Case 2 (f) derived from linear discriminant analysis (LDA), highlighting the separability of distinct motion categories in reduced-dimensional space. h The classification accuracy of the three input strategies. The compressed-frame classification strategy achieves superior accuracy (93.18%) compared to single-frame input (83.43%) and attains performance parity with all-frames classification (94.21%).
Figure 6f, g depict the distribution features of different motions using the two input cases, employing dimensionality reduction through linear discriminant analysis (LDA). In these maps, balls of the same color represent the same motion category, and the quantity of balls of each color indicates the sample size. The spatial separation between different categories highlights their distinguishability, with greater distance suggesting easier differentiation. A comparison of Fig. 6f, g reveals that compressed images (Fig. 6g) encapsulate richer feature information, as evidenced by tighter clustering within the same motion category and clearer separation between different categories, thereby enhancing their distinguishability. Additionally, the confusion matrices in Fig. 6b, e correspond to Fig. 6f, g, respectively, illustrating the classification accuracy across different categories. For example, Fig. 6b shows that the accuracy for correctly classifying ‘wave1’ is 95%, with a 5% misclassification rate where ‘wave1’ is confused with ‘wave2’. This overlap is visually represented in Fig. 6f, where the blue balls signifying ‘wave1’ significantly overlap with the purple balls for ‘wave2’. In contrast, Fig. 6e demonstrates a 100% classification accuracy for the as-mentioned categories, with Fig. 6g showing no overlap between the two colors of balls.
It is important to note that these two representation methods primarily visualize the classification results for a single motion and do not provide a comprehensive evaluation of classification accuracy. Therefore, we also present the overall classification accuracy based on the two input cases to evaluate the efficacy of the strategies more thoroughly. For a more objective evaluation, we additionally employed a strategy that uses all frames as input for classification. As shown in Fig. 6h, the overall classification accuracies for the single-frame, compressed-frame, and all-frame input strategies are 83.43%, 93.18%, and 94.21%, respectively. The results demonstrate that compressed-frame inputs significantly improve classification accuracy over single-frame inputs (93.18% vs. 83.43%), effectively mitigating the information loss inherent to sparse sampling. Remarkably, the compressed-frame strategy attains accuracy comparable to all-frame classification while reducing the convolutional parameters by 69% (see "Methods" for details). Thus, our compression paradigm eliminates computationally intensive reconstruction steps, enabling direct feature extraction from compressed data, a critical advantage for real-time processing on resource-constrained edge devices.
We have also summarized the recent advances in machine vision based on dynamic behavior of devices fabricated from 2D heterostructures (Supplementary Table 4). Different from existing imaging approaches that primarily focus on post-processing or feature extraction, often compromising image quality and real-time performance, our work explores the potential of 2D heterostructures in achieving in-sensor compression. We have developed a compact compressed imaging prototype device that not only addresses the current research gap in high-quality image acquisition but also provides a versatile platform for future intelligent vision systems.
Discussion
In summary, we developed an optoelectronic sensor with programmable storage capacity and functionalities. The sensor can work in either optical perception or memory mode, owing to the photosensitive properties of t-Se. Leveraging this property, we designed a specialized optoelectronic AND gate, which enables simultaneous perception, encoding, compression, and storage of dynamic information. The compressed 2D images can be further reconstructed into 3D data using the PnP algorithm. Our results demonstrate that this sensor provides a more compact and efficient 2D hardware encoder for video and spectral SCI technology. Furthermore, the compressed images keep all relevant semantic information of the video and enable direct in-sensor classification using a CNN without decompression, leading to reduced energy consumption, shorter processing time, and lower hardware costs. Overall, our work unveils the potential of 2D vdW heterostructures as compact and energy-efficient hardware for future intelligent visual systems.
Methods
The sensor and array fabrication
t-Se with a thickness of 30–50 nm was prepared via a modified PVD method (see Supplementary Note 1 for details), and the obtained products were transferred using a polydimethylsiloxane (PDMS) stamp onto heavily doped Si covered by 300-nm SiO2 (Silicon Valley Microelectronics, Inc.). Multilayer h-BN and MoS2 nanosheets were first mechanically exfoliated from their bulk crystals (2D Semiconductor Inc.), then transferred onto the Se nanoflake step-by-step via a dry transfer method (Supplementary Fig. 2). The source/drain electrodes were defined using standard electron beam lithography (EBL) (TESCAN, VEGA3), followed by electron beam evaporation (EBE) and thermal evaporation of contact metals (Cr/Au, 8/50 nm). The 3 × 3 sensor array was also fabricated via the dry-transfer method: first, three Se nanosheets with lateral sizes of ~10 μm × 50 μm were transferred and aligned in parallel. The nanosheets were then patterned into an array of 3 × 3 pixels, each 8 μm × 8 μm (Supplementary Fig. 19), via reactive ion etching (RIE). A large h-BN nanosheet (>50 μm × 50 μm) was then exfoliated onto PDMS and transferred onto the 3 × 3 pixels. Finally, a large exfoliated MoS2 nanosheet (>50 μm × 50 μm) was transferred onto the Se/h-BN to finalize the Se/h-BN/MoS2 heterostructure arrays, followed by RIE patterning. The electrodes were likewise defined via EBL, followed by deposition of Cr/Au (8/50 nm).
Material and device characterizations
The material and device morphologies were captured by an optical microscope (Nikon, ECLIPSE LV100ND). The high-resolution transmission electron microscope (HRTEM) and TEM images of crystalline Se were obtained with a Tecnai F20 TEM (200 kV), while the selected area electron diffraction (SAED) pattern was captured with a Tecnai Spirit TEM (120 kV). Raman spectra of Se and the Se/h-BN/MoS2 vdW heterostructure were obtained using a Renishaw Raman microscope (532 nm wavelength). An atomic force microscope (AFM) (Bruker, Dimension Icon with ScanAsyst) was employed to characterize the thickness of each layer in the vdW heterostructure. Electrical/optical measurements of the device were conducted on a probe station equipped with an Agilent 4155C semiconductor analyzer and a Keysight B2902B source meter. The optical pulses were generated on a home-made platform in which monochrome lasers (450, 532, and 635 nm) were combined with a mechanical chopper to generate the pulse trains.
In-sensor CNN demonstration
Utilizing light pulse inputs to represent the image signals and electrical pulse inputs to represent the mask, we compress binarized multi-frame video signals. Subsequently, a CNN is employed to directly classify the compressed 2D images. The pixel values within the 2D images are substituted by the output current values of the sensor depicted in Fig. 4d. For single frames/compressed frames as inputs, the image classification network comprises two convolutional layers (with 8 and 16 output channels, respectively) and a single fully connected layer (with 10 output nodes). For all frames as inputs, the CNN consists of four convolutional layers (with 8, 8, 16, and 16 output channels, respectively) and one fully connected layer to guarantee capacity-task matching. Following the convolution output, a sigmoid activation function is applied \(({\mathrm{sigmoid}}\left(x\right)=\frac{1}{1+{e}^{-x}})\). The cross-entropy loss serves as the loss function and is optimized via the Adam optimizer, with an initial learning rate of 0.001. The network is trained for 100 epochs. The dataset, consisting of 990 samples, is partitioned into five subsets, with four subsets allocated for training and the remaining subset reserved for testing. The outcome is the average of five such experiments.
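A sketch of the described single-/compressed-frame classifier in PyTorch; the 3 × 3 kernels, max-pooling layers, and 64 × 64 input resolution are assumptions not specified above:

```python
import torch
import torch.nn as nn

class CompressedFrameCNN(nn.Module):
    """Two conv layers (8 and 16 output channels) + one fully connected layer
    (10 output nodes) with sigmoid activations, per the description above.
    Kernel size, pooling, and input resolution (64 x 64) are assumptions."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.Sigmoid(), nn.MaxPool2d(2),
            nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.Sigmoid(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(16 * 16 * 16, num_classes)  # 16 ch x 16 x 16 at 64 x 64 input

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x).flatten(1))

model = CompressedFrameCNN()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # initial learning rate as stated
criterion = nn.CrossEntropyLoss()                           # cross-entropy loss as stated
```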
Data availability
The Source Data underlying the figures of this study are provided with the paper. All raw data generated during the current study are available from the corresponding authors upon request.
Code availability
The source codes used for simulation and data plotting are available from the corresponding authors upon request.
References
Yuan, X., Brady, D. J. & Katsaggelos, A. K. Snapshot compressive imaging: theory, algorithms, and applications. IEEE Signal Process. Mag. 38, 65–88 (2021).
Liu, B., Chen, Y., Liu, S. & Kim, H.-S. Deep learning in latent space for video prediction and compression. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 701–710 (IEEE, Nashville, TN, USA, 2021).
Chakraborty, I., Roy, D., Garg, I., Ankit, A. & Roy, K. Constructing energy-efficient mixed-precision neural networks through principal component analysis for edge intelligence. Nat. Mach. Intell. 2, 43–55 (2020).
Xu, Y., Lu, L., Saragadam, V. & Kelly, K. F. A compressive hyperspectral video imaging system using a single-pixel detector. Nat. Commun. 15, 1456 (2024).
Park, J., Feng, X., Liang, R. & Gao, L. Snapshot multidimensional photography through active optical mapping. Nat. Commun. 11, 5602 (2020).
Wang, Z., Wan, T., Ma, S. & Chai, Y. Multidimensional vision sensors for information processing. Nat. Nanotechnol. 19, 919–930 (2024).
Wang, X. et al. Integrated photonic encoder for low power and high-speed image processing. Nat. Commun. 15, 4510 (2024).
Candes, E. J. & Wakin, M. B. An introduction to compressive sampling. IEEE Signal Process. Mag. 25, 21–30 (2008).
Donoho, D. L. Compressed sensing. IEEE Trans. Inf. Theory 52, 1289–1306 (2006).
Gehm, M. E., John, R., Brady, D. J., Willett, R. M. & Schulz, T. J. Single-shot compressive spectral imaging with a dual-disperser architecture. Opt. Express 15, 14013–14027 (2007).
Gao, L., Liang, J., Li, C. & Wang, L. V. Single-shot compressed ultrafast photography at one hundred billion frames per second. Nature 516, 74–77 (2014).
Wang, L., Cao, M., Zhong, Y. & Yuan, X. Spatial-temporal transformer for video snapshot compressive imaging. IEEE Trans. Pattern Anal. Mach. Intell. 45, 9072–9089 (2023).
Hu, X. et al. HDNet: high-resolution dual-___domain learning for spectral compressive imaging. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 17521–17530 (IEEE, New Orleans, LA, USA, 2022).
Gallego, G. et al. Event-based vision: a survey. IEEE Trans. Pattern Anal. Mach. Intell. 44, 154–180 (2022).
Rebecq, H., Ranftl, R., Koltun, V. & Scaramuzza, D. Events-to-video: bringing modern computer vision to event cameras. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 3852–3861 (IEEE, Long Beach, CA, USA, 2019).
Scheerlinck, C. et al. Fast image reconstruction with an event camera. 2020 IEEE Winter Conference on Applications of Computer Vision (WACV), 156–163 (IEEE, Snowmass, CO, USA, 2020).
Takashima, Y. & Hellman, B. Review paper: imaging lidar by digital micromirror device. Opt. Rev. 27, 400–408 (2020).
Ke, X. et al. Liquid-crystal spatial light modulators and their applications. In Handbook of Optical Wireless Communication, 1015–1063 (Springer Nature Singapore Pte Ltd., 2024).
Martel, J. N. P. et al. Neural sensors: learning pixel exposures for HDR imaging and video compressive sensing with programmable sensors. IEEE Trans. Pattern Anal. Mach. Intell. 42, 1642–1653 (2020).
Mennel, L. et al. Ultrafast machine vision with 2D material neural network image sensors. Nature 579, 62–66 (2020).
Zhou, Y. et al. Computational event-driven vision sensors for in-sensor spiking neural networks. Nat. Electron. 6, 870–878 (2023).
Zhang, Z. et al. All-in-one two-dimensional retinomorphic hardware device for motion detection and recognition. Nat. Nanotechnol. 17, 27–32 (2022).
Jiang, Y. et al. A scalable integration process for ultrafast two-dimensional flash memory. Nat. Electron. 7, 868–875 (2024).
Dodda, A. et al. Active pixel sensor matrix based on monolayer MoS2 phototransistor array. Nat. Mater. 21, 1379–1387 (2022).
Liu, S. et al. Nanopatterning technologies of 2D materials for integrated electronic and optoelectronic devices. Adv. Mater. 34, 2200734 (2022).
Polyushkin, D. K. et al. Analogue two-dimensional semiconductor electronics. Nat. Electron. 3, 486–491 (2020).
Wu, G. et al. Ferroelectric-defined reconfigurable homojunctions for in-memory sensing and computing. Nat. Mater. 22, 1499–1506 (2023).
Sun, X. et al. Reconfigurable logic-in-memory architectures based on a two-dimensional van der Waals heterostructure device. Nat. Electron. 5, 752–760 (2022).
Kumar, D. et al. Artificial visual perception neural system using a solution-processable MoS2-based in-memory light sensor. Light.: Sci. Appl. 12, 109 (2023).
Marega, G. M. et al. A large-scale integrated vector-matrix multiplication processor based on monolayer molybdenum disulfide memories. Nat. Electron. 6, 991–998 (2023).
Yang, Y. et al. In-sensor dynamic computing for intelligent machine vision. Nat. Electron. 7, 225–233 (2024).
Zhu, Y. et al. Non-volatile 2D MoS2/black phosphorus heterojunction photodiodes in the near- to mid-infrared region. Nat. Commun. 15, 6015 (2024).
Zhang, X. et al. Inorganic halide perovskite nanowires/conjugated polymer heterojunction-based optoelectronic synaptic transistors for dynamic machine vision. Nano Lett. 24, 4132–4140 (2024).
Wang, F. et al. A two-dimensional mid-infrared optoelectronic retina enabling simultaneous perception and encoding. Nat. Commun. 14, 1938 (2023).
Huang, P.-Y. et al. Neuro-inspired optical sensor array for high-accuracy static image recognition and dynamic trace extraction. Nat. Commun. 14, 6736 (2023).
Wang, Q. et al. Fabrication and growth mechanism of selenium and tellurium nanobelts through a vacuum vapor deposition route. J. Phys. Chem. C. 111, 12926–12932 (2007).
Qin, J. et al. Controlled growth of a large-size 2D selenium nanosheet and its electronic and optoelectronic applications. ACS Nano 11, 10222–10229 (2017).
Lee, S., Peng, R., Wu, C. & Li, M. Programmable black phosphorus image sensor for broadband optoelectronic edge computing. Nat. Commun. 13, 1485 (2022).
Wu, L. et al. Atomically sharp interface enabled ultrahigh-speed non-volatile memory devices. Nat. Nanotechnol. 16, 1–6 (2021).
Vu, Q. A. et al. Two-terminal floating-gate memory with van der Waals heterostructures for ultrahigh on/off ratio. Nat. Commun. 7, 12725 (2016).
Luo, L.-B. et al. Transparent and flexible selenium nanobelt-based visible light photodetector. CrystEngComm 14, 1942–1947 (2012).
Vu, Q. A. et al. A high-on/off-ratio floating-gate memristor array on a flexible substrate via CVD-grown large-area 2D layer stacking. Adv. Mater. 29, 1703363 (2017).
Britnell, L. et al. Electron tunneling through ultrathin boron nitride crystalline barriers. Nano Lett. 12, 1707–1710 (2012).
Wang, Y. et al. Negative photoconductance in van der Waals heterostructure-based floating gate phototransistor. ACS Nano 12, 9513–9520 (2018).
Liu, M. et al. Photogating-assisted tunneling boosts the responsivity and speed of heterogeneous WSe2/Ta2NiSe5 photodetectors. Nat. Commun. 15, 141 (2024).
Yin, Z. et al. Single-layer MoS2 phototransistors. ACS Nano 6, 74–80 (2012).
Prakash, A. et al. Understanding contact gating in Schottky barrier transistors from 2D channels. Sci. Rep. 7, 12596 (2017).
Lai, H. et al. Photoinduced multi-bit nonvolatile memory based on a van der Waals heterostructure with a 2D-perovskite floating gate. Adv. Mater. 34, 2110278 (2022).
Janesick, J., Elliott, T., Collins, S., Blouke, M. & Freeman, J. Scientific charge-coupled devices. Opt. Eng. 26, 268692 (1987).
Yuan, X., Liu, Y., Suo, J., Durand, F. & Dai, Q. Plug-and-play algorithms for large-scale snapshot compressive imaging. IEEE Trans. Pattern Anal. Mach. Intell. 44, 7093–7111 (2022).
Chen, W. et al. Snapshot compressive imaging based digital image correlation: temporally super-resolved full-resolution deformation measurement. Opt. Express 30, 33554–33573 (2022).
Zhang, B., Suo, J. & Dai, Q. Event-enhanced snapshot compressive videography at 10K FPS. IEEE Trans. Pattern Anal. Mach. Intell. 47, 1266–1278 (2025).
Razavi, B. Fundamentals of Microelectronics (John Wiley & Sons, 2021).
Qiu, H., Wang, Y. & Meng, D. Effective snapshot compressive-spectral imaging via deep denoising and total variation priors. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 9123–9132 (IEEE, Nashville, TN, USA, 2021).
Arce, G. R. et al. Compressive coded aperture spectral imaging: an introduction. IEEE Signal Process. Mag. 31, 105–115 (2014).
Pranav, S. V., Suraj, P., Pothineni, N., Kadambari, K. V. & Balasubramanyam, P. Optimizing video compression & decompression using deep learning networks. 2021 IEEE 4th International Conference on Computing, Power and Communication Technologies (GUCON), 1–6 (IEEE, Kuala Lumpur, Malaysia, 2021).
Tzagkarakis, G., Tsagkatakis, G., Starck, J.-L. & Tsakalides, P. Compressive video classification in a low-dimensional manifold with learned distance metric. 2012 Proceedings of the 20th European Signal Processing Conference (EUSIPCO), 155–159 (IEEE, Bucharest, Romania, 2012).
Goswami, D. & Chakraborty, S. Active learning for video classification with frame level queries. 2023 International Joint Conference on Neural Networks (IJCNN), 1–9 (IEEE, Gold Coast, Australia, 2023).
Zhou, G. et al. Full hardware implementation of neuromorphic visual system based on multimodal optoelectronic resistive memory arrays for versatile image processing. Nat. Commun. 14, 8489 (2023).
Li, H. et al. From bulk to monolayer MoS2: evolution of Raman scattering. Adv. Funct. Mater. 22, 1385–1390 (2012).
Zhang, Z.-C. et al. An ultrafast nonvolatile memory with low operation voltage for high-speed and low-power applications. Adv. Funct. Mater. 31, 2102571 (2021).
Acknowledgements
This research is supported by the National Key R&D Program of China (Grant No. 2023YFB2806300). C.T. thanks the funding support from the National Natural Science Foundation of China – Excellent Young Scientists Fund (Hong Kong and Macau) (No. 52122002), the Start-Up Grant (Project No. 9610710) from City University of Hong Kong, the ECS scheme (21201821), the General Research Fund (11200122) and the Collaborative Research Fund (RGC; Nos. C2001-23Y and C5001-24Y) from the Research Grants Council of Hong Kong, and ITC via the Hong Kong Branch of National Precious Metals Material Engineering Research Center (NPMM). Z.W. thanks the support from the National Natural Science Foundation of China (Grant Nos. 62122004, 62374181), the Beijing Natural Science Foundation (Grant No. Z210006), and the Hong Kong Research Grants Council (Grant Nos. 27206321, 17205922, 17212923). This research is also partially supported by ACCESS – AI Chip Center for Emerging Smart Systems, sponsored by the Innovation and Technology Fund (ITF), Hong Kong SAR. Y.C. thanks the funding support from the General Research Fund (15301621) from the Research Grants Council of Hong Kong. P.Y. thanks the funding support from the National Natural Science Foundation of China (Grant No. 62404138) and the Shenzhen Science and Technology Program (Grant No. 20231128102926002).
Author information
Contributions
Z.W. and C.T. proposed the research direction. H.H., C.T. and Z.W. conceived the idea for the project. Y.Chai, Z.W. and C.T. supervised the project. H.H. performed materials synthesis, fabrication and characterization of devices. S.S. conducted simulation of devices. J.Z. contributed to optoelectronic performance measurements and data analysis. H.P.C. provided guidance on theoretical analysis. Y.X., H.W., and P.Y. assisted in materials synthesis. J.H., W.W., Y.R. and Y.W. supported experimental testing. L.Z. and Y.Chen carried out transmission electron microscopy characterization. S.X. performed atomic force microscopy characterization. H.H., S.S., J.Z., Y.Chai, Z.W., and C.T. co-wrote the manuscript. All authors participated in manuscript discussion and revision.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks Vishwanath Saragadam, Yang Wang, and the other, anonymous, reviewer for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Huang, H., Shi, S., Zha, J. et al. In-sensor compressing via programmable optoelectronic sensors based on van der Waals heterostructures for intelligent machine vision. Nat Commun 16, 3836 (2025). https://doi.org/10.1038/s41467-025-59104-7