Introduction

A variety of atmospheric pollutants are emitted from energy use and industrial processes, including particulate matter, nitrogen oxides (NOx), carbon monoxide (CO), and other trace gases. Emitted species may also undergo atmospheric chemical processing and be transported by the wind, which in tandem affects the balance of atmospheric oxidation, chemical properties, and the ultimate concentrations of these species. CO is one such trace gas that impacts air quality, global climate forcing, and the budget and distribution of hydroxyl (OH) radical, which play a crucial role in atmospheric chemistry1,2,3. Although the direct radiative forcing of CO is relatively small, its indirect effects, including its influence on the concentrations of the greenhouse gasses methane and ozone4, as well as its conversion into carbon dioxide (CO2), contribute notably to the global radiative balance5.

CO is produced by incomplete combustion of carbon-based fuels, biomass burning, and chemical oxidation of volatile organic compounds and methane6. Its co-emission with NOx depends on thermodynamic conditions, and oxygen and nitrogen availability at the time of combustion7,8. From the perspective of bottom-up emission inventories, CO and NOx emissions are calculated by applying emission factors to a set of activity data. Previous studies over China have indicated that biofuels and biomass burning contribute nearly half of CO emissions but only a tenth of NOx emissions9,10,11. Due to variations in combustion temperature, oxygen content, and combustion efficiency among different production stages and the generation mechanisms governing pollutants formation related to the combustion process, such as CO and NOx, emission ratios of these gases differ obviously among processes like transportation, power generation, biomass burning, and iron and steel production8,9. Therefore, analyzing the ratio of CO emissions to NOx emissions (CO/NOx) variation can provide valuable insights for further investigation and source attribution2. If the CO/NOx ratio in an emission inventory is not accurately captured, there may be an under/overestimation of CO or NOx emissions. A biased/inaccurate emission inventory, as a crucial input data for atmospheric models, will pose challenges for the simulated results12. This is one possible reason why these models still struggle to accurately reproduce observed long-term changes even after more than a decade development13,14.

Environmental management of CO monitoring and supervision is limited compared with other criteria pollutants, such as NOx, sulfur dioxide, ozone, and particulate matter, which are all collected in “Ambient air quality standards (GB 3095-2012)”. Although many countries monitor CO concentration and enforce ambient air quality standards15,16, the management of CO emissions is practically nonexistent. This is because regulatory efforts prioritize criteria pollutants that have a greater impact on the air quality index, as CO’s ambient standard and index cutoff are relatively high17. Specifically in China, policy-based controls for CO emissions are primarily implemented in waste incineration and a few other industries18, with only a few cities such as Tangshan, Handan, and Linfen, which all have a large number of iron and steel factories, introducing CO controls in the iron and steel industry. Moreover, the continuous emissions monitoring systems (CEMS) in China and other counties do not provide CO data, hindering policy-based controls on a stack-by-stack basis8,19. Therefore, detailed quantification of CO emissions over a region with diverse sources can yield critical insights and potentially enhance environmental management efforts and lead to synergistic reduction of air pollution and greenhouse gasses20,21,22,23,24.

This study employs a model free inversion estimation framework (MFIEF) to estimate daily CO emissions over a mesoscale grid (0.05° × 0.05°) from May 2018 through April 2022 over Shanxi Province, utilizing daily observations from the TROPOspheric Monitoring Instrument (TROPOMI) of CO and HCHO. A spatially and temporally consistent top-down constrained NOx emission inventory, calculated using the same algorithm8 combined with bottom-up CO to NOx ratio estimates over Shanxi, serves as the a priori constraint. MFIEF as a top-down constrained mass-conserving method where emissions are calculated based on geospatial divergence of mass and wind, sink and production. It was established based on the best fit properties of transport distance, and the lifetimes of various sinks and production terms in the atmosphere, similar to but simplified as compared with current chemical transport models25. The geospatial area (Fig. 1) is selected since it is an energy-rich region in Northern China that produces more than a quarter of China’s coal (including about half of China’s coking coal) and consumes nearly a tenth26,27. The geography is also unique with mountains and basins contributing the majority of surface area and generally low cloud cover, leading to intense atmospheric processing and observed concentrations not encountered in previous studies. However, this unique set of conditions present an ideal laboratory to study changes that occur during the combustion and utilization of coal, co-emitted pollutants, and their interrelationships.

Fig. 1: TROPOMI CO column loadings and land use types of Shanxi Province.
figure 1

a Annual mean TROPOMI CO column loadings for 2019 over Shanxi and parts of surrounding provinces; and b urban, natural, industrial, and rural areas over Shanxi at a spatial resolution of 0.05° × 0.05°.

Results

Spatial and temporal distribution of calculated CO emissions

The CO emissions were calculated (hereafter EICO) over the ___domain from 34°N to 41°N and 110°E to 115°E inside the Loess Highlands of China. The research period from May 2018 through April 2022, is divided into four annual segments, namely May 2018 to April 2019, May 2019 to April 2020, May 2020 to April 2021, and May 2021 to April 2022 (hereafter 2018–2019, 2019–2020, 2020–2021, and 2021–2022). Annual average EICO (average total EICO over the four annual segments) converted into policy relevant units, uncertainty range (10th through 90th percentile), and day-to-day variability over Shanxi are 30.4, 15.0, and 17.1 Tg yr-1, respectively (Fig. 2a, b). Year-to-year values for total emissions, uncertainty range, and daily variability are listed in Table 1. Total EICO exhibited yearly decrease, consistent with China’s long-term air quality management strategy. This reduction can be attributed to indirect controls on other co-emitted species28,29,30,31. However, important temporal variation is observed, with the highest emissions occurring in December to January each year, coinciding with the end of the industrial cycle and the start of the Chinese New Year holiday (mostly at late January through early February), consistent with orders for materials behind schedule possibly being rushed before the close of business (Fig. 2c). EICO across Shanxi reveals three distinct emission intensity regions (≥9 µg m-2 s-1, 6 ~ 9 µg m-2 s-1, and <6 µg m-2 s-1, hereafter Cases I, II, and III, detailed in Fig. 2d and Supplementary Note 1). EICO during December to January decreased notably, with values of 14.4 ± 6.4, 9.7 ± 4.3, 6.7 ± 3.2, and 6.4 ± 2.6 µg m-2 s-1 on a year-by-year basis Shanxi-wide, and decreased even more sharply over Case I aeras with 26.5 ± 11.6, 18.2 ± 7.6, 13.3 ± 6.0, and 11.2 ± 4.4 µg m-2 s-1 on a year-by-year basis. The high-value areas show the most pronounced decline, while the benchmark emissions (CASE III) across the province have not shown a notable reduction. Applying a Mann-Kendall test over 48 monthly EICO reveals a statistically significant downward trend for 17 months over Case I and for 12 months Shanxi-wide, with December and January being present twice over Case I aeras and only once at Shanxi-wide, shown in Supplementary Fig. 1. A small peak of EICO is also observed around May, with year-to-year variations in its start and end points. When data from December, January, and May are removed, the emissions have shown a small and continuous decrease (6.3 ± 3.2, 5.6 ± 2.8, 5.1 ± 2.7, and 4.4 ± 2.2 µg m-2 s-1). A detailed analysis from a finer spatiotemporal perspective indicates that 2020, despite the COVID-19 outbreak, was not the year with lowest CO emissions, more information in Supplementary Note 1 and Supplementary Figs. 2-3.

Fig. 2: EICO from May 2018 to April 2022 over Shanxi at 0.05° × 0.05° scale.
figure 2

a Year-to-year spatial distribution of daily average emissions; (b) bootstrapping uncertainty range (10th through 90th percentile); (c) day-to-day variation of the Shanxi-average emissions; (d) 4 year average spatial distribution and weekly averaged CO emissions over three different geographic domains (Cases I, II, and III).

Table 1 Yearly values for calculated CO total emissions, uncertainty range, and daily variability

The Multi-resolution Emission Inventory for China (MEIC)22 and the Emission Database for Global Atmospheric Research (EDGAR)21,32 are bottom-up emission inventories used by many in the atmospheric and air pollution modeling and observation communities. MEIC v1.4 and EDGAR v8.1 are selected since they provide bottom-up emissions of anthropogenic air pollutants over China, with a monthly time step and spatial resolutions of 0.25° × 0.25° for MEIC and 0.1° × 0.1° for EDGAR. Although the a priori emissions and EICO have data from May 2018 to April 2022, since MEIC only overlapped in 2019 and 2020, comparisons are made in the year of 2019 and 2020 (monthly from January to December). The total emissions of EICO are 37.5 ± 18.5 Tg yr-1 and 28.8 ± 14.5 Tg yr-1, while MEIC yields 5.7 Tg yr-1, 5.3 Tg yr-1 and EDGAR yields 3.7 Tg yr-1, 3.7 Tg yr-1 in 2019 and 2020, respectively. EICO compared with the total quantities reported by MEIC or EDGAR show noticeable increase, with the mean values around 5 ~ 7 times higher than MEIC and 8 ~ 10 times higher than EDGAR (Fig. 3, Supplementary Figs. 4-5). Assuming all additional CO emissions will ultimately decay into CO2 and considering the increases of EICO to MEIC are 31.8 and 23.5 Tg yr-1 of 2019 and 2020, respectively, leads to a corresponding increase in CO2 emissions of 50.0 and 37.0 Tg yr-1. The reported emissions of MEIC CO2 are 507.6 Tg yr-1 in Shanxi and 10118.9 Tg yr-1 for China in 2019, and 521.5 Tg yr-1 and 10051.1 Tg yr-1 in 202033. The underestimation of CO emissions may finally imply an underestimation of CO2 by ~9.9%, 7.1% over Shanxi and 0.5%, 0.4% over China based on MEIC CO2 in 2019 and 2020.

Fig. 3: Probability density functions of emission differences between EICO and MEIC/EDGAR.
figure 3

a EICO minus MEIC from January 2019 to December 2020; (b) EICO minus EDGAR from January 2019 to December 2020. c EICO minus MEIC during the four high emission months of January and December in both 2019 and 2020; and (d) EICO minus EDGAR during the four high emission months of January and December in both 2019 and 2020. Spatial sampling of three cases according to EICO, blue are grids where EICO is >9 µg m-2 s-1, red are grids where EICO is between 6 µg m-2 s-1 and 9 µg m-2 s-1, and orange are grids where EICO is <6 µg m-2 s-1.

From Fig. 3 there are three important sub-caveats observed in the above differences. First, EICO exceeds MEIC/EDGAR more in December and January for Cases I and II, indicating that there is a genuine increase during these months compared with the rest of year, and these temporal changes were not well captured by the bottom-up datasets. Second, the differences in December and January in Case III are still positive, but smaller than in Cases I, II, and the yearly average of Case III, which represents that there is a spatial change in the area covered by low polluting sites during different times. Third, EICO covers 6701 grid points at 0.05° × 0.05° resolution over Shanxi. In 2019–2020, MEIC contains 116 grid points, and EDGAR contains 113 grid points with emissions higher than EICO. In December and January, these numbers drop to 71 for MEIC and 90 for EDGAR. The percentage of MEIC/EDGAR higher than EICO and its uncertainty range are low (but the emissions of MEIC/EDGAR at these grids are extremely high), indicating that these overestimated grids likely have insufficient spatial spread. A detailed comparison over different cases has been made. Over Case I, MEIC/EDGAR always reveal high bias compared with EICO (EICO does not reproduce such high values), consistent with the bottom-up emissions on these grids likely being outdated and lacking consideration of the effects of continually stricter policies on the well-known large sites (like MEIC reports the largest sources are concentrated in the steel-producing regions near Taiyuan)34. The bottom-up datasets also underestimate CO emissions from the rapid increase in iron and steel enterprises in Yuncheng and Linfen35. However, these ultra-high emission grids from MEIC/EDGAR are not necessarily larger than the EICO computed in December and January, indicating possible issues in capturing temporal variations (Table 2, Supplementary Fig. 6). Over Case III, EICO is consistently higher than MEIC/EDGAR on overlapping grids, implying missing or underestimated sources. There are many very low values close to zero in monthly MEIC (70.5% and 72.9% of pixels below 1.0 µg m-2 s-1 in 2019 and 2020) and EDGAR (88.4% pixels below 1.0 µg m-2 s-1 in both year), or more grids in rural areas with non-insignificant sources that are not found. Over Case II, averaged emissions seem similar, but geographical shift may exist among different emission inventories. In those grids characterized by rural residential burning and wildfires, small and moderate sources rise, consistent with rural economic activity and energy use. Comparing annual data with December and January results shows EICO generally increases, while MEIC/EDGAR decrease in Case I and stay stable in Cases II and III. This once more highlights EICO’s improved representation of temporal variability.

Table 2 Monthly averaged emission and uncertainty range of EICO and MEIC/EDGAR at different areas of Cases I, II, and III

Sensitivity and robustness of calculated CO emissions

Sensitivity tests are performed to actively account for the ranges of uncertainties of four different input variables used in MFIEF to compute CO emissions for the year 2020 (from January to December), as shown in Fig. 4a–g for eight cases (±30% TROPOMI CO, ±40% TROPOMI HCHO, ±30% a priori emission of CO, and two different pressure levels of wind). The remaining eight cases (pairs of variables changed together) can be found in Supplementary Fig. 7. None of the cases recalculated had a perturbed recalculated emissions value larger than the magnitude of the perturbation, indicating that the model and observations tested were all sufficiently stable and robust to produce changes smaller than the applied perturbation. In particular, two things were found. First, the emissions computed in all cases are stable: there are no new peaks or troughs in the spatial results, and no significant changes in the temporal profiles. Second, the recalculated emissions in all cases are robust: the computed emissions always have a smaller difference than the magnitude of the perturbations (the range of fractional changes of the eight cases in Fig. 4 are 1.03–1.26, 1.06–1.27, 0.98–1.33, 0.82–1.21, 0.78–0.98, 0.76–0.95, 0.93–1.13, and 0.82–1.28, respectively).

Fig. 4: Ratios between CO emissions computed due to adjustments of input variables and the original EICO of the whole year 2020 with the ranges of fractional changes (inside parentheses).
figure 4

Calculated CO emissions when (a) TROPOMI CO plus 30% (1.03–1.26); (b) a priori emission of CO plus 30% (1.06–1.27); (c) TROPOMI HCHO plus 40% (0.98–1.33); (d) wind data changed to 900 hPa (0.82–1.21); (e) TROPOMI CO minus 30% (0.78-0.98); (f) a priori emission of CO minus 30% (0.76-0.95); (g) TROPOMI HCHO minus 40% (0.93–1.13); and (h) wind data changed to 800 hPa (0.82-1.28).

Among all terms analyzed above, the term exhibiting the largest differences between the positive and negative uncertainties is the change of a priori emissions (only marginally larger than TROPOMI CO). The ratios between CO emissions computed due to adjustments of input variables and the original EICO of 2020 over three emission cases chosen by the original EICO are listed in Supplementary Table 1. When compared between TROPOMI CO plus 30% and a priori emission of CO plus 30%, recalculated EICO changed more with respect to the perturbation of TROPOMI CO in both Cases I and III, while the recalculated EICO changed more due to the a priori emission perturbation in Case II. However, there is substantial spatial inhomogeneity in the sensitivity (Fig. 4a, b, e, and f). In the positive perturbation cases, in parts of north-western Shanxi, changes due to priori emissions are larger than those due to CO observations, while overall, they are smaller in south-eastern Shanxi. In the negative perturbation cases, changes due to priori emissions are generally larger in eastern regions, with only some areas in Yuncheng, Changzhi, and Jincheng Cities relatively smaller. When the a priori emission of CO minus 30% are introduced, Cases I and II were impacted more or the same by the change in TROPOMI than the a priori emissions. The influence of the a priori inventory has a somewhat larger influence in certain regions, but not consistently. The comparison between the a priori emissions and EICO is shown in Supplementary Figs. 4a, 4d, 5a, and 5d. The priori inventory was 22.5 Tg yr-1 and 15.1 Tg yr-1 for 2019 and 2020, respectively, which is 40.0% and 47.6% smaller than EICO. From the perspective of the distribution of differences, Case I area show a noticeable increase in EICO, while the opposite holds around Taiyuan, Yangquan, and Shuozhou. Additionally, the ratios between EICO and the a priori emissions have also been compared (EICO divided by a priori emissions on a grid-by-grid and day-by-day basis), range from 0.77 ~ 7.50, 0.68 ~ 7.75, and 0.57 ~ 11.35 for Cases I, II, and III. Case III exhibits the broadest range of ratios. To further validate the impact of the a priori inventory, additional simulations were conducted using the current EICO as a new a priori input, as computed and presented in Supplementary Fig. 8. This shows that there is very little change over Case III, and decrease at some mountain areas. Despite spatial irregularities between the two a priori inputs, their impact on emissions seems minimal, indicating method robustness, with multiple iterations aiding convergence and optimization.

Adjusting the wind pressure level from 850 hPa–900 hPa (Fig. 4d) leads to elevated emissions within the high-altitude areas of northern Shanxi simultaneously and a reduction of emissions in the basin areas, with all changes < ± 21%. Adjusting the wind pressure level from 850 hPa–800 hPa (Fig. 4h) leads to the largest emissions changes around mountain peaks and generally higher elevation areas, especially an increase in Xinzhou and decrease in Northern Yuncheng, with all changes <±28%. Intriguingly, in the regions such as Linfen, Yuncheng, and Jincheng, where CO emissions are relatively high, emissions still increase despite the altitudes being similarly low to the basin areas. When the wind from 750 hPa level were used, the emission changes at the same ___location with 800 hPa level winds, with the most notable difference being a pronounced increase in the magnitude of emissions, particularly for positive changes, seen in Supplementary Fig. 9. The results of one of MFIEF parameter (CO lifetime) followed the perturbations in Fig. 4a–c and e–g has also been investigated. It is precisely because the magnitude of these changes in parameters is relatively small that the recalculation results of the emissions tend to be robust, detailed in Supplementary Fig. 7i. These findings are consistent with the methodology, since any abnormally large or small driving factors computed due to the observational uncertainties are filtered out if physically unrealistic. Similarly, the non-linear effects of uncertainty on the spatial gradient terms are weighted by the linear production and loss terms, ensuring that fluctuations in these components counterbalance one another, bolstering confidence in the reliability and quality of the results.

Decay and production of CO

Due to MFIEF’s flexibility, quantitative information about the two components lifetime reflects a balance of decomposition and diffusion of CO (i.e., sinks of CO) and production of CO and diffusion of HCHO (i.e., sources of CO). Therefore, the best fit coefficients calculated from these two parts are defined as CO lifetime (α2) and HCHO lifetime (α4), respectively. The lifetime of CO is a function of the reaction between CO and the OH radical, forming CO2, and mixing between the high concentration plume of CO both as it is emitted and as it evolves in the atmosphere with respect to the lesser polluted air immediately surrounding it within an observed grid. Therefore, CO lifetime here include the residence lifetime and chemical lifetime. The furthest distance that the plumes from major source regions in the province can travel is around 750 km if it goes from Yuncheng to Datong, and about 600 km if it goes from Yuncheng via Taiyuan to Yangquan. This does not include the fact that major sources in Changzhi and Jincheng are located in a completely closed basin and may outflow or not in a more complex manner. To account for diffusion effects, an explicit term has been incorporated into the model (Supplementary Fig. 10). The impact on the emissions is relatively small, with a variation range from −7.1% to 7.6% within a 90% confidence interval. This will make some change to the total CO lifetimes, but not substantially.

Due to the extremely high level of local methane36 and the emissions of volatile organic compounds due to coke and other coal chemical industries in Shanxi37,38, it is important to also consider the production of CO in-situ. However, due to limited validated global observations from remote sensing platforms, this work uses HCHO as a proxy for CO production, given HCHO’s reasonable retrieval properties39. The mean, 10th and 90th percentile of the CO and HCHO atmospheric in-situ net processing decay time are [2.9,1.0, 5.7] days, and [1.0, 0.2, 2.3] hours. The computed lifetime of CO is consistent with observed values of OH over other highly polluted and relatively drier areas40,41,42,43, as well as downwind from major forest fires44. Figure 5 illustrates the monthly distribution of CO and HCHO lifetime in Shanxi. In August and December there is an obvious increase in the lifetime of CO year-by-year from 2018 through 2022, which is not obvious in HCHO lifetime. However, during other months (May and November) the lifetimes of both CO and HCHO have been observed to increase year-by-year from 2018–2022. There are two reasons why sometimes (especially in the high emission months) CO’s lifetime has increased from 2018–2022 while HCHO’s lifetime has not. First, there is an overall decrease in OH concentration45,46 consistent with the documented rise in methane emissions due to increased coal production from high coal-bed methane sources in Shanxi over the past decades25. Second, changes in the compositions of air in the surrounding basins due to different controls and economic growth play a non-linear role in the heavily mountainous environment in terms of sub-grid mixing between the different atmospheric environments. These factors emphasize the importance of accurately constraining CO emissions and associated processes in tandem.

Fig. 5: Monthly lifetimes of CO (days) and HCHO (hours) calculated by MFIEF for 2018–2022.
figure 5

Boxes show the interquartile range (25th, median, and 75th percentiles), upper and lower ends of the lines represent nonoutlier maximum and minimum.

The lifetime of CO is observed to be relatively longer in May and July, and relatively shorter in December and January, consistent with local observations. First, Shanxi has a less stable atmosphere in the warmer months (June–August), coupled with a relatively dry atmosphere year-round, leading to a higher cloud optical depth in May and July47. Second, there is a relatively large number of absorbing aerosols emitted here due to coal consumption48,49, combined with enhanced particle aging in the higher UV seasons (July of 2019, and April, June and July of 2020)50 and under higher temperature conditions51 leading to a relatively higher column absorption of UV as secondary aerosol coats and mixes with locally emitted black carbon52, which in turn absorbs more UV from the column when the solar zenith angle is higher, maximizing in May8. Observations of the monthly average absorbing aerosol optical depth (AAOD) from Multi-Angle Imaging SpectroRadiometer (MISR) over Shanxi (0.0039 March 2019 to February 2021) confirm that the value is slightly larger in May (0.0063 in 2019 and 0.0071 in 2020), while lower than or similar to the background in December and January (0.0041 in 2019–2020 and 0.0032 in 2020–2021)53, as detailed in Supplementary Fig. 11. These factors lead to reduced UV within the lower troposphere, which in turn leads to a decrease in OH production due to the joint limitations on UV and water vapor54. Third, NOx loadings are observed to be lower during the warmer months. Fourth, the rate of CO decrease is far smaller than that of NOx. This combination leads to a lower net atmospheric oxidation potential. The results are consistent with both the underlying theory and the actual observations in this region, raising the question whether modeled oxidant and CO fields over this region should be examined more carefully.

α4 is relatively stable, with just an obvious change in September and October. The shorter lifetime of HCHO will lead to more CO production. However, the ratio between the lifetime of CO and HCHO in two specific sub-regions within Shanxi is larger (Supplementary Fig. 12). Those sub-regions are known to be upwind from other major sources adjacent to Shanxi and have topographic gaps. This pattern aligns with the major factors in these regions being influenced by emissions and subsequent long-range transport from far-away industrial and urban areas44. Diurnal changes in mountain winds near polluted air outside Shanxi’s surrounding mountains cause vertical enhancement and transport into Shanxi, altering the local lifetime of pollutants. This finding is further supported by the high observed loadings of absorbing aerosol co-emitted and observed to obviously reduce UV at the surface as the air is transported over multiple days from these sources to Shanxi34.

Attributing sources using emission ratios

CO and NOx emissions depend on energy efficiency, combustion temperature, oxygen availability, and others, with notable variations. It is of interest to investigate the relationship between both species and use this to identify and attribute different source types2,55. On average, CO/NOx varies consistently across four land use types (Fig. 6a). Urban and industrial areas generally have lower CO/NOx than rural areas, which in turn are lower than in natural areas. Since industrial and urban areas have more high-temperature combustion sources associated with power, transportation, and gas burning sector9, they emit more NOx and less CO. Rural areas are mixed small industry, boilers, and biomass burning56, leading to less NOx and more CO. Natural areas have minimal human influence and pollution. A detailed analysis of CO/NOx is performed on different industrial sub-types provided by CEMS (Fig. 6b). Cement and power57 have the lowest median CO/NOx, boilers and coke show intermediate median CO/NOx, while iron and steel is highest. Notably, the CO/NOx ranges (between 75th and 25th percentile) for coke and boilers are wider than others, while the range for iron and steel is smallest. Iron and steel has both the highest 25th percentile and median value, with its 25th percentile value similar to the 75th percentile for cement and power and similar to the median for coke and boiler, while its 75th percentile is slightly lower than coke. Despite high combustion temperatures in iron and steel production, resulting in high NOx emissions, the CO/NOx ratio remains high due to even higher CO emissions from sintering and blast furnaces, since carbonaceous material (including metallurgical coke, coal and natural gas) are used to reduce iron oxide to iron, in addition to being directly used as fuel for combustion. Cement’s CO/NOx ratio slightly exceeds power across all percentiles due to higher NOx emissions from rotary kilns. Considering the length, consistency, and temperature of combustion, the overall lesser availability of oxygen leads to more CO than power and slightly larger observed CO/NOx. Coke oven and boilers have similar median and 25th percentile values, the values above the median (75th percentile and upper line) and the lower line are smaller for boilers. Both coke and boilers have larger 25th and 50th percentile values compared to power and cement, but lower than iron and steel. The main fuel during coke production is natural gas, wherein the combustion process is meant to cook the coal, turning it into coke. The produced CO is retained in the coke oven gas instead of being emitted directly. Furthermore, some coke plants have low CO emissions due to the lack of oxygen in charring chambers, which results in a large amount of lost coal (not converted into coke) being converted into black carbon instead of CO. Leakage from combustion chamber to charring chamber in coke ovens, influenced by their design, age, and efficiency, can result in extra CO generation due to the influx of additional oxygen. The combination leads to wide range of coke CO/NOx. Boilers, operating at lower capacity and efficiency than power plants, hence contribute less NOx per unit coal consumed and a greater potential for incomplete combustion and produce normal to high CO emissions58, resulting in a relatively wide CO/NOx range. Variations in the month-to-month CO/NOx detailed in Supplementary Note 2 and Fig. 6c, shown high values in Spring and Autumn.

Fig. 6: Different types of CO/NOx ratios.
figure 6

a annual average CO/NOx for different land use types, (b) annual average CO/NOx for different industrial types, (a) and (b) the boxes show the interquartile range (25th, median, 75th percentiles), the upper and lower lines represent nonoutlier maximum and minimum, while the points are outliers. c monthly average CO/NOx ratios for different industrial types, the dots are means with the variabilities of each month ratios.

The relationship between CO/NOx and source type is complementary. First, the CO/NOx calculated based on the distribution of CEMS reveals obvious differences among different industries, specifically discerned from both the average values and percentile distributions. Subsequent application of CO/NOx to the whole ___domain (include the grids with or without CEMS data) allows a weighted determination of the pollution source types. The following uses power plants as an example to illustrate the attribution steps accomplished using CO/NOx. First, the grids were selected based on CO/NOx from CEMS power plants (central 90% confidence interval), with 455 individual grids chosen. Second, these results were compared with the Emission Permit dataset59 (EP) from the government, and it was found that 183 grids were correctly identified as power plants among the grids selected in the first step, with details presented in Table 3. Third, the number of grids obtaining localized permitting but lacking CEMS was determined by subtracting the CEMS grid numbers from the correct grid numbers. Four industrial source types (cement, power, iron/steel, and coke) have been attributed. Power plants are found to have the most grids lacking CEMS (50.8%) and the highest accuracy rate (40.2%). Coke factories have the highest relationship based on magnitude, with 81.7% of known grids successfully identified according to EP. Similar positive identification has been made for other subsets of data types, even though they may have smaller emissions or be more geographically constrained.

Table 3 Source identification using CO/NOx on four kinds of industrial factories

Impacts of variability and long-range transport

Due to its relatively longer lifetime, CO serves as an indicator of dynamical transport. With a daily average transport distance of CO is ~58–471 kilometers (10th and 90th percentile), there is a wide range of variation, up to hundreds of kilometers on specific days in specific parts of Shanxi. Since typical zonal and meridional transport within Shanxi occurs on the order of 1–2 weeks, the calculated lifetime of CO is capable of representing long range transport when its chemical lifetime is similarly long60,61,62. The divergence component of the transport followed MFIEF is calculated based on \({\alpha }_{3}\cdot {\nabla ({{{\bf{u}}}}\cdot {V}_{{CO}})}\). Comparative analyses indicate that divergence accounted for 8.2% of the impact on EICO of 2020, detailed in Supplementary Fig. 13.

Divergence refers to a scalar-valued measure of the amount by which a vector field is spreading out or converging at a particular point. As the divergence term adds/subtracts linearly to/from the emissions, the temporal mean emissions can be determined from the temporal mean sinks plus the divergence of the temporal mean flux, which in general has a smooth spatial distribution, allowing for the calculation of spatial derivatives. The divergence of positive spikes indicates strong localized sources, while negative divergence indicates chemical and diffusive loss63.\(\,{ \nabla ({{{\bf{u}}}}\cdot {V}_{{CO}})}\) in Eq. (1) represents the daily zonal and meridional divergence of CO, which peaks during December and January and weakens in May64. A north-south corridor is observed from Linfen/Yuncheng to Taiyuan/Datong in the central basin area of Shanxi, particularly evident during December to January (Fig. 7a). This corridor is surrounded by relatively highly polluted regions, continuously receiving inputs from various sources. The starting point of this corridor is found in Shaanxi Province, where there are strong urban and industrial sources in the megacity of Xi’an City and the surrounding area. There is an observed weakening of the transport from 2018 to 2022. However, there is also an observed increase in the local source transport from Weinan, a smaller city in Shaanxi found just west of the Shanxi boarder, into Shanxi. This transition is consistent with policies to combat pollution in megacities such as Xi’an, and to encourage more development in smaller urban areas, such as Weinan. There is similarly a strong source within Yuncheng, that transports CO further upwind into Shanxi. The combination collectively contributes to the high emissions in Linfen and Yuncheng, as well as transport into the observed north-south corridor. In eastern Shanxi bordering Hebei Province, cities like Shijiazhuang, Xingtai, and Handan exhibit strong CO sources, partly due to centralized steel enterprises and residential sources. Transport through Taihang Mountain only occurs via small gaps in Yangquan, Changzhi, and Jincheng at relatively low latitudes.

Fig. 7: Daily average CO emission and (u∙VCO) during special time periods from 2018 to 2022.
figure 7

a With CO emission (top) and \({\nabla ({{{\bf{u}}}}\cdot {V}_{{CO}})}\) (bottom) from December to January. b With CO emission (top) and \({\nabla ({{{\bf{u}}}}\cdot {V}_{{CO}})}\) (bottom) in May. Red regions signify net sources due to transport and blue regions indicates the area for pollutant input.

Each year between 2018 and 2021, the emission intensities in May were consistently lower than the averaged emission intensities for the following December and January period, while at the same time the long-range transport channels were reduced (Fig. 7b). Looking at the second and fourth rows of Fig. 7, the highlighted areas of spatially continuous high transport values have overlap with these transport channels, however the strength and spatial extent of this overlap are weaker in May than in December and January. Furthermore, additional and stronger pollution controls in Hebei show reduction in spilling into Shanxi via reduction in the values along the eastern mountains, while the effects on the western side and through the internal channel are either stable or increasing. All of this is consistent with the sources being in Shanxi itself, and possibly to some small extent in areas to the West and/or Southwest, in/or other rapidly developing areas in Shaanxi. The entirety of Yuncheng city and adjacent regions in Shaanxi have turned into sources, the central transport corridor has an insignificant amount of transport, and the neighboring regions in Hebei turn to sinks. Correspondingly, there is little to no transport either from outside or within Shanxi occurring in May.

Discussion

Daily CO emissions are computed using MFIEF based on remotely sensed CO and HCHO, considering first-order atmospheric processing and transport. High CO emissions are observed in densely populated and economically active areas, particularly in the lower Fen River valley, with the highest emissions concentrated around iron and steel factories in parts of Linfen and Yuncheng. Under multi-year air pollution control, CO emissions dropped more in December and January than other months, especially in areas with the highest EICO values. This is attributed to factories increasing production before/after the Chinese New Year and winter residential heating. Differences between EICO, MEIC, and EDGAR suggest decreases in heavily industrialized areas and increases in low-emission zones. Compared with the total quantities reported by MEIC/EDGAR for the same years, EICO has shown a noticeable increase, as around 5 ~ 7 times higher than MEIC and 8 ~ 10 times higher than EDGAR. The total difference between EICO and MEIC may result in additional CO2 production around 7% of Shanxi and 0.4% of China. These results are consistent with Chinese policies over the past decade effectively addressing the low-hanging fruit. Future policies will need to focus more deeply on rural areas, rapidly changing areas, and new industries.

The results regarding the lifetimes of CO loss and production are based on the actual atmospheric conditions in Shanxi, which are characterized by dry weather, high air pollutant emissions, and substantially increasing methane emissions from Shanxi’s high coal production leading to active and variable atmospheric photochemistry, influenced by mountainous terrain leading to active and variable diffusion and sub-grid transport. These conditions uniquely affect atmospheric OH loadings and CO lifetime. While these atmospheric conditions are not unique to Shanxi, they have not frequently been observed or studied in the USA, EU, and Eastern China, and are not widely discussed in the literature. Long-range transport of CO, consistent with the region’s topography and upwind economic development, is observed, highlighting the importance of considering energy consumption patterns and high-resolution topography in future planning.

The average CO/NOx calculated using MFIEF aligns with different land use types, with urban and industrial areas generally showing lower ratios compared to rural and natural areas. Furthermore, Given the value for the CO/NOx over certain industrial areas, specifically iron and steel (high and narrow ratio), power and cement (less high and less narrow ratio), and coking and boilers (not so high, but wide ratio), it is possible to identify and attribute. This finding is closely related to combustion temperature (NOx) and efficiency (CO). By utilizing land use data and CO/NOx, the source types of industries within some grid points could be identified. This opens up an important point that emission inventories need to be more explicit in terms of the driving factors and their respective uncertainties, and how these ultimately lead to changes in the emissions products. Over certain categories of regions in which the data is not sufficient, the noise is large, or that are substantially changing over time, attribution may be highly uncertain. To support future attribution improvement and identification of additional pollution sources, it is recommended to continue to reduce the uncertainty of emissions and integrate additional parameters (like CO2, black carbon34, nitrogen dioxide, nitric oxide8, and sulfur dioxide), requiring improvements in the observational uncertainty, observational type, and modeling.

Future improvements in emissions calculations involve enhancing ground-based and satellite observations, reducing retrieval uncertainties, and expanding the application of MFIEF to other source regions. This includes refining a priori emissions data and expanding the range of species analyzed. Furthermore, expanding into regions with few or no a priori emissions is possible using trained values from Shanxi, possibly adding considerable value in the Global South in areas with similar climatologically conditions and industrial development. The method’s adaptability holds promise for use in air pollution and emergency response efforts to identify and predict extreme pollution events rapidly.

Methods

Satellite and wind data

The TROPOMI spectrometer on board Sentinel-5 Precursor follows a sun-synchronous, low-Earth orbit with an equator overpass around 13:30 LT, and allows for daily measurements globally65,66. This work uses all available CO and HCHO columns which have at least one pass per day over Shanxi during 1st May 2018 to 30th April 2022. Data quality is assured by filtering each pixel with a “qa_value” <0.75, “cloud radiance fraction” >0.5, and scenes that are covered by snow/ice, errors and similar problematic retrievals67,68. Then, each individual TROPOMI CO and HCHO column pixel is resampled using weighted polygons to a common 0.05° × 0.05° grid (http://stcorp.github.io/harp/doc/html/index.html). Annual mean TROPOMI CO columns for 2019 over Shanxi and parts of surrounding provinces are given in Fig. 1a.

The wind data used in this work is from the ERA-5 reanalysis product69, introduced at https://www.ecmwf.int/en/forecasts/dataset/ecmwf-reanalysis-v5. This work specifically used the 6:00 UTC u and v wind products (closest in terms of time to the TROPOMI overpass) at 850 hPa, 800 hPa, 750 hPa, and 900 hPa and 0.25° × 0.25° resolution8. Wind data and satellite column concentration data have been unified by linear interpolation to the same 0.05°×0.05° resolution.

Land use data

The types of land-use data employed in this study encompass industrial, urban, rural, and natural areas (Fig. 1b). The scope of industrial areas is obtained by overlaying the grids covered by CEMS and the key industry (thermal power, steel, coking, and cement) involved in EP (they contain the most factories). CEMS is an observation network providing real-time monitoring of the stack concentration and flow of particulate matter, sulfur dioxide, and NOx from large power plants, iron and steel plants, aluminum smelters, coke plants, coal-fired boilers, and others8. Daily flux observations of large industry sources and their locations are provided by CEMS (http://www.envsc.cn). EP (this dataset is more localized and has greater coverage than CEMS with respect to plant type) which has 269 grids having power plants Shanxi-wide, while CEMS only covers 90 grids, and other industrial and factory types similarly more widely represented, and allows a different dataset to be used as a constraint. The data underlying the land use types come from the Department of Civil Affairs of Shanxi Province and the Shanxi Provincial Platform for Common Geospatial Information Services (http://sxch.com.cn/portal/index.html) in 2021, which was used to respectively categorize those grids which are urban and rural, based on their respective administrative level. The remaining grids are classified as natural.

A priori emissions

There are no existing CEMS or other flux observations of CO to set a baseline dataset in Shanxi. For this reason, the a priori emissions were driven based on our previously computed NOx emissions dataset8, scaled by the CO to NOx emission ratio from MEIC version 1.4 (http://meicmodel.org.cn)22,70. The a priori emissions of 2021 and 2022 are also calculated based on 2020 CO to NOx emission ratios from MEIC. To match with the higher resolution of TROPOMI grids, the MEIC data are mapped uniformly to a 0.05 ° × 0.05 ° grid. The average a priori emission values and probability density functions of the grid-by-grid data over Shanxi of the 4 years are given in Supplementary Fig. 14, demonstrating a decreasing trend from 2018 to 2022, with an increase in the probability of lower emission intensities occurring.

Experimental design

Changes in the stock of column CO are simultaneously impacted by emissions (increases the stock), chemical loss (decreases the stock), chemical production (increases the stock), mixing into cleaner air around the edges (decreases the stock), pressure induced and advective transport (increases or decrease the stock)63. This work uses MFIEF to estimate CO emissions. This approach utilizes first-order approximations for each term, ensuring mass conservation, and integrates remotely sensed observations, as shown in Eq. (1). Detailed information is available in previous works8,71,72.

$${{{{\rm{E}}}}}_{{{{\rm{CO}}}}}=\frac{{{dV}}_{{CO}}} {{dt}}+{{\alpha }_{2}\cdot V}_{{CO}}+{\alpha }_{3}\cdot {\nabla ({{{\bf{u}}}}\cdot {V}_{{CO}}) }{{-\alpha }_{4}\cdot V}_{{HCHO}}$$
(1)

Where the computed daily average CO column emissions (natural and anthropogenic sources) on a grid-by-grid and day-by-day basis are denoted as ECO [µg m-2 d-1]. VCO represents the TROPOMI CO column loading after converting the unit to [µg m-2]. \(\frac{{{dV}}_{{CO}}}{{dt}}\) computes the CO concentration change rate with a unit of [µg m-2 d-1], assuming the day-to-day temporal derivative of CO exists in the TROPOMI data on the respective days. \({\alpha }_{2}{\cdot V}_{{CO}}\) represents CO’s sink, with α2 [d-1] as the inverse CO lifetime. Since CO in clean parts of the atmosphere is many times less concentrated than in polluted areas, there is a local transfer around the edges of these plumes, which are roughly similar to the concentration \({V}_{{CO}}\). This value of α2 is dependent on both the chemical loss \(\frac{1}{{\alpha }_{{chem}}}\) and this mixing loss into the background air \(\frac{1}{{\alpha }_{{atm}}}\), computed following Eq. (2). A balance of both effects is found to occur in the real world, in which big forest fire plumes of sufficient concentration eventually are not identifiable using satellite after only traveling for one to 2 weeks23,24,44,61. \({{\alpha }_{4}\cdot V}_{{HCHO}}\) represents CO production, with α4 [d-1] as the inverse HCHO lifetime. \({\nabla ({{{\bf{u}}}}\cdot {V}_{{CO}})}\) represents the daily zonal and meridional divergence of CO with a unit of [µg m-2 d-1]. The u and v wind data were converted into a vector wind \({{{\bf{u}}}}\) with the unit of m s-1. α3 [m-1] represents the transport distance, which quantifies how far emitted CO will travel spatially before it is removed from the equation due to chemical loss, mixing with the background, or otherwise unrecognizable from the background due to uncertainty in the observations themselves.

$$\frac{1}{{\alpha }_{2}}=\frac{1}{{\alpha }_{{chem}}}+\frac{1}{{\alpha }_{{atm}}}$$
(2)

Statistical analysis

This work employs multiple linear regression to fit α2, α3, and α4 on a month-by-month, grid-by-grid basis using all available measurements and Eq. (1). Values of α2, α3, and α4 are filtered based on statistics (p < 0.1); removal of outliers, defined as being more than three scaled median absolute deviations from the median8; and chemical realism (α2 < 0 or α4 < 0). Bootstrapping is applied to create a new sample representing the parent sample distribution through multiple repetitions, herein generating pseudo α2, α3, and α4 across the central 80% of their probability distributions, filling gaps where no a priori data exists. The resulting mean is presented as the daily emission, and the standard deviation is calculated as the uncertainty of this daily emission.

Sensitivity and uncertainty analysis

Emission uncertainties arise from combined uncertainties in satellite data, a priori emissions, and model development73. The regression uncertainties range are computed to range from 32%–73% (95% confidence interval) via Eq. (1), which is still lower than traditional bottom-up inventories74. In addition to the previously introduced uncertainties in computed emissions being a function of various uncertainties in model parameters and configuration using true observations, this work now performs a set of sensitivity tests, in which the uncertainty in the observations is included. This is done to analyze the robustness of the various computed emissions, consistent with how the most robust approaches of inverse modeling are done such as Kalman Filter48,75 and 3D or 4D variational methods76,77. To ensure that the results are explainable and that the process does not take too much computational power, the tests are done using values of uncertainty that are near the upper bounds of what the community currently considers reasonable78,79 following an approach used by refs. 71,72. This work specifically tests each of four input observations: TROPOMI CO column loadings (uncertainty range of ±30%), TROPOMI HCHO column loadings (uncertainty range of ±40%), a priori emissions of CO (uncertainty range of ±30%), and the vertical level of the reanalysis wind (selected from 750 hPa, 800 hPa, 850 hPa, and 900 hPa). EICO was recalculated over 16 different perturbation cases following adjustments to individual combinations of input observations or assumptions. The newly obtained EICO were then compared against the original EICO. To assess the robustness of MFIEF to the perturbations, differential analysis was conducted between the changes in emissions and the corresponding changes in parameters.