Abstract
Ranking sectors and countries within global value chains is of paramount importance to estimate risks and forecast growth in large economies. However, this task is often non-trivial due to the lack of complete and accurate information on the flows of money and goods between sectors and countries, which are encoded in input–output (I–O) tables. In this work, we show that an accurate estimation of the role played by sectors and countries in supply chain networks can be achieved without full knowledge of the I–O tables, relying only on local and aggregate information, e.g., the total intermediate demand per sector. Our method, based on a rank-1 approximation to the I–O table, shows consistently good performance in reconstructing rankings (i.e., upstreamness and downstreamness measures for countries and sectors) when tested on empirical data from the World Input–Output Database. Moreover, we connect the accuracy of our approximate framework with the spectral properties of the I–O tables, which ordinarily exhibit relatively large spectral gaps. Our approach provides a fast and analytically tractable framework to rank constituents of a complex economy without the need for matrix inversions or knowledge of finer intersectorial details.
Introduction
The introduction of input–output (I–O) analysis as a fundamental tool to analyze the inter-relationship between economic sectors of a country was pioneered by W. Leontief, who proposed the construction of the first I–O tables for the United States for the years 1919 and 19291,2. An I–O table summarizes how the products (outputs) of a given industry or economic sector are used as input to other industries or sectors within the same, or different, economies (for instance, in the case of import/export exchanges with other countries)3. Understanding the structure and relevance of industrial sectors and countries within the so-called global value chains (GVCs), encompassing the different stages of the production process across different countries, is of central importance4. To achieve this, a number of indicators and measures have been devised that characterize the relative positioning of industries and economic sectors in the economy. These rely on the calculation of the following technical object,
\[ (\mathbbm {1}_N - A)^{-1}, \tag{1} \]
the so-called Leontief inverse (or resolvent) matrix. Here, \(\mathbbm {1}_N\) is the \(N\times N\) identity matrix (where N is the number of industrial sectors) and A is a (row) sub-stochastic matrix, which is related in a simple fashion to the original I–O table. A sub-stochastic matrix A is such that its entries are non-negative and \(\sum _{j}A_{ij}\le 1\) for each row i. Notably, the upstreamness and downstreamness metrics proposed by Antràs, Chor and collaborators (see Sect. 2 for mathematical definitions) have become widely used and mainstream in recent years5,6,7. They are meant to represent the average distance of a sector from final demand, and from primary factors of production, respectively. One of the main practical challenges of I–O analysis lies in the accurate and reliable compilation of inter-sectorial I–O tables from which the matrix A in formula (1) is derived. This issue is particularly acute at the firm level, where often only aggregate information is available8.
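As a minimal illustration of the resolvent described above, the following sketch computes \((\mathbbm {1}_N - A)^{-1}\) for a small sub-stochastic matrix and compares it with the truncated series \(\sum _k A^k\). The 3-sector matrix is hypothetical and chosen only so that all row sums are below 1; it is not taken from any I–O table.

```python
import numpy as np

# Hypothetical 3-sector matrix: non-negative entries, every row sum <= 1.
A = np.array([
    [0.10, 0.30, 0.20],
    [0.25, 0.05, 0.30],
    [0.15, 0.20, 0.10],
])
N = A.shape[0]

# Leontief inverse (resolvent) by direct inversion of (identity - A).
G = np.linalg.inv(np.eye(N) - A)

# The same matrix as a truncated Neumann series I + A + A^2 + ...,
# which converges here because all row sums of A are strictly below 1.
G_series = np.zeros((N, N))
term = np.eye(N)
for _ in range(200):
    G_series += term
    term = term @ A

print(np.allclose(G, G_series))
```

The series form makes explicit why the resolvent accounts for indirect dependencies of every order, which is what the upstreamness and downstreamness metrics build on.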
The main contribution of our paper is to show that up-/downstreamness measures and similar resolvent-like metrics can be approximated with high accuracy even when possessing only aggregate and local information about the inter-sectorial dependencies encoded within the I–O table. In this case, the required information only amounts to the row (or column) sums of the matrix A, representing the total intermediate demand per industry (or the total value of all inputs required by each industry).
More specifically, we propose an approach rooted in complexity science that reconstructs the most likely matrix A derived from I–O tables on the basis of limited/aggregated information and uses this surrogate information to compute the Leontief inverse and related indicators (e.g., upstreamness and downstreamness). These indicators can be derived from the available aggregate information both quickly (the procedure does not require a full matrix inversion) and accurately. Moreover, in this work we connect the accuracy of our approximate framework with the spectral properties of the I–O tables.
Related literature
There is a vast literature concerning I–O models and how inaccuracies and noise in I–O tables may affect the determination of the relative ranking of industrial sectors and countries within the economy. One strand focuses on the accuracy of the empirical I–O matrix denoted by \(A_{emp}\) with respect to the true matrix \(A_{true}\). The main question is about how errors occurring in the compilation of the I–O tables propagate and affect measurements and predictions based on nonlinear functions of \(A_{emp}=A_{true}+H\) (for instance, the Leontief Inverse \((\mathbbm {1}_N-A_{emp})^{-1}\)), where H encodes the stochastic sources of error. Compiling the entries of the matrix \(A_{emp}\) is subject to many issues, for instance the difficulty in sampling and surveying firms and flows of goods with great accuracy9,10. This has provided the motivation to study stochastic models for the I–O analysis.
Evans11 and Quandt12 are among the first to look at this problem by constructing random models. Evans11 assumed that the error matrix H had only one non-zero row and that the errors could be propagated on a row-by-row basis. Quandt12 assumed that the errors \(H_{ij}\) on the matrix elements are independent and normally distributed with mean zero, solved the error propagation problem for a small-size system (e.g. \(2\times 2\)), and determined the confidence intervals on the expected Leontief Inverse. Later, Simonovits13 deduced the fundamental inequality \(\langle (\mathbbm {1}_N-A_{emp})^{-1}\rangle _H\ge (\mathbbm {1}_N-\langle A_{emp}\rangle _H)^{-1}\), where the average is taken with respect to independent matrix elements of H. This inequality circumvents the problem of inverting the matrix \(\mathbbm {1}_N-A_{emp}\), where the non-linearity involved in the Leontief matrix inversion makes it challenging to study how modifications (or inaccurate determinations) of the entries of the matrix \(A_{emp}\) would propagate.
One of the first comprehensive theoretical studies of stochastic I–O models is due to West14. His starting point is a random matrix H, of which the expected value and the standard error of all the elements are known, with the aim to provide approximating formulas for the expected value and the standard errors of the Leontief Inverse in terms of these known quantities. Some of the assumptions (for instance, that the errors \(H_{ij}\) be independent and normally distributed) are however not realistic or plainly incompatible with the sub-stochasticity constraint, and only lead to a closed-form solution for the mean and variances of the deviations from the “true” matrix under very restrictive choices for the variances of the errors in H.
More recently, this approach has been re-evaluated by Kogelschatz15, who assumed that the \(a_{ij}\) are Beta-distributed and derived estimates for the elements of the Leontief Inverse, and by Kozicka16, who postulated more realistic distributions for the matrix entries but provided explicit formulae only for small-size systems.
Within the empirical literature, a number of studies have also been undertaken to characterize the regional inter-sectorial dependence of industries and to discuss the challenges of reconstructing regional data from national accounts and surveys17.
Given the practical difficulties associated with compiling I–O tables, especially at the regional level, earlier scholars devised “shortcut” methods to estimate the Leontief inverse from incomplete or unreliable information, or even foregoing I–O tables altogether. Katz and Burford18,19 derived a formula under the assumption that the matrix A is uniformly drawn from the set of sub-stochastic matrices, and under the rather questionable technical condition that the covariance between the entries of the matrix and the output multipliers be null. Their work hinges on an earlier formula empirically derived by Drake20. The general approach based on finding “shortcuts” and foregoing a painstaking compilation of I–O tables was criticized on both technical and conceptual grounds21,22,23,24 before this line of investigation was dropped and even ignored altogether in the subsequent related literature.
The Leontief inverse and the associated indicators have also been looked at through the prism of complexity and network science. Cerina et al.25 analyzed the properties of the (global and regional) network of industries in different economies reconstructing the monetary goods flows (edges) using the I–O matrix. McNerney et al.26 used average national output multipliers to predict future economic growth and price changes. In27, a model for the propagation and amplification of idiosyncratic shocks along the I–O network is provided. In28, a network analysis of the World I–O Data set is undertaken to analyze the temporal interdependence between countries and industrial sectors.
In recent years the interest in I–O models has grown steadily29, also in view of a rather compelling connection to models of complexity and networks28,30. Moreover, many of these ideas can in principle be extended to more general sector-product spaces, which saw many uses for the study of the connection between complexity measures, productivity and economic growth31,32,33,34 (see however35,36 for mathematical issues surrounding the Economic Complexity Index and resolutions thereof).
Another strand of the literature looks at entropic measures of inter-sectorial complexity. Jacquemin and Berry37 introduce an entropy-based measure of corporate diversification, highlighting its additivity across different levels of product or industry aggregation. This metric is shown to better capture nuanced diversification patterns compared to alternatives like the Herfindahl index, particularly when assessing contributions of diversification within and across industry sectors. Their empirical analysis of 460 large U.S. manufacturing corporations demonstrates that diversification into closely related industries, as well as more distant sectors, correlates positively with corporate growth, emphasizing the utility of entropy measures for understanding diversification’s role in economic dynamics. The study38 explores the dynamics of economic growth through a model of export evolution derived from global trade network data. It links economic complexity to the diversity and specialization of national export baskets by employing stochastic differential equations to simulate resource transfer between exports. The authors introduce a novel complexity measure based on Shannon entropy, integrated with specialization metrics, and demonstrate its alignment with GDP per capita and growth trajectories across 223 countries over 21 years. This framework unveils the interplay of cooperative and competitive forces in trade, offering insights into growth potentials via counterfactual analyses. The subsequent work39 expands upon this by refining economic complexity measures using an iterative, entropy-based methodology. Their approach captures the diversity and ubiquity of exports within a bipartite network of countries and products, employing Shannon entropy to estimate the bare diversity of products and sectors. The study introduces intra- and inter-sectorial decomposition, providing nuanced assessments of economic efficiency and specialization. 
The results highlight the advantages of retaining full trade data granularity and demonstrate the utility of these measures in distinguishing national economic structures and developmental pathways. In the following section, we will focus on the works by Antràs and Chor4, Fally et al.6 and Miller et al.7, where different incarnations of the so-called upstreamness and downstreamness measures have been first proposed. An early example of a direct application of those measures for the analysis of empirical data on global value chains can be found in40, now used in multiple contexts41,42.
Definition of upstreamness and downstreamness
Antràs et al.4 considered a closed economy of N industries. For each industrial sector \(i= 1, \dots , N\) we indicate the value of gross output with \(Y_i\) and the final demand (i.e., the use of the output of an industry as a final good) with \(F_i\). Then the following equality holds in I–O tables:
\[ Y_i = F_i + Z_i = F_i + \sum _{j=1}^N a_{ij} \tag{2} \]
\[ \phantom{Y_i} = F_i + \sum _{j=1}^N d_{ij}Y_j , \tag{3} \]
with \(Z_i = \sum _{j=1}^N d_{ij}Y_j\) corresponding to the output of industry i used as intermediate input to other industries (intermediate demand) as shown in the scheme in Fig. 1. In Eq. (2), \(a_{ij}\) is the total value in monetary units (e.g. US dollars) of i’s output used to produce j’s output, while \(\{d_{ij}\}\) in Eq. (3) corresponds to the monetary amount of sector i’s output used to produce one monetary unit’s worth of sector j’s output, and it is related to the matrix A via the relationship \(d_{ij}Y_j = a_{ij}\). The final demand, as detailed in Sect. 4, comprises contributions from different factors including, among others, the final consumption expenditure by households and government, and exports.
Iterating the identity Eq. (2) within Eq. (3), one obtains an infinite sequence of contributions, each representing the use of sector i’s output at different levels within the value chain3
\[ Y_i = F_i + \sum _{j=1}^N d_{ij}F_j + \sum _{j,k=1}^N d_{ik}d_{kj}F_j + \cdots \tag{4} \]
We can finally rewrite Eq. (4) as follows
\[ \varvec{Y} = [\mathbbm {1}_N-D]^{-1}\varvec{F}, \tag{5} \]
using \(\sum _{k\ge 0} D^k=[\mathbbm {1}_N-D]^{-1}\). In this case, \(\mathbbm {1}_N\) is the \(N\times N\) identity matrix, \(D=(d_{ij})\) contains each sector’s output in dollar values, and \(\varvec{F}\) is the vector of final demands. Antràs et al.4 hence proposed the following measure of upstreamness of the i-th industrial sector
\[ U_{1i} = 1\cdot \frac{F_i}{Y_i} + 2\cdot \frac{\sum _{j} d_{ij}F_j}{Y_i} + 3\cdot \frac{\sum _{j,k} d_{ik}d_{kj}F_j}{Y_i} + \cdots = \frac{\left( [\mathbbm {1}_N-D]^{-2}\varvec{F}\right) _i}{Y_i}, \tag{6} \]
where each term contributing to Eq. (4) is weighted by its distance from final use and divided by the output of the sector \(Y_i\). The notation \((\cdot )_i\) is used to indicate the i-th component of the vector. By construction, the terms of the sum that are further upstream in the value chain carry larger weight in the calculation of the upstreamness. Inserting Eq. (4) in Eq. (6), we can rewrite the upstreamness as
\[ \varvec{U}_1 = [\mathbbm {1}_N-A_U]^{-1}{\varvec{1}}_N, \tag{7} \]
where
\[ A_U = Y^{-1}D\,Y, \quad \text{i.e.}\quad (A_U)_{ij} = \frac{d_{ij}Y_j}{Y_i} = \frac{a_{ij}}{Y_i}, \tag{8} \]
and \(Y =\textrm{diag}(Y_1,\dots ,Y_N)\). The vector \({\varvec{1}}_N\) is a column vector of N ones. The matrix \(A_U\) has non-negative elements, and in this convention it is row-substochastic, i.e., \(\sum _{j}(A_U)_{ij}\le 1 \ \forall i\). By construction \(U_{1i}\ge 1\), and it is precisely equal to 1 if no output of industry i is used as input to other industries, but it is only used to satisfy the final demand.
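The equivalence between the distance-weighted series and the resolvent form can be checked numerically. The sketch below uses a hypothetical 3-sector table of flows \(a_{ij}\) and final demands \(F_i\) (illustrative numbers only) and assumes the row normalization \((A_U)_{ij}=a_{ij}/Y_i\), which is the convention under which \(A_U\) is row-substochastic.

```python
import numpy as np

# Hypothetical flows a_ij (row i sells to column j) and final demand F_i.
Z = np.array([
    [10.0, 20.0,  5.0],
    [15.0,  5.0, 10.0],
    [ 5.0, 10.0, 20.0],
])
F = np.array([65.0, 70.0, 15.0])
Y = Z.sum(axis=1) + F            # gross output = intermediate + final use
N = len(Y)

# Assumed convention: (A_U)_ij = a_ij / Y_i, so row sums are Z_i / Y_i <= 1.
A_U = Z / Y[:, None]
U_resolvent = np.linalg.solve(np.eye(N) - A_U, np.ones(N))

# Distance-weighted series: ([1 - D]^{-2} F)_i / Y_i with d_ij = a_ij / Y_j.
D = Z / Y[None, :]
L = np.linalg.inv(np.eye(N) - D)
U_series = (L @ L @ F) / Y

print(U_resolvent)
print(np.allclose(U_resolvent, U_series))
```

Solving the linear system with `np.linalg.solve` rather than forming the inverse explicitly is a numerical design choice of this sketch, not something prescribed by the text.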
Later, Antràs et al.5 also established an equivalence between their upstreamness measure and a measure—defined in a recursive fashion—of the “distance” of an industry from the final demand proposed independently by Fally et al.6. Fally’s upstreamness \(U_2\) is defined as follows:
\[ U_{2i} = 1 + \sum _{j=1}^N \frac{d_{ij}Y_j}{Y_i}\,U_{2j}. \tag{9} \]
The idea is that \(\varvec{U}_2\) aggregates information on the extent to which a sector in a given country produces goods that are sold directly to final consumers, or that are sold to other sectors that themselves mainly sell to final consumers. Sectors selling a large share of their output to relatively upstream industries should be therefore considered to be more upstream themselves. Using the fact that \(d_{ij}Y_j = a_{ij}\) we obtain
\[ \varvec{U}_2 = [\mathbbm {1}_N-A_U]^{-1}{\varvec{1}}_N, \tag{10} \]
where \(A_U\) is defined in Eq. (8) as presented in5.
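The recursive definition can be read as a fixed-point problem, \(\varvec{U}\leftarrow {\varvec{1}}_N + A_U\varvec{U}\), whose solution coincides with the resolvent expression. The following sketch iterates this map on a hypothetical 3-sector table (the same illustrative numbers could be replaced by any table with positive final demand); the normalization \((A_U)_{ij}=a_{ij}/Y_i\) is an assumption of the sketch.

```python
import numpy as np

# Hypothetical flows and final demand (illustrative values only).
Z = np.array([
    [10.0, 20.0,  5.0],
    [15.0,  5.0, 10.0],
    [ 5.0, 10.0, 20.0],
])
F = np.array([65.0, 70.0, 15.0])
Y = Z.sum(axis=1) + F
N = len(Y)
A_U = Z / Y[:, None]             # assumed convention: a_ij / Y_i

# Fixed-point iteration of the recursive definition U = 1 + A_U U.
U = np.ones(N)
for _ in range(500):
    U = 1.0 + A_U @ U

# Closed-form solution of the same linear system.
U_direct = np.linalg.solve(np.eye(N) - A_U, np.ones(N))
print(np.allclose(U, U_direct))
```

The iteration converges because every row sum of \(A_U\) is strictly below 1 in this example, so the map is a contraction.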
On the input side, there exists an analogous accounting identity stating that sector i’s total input \(Y_i\) is equal to the value of its primary inputs (the so-called value added) \(V_i\) plus its intermediate input purchased from all other sectors, namely
and
Similarly to Antràs et al. (cf. Eq. (6)), Miller and Temurshoev7 introduced the so-called downstreamness, measuring the “average distance between suppliers of primary inputs and sectors as input purchaser along the input demand supply chain” as follows:
As before, using Eq. (12), we obtain
\[ \varvec{D}_1 = [\mathbbm {1}_N-A_D]^{-1}{\varvec{1}}_N, \tag{14} \]
with
The matrix \(A_D\) has non-negative elements, and it is row-substochastic, i.e., \(\sum _{j}(A_D)_{ij}\le 1 \ \forall i\). Finally, as in the upstreamness case, also for the downstreamness, Fally6 introduced an analogous iterative definition of the form
which can be again mapped with simple manipulations onto Eq. (14) using \(Y_i d_{ji}=a_{ji}\).
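A numerical counterpart for the downstreamness can be sketched in the same way. The block below assumes the convention \((A_D)_{ij}=a_{ji}/Y_i\) (inputs purchased by sector i, normalized by its own output), which is consistent with the relation \(Y_i d_{ji}=a_{ji}\) quoted above but is stated here as an assumption; the table itself is hypothetical.

```python
import numpy as np

# Hypothetical flows a_ij and final demand F_i (illustrative values only).
Z = np.array([
    [10.0, 20.0,  5.0],
    [15.0,  5.0, 10.0],
    [ 5.0, 10.0, 20.0],
])
F = np.array([65.0, 70.0, 15.0])
Y = Z.sum(axis=1) + F            # gross output from the output-side identity
V = Y - Z.sum(axis=0)            # value added implied by the input-side identity
N = len(Y)

# Assumed convention: (A_D)_ij = a_ji / Y_i, row sums = total inputs / output.
A_D = Z.T / Y[:, None]
D1 = np.linalg.solve(np.eye(N) - A_D, np.ones(N))

print(V)
print(D1)
```

In this toy example the value added is positive for every sector, which guarantees that the row sums of \(A_D\) stay below 1.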
Rank-1 approximation with local and aggregate information
In this section, we discuss how to derive an approximation for the upstreamness and downstreamness metrics introduced in Sect. 2. Let us consider the resolvent \(G(A)=(\mathbbm {1}_N - A)^{-1}\), where the matrix A stands for \(A_U\) or \(A_D\) as defined in the previous section. Therefore, A has non-negative entries and is sub-stochastic. Recall that the vectors of upstreamness and downstreamness are defined as \({\varvec{U}}_1 = G(A_U){\varvec{1}}_N\) and \({\varvec{D}}_1= G(A_D){\varvec{1}}_N\), respectively (cf. Eqs. (10) and (14)). We now assume that a detailed and accurate knowledge of all the entries of A is not available. The only available aggregate information is given by the 2N constants \(\varvec{r}=(r_1,\ldots ,r_N)\) and \(\varvec{c}=(c_1,\ldots ,c_N)\), namely the sums of the N rows and columns of A. This corresponds to knowing only the total intermediate demand per industry and the total value of all inputs required by each industry, respectively. In the following we analyse the single (row-sum only) and double (row- and column-sum) constraint cases. For the single-constraint case, the knowledge of the row sums of the I–O matrix (total intermediate demand of the associated sector) and of the vector of final demands is sufficient to infer the row sums of the matrix \(A_U\). Similarly, the knowledge of the column sums of the I–O matrix (total inputs of the associated sector) and of the vector of value added is sufficient to infer the row sums of the matrix \(A_D\). For the double-constraint case, the knowledge of the row and column sums of the I–O matrix and of the vector of final demands/values added is not sufficient to infer the row and column sums of either matrix \(A_U\) or \(A_D\); however, this level of knowledge can be approximately achieved by positing that \(Y_i\approx \bar{Y}\), where \(\bar{Y}\) is the average of the \(Y_i\).
In the following, we will assume that the row/column sums (single constraint) or row and column sums (double constraints) of the matrices \(A_U\) and \(A_D\) are known or retrievable from the corresponding row/column sums of the original I-O matrix. This lack of detailed information is actually quite common in supply chain and intrafirm network analysis8, which in turn leads to the need for inference and reconstruction methods to fill the gaps.
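To make the single-constraint statement concrete, the sketch below recovers the row sums of \(A_U\) and \(A_D\) from aggregates alone and checks them against the full matrices. The table is hypothetical, and the normalizations \((A_U)_{ij}=a_{ij}/Y_i\) and \((A_D)_{ij}=a_{ji}/Y_i\) are assumptions of the sketch.

```python
import numpy as np

# Hypothetical flows a_ij and final demand F_i (illustrative values only).
Z = np.array([
    [10.0, 20.0,  5.0],
    [15.0,  5.0, 10.0],
    [ 5.0, 10.0, 20.0],
])
F = np.array([65.0, 70.0, 15.0])

row_tot = Z.sum(axis=1)          # total intermediate demand per sector
col_tot = Z.sum(axis=0)          # total intermediate inputs per sector
Y = row_tot + F                  # gross output
V = Y - col_tot                  # value added

# Row sums inferred from aggregate information only.
r = row_tot / (row_tot + F)              # row sums of A_U
r_tilde = col_tot / (col_tot + V)        # row sums of A_D

# Cross-check against the full matrices (normally unavailable).
A_U = Z / Y[:, None]
A_D = Z.T / Y[:, None]
print(np.allclose(r, A_U.sum(axis=1)), np.allclose(r_tilde, A_D.sum(axis=1)))
```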
A simple rank-1 approximation \({\hat{A}}\) for the matrix A is
\[ {\hat{A}} = \varvec{g}\,\varvec{q}^T, \quad \text{i.e.}\quad {\hat{A}}_{ij} = g_i q_j, \tag{17} \]
where the entries of the column vectors \(\varvec{g} = (g_1,\ldots ,g_N)\) and \(\varvec{q}=(q_1,\ldots ,q_N)\) are determined imposing the constraint that A and \({\hat{A}}\) share the same row and column sums
\[ \sum _{j=1}^N {\hat{A}}_{ij} = g_i\sum _{j=1}^N q_j = r_i, \tag{18} \]
\[ \sum _{i=1}^N {\hat{A}}_{ij} = q_j\sum _{i=1}^N g_i = c_j. \tag{19} \]
This yields eventually the unique matrix
\[ {\hat{A}}_{ij} = \frac{r_i c_j}{mN}, \tag{20} \]
with \(m=\frac{1}{N} \sum _{ij} A_{ij}=\frac{1}{N}\sum _j c_j=\frac{1}{N}\sum _i r_i\). The rank-1 matrix \({\hat{A}}\) in (20) is the so-called Maximum Entropy reconstructed matrix (see e.g.45,46) subject to the row and column constraints in (18) and (19) (see also47,48,49,50,51 for related works).
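A short numerical sketch of this reconstruction: given only the row and column sums of a matrix, the outer product \(r_i c_j/(mN)\) reproduces both sets of sums and has rank 1. The 3-sector matrix is hypothetical and serves only to generate the margins.

```python
import numpy as np

# Hypothetical sub-stochastic matrix, used only to generate the margins.
A = np.array([
    [0.10, 0.30, 0.20],
    [0.25, 0.05, 0.30],
    [0.15, 0.20, 0.10],
])
r = A.sum(axis=1)                # row sums
c = A.sum(axis=0)                # column sums
total = A.sum()                  # equals m * N in the notation of the text

# Rank-1 surrogate built from the margins alone.
A_hat = np.outer(r, c) / total

print(np.allclose(A_hat.sum(axis=1), r), np.allclose(A_hat.sum(axis=0), c))
print(np.linalg.matrix_rank(A_hat))
```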
If the only information we have is about row sums, then the corresponding rank-1 approximation is even simpler
\[ {\hat{A}}_{ij} = \frac{r_i}{N}. \tag{21} \]
Clearly, \({\hat{A}}\) has a single non-zero, real and positive eigenvalue \(\lambda _1=\frac{1}{mN}\sum _j r_j c_j\) (or \(\lambda _1=\frac{1}{N}\sum _j r_j\) in the case of only-row constraints) due to the Perron-Frobenius theorem, and \(N-1\) zero eigenvalues, therefore we may expect that this approximation will work better the larger the spectral gap (or equivalently the smaller the spectral radius in the bulk) of the original matrix A is52,53. The spectral gap is defined as \(\Gamma =\lambda _1-\Xi\), with \(\lambda _1\) real and \(<1\) being the Perron-Frobenius eigenvalue. The spectral radius is \(\Xi =\max \{|\lambda _2|,\ldots ,|\lambda _{N-1}|\}\). The empirical I–O matrices \(A_U,A_D\) typically show a large spectral gap, suggesting that the rank-1 approximation described in this section should be very effective.
As the empirical I–O matrices \(A_U,A_D\) are rather small (\(N=35\)), it is more informative to look at their spectral radius. In Sect. 5, we perform a thorough analysis of the spectra of the I–O matrices at the country level, and we study how the accuracy of our rank-1 formula is related to the spectral radius. We indeed find that there is a clear negative correlation between the two, i.e. the error made using our approximation increases with \(\Xi\). This said, even in the worst cases, the relative errors remain fairly negligible, and the formulae work very well across the entire dataset.
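The spectral quantities discussed here can be computed directly from the eigenvalues. The sketch below uses a hypothetical 3-sector matrix and takes the bulk radius \(\Xi\) as the largest modulus among all eigenvalues other than the Perron-Frobenius one; this index range is an assumption of the sketch.

```python
import numpy as np

# Hypothetical positive sub-stochastic matrix (illustrative values only).
A = np.array([
    [0.10, 0.30, 0.20],
    [0.25, 0.05, 0.30],
    [0.15, 0.20, 0.10],
])

eig = np.linalg.eigvals(A)
order = np.argsort(-np.abs(eig))         # sort by decreasing modulus
lam1 = eig[order[0]].real                # Perron-Frobenius eigenvalue
Xi = np.abs(eig[order[1]])               # largest modulus in the bulk
gap = lam1 - Xi                          # spectral gap

# Leading eigenvalue of the rank-1 surrogate built from the margins.
r, c = A.sum(axis=1), A.sum(axis=0)
A_hat = np.outer(r, c) / A.sum()
lam1_hat = (r @ c) / A.sum()

print(lam1, Xi, gap, lam1_hat)
```

Comparing `lam1` with `lam1_hat` gives a quick indication of how well the rank-1 surrogate captures the dominant mode of the original matrix.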
Employing this rank-1 approximation, we can now evaluate the approximate resolvent
\[ {\hat{G}} = (\mathbbm {1}_N - {\hat{A}})^{-1} = \mathbbm {1}_N + \frac{{\hat{A}}}{1-\lambda _1}, \tag{22} \]
using the Sherman-Morrison formula54 for the inverse of a rank-1 matrix, from which it follows that the upstreamness and downstreamness of the i-th industry are respectively approximated by
\[ U_{1i} \approx 1 + \frac{r_i}{1-\frac{\sum _j r_j c_j}{\sum _j c_j}}, \tag{23} \]
\[ D_{1i} \approx 1 + \frac{{\tilde{r}}_i}{1-\frac{\sum _j {\tilde{r}}_j {\tilde{c}}_j}{\sum _j {\tilde{c}}_j}}, \tag{24} \]
where \(r_i,c_i\) and \({\tilde{r}}_i,{\tilde{c}}_i\) represent respectively the sum of rows and columns of \(A_U\) and \(A_D\). If only the constraints on rows are imposed, the formulae above reduce to
\[ U_{1i} \approx 1 + \frac{r_i}{1-\frac{1}{N}\sum _j r_j}, \tag{25} \]
\[ D_{1i} \approx 1 + \frac{{\tilde{r}}_i}{1-\frac{1}{N}\sum _j {\tilde{r}}_j}. \tag{26} \]
The approximate formulae above show that, within our rank-1 approximation, the upstreamness (downstreamness) of sector i is fully determined by the interplay of (i) local and aggregate information, namely the total intermediate demand per sector (and/or the total value of all inputs required by each sector), and (ii) a suitable average of the total intermediate demand (and/or the total value of all inputs) across all sectors in the economy.
In spite of the seemingly drastic approximation, which neglects a significant amount of finer intersectorial details, we will show that the aggregate information featuring in our rank-1 formulae is sufficient to determine with high accuracy the relative positioning of countries and sectors within the global value chains.
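The inverse-free character of the approximation can be illustrated as follows. Applying the Sherman-Morrison identity to a rank-1 matrix \({\hat{A}}\) with leading eigenvalue \(\lambda _1\) gives \((\mathbbm {1}_N-{\hat{A}})^{-1}=\mathbbm {1}_N+{\hat{A}}/(1-\lambda _1)\), so that the i-th row sum of the resolvent is \(1+r_i/(1-\lambda _1)\). The closed forms used in the sketch are derived here from this identity and may differ in presentation from the formulae of the text; the matrix is hypothetical.

```python
import numpy as np

# Hypothetical sub-stochastic matrix standing in for A_U.
A = np.array([
    [0.10, 0.30, 0.20],
    [0.25, 0.05, 0.30],
    [0.15, 0.20, 0.10],
])
N = A.shape[0]
ones = np.ones(N)
r, c = A.sum(axis=1), A.sum(axis=0)

# Exact upstreamness: requires the full matrix and a linear solve.
U_exact = np.linalg.solve(np.eye(N) - A, ones)

# Double constraint: lambda_1 = sum_j r_j c_j / sum_j c_j.
U_double = 1.0 + r / (1.0 - (r @ c) / c.sum())

# Single constraint: lambda_1 = mean of the row sums.
U_single = 1.0 + r / (1.0 - r.mean())

# Consistency check: closed forms equal the resolvent of the surrogates.
A_hat_double = np.outer(r, c) / c.sum()
A_hat_single = np.outer(r, ones) / N
U_double_check = np.linalg.solve(np.eye(N) - A_hat_double, ones)
U_single_check = np.linalg.solve(np.eye(N) - A_hat_single, ones)

print(np.max(np.abs(U_double - U_exact) / U_exact))
print(np.max(np.abs(U_single - U_exact) / U_exact))
```

The two printed numbers are the maximal relative errors of the double- and single-constraint approximations for this toy matrix; they say nothing about the empirical accuracy on real I–O tables.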
In the next sections, we will then calculate upstreamness and downstreamness measures on I–O tables from the NIOT Dataset (see Sect. 4), comparing the results obtained via our approximation with the full calculation using the original formulae, namely Eq. (10) and (14).
Dataset
The empirical I–O matrices used for the experiments have been constructed using the 2013 release of the National I–O Tables (NIOT) by the World I–O Database (WIOD)44. The NIOT dataset comprises 39 countries, representing a large fraction of the major world economies, over the years 1995–2011. The list of countries and their codes considered in our empirical analysis is presented in Table 1. The structure of the I–O table of each country is schematically shown in Fig. 1. The intermediate demand for each country is reported for \(N=35\) economic sectors in terms of the flow (in millions of US dollars) between sectors. The full list of economic sectors and their codes included in our analysis is summarized in Table 2. The final demand is characterized in terms of (i) final consumption expenditure by households, (ii) final consumption expenditure by non-profit organizations serving households (NPISH), (iii) final consumption expenditure by government, (iv) gross fixed capital formation, (v) changes in inventories and valuables and (vi) exports. Changes in inventories and valuables are sometimes negative in the dataset; these entries were assumed to contribute to imports. The entries \(a_{ij}\) of each row i of the full I–O table are then normalized by the corresponding output \(Y_i\). The normalized intermediate demand sub-matrix is sub-stochastic and represents the matrix \(A_U\). The \(r_i\) used in the model are simply the sums over the rows of the matrix \(A_U\) [or, if the normalization is done by columns, of the matrix \(A_D\); see Eqs. (8) and (15), respectively].
Results
In this section, we compare our approximate formulae for upstreamness and downstreamness with single [Eqs. (25) and (26) respectively] and double constraints [Eqs. (23) and (24) respectively] with the measures obtained via direct inversion of the empirical I–O matrix [Eqs. (10) and (14) respectively].
Given the very weak temporal dependence of the empirical upstreamness and downstreamness measures as shown in Fig. 2 (consistent with previous analyses in40), in the following we will be able to aggregate together the analyses across all years in a robust way.
In Fig. 3 we plot the empirical average over all sectors (cyan squares) of the upstreamness for 39 countries (listed in Table 1) for all years (1995–2011) versus the approximate value with single (top panel) and double constraints (bottom panel), respectively obtained in Eqs. (25) and (23). We see that the empirical data (663 data points—39 countries \(\times\) 17 years) nicely collapse on top of the theoretical benchmark (blue dashed line). In the single constraint case, this implies that the average upstreamness coefficient for a country is determined with high accuracy by the knowledge of a single quantity \(\bar{z} = 1 - \frac{1}{N}\sum _j{r_j}\), corresponding to one minus the average total intermediate demand. We also show the upstreamness values for each sector in each country across the entire period (red full circles) constituting in total \(\sim 23k\) data points—35 sectors \(\times\) 39 countries \(\times\) 17 years. At the sector level, we observe a similar good agreement of the empirical exact upstreamness with the approximate values.
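The role of \(\bar{z}\) can be made explicit with a one-line check. Assuming the single-constraint closed form \(U_{1i}\approx 1+r_i/\bar{z}\) (the row-sum form implied by the flat rank-1 matrix), the sector average of the approximate upstreamness equals \(1/\bar{z}\) exactly. The row sums below are hypothetical.

```python
import numpy as np

# Hypothetical row sums r_i of A_U for a 3-sector economy.
r = np.array([0.35, 0.30, 0.70])

z_bar = 1.0 - r.mean()           # one minus the average row sum
U_approx = 1.0 + r / z_bar       # assumed single-constraint closed form

# Averaging over sectors: 1 + mean(r) / z_bar = 1 / z_bar.
print(U_approx.mean(), 1.0 / z_bar)
```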
There are occasional deviations (including a systematic upward deviation for large values of the empirical downstreamness), whose origin can be traced back to a higher degree of heterogeneity in the A matrix with respect to the “flat” rank-1 model introduced in Eq. (21).
To identify the sectors that are typically less accurately captured by our approximation, we computed a simple indicator, \(\langle |\Delta _U^{\textrm{sect}}|\rangle\). This metric represents the average absolute difference between the empirical and approximated upstreamness values for each sector, aggregated across all years and all countries (see Fig. 4). The mining and agricultural sectors, among others, appear to exhibit greater heterogeneity in their input–output relationships with other sectors, as suggested by the higher difference values. This indicates that the structural differences in these sectors across countries may pose challenges for the accuracy of our approximation. Consequently, our method may perform less effectively for countries with economies that rely heavily on these sectors, as their heterogeneity is less well captured in the A-matrix approximation. In contrast, sectors such as housing, public administration, and education display lower values, suggesting more consistent and predictable input–output relationships, which makes them better suited to our approximation approach.
Empirical upstreamness versus approximated upstreamness. Cyan squares represent the upstreamness per country (39 countries) per year (17 years) averaged over 35 industrial sectors from the WIOD dataset (Release 2013). Red full circles represent the upstreamness for all industry sectors in all countries/all years. Top panel: Empirical upstreamness compared with single-constraint approximation in Eq. (25). Bottom panel: Empirical upstreamness compared with double-constraints approximation in Eq. (23).
We have calculated a similar metric for the upstreamness at the country level (see Fig. 5), \(\langle |\Delta _U^{\textrm{country}}|\rangle\), averaging absolute differences over the period 1995–2011. The countries that deviate most consistently from our approximation are Spain, Korea, Russia and China.
Empirical downstreamness versus approximated downstreamness. Light green squares represent the downstreamness per country (39 countries) per year (17 years) averaged over 35 industrial sectors from the WIOD dataset (Release 2013). Blue full circles represent the downstreamness of all industry sectors in all countries/all years. Top panel: Empirical downstreamness compared with single-constraint approximation in Eq. (26). Bottom panel: Empirical downstreamness compared with double-constraints approximation in Eq. (24).
In the following we will also analyze more closely the relation between the error—discrepancy between the actual values of upstreamness (and downstreamness) calculated via direct inversion and those obtained via our approximate formula—and the spectral properties of the empirical I–O matrix A.
In Fig. 6, we repeat a similar analysis for the downstreamness, comparing the values obtained via direct inversion (Eq. (14)) with the approximate values obtained by imposing the single constraint (row sums) or the double constraint (row and column sums). Also for this measure, we observe a good agreement between exact and approximate values, both at the sector level (blue full circles) and at the aggregate country level (light green squares).
To assess the accuracy of the approximations, we quantify the correlation between the empirical and approximate measures using Pearson and Spearman correlation coefficients as summarised in Table 3. The results show that the double-constraints approximation provides a visible improvement for countries, with correlations nearly perfect in both upstreamness and downstreamness measures. However, for sectors, the improvement is marginal, as the single-constraint approximation already achieves high correlations.
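For readers wishing to reproduce this type of comparison, the sketch below computes Pearson and Spearman coefficients between exact and approximate upstreamness values. It uses a synthetic random sub-stochastic matrix with \(N=35\) (not WIOD data), the assumed single-constraint closed form \(1+r_i/(1-\bar{r})\), and a simple rank-based Spearman coefficient that does not handle ties; all helper names are hypothetical.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation of two 1-D arrays."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))

def ranks(x):
    """Ordinal ranks 0..n-1 (ties are not averaged; assumes distinct values)."""
    order = np.argsort(x)
    out = np.empty(len(x))
    out[order] = np.arange(len(x))
    return out

def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Synthetic sub-stochastic matrix with heterogeneous row sums in (0.2, 0.8).
rng = np.random.default_rng(0)
N = 35
A = rng.random((N, N))
targets = rng.uniform(0.2, 0.8, size=N)
A = A / A.sum(axis=1, keepdims=True) * targets[:, None]

r = A.sum(axis=1)
U_exact = np.linalg.solve(np.eye(N) - A, np.ones(N))
U_approx = 1.0 + r / (1.0 - r.mean())    # assumed single-constraint form

print(pearson(U_exact, U_approx), spearman(U_exact, U_approx))
```

The Spearman coefficient is the more relevant of the two when the goal is to reconstruct rankings rather than numerical values.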
In the following, we analyze more closely the error made in the estimation of the upstreamness/downstreamness coefficients via our approximate formulae and link it to spectral properties of the underlying I–O matrix A. In particular, we define the following metric for assessing the error52
where \(\mathcal {R}_i\) represents either the upstreamness or the downstreamness values computed via direct inversion (\(\mathcal {R}_i^{(\textrm{emp})}\)) and via our approximate formula (\(\mathcal {R}_i^{(\textrm{approx})}\)) respectively. The average \(\langle \cdots \rangle\) is calculated over all sectors of a given country. Concerning the spectral properties, as shown in52,53 the accuracy of the approximation is related to the spectral gap of the matrix A. The matrix A has non-negative entries, therefore it has one real eigenvalue of largest magnitude \(\lambda _1\) (the Perron-Frobenius eigenvalue), and its spectral gap is defined as \(\Gamma =\lambda _1-\max \{|\lambda _2|,\ldots ,|\lambda _{N-1}|\}\). As the empirical I–O matrices are rather small (\(N=35\)) it is more informative to look at the spectral radius. We then introduce the spectral radius excluding the Perron-Frobenius \(\lambda _1\) as
This definition is consistent with the approach used in the case of Gaussian matrices perturbed with a rank-1 matrix that may force an outlier to split off from the circular bulk53,55. In Fig. 7, we display the error \(\sigma\) made on the approximation for all countries in all years as a function of the spectral radius \(\Xi\) of the \(A_U\) matrix characterizing each country in each year. As expected, the error grows with the spectral radius, as the rank-1 approximation becomes less accurate in reproducing the underlying intersectorial interactions. In Fig. 8, we show the same relationship labelling the countries for a single year (2011). In the bottom panel, we show the eigenvalue spectrum of two selected countries—namely China and Mexico—displaying respectively among the maximal and minimal errors in the estimation, to highlight spectral differences in the displacement of eigenvalues in the bulk.
In this analysis, we find a clear negative correlation between the accuracy of the estimation and the spectral radius, i.e., the error made using our approximation increases (equivalently the accuracy of the approximation decreases) with \(\Xi\). In general though, even in the worst cases, the relative errors remain fairly small (\(\sim 5-6\%\)), and the approximation works very well across the entire sample.
Upstreamness under aggregation
In this section, we briefly consider how our approximation performs after the I–O data matrix has been subjected to aggregation (consolidation) of different industrial sectors. The effects of aggregation (i.e., the procedure by which the data are looked at and lumped together at different levels of “granularity”) have been considered in many works (see56 for a comprehensive review). Here we consider the axiomatic formulation of aggregation provided in57, which is summarized below. Furthermore, our treatment will be confined to the upstreamness and to the row-only rank-1 approximation, as generalizations to the other cases are straightforward.
Consider the definition of upstreamness given in Eq. (7),
\[\varvec{U}_1=[\mathbbm {1}_N-A_U]^{-1}\,\varvec{1}_N. \tag{29}\]
To make contact with Ref.57, we rewrite (29) as
\[[{\varvec{U}_1}^T]_N=[{\varvec{1}}_N^T]\,[\mathbbm {1}_N-A_U^T]^{-1} \tag{30}\]
in terms of row vectors \({\varvec{U}_1}^T\) and \({\varvec{1}}_N^T\), and a column-substochastic \(N\times N\) matrix \(A_U^T\). The notation \([\ldots ]_N\) indicates that the vector has length N.
Let us assume that we wish to aggregate the N “micro” industrial sectors or commodities into a set of \(M<N\) “macro” sectors or commodities. Formally, we can define two matrices, S and T, of size \(M\times N\) and \(N\times M\) respectively. The \(\{0,1\}\) matrix S indicates which micro-sectors should be combined together: \(S_{ij}=1\) if micro-sector j is to be included in macro-sector i, and \(S_{ij}=0\) otherwise. Thus, S is a column stochastic matrix with exactly one 1 in every column, and at least one 1 in every row. The matrix T indicates the proportional weights of each micro-sector within its macro-aggregate. The element \(T_{ji}\in [0,1]\) represents the weight \(w_{ji}\) that micro-sector j carries within macro-sector i (it vanishes if micro-sector j does not belong to macro-sector i), and therefore is such that \(\sum _j T_{ji}=1\). It follows that T is also column stochastic.
Forming the aggregate \(M\times M\) matrix \(A_U^\prime =SA_U^T T\) is the most common way in the literature to create a smaller sub-stochastic matrix from the original matrix \(A_U\), one that retains (at a coarser level of detail) some of the information about industrial sectors and commodities provided by \(A_U\). Although other choices of aggregation are possible, it was proven in57 that the aggregator \(A_U^\prime\) is the only one that satisfies three natural axioms of linearity, value added neutrality, and partitioning. In the following, we therefore confine ourselves to this case (the so-called standard aggregator). It follows from the definition of S and T that \(ST=\mathbbm {1}_M\) and that TS is a column stochastic, idempotent matrix of rank M (see57 for a proof).
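The stated properties of S and T are easy to check numerically. The following is a minimal sketch under stated assumptions: the grouping of six micro-sectors into three macro-sectors and the weight vector `w` are hypothetical, and T is assumed to be obtained by normalizing the weights within each macro-sector, i.e., \(T=\textrm{diag}(\varvec{w})S^T[S\,\textrm{diag}(\varvec{w})S^T]^{-1}\). The helper names `build_S` and `build_T` are illustrative.

```python
import numpy as np

def build_S(groups, n_micro):
    """0/1 matrix S (M x N): groups[i] lists the micro-sectors in macro-sector i."""
    S = np.zeros((len(groups), n_micro))
    for i, members in enumerate(groups):
        S[i, members] = 1.0
    return S

def build_T(S, w):
    """Weight matrix T (N x M), assuming weights are normalised per macro-sector."""
    W = np.diag(w)
    return W @ S.T @ np.linalg.inv(S @ W @ S.T)

# Hypothetical example: 6 micro-sectors lumped into 3 macro-sectors.
groups = [[0, 1], [2, 3, 4], [5]]
w = np.array([1.0, 3.0, 2.0, 2.0, 4.0, 5.0])

S = build_S(groups, n_micro=6)
T = build_T(S, w)
P = T @ S  # the N x N matrix TS

print(np.round(T, 3))
print("S T = identity:     ", np.allclose(S @ T, np.eye(3)))
print("T column stochastic:", np.allclose(T.sum(axis=0), 1.0))
print("TS idempotent:      ", np.allclose(P @ P, P))
print("rank of TS:         ", np.linalg.matrix_rank(P))
```

With these weights, the first column of T is \((1/4, 3/4, 0, 0, 0, 0)\): each micro-sector carries a weight proportional to its entry of `w` within its own macro-sector and zero elsewhere.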
Although in principle any non-negative column-stochastic matrix could play the role of T, in practice it makes most sense to define it as
\[T=\textrm{diag}(\varvec{w})\,S^T\,[S\,\textrm{diag}(\varvec{w})\,S^T]^{-1}, \tag{31}\]
where \(\varvec{w}\) is a vector of N non-negative numbers, and \(\textrm{diag}(\varvec{w})\) is the diagonal matrix having the vector entries on the diagonal (in their natural order). According to Charnes and Cooper, “The main justification for this mode of consolidation is that it conforms to the way data would be synthesized ab initio if SAT rather than A were the objective”58. To better understand how standard aggregation works, consider as an example a \(6\times 6\) matrix \(A_U^T\) (whose elements we denote \(\alpha _{ij}\) for simplicity, so \(\alpha _{ij} = a_{ji}/Y_j\)). Let
and \(\varvec{w} = (w_1,w_2,w_3,w_4,w_5,w_6)\). Then
and the aggregator becomes
Now, let us assume that the vector of N upstreamness values in Eq. (30) can be faithfully approximated by our formula in Eq. (25), which can be written as
\[[\hat{\varvec{U}_1}^T]_N={\varvec{1}}_N^T+\frac{\varvec{r}^T}{1-{\bar{r}}_N}, \tag{35}\]
where \(\varvec{r}\) is the (column) vector of row sums of the matrix \(A_U\) (or the column sums of \(A_U^T\), \(r_j = \sum _{i=1}^N \alpha _{ij}\)), and \({\bar{r}}_N\) is their average. Let us further assume that the original data matrix \(A_U\) is not known in its entirety (only its row sums are known), but the sectors/commodities in \(A_U\) have been aggregated using a known pair of matrices S, T—in other words, we are aware of what sectors/commodities have been lumped together (and with which relative weights) and what their aggregate outputs are, but we do not have more detailed information. We ask whether the knowledge of \(\varvec{r}, S\) and T is sufficient to determine \([\hat{\varvec{U}_1}^T]_M\), namely a faithful approximation for the M upstreamness values of the aggregate model. The answer is affirmative.
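As a worked step, the closed form of the N approximate upstreamness values can be recovered from the Sherman–Morrison formula. The derivation below assumes (this section does not restate it) that the row-only rank-1 approximation replaces \(A_U\) by \({\hat{A}}=\varvec{r}\,{\varvec{1}}_N^T/N\), so that \({\hat{A}}^T={\varvec{1}}_N\varvec{r}^T/N\) and \(\varvec{r}^T{\varvec{1}}_N/N={\bar{r}}_N\):

```latex
\begin{aligned}
[\mathbbm{1}_N-\hat{A}^T]^{-1}
  &= \mathbbm{1}_N+\frac{\tfrac{1}{N}\,\varvec{1}_N\varvec{r}^T}{1-\tfrac{1}{N}\,\varvec{r}^T\varvec{1}_N}
   = \mathbbm{1}_N+\frac{\varvec{1}_N\varvec{r}^T}{N\,(1-\bar{r}_N)},\\
[\hat{\varvec{U}_1}^T]_N
  &= \varvec{1}_N^T\,[\mathbbm{1}_N-\hat{A}^T]^{-1}
   = \varvec{1}_N^T+\frac{(\varvec{1}_N^T\varvec{1}_N)\,\varvec{r}^T}{N\,(1-\bar{r}_N)}
   = \varvec{1}_N^T+\frac{\varvec{r}^T}{1-\bar{r}_N}.
\end{aligned}
```

The last equality uses \({\varvec{1}}_N^T{\varvec{1}}_N=N\). Only the row sums \(\varvec{r}\) enter the result, which is why no finer knowledge of \(A_U\) is needed at this level of approximation.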
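First, define
\[[{\varvec{U}_1}^T]_M=[{\varvec{1}}_M^T]\,[\mathbbm {1}_M-SA_U^T T]^{-1}, \tag{36}\]
the vector of M upstreamness values, obtained using the aggregate matrix \(A_U^\prime\) as a source. The Leontief matrix on the r.h.s. of (36) is equal to the aggregate of the Leontief matrix of the so-called companion matrix \({\bar{A}}_U= A_U^T TS\)57, namely
\[[\mathbbm {1}_M-SA_U^T T]^{-1}=S\,[\mathbbm {1}_N-{\bar{A}}_U]^{-1}\,T. \tag{37}\]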
The proof follows by expanding \([\mathbbm {1}_M-SA_U^T T]^{-1}=\mathbbm {1}_M +SA_U^T T+(SA_U^T T)^2+\ldots\), and using \((SA_U^T T)^n=S(A_U^T TS)^nT\) and \(TST=T\).
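The two ingredients of this proof can be checked numerically. The sketch below is illustrative: the matrix standing in for \(A_U^T\) is random and column-substochastic, the grouping and weights are hypothetical, and T is assumed to be built by normalizing the weights within each macro-sector. It verifies \((SA_U^T T)^n=S(A_U^T TS)^nT\) for \(n=3\) and the resulting identity between the aggregate Leontief matrix and the aggregate of the Leontief matrix of the companion matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 6, 3

# Hypothetical grouping of 6 micro-sectors into 3 macro-sectors.
S = np.zeros((M, N))
S[0, [0, 1]] = 1.0
S[1, [2, 3, 4]] = 1.0
S[2, [5]] = 1.0

# Hypothetical positive weights, normalised within each macro-sector.
w = rng.uniform(0.5, 2.0, N)
T = np.diag(w) @ S.T @ np.linalg.inv(S @ np.diag(w) @ S.T)

# Random column-substochastic stand-in for A_U^T (column sums in (0.2, 0.8)).
At = rng.random((N, N))
At = At / At.sum(axis=0, keepdims=True) * rng.uniform(0.2, 0.8, size=(1, N))

aggregate = S @ At @ T   # M x M standard aggregator
companion = At @ T @ S   # N x N companion matrix

# Power identity used in the series expansion (here n = 3).
power_lhs = np.linalg.matrix_power(aggregate, 3)
power_rhs = S @ np.linalg.matrix_power(companion, 3) @ T

# Leontief identity: [1_M - S A^T T]^{-1} = S [1_N - A^T T S]^{-1} T.
leontief_lhs = np.linalg.inv(np.eye(M) - aggregate)
leontief_rhs = S @ np.linalg.inv(np.eye(N) - companion) @ T

print("power identity holds:   ", np.allclose(power_lhs, power_rhs))
print("Leontief identity holds:", np.allclose(leontief_lhs, leontief_rhs))
```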
Imagine now that the true matrix \(A_U^T\) appearing on the l.h.s. of (37) is replaced by its best rank-1 approximation, given by \({\hat{A}}^T\) (see Eq. (21)). Since the rank of the product of two matrices (here \({\hat{A}}^T\) and TS) is less than or equal to the smaller of the ranks of the two factors, and TS has rank M (and of course none of the matrices involved is a null matrix), it is easy to deduce that in this case the companion matrix will also be rank-1. Applying the Sherman–Morrison formula on the r.h.s. of (37), we get
where we used \(S\mathbbm {1}_N T=ST=\mathbbm {1}_M\), and
Eq. (38) shows how to construct a faithful rank-1 approximation for the upstreamness of the aggregate model starting from the knowledge of row sums of the original model, as well as of the matrices T and S implementing the aggregation.
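A numerical sketch of this construction is given below. It rests on assumptions that are not taken from the equations above: the rank-1 approximation is taken to be \({\hat{A}}=\varvec{r}\,{\varvec{1}}_N^T/N\), in which case the Sherman–Morrison formula yields the closed form \({\varvec{1}}_M^T+\varvec{r}^TT/(1-\rho )\) with \(\rho =\varvec{r}^TTS{\varvec{1}}_N/N\) (the symbol \(\rho\) is introduced here for convenience). The data matrix, the grouping and the weights are synthetic and hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 35, 5

# Hypothetical aggregation: 7 consecutive micro-sectors per macro-sector.
labels = np.repeat(np.arange(M), N // M)
S = np.zeros((M, N))
S[labels, np.arange(N)] = 1.0
w = rng.uniform(0.5, 2.0, N)   # hypothetical weights
T = np.diag(w) @ S.T @ np.linalg.inv(S @ np.diag(w) @ S.T)

# Synthetic row-substochastic A_U; only its row sums r are assumed known.
A = rng.random((N, N))
A = A / A.sum(axis=1, keepdims=True) * rng.uniform(0.2, 0.7, size=(N, 1))
r = A.sum(axis=1)

# Closed form from r, S and T only (no matrix inversion involving A).
rho = r @ T @ S @ np.ones(N) / N
U_closed = np.ones(M) + (r @ T) / (1.0 - rho)

# Same quantity by explicit inversion with the assumed rank-1 matrix.
A_hat_T = np.outer(np.ones(N), r) / N
U_rank1 = np.ones(M) @ np.linalg.inv(np.eye(M) - S @ A_hat_T @ T)

# Reference: aggregate upstreamness computed from the full matrix.
U_true = np.ones(M) @ np.linalg.inv(np.eye(M) - S @ A.T @ T)
max_rel_err = np.max(np.abs(U_closed - U_true) / U_true)

print("closed form matches rank-1 inversion:", np.allclose(U_closed, U_rank1))
print("max relative error vs full matrix:", round(float(max_rel_err), 4))
```

The first check is an exact identity, while the second comparison only illustrates the quality of the approximation on one synthetic instance and says nothing about empirical tables.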
Summary and outlook
In this paper, we have shown that the upstreamness and downstreamness measures introduced in the context of I–O analysis at both the inter-sectorial and country level can be faithfully recovered from the knowledge of aggregate and local information about the I–O table. In other words, the precise determination of the elements of the I–O matrix does not matter much, as long as their distribution does not deviate significantly from the “homogeneous” (flat) model (described in Eq. (21)), and the total intermediate demand per sector is ordinarily sufficient to provide an accurate estimate of the sector’s multipliers.
Our rank-1 approximation has been successfully tested on National I–O tables obtained from WIOD, where an excellent correlation is obtained between the empirical multipliers and the theoretical formulae (see Figs. 3 and 6). Small deviations from this remarkably robust regularity are readily attributed to stronger heterogeneity in the empirical sectorial data, which would require refinements to the (single or doubly constrained) rank-1 approximation presented here.
Indeed, sparser or more heterogeneous I–O matrices tend to have a larger spectral radius (or equivalently a smaller spectral gap), as demonstrated in Figs. 7 and 8. The quality of our rank-1 approximation is very high across the sectors and countries considered, but may be inferior for empirical matrices with larger spectral radii, as more eigenvalues besides the largest (Perron-Frobenius) start to play an important role.
In Section 6, we have also shown how our rank-1 approximation is well-behaved with respect to aggregation of sectorial data: knowing what sectors/commodities are lumped together, and what their aggregate outputs are, is sufficient to determine a faithful approximation for the upstreamness values of the aggregate model, as the rank-1 nature of the approximation is preserved upon aggregation.
In a recent paper59, we further employ the rank-1 approximation as a proxy to investigate the “puzzling” correlations observed between upstreamness and downstreamness at aggregate level40. More generally, our approach based on a rank-1 approximation demonstrates that local and aggregate information about I–O tables is ordinarily sufficient to determine the upstreamness and downstreamness at sectorial and country level with high accuracy, while at the same time providing analytically tractable formulae (Eq. (14), (7)) that avoid matrix inversions altogether. The rank-1 formulae also prove useful to approximate centrality values of nodes in complex networks52,60. As an outlook for future research, it will be interesting to test the accuracy of our formulae on firm-level data, where data availability and sparsity are greater concerns. In spite of the sparser nature of the data, we would expect our approximation to work well, as recently shown in experiments conducted on synthetic data52.
Data availability
The datasets analysed during the current study are publicly available at https://www.rug.nl/ggdc/valuechain/wiod/wiod-2013-release?lang=en. The codes written for the analysis will be made available upon request to the corresponding author.
References
Leontief, W. Input–Output Economics (Oxford University Press, Oxford, 1986).
Leontief, W. Quantitative input–output relations in the economic system of the United States. Rev. Econ. Stat. 18, 105–125 (1936).
United Nations Department for Economic and Social Affairs Statistics Division. Handbook of Input–Output Table Compilation and Analysis (1999).
Antràs, P. & Chor, D. Organizing the global value chain. Econometrica 81(6), 2127–2204 (2013).
Antràs, P., Chor, D., Fally, T. & Hillberry, R. Measuring the upstreamness of production and trade flows. Am. Econ. Rev. Pap. Proc. 102(3), 412–416 (2012).
Fally, T. Production Staging: Measurement and Facts (unpublished). Available at: https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=8ff103e6b2573a063bdfcac61ef73550b79467c7 (2012).
Miller, R. E. & Temurshoev, U. Output upstreamness and input downstreamness of industries/countries in world production. Int. Reg. Sci. Rev. 40(5), 443–475 (2017).
Bacilieri, A. & Astudillo-Estevez, P. Reconstructing firm-level input–output networks from partial information. ArXiv preprint arXiv:2304.00081 (2023).
Kop Jansen, P. S. M. Analysis of multipliers in stochastic input–output models. Reg. Sci. Urban Econ. 24, 55–74 (1994).
Kop Jansen, P. & Ten Raa, T. The choice of model in the construction of input–output coefficients matrices. Int. Econ. Rev. 31(1), 213–227 (1990).
Evans, W. D. The effect of structural matrix errors on interindustry relations estimates. Econometrica 22(4), 461–480 (1954).
Quandt, R. E. Probabilistic errors in the Leontief system. Naval Res. Logist. Q. 5, 155–170 (1958).
Simonovits, A. A note on the underestimation and overestimation of the Leontief inverse. Econometrica 43, 493–498 (1975).
West, G. R. A stochastic analysis of an input–output model. Econometrica 54(2), 363–374 (1986).
Kogelschatz, H. On the Solution of Stochastic input–output-Models. Discussion Paper Series n. 447, University of Heidelberg (2007).
Kozicka, M. Novel approach to stochastic input–output modeling. RAIRO-Oper. Res. 53, 1155–1169. https://doi.org/10.1051/ro/2018046 (2019).
Sargento, A. L. Introducing input–output analysis at the regional level: Basic notions and specific issues. The Regional Economics Application Laboratory (REAL) https://api.semanticscholar.org/CorpusID:158457048 (2009).
Katz, J. L. & Burford, R. L. Shortcut formulas for output, income and employment multipliers. Ann. Reg. Sci. 19(2), 61–76 (1985).
Burford, R. L. Regional input–output multipliers without a full IO table. Ann. Reg. Sci. 11(3), 21–38 (1977).
Drake, R. L. A short-cut to estimates of regional input–output multipliers: Methodology and evaluation. Int. Reg. Sci. Rev. 1(2), 1–17 (1976).
Phibbs, P. J. & Holsman, A. J. An evaluation of the Burford Katz short cut technique for deriving input–output multipliers. Ann. Reg. Sci. 15(3), 11–19 (1981).
Jensen, R. C. & Hewings, G. J. D. Shortcut ‘input–output’ multipliers: The resurrection problem (a reply). Environ. Plan A 17(11), 1551–1552 (1985).
Jensen, R. C. & Hewings, G. J. D. Shortcut ‘input–output’ multipliers: A requiem. Environ. Plan A 17(6), 747–759 (1985).
Burford, R. L. & Katz, J. L. Shortcut ‘input–output’ multipliers, alive and well: Response to Jensen and Hewings. Environ. Plan A 17(11), 1541–1549 (1985).
Cerina, F., Zhu, Z., Chessa, A. & Riccaboni, M. World input–output network. PLoS ONE 10(7), e0134025. https://doi.org/10.1371/journal.pone.0134025 (2015).
McNerney, J., Savoie, C., Caravelli, F. & Farmer, J. D. How production networks amplify economic growth. PNAS 119(1), e2106031118 (2021).
Moran, J. & Bouchaud, J.-P. May’s instability in large economies. Phys. Rev. E 100, 032307 (2019).
del Rio-Chanona, R. M., Grujić, J. & Jensen, H. J. Trends of the World input and output network of global trade. PLoS ONE 12(1), e0170817. https://doi.org/10.1371/journal.pone.0170817 (2017).
Carvalho, V. M. Input–output networks: A survey. A report for the European Commission under the CRISIS consortium agreement. https://cordis.europa.eu/docs/projects/cnect/1/288501/080/deliverables/001-CRISISD31InputOutput.pdf (2012).
Acemoglu, D., Carvalho, V., Ozdaglar, A. & Tahbaz-Salehi, A. The network origins of aggregate fluctuations. Econometrica 80, 1977–2016 (2012).
Hidalgo, C., Bailey, K., Barabási, A.-L. & Hausmann, R. The product space conditions the development of nations. Science 317, 482–487 (2007).
Hidalgo, C. & Hausmann, R. The building blocks of economic complexity. PNAS 106, 10570–10575 (2009).
Tacchella, A., Cristelli, M., Caldarelli, G., Gabrielli, A. & Pietronero, L. A new metrics for countries fitness and products complexity. Sci. Rep. 2, 723 (2012).
Caldarelli, G. et al. A network analysis of countries’ export flows: Firm grounds for the building blocks of the economy. PLoS ONE 7(10), e47278. https://doi.org/10.1371/journal.pone.0047278 (2012).
Morrison, G. et al. On economic complexity and the fitness of nations. Sci. Rep. 7(1), 15332 (2017).
Servedio, V. D. P. et al. A new and stable estimation method of country economic fitness and product complexity. Entropy 20(10), 783 (2018).
Jacquemin, A. P. & Berry, C. H. Entropy measure of diversification and corporate growth. J. Ind. Econ. XXVII, 359 (1979).
Teza, G. et al. Growth dynamics and complexity of national economies in the global trade network. Sci. Rep. 8, 15230 (2018).
Teza, G., Caraglio, M. & Stella, A. L. Entropic measure unveils country competitiveness and product specialization in the World trade web. Sci. Rep. 11(1), 10189 (2021).
Antràs, P. & Chor, D. On the measurement of upstreamness and downstreamness in global value chains. In Working Paper 24185. http://www.nber.org/papers/w24185 (2018).
López, L. A., Arce, G. & Osorio, P. Foreign multinationals affiliates and countries’ carbon upstreamness. How could these firms support the fulfilment of emissions reduction targets? J. Environ. Manage. 326, 116714 (2023).
Caraiani, P., Dutescu, A., Hoinaru, R. & Stănilă, G. O. Production network structure and the impact of the monetary policy shocks: Evidence from the OECD. Econ. Lett. 193, 109271 (2020).
Suganuma, K. Upstreamness in the global value chain: Manufacturing and services. Monetary Econ. Stud. 34, 39–66. https://EconPapers.repec.org/RePEc:ime:imemes:v:34:y:2016:p:39-66 (2016).
Timmer, M. P., Dietzenbacher, E., Los, B., Stehrer, R. & de Vries, G. J. An illustrated user guide to the world input–output database: The case of global automotive production. Rev. Int. Econ. 23, 575–605 (2015).
Squartini, T., Caldarelli, G., Cimini, G., Gabrielli, A. & Garlaschelli, D. Reconstruction methods for networks: The case of economic and financial systems. Phys. Rep. 757, 1–47 (2018).
Cimini, G., Squartini, T., Garlaschelli, D. & Gabrielli, A. Systemic risk analysis on reconstructed economic and financial networks. Sci. Rep. 5, 15758 (2015).
Bianconi, G. Mean field solution of the Ising model on a Barabási-Albert network. Phys. Lett. A 303, 166–168 (2002).
Dorogovtsev, S. N., Goltsev, A. V. & Mendes, J. F. F. Critical phenomena in complex networks. Rev. Mod. Phys. 80, 1275 (2008).
Park, J. & Newman, M. E. J. The statistical mechanics of networks. Phys. Rev. E 70, 066117 (2004).
Caldarelli, G., Capocci, A., De Los Rios, P. & Muñoz, M. A. Scale-free networks from varying vertex intrinsic fitness. Phys. Rev. Lett. 89, 258702 (2002).
Thibeault, V., Allard, A. & Desrosiers, P. The low-rank hypothesis of complex systems. Nat. Phys. 20, 294–302. https://doi.org/10.1038/s41567-023-02303-0 (2024).
Bartolucci, S., Caccioli, F., Caravelli, F. & Vivo, P. Ranking influential nodes in networks from aggregate local information. Phys. Rev. Res. 5, 033123. https://doi.org/10.1103/PhysRevResearch.5.033123 (2023).
Bartolucci, S., Caccioli, F., Caravelli, F. & Vivo, P. “Spectrally gapped” random walks on networks: A mean first passage time formula. SciPost Phys. 11(5), 088. https://doi.org/10.21468/SciPostPhys.11.5.088 (2021).
Sherman, J. & Morrison, W. J. Adjustment of an inverse matrix corresponding to a change in one element of a given matrix. Ann. Math. Stat. 21(1), 124–127 (1950).
Mosam, F., Vidaurre, D. & De Giuli, E. Breakdown of random matrix universality in Markov models. Phys. Rev. E 104(2), 024305 (2021).
Kymn, K. O. Aggregation in input–output models: A comprehensive review, 1946–71. Econ. Syst. Res. 2(1), 65–93 (1990).
Howe, E. C. & Johnson, C. R. Linear aggregation of input–output models. SIAM J. Matrix Anal. Appl. 10(1), 65–79 (1989).
Charnes, A. & Cooper, W. W. Management Models and Industrial Applications of Linear Programming Vol. I (Wiley, New York, 1961).
Bartolucci, S., Caccioli, F., Caravelli, F. & Vivo, P. Correlation between upstreamness and downstreamness in random global value chains. arXiv preprint arXiv:2303.06603 (2023).
Bartolucci, S., Caccioli, F., Caravelli, F. & Vivo, P. Distribution of centrality measures on undirected random networks via the cavity method. Proc. Natl. Acad. Sci. 121(40), e2403682121. https://doi.org/10.1073/pnas.2403682121 (2024).
Acknowledgements
We gratefully acknowledge insightful conversations with J. D. Farmer, F. Lafond, L. P. Garcia-Pinto and J. McNerney. The work of F. Caravelli was carried out under the auspices of the NNSA of the U.S. DoE at LANL under Contract No. DE-AC52-06NA25396. F. Caravelli was also financed via DOE LDRD grant 20240245ER. P.V. acknowledges support from UKRI Future Leaders Fellowship Scheme (No. MR/X023028/1).
Author information
Contributions
All authors conceived the idea, performed the numerical tests, wrote and revised the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Bartolucci, S., Caccioli, F., Caravelli, F. et al. Upstreamness and downstreamness in input–output analysis from local and aggregate information. Sci Rep 15, 2727 (2025). https://doi.org/10.1038/s41598-025-86380-6
DOI: https://doi.org/10.1038/s41598-025-86380-6