Challenges of Energy Statistics

Vlad Shkolyar

Independent Energy Consultant

(London, Ontario, Canada)


Discussion paper by Vlad Shkolyar[1]

This version: April 20, 2012

1. Issues with energy statistics

A bystander looking at the field of energy statistics from outside may think that budget cuts for statistical agencies, expected across the western world, constitute its only area of concern. Such a belief seems natural, as energy data users are more often overwhelmed by the abundance of numerical information than upset by its paucity. As usual, the reality is more complex than it appears. Energy statistics faces numerous challenges of which energy data users may be unaware. Given the complexity of this story, it is expedient, first, to explain the situation along three dimensions common to statistical space (aggregation, linkages, and time frequency) and, second, to present certain ideas on how energy statistics can be improved.

Usually, data users work with aggregated numbers, such as GDP or energy consumption, that come from underlying micro data and that reporting agencies transform according to established methodologies. Micro data are collected by agencies located upstream in the statistical chain, which deal directly with establishments either because they monitor their compliance with public regulations or because they manage a network of voluntary contributors to a statistical program. These agencies may decide to share their data aggregates with data providers further downstream in the data chain, such as the International Energy Agency (IEA), the Joint Organisations Data Initiative (JODI), or the United Nations Statistics Division (UNSD). At this level of aggregation, reported data lose their link to original sources and become compilations. The role of international agencies is twofold: they provide convenient platforms for data dissemination, and they conduct quality checks on underlying data coming from different sources. Inconsistencies are ubiquitous, and some of them are observable in published sources. For example, the UNSD shows the following importers’ and exporters’ numbers on crude oil trade measured in metric tons for 2010:


Table 1

Crude Oil Trade for 2010

(thousands of metric tons)

Source: Comtrade (UNSD)
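The comparison of importers' and exporters' reports can be sketched in a few lines of code. The sketch below is only an illustration: the country labels and volumes are hypothetical placeholders, not Comtrade figures, and the function name `mirror_gaps` is an assumption introduced here. It expresses the gap between the two reports of the same flow relative to their midpoint.

```python
# Hypothetical mirror-trade records (thousand metric tons); NOT the Comtrade figures.
# Each flow is keyed by (exporter, importer).
reported_by_exporter = {("A", "B"): 1000.0, ("A", "C"): 500.0}
reported_by_importer = {("A", "B"): 900.0, ("A", "C"): 510.0}


def mirror_gaps(exports, imports):
    """Relative gap between the importer's and the exporter's report of each flow."""
    result = {}
    # Only flows reported by both sides can be compared.
    for flow in exports.keys() & imports.keys():
        exp, imp = exports[flow], imports[flow]
        midpoint = (exp + imp) / 2
        result[flow] = (imp - exp) / midpoint
    return result


gaps = mirror_gaps(reported_by_exporter, reported_by_importer)
for flow, gap in sorted(gaps.items()):
    print(flow, f"{gap:+.1%}")
```

A gap close to zero means that both sides describe the same flow; a large gap in either direction marks a pair of reports worth reconciling.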


Linkages among industries that trade with one another (or are linked in commodity chains) are observable through the System of National Accounts (SNA). The role played by the SNA is immense. It is the sole source of information on GDP by industry and the basis for modeling the propagation of shocks across the economy. The input-output (I/O) tables constitute the key SNA component. They map the flow of products along the chain in a single matrix. Filling I/O table cells is a challenging statistical enterprise. It requires detailed information on inter-industry transactions that is often hard to find. Gaps in energy reporting become observable when I/O tables are compared with other sources. For example, the Canadian numbers on consumption of natural gas in dollar terms, taken from the I/O tables, and in physical units, taken from the energy supply and demand report, are inconsistent. If they were consistent, the ratios of values to quantities, which stand for average prices paid by different industries, would be comparable across consumers; see Table 2.


Table 2

Value and quantity of natural gas input for different Canadian industries for 2005.

Sources: Statistics Canada, Table 381-0009 ‘Inputs and outputs, by industry and commodity’ and

Report 57-003-X ‘Energy Supply and Demand’, Table 1-1.
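The value-to-quantity check described above can be illustrated with a minimal sketch. All numbers and industry labels below are hypothetical and are not taken from the Statistics Canada tables; the 50 percent tolerance around the median is likewise an arbitrary assumption chosen for illustration.

```python
from statistics import median

# Hypothetical inputs, NOT Statistics Canada figures:
# value in million dollars, quantity in million cubic metres.
gas_inputs = {
    "industry_1": {"value": 200.0, "quantity": 1000.0},
    "industry_2": {"value": 210.0, "quantity": 1000.0},
    "industry_3": {"value": 900.0, "quantity": 1000.0},
}


def implied_prices(inputs):
    """Average price paid by each industry: value divided by quantity."""
    return {name: row["value"] / row["quantity"] for name, row in inputs.items()}


def flag_outliers(prices, tolerance=0.5):
    """Industries whose implied price deviates from the median by more than
    `tolerance`, expressed as a fraction of the median (assumed threshold)."""
    mid = median(prices.values())
    return sorted(n for n, p in prices.items() if abs(p - mid) / mid > tolerance)


prices = implied_prices(gas_inputs)
suspects = flag_outliers(prices)
print(suspects)
```

In this made-up example only the third industry is flagged, because its implied price is several times the median while the other two are close to each other.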


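Time series expose their own deficiencies, even though this time-honored type of statistical observation seems to be the least error-prone. For example, Figure 1 shows that monthly injection of natural gas into Canadian storage facilities has remained about the same since 1985, but its withdrawal experienced two significant increases, in 1994 and 2001. Since gas cannot be withdrawn from storage without first being injected, a lasting divergence of this kind is more likely to reflect a change in reporting than a physical change.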


Figure 1

Monthly Injection and Withdrawal of Natural Gas in Canada

Source: Statistics Canada, Table 131-0001 ‘Supply and disposition of natural gas monthly’.
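One simple way to test such a series for coherence is to accumulate the reported flows into an implied storage level, which can never be negative. The sketch below uses hypothetical monthly volumes and an assumed opening stock; none of the numbers come from Table 131-0001, and the function name `implied_stock` is introduced here for illustration only.

```python
from itertools import accumulate

# Hypothetical monthly volumes in any consistent unit; NOT the Table 131-0001 series.
injections = [50, 60, 40, 10, 0, 0]
withdrawals = [0, 0, 10, 60, 70, 80]
opening_stock = 30  # assumed stock at the start of the period


def implied_stock(opening, inj, wd):
    """Running storage level implied by the reported injections and withdrawals."""
    net = [i - w for i, w in zip(inj, wd)]
    # accumulate(..., initial=opening) also yields the opening value; drop it.
    return list(accumulate(net, initial=opening))[1:]


stock = implied_stock(opening_stock, injections, withdrawals)
# Index of the first month in which the implied stock turns negative, if any.
first_negative = next((m for m, s in enumerate(stock) if s < 0), None)
print(stock, first_negative)
```

A negative implied stock is physically impossible, so the month in which it first appears is a natural place to start looking for a change in coverage or definitions.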


The examples above show that energy data users should be aware of the problem of coherence when referring to independent sources. The existence of differences does not necessarily indicate erroneous data collection; it may instead reflect inconsistent data aggregation techniques. Statistical agencies may use different points for data measurement, in which case the numbers they report are not directly comparable. For example, Figure 2 illustrates how data may be misinterpreted by observers who compare natural gas consumption at different points along the natural gas commodity chain. The volume of gas changes as the product moves downstream because of the removal of impurities, gas losses, and the consumption of gas as fuel. The relevant question at this point concerns the appropriate platform for organizing energy data in such a way that inconsistencies are flagged and resolved.



Figure 2

Natural gas commodity chain and points of data collection along the chain.


Another challenge common to energy statistics is that new policy issues are not addressed in a timely manner. Statisticians are not usually known for their propensity to respond to new energy challenges. Usually, they wait for external requests for new data collection to arrive, assuming that data users know their data needs better than statisticians do. In theory, this approach makes sense, as users of energy data work closely with the issues that they plan to resolve with the help of data. In reality, the process of new data development is a two-way street. Data users may be as constrained by available data in their choice of issues as statisticians are bound by the list of issues that data users ask them to describe numerically. Consequently, each side blames the other for the lack of policy guidance or the dearth of necessary statistical information. This inertia is difficult to overcome when stakeholders in energy statistics do not come up with methods for identifying new issues and reconciling existing datasets.


2. A conceptual framework of energy statistics

Unfortunately, there seems to be no theory to guide the process of addressing the gaps in energy statistics. Hence, stakeholders in energy statistics have to agree on a set of concepts that provide guidance on what a plan of action should look like. The paper suggests accepting the following principles:

◦ The definition of energy statistical issues comes from policy debates;

◦ Energy statistics follows a modular approach to data integration; and

◦ Optimal energy statistics involves minimal cost per unit of information and clear identification of who bears the costs.

Consider, first, the state of policy debates. A casual look at energy publications reveals three recurring topics. First, it is common for analysts to refer to numbers on GDP, employment, or international energy trade when making a case for or against a particular energy project, such as the construction of a new pipeline or the development of a new petroleum play. In this case, data users require information on the energy commodity chain from upstream to downstream in order to find numbers on direct and indirect impacts. The latter extend along two dimensions. The geographic or industrial dimension embraces the effects that a project has on the jurisdictions or industries that are stakeholders in the project. The second dimension involves comparing the actual situation with a forecasted situation or with a desired outcome.

The next topic relates to the determination of an optimal energy balance that depends, among other things, on the identification of qualitative social preferences. Data users may express concerns over various issues. For example, they may discuss energy security or problems with the trade deficit. These revealed preferences indicate to statisticians which new items they should begin to “digitize”.

The final theme is specific enough to justify its inclusion as a separate category, although it overlaps with the previous two. Data users constantly reassess the list of traditional energy numbers, checking whether new items of social importance have appeared. For example, monitoring carbon dioxide emissions by industrial installations was uncommon a few years ago, but growing concern over global climate change has made it a staple of energy statistics. Debates over the items to be included in energy reporting are controversial, as proponents and opponents of an energy project may have different perceptions of its social costs (e.g., environmental costs).

Establishing data priorities is a necessary but insufficient condition for defining a statistical system. The system has to respond quickly to new policy challenges and to point out to policy makers the omissions in their reasoning. It must be internally consistent and provide support for a single option among opposing policy choices. The System of National Accounts (SNA) is the arrangement that satisfies these conditions. It follows the principle of modularity: each of its datasets constitutes a separate module linked with the rest of the datasets through appropriate statistical methodologies. A grouping of data forms an input-output (I/O) table, which is the key SNA component. I/O tables are built on the accounting principle of double entry, which prevents incoherence among interconnected data from going unnoticed. Datasets enter I/O tables in aggregated form, but if their totals are inconsistent, the tables become imbalanced, indicating discrepancies.
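The balancing property can be made concrete with a minimal sketch of a two-industry table. The industries, dollar amounts, and the function name `imbalances` are hypothetical and serve only to show how double entry exposes a discrepancy: for each industry, the total of its sales (row) must equal the total of its costs (column).

```python
# Hypothetical two-industry table in dollars; all numbers are illustrative.
industries = ["energy", "other"]
# flows[i][j]: sales of industry i to industry j
flows = [[10.0, 40.0],
         [20.0, 30.0]]
final_demand = [50.0, 150.0]  # sales to final users, by selling industry
value_added = [70.0, 125.0]   # wages, profits, and taxes, by purchasing industry


def imbalances(flows, final_demand, value_added):
    """Row total (uses of output) minus column total (costs of output) per industry."""
    n = len(flows)
    result = []
    for k in range(n):
        row_total = sum(flows[k]) + final_demand[k]
        col_total = sum(flows[i][k] for i in range(n)) + value_added[k]
        result.append(row_total - col_total)
    return result


gaps = dict(zip(industries, imbalances(flows, final_demand, value_added)))
print(gaps)
```

Here the energy industry balances, while the other industry shows a gap of 5, which signals that at least one of the datasets feeding its row or column needs to be revisited.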

I/O tables cover only market operations and, to some extent, public activities presentable as quasi-market operations, with their costs serving as proxies for their revenue. Still, the same principle of matrix coverage applies to SNA satellite accounts that represent energy flows in physical units and to any quantifiable socio-environmental effects that energy flows have on producers and consumers. The resulting increase in SNA complexity is beneficial for data quality, as data expressed in physical units are confronted with I/O data expressed in dollar terms. Their ratio is analogous to a market price, and if it is unreasonable, an inconsistency is suspected.

As new policy issues appear, it is possible that existing data will not address them properly. However, being a modular structure, the SNA allows interfaces to be introduced in advance of data collection. Thus, as soon as a new issue appears (e.g., it becomes important to measure the amount of carbon dioxide), statisticians can immediately plug new data into the existing system.

Minimization of data collection costs goes hand in hand with the identification of who pays them. In the first respect, administrative data (e.g., tax data or regulatory data) are the cheapest source of energy information, yet they are often underused because of miscommunication among public agencies. As a result, statistical surveys may duplicate information already collected by regulatory bodies. Apart from duplication, surveys frequently end up with datasets of lower quality because response rates are lower and responses less accurate in statistical reporting. Proper identification of funds for data programs presents another problem. Assuming that statistics is subordinate to social preferences, it follows that statistical reporting can be mandatory only for activities that the law requires to be monitored. Consequently, proponents of a new data program that is not supported by law (carbon dioxide again comes to mind) can rely only on voluntary data reporting.

This paper has been written to draw the attention of energy data users to statistical problems and to invite them to share ideas about how the problems can be resolved, either in the pages of this publication or during the forthcoming 2012 USAEE North American Conference.


[1] The author is a PhD economist working as an independent energy consultant in London, Ontario, Canada; email:
