
Capacity update process

Florian Scheidl edited this page May 23, 2024 · 12 revisions


Context

To improve the quality of the data published in our app and API, we have started an initiative to detect outliers in the source data.

Over the years, we have noticed that real-time production and exchange data published by the various data sources can fall completely outside the distribution of the historical time series. These data points are outliers and need to be detected before any further data processing.

One way to detect outliers is to check that each incoming data point does not exceed the installed capacity of its production mode. Installed capacity is the maximum power output that the plants of a given mode are rated to deliver. In other words, for a given zone and a given mode, the power production for that mode at any given time cannot exceed the installed capacity for that mode.

Example: In 2023, the wind capacity in DK-DK1 was 5233 MW. The average wind production in the first 3 quarters of 2023 was 1455 MW.
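To put these figures in perspective, the ratio of average production to installed capacity (the capacity factor) over that period works out as:

```latex
\text{capacity factor} = \frac{\bar{P}_{\text{wind}}}{C_{\text{wind}}} = \frac{1455\ \text{MW}}{5233\ \text{MW}} \approx 0.278
```

On average, production stayed well below the installed capacity. Capacity is therefore a loose upper bound that catches only clearly impossible values.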

The goal is to validate each incoming production parser event by comparing the production of each mode against the installed capacity available in the zone configuration. If the production of a mode is higher than its installed capacity, the data point is flagged as an outlier and corrected by our data pipelines.
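The comparison can be sketched as follows. This is a minimal illustration that assumes the production event and the installed capacity are plain dicts keyed by mode, with values in MW. The function name find_outlier_modes is hypothetical and is not part of the contrib codebase.

```python
from typing import Dict, List, Optional


def find_outlier_modes(production: Dict[str, Optional[float]], capacity: Dict[str, float]) -> List[str]:
    """Return the modes whose reported production exceeds their installed capacity.

    Modes with no known capacity, or with a missing (None) production value, are skipped.
    """
    outliers = []
    for mode, value in production.items():
        installed = capacity.get(mode)
        if value is None or installed is None:
            continue
        if value > installed:
            outliers.append(mode)
    return outliers


# Installed wind capacity of DK-DK1 in 2023 (from the example above); the event values are made up.
capacity_dk1 = {"wind": 5233.0}
event = {"wind": 6100.0, "solar": 120.0}
print(find_outlier_modes(event, capacity_dk1))  # ['wind']
```

Skipping modes without a configured capacity means that a missing capacity never causes a false positive. The trade-off is that such modes are never checked.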

To achieve this goal, we need robust and consistent capacity data, and we need to capture how capacity evolves over time. Renewable capacity is increasing in most zones, and production grows with it. If capacity values were not kept up to date, valid production data points could be wrongly flagged as outliers.

Capacity sources


Of all available electricity data, capacity is probably the least consistent (different reporting standards, different update frequencies, varying accessibility). We reviewed the available capacity data for two reasons: to limit the number of data sources used for capacity, and to ensure that the capacity data we use has been reviewed and is of consistent overall quality.

The main organisations that publish capacity data are:

  • EIA: The U.S. Energy Information Administration publishes generator-level information about existing and planned generators with a nameplate capacity of at least 1 megawatt.
  • EMBER: EMBER aggregates data from different sources:
    • IRENA for non-fossil generation,
    • Global Energy Monitor for coal and gas generation,
    • World Resources Institute: although this database is incomplete, it can be used to verify information from the other sources.
  • ENTSO-e: Net generation capacity is published annually on the ENTSO-e Transparency Platform. ENTSO-e is the preferred data source for European zones because its capacity breakdown is more detailed.
  • IRENA: For most countries and technologies, the data reflects the capacity installed and connected at the end of the calendar year. Data has been obtained from a variety of sources, including an IRENA questionnaire, official national statistics, industry association reports, other reports and news articles.

For countries divided into subzones, capacity data is collected directly from the main national data source. Examples are Brazil (ONS), Australia (OPENNEM) and Spain (REE).

Capacity update process


There are two ways of updating capacity configuration files:

  • automatically, if the zone has a capacity parser;
  • manually, if it does not.

When should capacity data be updated?


How often capacity data is updated depends on the source.

EMBER, IRENA and ENTSO-e update their capacity data once a year with data for the previous year. This update usually happens between June and September of the following year (Y+1). The capacity for these zones should therefore be updated once per year.

The EIA updates its capacity dataset monthly, so zones using EIA data can be updated more often, for example every quarter or every six months.

We aim to update the capacity data for all zones once per year, around the third quarter. Zones whose source publishes monthly or quarterly data can be updated more often, but this is not required.
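One way to find zones that missed the yearly update is to compare the datetime of each zone's latest capacity data point with today's date. This is a minimal sketch that assumes you have already collected those dates into a dict. The function name zones_due_for_update and the 365-day threshold are hypothetical choices, not part of the contrib tooling.

```python
from datetime import date, timedelta
from typing import Dict, List


def zones_due_for_update(latest_update: Dict[str, date], today: date, max_age_days: int = 365) -> List[str]:
    """Return the zones whose latest capacity data point is older than max_age_days, sorted."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(zone for zone, last in latest_update.items() if last < cutoff)


# Hypothetical input: the datetime of the most recent capacity data point of each zone.
latest = {"DK-DK1": date(2023, 1, 1), "US-CAL-CISO": date(2024, 3, 1)}
print(zones_due_for_update(latest, today=date(2024, 9, 1)))  # ['DK-DK1']
```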

Format of the capacity configuration


The capacity configuration should include the date from which the value is valid.

For a chosen mode, a data point must include the following fields:

  • value: the installed capacity for the chosen mode, in MW,
  • datetime: the date from which the value is valid; it remains the most up-to-date value until a newer data point is added,
  • source: the data source.

This format enables us to track how capacity evolves across zones over time, for example the growth of renewables or the phase-out of fossil power plants.

Looking at the example of DK-DK1 mentioned above, the capacity configuration format would be the following:

capacity
└── wind
    ├── datetime: "2023-01-01"
    ├── source: "ENTSOE"
    └── value: 5233
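Because each data point is valid "from this date forward", finding the capacity at a given time means taking the latest data point whose datetime is on or before that time. The sketch below assumes that each mode holds a list of data points in the format above. Both the list structure and the names CapacityPoint and capacity_at are assumptions for illustration.

```python
from datetime import datetime
from typing import List, Optional, TypedDict


# One capacity data point in the format described above (hypothetical type name).
class CapacityPoint(TypedDict):
    datetime: str  # date from which the value is valid, e.g. "2023-01-01"
    source: str
    value: float


def capacity_at(points: List[CapacityPoint], at: datetime) -> Optional[float]:
    """Return the value of the most recent data point whose datetime is on or before `at`.

    Returns None if every data point starts after `at`.
    """
    valid = [p for p in points if datetime.fromisoformat(p["datetime"]) <= at]
    if not valid:
        return None
    latest = max(valid, key=lambda p: datetime.fromisoformat(p["datetime"]))
    return latest["value"]


wind_points: List[CapacityPoint] = [
    {"datetime": "2022-01-01", "source": "ENTSOE", "value": 5000.0},  # hypothetical earlier value
    {"datetime": "2023-01-01", "source": "ENTSOE", "value": 5233.0},  # DK-DK1 example above
]
print(capacity_at(wind_points, datetime(2023, 6, 15)))  # 5233.0
print(capacity_at(wind_points, datetime(2021, 6, 15)))  # None
```

Keeping older data points, instead of overwriting them, lets historical production data be checked against the capacity that was valid at the time.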

Opening a PR


Before opening a PR to update capacity data, you should check the following:

  • Do not update all capacities at once! Smaller PRs help us make sure that no error slips through the cracks. We recommend updating a few zones at a time, or one group of zones at a time (EIA, ENTSOE, EMBER, IRENA, etc.).
  • Check that the new data points are consistent with the previous ones. Big breaks in trends are rare for capacity data, so check whether the variation between two data points is realistic. We expect renewable capacity to increase and fossil capacity to decrease in the coming years. Changes in the opposite direction deserve a closer look.
  • Describe the main changes in the PR description. If you spot a major change in values, please mention it and verify it. This will make the reviewer's job easier!
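The consistency checks above can be partly automated before opening the PR. The following is a minimal sketch comparing old and new capacity per mode. The mode groupings, the 50% relative-change threshold and the function name review_capacity_changes are hypothetical choices, so adapt them to the zones you are reviewing.

```python
from typing import Dict, List

# Hypothetical grouping used only for this sketch.
RENEWABLE_MODES = {"wind", "solar", "hydro", "geothermal", "biomass"}
FOSSIL_MODES = {"coal", "gas", "oil"}


def review_capacity_changes(old: Dict[str, float], new: Dict[str, float], max_relative_change: float = 0.5) -> List[str]:
    """List warnings for capacity changes worth mentioning in the PR description.

    Only modes present in both the old and the new data are compared.
    """
    warnings = []
    for mode in sorted(old.keys() & new.keys()):
        before, after = old[mode], new[mode]
        if before > 0 and abs(after - before) / before > max_relative_change:
            warnings.append(f"{mode}: large change {before} -> {after} MW")
        if mode in RENEWABLE_MODES and after < before:
            warnings.append(f"{mode}: renewable capacity decreased {before} -> {after} MW")
        if mode in FOSSIL_MODES and after > before:
            warnings.append(f"{mode}: fossil capacity increased {before} -> {after} MW")
    return warnings


for warning in review_capacity_changes({"wind": 5000.0, "coal": 1000.0}, {"wind": 5233.0, "coal": 1200.0}):
    print(warning)  # coal: fossil capacity increased 1000.0 -> 1200.0 MW
```

A warning does not mean that the value is wrong. It marks a change to verify and to mention in the PR description.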

The zone capacity can be updated automatically


For some zones, we have developed capacity parsers which collect the data automatically.

Capacity configurations can be updated from the contrib repo by running poetry run capacity_update.

The capacity_update script has the following arguments:

  • --zone: a specific zone (e.g. DK-DK1).
  • --source: a group of zones. The capacity update runs for all the zones whose capacity comes from this data source. The groups of zones are: EIA, EMBER, ENTSOE, IRENA, ONS, OPENNEM, REE.
  • --target_datetime: the date for the capacity data (e.g. "2023-01-01").
  • --update_aggregate: a boolean to also update the aggregate zone (for instance, DK should be updated when the capacity of DK-DK1 changes). Defaults to False.

Here is a list of examples:

poetry run capacity_update --zone DK-DK1 --target_datetime "2023-01-01" --update_aggregate True
poetry run capacity_update --source EIA --target_datetime "2023-06-01"

Note: Before running the capacity update script, remember to install the parser dependencies:

poetry install -E parsers
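The --update_aggregate flag is only described briefly above. A natural reading is that the aggregate zone's capacity is the per-mode sum of its subzones' capacities. The sketch below illustrates that reading. It is an assumption, so check the update script for the actual behaviour. The function name aggregate_capacity is hypothetical, and the DK-DK2 values are made up.

```python
from collections import defaultdict
from typing import Dict


def aggregate_capacity(subzones: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Sum the capacity of each mode across subzones (e.g. DK-DK1 + DK-DK2 -> DK)."""
    totals: Dict[str, float] = defaultdict(float)
    for capacities in subzones.values():
        for mode, value in capacities.items():
            totals[mode] += value
    return dict(totals)


# DK-DK1 wind is the value from the example above; the DK-DK2 values are made up.
subzones = {"DK-DK1": {"wind": 5233.0}, "DK-DK2": {"wind": 1000.0, "solar": 500.0}}
print(aggregate_capacity(subzones))  # {'wind': 6233.0, 'solar': 500.0}
```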

The following zones can be updated with a parser:

EIA

  • US-CAL-BANC
  • US-CAL-CISO
  • US-CAL-IID
  • US-CAL-LDWP
  • US-CAL-TIDC
  • US-CAR-CPLE
  • US-CAR-CPLW
  • US-CAR-DUK
  • US-CAR-SC
  • US-CAR-SCEG
  • US-CAR-YAD
  • US-CENT-SPA
  • US-CENT-SWPP
  • US-FLA-FMPP
  • US-FLA-FPC
  • US-FLA-FPL
  • US-FLA-GVL
  • US-FLA-HST
  • US-FLA-JEA
  • US-FLA-SEC
  • US-FLA-TAL
  • US-FLA-TEC
  • US-MIDA-PJM
  • US-MIDW-AECI
  • US-MIDW-LGEE
  • US-MIDW-MISO
  • US-NE-ISNE
  • US-NW-AVA
  • US-NW-BPAT
  • US-NW-CHPD
  • US-NW-DOPD
  • US-NW-GCPD
  • US-NW-GRID
  • US-NW-GWA
  • US-NW-IPCO
  • US-NW-NEVP
  • US-NW-NWMT
  • US-NW-PACE
  • US-NW-PACW
  • US-NW-PGE
  • US-NW-PSCO
  • US-NW-PSEI
  • US-NW-SCL
  • US-NW-TPWR
  • US-NW-WACM
  • US-NW-WAUW
  • US-NW-WWA
  • US-NY-NYIS
  • US-SE-SEPA
  • US-SE-SOCO
  • US-SW-AZPS
  • US-SW-EPE
  • US-SW-GRIF
  • US-SW-PNM
  • US-SW-SRP
  • US-SW-TEPC
  • US-SW-WALC
  • US-TEN-TVA
  • US-TEX-ERCO

EMBER

  • AR
  • AW
  • BA
  • BD
  • BO
  • BY
  • CO
  • CR
  • CY
  • DO
  • GE
  • GT
  • HN
  • KR
  • KW
  • MD
  • MN
  • MT
  • MX
  • NG
  • PA
  • PE
  • RU
  • SG
  • SV
  • TH
  • TR
  • TW
  • UY
  • ZA

ENTSOE

  • AL
  • AT
  • BA
  • BE
  • BG
  • CZ
  • DE
  • DK-DK1
  • DK-DK2
  • EE
  • ES
  • FI
  • FR
  • GR
  • HR
  • HU
  • IE
  • LT
  • LU
  • LV
  • ME
  • MK
  • NL
  • NO-NO1
  • NO-NO2
  • NO-NO3
  • NO-NO4
  • NO-NO5
  • PL
  • PT
  • RO
  • RS
  • SI
  • SK
  • UA
  • XK

IRENA

  • GF
  • IL
  • IS
  • LK
  • NI
  • PF

OPENNEM

  • AU-NSW
  • AU-NT (solar capacity only)
  • AU-QLD
  • AU-SA
  • AU-TAS
  • AU-VIC
  • AU-WA

REE

  • ES
  • ES-CE
  • ES-CN-FVLZ
  • ES-CN-GC
  • ES-CN-HI
  • ES-CN-IG
  • ES-CN-LP
  • ES-CN-TE
  • ES-IB-FO
  • ES-IB-IZ
  • ES-IB-MA
  • ES-IB-ME
  • ES-ML

Note: For the Canary Islands and the Balearic Islands, only the total installed capacity is available.

ONS

  • BR-CS
  • BR-N
  • BR-NE
  • BR-S

Note: The capacity parser for Brazil only collects connected solar capacity. Distributed solar capacity needs to be added manually from the ONS - Installed Capacity Dashboard, following the instructions below.

Distributed solar capacity can be found on the ONS - Installed Capacity Dashboard:

  • In the tab Comparativo, update the start and end dates.
  • Select Geracao Distribuida in the dropdown Tipo de Usina and collect the latest data point for each zone.
  • Add these values to the solar capacity in the zone configuration.
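The last step amounts to adding the dashboard value to the solar capacity returned by the parser. The following is a minimal sketch, assuming a zone's capacity is held as a dict keyed by mode in MW. The function name add_distributed_solar and all numbers are hypothetical.

```python
from typing import Dict


def add_distributed_solar(capacity: Dict[str, float], distributed_solar_mw: float) -> Dict[str, float]:
    """Return a copy of a zone's capacity with distributed solar added to the solar mode."""
    updated = dict(capacity)  # copy so the parser output is left untouched
    updated["solar"] = updated.get("solar", 0.0) + distributed_solar_mw
    return updated


# Hypothetical numbers: parser output for one BR zone plus the value read from the dashboard.
parsed = {"solar": 2000.0, "wind": 3000.0}
print(add_distributed_solar(parsed, 1500.0))  # {'solar': 3500.0, 'wind': 3000.0}
```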

The following other zones can also be updated automatically:

  • CA-ON
  • CA-QC
  • CL-SEN
  • GB
  • MY-WM

The zone capacity is updated manually


For zones without a capacity parser, the capacity data must be collected manually. Each of these zones has a dedicated wiki page with guidelines for extracting the latest capacity information. Once the data is collected, update the capacity configuration using the format described above.

Technical requirements for adding a new data source


If a new data source becomes available for a zone that does not have a capacity parser:

  • Verify the data source. Please refer to our wiki page Verify data sources. The data should come from an authoritative data source, and the criteria are listed on that page.
  • Update this document with the new data source. For maintainability and transparency reasons, the data should be easily accessible so that another contributor can update the capacity breakdown in the future. You can create a new subsection in the section The zone capacity is updated manually.
  • Add the guidelines to collect the data. This should also be done for maintainability and transparency reasons.

If the capacity for the zone in question is collected using a capacity parser:

  • Verify the data source.
  • Compare the new data with the existing data. As explained above, we want to limit the number of data sources used and prefer sources of known quality.
  • Discuss with the Electricity Maps team. If the new data source is indeed of higher quality and meets all the requirements, reach out to the Electricity Maps team and we will find the best way forward together :)

You can open an issue in the contrib repo if you find a new data source or if an existing link is broken.

Building a new capacity parser


If data can be parsed from an online source, you can build a parser to automatically get this data.

Follow these steps to build a capacity parser:

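  • Build the parser. The parser should include a fetch_production_capacity function:
from datetime import datetime
from typing import Any, Dict
from requests import Session
from electricitymap.contrib.lib.types import ZoneKey
def fetch_production_capacity(zone_key: ZoneKey, target_datetime: datetime, session: Session) -> Dict[str, Any]:
    return ...  # placeholder: parse the source and return the capacity dict for zone_key at target_datetime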
  • Update the zone configuration: add the productionCapacity parser to the parsers section, for example:
parsers:
  consumption: FR.fetch_consumption
  production: FR.fetch_production
  productionCapacity: ENTSOE.fetch_production_capacity
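The body of fetch_production_capacity depends entirely on the source. Its last step usually turns raw rows into one data point per mode. The following is a minimal sketch of that step. It assumes the parser has already downloaded (mode, capacity in MW) rows, and that the returned dict maps each mode to a data point with the datetime, value and source fields from the format section. That return shape is an assumption, and the helper name build_capacity_dict is hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from typing import Any, Dict, Iterable, Tuple


def build_capacity_dict(rows: Iterable[Tuple[str, float]], target_datetime: datetime, source: str) -> Dict[str, Any]:
    """Aggregate (mode, capacity_mw) rows into one data point per mode.

    Each data point carries the datetime, value and source fields of the capacity format.
    """
    totals: Dict[str, float] = defaultdict(float)
    for mode, capacity_mw in rows:
        totals[mode] += capacity_mw  # e.g. sum plant-level rows into a zone total
    return {
        mode: {"datetime": target_datetime.strftime("%Y-%m-%d"), "value": value, "source": source}
        for mode, value in totals.items()
    }


rows = [("wind", 3000.0), ("wind", 2233.0), ("solar", 800.0)]  # made-up plant-level rows
print(build_capacity_dict(rows, datetime(2023, 1, 1), "ENTSOE"))
```

Keeping the network request separate from this pure transformation makes the transformation easy to test without calling the source.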