Cumulative emissions
This example walks through calculating and visualizing cumulative emissions.
[1]:
from itertools import cycle
import matplotlib.pyplot as plt
from matplotlib.ticker import AutoMinorLocator
from openclimate import Client
import numpy as np
import pandas as pd
We will first initialize a Client() object.
[2]:
client = Client()
If you are using a Jupyter environment, you will first need to run client.jupyter. This patches the asyncio library to work in Jupyter environments using nest-asyncio.
[3]:
client.jupyter
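To see why a patch is needed at all, here is a minimal standard-library sketch of what happens when code tries to start a second event loop from inside one that is already running, which is the situation inside a Jupyter kernel. It does not use openclimate or nest-asyncio; `fetch_value` and `outer` are hypothetical names chosen for illustration.

```python
import asyncio


async def fetch_value():
    # Stand-in for an asynchronous request (hypothetical).
    return 42


async def outer():
    # We are now inside a running event loop, as notebook code is.
    coro = fetch_value()
    try:
        # Starting a nested loop is rejected by plain asyncio.
        asyncio.run(coro)
    except RuntimeError as err:
        coro.close()  # the coroutine was never started, so close it explicitly
        return str(err)
    return "no error"


message = asyncio.run(outer())
print(message)
```

Allowing this kind of nested call is what the nest-asyncio patch is for.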
Get country codes
OpenClimate references each country by its two-letter ISO 3166-1 code. To access these codes in openclimate, we can use the .parts() method to get all the “parts” of EARTH. Other codes we use are UN/LOCODEs for cities and LEIs for companies. As a catch-all term, we call each of them an actor_id.
[4]:
df_country = client.parts('EARTH')
Looking at the dataframe that’s returned, we have a column with each country’s actor_id.
[5]:
df_country.head()
[5]:
| | actor_id | name | type | has_data | has_children | children_have_data |
---|---|---|---|---|---|---|
5 | AD | Andorra | country | True | None | None |
234 | AE | United Arab Emirates | country | True | None | None |
0 | AF | Afghanistan | country | True | None | None |
9 | AG | Antigua and Barbuda | country | True | None | None |
7 | AI | Anguilla | country | True | None | None |
Let’s save each actor_id, together with the country name, to a list of tuples.
[6]:
iso_and_name = list(zip(df_country['actor_id'], df_country['name']))
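A list of `(actor_id, name)` pairs is easy to reshape later. The sketch below uses three pairs copied from the table above as toy data; `name_by_code` is a hypothetical helper name, not something the notebook defines.

```python
# Toy (actor_id, name) pairs copied from the table above.
iso_and_name = [
    ("AD", "Andorra"),
    ("AE", "United Arab Emirates"),
    ("AF", "Afghanistan"),
]

# Split the pairs back into two parallel tuples.
iso_codes, names = zip(*iso_and_name)

# Build a code -> name lookup, handy for labelling output later.
name_by_code = dict(iso_and_name)

print(iso_codes)
print(name_by_code["AE"])
```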
Which datasets are available?
To get a list of datasets available for an actor, you can use the .emissions_datasets() method. Here I am asking for datasets with Canadian emissions.
[7]:
client.emissions_datasets('CA')
[7]:
| | actor_id | datasource_id | name | publisher | published | URL |
---|---|---|---|---|---|---|
0 | CA | BP:statistical_review_june2022 | Statistical Review of World Energy all data, 1... | BP | 2022-06-01T00:00:00.000Z | https://www.bp.com/en/global/corporate/energy-... |
1 | CA | EDGARv7.0:ghg | Emissions Database for Global Atmospheric Rese... | JRC | 2022-01-01T00:00:00.000Z | https://edgar.jrc.ec.europa.eu/dataset_ghg70 |
2 | CA | GCB2022:national_fossil_emissions:v1.0 | Data supplement to the Global Carbon Budget 20... | GCP | 2022-11-04T00:00:00.000Z | https://www.icos-cp.eu/science-and-impact/glob... |
3 | CA | PRIMAP:10.5281/zenodo.7179775:v2.4 | PRIMAP-hist_v2.4_no_extrap (scenario=HISTCR) | PRIMAP | 2022-10-17T00:00:00.000Z | https://zenodo.org/record/7179775 |
4 | CA | UNFCCC:GHG_ANNEX1:2019-11-08 | UNFCCC GHG total without LULUCF, ANNEX I count... | UNFCCC | 2019-11-08T00:00:00.000Z | https://di.unfccc.int/time_series |
5 | CA | climateTRACE:country_inventory | climate TRACE: country inventory | climate TRACE | 2022-12-02T00:00:00.000Z | https://climatetrace.org/inventory |
6 | CA | WRI:climate_watch_historical_ghg:2022 | Climate Watch Historical GHG Emissions | WRI | 2022-01-01T00:00:00.000Z | https://www.climatewatchdata.org/ghg-emissions |
7 | CA | IEA:GHG_energy_highlights:2022 | Greenhouse Gas Emissions from Energy Highlights | IEA | 2022-09-01T00:00:00.000Z | https://www.iea.org/data-and-statistics/data-p... |
You can return datasets for multiple actors at once by passing them as an iterable, such as a list or tuple. Here I am asking for Canadian and Italian emission datasets, but only returning a random sample of 5 records.
[8]:
client.emissions_datasets(['CA', 'IT']).sample(5)
[8]:
| | actor_id | datasource_id | name | publisher | published | URL |
---|---|---|---|---|---|---|
7 | CA | IEA:GHG_energy_highlights:2022 | Greenhouse Gas Emissions from Energy Highlights | IEA | 2022-09-01T00:00:00.000Z | https://www.iea.org/data-and-statistics/data-p... |
4 | CA | UNFCCC:GHG_ANNEX1:2019-11-08 | UNFCCC GHG total without LULUCF, ANNEX I count... | UNFCCC | 2019-11-08T00:00:00.000Z | https://di.unfccc.int/time_series |
10 | IT | GCB2022:national_fossil_emissions:v1.0 | Data supplement to the Global Carbon Budget 20... | GCP | 2022-11-04T00:00:00.000Z | https://www.icos-cp.eu/science-and-impact/glob... |
16 | IT | openGHGmap:R2021A | European OpenGHGMap | NTNU | 2021-01-01T00:00:00.000Z | https://openghgmap.net/data/ |
14 | IT | climateTRACE:country_inventory | climate TRACE: country inventory | climate TRACE | 2022-12-02T00:00:00.000Z | https://climatetrace.org/inventory |
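When comparing several actors, it can help to know which datasets they share. The following is a minimal sketch on a toy frame: the rows are a small subset copied from the outputs above and do not reflect the full catalogue for either country, and `per_actor` and `common` are names introduced here for illustration.

```python
import pandas as pd

# Toy subset of rows copied from the outputs above (not the full listing).
df_datasets = pd.DataFrame({
    "actor_id": ["CA", "CA", "CA", "IT", "IT", "IT"],
    "datasource_id": [
        "GCB2022:national_fossil_emissions:v1.0",
        "climateTRACE:country_inventory",
        "IEA:GHG_energy_highlights:2022",
        "GCB2022:national_fossil_emissions:v1.0",
        "climateTRACE:country_inventory",
        "openGHGmap:R2021A",
    ],
})

# Collect each actor's datasource ids into a set, then intersect the sets.
per_actor = df_datasets.groupby("actor_id")["datasource_id"].apply(set)
common = set.intersection(*per_actor)

print(sorted(common))
```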
Get emissions
If we just pass an actor_id to the .emissions() method, all the emissions will be returned.
[9]:
df_tmp = client.emissions(actor_id='US')
df_tmp.head()
[9]:
| | actor_id | year | total_emissions | datasource_id |
---|---|---|---|---|
0 | US | 1990 | 5275397531 | BP:statistical_review_june2022 |
1 | US | 1991 | 5225911642 | BP:statistical_review_june2022 |
2 | US | 1992 | 5308410257 | BP:statistical_review_june2022 |
3 | US | 1993 | 5412149078 | BP:statistical_review_june2022 |
4 | US | 1994 | 5505379237 | BP:statistical_review_june2022 |
Keep in mind that this will return all the data for that actor. Below are the datasets available.
[10]:
set(df_tmp['datasource_id'])
[10]:
{'BP:statistical_review_june2022',
'EDGARv7.0:ghg',
'GCB2022:national_fossil_emissions:v1.0',
'IEA:GHG_energy_highlights:2022',
'PRIMAP:10.5281/zenodo.7179775:v2.4',
'UNFCCC:GHG_ANNEX1:2019-11-08',
'WRI:climate_watch_historical_ghg:2022',
'carbon_monitor:2022_12_14',
'climateTRACE:country_inventory'}
In most cases, we want to filter this and use a particular dataset. We can do that with the datasource_id parameter.
[11]:
df_tmp = client.emissions(actor_id='US', datasource_id='PRIMAP:10.5281/zenodo.7179775:v2.4')
As a sanity check, let’s look at which datasets are returned.
[12]:
set(df_tmp['datasource_id'])
[12]:
{'PRIMAP:10.5281/zenodo.7179775:v2.4'}
As you can see, only PRIMAP was returned.
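If the unfiltered dataframe has already been downloaded, the same selection can be made locally with a boolean mask. This is a sketch on a toy frame: the emission values are made up, and only the column layout mirrors the output above.

```python
import pandas as pd

# Toy frame with made-up values; columns mirror the .emissions() output.
df_all = pd.DataFrame({
    "actor_id": ["US"] * 4,
    "year": [1990, 1991, 1990, 1991],
    "total_emissions": [100, 110, 105, 115],
    "datasource_id": [
        "BP:statistical_review_june2022",
        "BP:statistical_review_june2022",
        "PRIMAP:10.5281/zenodo.7179775:v2.4",
        "PRIMAP:10.5281/zenodo.7179775:v2.4",
    ],
})

# Keep only the rows that come from the PRIMAP dataset.
mask = df_all["datasource_id"] == "PRIMAP:10.5281/zenodo.7179775:v2.4"
df_primap = df_all.loc[mask]

print(set(df_primap["datasource_id"]))
```

Filtering through the datasource_id parameter is usually preferable because less data has to be transferred.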
Get emissions for all countries
Now let’s get emissions for all countries, again restricted to the PRIMAP dataset.
[13]:
%%time
iso_codes = [iso_code[0] for iso_code in iso_and_name]
df_emissions = client.emissions(
    actor_id=iso_codes,
    datasource_id='PRIMAP:10.5281/zenodo.7179775:v2.4'
)
CPU times: user 5.52 s, sys: 289 ms, total: 5.81 s
Wall time: 20.3 s
This takes about 20 seconds of wall time to retrieve all that data, even with asyncio working behind the scenes. This outputs a massive dataframe with the data from all countries concatenated together.
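As a rough picture of what "concatenated together" means, the sketch below stacks two small per-country frames with `pd.concat`. How the client builds its result internally is an assumption here; the values are copied from outputs in this example, and the frame names are hypothetical.

```python
import pandas as pd

# Two toy per-country frames; values copied from outputs in this example.
frame_ad = pd.DataFrame({
    "actor_id": ["AD", "AD"],
    "year": [1750, 1751],
    "total_emissions": [3740, 3750],
})
frame_bg = pd.DataFrame({
    "actor_id": ["BG"],
    "year": [2015],
    "total_emissions": [62400000],
})

# Stacking keeps each frame's own index, so index labels can repeat.
combined = pd.concat([frame_ad, frame_bg])
print(list(combined.index))

# reset_index(drop=True) gives a unique 0..n-1 index if one is needed.
combined_reset = combined.reset_index(drop=True)
print(list(combined_reset.index))
```

Repeated index labels are harmless for the grouped operations used below, but they matter if you select rows by label with `.loc`.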
[14]:
df_emissions.sample(5)
[14]:
| | actor_id | year | total_emissions | datasource_id |
---|---|---|---|---|
492 | BG | 2015 | 62400000 | PRIMAP:10.5281/zenodo.7179775:v2.4 |
215 | GT | 1832 | 549000 | PRIMAP:10.5281/zenodo.7179775:v2.4 |
117 | GN | 1751 | 1050000 | PRIMAP:10.5281/zenodo.7179775:v2.4 |
240 | PW | 1908 | 20300 | PRIMAP:10.5281/zenodo.7179775:v2.4 |
384 | IL | 1907 | 406000 | PRIMAP:10.5281/zenodo.7179775:v2.4 |
Calculate cumulative emissions
Let’s first make sure every country’s record has the same starting year, by checking that the earliest year per actor takes exactly one distinct value.
[15]:
df_emissions.groupby('actor_id')['year'].min().nunique() == 1
[15]:
True
Now we can calculate cumulative emissions. The running sum follows row order, so this assumes each actor’s rows are already sorted by year.
[16]:
df_out = df_emissions.assign(cumulative_emissions = df_emissions.groupby('actor_id')['total_emissions'].cumsum())
Now we have a column for cumulative emissions
[17]:
df_out.head()
[17]:
| | actor_id | year | total_emissions | datasource_id | cumulative_emissions |
---|---|---|---|---|---|
32 | AD | 1750 | 3740 | PRIMAP:10.5281/zenodo.7179775:v2.4 | 3740 |
33 | AD | 1751 | 3750 | PRIMAP:10.5281/zenodo.7179775:v2.4 | 7490 |
34 | AD | 1752 | 3760 | PRIMAP:10.5281/zenodo.7179775:v2.4 | 11250 |
35 | AD | 1753 | 3770 | PRIMAP:10.5281/zenodo.7179775:v2.4 | 15020 |
36 | AD | 1754 | 3780 | PRIMAP:10.5281/zenodo.7179775:v2.4 | 18800 |
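The grouped running sum can be checked on a small example. In the sketch below, the AD rows are copied from the table above and deliberately shuffled, while the `XX` actor and its values are made up. Sorting by actor and year first guards against rows arriving out of chronological order.

```python
import pandas as pd

# AD values copied from the table above (shuffled); the XX rows are made up.
df_toy = pd.DataFrame({
    "actor_id": ["AD", "XX", "AD", "XX", "AD"],
    "year": [1752, 1751, 1750, 1750, 1751],
    "total_emissions": [3760, 20, 3740, 10, 3750],
})

# Put each actor's rows in chronological order before accumulating.
df_toy = df_toy.sort_values(["actor_id", "year"]).reset_index(drop=True)

# cumsum() restarts at zero for every actor_id group.
df_toy["cumulative_emissions"] = (
    df_toy.groupby("actor_id")["total_emissions"].cumsum()
)

print(df_toy)
```

The AD values reproduce the first three rows of the cumulative_emissions column shown above.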
Rank countries by cumulative emissions
Now that we know the cumulative emissions, we can rank the countries by their cumulative emissions in the most recent year.
[18]:
last_year = df_out['year'].max()
df_sorted = (
    df_out.loc[df_out['year'] == last_year, ['actor_id', 'cumulative_emissions', 'year']]
    .sort_values(by='cumulative_emissions', ascending=False)
)
df_sorted['rank'] = df_sorted['cumulative_emissions'].rank(ascending=False)
Here are the top 10 cumulative emitters
[19]:
pd.merge(df_sorted.loc[df_sorted['rank'] <= 10], df_country[['actor_id', 'name']], on='actor_id')
[19]:
| | actor_id | cumulative_emissions | year | rank | name |
---|---|---|---|---|---|
0 | US | 561240060000 | 2021 | 1.0 | United States of America |
1 | CN | 375048000000 | 2021 | 2.0 | China |
2 | RU | 179731600000 | 2021 | 3.0 | Russian Federation |
3 | IN | 132717000000 | 2021 | 4.0 | India |
4 | DE | 117760000000 | 2021 | 5.0 | Germany |
5 | GB | 104375500000 | 2021 | 6.0 | United Kingdom of Great Britain and Northern I... |
6 | JP | 78204570000 | 2021 | 7.0 | Japan |
7 | FR | 64192400000 | 2021 | 8.0 | France |
8 | UA | 52563900000 | 2021 | 9.0 | Ukraine |
9 | BR | 47231630000 | 2021 | 10.0 | Brazil |
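The ranking step can be sketched on three rows copied from the table above. `Series.rank` returns floats because tied values receive the average of their ranks by default, which is why the rank column reads 1.0, 2.0, and so on. `nlargest` is shown as an alternative way to pick the top rows without adding a rank column; the names `df_rank` and `top_two` are hypothetical.

```python
import pandas as pd

# 2021 cumulative totals copied from the table above, in arbitrary order.
df_rank = pd.DataFrame({
    "actor_id": ["RU", "US", "CN"],
    "cumulative_emissions": [179731600000, 561240060000, 375048000000],
})

# Rank 1.0 goes to the largest cumulative total.
df_rank["rank"] = df_rank["cumulative_emissions"].rank(ascending=False)

# Alternative: select the top-n rows directly, largest first.
top_two = df_rank.nlargest(2, "cumulative_emissions")

print(df_rank)
print(top_two)
```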
The United States and China are the top two cumulative emitters, with the U.S. having emitted about 50% more than China over the period from 1750 to 2021.
[20]:
561240060000 / 375048000000
[20]:
1.4964486145773341
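The ratio can be turned into the percentage quoted in the text, and the totals into the gigatonne units used on the plot axis. This is plain arithmetic on the two values from the table above; the variable names are chosen here for illustration.

```python
us_total = 561240060000  # United States cumulative total, from the table above
cn_total = 375048000000  # China cumulative total, from the table above

ratio = us_total / cn_total

# "x% more" is the ratio minus one, expressed as a percentage.
percent_more = (ratio - 1) * 100

# Dividing by 10**9 converts to the giga-scale units used when plotting.
us_giga = us_total / 10**9

print(f"{percent_more:.1f}% more")
print(f"{us_giga:.2f}")
```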
Plot cumulative emissions
Now that we know the top emitters, we can plot a time series of their cumulative emissions.
[21]:
fig = plt.figure(figsize=(6, 6))
ax = fig.add_subplot(111)
# top 8 emitters
top_emitters = list(df_sorted.head(8).actor_id)
# Wong color palette (https://davidmathlogic.com/colorblind/#%23D81B60-%231E88E5-%23FFC107-%23004D40)
colors = ['#000000', '#E69F00', '#56B4E9', '#009E73', '#F0E442', '#0072B2', '#D55E00', '#CC79A7']
for actor_id, color in zip(top_emitters, cycle(colors)):
    actor_name = df_country.loc[df_country['actor_id'] == actor_id, 'name'].values[0]
    filt = df_out['actor_id'] == actor_id
    df_tmp = df_out.loc[filt]
    # divide by 10**9 so the y-axis matches the Gt units in the axis label
    ax.plot(np.array(df_tmp['year']), np.array(df_tmp['cumulative_emissions']) / 10**9,
            linewidth=4,
            label=actor_name,
            color=color)
ylim = [0, 600]
ax.set_ylim(ylim)
ax.set_xlim([1850, 2022])
# Turn off the display of all ticks (booleans; the strings 'off'/'on' are not valid in current matplotlib).
ax.tick_params(which='both',  # apply to both major and minor ticks
               top=False,     # turn off top ticks
               left=False,    # turn off left ticks
               right=False,   # turn off right ticks
               bottom=False)  # turn off bottom ticks
# Keep x tick labels horizontal
plt.setp(ax.get_xticklabels(), rotation=0)
# Hide all four spines
ax.spines['right'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.spines['bottom'].set_visible(False)
# Only show ticks on the left and bottom axes
ax.yaxis.set_ticks_position('left')
ax.xaxis.set_ticks_position('bottom')
# major/minor tick lines
ax.xaxis.set_minor_locator(AutoMinorLocator(5))
ax.grid(axis='y',
        which='major',
        color=[0.8, 0.8, 0.8], linestyle='-')
ax.set_ylabel("Cumulative Emissions (GtCO$_2$e)", fontsize=12)
ax.legend(loc='upper left', frameon=False)
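The plotting loop pairs actors with colors using `zip` and `itertools.cycle`. The short sketch below shows what that pairing does when there are more actors than colors: `cycle` repeats the palette indefinitely, and `zip` stops once the actors run out. The shortened palette and actor list are toy inputs.

```python
from itertools import cycle

colors = ["#000000", "#E69F00", "#56B4E9"]  # first three colors of the palette
actors = ["US", "CN", "RU", "IN", "DE"]     # more actors than colors

# cycle() restarts the palette after its last color; zip() ends with the actors.
pairs = list(zip(actors, cycle(colors)))

for actor_id, color in pairs:
    print(actor_id, color)
```

With eight colors and eight emitters, as in the plot above, no color is reused, but the loop would keep working if the number of emitters were increased.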