Cumulative emissions

This example walks through calculating and visualizing cumulative emissions.

[1]:
from itertools import cycle
import matplotlib.pyplot as plt
from matplotlib.ticker import AutoMinorLocator
from openclimate import Client
import numpy as np
import pandas as pd

We will first initialize a Client() object.

[2]:
client = Client()

If you are using a Jupyter environment, you first need to run client.jupyter. This patches the asyncio library to work in Jupyter environments using nest-asyncio.

[3]:
client.jupyter
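To see why a patch is needed at all, here is a minimal standard-library sketch. The coroutine names are made up for illustration and this is not openclimate code: it only shows that plain asyncio refuses to start a second event loop while one is already running, which is the situation inside every Jupyter cell.

```python
import asyncio


async def fetch():
    # Hypothetical stand-in for a network request.
    return "data"


async def notebook_cell():
    # Inside this coroutine an event loop is already running,
    # just as it is inside a Jupyter cell.
    coro = fetch()
    try:
        asyncio.run(coro)  # tries to start a second, nested loop
    except RuntimeError as err:
        coro.close()  # avoid a "never awaited" warning
        return str(err)
    return "no error"


message = asyncio.run(notebook_cell())
print(message)
```

The printed message is the RuntimeError text that an unpatched nested call produces.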

Get country codes

OpenClimate references each country by its two-letter ISO 3166-1 alpha-2 code. To get these codes in openclimate, we can use the .parts() method, which returns all the “parts” of EARTH. Other codes we use are UN/LOCODEs for cities and LEIs for companies. As a catch-all term, we call each of these an actor_id.

[4]:
df_country = client.parts('EARTH')

Looking at the dataframe that’s returned, we have a column with each country’s actor_id.

[5]:
df_country.head()
[5]:
     actor_id  name                  type     has_data  has_children  children_have_data
5    AD        Andorra               country  True      None          None
234  AE        United Arab Emirates  country  True      None          None
0    AF        Afghanistan           country  True      None          None
9    AG        Antigua and Barbuda   country  True      None          None
7    AI        Anguilla              country  True      None          None

Let’s save each country’s actor_id and name to a list of tuples.

[6]:
iso_and_name = list(zip(df_country['actor_id'], df_country['name']))
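A list of (actor_id, name) tuples converts easily into other shapes. The sketch below uses three pairs copied from the table above; the variable names `name_by_iso`, `codes` and `names` are my own, not part of openclimate.

```python
# Sample (actor_id, name) pairs copied from the table above.
iso_and_name = [
    ("AD", "Andorra"),
    ("AE", "United Arab Emirates"),
    ("AF", "Afghanistan"),
]

# A dict gives direct lookup from code to name.
name_by_iso = dict(iso_and_name)

# Unpacking with zip(*...) recovers the two columns as separate tuples.
codes, names = zip(*iso_and_name)

print(name_by_iso["AE"])
print(codes)
```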

Which datasets are available?

To get a list of datasets available for an actor you can use the .emissions_datasets() method. Here I am asking for datasets with Canadian emissions.

[7]:
client.emissions_datasets('CA')
[7]:
   actor_id  datasource_id                           name                                               publisher      published                 URL
0  CA        BP:statistical_review_june2022          Statistical Review of World Energy all data, 1...  BP             2022-06-01T00:00:00.000Z  https://www.bp.com/en/global/corporate/energy-...
1  CA        EDGARv7.0:ghg                           Emissions Database for Global Atmospheric Rese...  JRC            2022-01-01T00:00:00.000Z  https://edgar.jrc.ec.europa.eu/dataset_ghg70
2  CA        GCB2022:national_fossil_emissions:v1.0  Data supplement to the Global Carbon Budget 20...  GCP            2022-11-04T00:00:00.000Z  https://www.icos-cp.eu/science-and-impact/glob...
3  CA        PRIMAP:10.5281/zenodo.7179775:v2.4      PRIMAP-hist_v2.4_no_extrap (scenario=HISTCR)       PRIMAP         2022-10-17T00:00:00.000Z  https://zenodo.org/record/7179775
4  CA        UNFCCC:GHG_ANNEX1:2019-11-08            UNFCCC GHG total without LULUCF, ANNEX I count...  UNFCCC         2019-11-08T00:00:00.000Z  https://di.unfccc.int/time_series
5  CA        climateTRACE:country_inventory          climate TRACE: country inventory                   climate TRACE  2022-12-02T00:00:00.000Z  https://climatetrace.org/inventory
6  CA        WRI:climate_watch_historical_ghg:2022   Climate Watch Historical GHG Emissions             WRI            2022-01-01T00:00:00.000Z  https://www.climatewatchdata.org/ghg-emissions
7  CA        IEA:GHG_energy_highlights:2022          Greenhouse Gas Emissions from Energy Highlights    IEA            2022-09-01T00:00:00.000Z  https://www.iea.org/data-and-statistics/data-p...

You can return datasets for multiple actors at once by passing them as an iterable, such as a list or tuple. Here I am asking for Canadian and Italian emission datasets, but only returning a random sample of 5 records.

[8]:
client.emissions_datasets(['CA', 'IT']).sample(5)
[8]:
    actor_id  datasource_id                           name                                               publisher      published                 URL
7   CA        IEA:GHG_energy_highlights:2022          Greenhouse Gas Emissions from Energy Highlights    IEA            2022-09-01T00:00:00.000Z  https://www.iea.org/data-and-statistics/data-p...
4   CA        UNFCCC:GHG_ANNEX1:2019-11-08            UNFCCC GHG total without LULUCF, ANNEX I count...  UNFCCC         2019-11-08T00:00:00.000Z  https://di.unfccc.int/time_series
10  IT        GCB2022:national_fossil_emissions:v1.0  Data supplement to the Global Carbon Budget 20...  GCP            2022-11-04T00:00:00.000Z  https://www.icos-cp.eu/science-and-impact/glob...
16  IT        openGHGmap:R2021A                       European OpenGHGMap                                NTNU           2021-01-01T00:00:00.000Z  https://openghgmap.net/data/
14  IT        climateTRACE:country_inventory          climate TRACE: country inventory                   climate TRACE  2022-12-02T00:00:00.000Z  https://climatetrace.org/inventory

Get emissions

If we just pass an actor_id to the .emissions() method, all the emissions will be returned.

[9]:
df_tmp = client.emissions(actor_id='US')
df_tmp.head()
[9]:
   actor_id  year  total_emissions  datasource_id
0  US        1990  5275397531       BP:statistical_review_june2022
1  US        1991  5225911642       BP:statistical_review_june2022
2  US        1992  5308410257       BP:statistical_review_june2022
3  US        1993  5412149078       BP:statistical_review_june2022
4  US        1994  5505379237       BP:statistical_review_june2022

Keep in mind that this will return all the data for that actor. Below are the datasets available.

[10]:
set(df_tmp['datasource_id'])
[10]:
{'BP:statistical_review_june2022',
 'EDGARv7.0:ghg',
 'GCB2022:national_fossil_emissions:v1.0',
 'IEA:GHG_energy_highlights:2022',
 'PRIMAP:10.5281/zenodo.7179775:v2.4',
 'UNFCCC:GHG_ANNEX1:2019-11-08',
 'WRI:climate_watch_historical_ghg:2022',
 'carbon_monitor:2022_12_14',
 'climateTRACE:country_inventory'}

In most cases, we want to filter this and use a particular dataset. We can do that with the datasource_id parameter.

[11]:
df_tmp = client.emissions(actor_id='US', datasource_id='PRIMAP:10.5281/zenodo.7179775:v2.4')

As a sanity check, let’s look at which datasets are returned.

[12]:
set(df_tmp['datasource_id'])
[12]:
{'PRIMAP:10.5281/zenodo.7179775:v2.4'}

As you see, only PRIMAP was returned.
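If the unfiltered dataframe is already in memory, the same subset can also be taken locally with a boolean mask instead of a second request. This is a minimal pandas sketch on a hand-built dataframe whose rows are copied from outputs shown in this notebook; `df_primap` is a name I made up.

```python
import pandas as pd

# Rows copied from outputs shown in this notebook.
df = pd.DataFrame({
    "actor_id": ["US", "US", "AD", "AD"],
    "year": [1990, 1991, 1750, 1751],
    "total_emissions": [5275397531, 5225911642, 3740, 3750],
    "datasource_id": [
        "BP:statistical_review_june2022",
        "BP:statistical_review_june2022",
        "PRIMAP:10.5281/zenodo.7179775:v2.4",
        "PRIMAP:10.5281/zenodo.7179775:v2.4",
    ],
})

# Keep only the rows that come from the PRIMAP dataset.
mask = df["datasource_id"] == "PRIMAP:10.5281/zenodo.7179775:v2.4"
df_primap = df.loc[mask]

print(set(df_primap["datasource_id"]))
```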

Get emissions for all countries

Now let’s get emissions for all countries.

[13]:
%%time
iso_codes = [iso_code[0] for iso_code in iso_and_name]
df_emissions = client.emissions(
    actor_id=iso_codes,
    datasource_id='PRIMAP:10.5281/zenodo.7179775:v2.4'
)
CPU times: user 5.52 s, sys: 289 ms, total: 5.81 s
Wall time: 20.3 s

Retrieving all that data takes about 20 seconds of wall time, even with asyncio working behind the scenes. The result is a large dataframe with the data from all countries concatenated together.

[14]:
df_emissions.sample(5)
[14]:
     actor_id  year  total_emissions  datasource_id
492  BG        2015  62400000         PRIMAP:10.5281/zenodo.7179775:v2.4
215  GT        1832  549000           PRIMAP:10.5281/zenodo.7179775:v2.4
117  GN        1751  1050000          PRIMAP:10.5281/zenodo.7179775:v2.4
240  PW        1908  20300            PRIMAP:10.5281/zenodo.7179775:v2.4
384  IL        1907  406000           PRIMAP:10.5281/zenodo.7179775:v2.4
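On the asyncio remark above: the general idea of issuing many requests concurrently can be sketched with the standard library alone. The function names and the simulated delay below are hypothetical, and I am not claiming this is how openclimate is implemented internally.

```python
import asyncio
import time


async def fetch_emissions(actor_id):
    # Hypothetical stand-in for one HTTP request; sleeps instead of calling an API.
    await asyncio.sleep(0.1)
    return actor_id, []


async def fetch_all(actor_ids):
    # gather() runs the coroutines concurrently and returns results in input order.
    return await asyncio.gather(*(fetch_emissions(a) for a in actor_ids))


start = time.perf_counter()
results = asyncio.run(fetch_all(["AD", "AE", "AF", "AG", "AI"]))
elapsed = time.perf_counter() - start

print([actor_id for actor_id, _ in results])
print(f"{elapsed:.2f} s")
```

Because the five simulated requests wait at the same time, the elapsed time should be close to one delay rather than five.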

Calculate cumulative emissions

Let’s first make sure every country’s record has the same starting year.

[15]:
df_emissions.groupby('actor_id')['year'].min().nunique() == 1
[15]:
True
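A caveat when checking start years: all() applied to a list of years only tests that each value is truthy, so it passes for any non-zero years. Comparing the per-actor minima is what tests equality. A small sketch on toy data (actors A and B are made up) shows the difference:

```python
import pandas as pd

# Toy data: actor "B" starts a year later than actor "A".
df = pd.DataFrame({
    "actor_id": ["A", "A", "B", "B"],
    "year": [1750, 1751, 1751, 1752],
})

# Earliest year on record for each actor.
first_year = df.groupby("actor_id")["year"].min()

# all() only tests truthiness, so any non-zero years pass.
truthy_check = all(first_year)

# Counting distinct start years tests actual equality.
same_start = first_year.nunique() == 1

print(truthy_check, same_start)
```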

Now we can calculate cumulative emissions: a running sum of total_emissions within each actor_id.

[16]:
df_out = df_emissions.assign(cumulative_emissions = df_emissions.groupby('actor_id')['total_emissions'].cumsum())

Now we have a column for cumulative emissions.

[17]:
df_out.head()
[17]:
    actor_id  year  total_emissions  datasource_id                       cumulative_emissions
32  AD        1750  3740             PRIMAP:10.5281/zenodo.7179775:v2.4  3740
33  AD        1751  3750             PRIMAP:10.5281/zenodo.7179775:v2.4  7490
34  AD        1752  3760             PRIMAP:10.5281/zenodo.7179775:v2.4  11250
35  AD        1753  3770             PRIMAP:10.5281/zenodo.7179775:v2.4  15020
36  AD        1754  3780             PRIMAP:10.5281/zenodo.7179775:v2.4  18800
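A grouped cumsum() accumulates in row order, so it assumes the rows for each actor are already sorted by year. The sketch below uses toy values (not API output) that are deliberately out of order, and sorts them first as a precaution:

```python
import pandas as pd

# Toy data, deliberately out of year order.
df = pd.DataFrame({
    "actor_id": ["A", "B", "A", "B", "A"],
    "year": [1752, 1750, 1750, 1751, 1751],
    "total_emissions": [30, 5, 10, 5, 20],
})

# cumsum() follows row order, so sort by year within each actor first.
df = df.sort_values(["actor_id", "year"])
df["cumulative_emissions"] = df.groupby("actor_id")["total_emissions"].cumsum()

print(df)
```

Each actor's running total restarts at its own first year, which is why the start years need to agree before actors are compared.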

Rank countries by cumulative emissions

Now that we know the cumulative emissions, we can rank the countries by their cumulative emissions in the most recent year.

[18]:
last_year = df_out['year'].max()
df_sorted = (
    df_out.loc[df_out['year'] == last_year, ['actor_id', 'cumulative_emissions', 'year']]
    .sort_values(by='cumulative_emissions', ascending=False)
)

df_sorted['rank'] = df_sorted['cumulative_emissions'].rank(ascending=False)
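Series.rank returns floats because its default method, "average", gives tied values the mean of the positions they occupy. A short sketch on made-up values shows the default next to method="min":

```python
import pandas as pd

# Made-up values with a tie between "A" and "C".
s = pd.Series([300, 100, 300, 200], index=["A", "B", "C", "D"])

# Default method="average": tied values share the mean of their positions.
avg_rank = s.rank(ascending=False)

# method="min" gives tied values the lowest rank of the group instead.
min_rank = s.rank(ascending=False, method="min")

print(avg_rank.tolist())
print(min_rank.tolist())
```

With no ties, as in the country ranking here, both methods give the same whole-number ranks.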

Here are the top 10 cumulative emitters.

[19]:
pd.merge(df_sorted.loc[df_sorted['rank'] <= 10], df_country[['actor_id', 'name']], on='actor_id')
[19]:
   actor_id  cumulative_emissions  year  rank  name
0  US        561240060000          2021  1.0   United States of America
1  CN        375048000000          2021  2.0   China
2  RU        179731600000          2021  3.0   Russian Federation
3  IN        132717000000          2021  4.0   India
4  DE        117760000000          2021  5.0   Germany
5  GB        104375500000          2021  6.0   United Kingdom of Great Britain and Northern I...
6  JP        78204570000           2021  7.0   Japan
7  FR        64192400000           2021  8.0   France
8  UA        52563900000           2021  9.0   Ukraine
9  BR        47231630000           2021  10.0  Brazil

The United States and China are the top two emitters, with the U.S. having emitted about 50% more than China over the period from 1750 to 2021.

[20]:
561240060000 / 375048000000
[20]:
1.4964486145773341
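To turn that ratio into the "percent more" figure quoted above, subtract one and multiply by 100. A small worked example using the two values from the table:

```python
us = 561240060000  # cumulative emissions for US, from the table
cn = 375048000000  # cumulative emissions for CN, from the table

ratio = us / cn
percent_more = (ratio - 1) * 100

print(f"{percent_more:.1f}% more")
```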

Plot cumulative emissions

Now that we know the top emitters, we can plot their cumulative emissions as a time series.

[21]:
fig = plt.figure(figsize=(6, 6))
ax = fig.add_subplot(111)

# top 8 emitters
top_emitters = list(df_sorted.head(8).actor_id)

# wong color palette (https://davidmathlogic.com/colorblind/#%23D81B60-%231E88E5-%23FFC107-%23004D40)
colors = ['#000000', '#E69F00', '#56B4E9', '#009E73', '#F0E442', '#0072B2', '#D55E00', '#CC79A7']

for actor_id, color in zip(top_emitters, cycle(colors)):
    actor_name = df_country.loc[df_country['actor_id'] == actor_id, 'name'].values[0]
    filt = df_out['actor_id'] == actor_id
    df_tmp = df_out.loc[filt]

    ax.plot(np.array(df_tmp['year']), np.array(df_tmp['cumulative_emissions']) / 10**9,
            linewidth=4,
            label = actor_name,
            color=color)

# Axis formatting only needs to run once, after all lines are drawn
ylim = [0, 600]
ax.set_ylim(ylim)
ax.set_xlim([1850, 2022])

# Turn off the display of all ticks.
ax.tick_params(which='both',     # Options for both major and minor ticks
               top=False,        # turn off top ticks
               left=False,       # turn off left ticks
               right=False,      # turn off right ticks
               bottom=False)     # turn off bottom ticks

# Keep x tick labels horizontal
plt.setp(ax.get_xticklabels(), rotation=0)

# Hide all four spines
ax.spines['right'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.spines['bottom'].set_visible(False)

# Place ticks on the left and bottom axes only
ax.yaxis.set_ticks_position('left')
ax.xaxis.set_ticks_position('bottom')

# Minor ticks on x, major grid lines on y
ax.xaxis.set_minor_locator(AutoMinorLocator(5))
ax.grid(axis='y',
        which='major',
        color=[0.8, 0.8, 0.8], linestyle='-')

ax.set_ylabel("Cumulative Emissions (GtCO$_2$e)", fontsize=12)
ax.legend(loc='upper left', frameon=False)
[Figure: cumulative emissions (GtCO$_2$e) of the top 8 emitters, 1850 to 2022]
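A note on the zip(top_emitters, cycle(colors)) idiom in the plotting loop: cycle() repeats the palette indefinitely and zip() stops at the shorter input, so every actor gets a color even if there are more actors than colors. A minimal sketch with a shortened two-color palette, chosen only for illustration:

```python
from itertools import cycle

actors = ["US", "CN", "RU", "IN", "DE"]
colors = ["#000000", "#E69F00"]  # shortened palette for illustration

# cycle() repeats the palette forever; zip() stops at the shorter input.
pairs = list(zip(actors, cycle(colors)))

print(pairs)
```

With the eight-color palette and eight emitters used above, no color is reused.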