Rainfall anomalies are deviations of rainfall from long-run averages. They are useful for identifying wet and dry periods which can be linked to climatically influenced patterns such as flooding, river flows, and agricultural production.
In this real world example we will calculate rainfall anomalies for a selected African country using the CHIRPS monthly rainfall dataset. Standardised anomaly is calculated as:
\begin{equation} \text{Standardised anomaly }=\frac{x-m}{s} \end{equation}x is the seasonal mean, m is the long-term mean, and s is the long-term standard deviation.
This means we need a long-term reference period (m) and a period of interest (x) for which we'll calculate the anomalies. This notebook names datasets ds_rf_m
and ds_rf_x
accordingly.
The notebook outlines:
To run this analysis, run all the cells in the notebook, starting with the "Load packages" cell.
%matplotlib inline
import datacube
import numpy as np
import pandas as pd
import geopandas as gpd
import xarray as xr
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from datacube.utils.geometry import Geometry, CRS
from deafrica_tools.spatial import xr_rasterize
from deafrica_tools.dask import create_local_dask_cluster
Dask can be used to better manage memory use and conduct the analysis in parallel. For an introduction to using Dask with Digital Earth Africa, see the Dask notebook.
Note: We recommend opening the Dask processing window to view the different computations that are being executed; to do this, see the Dask dashboard in DE Africa section of the Dask notebook.
To use Dask, set up the local computing cluster using the cell below.
create_local_dask_cluster()
Client
|
Cluster
|
The following cell sets important parameters for the analysis:
country
: In this analysis, we'll select an African country to mask the dataset and analysis.time_m
: CHIRPS monthly rainfall is available from 1981. The long-term mean for rainfall anomalies is often calculated on a 30-year period, so we'll use 1981 to 2011 in this example.time_x
: This is the period for which we want to calculate anomalies.resolution
: We'll use 5,000 m, which is approximately equal to the default resolution shown above.dask_chunks
: the size of the dask chunks, dask breaks data into manageable chunks that can be easily stored in memory, e.g. dict(x=1000,y=1000)Standardised anomaly is calculated as:
\begin{equation} \text{Standardised anomaly }=\frac{x-m}{s} \end{equation}$x$ is the seasonal mean, $m$ is the long-term mean, and $s$ is the long-term standard deviation.
This means we need a long-term reference period (m) and a period of interest (x) for which we'll calculate the anomalies. This notebook names datasets ds_rf_m
and ds_rf_x
accordingly.
If running the notebook for the first time, keep the default settings below. This will demonstrate how the analysis works and provide meaningful results.
# Select a country, for the example we will use Kenya, a complete list of countries is available below.
country = "Kenya"
# Set the range of dates for the climatology, this will be the reference period (m) for the anomaly calculation.
# Standard practice is to use a 30 year period, so we've used 1981 to 2011 in this example.
time_m = ('1981', '2011')
# time period for monthly anomaly (x)
time_x = ('1981', '2021')
# CHIRPS has a spatial resolution of ~5x5 km
resolution = (-5000, 5000)
#size of dask chunks
dask_chunks = dict(x=500,y=500)
Connect to the datacube so we can access DE Africa data.
The app
parameter is a unique name for the analysis which is based on the notebook file name.
dc = datacube.Datacube(app='rainfall_anomaly')
This shapefile contains polygons for the boundaries of African countries and will allows us to calculate rainfall anomalies within a chosen country
african_countries = gpd.read_file('../Supplementary_data/Rainfall_anomaly_CHIRPS/african_countries.geojson')
african_countries.explore()
You can change the country in the analysis parameters cell to any African country. A complete list of countries is printed below.
print(np.unique(african_countries.COUNTRY))
['Algeria' 'Angola' 'Benin' 'Botswana' 'Burkina Faso' 'Burundi' 'Cameroon' 'Cape Verde' 'Central African Republic' 'Chad' 'Comoros' 'Congo-Brazzaville' 'Cote d`Ivoire' 'Democratic Republic of Congo' 'Djibouti' 'Egypt' 'Equatorial Guinea' 'Eritrea' 'Ethiopia' 'Gabon' 'Gambia' 'Ghana' 'Guinea' 'Guinea-Bissau' 'Kenya' 'Lesotho' 'Liberia' 'Libya' 'Madagascar' 'Malawi' 'Mali' 'Mauritania' 'Morocco' 'Mozambique' 'Namibia' 'Niger' 'Nigeria' 'Rwanda' 'Sao Tome and Principe' 'Senegal' 'Sierra Leone' 'Somalia' 'South Africa' 'Sudan' 'Swaziland' 'Tanzania' 'Togo' 'Tunisia' 'Uganda' 'Western Sahara' 'Zambia' 'Zimbabwe']
The country selected needs to be transformed into a geometry object to be used in the dc.load()
function.
idx = african_countries[african_countries['COUNTRY'] == country].index[0]
geom = Geometry(geom=african_countries.iloc[idx].geometry, crs=african_countries.crs)
First, let's have a look at the product information for CHIRPS rainfall. We can see that it is stored in monthly timestamps and its native resolution is approximately 0.05 degrees.
dc.list_products().loc[dc.list_products()['name'].str.contains('chirps')]
name | description | license | default_crs | default_resolution | |
---|---|---|---|---|---|
name | |||||
rainfall_chirps_monthly | rainfall_chirps_monthly | Rainfall Estimates from Rain Gauge and Satelli... | None | EPSG:4326 | (-0.05000000074505806, 0.05000000074505806) |
Using the analysis parameters defined above, we will load CHIRPS monthly rainfall data for the 30-year reference period (m).
query = {'geopolygon': geom,
'time': time_m,
'output_crs': 'epsg:6933',
'resolution': resolution,
'measurements': ['rainfall'],
'dask_chunks':dask_chunks
}
ds_rf = dc.load(product='rainfall_chirps_monthly', **query)
ds_rf
<xarray.Dataset> Dimensions: (time: 372, y: 238, x: 155) Coordinates: * time (time) datetime64[ns] 1981-01-16T11:59:59.500000 ... 2011-12... * y (y) float64 5.875e+05 5.825e+05 ... -5.925e+05 -5.975e+05 * x (x) float64 3.272e+06 3.278e+06 ... 4.038e+06 4.042e+06 spatial_ref int32 6933 Data variables: rainfall (time, y, x) float32 dask.array<chunksize=(1, 238, 155), meta=np.ndarray> Attributes: crs: epsg:6933 grid_mapping: spatial_ref
array(['1981-01-16T11:59:59.500000000', '1981-02-14T23:59:59.500000000', '1981-03-16T11:59:59.500000000', ..., '2011-10-16T11:59:59.500000000', '2011-11-15T23:59:59.500000000', '2011-12-16T11:59:59.500000000'], dtype='datetime64[ns]')
array([ 587500., 582500., 577500., ..., -587500., -592500., -597500.])
array([3272500., 3277500., 3282500., 3287500., 3292500., 3297500., 3302500., 3307500., 3312500., 3317500., 3322500., 3327500., 3332500., 3337500., 3342500., 3347500., 3352500., 3357500., 3362500., 3367500., 3372500., 3377500., 3382500., 3387500., 3392500., 3397500., 3402500., 3407500., 3412500., 3417500., 3422500., 3427500., 3432500., 3437500., 3442500., 3447500., 3452500., 3457500., 3462500., 3467500., 3472500., 3477500., 3482500., 3487500., 3492500., 3497500., 3502500., 3507500., 3512500., 3517500., 3522500., 3527500., 3532500., 3537500., 3542500., 3547500., 3552500., 3557500., 3562500., 3567500., 3572500., 3577500., 3582500., 3587500., 3592500., 3597500., 3602500., 3607500., 3612500., 3617500., 3622500., 3627500., 3632500., 3637500., 3642500., 3647500., 3652500., 3657500., 3662500., 3667500., 3672500., 3677500., 3682500., 3687500., 3692500., 3697500., 3702500., 3707500., 3712500., 3717500., 3722500., 3727500., 3732500., 3737500., 3742500., 3747500., 3752500., 3757500., 3762500., 3767500., 3772500., 3777500., 3782500., 3787500., 3792500., 3797500., 3802500., 3807500., 3812500., 3817500., 3822500., 3827500., 3832500., 3837500., 3842500., 3847500., 3852500., 3857500., 3862500., 3867500., 3872500., 3877500., 3882500., 3887500., 3892500., 3897500., 3902500., 3907500., 3912500., 3917500., 3922500., 3927500., 3932500., 3937500., 3942500., 3947500., 3952500., 3957500., 3962500., 3967500., 3972500., 3977500., 3982500., 3987500., 3992500., 3997500., 4002500., 4007500., 4012500., 4017500., 4022500., 4027500., 4032500., 4037500., 4042500.])
array(6933, dtype=int32)
|
155 238 372 |
Below, the country polygon is rasterized so the rainfall dataset is masked within that raster.
african_countries = african_countries.to_crs('epsg:6933')
mask = xr_rasterize(african_countries[african_countries['COUNTRY'] == country], ds_rf)
#mask the rainfall dataset
ds_rf = ds_rf.where(mask)
# Plot the mask
mask.plot()
<matplotlib.collections.QuadMesh at 0x7f0a7f7fd790>
We want to capture both the monthly mean (i.e. each month averaged over thirty years) and the monthly standard deviation of rainfall within the country polygon for each year from 1981 to 2011. Firstly, rainfall is grouped by month and a mean is calculated, then the standard devation in rainfall total for each month is calculated
ds_rf_m = ds_rf.where(ds_rf !=-9999.) #convert missing values to NaN
# monthly means
climatology_mean = ds_rf_m.groupby('time.month').mean('time').compute()
#calculate monthly std dev
climatology_std = ds_rf_m.groupby('time.month').std('time').compute()
Now we can plot the rainfall mean climatology, this is the average rainfall (over 30 years) for each month
climatology_mean['rainfall'].plot.imshow(cmap='viridis', col='month', col_wrap=6, label=False);
Using the analysis parameters defined above, we will load CHIRPS rainfall data for the period over which we want to calculate anomalies (x). We also need to mask this dataset to the country polygon.
#load rainfall data for the anomaly period matching the spatial extent of the climatologies
ds_rf_x = dc.load(product='rainfall_chirps_monthly',
like=ds_rf.geobox,
time=time_x,
dask_chunks=dask_chunks)
#mask with country polygon
ds_rf_x=ds_rf_x.where(mask)
We can visualise the anomalies spatially and see if they are associated with certain landscape features.
Do the spatial anomalies shown in the plots below align with the aggregated anomalies shown above?
stand_anomalies = xr.apply_ufunc(
lambda x, m, s: (x - m) / s,
ds_rf_x.groupby("time.month"),
climatology_mean,
climatology_std,
output_dtypes=[ds_rf_x.rainfall.dtype],
dask="allowed"
).compute()
Below, the spatial mean is taken so we can present the monthly anomalies aggregated across the selected country.
spatial_mean_anoms = stand_anomalies.rainfall.mean(['x','y']).to_dataframe().drop(['spatial_ref', 'month'], axis=1)
Below, we plot a bar graph that will show the average rainfall anomaly across the country
fig, ax = plt.subplots(figsize=(21,6))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))
ax.set_ylim(-2,3.5)
ax.bar(spatial_mean_anoms.index,
spatial_mean_anoms.rainfall,
width=35, align='center',
color=(spatial_mean_anoms['rainfall'] > 0).map({True: 'b', False: 'r'}))
ax.axhline(0, color='black', linestyle='--')
plt.title('Monthly standardised rainfall anomalies for '+country)
plt.ylabel('Std. deviations from monthly mean');
Average anomalies across the entire country obscure details on how rainfall anomalies are spatially distrbuted within the country. Below, enter a start
and end
date (in format 'YYYY-MM'
) that is within the time_x
range you entered in the Analysis parameters
section to plot per-pixel anomalies for the range of dates you specify.
#Select a time-range to plot
start='2020-07'
end='2021-06'
stand_anomalies['rainfall'].sel(time=slice(start,end)).plot(cmap='RdBu', label=False, col="time", col_wrap=6, robust=True);
License: The code in this notebook is licensed under the Apache License, Version 2.0. Digital Earth Africa data is licensed under the Creative Commons by Attribution 4.0 license.
Contact: If you need assistance, please post a question on the Open Data Cube Slack channel or on the GIS Stack Exchange using the open-data-cube
tag (you can view previously asked questions here).
If you would like to report an issue with this notebook, you can file one on Github.
Compatible datacube version:
print(datacube.__version__)
1.8.6
Last Tested:
from datetime import datetime
datetime.today().strftime('%Y-%m-%d')
'2021-11-24'