Bias Correction¶

In order to use the bias correction tools in the geoglows package, you need 3 things.

Observed streamflow data
Simulated historical streamflow data from the geoglows model
Data to correct: either the historical data, or any other timeseries of simulated flows from the geoglows model

The simulated historical and predicted streamflow are available through the GEOGloWS ECMWF Streamflow model via the geoglows python package.

Methods for recording streamflow and formats to save them in vary by country and not all streamflow is publically available online. As such, there is not a generic tool for retrieving observed streamflow through the geoglows package. You will need to provide it yourself.

Lets start by installing the geoglows tools in this notebook environment and importing some other dependencies we'll need.

In [ ]:

# Start by installing the package and importing it to your code. Run this cell to do that.
!pip install geoglows
import geoglows
import math
import numpy as np
import pandas as pd
import datetime as dt
import hydrostats as hs
import hydrostats.data as hd
import plotly.graph_objs as go
from google.colab import files
from IPython.core.display import display, HTML

Step 1: Upload a csv on your computer for Bias Correction¶

If you have a copy of a file on your machine you would like to use for bias correction, you can upload it to this notebook environment with the Google Collaboratory files.upload() feature. Please note, this file is not saved once you leave this page. If you revisit this page, you will need to upload again.

Your csv should have 2 columns and both should have names as the first item. The first one should be titled datetime and contain dates in a standard format. The other may have any title but must contain streamflow values in cubic meters per second (m^3/s)

OPTIONAL IF YOU NEED DEMO DATA, DOWNLOAD THIS CSV

https://www.hydroshare.org/resource/d222676fbd984a81911761ca1ba936bf/data/contents/Discharge_Data/23187280.csv

In [ ]:

uploaded = files.upload()
for fn in uploaded.keys():
  uploaded_file_name = fn
  print(f'User uploaded file "{fn}"')

In [ ]:

observed_data = pd.read_csv(uploaded_file_name, index_col=0)
observed_data.index = pd.to_datetime(observed_data.index).tz_localize('UTC')
print('Here is a preview of your data')
print(observed_data.head(10))

Step 2: Get the historical data for correction¶

Hydrologic models number the streams they simulate so that the results can be stored and organized. The GEOGloWS ECMWF model refers to these as reach_id's and has an interface for finding them programatically by providing a latitude and longitude.

Use the latitude and longitude of your stream gauge to find the model's reporting point closest to your gauge for comparison.

OPTIONAL: IF YOU DOWNLOADED THE DEMO DATA, USE THIS LAT/LON PAIR
latitude = 7.81179264
longitude = -73.8105294

In [ ]:

# Edit this cell with the latitude and longitude of your reporting point
latitude = 7.81179264
longitude = -73.8105294

In [ ]:

# This function performs some geoprocessing and may take a few seconds to complete
reach_id = geoglows.streamflow.latlon_to_reach(latitude, longitude)['reach_id']
print(reach_id)

In [ ]:

historical_data = geoglows.streamflow.historic_simulation(reach_id)

Step 3: Get other Forecasted Data for correction¶

We will use the same reach_id as for the historical data to retrieve forecasted streamflow

In [ ]:

stats = geoglows.streamflow.forecast_stats(reach_id)
ensembles = geoglows.streamflow.forecast_ensembles(reach_id)
records = geoglows.streamflow.forecast_records(reach_id)

Step 4: Perform the bias correction¶

Use the geoglows.bias tools to correct the bias using your observed data

In [ ]:

corrected_historical = geoglows.bias.correct_historical(historical_data, observed_data)
corrected_stats = geoglows.bias.correct_forecast(stats, historical_data, observed_data)
corrected_ensembles = geoglows.bias.correct_forecast(ensembles, historical_data, observed_data)
corrected_records = geoglows.bias.correct_forecast(records, historical_data, observed_data, use_month=-1)

Step 5: Plot the results¶

In [ ]:

# You can add more entries to the dicionary and they will appear in the title of the graph
titles = {'Reach ID': reach_id}

In [ ]:

# You can add more entries to the dicionary and they will appear in the title of the graph
titles_bc = {'Reach ID': reach_id, 'bias_corrected': True}

Historical Data - Original Historic Simulation¶

Use the legend on the right of the plot to toggle on/off different layers

In [ ]:

observed_Q = go.Scatter(x=observed_data.index, y=observed_data.iloc[:, 0].values, name='Observed', line=dict(color='#636EFA'))
simulated_Q = go.Scatter(x=historical_data.index, y=historical_data.iloc[:, 0].values, name='Simulated', line=dict(color='#ef553b'))

layout = go.Layout(title='Observed & Simulated Streamflow <br>Reach_ID: {0}'.format(titles['Reach ID']),
                   xaxis=dict(title='Dates', range= [historical_data.index[0], historical_data.index[-1]]),
                   yaxis=dict(title='Streamflow (m<sup>3</sup>/3)', autorange=True), showlegend=True)

chart_obj = go.Figure(data=[observed_Q, simulated_Q], layout=layout)
chart_obj.show()

Historical Data - Bias Corrected Historic Simulation¶

Use the legend on the right of the plot to toggle on/off different layers

In [ ]:

observed_Q = go.Scatter(x=observed_data.index, y=observed_data.iloc[:, 0].values, name='Observed', line=dict(color='#636EFA'))
corrected_Q = go.Scatter(x=corrected_historical.index, y=corrected_historical.iloc[:, 0].values, name='Corrected Simulated', line=dict(color='#00CC96'))

layout = go.Layout(title='Observed & Corrected Simulated Streamflow <br>Reach_ID: {0}'.format(titles_bc['Reach ID']),
                   xaxis=dict(title='Dates', range= [corrected_historical.index[0], corrected_historical.index[-1]]),
                   yaxis=dict(title='Streamflow (m<sup>3</sup>/3)', autorange=True), showlegend=True)

chart_obj = go.Figure(data=[observed_Q, corrected_Q], layout=layout)
chart_obj.show()

In [ ]:

# This is a plot of the Original Simulated, Corrected Simulated, and Observed data
geoglows.plots.corrected_historical(corrected_historical, historical_data, observed_data, titles=titles).show()

Forecasted Data¶

Since there are many lines on the forecast plots, we recommend plotting the adjusted and original forecasts side by side rather than overlaying them together.

Original Forecast¶

In [ ]:

#Getting return periods
rperiods = geoglows.streamflow.return_periods(reach_id)

print(rperiods)

In [ ]:

#Forecast Record
records = records.loc[records.index >= pd.to_datetime(stats.index[0] - dt.timedelta(days=8))]

In [ ]:

# original forecast
hydroviewer_figure = geoglows.plots.forecast_stats(stats=stats, titles=titles)
x_vals = (stats.index[0], stats.index[len(stats.index) - 1], stats.index[len(stats.index) - 1], stats.index[0])
max_visible = max(stats.max())

if len(records.index) > 0:
  hydroviewer_figure.add_trace(go.Scatter(
      name='1st days forecasts',
      x=records.index,
      y=records.iloc[:, 0].values,
      line=dict(
          color='#FFA15A',
      )
  ))

  x_vals = (records.index[0], stats.index[len(stats.index) - 1], stats.index[len(stats.index) - 1], records.index[0])
  max_visible = max(max(records.max()), max_visible)

r2 = int(rperiods.iloc[0]['return_period_2'])

colors = {
    '2 Year': 'rgba(254, 240, 1, .4)',
    '5 Year': 'rgba(253, 154, 1, .4)',
    '10 Year': 'rgba(255, 56, 5, .4)',
    '20 Year': 'rgba(128, 0, 246, .4)',
    '25 Year': 'rgba(255, 0, 0, .4)',
    '50 Year': 'rgba(128, 0, 106, .4)',
    '100 Year': 'rgba(128, 0, 246, .4)',
}

if max_visible > r2:
  visible = True
  hydroviewer_figure.for_each_trace(lambda trace: trace.update(visible=True) if trace.name == "Maximum & Minimum Flow" else (),)
else:
  visible = 'legendonly'
  hydroviewer_figure.for_each_trace(lambda trace: trace.update(visible=True) if trace.name == "Maximum & Minimum Flow" else (),)

def template(name, y, color, fill='toself'):
  return go.Scatter(
      name=name,
      x=x_vals,
      y=y,
      legendgroup='returnperiods',
      fill=fill,
      visible=visible,
      line=dict(color=color, width=0))

r5 = int(rperiods.iloc[0]['return_period_5'])
r10 = int(rperiods.iloc[0]['return_period_10'])
r25 = int(rperiods.iloc[0]['return_period_25'])
r50 = int(rperiods.iloc[0]['return_period_50'])
r100 = int(rperiods.iloc[0]['return_period_100'])

hydroviewer_figure.add_trace(template('Return Periods', (r100 * 0.05, r100 * 0.05, r100 * 0.05, r100 * 0.05), 'rgba(0,0,0,0)', fill='none'))
hydroviewer_figure.add_trace(template(f'2 Year: {r2}', (r2, r2, r5, r5), colors['2 Year']))
hydroviewer_figure.add_trace(template(f'5 Year: {r5}', (r5, r5, r10, r10), colors['5 Year']))
hydroviewer_figure.add_trace(template(f'10 Year: {r10}', (r10, r10, r25, r25), colors['10 Year']))
hydroviewer_figure.add_trace(template(f'25 Year: {r25}', (r25, r25, r50, r50), colors['25 Year']))
hydroviewer_figure.add_trace(template(f'50 Year: {r50}', (r50, r50, r100, r100), colors['50 Year']))
hydroviewer_figure.add_trace(template(f'100 Year: {r100}', (r100, r100, max(r100 + r100 * 0.05, max_visible), max(r100 + r100 * 0.05, max_visible)),colors['100 Year']))

hydroviewer_figure['layout']['xaxis'].update(autorange=True);

hydroviewer_figure.show()

Bias Corrected Forecast¶

In [ ]:

'''Adding Return Periods'''
max_annual_wl = corrected_historical.groupby(corrected_historical.index.strftime("%Y")).max()
mean_value = np.mean(max_annual_wl.iloc[:,0].values)
std_value = np.std(max_annual_wl.iloc[:,0].values)

return_periods = [100, 50, 25, 10, 5, 2]

def gumbel_1(std: float, xbar: float, rp: int or float) -> float:
  """
  Solves the Gumbel Type I probability distribution function (pdf) = exp(-exp(-b)) where b is the covariate. Provide
  the standard deviation and mean of the list of annual maximum flows. Compare scipy.stats.gumbel_r
  Args:
  	std (float): the standard deviation of the series
    xbar (float): the mean of the series
    rp (int or float): the return period in years
  Returns:
  	float, the flow corresponding to the return period specified
  """
  # xbar = statistics.mean(year_max_flow_list)
  # std = statistics.stdev(year_max_flow_list, xbar=xbar)
  return -math.log(-math.log(1 - (1 / rp))) * std * .7797 + xbar - (.45 * std)

return_periods_values = []

for rp in return_periods:
  return_periods_values.append(gumbel_1(std_value, mean_value, rp))

d = {'rivid': [reach_id], 'return_period_100': [return_periods_values[0]], 'return_period_50': [return_periods_values[1]], 'return_period_25': [return_periods_values[2]], 'return_period_10': [return_periods_values[3]], 'return_period_5': [return_periods_values[4]], 'return_period_2': [return_periods_values[5]]}
corrected_rperiods = pd.DataFrame(data=d)
corrected_rperiods.set_index('rivid', inplace=True)

print(corrected_rperiods)

Bias Correction based on the Stats Forecast

In [ ]:

#Bias Corrected Forecast Record
corrected_records = corrected_records.loc[corrected_records.index >= pd.to_datetime(stats.index[0] - dt.timedelta(days=8))]

In [ ]:

# corrected data based on the stats
hydroviewer_figure = geoglows.plots.forecast_stats(stats=corrected_stats, titles=titles_bc)

x_vals = (corrected_stats.index[0], corrected_stats.index[len(corrected_stats.index) - 1], corrected_stats.index[len(corrected_stats.index) - 1], corrected_stats.index[0])
max_visible = max(corrected_stats.max())

if len(corrected_records.index) > 0:
  hydroviewer_figure.add_trace(go.Scatter(
      name='1st days forecasts',
      x=corrected_records.index,
      y=corrected_records.iloc[:, 0].values,
      line=dict(
          color='#FFA15A',
      )
  ))

  x_vals = (corrected_records.index[0], corrected_stats.index[len(corrected_stats.index) - 1], corrected_stats.index[len(corrected_stats.index) - 1], corrected_records.index[0])
  max_visible = max(max(corrected_records.max()), max_visible)

r2 = int(corrected_rperiods.iloc[0]['return_period_2'])

colors = {
    '2 Year': 'rgba(254, 240, 1, .4)',
    '5 Year': 'rgba(253, 154, 1, .4)',
    '10 Year': 'rgba(255, 56, 5, .4)',
    '20 Year': 'rgba(128, 0, 246, .4)',
    '25 Year': 'rgba(255, 0, 0, .4)',
    '50 Year': 'rgba(128, 0, 106, .4)',
    '100 Year': 'rgba(128, 0, 246, .4)',
}

if max_visible > r2:
  visible = True
  hydroviewer_figure.for_each_trace(lambda trace: trace.update(visible=True) if trace.name == "Maximum & Minimum Flow" else (),)
else:
  visible = 'legendonly'
  hydroviewer_figure.for_each_trace(lambda trace: trace.update(visible=True) if trace.name == "Maximum & Minimum Flow" else (),)

def template(name, y, color, fill='toself'):
  return go.Scatter(
      name=name,
      x=x_vals,
      y=y,
      legendgroup='returnperiods',
      fill=fill,
      visible=visible,
      line=dict(color=color, width=0))

r5 = int(corrected_rperiods.iloc[0]['return_period_5'])
r10 = int(corrected_rperiods.iloc[0]['return_period_10'])
r25 = int(corrected_rperiods.iloc[0]['return_period_25'])
r50 = int(corrected_rperiods.iloc[0]['return_period_50'])
r100 = int(corrected_rperiods.iloc[0]['return_period_100'])

hydroviewer_figure.add_trace(template('Return Periods', (r100 * 0.05, r100 * 0.05, r100 * 0.05, r100 * 0.05), 'rgba(0,0,0,0)', fill='none'))
hydroviewer_figure.add_trace(template(f'2 Year: {r2}', (r2, r2, r5, r5), colors['2 Year']))
hydroviewer_figure.add_trace(template(f'5 Year: {r5}', (r5, r5, r10, r10), colors['5 Year']))
hydroviewer_figure.add_trace(template(f'10 Year: {r10}', (r10, r10, r25, r25), colors['10 Year']))
hydroviewer_figure.add_trace(template(f'25 Year: {r25}', (r25, r25, r50, r50), colors['25 Year']))
hydroviewer_figure.add_trace(template(f'50 Year: {r50}', (r50, r50, r100, r100), colors['50 Year']))
hydroviewer_figure.add_trace(template(f'100 Year: {r100}', (r100, r100, max(r100 + r100 * 0.05, max_visible), max(r100 + r100 * 0.05, max_visible)),colors['100 Year']))


hydroviewer_figure['layout']['xaxis'].update(autorange=True);

hydroviewer_figure.show()

Bias Correction based on the Ensemble Forecast

In [ ]:

# corrected data based on ensemble

ensemble = corrected_ensembles.copy()
high_res_df = ensemble['ensemble_52_m^3/s'].to_frame()
ensemble.drop(columns=['ensemble_52_m^3/s'], inplace=True)
ensemble.dropna(inplace= True)
high_res_df.dropna(inplace= True)

max_df = ensemble.quantile(1.0, axis=1).to_frame()
max_df.rename(columns = {1.0:'flow_max_m^3/s'}, inplace = True)

p75_df = ensemble.quantile(0.75, axis=1).to_frame()
p75_df.rename(columns = {0.75:'flow_75%_m^3/s'}, inplace = True)

p25_df = ensemble.quantile(0.25, axis=1).to_frame()
p25_df.rename(columns = {0.25:'flow_25%_m^3/s'}, inplace = True)

min_df = ensemble.quantile(0, axis=1).to_frame()
min_df.rename(columns = {0.0:'flow_min_m^3/s'}, inplace = True)

mean_df = ensemble.mean(axis=1).to_frame()
mean_df.rename(columns = {0:'flow_avg_m^3/s'}, inplace = True)

high_res_df.rename(columns = {'ensemble_52_m^3/s':'high_res_m^3/s'}, inplace = True)

stats_corrected = pd.concat([max_df, p75_df, mean_df, p25_df, min_df, high_res_df], axis=1)

hydroviewer_figure = geoglows.plots.forecast_stats(stats=stats_corrected, titles=titles_bc)

x_vals = (stats_corrected.index[0], stats_corrected.index[len(stats_corrected.index) - 1], stats_corrected.index[len(stats_corrected.index) - 1], stats_corrected.index[0])
max_visible = max(stats_corrected.max())

if len(corrected_records.index) > 0:
  hydroviewer_figure.add_trace(go.Scatter(
      name='1st days forecasts',
      x=corrected_records.index,
      y=corrected_records.iloc[:, 0].values,
      line=dict(
          color='#FFA15A',
      )
  ))

  x_vals = (corrected_records.index[0], stats_corrected.index[len(stats_corrected.index) - 1], stats_corrected.index[len(stats_corrected.index) - 1], corrected_records.index[0])
  max_visible = max(max(corrected_records.max()), max_visible)

r2 = int(corrected_rperiods.iloc[0]['return_period_2'])

colors = {
    '2 Year': 'rgba(254, 240, 1, .4)',
    '5 Year': 'rgba(253, 154, 1, .4)',
    '10 Year': 'rgba(255, 56, 5, .4)',
    '20 Year': 'rgba(128, 0, 246, .4)',
    '25 Year': 'rgba(255, 0, 0, .4)',
    '50 Year': 'rgba(128, 0, 106, .4)',
    '100 Year': 'rgba(128, 0, 246, .4)',
}

if max_visible > r2:
  visible = True
  hydroviewer_figure.for_each_trace(lambda trace: trace.update(visible=True) if trace.name == "Maximum & Minimum Flow" else (),)
else:
  visible = 'legendonly'
  hydroviewer_figure.for_each_trace(lambda trace: trace.update(visible=True) if trace.name == "Maximum & Minimum Flow" else (),)

def template(name, y, color, fill='toself'):
  return go.Scatter(
      name=name,
      x=x_vals,
      y=y,
      legendgroup='returnperiods',
      fill=fill,
      visible=visible,
      line=dict(color=color, width=0))

r5 = int(corrected_rperiods.iloc[0]['return_period_5'])
r10 = int(corrected_rperiods.iloc[0]['return_period_10'])
r25 = int(corrected_rperiods.iloc[0]['return_period_25'])
r50 = int(corrected_rperiods.iloc[0]['return_period_50'])
r100 = int(corrected_rperiods.iloc[0]['return_period_100'])

hydroviewer_figure.add_trace(template('Return Periods', (r100 * 0.05, r100 * 0.05, r100 * 0.05, r100 * 0.05), 'rgba(0,0,0,0)', fill='none'))
hydroviewer_figure.add_trace(template(f'2 Year: {r2}', (r2, r2, r5, r5), colors['2 Year']))
hydroviewer_figure.add_trace(template(f'5 Year: {r5}', (r5, r5, r10, r10), colors['5 Year']))
hydroviewer_figure.add_trace(template(f'10 Year: {r10}', (r10, r10, r25, r25), colors['10 Year']))
hydroviewer_figure.add_trace(template(f'25 Year: {r25}', (r25, r25, r50, r50), colors['25 Year']))
hydroviewer_figure.add_trace(template(f'50 Year: {r50}', (r50, r50, r100, r100), colors['50 Year']))
hydroviewer_figure.add_trace(template(f'100 Year: {r100}', (r100, r100, max(r100 + r100 * 0.05, max_visible), max(r100 + r100 * 0.05, max_visible)),colors['100 Year']))


hydroviewer_figure['layout']['xaxis'].update(autorange=True);

hydroviewer_figure.show()

Step 6: Statistics, Summaries, Averages, etc¶

There are many tools in the geoglows package to analyze how much the bias correction improved the streamflow simulations. These are based on the statistical analysis performed by the hydrostats and HydroErr python packages

In [ ]:

# This is a scatter plot of the original vs simulated data
geoglows.plots.corrected_scatterplots(corrected_historical, historical_data, observed_data, titles=titles).show()

In [ ]:

# This is a plot of the monthly averages
geoglows.plots.corrected_month_average(corrected_historical, historical_data, observed_data, titles=titles).show()

In [ ]:

# This is a plot of the daily averages
geoglows.plots.corrected_day_average(corrected_historical, historical_data, observed_data, titles=titles).show()

In [ ]:

# This is a plot of the cumulative annual volumes
geoglows.plots.corrected_volume_compare(corrected_historical, historical_data, observed_data, titles=titles).show()

In [ ]:

'''Merge Data'''
merged_df = hd.merge_data(sim_df=historical_data, obs_df=observed_data)
merged_df2 = hd.merge_data(sim_df=corrected_historical, obs_df=observed_data)

# This is a table of a few important statistics 
metrics = ['ME', 'RMSE', 'NRMSE (Mean)', 'MAPE', 'NSE', 'KGE (2009)', 'KGE (2012)', "R (Pearson)", "R (Spearman)", "r2"]
# Merge Data
table1 = hs.make_table(merged_dataframe=merged_df, metrics=metrics)
table2 = hs.make_table(merged_dataframe=merged_df2, metrics=metrics)

table2 = table2.rename(index={'Full Time Series': 'Corrected Full Time Series'})
table1 = table1.rename(index={'Full Time Series': 'Original Full Time Series'})
table1 = table1.transpose()
table2 = table2.transpose()

table_final = pd.merge(table1, table2, right_index=True, left_index=True)

display(HTML(table_final.to_html()))

Download your corrected results as CSV¶

In [ ]:

corrected_historical.to_csv('corrected_historical_streamflow.csv')
files.download('corrected_historical_streamflow.csv')

In [ ]:

corrected_records.to_csv('corrected_forecast_records.csv')
files.download('corrected_forecast_records.csv')

In [ ]:

corrected_stats.to_csv('corrected_forecast_stats.csv')
files.download('corrected_forecast_stats.csv')

In [ ]:

corrected_ensembles.to_csv('corrected_forecasted_ensembles.csv')
files.download('corrected_forecasted_ensembles.csv')