In order to use the bias correction tools in the geoglows
package, you need 3 things.
geoglows
modelgeoglows
modelThe simulated historical and predicted streamflow are available through the GEOGloWS ECMWF Streamflow model via the geoglows python package.
Methods for recording streamflow and formats to save them in vary by country and not all streamflow is publically available online. As such, there is not a generic tool for retrieving observed streamflow through the geoglows package. You will need to provide it yourself.
Lets start by installing the geoglows tools in this notebook environment and importing some other dependencies we'll need.
# Start by installing the package and importing it to your code. Run this cell to do that.
!pip install geoglows
import geoglows
import math
import numpy as np
import pandas as pd
import datetime as dt
import hydrostats as hs
import hydrostats.data as hd
import plotly.graph_objs as go
from google.colab import files
from IPython.core.display import display, HTML
If you have a copy of a file on your machine you would like to use for bias correction, you can upload it to this notebook environment with the Google Collaboratory files.upload()
feature. Please note, this file is not saved once you leave this page. If you revisit this page, you will need to upload again.
Your csv should have 2 columns and both should have names as the first item. The first one should be titled datetime
and contain dates in a standard format. The other may have any title but must contain streamflow values in cubic meters per second (m^3/s)
OPTIONAL IF YOU NEED DEMO DATA, DOWNLOAD THIS CSV
uploaded = files.upload()
for fn in uploaded.keys():
uploaded_file_name = fn
print(f'User uploaded file "{fn}"')
observed_data = pd.read_csv(uploaded_file_name, index_col=0)
observed_data.index = pd.to_datetime(observed_data.index).tz_localize('UTC')
print('Here is a preview of your data')
print(observed_data.head(10))
Hydrologic models number the streams they simulate so that the results can be stored and organized. The GEOGloWS ECMWF model refers to these as reach_id's
and has an interface for finding them programatically by providing a latitude and longitude.
Use the latitude and longitude of your stream gauge to find the model's reporting point closest to your gauge for comparison.
OPTIONAL: IF YOU DOWNLOADED THE DEMO DATA, USE THIS LAT/LON PAIR
latitude = 7.81179264
longitude = -73.8105294
# Edit this cell with the latitude and longitude of your reporting point
latitude = 7.81179264
longitude = -73.8105294
# This function performs some geoprocessing and may take a few seconds to complete
reach_id = geoglows.streamflow.latlon_to_reach(latitude, longitude)['reach_id']
print(reach_id)
historical_data = geoglows.streamflow.historic_simulation(reach_id)
We will use the same reach_id as for the historical data to retrieve forecasted streamflow
stats = geoglows.streamflow.forecast_stats(reach_id)
ensembles = geoglows.streamflow.forecast_ensembles(reach_id)
records = geoglows.streamflow.forecast_records(reach_id)
Use the geoglows.bias
tools to correct the bias using your observed data
corrected_historical = geoglows.bias.correct_historical(historical_data, observed_data)
corrected_stats = geoglows.bias.correct_forecast(stats, historical_data, observed_data)
corrected_ensembles = geoglows.bias.correct_forecast(ensembles, historical_data, observed_data)
corrected_records = geoglows.bias.correct_forecast(records, historical_data, observed_data, use_month=-1)
# You can add more entries to the dicionary and they will appear in the title of the graph
titles = {'Reach ID': reach_id}
# You can add more entries to the dicionary and they will appear in the title of the graph
titles_bc = {'Reach ID': reach_id, 'bias_corrected': True}
Use the legend on the right of the plot to toggle on/off different layers
observed_Q = go.Scatter(x=observed_data.index, y=observed_data.iloc[:, 0].values, name='Observed', line=dict(color='#636EFA'))
simulated_Q = go.Scatter(x=historical_data.index, y=historical_data.iloc[:, 0].values, name='Simulated', line=dict(color='#ef553b'))
layout = go.Layout(title='Observed & Simulated Streamflow <br>Reach_ID: {0}'.format(titles['Reach ID']),
xaxis=dict(title='Dates', range= [historical_data.index[0], historical_data.index[-1]]),
yaxis=dict(title='Streamflow (m<sup>3</sup>/3)', autorange=True), showlegend=True)
chart_obj = go.Figure(data=[observed_Q, simulated_Q], layout=layout)
chart_obj.show()
Use the legend on the right of the plot to toggle on/off different layers
observed_Q = go.Scatter(x=observed_data.index, y=observed_data.iloc[:, 0].values, name='Observed', line=dict(color='#636EFA'))
corrected_Q = go.Scatter(x=corrected_historical.index, y=corrected_historical.iloc[:, 0].values, name='Corrected Simulated', line=dict(color='#00CC96'))
layout = go.Layout(title='Observed & Corrected Simulated Streamflow <br>Reach_ID: {0}'.format(titles_bc['Reach ID']),
xaxis=dict(title='Dates', range= [corrected_historical.index[0], corrected_historical.index[-1]]),
yaxis=dict(title='Streamflow (m<sup>3</sup>/3)', autorange=True), showlegend=True)
chart_obj = go.Figure(data=[observed_Q, corrected_Q], layout=layout)
chart_obj.show()
# This is a plot of the Original Simulated, Corrected Simulated, and Observed data
geoglows.plots.corrected_historical(corrected_historical, historical_data, observed_data, titles=titles).show()
Since there are many lines on the forecast plots, we recommend plotting the adjusted and original forecasts side by side rather than overlaying them together.
#Getting return periods
rperiods = geoglows.streamflow.return_periods(reach_id)
print(rperiods)
#Forecast Record
records = records.loc[records.index >= pd.to_datetime(stats.index[0] - dt.timedelta(days=8))]
# original forecast
hydroviewer_figure = geoglows.plots.forecast_stats(stats=stats, titles=titles)
x_vals = (stats.index[0], stats.index[len(stats.index) - 1], stats.index[len(stats.index) - 1], stats.index[0])
max_visible = max(stats.max())
if len(records.index) > 0:
hydroviewer_figure.add_trace(go.Scatter(
name='1st days forecasts',
x=records.index,
y=records.iloc[:, 0].values,
line=dict(
color='#FFA15A',
)
))
x_vals = (records.index[0], stats.index[len(stats.index) - 1], stats.index[len(stats.index) - 1], records.index[0])
max_visible = max(max(records.max()), max_visible)
r2 = int(rperiods.iloc[0]['return_period_2'])
colors = {
'2 Year': 'rgba(254, 240, 1, .4)',
'5 Year': 'rgba(253, 154, 1, .4)',
'10 Year': 'rgba(255, 56, 5, .4)',
'20 Year': 'rgba(128, 0, 246, .4)',
'25 Year': 'rgba(255, 0, 0, .4)',
'50 Year': 'rgba(128, 0, 106, .4)',
'100 Year': 'rgba(128, 0, 246, .4)',
}
if max_visible > r2:
visible = True
hydroviewer_figure.for_each_trace(lambda trace: trace.update(visible=True) if trace.name == "Maximum & Minimum Flow" else (),)
else:
visible = 'legendonly'
hydroviewer_figure.for_each_trace(lambda trace: trace.update(visible=True) if trace.name == "Maximum & Minimum Flow" else (),)
def template(name, y, color, fill='toself'):
return go.Scatter(
name=name,
x=x_vals,
y=y,
legendgroup='returnperiods',
fill=fill,
visible=visible,
line=dict(color=color, width=0))
r5 = int(rperiods.iloc[0]['return_period_5'])
r10 = int(rperiods.iloc[0]['return_period_10'])
r25 = int(rperiods.iloc[0]['return_period_25'])
r50 = int(rperiods.iloc[0]['return_period_50'])
r100 = int(rperiods.iloc[0]['return_period_100'])
hydroviewer_figure.add_trace(template('Return Periods', (r100 * 0.05, r100 * 0.05, r100 * 0.05, r100 * 0.05), 'rgba(0,0,0,0)', fill='none'))
hydroviewer_figure.add_trace(template(f'2 Year: {r2}', (r2, r2, r5, r5), colors['2 Year']))
hydroviewer_figure.add_trace(template(f'5 Year: {r5}', (r5, r5, r10, r10), colors['5 Year']))
hydroviewer_figure.add_trace(template(f'10 Year: {r10}', (r10, r10, r25, r25), colors['10 Year']))
hydroviewer_figure.add_trace(template(f'25 Year: {r25}', (r25, r25, r50, r50), colors['25 Year']))
hydroviewer_figure.add_trace(template(f'50 Year: {r50}', (r50, r50, r100, r100), colors['50 Year']))
hydroviewer_figure.add_trace(template(f'100 Year: {r100}', (r100, r100, max(r100 + r100 * 0.05, max_visible), max(r100 + r100 * 0.05, max_visible)),colors['100 Year']))
hydroviewer_figure['layout']['xaxis'].update(autorange=True);
hydroviewer_figure.show()
'''Adding Return Periods'''
max_annual_wl = corrected_historical.groupby(corrected_historical.index.strftime("%Y")).max()
mean_value = np.mean(max_annual_wl.iloc[:,0].values)
std_value = np.std(max_annual_wl.iloc[:,0].values)
return_periods = [100, 50, 25, 10, 5, 2]
def gumbel_1(std: float, xbar: float, rp: int or float) -> float:
"""
Solves the Gumbel Type I probability distribution function (pdf) = exp(-exp(-b)) where b is the covariate. Provide
the standard deviation and mean of the list of annual maximum flows. Compare scipy.stats.gumbel_r
Args:
std (float): the standard deviation of the series
xbar (float): the mean of the series
rp (int or float): the return period in years
Returns:
float, the flow corresponding to the return period specified
"""
# xbar = statistics.mean(year_max_flow_list)
# std = statistics.stdev(year_max_flow_list, xbar=xbar)
return -math.log(-math.log(1 - (1 / rp))) * std * .7797 + xbar - (.45 * std)
return_periods_values = []
for rp in return_periods:
return_periods_values.append(gumbel_1(std_value, mean_value, rp))
d = {'rivid': [reach_id], 'return_period_100': [return_periods_values[0]], 'return_period_50': [return_periods_values[1]], 'return_period_25': [return_periods_values[2]], 'return_period_10': [return_periods_values[3]], 'return_period_5': [return_periods_values[4]], 'return_period_2': [return_periods_values[5]]}
corrected_rperiods = pd.DataFrame(data=d)
corrected_rperiods.set_index('rivid', inplace=True)
print(corrected_rperiods)
Bias Correction based on the Stats Forecast
#Bias Corrected Forecast Record
corrected_records = corrected_records.loc[corrected_records.index >= pd.to_datetime(stats.index[0] - dt.timedelta(days=8))]
# corrected data based on the stats
hydroviewer_figure = geoglows.plots.forecast_stats(stats=corrected_stats, titles=titles_bc)
x_vals = (corrected_stats.index[0], corrected_stats.index[len(corrected_stats.index) - 1], corrected_stats.index[len(corrected_stats.index) - 1], corrected_stats.index[0])
max_visible = max(corrected_stats.max())
if len(corrected_records.index) > 0:
hydroviewer_figure.add_trace(go.Scatter(
name='1st days forecasts',
x=corrected_records.index,
y=corrected_records.iloc[:, 0].values,
line=dict(
color='#FFA15A',
)
))
x_vals = (corrected_records.index[0], corrected_stats.index[len(corrected_stats.index) - 1], corrected_stats.index[len(corrected_stats.index) - 1], corrected_records.index[0])
max_visible = max(max(corrected_records.max()), max_visible)
r2 = int(corrected_rperiods.iloc[0]['return_period_2'])
colors = {
'2 Year': 'rgba(254, 240, 1, .4)',
'5 Year': 'rgba(253, 154, 1, .4)',
'10 Year': 'rgba(255, 56, 5, .4)',
'20 Year': 'rgba(128, 0, 246, .4)',
'25 Year': 'rgba(255, 0, 0, .4)',
'50 Year': 'rgba(128, 0, 106, .4)',
'100 Year': 'rgba(128, 0, 246, .4)',
}
if max_visible > r2:
visible = True
hydroviewer_figure.for_each_trace(lambda trace: trace.update(visible=True) if trace.name == "Maximum & Minimum Flow" else (),)
else:
visible = 'legendonly'
hydroviewer_figure.for_each_trace(lambda trace: trace.update(visible=True) if trace.name == "Maximum & Minimum Flow" else (),)
def template(name, y, color, fill='toself'):
return go.Scatter(
name=name,
x=x_vals,
y=y,
legendgroup='returnperiods',
fill=fill,
visible=visible,
line=dict(color=color, width=0))
r5 = int(corrected_rperiods.iloc[0]['return_period_5'])
r10 = int(corrected_rperiods.iloc[0]['return_period_10'])
r25 = int(corrected_rperiods.iloc[0]['return_period_25'])
r50 = int(corrected_rperiods.iloc[0]['return_period_50'])
r100 = int(corrected_rperiods.iloc[0]['return_period_100'])
hydroviewer_figure.add_trace(template('Return Periods', (r100 * 0.05, r100 * 0.05, r100 * 0.05, r100 * 0.05), 'rgba(0,0,0,0)', fill='none'))
hydroviewer_figure.add_trace(template(f'2 Year: {r2}', (r2, r2, r5, r5), colors['2 Year']))
hydroviewer_figure.add_trace(template(f'5 Year: {r5}', (r5, r5, r10, r10), colors['5 Year']))
hydroviewer_figure.add_trace(template(f'10 Year: {r10}', (r10, r10, r25, r25), colors['10 Year']))
hydroviewer_figure.add_trace(template(f'25 Year: {r25}', (r25, r25, r50, r50), colors['25 Year']))
hydroviewer_figure.add_trace(template(f'50 Year: {r50}', (r50, r50, r100, r100), colors['50 Year']))
hydroviewer_figure.add_trace(template(f'100 Year: {r100}', (r100, r100, max(r100 + r100 * 0.05, max_visible), max(r100 + r100 * 0.05, max_visible)),colors['100 Year']))
hydroviewer_figure['layout']['xaxis'].update(autorange=True);
hydroviewer_figure.show()
Bias Correction based on the Ensemble Forecast
# corrected data based on ensemble
ensemble = corrected_ensembles.copy()
high_res_df = ensemble['ensemble_52_m^3/s'].to_frame()
ensemble.drop(columns=['ensemble_52_m^3/s'], inplace=True)
ensemble.dropna(inplace= True)
high_res_df.dropna(inplace= True)
max_df = ensemble.quantile(1.0, axis=1).to_frame()
max_df.rename(columns = {1.0:'flow_max_m^3/s'}, inplace = True)
p75_df = ensemble.quantile(0.75, axis=1).to_frame()
p75_df.rename(columns = {0.75:'flow_75%_m^3/s'}, inplace = True)
p25_df = ensemble.quantile(0.25, axis=1).to_frame()
p25_df.rename(columns = {0.25:'flow_25%_m^3/s'}, inplace = True)
min_df = ensemble.quantile(0, axis=1).to_frame()
min_df.rename(columns = {0.0:'flow_min_m^3/s'}, inplace = True)
mean_df = ensemble.mean(axis=1).to_frame()
mean_df.rename(columns = {0:'flow_avg_m^3/s'}, inplace = True)
high_res_df.rename(columns = {'ensemble_52_m^3/s':'high_res_m^3/s'}, inplace = True)
stats_corrected = pd.concat([max_df, p75_df, mean_df, p25_df, min_df, high_res_df], axis=1)
hydroviewer_figure = geoglows.plots.forecast_stats(stats=stats_corrected, titles=titles_bc)
x_vals = (stats_corrected.index[0], stats_corrected.index[len(stats_corrected.index) - 1], stats_corrected.index[len(stats_corrected.index) - 1], stats_corrected.index[0])
max_visible = max(stats_corrected.max())
if len(corrected_records.index) > 0:
hydroviewer_figure.add_trace(go.Scatter(
name='1st days forecasts',
x=corrected_records.index,
y=corrected_records.iloc[:, 0].values,
line=dict(
color='#FFA15A',
)
))
x_vals = (corrected_records.index[0], stats_corrected.index[len(stats_corrected.index) - 1], stats_corrected.index[len(stats_corrected.index) - 1], corrected_records.index[0])
max_visible = max(max(corrected_records.max()), max_visible)
r2 = int(corrected_rperiods.iloc[0]['return_period_2'])
colors = {
'2 Year': 'rgba(254, 240, 1, .4)',
'5 Year': 'rgba(253, 154, 1, .4)',
'10 Year': 'rgba(255, 56, 5, .4)',
'20 Year': 'rgba(128, 0, 246, .4)',
'25 Year': 'rgba(255, 0, 0, .4)',
'50 Year': 'rgba(128, 0, 106, .4)',
'100 Year': 'rgba(128, 0, 246, .4)',
}
if max_visible > r2:
visible = True
hydroviewer_figure.for_each_trace(lambda trace: trace.update(visible=True) if trace.name == "Maximum & Minimum Flow" else (),)
else:
visible = 'legendonly'
hydroviewer_figure.for_each_trace(lambda trace: trace.update(visible=True) if trace.name == "Maximum & Minimum Flow" else (),)
def template(name, y, color, fill='toself'):
return go.Scatter(
name=name,
x=x_vals,
y=y,
legendgroup='returnperiods',
fill=fill,
visible=visible,
line=dict(color=color, width=0))
r5 = int(corrected_rperiods.iloc[0]['return_period_5'])
r10 = int(corrected_rperiods.iloc[0]['return_period_10'])
r25 = int(corrected_rperiods.iloc[0]['return_period_25'])
r50 = int(corrected_rperiods.iloc[0]['return_period_50'])
r100 = int(corrected_rperiods.iloc[0]['return_period_100'])
hydroviewer_figure.add_trace(template('Return Periods', (r100 * 0.05, r100 * 0.05, r100 * 0.05, r100 * 0.05), 'rgba(0,0,0,0)', fill='none'))
hydroviewer_figure.add_trace(template(f'2 Year: {r2}', (r2, r2, r5, r5), colors['2 Year']))
hydroviewer_figure.add_trace(template(f'5 Year: {r5}', (r5, r5, r10, r10), colors['5 Year']))
hydroviewer_figure.add_trace(template(f'10 Year: {r10}', (r10, r10, r25, r25), colors['10 Year']))
hydroviewer_figure.add_trace(template(f'25 Year: {r25}', (r25, r25, r50, r50), colors['25 Year']))
hydroviewer_figure.add_trace(template(f'50 Year: {r50}', (r50, r50, r100, r100), colors['50 Year']))
hydroviewer_figure.add_trace(template(f'100 Year: {r100}', (r100, r100, max(r100 + r100 * 0.05, max_visible), max(r100 + r100 * 0.05, max_visible)),colors['100 Year']))
hydroviewer_figure['layout']['xaxis'].update(autorange=True);
hydroviewer_figure.show()
There are many tools in the geoglows package to analyze how much the bias correction improved the streamflow simulations. These are based on the statistical analysis performed by the hydrostats
and HydroErr
python packages
# This is a scatter plot of the original vs simulated data
geoglows.plots.corrected_scatterplots(corrected_historical, historical_data, observed_data, titles=titles).show()
# This is a plot of the monthly averages
geoglows.plots.corrected_month_average(corrected_historical, historical_data, observed_data, titles=titles).show()
# This is a plot of the daily averages
geoglows.plots.corrected_day_average(corrected_historical, historical_data, observed_data, titles=titles).show()
# This is a plot of the cumulative annual volumes
geoglows.plots.corrected_volume_compare(corrected_historical, historical_data, observed_data, titles=titles).show()
'''Merge Data'''
merged_df = hd.merge_data(sim_df=historical_data, obs_df=observed_data)
merged_df2 = hd.merge_data(sim_df=corrected_historical, obs_df=observed_data)
# This is a table of a few important statistics
metrics = ['ME', 'RMSE', 'NRMSE (Mean)', 'MAPE', 'NSE', 'KGE (2009)', 'KGE (2012)', "R (Pearson)", "R (Spearman)", "r2"]
# Merge Data
table1 = hs.make_table(merged_dataframe=merged_df, metrics=metrics)
table2 = hs.make_table(merged_dataframe=merged_df2, metrics=metrics)
table2 = table2.rename(index={'Full Time Series': 'Corrected Full Time Series'})
table1 = table1.rename(index={'Full Time Series': 'Original Full Time Series'})
table1 = table1.transpose()
table2 = table2.transpose()
table_final = pd.merge(table1, table2, right_index=True, left_index=True)
display(HTML(table_final.to_html()))
corrected_historical.to_csv('corrected_historical_streamflow.csv')
files.download('corrected_historical_streamflow.csv')
corrected_records.to_csv('corrected_forecast_records.csv')
files.download('corrected_forecast_records.csv')
corrected_stats.to_csv('corrected_forecast_stats.csv')
files.download('corrected_forecast_stats.csv')
corrected_ensembles.to_csv('corrected_forecasted_ensembles.csv')
files.download('corrected_forecasted_ensembles.csv')