The winfapReader package contains functions to interact with the information on extremes of instantaneous river flow in the United Kingdom (UK) made available by the National River Flow Archive (NRFA). This information underlies most flood risk estimation projects in the UK, which are typically carried out using the Flood Estimation Handbook (FEH) statistical method and its updates, as implemented in the WINFAP software. Consequently, the NRFA publishes and routinely updates files which are suitable to be read by WINFAP: the collection of these files is referred to as the peak flow dataset and can be found here. The winfapReader package allows the user to interact with three different file types:
The .AM files, which contain the Annual Maximum (AMAX) peaks: these correspond to the largest river flow event in each water year (which runs from October 1st to September 30th).
The .CD3 files, which contain the Catchment Descriptors: these correspond to a set of descriptors for the catchment upstream of the gauging station and for the station itself.
The .PT files, which contain the peaks over threshold: these correspond to all peaks which are larger than a given threshold. The threshold is fixed by the NRFA and should be such that there is an average of 3 to 5 peaks over threshold (POT) events per water year. It has often been reported that the reliability of POT records varies between stations: since most flood frequency estimation methods used in the UK rely on annual maxima, the AMAX records go through higher scrutiny than the POT records. Users should treat the information about peaks over threshold with caution and perform thorough quality checks before analysing it.
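The water year convention used throughout the peak flow dataset (1 October to 30 September, labelled by the calendar year in which it starts, as can be seen in the get_amax output further down) can be illustrated with a small base R function. The water_year() helper below is hypothetical and is not part of winfapReader; it is only a sketch of the convention.

```r
# Hypothetical helper (not part of winfapReader): map dates to the water year
# running from 1 October to 30 September, labelled by its starting year.
water_year <- function(date) {
  date <- as.Date(date)
  year <- as.integer(format(date, "%Y"))
  month <- as.integer(format(date, "%m"))
  # October to December belong to the water year starting that calendar year;
  # January to September belong to the one that started the previous year
  ifelse(month >= 10L, year, year - 1L)
}

water_year(c("1976-09-30", "1977-02-10", "1977-12-10", "1985-01-20"))
#> [1] 1975 1976 1977 1984
```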
The winfapReader package allows you to read the .AM, .CD3 and .PT files into your R session. Importantly, it is aware of the typical structure of the files in which rejected annual maxima and missing periods of record for the peaks over threshold are recorded, and it merges this information with the flow records. This gives the user all the information needed to decide which parts of the record to include in the analysis.
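As an illustration of the kind of merging described above, the sketch below attaches a rejection flag to a table of annual maxima given a vector of rejected water years. The data and the rejected_years vector are hypothetical placeholders, not NRFA data; the actual parsing of the WINFAP file structure is handled internally by winfapReader and is not reproduced here.

```r
# Hypothetical annual maxima table (placeholder values, not NRFA data)
amax <- data.frame(
  WaterYear = 2000:2004,
  Flow = c(12.1, 8.4, 15.3, 9.9, 11.0)
)
# Hypothetical list of water years flagged as rejected in the source file
rejected_years <- c(2001, 2003)

# Flag each annual maximum whose water year appears in the rejected list
amax$Rejected <- amax$WaterYear %in% rejected_years
amax

# Keep only the accepted maxima for the analysis
amax_ok <- subset(amax, !Rejected)
amax_ok
```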
Recently the NRFA has developed an API which allows programmatic interaction with its datasets: the information about annual maxima, catchment descriptors and peaks over threshold can also be retrieved using this API. Besides the information on extremes for flood frequency estimation, the NRFA maintains and distributes daily river flow records and several other river flow related variables, such as catchment-averaged rainfall: the rnrfa package allows one to retrieve this information and more with its rnrfa::gdf and rnrfa::get_ts functions (see more on this at the end of the vignette). The winfapReader package focuses only on handling river flow extremes and has two sets of functions:
the read_amax, read_cd3 and read_pot functions read the information from the .AM, .CD3 and .PT files once these have been downloaded into a local folder;
the get_amax, get_cd and get_pot functions get the information from the API: these functions therefore only work when an internet connection is available.
It is difficult to showcase the use of the read_* functions since these rely on the location of the WINFAP files within the user's working environment, so only the get_* functions are showcased below. For the annual maxima and peaks over threshold the two sets of functions give the same output.
library(winfapReader)
### the get_* functions only work when you are connected to the internet
### they also need the httr package to be installed; the examples below
### additionally use curl::has_internet() from the curl package
### if httr is missing, install it with:
### if (!requireNamespace("httr", quietly = TRUE)) install.packages("httr")
get_amax function
The get_amax function allows one to obtain information on annual maxima from the NRFA. The read_amax function produces the same output as the get_amax function but is based on the locally saved files.
if(curl::has_internet()) amaxEx <- get_amax(c(42003,72014))
names(amaxEx); class(amaxEx)
#> [1] "42003" "72014"
#> [1] "list"
# let's look at only one of these
a42003 <- amaxEx[["42003"]]
## what is the output
head(a42003)
#> Station WaterYear Date Flow Stage Rejected
#> 1 42003 1975 1976-09-30 4.16 -9999 TRUE
#> 2 42003 1976 1977-02-10 13.60 -9999 TRUE
#> 3 42003 1977 1977-12-10 14.90 -9999 TRUE
#> 4 42003 1978 1978-12-10 11.60 -9999 TRUE
#> 5 42003 1979 1979-12-27 9.92 -9999 TRUE
#> 6 42003 1980 1980-10-15 11.30 -9999 TRUE
For each station the function outputs a data.frame with the station number, the water year, the date on which the highest flow in the water year was recorded, the river flow value and the river stage value (when available; in the output above unavailable stage values appear as -9999) for all annual maxima recorded at the station. Moreover, it indicates whether the NRFA has deemed the maximum in a given year to be reliable or has rejected it. The function can query the API for more than one station at a time: in that case the output is a named list with each element corresponding to a station id.
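Since the output for each station is a plain data.frame, standard base R tools can be applied to it. For example, the FEH index flood QMED is the median of the annual maxima; a minimal sketch that computes the sample median from the accepted maxima only, assuming a data.frame with the Flow and Rejected columns shown above, is given below. The values and the station id are placeholders, not NRFA data.

```r
# Placeholder data with the same columns as the get_amax() output
a_example <- data.frame(
  Station = 99999,                       # hypothetical station id
  WaterYear = 2000:2006,
  Flow = c(10.2, 14.8, 9.1, 22.5, 13.0, 11.7, 16.4),
  Rejected = c(TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE)
)

# Sample median of the annual maxima that have not been rejected
accepted <- a_example$Flow[!a_example$Rejected]
qmed_sample <- median(accepted)
qmed_sample
#> [1] 14.8
```

With the data downloaded above, the same computation would read median(a42003$Flow[!a42003$Rejected]).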
get_pot function
The get_pot function allows one to obtain information on peaks over threshold from the NRFA. The read_pot function produces the same output as the get_pot function but is based on the locally saved files.
if(curl::has_internet()) potEx <- get_pot(c(42003,72014))
names(potEx); class(potEx)
#> [1] "42003" "72014"
#> [1] "list"
# let's look at only one of these
p42003 <- potEx[["42003"]]
## what is the output
class(p42003); names(p42003)
#> [1] "list"
#> [1] "tablePOT" "WaterYearInfo" "dateRange"
For each station the function outputs a list with three elements:
tablePOT: a data.frame with all the exceedances above the threshold in the NRFA record; for each exceedance it gives the date, water year, peak flow and river stage.
head(p42003$tablePOT)
#> Station Date WaterYear Flow Stage
#> 1 42003 1982-10-14 1982 17.0 1.381
#> 2 42003 1982-10-22 1982 19.4 1.486
#> 3 42003 1982-11-24 1982 15.5 1.314
#> 4 42003 1982-12-09 1982 20.2 1.516
#> 5 42003 1985-01-20 1984 23.6 1.656
#> 6 42003 1986-11-21 1986 14.8 1.283
## notice: several events in water year 1982, no events in 1983
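Counting the events per water year makes the pattern noted in the comment above explicit. The sketch below uses the water years of the six peaks printed above; declaring the full range of water years as factor levels ensures that years without any exceedance are reported with a count of zero instead of being dropped. With the downloaded data the same call would use p42003$tablePOT$WaterYear.

```r
# Water years of the six peaks printed above
wy <- c(1982, 1982, 1982, 1982, 1984, 1986)

# Use every water year in the range as a level so that empty years count as 0
counts <- table(factor(wy, levels = 1982:1986))
counts
# 1982: 4, 1983: 0, 1984: 1, 1985: 0, 1986: 1
```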
WaterYearInfo: a data.frame with information on the percentage of valid record in each water year. The potPercComplete column is derived by calculating the percentage of days which are not included in the POT gaps or the POT rejected headers in the NRFA .PT files. The potThreshold column gives the flow threshold used to extract the peaks for the station: this is constant for each station.
head(p42003$WaterYearInfo)
#> WaterYear potPercComplete potThreshold
#> 1 1982 0 14.497
#> 2 1983 0 14.497
#> 3 1984 0 14.497
#> 4 1985 0 14.497
#> 5 1986 0 14.497
#> 6 1987 0 14.497
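The idea behind potPercComplete can be sketched as follows: for a given water year, count the days that do not fall within any gap or rejected period and express them as a percentage of the days in the water year. The perc_complete() function and the gap dates are hypothetical; winfapReader works directly on the gap information stored in the .PT files and may differ in its details.

```r
# Hypothetical sketch: percentage of days in water year `wy`
# (1 October wy to 30 September wy + 1) not covered by any gap interval
perc_complete <- function(wy, gap_start = character(0), gap_end = character(0)) {
  days <- seq(as.Date(sprintf("%d-10-01", wy)),
              as.Date(sprintf("%d-09-30", wy + 1)), by = "day")
  gap_start <- as.Date(gap_start)
  gap_end <- as.Date(gap_end)
  in_gap <- rep(FALSE, length(days))
  # Mark every day that lies inside at least one [start, end] gap
  for (i in seq_along(gap_start)) {
    in_gap <- in_gap | (days >= gap_start[i] & days <= gap_end[i])
  }
  100 * mean(!in_gap)
}

perc_complete(2001)                               # no gaps: 100
perc_complete(2001, "2001-10-01", "2001-12-12")   # 73 of 365 days missing: 80
perc_complete(2001, "2000-01-01", "2003-01-01")   # whole year missing: 0
```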
dateRange: the range of dates spanned by the POT record. This range might be wider than the range of the dates in the tablePOT table, since it covers the whole period in which the station was operational, including periods in which no threshold exceedances occurred.
(p42003$dateRange)
#> [1] "1982-10-13" "2021-10-22"
The function has an argument getAmax which defaults to FALSE. If getAmax = TRUE, information on the annual maxima is included in the WaterYearInfo table.
p42003withAmax <- get_pot(42003, getAmax = TRUE)
head(p42003withAmax$WaterYearInfo, 10)
#> Station WaterYear amaxDate amaxFlow amaxStage amaxRejected potPercComplete
#> 1 42003 1975 1976-09-30 4.16 -9999.000 TRUE NA
#> 2 42003 1976 1977-02-10 13.60 -9999.000 TRUE NA
#> 3 42003 1977 1977-12-10 14.90 -9999.000 TRUE NA
#> 4 42003 1978 1978-12-10 11.60 -9999.000 TRUE NA
#> 5 42003 1979 1979-12-27 9.92 -9999.000 TRUE NA
#> 6 42003 1980 1980-10-15 11.30 -9999.000 TRUE NA
#> 7 42003 1981 1981-12-14 8.32 -9999.000 TRUE NA
#> 8 42003 1982 1982-12-09 20.20 1.516 TRUE 0
#> 9 42003 1983 1983-12-22 11.70 1.134 TRUE 0
#> 10 42003 1984 1985-01-20 23.60 1.656 TRUE 0
#> potThreshold
#> 1 NA
#> 2 NA
#> 3 NA
#> 4 NA
#> 5 NA
#> 6 NA
#> 7 NA
#> 8 14.497
#> 9 14.497
#> 10 14.497
Notice that in the period when no POT records are available all POT-related information is set to NA. On the other hand, the fact that the annual maximum in water year 1983 (11.70) is below the threshold (14.497) confirms that the absence of POT records for that water year is due to low flows throughout the water year rather than to a mistake in the POT record. Notice also that for several of the first years in the record the annual maxima are rejected and the proportion of valid POT record (as shown by potPercComplete) is zero: the NRFA has deemed the early part of the record for this station to be unreliable, and any analysis of this flow record should probably discard the information until water year 1995.
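The reasoning above can be checked programmatically. The sketch below rebuilds rows 8 to 10 of the table printed above by hand (in practice one would work on p42003withAmax$WaterYearInfo directly), flags the water years whose annual maximum is below the POT threshold, and keeps only the years with an accepted annual maximum and a fully valid POT record. Requiring potPercComplete == 100 is an illustrative choice, not an NRFA rule.

```r
# Rows 8-10 of p42003withAmax$WaterYearInfo as printed above
info <- data.frame(
  WaterYear = c(1982, 1983, 1984),
  amaxFlow = c(20.20, 11.70, 23.60),
  amaxRejected = c(TRUE, TRUE, TRUE),
  potPercComplete = c(0, 0, 0),
  potThreshold = c(14.497, 14.497, 14.497)
)

# Years in which no POT event is expected because the AMAX is below the threshold
info$belowThreshold <- info$amaxFlow < info$potThreshold
info$WaterYear[info$belowThreshold]
#> [1] 1983

# Illustrative selection rule: accepted AMAX and a complete POT record
usable <- subset(info, !amaxRejected & !is.na(potPercComplete) & potPercComplete == 100)
nrow(usable)   # 0: none of these early years passes the check
```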
get_cd function
The get_cd function allows the user to obtain information on the station (for example its location) and on the catchment upstream of the station (for example the catchment area and the mean catchment altitude). More detail on several of the catchment descriptors can be found on the NRFA website and in the FEH. The function gives a slightly different set of information than the read_cd3 function, due to differences in the information made available by the NRFA API.
if(curl::has_internet()) cdEx <- get_cd(c(42003,72014))
names(cdEx); class(cdEx)
#> [1] "42003" "72014"
#> [1] "list"
# let's look at only one of these
c42003 <- cdEx[["42003"]]
## what is the output
class(c42003); names(c42003)
#> [1] "data.frame"
#> [1] "id" "river" "location"
#> [4] "easting" "northing" "latitude"
#> [7] "longitude" "feh-pooling" "feh-qmed"
#> [10] "feh-neither" "benchmark" "propwet"
#> [13] "bfihost" "farl" "dpsbar"
#> [16] "sprhost" "rmed-1d" "rmed-2d"
#> [19] "rmed-1h" "ldp" "dplbar"
#> [22] "altbar" "aspbar" "aspvar"
#> [25] "ihdtm-height" "ihdtm-catchment-area" "hydrometric-area"
#> [28] "qmed"
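Several of the column names contain hyphens (for example feh-pooling or rmed-1d), which are not syntactic R names. They can be accessed with [[ ]] or with backticks, as in the sketch below. The one-row data.frame is a hypothetical placeholder with made-up values, built with check.names = FALSE so that the hyphens are kept (data.frame() would otherwise turn them into dots).

```r
# Placeholder one-row table (made-up values, not real NRFA descriptors)
cd_example <- data.frame(
  id = 99999,
  `feh-pooling` = TRUE,
  `rmed-1d` = 40,
  check.names = FALSE   # keep the hyphens in the column names
)

cd_example[["feh-pooling"]]   # access by name with [[ ]]
cd_example$`rmed-1d`          # or with backticks

# Without check.names = FALSE the hyphen is replaced by a dot
names(data.frame(`rmed-1d` = 40))
#> [1] "rmed.1d"
```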
The function has an argument fields which governs the amount of information obtained from the API. If fields = "feh" (the default), only the basic information used in the FEH methods is output. If fields = "all", a data.frame with 104 columns is output, containing a wide range of information about the station and the catchment, including data availability, land cover and much more.
if(curl::has_internet()) cd42003all <- get_cd(42003, fields = "all")
names(cd42003all)
#> [1] "id" "name"
#> [3] "catchment-area" "river"
#> [5] "location" "station-level"
#> [7] "measuring-authority-id" "measuring-authority-station-id"
#> [9] "hydrometric-area" "opened"
#> [11] "closed" "station-type"
#> [13] "bankfull-flow" "structurefull-flow"
#> [15] "sensitivity" "nrfa-mean-flow"
#> [17] "nrfa-peak-flow" "feh-pooling"
#> [19] "feh-qmed" "feh-neither"
#> [21] "nhmp" "benchmark"
#> [23] "live-data" "factors-affecting-runoff"
#> [25] "gdf-start-date" "gdf-end-date"
#> [27] "gdf-mean-flow" "gdf-min-flow"
#> [29] "gdf-first-date-of-min" "gdf-last-date-of-min"
#> [31] "gdf-max-flow" "gdf-first-date-of-max"
#> [33] "gdf-last-date-of-max" "gdf-q95-flow"
#> [35] "gdf-q70-flow" "gdf-q50-flow"
#> [37] "gdf-q10-flow" "gdf-q05-flow"
#> [39] "gdf-base-flow-index" "gdf-day-count"
#> [41] "gdf-flow-count" "gdf-percent-complete"
#> [43] "peak-flow-start-date" "peak-flow-end-date"
#> [45] "qmed" "minimum-altitude"
#> [47] "10-percentile-altitude" "50-percentile-altitude"
#> [49] "90-percentile-altitude" "maximum-altitude"
#> [51] "saar-1941-1970" "saar-1961-1990"
#> [53] "lcm2000-woodland" "lcm2000-arable-horticultural"
#> [55] "lcm2000-grassland" "lcm2000-mountain-heath-bog"
#> [57] "lcm2000-urban" "lcm2007-woodland"
#> [59] "lcm2007-arable-horticultural" "lcm2007-grassland"
#> [61] "lcm2007-mountain-heath-bog" "lcm2007-urban"
#> [63] "high-perm-bedrock" "moderate-perm-bedrock"
#> [65] "low-perm-bedrock" "mixed-perm-bedrock"
#> [67] "high-perm-superficial" "low-perm-superficial"
#> [69] "mixed-perm-superficial" "propwet"
#> [71] "bfihost" "farl"
#> [73] "dpsbar" "sprhost"
#> [75] "rmed-1d" "rmed-2d"
#> [77] "rmed-1h" "ldp"
#> [79] "dplbar" "altbar"
#> [81] "aspbar" "aspvar"
#> [83] "ihdtm-height" "ihdtm-catchment-area"
#> [85] "mean-flood-plain-depth" "mean-flood-plain-location"
#> [87] "mean-flood-plain-extent" "urbext-1990"
#> [89] "urbconc-1990" "urbloc-1990"
#> [91] "urbext-2000" "urbconc-2000"
#> [93] "urbloc-2000" "easting"
#> [95] "northing" "latitude"
#> [97] "longitude" "grid-reference.ngr"
#> [99] "grid-reference.easting" "grid-reference.northing"
#> [101] "lat-long.string" "lat-long.latitude"
#> [103] "lat-long.longitude" "peak-flow-rejected-amax-years"
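With 104 columns it is often convenient to select groups of related columns by their prefix. A minimal sketch using a few of the column names printed above (typed in by hand here; with the downloaded data one would use names(cd42003all)) is:

```r
# A few of the column names printed above
cols <- c("id", "lcm2000-woodland", "lcm2007-woodland",
          "lcm2007-urban", "urbext-2000", "qmed")

# Columns whose name starts with the lcm2007 prefix
lcm2007_cols <- grep("^lcm2007-", cols, value = TRUE)
lcm2007_cols
#> [1] "lcm2007-woodland" "lcm2007-urban"

# The same vector can then be used to subset the data.frame, e.g.
# cd42003all[, lcm2007_cols]
```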
winfapReader and rnrfa packages
The rnrfa package provides a single entry point to query several types of data from the NRFA. Information about extremes can also be retrieved with rnrfa, although the output differs somewhat when the data of interest are the peaks over threshold records.
The rnrfa::catalogue function allows one to pull the list of stations (and related metadata) falling within a given bounding box. The metadata retrieved by this function are similar to those derived from winfapReader::get_cd. The function can be used to identify the stations in an area for which peak flow information can then be obtained with winfapReader. For example, the code below identifies the stations surrounding the city of Lancaster and then displays their annual maxima, with red lines indicating rejected flow values and blue lines accepted ones. Notice that the rnrfa package needs to be installed for the code below to work.
## Lancaster coordinates: 54.04, -2.8
## let's look around the city
rivLanc <- rnrfa::catalogue(bbox = list(lat_min = 54.04-0.2, lat_max = 54.04+0.2,
lon_min = -2.8-0.2, lon_max = -2.8+0.2))
### let's select stations which have been deemed to be suitable for pooling
### that's the highest quality flag for annual maxima
table(rivLanc[,"feh-pooling"]) ### 5 stations are suitable for pooling
rivLanc <- subset(rivLanc,subset = as.vector(rivLanc[,"feh-pooling",drop=TRUE]))
rivLanc[,1:3]
### notice that rnrfa outputs a tibble rather than a plain data.frame
idLanc <- rivLanc[,"id",drop=TRUE] ## a vector of ids
amaxLanc <- winfapReader::get_amax(idLanc)
names(amaxLanc)
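The bounding box passed to rnrfa::catalogue above is built by hand around the Lancaster coordinates. If several areas need to be explored, a small helper can build the same list. make_bbox() is a hypothetical convenience function; only the list element names (lat_min, lat_max, lon_min, lon_max) are taken from the call above.

```r
# Hypothetical helper: box of +/- `half_width` degrees around a point,
# in the list format used in the rnrfa::catalogue() call above
make_bbox <- function(lat, lon, half_width = 0.2) {
  list(lat_min = lat - half_width, lat_max = lat + half_width,
       lon_min = lon - half_width, lon_max = lon + half_width)
}

bbLanc <- make_bbox(54.04, -2.8)
str(bbLanc)
# rivLanc <- rnrfa::catalogue(bbox = bbLanc)  # same query as above
```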
Now display the annual maxima for all the stations together in a single multi-panel figure.
par(mfrow=c(2,3))
invisible(
sapply(amaxLanc,
function(x) with(x,plot(WaterYear,Flow,
type="h",col=ifelse(Rejected,2,4),
main = unique(Station)))))
The large events which hit the area in 2015 can be seen in the flow series plots.
The rnrfa package also allows one to pull the annual maximum flow recorded at any station. To also obtain information about the water years which the NRFA has deemed to be of poor quality, and has therefore rejected, set the full_info argument to TRUE.
par(mfrow=c(1,1))
### the annual maxima for 72014 from rnrfa
maxflow72014 <- rnrfa::get_ts(72014, type = "amax-flow", full_info = TRUE)
### the annual maxima for 72014 from winfapReader
xx <- amaxLanc[["72014"]][,c("Date","Flow","Rejected")]
plot(xx[,"Flow"], maxflow72014[,"amax-flow"],
xlab = "data from winfapReader", ylab = "data from rnrfa"); abline(0,1)
### same information
which(xx$Rejected) ## two years are flagged as rejected
which(maxflow72014$rejected == 1) ## same two years
To obtain the POT records with rnrfa use type = "pot-flow": the full_info = TRUE option ensures that a rejected flag is given for the periods in which the POT records have been found to be unreliable or missing (see the NRFA website for more details on this). The rejected flag is built using the same information used to build the WaterYearInfo table in the winfapReader::get_pot function. The additional information provided in WaterYearInfo is useful to identify the years in which no POT record is found because the records are missing or unreliable, rather than because the threshold was never exceeded.
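The distinction can be sketched by combining the number of peaks in each water year with the completeness information. In the hypothetical example below (placeholder values, not NRFA data), a water year without peaks is labelled as having no exceedance only when its POT record is complete; otherwise the absence of peaks may simply reflect missing or unreliable data. The 100% cut-off is an illustrative assumption.

```r
# Placeholder WaterYearInfo-like table and water years of the recorded peaks
year_info <- data.frame(
  WaterYear = 2000:2003,
  potPercComplete = c(100, 100, 40, 100)
)
peak_years <- c(2000, 2000, 2003)

# Number of peaks per water year, keeping years with no peaks as 0
year_info$nPeaks <- as.vector(table(factor(peak_years, levels = year_info$WaterYear)))

# Classify each water year
year_info$status <- ifelse(year_info$nPeaks > 0, "peaks recorded",
                    ifelse(year_info$potPercComplete == 100,
                           "no exceedance", "incomplete record"))
year_info
```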
par(mfrow=c(1,1))
# the pot records for 75001 from rnrfa
pot75001 <- rnrfa::get_ts(75001, type = "pot-flow", full_info = TRUE)
pot75001[9:12,]
# using winfapReader
p75001 <- get_pot(75001)
p75001$tablePOT[9:12,]
# the same peaks are identified
p75001$WaterYearInfo[1:5,] ### but notice that 1975 had only a small proportion of missing records
# so the lack of POT events in 1975 is due to flows staying low throughout the year
The two packages can be used together to retrieve different types of information about river flow: in the example below the daily gauged flow for the Conder at Galgate (station 72014) is displayed together with the annual maxima (which are extracted from the instantaneous river flow). The annual maxima are typically larger than the daily flows and, for this station, can be seen to start further in the past than the daily flow data.
### get daily data from NRFA
daily72014 <- rnrfa::get_ts(72014, type = "gdf")
## make daily data into data.frame
daily72014 <- data.frame(Day = zoo::index(daily72014),
DFlow = as.vector(daily72014))
plot(xx[,c("Date","Flow")], col = ifelse(xx$Rejected, 2, 4),
pch = 4, ylim =c(0,1.05*max(xx$Flow)))
title(main = "The Conder at Galgate")
points(daily72014, type="l")
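To compare the two series year by year, the daily flows can be reduced to the largest daily mean flow in each water year. A minimal base R sketch, assuming a data.frame with the Day and DFlow columns built above, is given below; the short series is a placeholder (not NRFA data) so that the example runs offline, and with the real data one would use daily72014 instead.

```r
# Placeholder daily series with the same columns as daily72014 above
daily_example <- data.frame(
  Day = as.Date(c("2000-09-29", "2000-09-30", "2000-10-01",
                  "2001-03-15", "2001-10-02")),
  DFlow = c(3.1, 5.2, 4.0, 7.5, 2.2)
)

# Water year (1 October to 30 September, labelled by its starting year)
daily_example$WaterYear <- with(daily_example,
  ifelse(as.integer(format(Day, "%m")) >= 10L,
         as.integer(format(Day, "%Y")),
         as.integer(format(Day, "%Y")) - 1L))

# Largest daily mean flow in each water year
annual_daily_max <- aggregate(DFlow ~ WaterYear, data = daily_example, FUN = max)
annual_daily_max
```

The resulting table can then be set against the instantaneous annual maxima in xx, matched by water year.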