25 Tracking data: Convert files into format of Seabird Tracking Database
Analyses outlined in this chapter were performed in R version 4.3.2 (2023-10-31 ucrt)
This chapter was last updated on 2024-02-23
25.1 What this chapter covers:
- Convert merged data into the INPUT and OUTPUT format of the Seabird Tracking Database: https://www.seabirdtracking.org/.
NOTE: if your dataset is already hosted on the Seabird Tracking Database, you can download it from there directly and skip the steps below.
25.2 Where you can get example data for the chapter:
This tutorial uses example data from a project led by the BirdLife International partner in Croatia: BIOM
The citation for this data is: Zec et al. 2023
Example data is available upon request
A description of the example data is given in a separate chapter
25.3 Load packages
Load the R packages required for the code in this chapter.
If a package fails to load, you will need to install it first.
## ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
## Load libraries --------------------------------------------------------------
## ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
## sf package for spatial data analyses (i.e. vector files such as points, lines, polygons)
library(sf)
## Tidyverse for data manipulation
library(tidyverse)
## ggplot2 for plotting options
library(ggplot2)
## rnaturalearth package for geographic basemaps in R
library(rnaturalearth)
## leaflet package for interactive maps in R
library(leaflet)
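## lubridate for handling dates and times
library(lubridate)
## track2KBA for identifying important sites from tracking data
library(track2KBA)
## trip package, used for speed filtering
library(trip)
## adehabitatLT package, used for linear interpolation
library(adehabitatLT)
## raster package for working with gridded (raster) spatial data
library(raster)
## viridis package for colour palettes
library(viridis)
## readxl and xlsx packages for reading Excel files
library(readxl)
library(xlsx)
## parsedate package for standardising timestamps
library(parsedate)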
25.4 Define object names for chapter
Typically, if your data follow the same format as the examples in the chapter, the object names below should be the only things you need to change.
25.5 Load file created from previous chapter
## Read the csv file of all the merged data and name the object the same as in the relevant chapter
track.df <- read.csv("./data-testing/tracking-data/Puffinus-yelkouan-Z-tracking-raw-merged.csv")
25.6 Seabird Tracking Database format
Standardising data into a common format greatly supports reproducible research and makes the data easier to reuse in other studies.
The primary format we recommend is that of BirdLife International’s Seabird Tracking Database: https://www.seabirdtracking.org/
When you visit the Seabird Tracking Database website, you will find the Instructions page, which:
provides information about the standardised format of data used in this global database, and
provides a dataset template that you can use to support formatting your data into the format used within the Seabird Tracking Database.
For further details:
See the data submission instructions.
Download the dataset template from the Instructions page
We recognise, however, that the format of the Seabird Tracking Database may not be appropriate for all analyses. Nevertheless, we encourage users to standardise their data into a common format. This makes it easier to reformat the data when other analyses require it.
25.7 Loading the dataset template from Seabird Tracking Database
Loading the dataset template from the Seabird Tracking Database shows the format your data should adhere to in order to support the analyses outlined in the Marine Toolkit.
stdb.df.template <- read_xlsx("./data-testing/tracking-data-stdb/Template_Datapoints.xlsx",
sheet = "Template")
head(data.frame(stdb.df.template))
## [1] BirdId Sex Age Breed.Stage TrackId
## [6] DateGMT TimeGMT Latitude Longitude Equinox
## [11] ArgosQuality
## <0 rows> (or 0-length row.names)
## [1] 0 11
You can see that the dataset template from the Seabird Tracking Database contains 11 columns. How you format your data to match the Seabird Tracking Database depends on the type of animal tracking device used (i.e. GPS, PTT, GLS).
SEE: within the Seabird Tracking Database template, the different example datasets for GPS, PTT, and GLS data.
We will update the appendix to include examples of preparing data from different device types for the format of the Seabird Tracking Database.
25.8 INPUT format: Seabird Tracking Database
25.8.1 Formatting own data to align with Seabird Tracking Database INPUT format
Typically, you will need three things to get data ready for the STDB:
“Dataset” level information
Metadata information
“Data point” level information
NOTE: Information stored in the STDB is organized into two levels: the “Dataset” level and the “Data points” level. The “dataset level” information is provided by filling in an online form, and the “data points” level information is submitted by uploading a csv file.
- Review the data submission instructions indicated above for further guidance.
25.8.1.1 “Dataset” level information required for STDB
The “dataset level” information provides the broad background information about your dataset required for uploading the dataset to the STDB. It is the key information about any set of data that were collected for a given species, in a given colony, with a given type of device, and where data are owned by the same group of contributors.
25.8.1.2 Metadata information
A metadata file is a set of data that describes and gives information about other data.
In this tutorial we provide an example metadata file which helps describe the data associated with the tracking information for each individual bird.
Users can adapt this example template as required.
NOTE: To assist upload of your data to the Seabird Tracking Database, please ensure that entries in your relevant fields match the format used in the STDB.
- Review the data submission instructions indicated above for further guidance.
## Load the relevant metadata file - reading xlsx files can be fiddly - may require different packages
#df_meta <- read_xlsx("./data-testing/tracking-data/Puffinus-yelkouan-metadata/PUFYEL-Z-Metadata.xlsx",
# sheet = "Sheet1")
df_meta <- read.xlsx(file="./data-testing/tracking-data/Puffinus-yelkouan-metadata/PUFYEL-Z-Metadata.xlsx",
sheetName="Sheet1")
## View the contents of the metadata file
head(data.frame(df_meta),2)
## data_co.owners dataset_name
## 1 BIOM-Croatia, name surname PUFYEL-Z-GPS-Chick-rearing-2019-2020
## 2 BIOM-Croatia, name surname PUFYEL-Z-GPS-Chick-rearing-2019-2020
## species common_name bird_id site_location
## 1 Puffinus yelkouan Yelkouan Shearwater 19_Tag17600_Z-9 Lastovo SPA
## 2 Puffinus yelkouan Yelkouan Shearwater 19_Tag17604_Z-7 Lastovo SPA
## colony_code colony_latitude colony_longitude sex age_deployment_start
## 1 Z 42.774893 16.875649 unknown adult
## 2 Z 42.774893 16.875649 unknown adult
## breed_stage_deployment_start tracking_device_type
## 1 chick-rearing GPS
## 2 chick-rearing GPS
## tracking_device_model tracking_data_interpolation
## 1 PathTrack nanoFix GPS/UHF transmitters no
## 2 PathTrack nanoFix GPS/UHF transmitters no
## device_deployment_date_UTC device_deployment_time_UTC
## 1 dd/mm/yyyy hh:mm:ss AM/PM
## 2 dd/mm/yyyy hh:mm:ss AM/PM
## device_retrieval_rate_UTC device_retrieval_time_UTC deployment_mass_kg
## 1 dd/mm/yyyy hh:mm:ss AM/PM 0.41
## 2 dd/mm/yyyy hh:mm:ss AM/PM 0.41
## retrieval_mass_kg other_1 other_2 aux_file_1 aux_file_2 notes
## 1 0.42 bloods isotopes images tdr any relevant details
## 2 0.42 bloods isotopes images tdr any relevant details
Check how many distinct entries you have. This number should match the number of unique birds tagged in this dataset.
## [1] 34
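The chapter does not show the code behind this count, so the following is only a minimal sketch of one way to obtain it. It uses a small made-up data frame, `meta_example`, as a stand-in for `df_meta`, and assumes the bird identifier column is called `bird_id` as in the example metadata.

```r
## Hypothetical stand-in for df_meta, with one deliberately duplicated bird_id
meta_example <- data.frame(
  bird_id = c("19_Tag17600_Z-9", "19_Tag17604_Z-7", "19_Tag17604_Z-7"),
  sex     = c("unknown", "unknown", "unknown")
)

## Number of distinct bird IDs in the metadata
n_birds <- length(unique(meta_example$bird_id))
n_birds

## Bird IDs that appear on more than one row; these would need resolving
## before the metadata is joined onto the tracking data
duplicated_ids <- unique(meta_example$bird_id[duplicated(meta_example$bird_id)])
duplicated_ids
```

If `duplicated_ids` is not empty, the number of metadata rows will be larger than the number of tagged birds.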
25.8.1.3 Obtain point data file for “Data points” level information required for STDB
First remind yourself of what the template STDB format looks like:
## [1] BirdId Sex Age Breed.Stage TrackId
## [6] DateGMT TimeGMT Latitude Longitude Equinox
## [11] ArgosQuality
## <0 rows> (or 0-length row.names)
And also consider what your data looks like:
## day month year hour minute second satellites latitude longitude altitude
## 1 24 5 19 0 49 9 5 42.811528 16.885531 -1.50
## 2 24 5 19 1 9 3 5 42.812029 16.886907 5.25
## time_offset accuracy voltage colony_code bird_id
## 1 2.910 4.70376e-07 4.12 Z 19_Tag17600_Z-9
## 2 2.795 9.25368e-07 4.08 Z 19_Tag17600_Z-9
## dttm deploy_year
## 1 2019-05-24 00:49:09 2019
## 2 2019-05-24 01:09:03 2019
Then align your own data with the input format required by the STDB, using your supporting metadata.
REMINDER: Within the Seabird Tracking Database template, some fields only accept a fixed set of values. For example, Age can only be specified as: adult, immature, juvenile fledgling, or unknown. Inputs for fields are case sensitive.
The code below is based primarily on three dplyr functions:
select
mutate
relocate
Review how these work if needed.
NOTE: You will need to change the names of your columns if they differ from the example data below.
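As a quick refresher, the sketch below applies the three functions to a made-up two-row data frame. The object names `toy` and `toy_out` and the values are hypothetical and are not part of the example dataset.

```r
library(dplyr)

## Hypothetical two-row example with lower-case column names
toy <- data.frame(
  bird_id   = c("A1", "A1"),
  latitude  = c(42.81, 42.82),
  longitude = c(16.88, 16.89),
  voltage   = c(4.12, 4.08)
)

toy_out <- toy %>%
  ## select(): keep only the columns you need
  dplyr::select(bird_id, latitude, longitude) %>%
  ## mutate(): add a new column (here a copy of bird_id)
  dplyr::mutate(TrackId = bird_id) %>%
  ## relocate(): move a column; by default it goes to the front
  dplyr::relocate(TrackId)

colnames(toy_out)
```

The `voltage` column is dropped by `select`, and `TrackId` ends up as the first column.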
25.8.1.3.1 Timestamp column
A common problem many people encounter when learning to analyse animal tracking data is dealing with the column that relates to the timestamp of the tracking device.
Typically, this information will be stored as a date and time - in a single column - from tracking devices.
Timestamps typically need to be in the format of a POSIXct object.
The parse_date function from the parsedate package attempts to provide a simple option for standardising timestamp data, which can come in multiple different formats.
Review the requirements of timestamp data for processing animal tracking data if required.
## Use the parse_date function to try and standardise a timestamp column
#str(track.df$dttm)
track.df$dttm <- parse_date(track.df$dttm)
#str(track.df$dttm)
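To see what parse_date returns, it can help to try it on a couple of values first. The sketch below uses two made-up timestamps written in different styles; `raw_times` and `parsed` are hypothetical object names.

```r
library(parsedate)

## Hypothetical timestamps: two fixes written in two different styles
raw_times <- c("2019-05-24 00:49:09", "2019-05-24T01:09:03Z")

parsed <- parse_date(raw_times)

## The result should be a POSIXct vector
class(parsed)

## Any value that could not be parsed comes back as NA, so count them
sum(is.na(parsed))

## Time of day in UTC, as used for the TimeGMT column
format(parsed, format = "%H:%M:%S", tz = "UTC")
```

Counting the NA values after parsing is a cheap way to spot rows whose timestamp format was not recognised.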
Continue with preparing data
## First, select relevant columns of information from your existing datapoint information needed to match the STDB format.
df_stdb <- track.df %>% dplyr::select(bird_id,
dttm,
latitude,
longitude)
## Then, modify and create relevant columns of information - where you have these in your data - to align with STDB format.
## the mutate function allows you to add a new column of information.
## add the new columns and rename the object to a more standardised name.
df_stdb <- df_stdb %>% dplyr::mutate(BirdId = bird_id,
TrackId = bird_id,
DateGMT = date(dttm),
TimeGMT = format(dttm, format = "%H:%M:%S"),
Latitude = latitude,
Longitude = longitude,
Equinox = NA,
ArgosQuality = NA) %>%
## remove the original columns (note the minus sign "-" in front of each column name you are removing)
dplyr::select(-bird_id,
-dttm,
-latitude,
-longitude)
## Now get the relevant metadata information for your tracking data "datapoints" information
## Ensure that your link column has the same name in both objects (in this case, it is the BirdId column above)
df_meta_points <- df_meta %>%
## select the relevant columns
dplyr::select(bird_id, sex, age_deployment_start, breed_stage_deployment_start) %>%
## rename the columns if need be to match the format of the STDB
dplyr::rename(BirdId = bird_id,
Sex = sex,
Age = age_deployment_start,
Breed.Stage = breed_stage_deployment_start)
## Now join the relevant metadata onto your datapoints data
#head(df_stdb,2)
df_stdb <- left_join(df_stdb, df_meta_points, by = "BirdId")
## review that the join worked
#head(df_stdb,2)
## Reorder the column names to match the format of the STDB
df_stdb <- df_stdb %>%
relocate(BirdId,
Sex,
Age,
Breed.Stage,
TrackId,
DateGMT,
TimeGMT,
Latitude,
Longitude,
Equinox,
ArgosQuality)
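The join in the code above assumes one metadata row per bird. The sketch below shows two checks that may be worth running after such a join. It uses made-up data; `points_example`, `meta_example`, `joined` and `birds_missing_meta` are hypothetical names, and the values are for illustration only.

```r
library(dplyr)

## Hypothetical data points and metadata
points_example <- data.frame(BirdId = c("A1", "A1", "B2"),
                             Latitude = c(42.81, 42.82, 42.79))
meta_example <- data.frame(BirdId = c("A1", "B2"),
                           Sex = c("unknown", "female"))

joined <- left_join(points_example, meta_example, by = "BirdId")

## A left join should not change the number of data points.
## More rows than before means duplicated BirdId values in the metadata.
stopifnot(nrow(joined) == nrow(points_example))

## Birds with tracking data but no metadata row show up as NA
birds_missing_meta <- unique(joined$BirdId[is.na(joined$Sex)])
birds_missing_meta
```

An empty `birds_missing_meta` suggests every tracked bird was matched to a metadata entry.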
25.8.1.4 Review of the INPUT format for the Seabird Tracking Database
## BirdId Sex Age Breed.Stage TrackId DateGMT
## 1 19_Tag17600_Z-9 unknown adult chick-rearing 19_Tag17600_Z-9 2019-05-24
## 2 19_Tag17600_Z-9 unknown adult chick-rearing 19_Tag17600_Z-9 2019-05-24
## TimeGMT Latitude Longitude Equinox ArgosQuality
## 1 00:49:09 42.811528 16.885531 NA NA
## 2 01:09:03 42.812029 16.886907 NA NA
## [1] BirdId Sex Age Breed.Stage TrackId
## [6] DateGMT TimeGMT Latitude Longitude Equinox
## [11] ArgosQuality
## <0 rows> (or 0-length row.names)
For the columns highlighted above, you may notice a few things:
BirdId and TrackId are specified with the same code. This is because, when data is formatted to align with the format of the STDB:
- we have a code that relates to the bird that was tracked (BirdId)
- we have a unique code that relates to each trip undertaken by the bird, when multiple trips are recorded (TrackId). However, users often do not provide data that has been pre-split into unique trips. In that case, all entries for TrackId match those of BirdId.
Equinox and ArgosQuality are both specified as NA. This is because our data relates to GPS data which does not have an ArgosQuality estimate (typical of PTT devices) or a measure relating to the Equinox (typical of GLS devices).
- see the Seabird Tracking Database data template for examples of how to specify Equinox and ArgosQuality when necessary.
25.8.2 INPUT STDB format: saving
You should now have a key file:
A single file (a data frame called df_stdb) with all your data standardised into a common format.
The common format of your data should reflect that of the INPUT files associated with uploading data to the Seabird Tracking Database.
PLEASE NOTE: While it is not mandatory to upload your data to the Seabird Tracking Database to perform analyses outlined in this toolkit, we greatly encourage users to do so given the many benefits of curating data in centralised repositories.
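Because the “data points” level information is submitted as a csv file, one option is to save the data frame with write.csv. The sketch below is only an illustration: `stdb_example` is a one-row stand-in for df_stdb, and the file name and location are hypothetical and should be replaced with a path in your own project.

```r
## Hypothetical one-row stand-in for df_stdb
stdb_example <- data.frame(
  BirdId = "19_Tag17600_Z-9", Sex = "unknown", Age = "adult",
  Breed.Stage = "chick-rearing", TrackId = "19_Tag17600_Z-9",
  DateGMT = "2019-05-24", TimeGMT = "00:49:09",
  Latitude = 42.811528, Longitude = 16.885531,
  Equinox = NA, ArgosQuality = NA
)

## Hypothetical output location; replace with a path in your own project
out_file <- file.path(tempdir(), "example-stdb-input-format.csv")

## row.names = FALSE stops R adding an extra column of row numbers
write.csv(stdb_example, file = out_file, row.names = FALSE)

## Read the file back to confirm the column names survived
colnames(read.csv(out_file))
```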
25.9 OUTPUT format: Seabird Tracking Database
Formatting data to align with the input format of the Seabird Tracking Database supports your ability to curate your data in a secure online repository.
Typically, though, the data file used for analysis will reflect the output format of data from the Seabird Tracking Database.
Here, instead of requiring users to upload data and then download it again, we provide code to convert data from the input format of the Seabird Tracking Database to the output format.
25.9.1 Load STDB output template
Load and view the structure of the data according to the output format of the STDB.
## Load the template
stdb.df.template.output <- read.csv("./data-testing/tracking-data-stdb/Template_Datapoints_Output_Format.csv")
## View the column names of the template
head(stdb.df.template.output)
## [1] dataset_id scientific_name common_name site_name
## [5] colony_name lat_colony lon_colony device
## [9] bird_id track_id original_track_id age
## [13] sex breed_stage breed_status date_gmt
## [17] time_gmt latitude longitude argos_quality
## [21] equinox
## <0 rows> (or 0-length row.names)
25.9.2 Convert data to STDB output template
Converting data to the output format of the STDB in this tutorial requires two things:
A metadata file aligned to the format provided in the example earlier
A single data frame matching the input format for the STDB (as created above)
## First, convert the input format to the output format.
## Essentially, you are just changing column names here to match the output format
df_stdb_output <- df_stdb %>%
dplyr::select(bird_id = BirdId,
sex = Sex,
age = Age,
breed_stage = Breed.Stage,
track_id = TrackId,
date_gmt = DateGMT,
time_gmt = TimeGMT,
latitude = Latitude,
longitude = Longitude,
equinox = Equinox,
argos_quality = ArgosQuality)
## Second, get the relevant metadata from your metadata file
## Here you are selecting the key metadata, and renaming columns accordingly
## If your column names differ, you will need to change the relevant inputs here.
df_meta_output <- df_meta %>%
dplyr::select(bird_id = bird_id,
scientific_name = species,
common_name = common_name,
site_name = site_location,
colony_name = colony_code,
lat_colony = colony_latitude,
lon_colony = colony_longitude,
device = tracking_device_type)
## Third, some columns for the STDB output are populated automatically when uploading data
## Here we create the necessary columns of STDB metadata for the purpose of the tutorial, but we populate
## the columns with dummy data only.
df_meta_output <- df_meta_output %>%
mutate(dataset_id = "populated-upon-upload-STDB",
original_track_id = "populated-upon-upload-STDB",
breed_status = "populated-upon-upload-STDB")
## Next, we join the relevant metadata onto the overall tracking data dataframe
df_stdb_output <- left_join(df_stdb_output, df_meta_output, by = "bird_id")
## Finally, we reorder the columns to match the output format of the STDB
df_stdb_output <- df_stdb_output %>% relocate(all_of(colnames(stdb.df.template.output)))
## review and compare the column names and order between your data and STDB output example
data.frame(stdb.output.example = colnames(stdb.df.template.output),
data.example = colnames(df_stdb_output))
## stdb.output.example data.example
## 1 dataset_id dataset_id
## 2 scientific_name scientific_name
## 3 common_name common_name
## 4 site_name site_name
## 5 colony_name colony_name
## 6 lat_colony lat_colony
## 7 lon_colony lon_colony
## 8 device device
## 9 bird_id bird_id
## 10 track_id track_id
## 11 original_track_id original_track_id
## 12 age age
## 13 sex sex
## 14 breed_stage breed_stage
## 15 breed_status breed_status
## 16 date_gmt date_gmt
## 17 time_gmt time_gmt
## 18 latitude latitude
## 19 longitude longitude
## 20 argos_quality argos_quality
## 21 equinox equinox
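With 21 columns, comparing the two lists by eye can be error-prone. The sketch below shows a programmatic comparison on made-up column names; `template_cols` and `data_cols` are hypothetical stand-ins for the column names of the template and of your own data.

```r
## Hypothetical column names standing in for the template and your data
template_cols <- c("dataset_id", "bird_id", "date_gmt")
data_cols     <- c("dataset_id", "date_gmt", "bird_id")

## TRUE only when the names AND their order match exactly
same_names_and_order <- identical(template_cols, data_cols)
same_names_and_order

## Columns present in one set but not the other
setdiff(template_cols, data_cols)
setdiff(data_cols, template_cols)

## Positions where the order differs
order_mismatch <- which(template_cols != data_cols)
order_mismatch
```

In this made-up case the names are all present but the second and third columns are swapped, so the exact comparison returns FALSE.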
25.9.3 OUTPUT STDB format: saving
You should now have another key file:
df_stdb_output
The format of this file matches the output format of the STDB, i.e. the format of the data when you download it from the STDB.
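The output format stores date and time in separate columns, whereas, as noted in the timestamp section of this chapter, timestamps typically need to be a single POSIXct object. The sketch below shows one way to rebuild such a column with base R. It uses made-up rows; `out_example` is a hypothetical stand-in for df_stdb_output, and the new column name `dttm` is an assumption.

```r
## Hypothetical rows in the STDB output format (only the relevant columns)
out_example <- data.frame(
  bird_id  = c("19_Tag17600_Z-9", "19_Tag17600_Z-9"),
  date_gmt = c("2019-05-24", "2019-05-24"),
  time_gmt = c("00:49:09", "01:09:03")
)

## Paste date and time together, then parse as UTC (GMT)
out_example$dttm <- as.POSIXct(
  paste(out_example$date_gmt, out_example$time_gmt),
  format = "%Y-%m-%d %H:%M:%S",
  tz = "UTC"
)

## Time elapsed between the two fixes, in minutes
gap_mins <- difftime(out_example$dttm[2], out_example$dttm[1], units = "mins")
gap_mins
```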
NOTE: if your dataset is already hosted on the Seabird Tracking Database, you can download it from there directly and skip the steps above.