25 Tracking data: Convert files into format of Seabird Tracking Database

Analyses outlined in this chapter were performed in R version 4.3.2 (2023-10-31 ucrt)

This chapter was last updated on 2024-02-23

25.1 What this chapter covers:

Convert merged data into the INPUT and OUTPUT format of the Seabird Tracking Database: https://www.seabirdtracking.org/.

NOTE: if your dataset is already hosted on the Seabird Tracking Database, you can download it from there directly and skip the steps below.

25.2 Where you can get example data for the chapter:

This tutorial uses example data from a project led by the BirdLife International partner in Croatia: BIOM

The citation for this data is: Zec et al. 2023
Example data is available upon request
A description of the example data is given in a separate chapter

25.3 Load packages

Load the R packages required by the code in this chapter:

If a package fails to load, you will need to install it first and then load it again.

## ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
## Load libraries --------------------------------------------------------------
## ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

## sf package for spatial data analyses (i.e. vector files such as points, lines, polygons)
library(sf)
## Tidyverse for data manipulation
library(tidyverse)
## ggplot2 for plotting options
library(ggplot2)
## rnaturalearth package for geographic basemaps in R
library(rnaturalearth)
## leaflet package for interactive maps in R
library(leaflet)
## lubridate for date time
library(lubridate)
## track2KBA for the analysis of important site identification
library(track2KBA)
## speed filter
library(trip)
## linear interpolation
library(adehabitatLT)
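## raster for gridded spatial data
library(raster)
## viridis for colour palettes
library(viridis)
## readxl and xlsx for reading Excel files
library(readxl)
library(xlsx)
## parsedate for standardising timestamps
library(parsedate)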
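
As a minimal sketch (not part of the original workflow), the check below lists which of the packages used in this chapter are not yet installed. The object names required.packages, is.installed and missing.packages are illustrative only, and nothing is installed automatically.

```r
## Packages loaded in this chapter
required.packages <- c("sf", "tidyverse", "ggplot2", "rnaturalearth", "leaflet",
                       "lubridate", "track2KBA", "trip", "adehabitatLT",
                       "raster", "viridis", "readxl", "xlsx", "parsedate")

## requireNamespace() returns FALSE (rather than stopping with an error)
## when a package cannot be loaded
is.installed <- vapply(required.packages, requireNamespace,
                       logical(1), quietly = TRUE)
missing.packages <- required.packages[!is.installed]

## Report what still needs installing
if (length(missing.packages) > 0) {
  message("Packages to install: ", paste(missing.packages, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```

Any package reported here needs to be installed before the library() calls above will succeed.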

25.4 Define object names for chapter

Typically, if your data follows the same format as the examples in the chapter, the object names below should be the only things you need to change.

## Define your species name (avoid spaces by using hyphens instead. This can help with later coding steps)
species.name <- "Puffinus-yelkouan"

## Define your colony name
colony.name <- "Z"

25.5 Load file created from previous chapter

## Read the csv file of all the merged data and name the object the same as in the relevant chapter
track.df <- read.csv("./data-testing/tracking-data/Puffinus-yelkouan-Z-tracking-raw-merged.csv")
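
The object names defined above can also be used to build the file path, so that only species.name and colony.name need changing for a new dataset. The sketch below assumes the file naming pattern used in this chapter; merged.file is an illustrative object name.

```r
## Object names as defined earlier in the chapter
species.name <- "Puffinus-yelkouan"
colony.name <- "Z"

## Build the path to the merged tracking file from the object names
merged.file <- paste0("./data-testing/tracking-data/",
                      species.name, "-", colony.name,
                      "-tracking-raw-merged.csv")
merged.file

## Only attempt to read the file if it exists at that location
if (file.exists(merged.file)) {
  track.df <- read.csv(merged.file)
} else {
  message("File not found: ", merged.file)
}
```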

25.6 Seabird Tracking Database format

Having data standardised into the same format greatly improves reproducible research, and also the ability for data to be used in other studies.

The primary format we recommend is that of BirdLife International’s Seabird Tracking Database: https://www.seabirdtracking.org/

When you visit the Seabird Tracking Database website, you will find the Instructions page, which:

provides information about the standardised format of data used in this global database, and
provides a dataset template that you can use to support formatting your data into the format used within the Seabird Tracking Database.

For further details:

See the data submission instructions.
Download the dataset template from the Instructions page

We recognise, however, that the format of the Seabird Tracking Database may not be appropriate for all analyses. Nevertheless, we encourage users to standardise their data into a common format. This makes it easier to reformat data when necessary for other analyses.

25.7 Loading the dataset template from Seabird Tracking Database

Loading the dataset template from the Seabird Tracking Database shows the format your data should follow to support the analyses outlined in the Marine Toolkit.

stdb.df.template <- read_xlsx("./data-testing/tracking-data-stdb/Template_Datapoints.xlsx",
                     sheet = "Template")

head(data.frame(stdb.df.template))

##  [1] BirdId       Sex          Age          Breed.Stage  TrackId     
##  [6] DateGMT      TimeGMT      Latitude     Longitude    Equinox     
## [11] ArgosQuality
## <0 rows> (or 0-length row.names)

## dimensions of the seabird tracking database data template
dim(stdb.df.template)

## [1]  0 11

You can see that the dataset template from the Seabird Tracking Database contains 11 columns. How you format your data to match the template will depend on the type of animal tracking device used (i.e. GPS, PTT, or GLS).

SEE: the example datasets for GPS, PTT, and GLS data within the Seabird Tracking Database template.

We will update the appendix to include examples of preparing data from different device types for the format of the Seabird Tracking Database.

25.8 INPUT format: Seabird Tracking Database

25.8.1 Formatting own data to align with Seabird Tracking Database INPUT format

Typically, you will need three things to get data ready for the STDB:

“Dataset” level information
Metadata information
“Data point” level information

NOTE: Information stored in the STDB is organised into two levels: the “Dataset” level and the “Data points” level. The “Dataset” level information is provided by filling in an online form, and the “Data points” level information is submitted by uploading a csv file.

Review the data submission instructions indicated above for further guidance.

25.8.1.1 “Dataset” level information required for STDB

The “dataset level” information provides the broad background information about your dataset required for uploading the dataset to the STDB. It is the key information about any set of data that were collected for a given species, in a given colony, with a given type of device, and where data are owned by the same group of contributors.

25.8.1.2 Metadata information

A metadata file is a set of data that describes and gives information about other data.

In this tutorial we provide an example metadata file which helps describe the data associated with the tracking information for each individual bird.

Users can adapt this example template as required.

NOTE: To assist upload of your data to the Seabird Tracking Database, please ensure that entries in your relevant fields match the format used in the STDB.

Review the data submission instructions indicated above for further guidance.

## Load the relevant metadata file - reading xlsx files can be fiddly - may require different packages
#df_meta <- read_xlsx("./data-testing/tracking-data/Puffinus-yelkouan-metadata/PUFYEL-Z-Metadata.xlsx",
#                     sheet = "Sheet1")

df_meta <- read.xlsx(file="./data-testing/tracking-data/Puffinus-yelkouan-metadata/PUFYEL-Z-Metadata.xlsx", 
                     sheetName="Sheet1") 

## View the contents of the metadata file
head(data.frame(df_meta),2)

##               data_co.owners                         dataset_name
## 1 BIOM-Croatia, name surname PUFYEL-Z-GPS-Chick-rearing-2019-2020
## 2 BIOM-Croatia, name surname PUFYEL-Z-GPS-Chick-rearing-2019-2020
##             species         common_name         bird_id site_location
## 1 Puffinus yelkouan Yelkouan Shearwater 19_Tag17600_Z-9   Lastovo SPA
## 2 Puffinus yelkouan Yelkouan Shearwater 19_Tag17604_Z-7   Lastovo SPA
##   colony_code colony_latitude colony_longitude     sex age_deployment_start
## 1           Z       42.774893        16.875649 unknown                adult
## 2           Z       42.774893        16.875649 unknown                adult
##   breed_stage_deployment_start tracking_device_type
## 1                chick-rearing                  GPS
## 2                chick-rearing                  GPS
##                    tracking_device_model tracking_data_interpolation
## 1 PathTrack nanoFix GPS/UHF transmitters                          no
## 2 PathTrack nanoFix GPS/UHF transmitters                          no
##   device_deployment_date_UTC device_deployment_time_UTC
## 1                 dd/mm/yyyy             hh:mm:ss AM/PM
## 2                 dd/mm/yyyy             hh:mm:ss AM/PM
##   device_retrieval_rate_UTC device_retrieval_time_UTC deployment_mass_kg
## 1                dd/mm/yyyy            hh:mm:ss AM/PM               0.41
## 2                dd/mm/yyyy            hh:mm:ss AM/PM               0.41
##   retrieval_mass_kg other_1  other_2 aux_file_1 aux_file_2                notes
## 1              0.42  bloods isotopes     images        tdr any relevant details
## 2              0.42  bloods isotopes     images        tdr any relevant details

Check how many entries you have. Each row of the metadata describes one bird, so this number should match the number of unique birds tagged as part of this dataset.

## Count the rows in the metadata file (one row per tagged bird)
nrow(df_meta)

## [1] 34
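
Counting rows only confirms the total. As a hedged sketch, the checks below also look for duplicated bird IDs and for birds that appear in one file but not the other. The toy data frames df_meta_toy and track_toy and their values are hypothetical; with your own data you would use df_meta and track.df instead.

```r
## Toy metadata and tracking data (hypothetical values for illustration only)
df_meta_toy <- data.frame(bird_id = c("bird_A", "bird_B", "bird_C"),
                          sex = c("unknown", "female", "male"))
track_toy <- data.frame(bird_id = c("bird_A", "bird_A", "bird_B", "bird_D"),
                        latitude = c(42.81, 42.82, 42.79, 42.80))

## 1. Each bird should appear only once in the metadata
n.duplicated <- sum(duplicated(df_meta_toy$bird_id))

## 2. Birds present in the tracking data but missing from the metadata
birds.without.metadata <- setdiff(unique(track_toy$bird_id),
                                  df_meta_toy$bird_id)

## 3. Birds present in the metadata but with no tracking data
birds.without.tracks <- setdiff(df_meta_toy$bird_id,
                                unique(track_toy$bird_id))

n.duplicated
birds.without.metadata
birds.without.tracks
```

In this toy example bird_D has tracking data but no metadata, and bird_C has metadata but no tracking data. Both cases would be worth resolving before joining metadata onto the tracking data.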

25.8.1.3 Obtain point data file for “Data points” level information required for STDB

First remind yourself of what the template STDB format looks like:

head(data.frame(stdb.df.template),2)

##  [1] BirdId       Sex          Age          Breed.Stage  TrackId     
##  [6] DateGMT      TimeGMT      Latitude     Longitude    Equinox     
## [11] ArgosQuality
## <0 rows> (or 0-length row.names)

And also consider what your data looks like:

head(track.df,2)

##   day month year hour minute second satellites  latitude longitude altitude
## 1  24     5   19    0     49      9          5 42.811528 16.885531    -1.50
## 2  24     5   19    1      9      3          5 42.812029 16.886907     5.25
##   time_offset    accuracy voltage colony_code         bird_id
## 1       2.910 4.70376e-07    4.12           Z 19_Tag17600_Z-9
## 2       2.795 9.25368e-07    4.08           Z 19_Tag17600_Z-9
##                  dttm deploy_year
## 1 2019-05-24 00:49:09        2019
## 2 2019-05-24 01:09:03        2019

Then align your own data with the input format required by the STDB, using your supporting metadata.

REMINDER: Within the Seabird Tracking Database template, some fields only accept a fixed set of values. For example, Age can only be specified as: adult, immature, juvenile, fledgling, or unknown. Inputs for these fields are case sensitive.
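
A minimal sketch of how such fields could be checked before upload is given below. It assumes the allowed Age values are those listed in the reminder above; consult the STDB template for the authoritative list. The vectors allowed.age and age.toy are illustrative.

```r
## Allowed Age values as read from the reminder above (case sensitive).
## ASSUMPTION: check the STDB template for the authoritative list.
allowed.age <- c("adult", "immature", "juvenile", "fledgling", "unknown")

## Hypothetical Age entries, including a capitalised value and a missing value
age.toy <- c("adult", "Adult", "juvenile", "chick", NA)

## Entries that do not exactly match an allowed value
invalid.age <- unique(age.toy[!(age.toy %in% allowed.age)])
invalid.age
```

Here "Adult" is flagged because matching is case sensitive, and the missing value is flagged because NA is not one of the allowed options.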

The below code primarily is based around three functions in R:

select
mutate
relocate

Review the documentation for these functions if you are not familiar with them.

NOTE: You will need to change the names of your columns if they differ from the example data below.
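
For readers new to these functions, the toy example below sketches what each one does. It assumes the dplyr package is installed; the data frame toy and its values are hypothetical.

```r
library(dplyr)

## Hypothetical tracking data with one column we do not need (voltage)
toy <- data.frame(bird_id = c("bird_A", "bird_A"),
                  latitude = c(42.81, 42.82),
                  longitude = c(16.88, 16.89),
                  voltage = c(4.12, 4.08))

toy.out <- toy %>%
  ## select: keep only the named columns
  dplyr::select(bird_id, latitude, longitude) %>%
  ## mutate: add a new column (here a copy of bird_id under a new name)
  dplyr::mutate(BirdId = bird_id) %>%
  ## relocate: move the named column(s) to the front
  dplyr::relocate(BirdId)

colnames(toy.out)
```

The voltage column is dropped by select, BirdId is created by mutate, and relocate moves BirdId to the first position.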

25.8.1.3.1 Timestamp column

A common problem many people encounter when learning to analyse animal tracking data is dealing with the column that relates to the timestamp of the tracking device.

Typically, this information will be stored as a date and time - in a single column - from tracking devices.

Timestamps typically need to be in the format of a POSIXct object.

The parse_date function from the parsedate package attempts to provide a simple option for standardising timestamp data, which can come in multiple different formats.

Review the requirements of timestamp data for processing animal tracking data if required.

## Use the parse_date function to try and standardise a timestamp column
#str(track.df$dttm)
track.df$dttm <- parse_date(track.df$dttm)
#str(track.df$dttm)
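
If you already know the format of your timestamps, a base R alternative is to state the format and time zone explicitly. The sketch below assumes the timestamps are recorded in UTC and follow the year-month-day layout shown in the example data; dttm.raw is a hypothetical input vector.

```r
## Hypothetical raw timestamps stored as text
dttm.raw <- c("2019-05-24 00:49:09", "2019-05-24 01:09:03")

## Convert to POSIXct, stating the format and time zone explicitly
## ASSUMPTION: the device recorded times in UTC
dttm <- as.POSIXct(dttm.raw, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

## Timestamps that do not match the stated format become NA, so count them
sum(is.na(dttm))

## Split into the separate date and time pieces used later in the chapter
as.Date(dttm, tz = "UTC")
format(dttm, "%H:%M:%S")
```

Counting the NA values after conversion is a quick way to spot rows whose timestamps did not match the expected format.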

Continue with preparing data

## First, select relevant columns of information from your existing datapoint information needed to match the STDB format.
df_stdb <- track.df %>% dplyr::select(bird_id,
                                      dttm,
                                      latitude,
                                      longitude) 

## Then, modify and create relevant columns of information - where you have these in your data - to align with STDB format.

## the mutate function allows you to add new columns of information.
## add the new columns, named to match the STDB format.
df_stdb <- df_stdb %>% dplyr::mutate(BirdId = bird_id,
                                      TrackId = bird_id,
                                      DateGMT = lubridate::date(dttm),
                                      TimeGMT = format(dttm, format = "%H:%M:%S"),
                                      Latitude = latitude,
                                      Longitude = longitude,
                                      Equinox = NA,
                                      ArgosQuality = NA) %>% 
  ## remove the original columns (note the minus sign "-" in front of each column name you are removing)
  dplyr::select(-bird_id,
                -dttm,
                -latitude,
                -longitude) 


## Now get the relevant metadata information for your tracking data "datapoints" information
## Ensure that your link column has the same name (in this case: it is the BirdId column above)
df_meta_points <- df_meta %>% 
  ## select the relevant columns
  dplyr::select(bird_id, sex, age_deployment_start, breed_stage_deployment_start) %>% 
  ## rename the columns if need be to match the format of the STDB
  rename(BirdId = bird_id, 
         Sex = sex, 
         Age = age_deployment_start, 
         Breed.Stage = breed_stage_deployment_start)

## Now join the relevant metadata onto your datapoints data
#head(df_stdb,2)
df_stdb <- left_join(df_stdb, df_meta_points, by = "BirdId")
## check that the join worked
#head(df_stdb,2)

## Reorder the column names to match the format of the STDB
df_stdb <- df_stdb %>% 
  relocate(BirdId,
            Sex, 
            Age, 
            Breed.Stage,
            TrackId,
            DateGMT,
            TimeGMT,
            Latitude,
            Longitude,
            Equinox,
            ArgosQuality)

25.8.1.4 Review of the INPUT format for the Seabird Tracking Database

## Compare your data, to the STDB data template
head(df_stdb,2)

##            BirdId     Sex   Age   Breed.Stage         TrackId    DateGMT
## 1 19_Tag17600_Z-9 unknown adult chick-rearing 19_Tag17600_Z-9 2019-05-24
## 2 19_Tag17600_Z-9 unknown adult chick-rearing 19_Tag17600_Z-9 2019-05-24
##    TimeGMT  Latitude Longitude Equinox ArgosQuality
## 1 00:49:09 42.811528 16.885531      NA           NA
## 2 01:09:03 42.812029 16.886907      NA           NA

head(data.frame(stdb.df.template))

##  [1] BirdId       Sex          Age          Breed.Stage  TrackId     
##  [6] DateGMT      TimeGMT      Latitude     Longitude    Equinox     
## [11] ArgosQuality
## <0 rows> (or 0-length row.names)

For the columns shown above, you may notice a few things:

BirdId and TrackId are specified with the same code. This is because when data is formatted to align with the format of the STDB:
- we have a code that relates to the bird that was tracked (BirdId)
- we have a unique code that relates to each trip undertaken by the bird, when multiple trips are recorded (TrackId). However, users often provide data that has not been pre-split into unique trips, in which case all TrackId entries match BirdId.
Equinox and ArgosQuality are both specified as NA. This is because our data is GPS data, which has neither an ArgosQuality estimate (typical of PTT devices) nor an Equinox measure (typical of GLS devices).
- see the Seabird Tracking Database data template for examples of how to specify Equinox and ArgosQuality when necessary.
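
Rather than comparing column names by eye, the comparison can be sketched in code. The example below uses the 11 template column names shown above and a hypothetical set of column names (toy.cols) that is missing one column and carries one extra. With your own data you would compare colnames(df_stdb) against colnames(stdb.df.template).

```r
## Column names of the STDB input template, as shown above
stdb.input.cols <- c("BirdId", "Sex", "Age", "Breed.Stage", "TrackId",
                     "DateGMT", "TimeGMT", "Latitude", "Longitude",
                     "Equinox", "ArgosQuality")

## Hypothetical column names: Breed.Stage is missing and voltage is extra
toy.cols <- c("BirdId", "Sex", "Age", "TrackId",
              "DateGMT", "TimeGMT", "Latitude", "Longitude",
              "Equinox", "ArgosQuality", "voltage")

## Template columns that are absent from the data
missing.cols <- setdiff(stdb.input.cols, toy.cols)

## Columns in the data that are not part of the template
extra.cols <- setdiff(toy.cols, stdb.input.cols)

## TRUE only when names AND order match the template exactly
same.format <- identical(toy.cols, stdb.input.cols)

missing.cols
extra.cols
same.format
```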

25.8.2 INPUT STDB format: saving

You should now have a key file:
- A single file (a data frame called df_stdb) with all your data standardised into a common format.
- The common format of your data should reflect that of the INPUT files associated with uploading data to the Seabird Tracking Database.

PLEASE NOTE: While it is not mandatory to upload your data to the Seabird Tracking Database to perform analyses outlined in this toolkit, we greatly encourage users to do so given the many benefits of curating data in centralised repositories.

## Save the output file
write.csv(df_stdb,
          paste0("./data-testing/tracking-data/",
                 species.name,
                 "-",colony.name,"-tracking-STDB-input.csv"),
          row.names = FALSE)
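
As an optional check, you can read a saved file back in and confirm that the column names and number of rows are unchanged. The sketch below uses a temporary file and a hypothetical one-row data frame (toy) so that it does not touch your real data.

```r
## Hypothetical one-row data frame with some of the STDB input columns
toy <- data.frame(BirdId = "bird_A",
                  Breed.Stage = "chick-rearing",
                  DateGMT = "2019-05-24",
                  TimeGMT = "00:49:09",
                  Latitude = 42.811528,
                  Longitude = 16.885531,
                  Equinox = NA,
                  ArgosQuality = NA)

## Save to a temporary csv file, then read it back in
toy.file <- tempfile(fileext = ".csv")
write.csv(toy, toy.file, row.names = FALSE)
toy.check <- read.csv(toy.file)

## Column names and row count should survive the round trip
identical(colnames(toy.check), colnames(toy))
nrow(toy.check) == nrow(toy)
```

Setting row.names = FALSE matters here: without it, write.csv adds an extra unnamed column of row numbers that would not match the STDB format.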

25.9 OUTPUT format: Seabird Tracking Database

Formatting data to align with the input format of the Seabird Tracking Database supports your ability to curate your data in a secure online repository.

Typically, though, the data file used for analysis will reflect the output format of data downloaded from the Seabird Tracking Database.

Here, instead of requiring users to upload data and then download it again, we provide code to convert data from the input format of the Seabird Tracking Database to the output format.

25.9.1 Load STDB output template

Load and view the structure of the data according to the output format of the STDB.

## Load the template
stdb.df.template.output <- read.csv("./data-testing/tracking-data-stdb/Template_Datapoints_Output_Format.csv")

## View the column names of the template
head(stdb.df.template.output)

##  [1] dataset_id        scientific_name   common_name       site_name        
##  [5] colony_name       lat_colony        lon_colony        device           
##  [9] bird_id           track_id          original_track_id age              
## [13] sex               breed_stage       breed_status      date_gmt         
## [17] time_gmt          latitude          longitude         argos_quality    
## [21] equinox          
## <0 rows> (or 0-length row.names)

25.9.2 Convert data to STDB output template

Converting data to the output format of the STDB in this tutorial requires two things:

A metadata file aligned to the format provided in the example earlier
A single data frame matching the input format for the STDB (as created above)

## First, convert the basis of the input format to the output format.
## Essentially, you are just changing column names here to match the output format
df_stdb_output <- df_stdb %>% 
  dplyr::select(bird_id = BirdId,
                sex = Sex,
                age = Age,   
                breed_stage = Breed.Stage,
                track_id = TrackId,
                date_gmt = DateGMT,
                time_gmt = TimeGMT,
                latitude = Latitude,
                longitude = Longitude,
                equinox = Equinox,
                argos_quality = ArgosQuality)

## Second, get the relevant metadata from your metadata file
## Here you are selecting the key metadata, and renaming columns accordingly
## If your column names differ, you will need to change the relevant inputs here.
df_meta_output <- df_meta %>% 
  dplyr::select(bird_id = bird_id,
                scientific_name = species,
                common_name = common_name,
                site_name = site_location,
                colony_name = colony_code,
                lat_colony = colony_latitude,
                lon_colony = colony_longitude,
                device = tracking_device_type)


## Third, some columns for the STDB output are populated automatically when uploading data
## Here we create the necessary columns of STDB metadata for the purpose of the tutorial, but we populate
## the columns with dummy data only.
df_meta_output <- df_meta_output %>% 
  mutate(dataset_id = "populated-upon-upload-STDB",
         original_track_id = "populated-upon-upload-STDB",
         breed_status = "populated-upon-upload-STDB")

## Next, we join the relevant metadata onto the overall tracking data dataframe
df_stdb_output <- left_join(df_stdb_output, df_meta_output, by = "bird_id")

## Finally, we reorder the columns to match the output format of the STDB
## (all_of() gives an error if a template column is missing from your data)
df_stdb_output <- df_stdb_output %>% relocate(all_of(colnames(stdb.df.template.output)))

## review and compare the column names and order between your data and STDB output example
data.frame(stdb.output.example = colnames(stdb.df.template.output),
           data.example = colnames(df_stdb_output))

##    stdb.output.example      data.example
## 1           dataset_id        dataset_id
## 2      scientific_name   scientific_name
## 3          common_name       common_name
## 4            site_name         site_name
## 5          colony_name       colony_name
## 6           lat_colony        lat_colony
## 7           lon_colony        lon_colony
## 8               device            device
## 9              bird_id           bird_id
## 10            track_id          track_id
## 11   original_track_id original_track_id
## 12                 age               age
## 13                 sex               sex
## 14         breed_stage       breed_stage
## 15        breed_status      breed_status
## 16            date_gmt          date_gmt
## 17            time_gmt          time_gmt
## 18            latitude          latitude
## 19           longitude         longitude
## 20       argos_quality     argos_quality
## 21             equinox           equinox

25.9.3 OUTPUT STDB format: saving

You should now have another key file:
- df_stdb_output
- The format of this file matches that of the output format of the STDB. i.e. the format of the data when you download it from the STDB.

NOTE: if your dataset is already hosted on the Seabird Tracking Database, you can download it from there directly and skip the steps above.

## Save the output file
write.csv(df_stdb_output,
          paste0("./data-testing/tracking-data/",
                 species.name,
                 "-",colony.name,"-tracking-STDB-output.csv"),
          row.names = FALSE)