24 Tracking data: Merge files together for analysis

Analyses outlined in this chapter were performed in R version 4.3.2 (2023-10-31 ucrt)

This chapter was last updated on 2024-02-23

24.1 What this chapter covers:

Read raw tracking data into R (assuming the data are in *.csv file format).
Combine data into a single data frame.
Save the single data frame as a *.csv file for further analyses

24.2 Where you can get example data for the chapter:

This tutorial uses example data from a project led by the BirdLife International partner in Croatia: BIOM

The citation for this data is: Zec et al. 2023
Example data is available upon request

24.3 Load packages

Load the R packages required for the code in this chapter:

If a package fails to load, you will need to install it first.

## ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
## Load libraries --------------------------------------------------------------
## ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

## sf package for spatial data analyses (i.e. vector files such as points, lines, polygons)
library(sf)
## Tidyverse for data manipulation
library(tidyverse)
## ggplot2 for plotting options
library(ggplot2)
## rnaturalearth package for geographic basemaps in R
library(rnaturalearth)
## leaflet package for interactive maps in R
library(leaflet)
## lubridate for date time
library(lubridate)
## track2kba for the analysis of important site identification
library(track2KBA)
## speed filter
library(trip)
## linear interpolation
library(adehabitatLT)
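## raster package for working with raster (gridded) spatial data
library(raster)
## viridis for colour palettes
library(viridis)
## readxl for reading Excel files
library(readxl)
## xlsx for reading and writing Excel files
library(xlsx)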
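If any of the library() calls fail, it can help to check which packages are missing before installing anything. The following is a minimal sketch, not part of the original workflow: the object names `pkgs`, `is.installed` and `missing.pkgs` are chosen here for illustration, and the install line is left commented out because some packages may not be available from CRAN for your version of R (in that case, follow the installation instructions of the package itself).

```r
## Packages loaded in this chapter
pkgs <- c("sf", "tidyverse", "ggplot2", "rnaturalearth", "leaflet",
          "lubridate", "track2KBA", "trip", "adehabitatLT", "raster",
          "viridis", "readxl", "xlsx")

## TRUE for each package that can be loaded, FALSE otherwise
is.installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

## Names of the packages that still need to be installed
missing.pkgs <- pkgs[!is.installed]
missing.pkgs

## Uncomment to install the missing packages from CRAN
## if (length(missing.pkgs) > 0) install.packages(missing.pkgs)
```

Once the missing packages are installed, re-run the library() calls.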

24.4 Define object names for chapter

Typically, if your data follow the same format as the example data in this chapter, the objects defined below should be the only things you need to change.

## Define your species name (avoid spaces by using hyphens instead. This can help with later coding steps)
species.name <- "Puffinus-yelkouan"

## Define your colony name
colony.name <- "Z"

## Set file path to where tracking data is stored
## If you are less familiar with R, it can be easier to specify the entire file path.
fpath.tracks <- "C:\\Users\\jonathan.handley\\OneDrive - BirdLife International\\JonoHandley_BirdLife\\PROJECTS\\Marine Toolkit\\GitHub_MarineToolkit_JH3\\MarMeCo-Toolkit-R-Beta-JH3\\data-testing\\tracking-data\\Puffinus-yelkouan-raw-csv-tracking\\Z"

Pro Tip: navigate to the folder where your tracking data is stored and copy the file path from within the file explorer. Then run the readClipboard() function in R (available on Windows only) to print the file path with its backslashes escaped. Copy and paste the printed file path into the code above.
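A mistyped path is a common source of errors, so it can be worth confirming that R can see the folder before going further. The sketch below is an illustration only: `fpath.example` is a hypothetical folder created in a temporary directory so that the code runs on any machine. In practice you would test `fpath.tracks` instead. Note that backslashes must be doubled inside R strings, whereas forward slashes (which file.path() uses) also work on Windows.

```r
## Hypothetical folder, created in a temporary directory for illustration only
fpath.example <- file.path(tempdir(), "Puffinus-yelkouan-raw-csv-tracking", "Z")
dir.create(fpath.example, recursive = TRUE, showWarnings = FALSE)

## Does the folder exist? TRUE means R can find it
dir.exists(fpath.example)

## Stop early with an informative message if the folder cannot be found
if (!dir.exists(fpath.example)) {
  stop("Folder not found, check the file path: ", fpath.example)
}
```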

24.5 Storing, reading, and formatting raw tracking data

24.5.1 Storing raw tracking data

The type of animal tracking device you use will dictate what format your raw tracking data is stored in.

Typically, raw outputs from animal tracking devices have been stored as *.csv files.

Good file management is critical when working with large tracking datasets.

24.5.2 Reading raw tracking data into R / Rstudio

Depending on your file structure, the type of raw data, and the overall size of your data, we recommend reading data into R in a way that produces a single data frame containing all the data required for a specific analysis.

Reading all your data in at once is greatly facilitated when each data file is stored in a standardised format.

24.6 Example data summary: Yelkouan Shearwaters (Puffinus yelkouan), Croatia

Summary of the example dataset used in this tutorial:

Species tracked: Yelkouan Shearwater (Puffinus yelkouan)
Colony tracked from: Zaklopatica (Z), Croatia

Site / source population birds tracked from: Lastovo SPA, Croatia
Life-cycle stage when birds were tracked: chick-rearing
Years birds were tracked over: 2019, 2020
Devices birds were tracked with: GPS
Device model type: PathTrack nanoFix GPS/UHF transmitters (≤ 5.5 g)

24.7 Loading example data: Yelkouan Shearwaters (Puffinus yelkouan)

24.7.1 Load example data: First, exploring the data on your machine

## Check where your current working directory is set up to go to:
getwd()

## [1] "C:/Users/jonathan.handley/OneDrive - BirdLife International/JonoHandley_BirdLife/PROJECTS/Marine Toolkit/GitHub_MarineToolkit_JH6"

The examples below explore what is inside each folder, one level at a time.

You will note:

A top level folder called Puffinus-yelkouan-raw-csv-tracking
Within this folder, different colonies worth of tracking data: Zaklopatica (Z), Veli Maslovnjak (VM).
Within each colony folder of tracking data, uniquely named .csv files relating to each unique deployment on a bird.

This format of broadly storing data by Species -> Colony is aligned with the format used for inputting data into the Seabird Tracking Database.

We recognise that more granular (finer) levels of data storage may be chosen.

##  [1] "_book"                                           
##  [2] "_main.Rmd"                                       
##  [3] "_main_files"                                     
##  [4] "01-01-Executive-Summary.html"                    
##  [5] "01-01-Executive-Summary.Rmd"                     
##  [6] "01-02-Toolkit-Offers.Rmd"                        
##  [7] "01-03-Toolkit-NOT-Offers.Rmd"                    
##  [8] "01-04-Contribute.Rmd"                            
##  [9] "01-05-Theory-Of-Change.html"                     
## [10] "01-05-Theory-Of-Change.Rmd"                      
## [11] "01-06-Project-Management.html"                   
## [12] "01-06-Project-Management.Rmd"                    
## [13] "01-07-Before_IBA-KBA-Consider.Rmd"               
## [14] "02-01-Site-Concept-AreaBasedConservation.Rmd"    
## [15] "02-02-Sites-KBAs-IBAs.Rmd"                       
## [16] "02-03-Sites-Seabirds.html"                       
## [17] "02-03-Sites-Toolkit.Rmd"                         
## [18] "02-04-Sites-Method-Consider.Rmd"                 
## [19] "02-04-Sites-Seabirds-Method-Summary.html"        
## [20] "02-05-Sites-Seabirds-Cons-Summary.Rmd"           
## [21] "02-06-Sites-Seabirds-PolAdvoc-Summary.Rmd"       
## [22] "03-01-Data-Required-Seabird-Sites.Rmd"           
## [23] "03-02-DataGroups-Defining.html"                  
## [24] "03-02-DataGroups-Defining.Rmd"                   
## [25] "03-03-Colony-Single-Assessment.Rmd"              
## [26] "03-04-Colony-Multi-Assessment.Rmd"               
## [27] "03-05-Colony-MultiSpecies-Assessment.Rmd"        
## [28] "04-01_Seaward_extension_introduction.Rmd"        
## [29] "04-02-Seaward_extension_background_single_ne.Rmd"
## [30] "04-03_Seaward_extension_multi_ne.Rmd"            
## [31] "05-01-TrackingData-IntroToTracking.Rmd"          
## [32] "05-02-TrackingData-SamplingStrategy.Rmd"         
## [33] "05-06-TrackingData-FormatAndMergeCSVfiles.Rmd"   
## [34] "05-07-TrackingData-STDB-format.Rmd"              
## [35] "05-08-TrackingData-Intro-Plot-ReviewTabular.Rmd" 
## [36] "05-09-TrackingData-CleaningData.Rmd"             
## [37] "06-01-Track2KBA-Intro.Rmd"                       
## [38] "06-02-Track2KBA-CPF-CleanSummariseData.Rmd"      
## [39] "06-03-Track2KBA-CPF-Analysis.Rmd"                
## [40] "06-04-Track2KBA-Non-CPF-Example.Rmd"             
## [41] "07-01-Prelim-Sites-AssessCriteria.Rmd"           
## [42] "07-02-Prelim-Sites-KBA-Considerations.Rmd"       
## [43] "07-03-Prelim-Sites-Merging-Layers.Rmd"           
## [44] "07-04-Prelim-Sites-RefineFinalBoundaries.Rmd"    
## [45] "08-01-SuppData-AtSeaSurveys.Rmd"                 
## [46] "08-02-SuppData-Modelling.Rmd"                    
## [47] "09-01-Sites-Proposing-KBA.Rmd"                   
## [48] "10-01-Sites-Monitoring.Rmd"                      
## [49] "10-02-Sites-Conservation.Rmd"                    
## [50] "10-03-Sites-Policy-Advocacy.html"                
## [51] "10-03-Sites-Policy-Advocacy.Rmd"                 
## [52] "91-02-Sites-Proposing-IBA.Rmd"                   
## [53] "91-04-Appendix-InterpolationMethodsCompare.html" 
## [54] "91-04-Appendix-InterpolationMethodsCompare.Rmd"  
## [55] "91-05-Contact.Rmd"                               
## [56] "91-06-References.Rmd"                            
## [57] "AllZoteroReferences.bib"                         
## [58] "CHAPTERS-Temp-folder"                            
## [59] "data-input-files-bookdown"                       
## [60] "data-input-files-tracking"                       
## [61] "data-testing"                                    
## [62] "docs"                                            
## [63] "GitHub_MarineToolkit_JH6.Rproj"                  
## [64] "index.Rmd"                                       
## [65] "LICENSE"                                         
## [66] "photos-for-book"                                 
## [67] "photos-for-book-NonGitHub"                       
## [68] "presentations-toolkit"                           
## [69] "R-RMarkdown-AdvocacyPolicy"                      
## [70] "R-RMarkdown-AtSea-Modelling"                     
## [71] "R-RMarkdown-BookdownChapters"                    
## [72] "R-RMarkdown-Conservation"                        
## [73] "R-RMarkdown-Jono"                                
## [74] "R-RMarkdown-TestFiles"                           
## [75] "R-Scripts-Chapters"                              
## [76] "R-Scripts-Chapters-MarineToolkit.zip"            
## [77] "R-Scripts-RENDER-FromMarkdown.R"                 
## [78] "R-Scripts-SavingFromMarkdown.R"                  
## [79] "README.md"                                       
## [80] "seaward_extension_outputs"                       
## [81] "tracking_CleanAndPrepareData2_AllTracks.R"

## [1] "VM" "Z"

## [1] "20_Tag17700_VM-13.csv"             "20_Tag17717_VM-8.csv"             
## [3] "20_Tag40014_VM-3 (2nd Parent).csv" "20_Tag40086_VM-3.csv"             
## [5] "20_Tag40138_VM-18.csv"             "20_Tag40536_VM-8 (2nd Parent).csv"
## [7] "20_Tag40615_VM-23.csv"             "20_Tag40817_VM-12.csv"

dir("./data-testing/tracking-data/Puffinus-yelkouan-raw-csv-tracking/Z")

##  [1] "19_Tag17600_Z-9.csv"                   
##  [2] "19_Tag17604_Z-7.csv"                   
##  [3] "19_Tag17617_Z-4 (2nd Parent).csv"      
##  [4] "19_Tag17644_Z-13.csv"                  
##  [5] "19_Tag17652_Z-2.csv"                   
##  [6] "19_Tag17704_Z-11.csv"                  
##  [7] "19_Tag17735_Z-3 (RAW DATA MISSING).csv"
##  [8] "19_Tag40066_Z-14.csv"                  
##  [9] "19_Tag40069_Z-6.csv"                   
## [10] "19_Tag40073_Z-1.csv"                   
## [11] "19_Tag40078_Z-3 (2nd Parent).csv"      
## [12] "19_Tag40086_Z-11 (2nd Parent).csv"     
## [13] "19_Tag40094_Z-16.csv"                  
## [14] "19_Tag40118_Z-2 (2nd Parent).csv"      
## [15] "19_Tag40133_Z-15.csv"                  
## [16] "19_Tag40138_Z-17.csv"                  
## [17] "19_Tag40170_Z-12.csv"                  
## [18] "19_Tag40177_Z-17 (2nd Parent).csv"     
## [19] "19_Tag40182_Z-4.csv"                   
## [20] "20_Tag17600_Z-170.csv"                 
## [21] "20_Tag17604_Z-95.csv"                  
## [22] "20_Tag17644_Z-106.csv"                 
## [23] "20_Tag17677_Z-170 (2nd Parent).csv"    
## [24] "20_Tag17724_Z-106 (2nd Parent).csv"    
## [25] "20_Tag40024_Z-178 (2nd Parent).csv"    
## [26] "20_Tag40039_Z-15 (2nd Parent).csv"     
## [27] "20_Tag40073_Z-131 (2nd Parent).csv"    
## [28] "20_Tag40078_Z-178.csv"                 
## [29] "20_Tag40094_Z-131.csv"                 
## [30] "20_Tag40118_Z-175.csv"                 
## [31] "20_Tag40133_Z-1 (2nd Parent).csv"      
## [32] "20_Tag40193_Z-13 (2nd Parent).csv"     
## [33] "20_Tag40859_Z-179.csv"                 
## [34] "20_Tag41108_Z-95 (2nd Parent).csv"
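Listings like those above can be produced by calling dir() on each folder level in turn. The sketch below is a self-contained illustration rather than the code used for the chapter: it builds a small, hypothetical copy of the Species -> Colony folder structure in a temporary directory (the object names `top.folder`, `colony.folders` and `z.files` are assumptions) and then lists each level.

```r
## Build a small, hypothetical folder structure in a temporary directory
top.folder <- file.path(tempdir(), "explore-demo",
                        "Puffinus-yelkouan-raw-csv-tracking")
dir.create(file.path(top.folder, "Z"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(top.folder, "VM"), recursive = TRUE, showWarnings = FALSE)

## Add one empty example file per colony
file.create(file.path(top.folder, "Z", "19_Tag17600_Z-9.csv"))
file.create(file.path(top.folder, "VM", "20_Tag17700_VM-13.csv"))

## Level 1: the colony folders inside the species folder
colony.folders <- dir(top.folder)
colony.folders

## Level 2: the deployment files inside one colony folder
z.files <- dir(file.path(top.folder, "Z"))
z.files
```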

24.7.2 Load example data: Second, prepare the files for loading into R

Preparing animal tracking data for merging into a single data frame may require some initial cleaning of the raw tracking data outputs. You will need to consider:

Do all the devices I have collected data with have a common file format? (e.g. are the column names consistent across all devices?)
Do all the devices I have collected data with have a common file type? (e.g. are the output files *.csv files, or are they custom to the device manufacturer?)
A number of other factors may need to be considered.

If the raw outputs from tracking devices vary across deployments, you may need to first standardise data from each deployment into a common format for subsequent merging.

For example: If you deployed GPS device TYPE A in season 1 and the output was a csv file with 8 columns, and you then deployed GPS device TYPE B in season 2 and the output was a custom file type with 10 columns, you would need to standardise the data separately for each GPS device type and season combination, before being able to merge all the data into a single data frame.
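The idea of standardising two device formats can be sketched as follows. Everything in this block is hypothetical: the column names of the two device types, the values of the extra `hdop` column, and the choice of `latitude`, `longitude` and `dttm` as the common format are assumptions made purely for illustration.

```r
## Hypothetical output from device TYPE A (season 1)
type.a <- data.frame(Lat = c(42.81, 42.82),
                     Lon = c(16.88, 16.89),
                     DateTime = c("2019-05-24 00:49:09", "2019-05-24 01:09:03"))

## Hypothetical output from device TYPE B (season 2):
## different column names and an extra column
type.b <- data.frame(y = 43.19,
                     x = 16.35,
                     timestamp = "2020-05-30 16:33:05",
                     hdop = 1.2)

## The common format that every deployment should be converted to
common.cols <- c("latitude", "longitude", "dttm")

## Standardise each device type separately
type.a.std <- data.frame(latitude = type.a$Lat,
                         longitude = type.a$Lon,
                         dttm = type.a$DateTime)
type.b.std <- data.frame(latitude = type.b$y,
                         longitude = type.b$x,
                         dttm = type.b$timestamp)

## Confirm both now share the common format, then merge
stopifnot(identical(names(type.a.std), common.cols),
          identical(names(type.b.std), common.cols))
merged.std <- rbind(type.a.std, type.b.std)
merged.std
```

Columns that exist for only one device type (such as `hdop` here) are dropped in this sketch. You could instead keep them and fill the other device type with NA.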

24.7.3 Load example data: Third, load the files into a single data frame in R

Here, we assume users have prepared their data into standardised *.csv files across each deployment, where each file has the same number of columns and matching column names.

The number of columns and the column names can be unique to your data. The key thing is that the column names, and the variables each column represents, match across all files.
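One way to confirm that column names match before merging is to read only the header of each file and compare it against the first file. This is a minimal sketch under stated assumptions: the two demonstration files, their column names, and the object names (`demo.dir`, `col.names`, `mismatched.files`) are hypothetical, and the second file deliberately contains a mismatched column name.

```r
## Write two small, hypothetical files to a temporary folder
demo.dir <- file.path(tempdir(), "colnames-demo")
dir.create(demo.dir, showWarnings = FALSE)
write.csv(data.frame(latitude = 42.8, longitude = 16.9, bird_id = "A"),
          file.path(demo.dir, "A.csv"), row.names = FALSE)
## Note the mismatched column name "lon" in this file
write.csv(data.frame(latitude = 42.9, lon = 16.8, bird_id = "B"),
          file.path(demo.dir, "B.csv"), row.names = FALSE)

demo.files <- list.files(demo.dir, pattern = "\\.csv$", full.names = TRUE)

## Read only the first row of each file to get its column names quickly
col.names <- lapply(demo.files, function(f) names(read.csv(f, nrows = 1)))
names(col.names) <- basename(demo.files)

## Compare every file against the first file
matches.first <- vapply(col.names,
                        function(x) identical(x, col.names[[1]]),
                        logical(1))

## Files whose column names differ from the first file
mismatched.files <- names(matches.first)[!matches.first]
mismatched.files
```

Any file listed in `mismatched.files` would need standardising before it can be bound to the others with rbind().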

Notes on the example data:

In the case of the example data for Yelkouan Shearwaters, the original output from the PathTrack nanoFix GPS/UHF transmitters (GPS devices) was a custom .pos file. These .pos files were prepared for analysis in a separate R script.

24.7.3.1 Produce list of file names with all your tracking data

Use the list.files() function to list the file names of all your tracking data:

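## List the files in the tracking data folder
track.list <- list.files(path = fpath.tracks,
                         ## Set recursive = TRUE to search through sub-folders if required.
                         recursive = FALSE,
                         ## Match only file names ending in .csv
                         ## (in a regular expression, an unescaped "." matches any character)
                         pattern = "\\.csv$",
                         full.names = TRUE)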

Check how many file names were read in using the list.files() function.

## Check how many deployments you are expecting to bind together. 
## This code effectively says, how long is the list 
## of names that were read in using the list.files function.
length(track.list)

## [1] 34

If the number is smaller or larger than the number of deployments you expect, it may be the case that:

there are additional *.csv files in your deployment folders that should not be there (i.e. too many deployments being considered / number too large)
some *.csv files that represent deployments are not being read, for example because they sit in a sub-folder or have a different file extension (i.e. too few deployments being considered / number too small)

Ultimately, if the number of files does not match the number of deployments you expect, work through the options above to find the cause before continuing.
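If you keep a record of the deployments you expect, comparing it against the file names can pinpoint the mismatch. The sketch below assumes such a record exists: `expected.ids` is a hypothetical vector, and the demonstration folder deliberately holds one stray file (`summary-notes.csv`, an invented name) and lacks one expected deployment.

```r
## Create a hypothetical folder with two deployment files and one stray file
count.dir <- file.path(tempdir(), "count-demo")
dir.create(count.dir, showWarnings = FALSE)
file.create(file.path(count.dir, c("19_Tag17600_Z-9.csv",
                                   "19_Tag17604_Z-7.csv",
                                   "summary-notes.csv")))

## Hypothetical record of the deployments you expect to find
expected.ids <- c("19_Tag17600_Z-9", "19_Tag17604_Z-7", "19_Tag17644_Z-13")

## File names found in the folder, with the .csv extension removed
found.ids <- tools::file_path_sans_ext(list.files(count.dir,
                                                  pattern = "\\.csv$"))

## Files that are present but not expected (number too large)
unexpected.files <- setdiff(found.ids, expected.ids)
unexpected.files

## Deployments that are expected but not present (number too small)
missing.files <- setdiff(expected.ids, found.ids)
missing.files
```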

Next, create a blank data frame onto which you can bind the tracking data from each unique deployment. You can think of this step as preparing to stick all your data together into one big table (a data frame in R).

24.7.3.2 Merge tracking data together to create a single file

## specify a blank dataframe
track.df <- data.frame()

Finally, use a for loop to read in each file from each unique deployment, and then bind them all together.

Unsure how a for loop works? Many introductory R resources online explain them.

## seq_along() gives 1, 2, ..., length(track.list), and nothing if the list is empty
for(i in seq_along(track.list)){
  ## read in a unique file
  temp <- read.csv(track.list[i])
  ## bind the unique file onto the data frame template
  track.df <- rbind(track.df,temp)
  ## print some text to show progress of the loop and binding process
  print(paste("Deployment ", i, " of ", length(track.list), "being bound to the data frame."))
}

## [1] "Deployment  1  of  34 being bound to the data frame."
## [1] "Deployment  2  of  34 being bound to the data frame."
## [1] "Deployment  3  of  34 being bound to the data frame."
## [1] "Deployment  4  of  34 being bound to the data frame."
## [1] "Deployment  5  of  34 being bound to the data frame."
## [1] "Deployment  6  of  34 being bound to the data frame."
## [1] "Deployment  7  of  34 being bound to the data frame."
## [1] "Deployment  8  of  34 being bound to the data frame."
## [1] "Deployment  9  of  34 being bound to the data frame."
## [1] "Deployment  10  of  34 being bound to the data frame."
## [1] "Deployment  11  of  34 being bound to the data frame."
## [1] "Deployment  12  of  34 being bound to the data frame."
## [1] "Deployment  13  of  34 being bound to the data frame."
## [1] "Deployment  14  of  34 being bound to the data frame."
## [1] "Deployment  15  of  34 being bound to the data frame."
## [1] "Deployment  16  of  34 being bound to the data frame."
## [1] "Deployment  17  of  34 being bound to the data frame."
## [1] "Deployment  18  of  34 being bound to the data frame."
## [1] "Deployment  19  of  34 being bound to the data frame."
## [1] "Deployment  20  of  34 being bound to the data frame."
## [1] "Deployment  21  of  34 being bound to the data frame."
## [1] "Deployment  22  of  34 being bound to the data frame."
## [1] "Deployment  23  of  34 being bound to the data frame."
## [1] "Deployment  24  of  34 being bound to the data frame."
## [1] "Deployment  25  of  34 being bound to the data frame."
## [1] "Deployment  26  of  34 being bound to the data frame."
## [1] "Deployment  27  of  34 being bound to the data frame."
## [1] "Deployment  28  of  34 being bound to the data frame."
## [1] "Deployment  29  of  34 being bound to the data frame."
## [1] "Deployment  30  of  34 being bound to the data frame."
## [1] "Deployment  31  of  34 being bound to the data frame."
## [1] "Deployment  32  of  34 being bound to the data frame."
## [1] "Deployment  33  of  34 being bound to the data frame."
## [1] "Deployment  34  of  34 being bound to the data frame."
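Growing a data frame inside a loop works well for a few dozen files, but it can become slow for very large datasets because the data frame is copied on every iteration. An alternative sketch, not the approach used in this chapter, is to read every file into a list with lapply() and bind them once with do.call(). The demonstration files and object names below (`bind.dir`, `demo.list`, `demo.df`) are hypothetical.

```r
## Write two small, hypothetical deployment files to a temporary folder
bind.dir <- file.path(tempdir(), "bind-demo")
dir.create(bind.dir, showWarnings = FALSE)
write.csv(data.frame(bird_id = "A", latitude = c(42.8, 42.9)),
          file.path(bind.dir, "A.csv"), row.names = FALSE)
write.csv(data.frame(bird_id = "B", latitude = 43.1),
          file.path(bind.dir, "B.csv"), row.names = FALSE)

demo.list <- list.files(bind.dir, pattern = "\\.csv$", full.names = TRUE)

## Read every file into a list of data frames, then bind them in one step
demo.df <- do.call(rbind, lapply(demo.list, read.csv))

## The merged data frame has one row per record across all files
nrow(demo.df)
```

As with the loop, this only works when every file has the same column names.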

24.7.4 Review the data you have read into R

Here you are doing some quick inspections to see if anything unexpected may have happened when reading your data into R.

If you are unsure, or want to learn more about the different ways data can be structured in R, consider doing an online R course which teaches the beginner concepts of R. Also, see the latest “R for Data Science” book by Hadley Wickham and colleagues.

## Print the column names and first two rows of data - does everything look as it should?
head(track.df,2)

##   day month year hour minute second satellites  latitude longitude altitude
## 1  24     5   19    0     49      9          5 42.811528 16.885531    -1.50
## 2  24     5   19    1      9      3          5 42.812029 16.886907     5.25
##   time_offset    accuracy voltage colony_code         bird_id
## 1       2.910 4.70376e-07    4.12           Z 19_Tag17600_Z-9
## 2       2.795 9.25368e-07    4.08           Z 19_Tag17600_Z-9
##                  dttm deploy_year
## 1 2019-05-24 00:49:09        2019
## 2 2019-05-24 01:09:03        2019

## Print the LAST two rows of data - does everything look as it should?
tail(track.df,2)

##       day month year hour minute second satellites  latitude longitude altitude
## 11353  30     5   20   16     33      5          6 43.197648 16.354861    61.25
## 11354  30     5   20   19     51     59          6 42.817527 16.766630    86.75
##       time_offset     accuracy voltage colony_code
## 11353      -23.16 5.226000e-06    3.98           Z
## 11354      -23.31 2.183421e-06    4.02           Z
##                             bird_id                dttm deploy_year
## 11353 20_Tag41108_Z-95 (2nd Parent) 2020-05-30 16:33:05        2020
## 11354 20_Tag41108_Z-95 (2nd Parent) 2020-05-30 19:51:59        2020

## Now check the structure of each column of data
str(track.df)

## 'data.frame':    11354 obs. of  17 variables:
##  $ day        : int  24 24 24 24 24 24 24 24 24 24 ...
##  $ month      : int  5 5 5 5 5 5 5 5 5 5 ...
##  $ year       : int  19 19 19 19 19 19 19 19 19 19 ...
##  $ hour       : int  0 1 1 1 2 2 2 3 3 6 ...
##  $ minute     : int  49 9 29 49 9 29 49 9 29 47 ...
##  $ second     : int  9 3 3 9 5 6 9 5 5 38 ...
##  $ satellites : int  5 5 5 4 6 7 4 6 5 6 ...
##  $ latitude   : num  42.8 42.8 42.8 42.8 42.8 ...
##  $ longitude  : num  16.9 16.9 16.9 16.9 16.9 ...
##  $ altitude   : num  -1.5 5.25 143.5 143 98 ...
##  $ time_offset: num  2.91 2.79 2.79 2.91 2.93 ...
##  $ accuracy   : num  4.70e-07 9.25e-07 3.52e-07 7.13e-07 4.87e-06 ...
##  $ voltage    : num  4.12 4.08 4.08 4.1 4.12 4.1 4.12 4.12 4.12 4.1 ...
##  $ colony_code: chr  "Z" "Z" "Z" "Z" ...
##  $ bird_id    : chr  "19_Tag17600_Z-9" "19_Tag17600_Z-9" "19_Tag17600_Z-9" "19_Tag17600_Z-9" ...
##  $ dttm       : chr  "2019-05-24 00:49:09" "2019-05-24 01:09:03" "2019-05-24 01:29:03" "2019-05-24 01:49:09" ...
##  $ deploy_year: int  2019 2019 2019 2019 2019 2019 2019 2019 2019 2019 ...
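Beyond head(), tail() and str(), a few further checks can catch problems early. The sketch below uses a small stand-in data frame (`demo.df`, with three rows copied from the printed output above) and shows how to count records per bird, count missing coordinates, and convert the `dttm` character column into a date-time class. The UTC time zone is an assumption made for illustration. Use the time zone your devices actually recorded in.

```r
## Small stand-in for track.df, using rows from the printed output above
demo.df <- data.frame(
  bird_id = c("19_Tag17600_Z-9", "19_Tag17600_Z-9",
              "20_Tag41108_Z-95 (2nd Parent)"),
  latitude = c(42.811528, 42.812029, 43.197648),
  longitude = c(16.885531, 16.886907, 16.354861),
  dttm = c("2019-05-24 00:49:09", "2019-05-24 01:09:03",
           "2020-05-30 16:33:05")
)

## How many records does each deployment have?
n.per.bird <- table(demo.df$bird_id)
n.per.bird

## How many records are missing a latitude or longitude?
n.missing.coords <- sum(is.na(demo.df$latitude) | is.na(demo.df$longitude))
n.missing.coords

## Convert the character date-time into POSIXct (time zone is an assumption)
demo.df$dttm <- as.POSIXct(demo.df$dttm,
                           format = "%Y-%m-%d %H:%M:%S",
                           tz = "UTC")

## Records that failed to convert become NA, so count them
n.failed.dttm <- sum(is.na(demo.df$dttm))
n.failed.dttm
```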

24.7.5 Save the merged data

## Save the output file
write.csv(track.df,
          paste0("./data-testing/tracking-data/",
                 species.name,
                 "-",colony.name,"-tracking-raw-merged.csv"),
          row.names = FALSE)
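After saving, it can be reassuring to read the file back in and confirm that nothing was lost. This is a minimal sketch using a hypothetical stand-in data frame (`demo.df`) and a temporary output file (`out.file`). In practice you would read back the file written above and compare it with `track.df`.

```r
## Hypothetical stand-in for the merged tracking data
demo.df <- data.frame(bird_id = c("A", "A", "B"),
                      latitude = c(42.8, 42.9, 43.1))

## Save to a temporary file, without row names
out.file <- file.path(tempdir(), "demo-tracking-raw-merged.csv")
write.csv(demo.df, out.file, row.names = FALSE)

## Read the saved file back in
check.df <- read.csv(out.file)

## Same number of rows and columns as the original?
identical(dim(check.df), dim(demo.df))

## Same column names as the original?
identical(names(check.df), names(demo.df))
```

Setting row.names = FALSE matters here: without it, write.csv() adds a column of row numbers, and the file read back in would have one more column than the original.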