24 Tracking data: Merge files together for analysis
Analyses outlined in this chapter were performed in R version 4.3.2 (2023-10-31 ucrt)
This chapter was last updated on 2024-02-23
24.1 What this chapter covers:
Read raw tracking data into R (assuming the data are in *.csv file format).
Combine data into a single data frame.
Save the single data frame as a *.csv file for further analyses
24.2 Where you can get example data for the chapter:
This tutorial uses example data from a project led by the BirdLife International partner in Croatia: BIOM
The citation for this data is: Zec et al. 2023
Example data is available upon request
24.3 Load packages
Load the R packages required for the code in this chapter:
If a package fails to load, you will need to install it first.
## ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
## Load libraries --------------------------------------------------------------
## ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
## sf package for spatial data analyses (i.e. vector files such as points, lines, polygons)
library(sf)
## Tidyverse for data manipulation
library(tidyverse)
## ggplot2 for plotting options
library(ggplot2)
## rnaturalearth package for geographic basemaps in R
library(rnaturalearth)
## leaflet package for interactive maps in R
library(leaflet)
## lubridate for date time
library(lubridate)
## track2KBA for identifying important sites from tracking data
library(track2KBA)
## speed filter
library(trip)
## linear interpolation
library(adehabitatLT)
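## raster for gridded (raster) spatial data
library(raster)
## viridis for colour palettes
library(viridis)
## readxl for reading Excel files
library(readxl)
## xlsx for reading and writing Excel files
library(xlsx)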
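If you are unsure which packages still need installing, the check can be scripted. The following is a minimal sketch, not part of the original workflow: `missing_packages()` is a hypothetical helper name, and the commented-out install step assumes every package is available from your configured repository.

```r
## Packages used in this chapter (names taken from the library() calls above)
pkgs <- c("sf", "tidyverse", "ggplot2", "rnaturalearth", "leaflet",
          "lubridate", "track2KBA", "trip", "adehabitatLT", "raster",
          "viridis", "readxl", "xlsx")

## Hypothetical helper: return the names of packages that are not installed
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

to.install <- missing_packages(pkgs)
print(to.install)

## Uncomment to install anything that is missing
## (assumes each package is available from your configured repository)
## if (length(to.install) > 0) install.packages(to.install)
```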
24.4 Define object names for chapter
Typically, if your data follow the same format as the examples in this chapter, the objects defined below should be the only things you need to change.
## Define your species name (avoid spaces by using hyphens instead. This can help with later coding steps)
species.name <- "Puffinus-yelkouan"
## Define your colony name
colony.name <- "Z"
## Set file path to where tracking data is stored
## If you are less familiar with R, it can be easier to specify the entire file path.
fpath.tracks <- "C:\\Users\\jonathan.handley\\OneDrive - BirdLife International\\JonoHandley_BirdLife\\PROJECTS\\Marine Toolkit\\GitHub_MarineToolkit_JH3\\MarMeCo-Toolkit-R-Beta-JH3\\data-testing\\tracking-data\\Puffinus-yelkouan-raw-csv-tracking\\Z"
Pro Tip: navigate to the folder where your tracking data is stored and copy the file path from within the file explorer. Then run the readClipboard() function in R (available on Windows only) to print the file path. Copy and paste the printed file path into the code above.
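A mistyped file path is a common cause of errors later on, so it can help to confirm that R can find the folder before going further. The sketch below uses a temporary folder (`fpath.example`, a stand-in name) in place of your own `fpath.tracks`, so that it runs on any machine.

```r
## Stand-in path for illustration only: a temporary folder replaces fpath.tracks
fpath.example <- file.path(tempdir(), "Puffinus-yelkouan-raw-csv-tracking", "Z")
dir.create(fpath.example, recursive = TRUE, showWarnings = FALSE)

## Stop early with an informative message if the folder cannot be found
if (!dir.exists(fpath.example)) {
  stop("Folder not found, check the file path: ", fpath.example)
}

## Print the full path using forward slashes, which R accepts on all platforms
path.clean <- normalizePath(fpath.example, winslash = "/")
print(path.clean)
```

With your own data, replace `fpath.example` with `fpath.tracks` and remove the `dir.create()` line.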
24.5 Storing, reading, and formatting raw tracking data
24.5.1 Storing raw tracking data
The type of animal tracking device you use will dictate what format your raw tracking data is stored in.
Typically, raw outputs from animal tracking devices have been stored as *.csv files.
Good file management is critical when working with large tracking datasets.
24.5.2 Reading raw tracking data into R / Rstudio
Depending on your file structure, type of raw data, and size of your overall data, we recommend reading data into R in a way that produces a single data file for all your data required for a specific analysis.
Reading all your data in at once is greatly facilitated when each data file is stored in a standardised format.
24.6 Example data summary: Yelkouan Shearwaters (Puffinus yelkouan), Croatia
Summary of the example dataset used in this tutorial:
Species tracked: Yelkouan Shearwater (Puffinus yelkouan)
Colony tracked from: Zaklopatica (Z), Croatia
Site / source population birds tracked from: Lastovo SPA, Croatia
Life-cycle stage when birds were tracked: chick-rearing
Years birds were tracked over: 2019, 2020
Devices birds were tracked with: GPS
Device model type: PathTrack nanoFix GPS/UHF transmitters (≤ 5.5 g)
24.7 Loading example data: Yelkouan Shearwaters (Puffinus yelkouan)
24.7.1 Load example data: First, exploring the data on your machine
## [1] "C:/Users/jonathan.handley/OneDrive - BirdLife International/JonoHandley_BirdLife/PROJECTS/Marine Toolkit/GitHub_MarineToolkit_JH6"
In the examples below, you can see different levels at which we have explored what is inside each folder.
You will note:
A top level folder called Puffinus-yelkouan-raw-csv-tracking
Within this folder, different colonies worth of tracking data: Zaklopatica (Z), Veli Maslovnjak (VM).
Within each colony folder of tracking data, uniquely named .csv files relating to each unique deployment on a bird.
This format of broadly storing data by Species -> Colony is aligned with the format used for inputting data into the Seabird Tracking Database.
We recognise that more granular (finer) levels of data storage may be chosen.
## [1] "_book"
## [2] "_main.Rmd"
## [3] "_main_files"
## [4] "01-01-Executive-Summary.html"
## [5] "01-01-Executive-Summary.Rmd"
## [6] "01-02-Toolkit-Offers.Rmd"
## [7] "01-03-Toolkit-NOT-Offers.Rmd"
## [8] "01-04-Contribute.Rmd"
## [9] "01-05-Theory-Of-Change.html"
## [10] "01-05-Theory-Of-Change.Rmd"
## [11] "01-06-Project-Management.html"
## [12] "01-06-Project-Management.Rmd"
## [13] "01-07-Before_IBA-KBA-Consider.Rmd"
## [14] "02-01-Site-Concept-AreaBasedConservation.Rmd"
## [15] "02-02-Sites-KBAs-IBAs.Rmd"
## [16] "02-03-Sites-Seabirds.html"
## [17] "02-03-Sites-Toolkit.Rmd"
## [18] "02-04-Sites-Method-Consider.Rmd"
## [19] "02-04-Sites-Seabirds-Method-Summary.html"
## [20] "02-05-Sites-Seabirds-Cons-Summary.Rmd"
## [21] "02-06-Sites-Seabirds-PolAdvoc-Summary.Rmd"
## [22] "03-01-Data-Required-Seabird-Sites.Rmd"
## [23] "03-02-DataGroups-Defining.html"
## [24] "03-02-DataGroups-Defining.Rmd"
## [25] "03-03-Colony-Single-Assessment.Rmd"
## [26] "03-04-Colony-Multi-Assessment.Rmd"
## [27] "03-05-Colony-MultiSpecies-Assessment.Rmd"
## [28] "04-01_Seaward_extension_introduction.Rmd"
## [29] "04-02-Seaward_extension_background_single_ne.Rmd"
## [30] "04-03_Seaward_extension_multi_ne.Rmd"
## [31] "05-01-TrackingData-IntroToTracking.Rmd"
## [32] "05-02-TrackingData-SamplingStrategy.Rmd"
## [33] "05-06-TrackingData-FormatAndMergeCSVfiles.Rmd"
## [34] "05-07-TrackingData-STDB-format.Rmd"
## [35] "05-08-TrackingData-Intro-Plot-ReviewTabular.Rmd"
## [36] "05-09-TrackingData-CleaningData.Rmd"
## [37] "06-01-Track2KBA-Intro.Rmd"
## [38] "06-02-Track2KBA-CPF-CleanSummariseData.Rmd"
## [39] "06-03-Track2KBA-CPF-Analysis.Rmd"
## [40] "06-04-Track2KBA-Non-CPF-Example.Rmd"
## [41] "07-01-Prelim-Sites-AssessCriteria.Rmd"
## [42] "07-02-Prelim-Sites-KBA-Considerations.Rmd"
## [43] "07-03-Prelim-Sites-Merging-Layers.Rmd"
## [44] "07-04-Prelim-Sites-RefineFinalBoundaries.Rmd"
## [45] "08-01-SuppData-AtSeaSurveys.Rmd"
## [46] "08-02-SuppData-Modelling.Rmd"
## [47] "09-01-Sites-Proposing-KBA.Rmd"
## [48] "10-01-Sites-Monitoring.Rmd"
## [49] "10-02-Sites-Conservation.Rmd"
## [50] "10-03-Sites-Policy-Advocacy.html"
## [51] "10-03-Sites-Policy-Advocacy.Rmd"
## [52] "91-02-Sites-Proposing-IBA.Rmd"
## [53] "91-04-Appendix-InterpolationMethodsCompare.html"
## [54] "91-04-Appendix-InterpolationMethodsCompare.Rmd"
## [55] "91-05-Contact.Rmd"
## [56] "91-06-References.Rmd"
## [57] "AllZoteroReferences.bib"
## [58] "CHAPTERS-Temp-folder"
## [59] "data-input-files-bookdown"
## [60] "data-input-files-tracking"
## [61] "data-testing"
## [62] "docs"
## [63] "GitHub_MarineToolkit_JH6.Rproj"
## [64] "index.Rmd"
## [65] "LICENSE"
## [66] "photos-for-book"
## [67] "photos-for-book-NonGitHub"
## [68] "presentations-toolkit"
## [69] "R-RMarkdown-AdvocacyPolicy"
## [70] "R-RMarkdown-AtSea-Modelling"
## [71] "R-RMarkdown-BookdownChapters"
## [72] "R-RMarkdown-Conservation"
## [73] "R-RMarkdown-Jono"
## [74] "R-RMarkdown-TestFiles"
## [75] "R-Scripts-Chapters"
## [76] "R-Scripts-Chapters-MarineToolkit.zip"
## [77] "R-Scripts-RENDER-FromMarkdown.R"
## [78] "R-Scripts-SavingFromMarkdown.R"
## [79] "README.md"
## [80] "seaward_extension_outputs"
## [81] "tracking_CleanAndPrepareData2_AllTracks.R"
## [1] "VM" "Z"
## [1] "20_Tag17700_VM-13.csv" "20_Tag17717_VM-8.csv"
## [3] "20_Tag40014_VM-3 (2nd Parent).csv" "20_Tag40086_VM-3.csv"
## [5] "20_Tag40138_VM-18.csv" "20_Tag40536_VM-8 (2nd Parent).csv"
## [7] "20_Tag40615_VM-23.csv" "20_Tag40817_VM-12.csv"
## [1] "19_Tag17600_Z-9.csv"
## [2] "19_Tag17604_Z-7.csv"
## [3] "19_Tag17617_Z-4 (2nd Parent).csv"
## [4] "19_Tag17644_Z-13.csv"
## [5] "19_Tag17652_Z-2.csv"
## [6] "19_Tag17704_Z-11.csv"
## [7] "19_Tag17735_Z-3 (RAW DATA MISSING).csv"
## [8] "19_Tag40066_Z-14.csv"
## [9] "19_Tag40069_Z-6.csv"
## [10] "19_Tag40073_Z-1.csv"
## [11] "19_Tag40078_Z-3 (2nd Parent).csv"
## [12] "19_Tag40086_Z-11 (2nd Parent).csv"
## [13] "19_Tag40094_Z-16.csv"
## [14] "19_Tag40118_Z-2 (2nd Parent).csv"
## [15] "19_Tag40133_Z-15.csv"
## [16] "19_Tag40138_Z-17.csv"
## [17] "19_Tag40170_Z-12.csv"
## [18] "19_Tag40177_Z-17 (2nd Parent).csv"
## [19] "19_Tag40182_Z-4.csv"
## [20] "20_Tag17600_Z-170.csv"
## [21] "20_Tag17604_Z-95.csv"
## [22] "20_Tag17644_Z-106.csv"
## [23] "20_Tag17677_Z-170 (2nd Parent).csv"
## [24] "20_Tag17724_Z-106 (2nd Parent).csv"
## [25] "20_Tag40024_Z-178 (2nd Parent).csv"
## [26] "20_Tag40039_Z-15 (2nd Parent).csv"
## [27] "20_Tag40073_Z-131 (2nd Parent).csv"
## [28] "20_Tag40078_Z-178.csv"
## [29] "20_Tag40094_Z-131.csv"
## [30] "20_Tag40118_Z-175.csv"
## [31] "20_Tag40133_Z-1 (2nd Parent).csv"
## [32] "20_Tag40193_Z-13 (2nd Parent).csv"
## [33] "20_Tag40859_Z-179.csv"
## [34] "20_Tag41108_Z-95 (2nd Parent).csv"
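Listings like those above can be produced with `list.files()`, applied one folder level at a time. The sketch below is an illustration only: it builds a small stand-in folder structure in a temporary directory, with folder names following the example and empty placeholder files, rather than reading the real example data.

```r
## Build a small stand-in folder structure in a temporary directory
top.folder <- file.path(tempdir(), "explore-demo",
                        "Puffinus-yelkouan-raw-csv-tracking")
dir.create(file.path(top.folder, "Z"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(top.folder, "VM"), recursive = TRUE, showWarnings = FALSE)

## Empty placeholder files standing in for deployment files
file.create(file.path(top.folder, "Z", "19_Tag17600_Z-9.csv"))
file.create(file.path(top.folder, "VM", "20_Tag17700_VM-13.csv"))

## Level 1: colony folders inside the species folder
colonies <- list.files(top.folder)
print(colonies)

## Level 2: deployment files inside one colony folder
z.files <- list.files(file.path(top.folder, "Z"))
print(z.files)
```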
24.7.2 Load example data: Second, prepare the files for loading into R
Preparing animal tracking data for merging into a single data frame may require some initial cleaning of the raw tracking data outputs. You will need to consider:
Do all the devices I have collected data with have a common file format? (e.g. are the column names consistent across all devices?)
Do all the devices I have collected data with have a common file type? (e.g. are the output files *.csv files, or are they custom to the device manufacturer?)
A number of other factors may need to be considered.
If the raw outputs from tracking devices vary across deployments, you may need to first standardise data from each deployment into a common format for subsequent merging.
For example: If you deployed GPS device TYPE A in season 1 and the output was a csv file with 8 columns, and you then deployed GPS device TYPE B in season 2 and the output was a custom file type with 10 columns, you would need to standardise the data separately for each GPS device type and season combination, before being able to merge all the data into a single data frame.
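The standardising step described above might look something like the following sketch. The two data frames, their column names, and the helper `standardise_columns()` are all invented for illustration; real device outputs will differ.

```r
## Hypothetical output from device TYPE A (column names invented)
type.a <- data.frame(Latitude = c(42.81, 42.82),
                     Longitude = c(16.88, 16.89),
                     DateTime = c("2019-05-24 00:49:09", "2019-05-24 01:09:03"))

## Hypothetical output from device TYPE B: different names, extra columns
type.b <- data.frame(lat = 43.19, lon = 16.35,
                     timestamp = "2020-05-30 16:33:05",
                     hdop = 1.2, battery = 3.98)

## Hypothetical helper: keep the columns of interest and rename them.
## lookup is a named vector: names = standard names, values = raw names.
standardise_columns <- function(df, lookup) {
  out <- df[, unname(lookup), drop = FALSE]
  names(out) <- names(lookup)
  out
}

std.a <- standardise_columns(type.a, c(latitude = "Latitude",
                                       longitude = "Longitude",
                                       dttm = "DateTime"))
std.b <- standardise_columns(type.b, c(latitude = "lat",
                                       longitude = "lon",
                                       dttm = "timestamp"))

## Both data frames now share the same columns and can be bound together
combined <- rbind(std.a, std.b)
print(combined)
```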
24.7.3 Load example data: Third, load the files into a single data frame in R
Here, we assume users have prepared their data into standardised *.csv files across each deployment, where each file has the same number of columns and matching column names.
The number of columns and column names can be unique to your data. The key thing is that all column names, and the associated variables represented by each column, are matching.
Notes on the example data:
- In the case of the example data for Yelkouan Shearwaters, the original output from the PathTrack nanoFix GPS/UHF transmitters (GPS devices) was a custom .pos file. These .pos files were prepared for analysis in a separate R script.
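Before merging, it can be worth confirming that every file really does have matching column names. One possible check, sketched here with three small invented files written to a temporary folder, is to read only the header of each file and compare it with the header of the first file.

```r
## Write three small stand-in files to a temporary folder (contents invented)
header.dir <- file.path(tempdir(), "column-check-demo")
dir.create(header.dir, showWarnings = FALSE)
write.csv(data.frame(latitude = 42.8, longitude = 16.9, bird_id = "A"),
          file.path(header.dir, "bird_A.csv"), row.names = FALSE)
write.csv(data.frame(latitude = 42.9, longitude = 16.8, bird_id = "B"),
          file.path(header.dir, "bird_B.csv"), row.names = FALSE)
## This file deliberately uses different column names
write.csv(data.frame(lat = 43.0, lon = 16.7, bird_id = "C"),
          file.path(header.dir, "bird_C.csv"), row.names = FALSE)

header.files <- list.files(header.dir, pattern = "\\.csv$", full.names = TRUE)

## Read a single row from each file and keep only the column names
headers <- lapply(header.files, function(f) names(read.csv(f, nrows = 1)))

## Compare every header with the header of the first file
matches.first <- vapply(headers, identical, logical(1), headers[[1]])

## Files whose column names differ from those of the first file
mismatched <- basename(header.files[!matches.first])
print(mismatched)
```

Any file listed in `mismatched` would need standardising before it can be bound to the others.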
24.7.3.1 Produce list of file names with all your tracking data
Produce a list of file names with all your tracking data
## Specify the folder that contains the files directly
track.list <- list.files(path = fpath.tracks,
                         ## Set recursive = TRUE to search through sub-folders if required.
                         recursive = FALSE,
                         ## pattern is a regular expression: match file names ending in ".csv"
                         pattern = "\\.csv$",
                         full.names = TRUE)
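Note that the `pattern` argument of `list.files()` is a regular expression, in which an unescaped `.` matches any character. The sketch below, using invented placeholder file names in a temporary folder, shows how a loose pattern can pick up files that are not *.csv files, while an anchored pattern does not.

```r
## Create empty placeholder files in a temporary folder (names invented)
pattern.dir <- file.path(tempdir(), "pattern-demo")
dir.create(pattern.dir, showWarnings = FALSE)
file.create(file.path(pattern.dir, c("19_Tag17600_Z-9.csv",
                                     "19_Tag17600_Z-9.csv.bak",
                                     "notes-csv.txt")))

## Loose pattern: any character followed by "csv", anywhere in the name
loose <- list.files(pattern.dir, pattern = ".csv")
print(loose)

## Strict pattern: a literal ".csv" at the end of the name
strict <- list.files(pattern.dir, pattern = "\\.csv$")
print(strict)
```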
Check how long the list of file names read in by the list.files() function is.
## Check how many deployments you are expecting to bind together.
## This code effectively says, how long is the list
## of names that were read in using the list.files function.
length(track.list)
## [1] 34
If the number is smaller or larger than the number of deployments you expect, it may be the case that:
there are additional *.csv files in your deployment folders that should not be there (i.e. too many deployments being considered / number too large)
there are fewer deployments being considered than should be. (i.e. supposed *.csv files indicative of deployments are likely not being read correctly)
Ultimately, if the number of deployments you think you have in total is not equivalent to this review, then check what the issue might be as per options considered above.
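These checks can also be scripted. The sketch below is one possible approach under stated assumptions: `n.expected` is a value you would set yourself, the files are invented stand-ins in a temporary folder, and treating zero-size files as suspect is a suggestion rather than part of the original workflow.

```r
## Stand-in folder with one valid file and one empty placeholder file
check.dir <- file.path(tempdir(), "count-check-demo")
dir.create(check.dir, showWarnings = FALSE)
write.csv(data.frame(latitude = 42.8, longitude = 16.9),
          file.path(check.dir, "bird_A.csv"), row.names = FALSE)
file.create(file.path(check.dir, "bird_B.csv"))

check.list <- list.files(check.dir, pattern = "\\.csv$", full.names = TRUE)

## Number of deployments you believe you have (assumed value for this demo)
n.expected <- 2
if (length(check.list) != n.expected) {
  warning("Expected ", n.expected, " files but found ", length(check.list))
}

## Files with no content cannot contribute any tracking data
empty.files <- basename(check.list[file.size(check.list) == 0])
print(empty.files)
```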
Next, create a blank data frame to which you can bind on the tracking data from each unique deployment. Effectively, you can consider this step as preparing for sticking all your data together to make one big table (a data frame in R) with all of the data.
24.7.3.2 Merge tracking data together to create single file
Finally, use a for loop to read in each file from each unique deployment, and then bind them all together.
Unsure how a for loop works? See numerous resources online.
## Create the blank data frame that each deployment will be bound onto
track.df <- data.frame()
for(i in seq_along(track.list)){
  ## read in a unique file
  temp <- read.csv(track.list[i])
  ## bind the unique file onto the data frame template
  track.df <- rbind(track.df, temp)
  ## print some text to show progress of the loop and binding process
  print(paste("Deployment", i, "of", length(track.list), "being bound to the data frame."))
}
## [1] "Deployment 1 of 34 being bound to the data frame."
## [1] "Deployment 2 of 34 being bound to the data frame."
## [1] "Deployment 3 of 34 being bound to the data frame."
## [1] "Deployment 4 of 34 being bound to the data frame."
## [1] "Deployment 5 of 34 being bound to the data frame."
## [1] "Deployment 6 of 34 being bound to the data frame."
## [1] "Deployment 7 of 34 being bound to the data frame."
## [1] "Deployment 8 of 34 being bound to the data frame."
## [1] "Deployment 9 of 34 being bound to the data frame."
## [1] "Deployment 10 of 34 being bound to the data frame."
## [1] "Deployment 11 of 34 being bound to the data frame."
## [1] "Deployment 12 of 34 being bound to the data frame."
## [1] "Deployment 13 of 34 being bound to the data frame."
## [1] "Deployment 14 of 34 being bound to the data frame."
## [1] "Deployment 15 of 34 being bound to the data frame."
## [1] "Deployment 16 of 34 being bound to the data frame."
## [1] "Deployment 17 of 34 being bound to the data frame."
## [1] "Deployment 18 of 34 being bound to the data frame."
## [1] "Deployment 19 of 34 being bound to the data frame."
## [1] "Deployment 20 of 34 being bound to the data frame."
## [1] "Deployment 21 of 34 being bound to the data frame."
## [1] "Deployment 22 of 34 being bound to the data frame."
## [1] "Deployment 23 of 34 being bound to the data frame."
## [1] "Deployment 24 of 34 being bound to the data frame."
## [1] "Deployment 25 of 34 being bound to the data frame."
## [1] "Deployment 26 of 34 being bound to the data frame."
## [1] "Deployment 27 of 34 being bound to the data frame."
## [1] "Deployment 28 of 34 being bound to the data frame."
## [1] "Deployment 29 of 34 being bound to the data frame."
## [1] "Deployment 30 of 34 being bound to the data frame."
## [1] "Deployment 31 of 34 being bound to the data frame."
## [1] "Deployment 32 of 34 being bound to the data frame."
## [1] "Deployment 33 of 34 being bound to the data frame."
## [1] "Deployment 34 of 34 being bound to the data frame."
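For larger datasets, repeatedly growing a data frame inside a loop can become slow. An alternative, sketched here with two small invented files rather than the example data, is to read every file into a list and bind the list in a single step. This is offered as an option, not as the method used in this chapter.

```r
## Write two small stand-in files to a temporary folder (contents invented)
bind.dir <- file.path(tempdir(), "bind-demo")
dir.create(bind.dir, showWarnings = FALSE)
write.csv(data.frame(latitude = c(42.81, 42.82), bird_id = "bird_A"),
          file.path(bind.dir, "bird_A.csv"), row.names = FALSE)
write.csv(data.frame(latitude = 43.19, bird_id = "bird_B"),
          file.path(bind.dir, "bird_B.csv"), row.names = FALSE)

bind.list <- list.files(bind.dir, pattern = "\\.csv$", full.names = TRUE)

## Read every file into a list of data frames
all.tracks <- lapply(bind.list, read.csv)

## Bind all data frames in the list together in one step
bound.df <- do.call(rbind, all.tracks)
print(bound.df)
```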
24.7.4 Review the data you have read into R
Here you are doing some quick inspections to see if anything unexpected may have happened when reading your data into R.
If you are unsure, or want to learn more about the different ways data can be structured in R, consider doing an online R course which teaches the beginner concepts of R. Also, see the latest “R for Data Science” book by Hadley Wickham and colleagues.
## Print the column names and first two rows of data - does everything look as it should?
head(track.df,2)
## day month year hour minute second satellites latitude longitude altitude
## 1 24 5 19 0 49 9 5 42.811528 16.885531 -1.50
## 2 24 5 19 1 9 3 5 42.812029 16.886907 5.25
## time_offset accuracy voltage colony_code bird_id
## 1 2.910 4.70376e-07 4.12 Z 19_Tag17600_Z-9
## 2 2.795 9.25368e-07 4.08 Z 19_Tag17600_Z-9
## dttm deploy_year
## 1 2019-05-24 00:49:09 2019
## 2 2019-05-24 01:09:03 2019
## Print the last two rows of data - did the final deployment bind as expected?
tail(track.df,2)
## day month year hour minute second satellites latitude longitude altitude
## 11353 30 5 20 16 33 5 6 43.197648 16.354861 61.25
## 11354 30 5 20 19 51 59 6 42.817527 16.766630 86.75
## time_offset accuracy voltage colony_code
## 11353 -23.16 5.226000e-06 3.98 Z
## 11354 -23.31 2.183421e-06 4.02 Z
## bird_id dttm deploy_year
## 11353 20_Tag41108_Z-95 (2nd Parent) 2020-05-30 16:33:05 2020
## 11354 20_Tag41108_Z-95 (2nd Parent) 2020-05-30 19:51:59 2020
## Review the structure of the data frame: number of rows, column names and data types
str(track.df)
## 'data.frame': 11354 obs. of 17 variables:
## $ day : int 24 24 24 24 24 24 24 24 24 24 ...
## $ month : int 5 5 5 5 5 5 5 5 5 5 ...
## $ year : int 19 19 19 19 19 19 19 19 19 19 ...
## $ hour : int 0 1 1 1 2 2 2 3 3 6 ...
## $ minute : int 49 9 29 49 9 29 49 9 29 47 ...
## $ second : int 9 3 3 9 5 6 9 5 5 38 ...
## $ satellites : int 5 5 5 4 6 7 4 6 5 6 ...
## $ latitude : num 42.8 42.8 42.8 42.8 42.8 ...
## $ longitude : num 16.9 16.9 16.9 16.9 16.9 ...
## $ altitude : num -1.5 5.25 143.5 143 98 ...
## $ time_offset: num 2.91 2.79 2.79 2.91 2.93 ...
## $ accuracy : num 4.70e-07 9.25e-07 3.52e-07 7.13e-07 4.87e-06 ...
## $ voltage : num 4.12 4.08 4.08 4.1 4.12 4.1 4.12 4.12 4.12 4.1 ...
## $ colony_code: chr "Z" "Z" "Z" "Z" ...
## $ bird_id : chr "19_Tag17600_Z-9" "19_Tag17600_Z-9" "19_Tag17600_Z-9" "19_Tag17600_Z-9" ...
## $ dttm : chr "2019-05-24 00:49:09" "2019-05-24 01:09:03" "2019-05-24 01:29:03" "2019-05-24 01:49:09" ...
## $ deploy_year: int 2019 2019 2019 2019 2019 2019 2019 2019 2019 2019 ...
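The structure above shows that dttm was read in as a character column. A few further checks you might consider are sketched below on a small stand-in data frame (`review.df`, with values copied from the rows printed above). The UTC time zone is an assumption made for the sketch; use the time zone your devices actually recorded in.

```r
## Small stand-in data frame with a subset of the columns shown above
review.df <- data.frame(
  latitude = c(42.811528, 42.812029, 43.197648),
  longitude = c(16.885531, 16.886907, 16.354861),
  bird_id = c("19_Tag17600_Z-9", "19_Tag17600_Z-9",
              "20_Tag41108_Z-95 (2nd Parent)"),
  dttm = c("2019-05-24 00:49:09", "2019-05-24 01:09:03",
           "2020-05-30 16:33:05")
)

## Convert character date-times to POSIXct (UTC is an assumption)
review.df$dttm <- as.POSIXct(review.df$dttm,
                             format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

## Count missing values in each column
n.missing <- colSums(is.na(review.df))
print(n.missing)

## Number of locations per bird
locs.per.bird <- table(review.df$bird_id)
print(locs.per.bird)

## Check that all coordinates fall within valid bounds
coords.ok <- all(abs(review.df$latitude) <= 90) &&
  all(abs(review.df$longitude) <= 180)
print(coords.ok)
```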