# Pivot wider Jupytool 

This Jupyter notebook is dedicated to the pivot_wider function from the tidyr R package. 
This script is the final part of the data preparation for the ecoregionalization Galaxy workflow.   

In [62]:
#Date : 22/05/2024
#Author : Seguineau Pauline & Yvan Le Bras 

#Load libraries
library(tidyr)

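#Load file
#The input is expected somewhere under "galaxy_inputs"; search it recursively
input_path = "galaxy_inputs"

files = list.files(input_path, recursive = TRUE, full.names = TRUE)
if (length(files) == 0) {
    stop("No input file found in ", input_path)
}
#If several files are present, the last one is used
file_path = files[length(files)]

file = read.table(file_path, header = TRUE, sep = "\t")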

#Run pivot_wider function
pivot_file = pivot_wider(data = file,
                        names_from = phylum_class_order_family_genus_specificEpithet,
                        values_from = individualCount,
                        values_fill = 0,
                        values_fn = sum)

#Replace all occurrences >= 1 by 1 to have only presence (1) or absence (0) data
#Taxon columns are assumed to start at column 3
for(c in 3:length(pivot_file)){
    pivot_file[c][pivot_file[c]>=1] <- 1}


write.table(pivot_file, "outputs/pivot_file.tabular", sep = "\t", quote = FALSE, row.names = FALSE)

In this Jupyter notebook, we used the pivot_wider function of the tidyr package to reshape our data into a wider format suited to the subsequent analyses of the Galaxy ecoregionalization workflow. After this transformation, each taxon becomes a separate column. Missing values are filled with zeros, and individual counts are summed when the same taxon is recorded more than once for the same row. Finally, all values >= 1 are replaced by 1 so that the table contains only presence (1) or absence (0) data.
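
To make the reshaping concrete, here is a minimal sketch on a toy table. The data frame `occ`, its column names `lat`, `long` and `taxon`, and all of its values are hypothetical and chosen only for illustration; the real input uses the `phylum_class_order_family_genus_specificEpithet` column instead of `taxon`. The `pivot_wider` arguments are the same as in the notebook.

```r
library(tidyr)

# Hypothetical occurrence table: two sites, two taxa,
# with taxon_A recorded twice at the first site
occ <- data.frame(
  lat   = c(-66.5, -66.5, -66.5, -67.0),
  long  = c(140.0, 140.0, 140.0, 141.0),
  taxon = c("taxon_A", "taxon_A", "taxon_B", "taxon_B"),
  individualCount = c(2, 3, 1, 4)
)

# One column per taxon; duplicates are summed, missing combinations become 0
wide <- pivot_wider(
  data = occ,
  names_from = taxon,
  values_from = individualCount,
  values_fill = 0,
  values_fn = sum
)

# Turn counts into presence (1) / absence (0) for every taxon column
taxa_cols <- setdiff(names(wide), c("lat", "long"))
wide[taxa_cols] <- lapply(wide[taxa_cols], function(x) as.integer(x >= 1))

print(wide)
```

In this toy case the two taxon_A records of the first site are summed to 5 before being reduced to 1, and the second site, where taxon_A was never recorded, receives 0. Selecting the taxon columns by name rather than by position is a design choice of this sketch, not something the notebook itself does.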

Thus, this notebook is an essential building block of our analysis pipeline, ensuring that the data is properly formatted and ready to be explored and interpreted for ecoregionalization studies.