The CIP Data

The International Potato Center (CIP) was founded in 1971 as a research-for-development organization with a focus on potato, sweetpotato and Andean roots and tubers. It delivers innovative science-based solutions to enhance access to affordable nutritious food, foster inclusive sustainable business and employment growth, and drive the climate resilience of root and tuber agri-food systems. Headquartered in Lima, Peru, CIP has a research presence in more than 20 countries in Africa, Asia and Latin America.

CIP is a CGIAR research center, a global research partnership for a food-secure future. CGIAR science is dedicated to transforming food, land and water systems in a climate crisis. Its research is carried out by 13 CGIAR Centers/Alliances in close collaboration with hundreds of partners, including national and regional research institutes, civil society organizations, academia, development organizations and the private sector.

CIP data contains identifying information for each experiment and field-based measurements of the plants.

CIP data dictionary

Use ggplot() to re-create the plot below. Before plotting, filter the data set to include only observations with a release year of 2014 and a harvest period of 120 days.

Reading the data

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(dplyr)

# Load the data from the CSV file
datos <- read.csv("CIP.csv")

Filtering the data

The data are filtered to the rows where release == 2014 and harvest == "120 days".

# Keep only the observations of interest
datos_filtrados <- datos %>% filter(release == 2014, harvest == "120 days")

head(datos_filtrados)

##      trial    type season release    loc     geno  harvest rep row column nops
## 1 HUA-ST02 variety  2019A    2014 Huaral PZ06.120 120 days   1   1      4  120
## 2 HUA-ST02 variety  2019A    2014 Huaral     Sumi 120 days   1   2      3  120
## 3 HUA-ST02 variety  2019A    2014 Huaral  Abigail 120 days   1   2      4  120
## 4 HUA-ST02   check  2019A    2014 Huaral Salyboro 120 days   1   2      5  120
## 5 HUA-ST02 variety  2019A    2014 Huaral   Isabel 120 days   1   3      3  120
## 6 HUA-ST02 variety  2019A    2014 Huaral   Isabel 120 days   2   1      6  120
##   noph     vw nocr   crw  ncrw   trw
## 1  109 141.40  189 26.37 10.91 37.28
## 2  117 142.35  241 48.48 16.11 64.59
## 3  111  96.78  328 86.53 12.28 98.81
## 4   94 124.78   61  9.73  7.75 17.48
## 5   97 101.81  152 32.55  4.99 37.54
## 6  102 106.18  171 46.43  4.37 50.80
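As a quick sanity check on the filter, the sketch below applies the same filter() call to a small toy data frame and confirms that only the matching rows survive. The values are invented for illustration; only the column names release, harvest, geno and trw are taken from the output above. Because harvest is stored as text, the comparison has to use the full string "120 days".

```r
library(dplyr)

# Toy data: invented values, reusing column names from the CIP output above
toy <- data.frame(
  release = c(2014, 2014, 2016, 2014),
  harvest = c("120 days", "90 days", "120 days", "120 days"),
  geno    = c("Sumi", "Sumi", "Isabel", "Abigail"),
  trw     = c(64.6, 40.2, 37.5, 98.8)
)

# Same filter as in the document: both conditions must hold (logical AND)
toy_filtered <- toy %>% filter(release == 2014, harvest == "120 days")

# Tabulate the remaining release/harvest combinations; only one should be left
toy_filtered %>% count(release, harvest)
```

Running count() on the real datos_filtrados in the same way would show whether any unexpected release or harvest values slipped through.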

# Create the plot with ggplot
ggplot(datos_filtrados, aes(x = geno, y = trw, fill = geno)) +
  geom_boxplot() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

## Warning: Removed 10 rows containing non-finite outside the scale range
## (`stat_boxplot()`).
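The warning indicates that some rows have a missing or non-finite trw, and stat_boxplot() dropped them. One way to make this explicit is to count those rows and remove them before plotting. A minimal sketch on invented toy values (not the CIP data; the names toy, n_missing and toy_complete are illustrative):

```r
library(dplyr)
library(ggplot2)

# Toy data with invented values; NA stands in for a missing measurement
toy <- data.frame(
  geno = c("Sumi", "Sumi", "Isabel", "Isabel", "Abigail", "Abigail"),
  trw  = c(64.6, NA, 37.5, 50.8, 98.8, NA)
)

# Number of rows ggplot would drop because trw is NA, NaN or infinite
n_missing <- sum(!is.finite(toy$trw))

# Remove those rows explicitly so the plot is built from complete data
toy_complete <- toy %>% filter(is.finite(trw))

# Same boxplot layout as in the document, now without dropped rows
p <- ggplot(toy_complete, aes(x = geno, y = trw, fill = geno)) +
  geom_boxplot() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
```

Reporting n_missing alongside the plot keeps the number of excluded observations visible instead of leaving it in a warning.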

Re-creating the plot

Use ggplot() to re-create the plot below. Clues:

• Change the axis labels and create a title.

• Use geom_smooth() to generate the trend line (linear trend).

datos %>% 
  ggplot(aes(x = crw, y = trw)) +
  geom_point() +
  # Linear trend line, drawn without the confidence band
  geom_smooth(method = "lm", se = FALSE, col = "blue") +
  labs(
    title = "Scatter Plot of Commercial vs Total Root Weight",
    x = "Weight of commercial roots (kg)",
    y = "Weight of total roots (kg)"
  ) +
  theme(axis.text.x = element_text(angle = 0, hjust = 1))

## `geom_smooth()` using formula = 'y ~ x'

## Warning: Removed 81 rows containing non-finite outside the scale range
## (`stat_smooth()`).

## Warning: Removed 81 rows containing missing values or values outside the scale range
## (`geom_point()`).
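geom_smooth(method = "lm") draws the least-squares line of y on x, which is what the message about the formula y ~ x refers to. The same line can be obtained numerically with lm(). The sketch below uses invented toy values in place of the CIP data (trw is constructed as exactly 2 * crw + 5 in the complete rows), and the object names toy and fit are illustrative.

```r
# Toy data with invented values; in the complete rows trw = 2 * crw + 5
toy <- data.frame(
  crw = c(10, 20, 30, 40, NA),
  trw = c(25, 45, 65, 85, 90)
)

# lm() drops rows with missing values by default (na.action = na.omit),
# much as the plot drops them with a warning
fit <- lm(trw ~ crw, data = toy)

# Intercept and slope of the line that geom_smooth(method = "lm") would draw
coef(fit)
```

On the real data, lm(trw ~ crw, data = datos) would give the intercept and slope of the blue trend line.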

Bar charts

Re-create the next plot using the ggplot() function. Clues:

• Use facet_wrap() to split the plot by trial.

• Use geom_bar() with stat = "summary" to compute the mean commercial root weight per plot.

• To set the x-axis tick labels diagonally, use theme(axis.text.x = element_text(angle = 45, hjust = 1)).

datos %>% 
  ggplot(aes(x = geno, y = crw)) +
  # stat = "summary" with fun = "mean" plots the mean crw per genotype;
  # na.rm = TRUE drops missing crw values without a warning
  geom_bar(stat = "summary", fun = "mean", na.rm = TRUE) +
  labs(
    subtitle = "Mean Commercial Root Weight per Genotype Across Trials",
    x = "Genotype",
    y = "Mean Commercial Root Weight per Plot"
  ) +
  facet_wrap(~trial) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
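The bar heights are meant to be the mean crw per genotype within each trial. These means can also be computed directly with group_by() and summarise(), which is a useful cross-check: geom_col() applied to unsummarised rows stacks the individual values, so it shows sums, not means. A sketch on invented toy values (the trial name "TRIAL-B" and the object names toy, means and p are hypothetical):

```r
library(dplyr)
library(ggplot2)

# Toy data with invented values; NA stands in for a missing measurement
toy <- data.frame(
  trial = c("HUA-ST02", "HUA-ST02", "HUA-ST02", "TRIAL-B", "TRIAL-B"),
  geno  = c("Sumi", "Sumi", "Isabel", "Sumi", "Sumi"),
  crw   = c(40, 50, 30, 20, NA)
)

# One row per trial and genotype, holding the mean of the non-missing crw values
means <- toy %>%
  group_by(trial, geno) %>%
  summarise(mean_crw = mean(crw, na.rm = TRUE), .groups = "drop")

means

# With one pre-computed value per bar, geom_col() draws the means as they are
p <- ggplot(means, aes(x = geno, y = mean_crw)) +
  geom_col() +
  facet_wrap(~trial) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
```

Summarising first also leaves a table of the plotted values, which can be compared against the reference plot.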