--- title: "BGLR_demonstration" author: "Matthew McGowan" date: "April 22, 2020" output: html_document --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) ``` ## Dependencies BGLR is a self-contained R package for calculating Bayesian genomic selection models. It is available on CRAN and is easy to install and load. When you want to include a package in an Rmarkdown, but don't know if it will already be installed on all systems running the script, you can use the 'require' function to check if the package is available or needs to be installed. ```{r setup} test_install <- require("BGLR", "ggplot2") if (!test_install) { install.packages("BGLR") library(BGLR) library(ggplot2) } source("FUNCTIONS_GS.R") # Read in the data myGD=read.table(file="http://zzlab.net/GAPIT/data/mdp_numeric.txt",head=T) row.names(myGD) = myGD[,1] myGD_mat<- as.matrix(myGD[,-1]) myGM <- read.table("http://zzlab.net/GAPIT/data/mdp_SNP_information.txt", header = T, stringsAsFactors = F, sep = "\t") myY <- read.table("http://zzlab.net/GAPIT/data/mdp_traits.txt", header = T, stringsAsFactors = F, sep = "\t") ``` ## Data Formatting Again, because we know the demo data contains monomorphic markers, we need to pre-process it to avoid any downstream issues. The genotype matrix can be in either c(-1,0,1) or c(0,1,2) format. There are also helper functions for using PLINK formatted data in BGLR (see documentation). While this script will demonstrate the basics of using BGLR, I highly recommend going through the vignette posted on the CRAN package url: [HERE](https://cran.r-project.org/web/packages/BGLR/index.html) ```{r data_formatting, echo=FALSE} # Remove monomorphic markers maf <- calc_maf_apply(myGD_mat, encoding = c(0, 1, 2)) mono_indices <- which(maf == 0) taxa <- row.names(myGD) myGD_mat = myGD_mat[,-mono_indices] myGM = myGM[-mono_indices,] # Subset a single phenotype by taxa trait_col <- 2 trait_name <- names(myY)[trait_col] singleY <- myY[,c(1,trait_col)] taxa_match <- match(myGD$taxa, singleY$Taxa) singleY <- singleY[taxa_match,] singleY_vect <- as.vector(singleY[,2]) ``` ## BGLR basics ```{r BGLR_basics, echo = F} # Testing a single model # Set missing values phenotype_withNA <- singleY_vect randNA <- sample(1:length(singleY_vect), 30) phenotype_withNA[randNA] <- NA # Calculate the GS model using BGLR bayesA_ETA<-list(list(X=myGD_mat,model="BayesB")) bayesA_results <- BGLR(y = phenotype_withNA, ETA = bayesA_ETA, nIter=5000, burnIn=1000) bayesA_predictions <- predict(bayesA_results) bayesA_acc <- cor(singleY_vect[randNA], bayesA_predictions[randNA]) ``` ## Testing Multiple Models ```{r BGLR_model_testing, echo = F} model_testing_results <- test_all_models_BGLR_cv(genotypes = myGD_mat, phenotype = singleY_vect, nIter = 100, burnIn = 20) ```