---
title: "BGLR_demonstration"
author: "Matthew McGowan"
date: "April 22, 2020"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

## Dependencies

BGLR is a self-contained R package for fitting Bayesian genomic selection models. It is available on CRAN and is easy to install and load. When an R Markdown document depends on a package that may not be installed on every system that runs it, you can call `require()`: it tries to load the package and returns `FALSE` if the package is unavailable, so you can install it only when needed.

```{r load_data}
# Try to load BGLR; install it from CRAN first if it is missing
test_install <- require("BGLR")
if (!test_install)
{
  install.packages("BGLR")
  library(BGLR)
}

# Read in the data
# Numeric genotypes: one row per taxon, with the taxa names in the first column
myGD <- read.table(file = "http://zzlab.net/GAPIT/data/mdp_numeric.txt", header = TRUE)
row.names(myGD) <- myGD[, 1]
myGD_mat <- myGD[, -1]

# SNP information and phenotypes
myGM <- read.table("http://zzlab.net/GAPIT/data/mdp_SNP_information.txt", header = TRUE, stringsAsFactors = FALSE, sep = "\t")
myY <- read.table("http://zzlab.net/GAPIT/data/mdp_traits.txt", header = TRUE, stringsAsFactors = FALSE, sep = "\t")
```
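
The check-then-install pattern can also be wrapped in a small helper so it is reusable across several packages. The following is only a sketch: the name `ensure_package` and the choice of CRAN mirror are assumptions made for illustration, and `requireNamespace()` is used in place of `require()` so that the availability check does not attach the package as a side effect.

```{r ensure_package_sketch}
# Hypothetical helper (not part of BGLR or base R): install a package
# only when it is missing, then attach it.
ensure_package <- function(pkg, repos = "https://cloud.r-project.org") {
  # requireNamespace() returns FALSE instead of failing when pkg is absent
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # An explicit mirror avoids the "no CRAN mirror set" error that can
    # occur when knitting non-interactively (assumed mirror, change as needed)
    install.packages(pkg, repos = repos)
  }
  # character.only = TRUE lets library() accept the name as a string
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# 'stats' ships with R, so this call never triggers an installation
ensure_package("stats")
```

Calling `ensure_package("BGLR")` would follow the same path as the chunk above, installing the package only if it cannot be found.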

## Data Formatting

Again, because we know the demo data contains monomorphic markers, we need to pre-process it to avoid any downstream issues.
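
The formatting chunk in this section calls `calc_maf_apply()`, which this document does not define. If that helper is not already available in your session, a stand-in along the following lines may be enough. This is a minimal sketch and not necessarily the original implementation: it assumes the three values in `encoding` are consecutive codes for zero, one, and two copies of an allele, and it ignores missing genotypes.

```{r maf_helper_sketch}
# Assumed stand-in for calc_maf_apply(); the original helper may differ.
calc_maf_apply <- function(geno, encoding = c(0, 1, 2)) {
  # Shift the codes so each genotype becomes an allele count (0, 1, or 2)
  counts <- as.matrix(geno) - encoding[1]
  # Allele frequency per marker: mean allele count divided by 2
  freq <- apply(counts, 2, function(x) mean(x, na.rm = TRUE) / 2)
  # The minor allele is whichever of the two alleles is rarer
  pmin(freq, 1 - freq)
}

# Toy data: 4 individuals by 3 markers; columns 1 and 3 are monomorphic
toy <- matrix(c(0, 0, 0, 0,
                0, 1, 2, 1,
                2, 2, 2, 2), nrow = 4)
toy_maf <- calc_maf_apply(toy)
which(toy_maf == 0)
```

A marker with a minor allele frequency of zero carries the same genotype in every individual, so it contributes no information to the model and can be dropped.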

The genotype matrix can be in either `c(-1, 0, 1)` or `c(0, 1, 2)` format. There are also helper functions for using PLINK-formatted data in BGLR (see documentation).
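
The two encodings differ only by a constant shift of one, so converting between them is a single subtraction. The toy matrix below is made up for illustration (the names `geno_012` and `geno_m101` are not from the demo data); it also shows that the two versions become identical once each marker is centered on its mean.

```{r encoding_sketch}
# Toy genotypes in 0/1/2 format: 3 lines by 2 SNPs (illustrative values)
geno_012 <- matrix(c(0, 1, 2,
                     2, 1, 0),
                   nrow = 3,
                   dimnames = list(paste0("line", 1:3), c("snp1", "snp2")))

# Shift to -1/0/1 format
geno_m101 <- geno_012 - 1

# Center each marker (column) on its own mean
center_cols <- function(x) sweep(x, 2, colMeans(x))

# After centering, both encodings give the same matrix
all.equal(center_cols(geno_012), center_cols(geno_m101))
```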

```{r data_formatting, echo=FALSE}
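# Remove monomorphic markers (minor allele frequency of zero)
# calc_maf_apply() is a helper that must be defined before this chunk runs
maf <- calc_maf_apply(myGD_mat, encoding = c(0, 1, 2))
mono_indices <- which(maf == 0)
taxa <- row.names(myGD)
# Guard against an empty index: x[, -integer(0)] would drop every column
if (length(mono_indices) > 0) {
  myGD_mat <- myGD_mat[, -mono_indices]
  myGM <- myGM[-mono_indices, ]
}

# Keep the taxa column and the first trait, then reorder the
# phenotype rows so they line up with the genotype rows
myY <- myY[, 1:2]
taxa_match <- match(myGD$taxa, myY$Taxa)
myY <- myY[taxa_match, ]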
```

Note that the `echo = FALSE` parameter was added to the `data_formatting` code chunk so that its R code is not printed in the rendered document.
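
Because `match()` returns `NA` for any genotyped taxon that has no phenotype record, it can be worth checking the alignment before fitting a model. The sketch below uses made-up taxa and trait values (not the demo data) to show how the reordering behaves and how unmatched taxa can be counted.

```{r alignment_sketch}
# Illustrative data: four genotyped taxa, three phenotyped taxa
geno_taxa <- c("A", "B", "C", "D")
pheno <- data.frame(Taxa = c("C", "A", "D"),
                    EarHT = c(61.5, 59.0, 70.2),
                    stringsAsFactors = FALSE)

# Position of each genotyped taxon in the phenotype table (NA if absent)
idx <- match(geno_taxa, pheno$Taxa)

# Reorder phenotypes to follow the genotype order; taxon "B" becomes an NA row
pheno_aligned <- pheno[idx, ]

# Count genotyped taxa without a phenotype record
n_unmatched <- sum(is.na(idx))
n_unmatched
```

Whether to drop the unmatched taxa or keep them as missing phenotypes depends on the analysis, so the sketch only reports them.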