Load libraries


All samples

File retrieval and preparation

Get the metadata, which includes sample name, rna extraction batch, clean up batch, behavioural test, viral load, colony location, and colony ID

metadata <- read.csv("./metadata.csv", row.names=1)

Retrieve first row of the counts file, which will be the sample ID

rawcounts <- read.table("./counts.tsv", nrows=1)
colnames <- as.character(rawcounts)

Get rawcounts

  • row.names: use the first column as the row names
  • col.names: use the column names we retrieved above as column names
  • check.names: in this instance prevents R from prepending an “X” to the column names
  • skip: skip the first row because it’s just the sample ID
rawcounts <- read.table("./counts.tsv", row.names=1, col.names=colnames,

This has the effect of putting the rawcounts columns in the same order as the metadata rows. This is required by DESeq2

rawcounts <- select(rawcounts, rownames(metadata))

Check that they are now in the same order

all(rownames(metadata) == colnames(rawcounts))
## [1] TRUE

DESeq2 object

Create the DESeq2 object specifying viral load as the condition of interest

dds <- DESeqDataSetFromMatrix(countData = rawcounts, 
                              colData = metadata,
                              design = ~ viral_load)


dds <- estimateSizeFactors(dds)

Extract the normalised counts
If the normalized argument were specified as FALSE then we would extract the raw counts from the object

normalised_counts <- counts(dds, normalized=TRUE)

Variance stabilising transformation

This is a logarithmic transformation that moderates the variance across the mean
blind = TRUE ensures that the transformation should be blind to the sample information given in the design formula. This is important when performing quality assessment

vsd <- vst(dds, blind=TRUE)

Extract the vst matrix from the object

vsd_matrix <- assay(vsd)

Compute pairwise correlation values

vsd_correlation <- cor(vsd_matrix)


Viral load

pheatmap(vsd_correlation, annotation = select(metadata, viral_load))

Behavioural test

pheatmap(vsd_correlation, annotation = select(metadata, behavioural_test))

Colony location

pheatmap(vsd_correlation, annotation = select(metadata, colony_location))


Viral Load

plotPCA(vsd, intgroup="viral_load")

Behavioural test

plotPCA(vsd, intgroup="behavioural_test")

Colony location

plotPCA(vsd, intgroup="colony_location")