The goal of this website is to explore the software used in weighted gene co-expression network analysis (WGCNA).
Briefly, the WGCNA software constructs gene co-expression networks, that is, networks that describe how the expression levels of many genes vary together across samples. In these networks, each gene is represented by a node, and the strength of the connection between two nodes (genes) is based on the similarity of their expression profiles. This similarity is usually measured by Pearson’s correlation and then transformed, for example by raising its absolute value to a power (soft thresholding). To read more about this transformation, or the mathematics behind gene co-expression networks, see the WGCNA theory papers here.
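The tutorial itself uses R. As a rough illustration of the transformation, here is a minimal Python sketch of an unsigned, soft-thresholded adjacency, a_ij = |cor(x_i, x_j)|^β. It assumes expression is laid out with samples as rows and genes as columns. The power β = 6 and the three simulated genes are illustrative assumptions, not values taken from the tutorial data.

```python
import numpy as np

def soft_threshold_adjacency(expr: np.ndarray, beta: int = 6) -> np.ndarray:
    """Unsigned adjacency a_ij = |cor(x_i, x_j)| ** beta.

    expr: samples x genes. beta = 6 is an illustrative assumption.
    """
    # np.corrcoef treats rows as variables, so transpose to put genes in rows
    corr = np.corrcoef(expr.T)
    # Raising |r| to a power shrinks weak correlations far more than strong ones
    return np.abs(corr) ** beta

rng = np.random.default_rng(0)
n_samples = 135
driver = rng.normal(size=n_samples)  # simulated shared signal
expr = np.column_stack([
    driver + 0.1 * rng.normal(size=n_samples),  # gene 0: follows the shared signal
    driver + 0.1 * rng.normal(size=n_samples),  # gene 1: follows the shared signal
    rng.normal(size=n_samples),                 # gene 2: unrelated noise
])
adj = soft_threshold_adjacency(expr)
print(np.round(adj, 3))
```

With β = 6, a correlation of 0.9 becomes about 0.53, while a correlation of 0.5 becomes about 0.016. This is how soft thresholding emphasizes strong correlations without imposing a hard cutoff. In R, WGCNA provides `adjacency()` (with a `power` argument) for this step and `pickSoftThreshold()` to help choose β.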
In addition to characterizing gene co-expression networks, there is also interest in integrating these networks with clinical data, such as body weight. The goal of this integration is to identify potentially biologically meaningful groups of genes. Studying these groups could help identify the key driver genes within them, which are candidate genes for the measured traits. These candidates could then be tested experimentally as causal genes or evaluated as biomarkers for one or more of the traits.
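In WGCNA, this integration is commonly done by summarizing each module with its eigengene (the first principal component of the module's expression) and then correlating that eigengene with each trait. The sketch below shows this idea in Python. It is not the WGCNA implementation. The 20 module genes and the `weight` values are simulated and hypothetical.

```python
import numpy as np

def module_eigengene(module_expr: np.ndarray) -> np.ndarray:
    """First principal component (sample scores) of one module, samples x genes."""
    # Standardize each gene so all genes contribute on the same scale
    z = (module_expr - module_expr.mean(axis=0)) / module_expr.std(axis=0, ddof=1)
    # The first left singular vector gives PC1 scores for the samples (up to scale)
    u, _, _ = np.linalg.svd(z, full_matrices=False)
    eigengene = u[:, 0]
    # SVD sign is arbitrary: flip so the eigengene moves with average module expression
    if np.corrcoef(eigengene, z.mean(axis=1))[0, 1] < 0:
        eigengene = -eigengene
    return eigengene

def module_trait_correlation(module_expr: np.ndarray, trait: np.ndarray) -> float:
    """Pearson correlation between a module's eigengene and one trait."""
    return float(np.corrcoef(module_eigengene(module_expr), trait)[0, 1])

rng = np.random.default_rng(1)
n = 135
weight = rng.normal(30, 5, size=n)  # hypothetical body weights
# 20 hypothetical genes whose expression rises with weight, plus noise
module = np.outer(weight - weight.mean(), np.ones(20)) / 5 + rng.normal(size=(n, 20))
r = module_trait_correlation(module, weight)
print(f"module-trait correlation: {r:.2f}")
```

In R, WGCNA's `moduleEigengenes()` function computes eigengenes for all modules at once. By default it also aligns each eigengene's sign with the module's average expression, which the `if` step above imitates.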
In the first WGCNA tutorial in R, I analyzed a dataset of microarray-based gene expression measurements from 135 female mice. In addition to the microarray data, each mouse has measurements for 25 body and metabolic traits, such as weight and glucose level. I am interested in identifying which genes or groups of genes contribute most to each of these traits. For an overview of this process and a typical workflow, see this short paper.
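At the level of individual genes, one simple starting point is gene significance, commonly defined in the WGCNA literature as the absolute correlation between a gene's expression and a trait. The Python sketch below ranks genes by that quantity. The gene names (`gene_A` … `gene_E`) and the glucose values are simulated and hypothetical.

```python
import numpy as np

def gene_significance(expr: np.ndarray, trait: np.ndarray) -> np.ndarray:
    """Absolute Pearson correlation of each gene (column of expr) with one trait."""
    # With population-standardized columns, the mean of the products is Pearson's r
    ez = (expr - expr.mean(axis=0)) / expr.std(axis=0)
    tz = (trait - trait.mean()) / trait.std()
    return np.abs(ez.T @ tz / len(trait))

def top_genes(expr, trait, gene_names, k=3):
    """Return the k genes with the highest gene significance, best first."""
    gs = gene_significance(expr, trait)
    order = np.argsort(gs)[::-1][:k]
    return [(gene_names[i], round(float(gs[i]), 3)) for i in order]

rng = np.random.default_rng(2)
n = 135
glucose = rng.normal(200, 40, size=n)  # hypothetical glucose levels
g = (glucose - glucose.mean()) / glucose.std()
expr = np.column_stack([
    g + 0.3 * rng.normal(size=n),         # gene_A: strongly tied to glucose
    -0.5 * g + rng.normal(size=n),        # gene_B: moderately, negatively tied
    rng.normal(size=(n, 3)),              # gene_C..gene_E: unrelated noise
])
names = ["gene_A", "gene_B", "gene_C", "gene_D", "gene_E"]
top = top_genes(expr, glucose, names)
print(top)
```

Taking the absolute value means that strongly negative relationships (like `gene_B`) rank as highly as positive ones. In practice, the WGCNA tutorials look at gene significance together with module membership, rather than ranking genes in isolation.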
The pages linked below show the code I ran as part of tutorial 1. Most of the code comes from the R scripts provided by the WGCNA authors, while most of the annotation outside the code chunks is my own.
To see the data input and cleaning, click here.
To see clustering based on the gene expression values of the mice, click here.
To see the clinical data collected on the mice, click here.
To see how a co-expression network is created, click here.
To see the final clusters of highly interconnected genes (“modules”), click here.
To see a preliminary analysis of the relationships between traits and the gene modules, click here. Note: This topic is covered in greater depth in tutorials 2 and 3.
To see the modules tested for enrichment of functional annotations, such as Gene Ontology terms, click here.
To see other plots created during this tutorial, click here.
Future plans: Complete the second WGCNA tutorial, which covers comparing co-expression networks across two or more datasets, such as case-control datasets or before-and-after treatment comparisons. It also covers the integration of these networks with clinical data in more depth.
This R Markdown site was created with workflowr