Colloquium: Analysis of Epigenome-Wide DNA Methylation Data
Abstract: DNA methylation is an epigenetic mechanism that cells use to control gene expression; it involves the addition of a methyl group to position 5 of cytosine. Genome-wide DNA methylation measures are now routinely used to investigate associations with complex diseases and traits. Unlike inherited variation in genetic sequence, site-specific methylation varies by tissue, developmental stage, and disease status, and may be affected by aging and by environmental exposures such as diet or smoking. These wide-ranging associations pose analytical challenges in epigenome-wide association studies (EWAS), including reverse causality and confounding by non-genetic factors. In this talk, I will cover two separate topics in the analysis of EWAS data. 1) Array-based DNA methylation measures, e.g., from the Illumina HumanMethylation450 (HM450) or Infinium MethylationEPIC chips, inherently carry technical variation that is not of biological interest. We used technical replicates from the Atherosclerosis Risk in Communities (ARIC) study to evaluate the reliability of methylation measures from the HM450 array, and proposed statistical methods that incorporate known technical variation, e.g., bead-to-bead variation, in downstream analysis. 2) Because methylation levels are mitotically heritable but also sensitive to environmental exposures, both short-range and long-range correlation can exist between DNA methylation sites. We computed and characterized epigenome-wide correlation between all pairs of CpG sites using the ARIC data. We compared correlation patterns among CpG sites in close proximity with those among sites that are far apart or even on different chromosomes. We also evaluated the impact of adjusting for technical and environmental factors on these correlations.
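As a rough illustration of the second topic, the sketch below computes pairwise correlation between CpG sites before and after adjusting for a covariate. It is not the method used in the talk: the data are simulated, the function names `pairwise_correlation` and `residualize` are hypothetical, and adjustment is assumed to mean taking residuals from an ordinary least-squares fit on the covariates.

```python
import numpy as np


def pairwise_correlation(meth):
    """Pearson correlation between all pairs of columns (one column per CpG site)."""
    return np.corrcoef(meth, rowvar=False)


def residualize(meth, covariates):
    """Remove the linear effect of the covariates from every CpG site (OLS residuals)."""
    covariates = np.asarray(covariates, dtype=float)
    if covariates.ndim == 1:
        covariates = covariates[:, None]
    # Design matrix: intercept plus covariates (assumed technical or environmental factors).
    design = np.column_stack([np.ones(len(meth)), covariates])
    coef, *_ = np.linalg.lstsq(design, meth, rcond=None)
    return meth - design @ coef


# Simulated example (not ARIC data): two sites share a technical factor, a third does not.
rng = np.random.default_rng(0)
n_samples = 500
technical_factor = rng.normal(size=n_samples)
noise = rng.normal(scale=0.1, size=(n_samples, 3))
meth = np.column_stack([
    technical_factor + noise[:, 0],
    technical_factor + noise[:, 1],
    noise[:, 2],
])

raw = pairwise_correlation(meth)
adjusted = pairwise_correlation(residualize(meth, technical_factor))
print("raw correlation, sites 0 and 1:     ", round(float(raw[0, 1]), 3))
print("adjusted correlation, sites 0 and 1:", round(float(adjusted[0, 1]), 3))
```

In this toy setting the strong correlation between the first two sites comes entirely from the shared factor, so it largely disappears after adjustment. At array scale (hundreds of thousands of sites) the full correlation matrix would not fit in memory, so a real analysis would need to work in blocks of sites.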