ChIP calculations

From DebuLab

Data analysis for ChIP experiments will vary based on the controls and references one decides to use. Here we will discuss several ways of representing ChIP data and the benefits and shortcomings of each method. The preferred method in our lab is to express ChIP results as percent recovery and plot them alongside IgG pull-down to show background noise.

For the purpose of this discussion, let's assume that our primer efficiency is 100%, which means that each PCR cycle doubles the amount of target DNA. Our efficiency coefficient (let's call it Ep) is therefore 2. Different primer sets will have different PCR efficiencies; however, we find that primers with amplicons in the range of 70-150 bp are comparable in efficiency, and we rarely need to standardize them.

Percent Recovery

One of the most obvious questions that comes to mind when analyzing ChIP data is "How much of the total DNA did the antibody pull down?" To answer this question, we express our ChIP results as percent recovery. This is done by taking an aliquot of the initial sample after sonication but before adding the antibody, and subjecting it to qPCR alongside the precipitated DNA. Typically, 1/1000th of a single ChIP reaction yields good PCR values. This fraction is important and will be used in later calculations.

We follow the procedure outlined in the chromatin immunoprecipitation protocol. This method of data analysis requires that all samples be treated in an identical fashion. Particular care should be taken during the DNA isolation portion of the procedure to make sure that the same amount of DNA is recovered.

When performing calculations with Ct values (which are essentially exponents), we prefer to immediately exponentiate the values and express them as "amount of DNA relative to some arbitrary Ct value". The "arbitrary Ct value" we typically use is 40. Working with linear amounts instead of exponents results in more accurate averaging of replicates, because the average of several Ct values does not correspond to the average of the DNA amounts they represent.
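The conversion described above can be sketched in Python roughly as follows. The function names and the replicate Ct values are illustrative assumptions and not part of the lab protocol; the reference Ct of 40 and the efficiency of 2 are taken from the text.

```python
def relative_dna(ct, efficiency=2.0, reference_ct=40.0):
    """Return the amount of DNA relative to an arbitrary reference Ct."""
    return efficiency ** (reference_ct - ct)


def mean_relative_dna(ct_replicates, efficiency=2.0, reference_ct=40.0):
    """Exponentiate each replicate first, then average the linear amounts."""
    amounts = [relative_dna(ct, efficiency, reference_ct) for ct in ct_replicates]
    return sum(amounts) / len(amounts)


# Hypothetical technical replicates for a single sample.
replicate_cts = [25.0, 27.0]

# Average taken after exponentiation (the approach described in the text).
mean_of_amounts = mean_relative_dna(replicate_cts)

# Average taken on the raw Ct values, then exponentiated.
amount_of_mean_ct = relative_dna(sum(replicate_cts) / len(replicate_cts))

print(mean_of_amounts, amount_of_mean_ct)
```

With these made-up replicates the two approaches give 20480.0 and 16384.0 respectively, which shows why the order of averaging and exponentiating matters.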

Let's say that qPCR of our precipitated DNA yielded a CtP of 26 and our input CtI was 31.

We will first have to adjust our input value to account for the fact that we only took 1/1000th of the ChIP reaction. We can simply multiply our exponentiated CtI by 1000 (i.e. $ \text{DNA input} = E_p^{40-Ct_I}*1000 = 2^{40-31}*1000 = 2^9 * 1000 $).

Percent recovery would be $ \dfrac{\text{DNA recovered}}{\text{DNA input}} * 100\% $ or $ \dfrac{E_p^{40-Ct_P}}{\text{DNA input}} * 100\% = \dfrac{2^{40-26}}{2^9 * 1000} * 100\% = 3.2\% $

Thus, we pulled down 3.2% of our input DNA. This may or may not seem significant, but we have no way of judging unless we can compare it to something else. A negative control would let us determine if this is in fact significant and not just background.
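The percent recovery calculation above can be written as a small Python function. This is a minimal sketch: the function and parameter names are assumptions made for illustration, while the numbers (CtP = 26, CtI = 31, a 1/1000th input aliquot, Ep = 2, reference Ct = 40) come from the worked example.

```python
def percent_recovery(ct_pulldown, ct_input, input_dilution=1000.0,
                     efficiency=2.0, reference_ct=40.0):
    """Percent of input DNA recovered in the pull-down.

    input_dilution is the factor by which the input aliquot is smaller
    than a full ChIP reaction (1000 for a 1/1000th aliquot).
    """
    dna_recovered = efficiency ** (reference_ct - ct_pulldown)
    # Scale the input aliquot up to the size of a full ChIP reaction.
    dna_input = efficiency ** (reference_ct - ct_input) * input_dilution
    return dna_recovered / dna_input * 100.0


# Values from the worked example: CtP = 26, CtI = 31.
recovery = percent_recovery(ct_pulldown=26, ct_input=31)
print(f"{recovery:.1f}%")  # prints "3.2%"
```

If a different input fraction is used, only `input_dilution` needs to change.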

Negative control

There are a couple of things that a negative control should take into account. First, the beads may be pulling down some DNA non-specifically and contributing to the signal. Even if our protein of interest is not there, we may still be seeing signal at the site.

IgG or no antibody

IgG or no antibody at all is commonly used to control for non-specific binding of chromatin to the beads. In theory, this control should not pull down any chromatin. However, this is not the case in practice. We calculate the value for the negative control the same way we do for the actual sample. A percent recovery of around 0.1% is typical of a clean ChIP pull-down.

If the negative control gave 0.1% recovery, the 3.2% recovery that we obtained with the specific antibody would be 32 times larger than the negative control. This would suggest a pretty strong signal.

It may be tempting to express the ChIP data in terms of percent recovery fold over negative control, but we find that plotting background data next to signal gives a better representation and understanding of what is going on. Small fluctuations in background may significantly alter signal values without giving an indication of the cause behind these fluctuations.
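To illustrate the point about background fluctuations, here is a rough Python sketch. The IgG Ct values of 31 and 32 are hypothetical, chosen so that the first one reproduces the 0.1% background mentioned above; the antibody and input Ct values come from the percent recovery example, and the function name is an assumption.

```python
def percent_recovery(ct_pulldown, ct_input, input_dilution=1000.0,
                     efficiency=2.0, reference_ct=40.0):
    """Percent of input DNA recovered, for a 1/input_dilution input aliquot."""
    dna_recovered = efficiency ** (reference_ct - ct_pulldown)
    dna_input = efficiency ** (reference_ct - ct_input) * input_dilution
    return dna_recovered / dna_input * 100.0


CT_INPUT = 31      # from the worked example
CT_ANTIBODY = 26   # from the worked example

results = []
for ct_igg in (31, 32):  # hypothetical IgG Ct values, one cycle apart
    signal = percent_recovery(CT_ANTIBODY, CT_INPUT)
    background = percent_recovery(ct_igg, CT_INPUT)
    results.append((ct_igg, signal, background, signal / background))

for ct_igg, signal, background, fold in results:
    print(f"IgG Ct {ct_igg}: signal {signal:.2f}%, "
          f"background {background:.2f}%, fold {fold:.0f}")
```

In this sketch the signal stays at 3.2% in both cases, yet a one-cycle shift in the IgG Ct moves the fold value from 32 to 64. Reporting signal and background side by side keeps that shift visible.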

Off-target sequence

Using IgG or no antibody at all, however, does not account for non-specific binding of antibody to the chromatin. Our antibody may be binding everything in sight, and our signal might not be that significant even if it is 32 times stronger than IgG background. To account for this, we use sequences that are not expected to bind our protein of interest.

For transcription factors with specific binding sites, designing a tiled or closely spaced set of primers will generally show a peak at the binding site that trails off as we get farther away. This is not as good a strategy for ChIP targets like histone modifications, which are expected to span larger areas of chromatin. In this case a far-away sequence may be used, like the promoter of an inactive gene if probing for histone modifications associated with gene activation.


In either case, a percent recovery value is calculated the same way as above. When tiled primers are used, plotting them as a scatter plot with data points joined by a line gives an effective representation of transcription factor binding. For off-target sequences a bar graph should be used.

  • Be careful when plotting data in scatter-plot format to properly account for different spacing between primer sets. Calculate the middle point of each primer set and use that value as the x value.
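The midpoint calculation can be sketched in Python as follows. The primer set names, amplicon coordinates, and percent recovery values are all made up for illustration; the coordinates are assumed to be in bp relative to some reference position such as the expected binding site.

```python
def amplicon_midpoint(start, end):
    """Return the middle position of an amplicon given its start and end."""
    return (start + end) / 2.0


# Hypothetical tiled primer sets: (name, amplicon start, amplicon end,
# percent recovery). Spacing between sets is deliberately uneven.
primer_sets = [
    ("P1", -600, -500, 0.2),
    ("P2", -150, -50, 1.5),
    ("P3", -40, 60, 3.2),
    ("P4", 300, 420, 0.4),
]

# Build (x, y) points using the true midpoint as x, sorted by position,
# so uneven spacing between primer sets is preserved on the x axis.
points = sorted(
    (amplicon_midpoint(start, end), recovery)
    for _, start, end, recovery in primer_sets
)

for x, y in points:
    print(x, y)
```

The resulting pairs can be passed to any plotting tool as a scatter plot with connecting lines. Using the primer set index (1, 2, 3, 4) as the x value instead would wrongly place the sets at equal distances.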

The downside to this negative control is that sometimes it is not clear where a particular protein is not expected to bind. It also requires designing additional primers and is not as versatile as the IgG control.

Relative to common control

Some papers normalize the ChIP data to a common control like GAPDH or 18S rDNA. In an ideal ChIP reaction, if the protein is not expected to be present at these genes, there should be no DNA recovered that would amplify with GAPDH or 18S primers. However, there is always some non-specific binding, and background is always present. This normalization technique takes advantage of this noise. 18S, GAPDH and several other genes exist in numerous copies across the genome and would produce the strongest signal from within the background noise.

In this method, there is no input DNA. Rather, each of the samples is expressed as fold over control.

Let's say that our sample CtP is 23 and our negative control CtN is 27.

Then our data would be expressed as $ \dfrac{E_p^{40-Ct_P}}{E_p^{40-Ct_N}} = E_p^{Ct_N-Ct_P} $ which in our case would be $ 2^{27-23} = 2^4 = 16 $.
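As a quick check of the formula above, here is a minimal Python sketch. The function name is an assumption; the Ct values, Ep = 2, and the reference Ct of 40 are those used in the text.

```python
def fold_over_control(ct_sample, ct_control, efficiency=2.0):
    """Fold enrichment of the sample over a common control locus."""
    return efficiency ** (ct_control - ct_sample)


# Values from the example: CtP = 23, CtN = 27.
fold = fold_over_control(ct_sample=23, ct_control=27)

# The same result computed as a ratio of exponentiated values;
# the arbitrary reference Ct of 40 cancels out.
ratio = 2.0 ** (40 - 23) / 2.0 ** (40 - 27)

print(fold, ratio)  # prints "16.0 16.0"
```

Because the reference Ct cancels, the result depends only on the difference between the two Ct values and on the assumed efficiency.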

By doing this with all the samples we will obtain "normalized" values and can then compare them.

This normalization will account for non-specific binding by the antibody and the beads, as well as potential error within the procedure. However, because the amount of DNA pulled down is relatively small, the values may fluctuate significantly, making this control unreliable. Furthermore, one must be certain that a particular protein or modification is not present at the control loci, because this would produce false negative results.

Finally, as reagents and techniques get better, the amount of background is expected to decrease. With lower levels of background, the signal of the control would decrease and fluctuate even more, making this method of calculation even less reliable.