
Strand bias in STR motifs


Summary: Certain STR motifs are much more likely to occur on the coding strand in transcribed regions, suggesting that STRs may play an important role in transcription.

Short tandem repeats (STRs) are ubiquitous across the genome, with an average of several STRs per kilobase. Many of these STRs occur in transcribed regions (UTRs, exons, and introns). While most of these repeats are thought to have no functional consequences, some of them have been implicated in regulating transcription, suggesting that not all STRs are “neutral”.

If STRs played no role in transcription, one would expect that the strand on which a given STR motif occurs shouldn’t matter. Consider, for instance, all repeats with an “AC” motif (this includes e.g. ACACACAC but also TGTGTGTG, since these are the same motif on opposite strands). We should then be just as likely to see the “AC” repeat on the coding strand as the “TG” repeat. The same applies to other motifs: e.g. for “AAC” vs. “GTT” repeats, if STRs did nothing we’d expect to see “AAC” just as often as “GTT” on the coding strand.
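To make the "same motif on opposite strands" idea concrete, here is a minimal Python sketch. The function names (`reverse_complement`, `canonical_motif`, `coding_strand_motif`) are hypothetical helpers written for illustration, not part of the original analysis. It assumes motifs are reported on the reference (+) strand and that the gene's strand is given as "+" or "-".

```python
# Illustrative helpers (hypothetical names, not from the original analysis).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the sequence as read 5'->3' on the opposite strand."""
    return seq.translate(COMPLEMENT)[::-1]


def rotations(motif):
    """All cyclic shifts of a motif, e.g. AAC -> AAC, ACA, CAA."""
    return [motif[i:] + motif[:i] for i in range(len(motif))]


def canonical_motif(motif):
    """Strand-independent label: the alphabetically smallest string among
    all rotations of the motif and of its reverse complement."""
    return min(rotations(motif) + rotations(reverse_complement(motif)))


def coding_strand_motif(ref_motif, gene_strand):
    """Motif as read on the coding strand, assuming ref_motif is given on
    the reference (+) strand and gene_strand is '+' or '-'."""
    if gene_strand not in ("+", "-"):
        raise ValueError("gene_strand must be '+' or '-'")
    oriented = ref_motif if gene_strand == "+" else reverse_complement(ref_motif)
    # Pick the smallest rotation so that e.g. TG and GT get one label.
    return min(rotations(oriented))
```

With these definitions, "AC" and "TG" share the canonical label "AC", and "AAC" and "GTT" share "AAC". An "AC" repeat on the reference strand that overlaps a gene on the minus strand is read as "GT" on that gene's coding strand.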

In their paper in PLOS One last year, Sawaya et al. indeed found strand biases for STR motifs around promoters. For example, they found that for A/T and AC/TG motifs there is a phase shift around transcription start sites (TSSs): “A” and “AC” are more likely to be on the coding strand upstream of the TSS, whereas “T” and “GT” are more likely to be on the coding strand downstream of the TSS (see Figure 4 of their paper).

I decided to examine whether strand biases for specific motifs exist across transcribed regions compared to other parts of the genome. For each of the commonly occurring STR motifs, I asked what percentage of the time the “A”-rich motif was on the coding strand for STRs occurring in:

The results showed some pretty clear biases:

AC motifs strand bias

AG motifs strand bias

AT motifs strand bias
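One way to put a number on a bias like this is to compare the observed fraction of “A”-rich motifs on the coding strand against the 50% expected if strand did not matter. The sketch below uses an exact two-sided binomial test. This is an assumption about how such a check could be done, not necessarily how the figures above were produced, and the function names are hypothetical.

```python
from math import comb


def a_rich_fraction(n_a_rich, n_total):
    """Fraction of STRs whose A-rich form is on the coding strand."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return n_a_rich / n_total


def binomial_two_sided_p(k, n):
    """Exact two-sided p-value for k 'successes' out of n under p = 0.5.

    Because the null distribution is symmetric, the two-sided p-value is
    twice the smaller tail, capped at 1.
    """
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    tail = min(k, n - k)
    smaller_tail = sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, 2 * smaller_tail)


# Toy usage with made-up counts (not data from this post):
# 30 of 100 STRs in some region carry the A-rich form on the coding strand.
fraction = a_rich_fraction(30, 100)
p_value = binomial_two_sided_p(30, 100)
```

A fraction well below 0.5 with a small p-value would indicate that the “T”-rich form is enriched on the coding strand for that motif and region. In practice one would also want to correct for testing many motifs and regions.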


So the question is: what drives the enrichment of “T”-rich motifs on the coding strand in transcribed regions? Several possible explanations:

While it is unclear what is driving these biases, clearly they exist and likely tell us something important about the role of STRs in transcription.
