Xiang-Jun's Corner: Meaning of nucleotide IUPAC codes

Sunday, July 26, 2009

Meaning of nucleotide IUPAC codes

Today, anyone with some basic knowledge of biochemistry should be familiar with A, C, G and T, the four bases of DNA, and probably with the A-T and G-C Watson-Crick base pairs as well. The meaning of the one-letter abbreviations is clear: A for Adenine, C for Cytosine, G for Guanine, and T for Thymine. In RNA, of course, U (for Uracil) takes the place of the T of DNA.

In the early days, when I entered the field of DNA structure, I also learned that R stands for puRine, i.e., A and G, and Y for pYrimidine, i.e., C and T (U). Trained as a chemist, I had no difficulty at all in understanding and remembering them. While processing base sequences in bioinformatics projects, I came across the other IUPAC degeneracy codes for nucleotides, such as S, W, M, K, D, V, etc., but I was never able to really memorize what they represent, except for N (A, C, G, T).

My confusion was cleared up completely, however, by a web document I happened to find: "Nomenclature for Incompletely Specified Bases in Nucleic Acid Sequences" (1984) by the Nomenclature Committee of the International Union of Biochemistry (NC-IUB). This is the document I wish I had known about from the very beginning. For completeness, here is a summary table of the full set of IUPAC codes. It is based on Table 1 in the above document, except for the uracil and gap rows:

Symbol  Meaning           Origin of designation
------------------------------------------------------------
G       G                 Guanine
A       A                 Adenine
T       T                 Thymine
C       C                 Cytosine
U       U                 Uracil
R       G or A            puRine
Y       T or C            pYrimidine
M       A or C            aMino
K       G or T            Keto
S       G or C            Strong interaction (3 H bonds)
W       A or T            Weak interaction (2 H bonds)
H       A or C or T       not-G, H follows G in the alphabet
B       G or T or C       not-A, B follows A
V       G or C or A       not-T (not-U), V follows U
D       G or A or T       not-C, D follows C
N       G or A or T or C  aNy
. or -  gap
------------------------------------------------------------
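For use in a script, the table above can be written as a simple lookup. The following Python sketch is only an illustration: the names IUPAC_BASES, iupac_to_regex and matches are my own hypothetical choices, not part of the NC-IUB document. It assumes DNA sequences, so U is treated as T, and it leaves out the gap symbols.

```python
import re

# IUPAC nucleotide codes mapped to the bases they stand for (DNA alphabet).
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "U": "T",  # assumption: a U in the pattern is matched against T in DNA
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT",
    "N": "ACGT",
}


def iupac_to_regex(pattern: str) -> str:
    """Translate a degenerate IUPAC pattern into a regular expression string."""
    parts = []
    for symbol in pattern.upper():
        bases = IUPAC_BASES[symbol]  # raises KeyError for symbols not in the table
        # Single bases are emitted as-is, ambiguous ones as a character class.
        parts.append(bases if len(bases) == 1 else f"[{bases}]")
    return "".join(parts)


def matches(pattern: str, sequence: str) -> bool:
    """Return True if the whole concrete sequence is covered by the pattern."""
    return re.fullmatch(iupac_to_regex(pattern), sequence.upper()) is not None
```

For example, iupac_to_regex("GANTC") gives GA[ACGT]TC, and matches("RY", "AC") is True because A is a purine and C is a pyrimidine.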
