In my early days in the field of DNA structure, I learned that R stands for puRine, i.e., A and G, and Y for pYrimidine, i.e., C and T (U). Trained as a chemist, I had no difficulty at all understanding and remembering them. While processing base sequences in bioinformatics projects, I have come across the other IUPAC degeneracy codes for nucleotides, such as S, W, M, K, D, V, etc. Except for N (A, C, G, or T), I had never really been able to memorize what they represent.
My confusion was cleared up completely, however, by a web document I happened to find: "Nomenclature for Incompletely Specified Bases in Nucleic Acid Sequences" (1984) by the Nomenclature Committee of the International Union of Biochemistry (NC-IUB). This is the document I wish I had known about from the very beginning. For completeness, here is a summary table of the full set of IUPAC codes. It is based on Table 1 of the above document, except for the uracil and gap entries:
Symbol   Meaning            Origin of designation
-----------------------------------------------------------
G        G                  Guanine
A        A                  Adenine
T        T                  Thymine
C        C                  Cytosine
U        U                  Uracil
R        G or A             puRine
Y        T or C             pYrimidine
M        A or C             aMino
K        G or T             Keto
S        G or C             Strong interaction (3 H bonds)
W        A or T             Weak interaction (2 H bonds)
H        A or C or T        not-G, H follows G in the alphabet
B        G or T or C        not-A, B follows A
V        G or C or A        not-T (not-U), V follows U
D        G or A or T        not-C, D follows C
N        G or A or T or C   aNy
. or -   gap
-----------------------------------------------------------
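
For anyone who would rather have the table in code than in memory, here is a minimal Python sketch that stores the codes in a dictionary and expands a degenerate pattern into a regular expression. The names `IUPAC`, `iupac_to_regex`, and `matches` are my own, not from the NC-IUB document. The sketch assumes a DNA alphabet, so U is treated as T, and it does not handle the gap symbols (an unknown symbol simply raises a `KeyError`).

```python
import re

# IUPAC codes mapped to the DNA bases they stand for (from the table above).
# Assumption: sequences are DNA, so U is treated the same as T.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT", "N": "ACGT",
}


def iupac_to_regex(pattern: str) -> str:
    """Expand a degenerate pattern into a regex over A, C, G, and T."""
    parts = []
    for symbol in pattern.upper():
        bases = IUPAC[symbol]  # KeyError for symbols not in the table
        # A single base stays a literal; several bases become a character class.
        parts.append(bases if len(bases) == 1 else f"[{bases}]")
    return "".join(parts)


def matches(pattern: str, sequence: str) -> bool:
    """Return True if the whole sequence is consistent with the pattern."""
    return re.fullmatch(iupac_to_regex(pattern), sequence.upper()) is not None


print(iupac_to_regex("GGYRCC"))     # GG[CT][AG]CC
print(matches("GGYRCC", "GGTACC"))  # True
print(matches("GGYRCC", "GGAACC"))  # False: Y allows only C or T
```

A dictionary keeps the mapping readable and makes it easy to check against the table. Anything beyond matching, such as complementing degenerate codes, would need an additional lookup.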