As is always the case in a database, each entry is uniquely identified by an id. Interestingly, PDB and NDB have adopted radically different approaches to choosing their ids.
- A PDB id is (currently) 4 characters long: the first character is a numeral in the range 1-9, while the rest can be either numerals or letters. Early PDB ids could be acronyms: for example, 1bna for the famous Dickerson-Drew B-DNA dodecamer with sequence CGCGAATTCGCG, the first full-turn B-DNA duplex; and 1mbn for myoglobin, the first solved protein structure. More recently, because of the rapid increase in deposited macromolecular structures, PDB ids "are automatically assigned and do not have any meaning." (page 9)
- An NDB id, by design, seems to carry more information, even though I could not locate detailed specifications through an online search. For example, A-DNA, B-DNA and Z-DNA ids start with AD, BD, and ZD, respectively; protein-DNA complexes start with PD; ribosomal RNAs start with RR; and so on. Furthermore, the third letter also has a meaning in the NDB code. E.g., L in BDL084 means 12 since it is the 12th letter of the English alphabet, thus we know BDL084 is a B-DNA dodecamer. Similarly, the H in ADH026 means 8, thus ADH026 is an A-DNA octamer.
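The two conventions above can be sketched in a few lines of Python. This is only an illustration of the rules as described in this post, not an official specification: the function names `is_pdb_id` and `decode_ndb_id` are made up for this example, the prefix table contains only the prefixes mentioned above, and applying the third-letter rule to every id whose third character is a letter is an assumption.

```python
import re

# PDB id as described above: a digit 1-9 followed by three letters or digits.
PDB_ID_RE = re.compile(r"[1-9][0-9A-Za-z]{3}")

# Only the prefixes mentioned in the text; NOT a complete NDB specification.
NDB_PREFIXES = {
    "AD": "A-DNA",
    "BD": "B-DNA",
    "ZD": "Z-DNA",
    "PD": "protein-DNA complex",
    "RR": "ribosomal RNA",
}


def is_pdb_id(text):
    """Return True if text has the 4-character PDB id shape."""
    return PDB_ID_RE.fullmatch(text) is not None


def decode_ndb_id(ndb_id):
    """Return (category, length) guessed from an NDB id.

    category is None for an unknown prefix; length is None when the
    third character is not a letter.
    """
    ndb_id = ndb_id.upper()
    category = NDB_PREFIXES.get(ndb_id[:2])
    third = ndb_id[2:3]
    # Assumption: the third letter encodes the chain length by its
    # position in the alphabet (A=1, B=2, ..., L=12).
    if third.isascii() and third.isalpha():
        length = ord(third) - ord("A") + 1
    else:
        length = None
    return category, length
```

With these definitions, `is_pdb_id("1bna")` is true, `decode_ndb_id("BDL084")` gives `("B-DNA", 12)`, and `decode_ndb_id("ADH026")` gives `("A-DNA", 8)`. Note that such a decoder can only report what the id claims, not what the structure actually is.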
I have no idea how many such mis-picked ids exist in the NDB. What is clear is that as more and more weird structures (especially RNA) are deposited (or extracted from the PDB), it will become even harder to pick an id in its 'canonical' sense. Inconsistency will then become a big issue. In contrast, PDB ids do not have such a problem by design, whether an id is an acronym or a random, automatic pick by a software program.
Over the years, NDB has served me no other purpose than as a pre-selected subset of PDB entries containing nucleic acid structures. It has become clear to me that starting directly from PDB would be a better choice, if only to remove a level of redundancy and to avoid possibly misleading ids.