Saturday, August 22, 2009

Fit a least squares plane to a set of points

In the early days when I first got into the field of DNA structures, I was interested in knowing details on how the "best mean (base-pair) plane" is actually defined. I could not find a solution from the articles I read at that time. For clarification, I sent emails to some authors whose papers mentioned the concept, still the results were less than satisfactory.

The mystery was finally solved after I read the following two articles from early days (traced initially from a comment in the source code of Dickerson's NewHelix, if I recall it correctly):
  • D. M. Blow (1960). ``To Fit a Plane to a Set of Points by Least Squares.'' Acta Cryst., 13, 168.
  • V. Schomaker et al. (1959). ``To Fit a Plane or a Line to a Set of Points by Least Squares.'' Acta Cryst., 12, 600-604.
As is clear from the titles, they focused just on least-squares plane/line fitting algorithms. More specifically, the worked example(s) help make the underlying concept clear.

As a concrete example, let's work out the least-square plane of an adenine using Octave/Matlab. The implementation here is much simpler than those in the literature cited above. Note: this example is mentioned in the 3DNA home page in the Technical Details section, so the order of atoms is kept the same as there.

The x-, y-, and z-coordinates are as follows:
xyz = [
16.461 17.015 14.676 % 91 N9
15.775 18.188 14.459 % 100 C4
14.489 18.449 14.756 % 99 N3
14.171 19.699 14.406 % 98 C2
14.933 20.644 13.839 % 97 N1
16.223 20.352 13.555 % 95 C6
16.984 21.297 12.994 % 96 N6
16.683 19.056 13.875 % 94 C5
17.918 18.439 13.718 % 93 N7
17.734 17.239 14.207 % 92 C8
];

Cmtx = cov(xyz)
[V, LAMBDA] = eig(Cmtx)

The covariance matrix (Cmtx), its corresponding eigenvectors (V) and eigenvalues (LAMBDA) are as follows:
Cmtx =
1.66800 -0.50154 -0.32530
-0.50154 2.06703 -0.58401
-0.32530 -0.58401 0.30605

V =
0.27367 0.83264 -0.48147
0.32243 0.39220 0.86152
0.90617 -0.39102 -0.16113

LAMBDA =
0.00001 0.00000 0.00000
0.00000 1.58452 0.00000
0.00000 0.00000 2.45656
The smallest eigenvalue of Cmtx, 0.00001 is close to zeros, indicating that the base is almost perfectly planar. The corresponding unit eigenvector, i.e., the least squares base normal is: [0.27367 0.32243 0.90617]. Of course, the geometric average of the atoms defines a point the ls-plane pass through.

Referring back to the concept of "best mean (base-pair) plane", it could simply be defined using all base atoms of the pair, or alternatively, as the mean of the two base normals. Obviously, they would lead to (slight) numerical differences. Astute viewers would also notice some other subtleties.

2 comments:

  1. Hi
    I am working on a project in which I
    need to identify the plane of best fit between random 3d points in the
    form xyz.

    I am curious to know if this method obtains for you a plane equation
    of the form

    Ax + By + Cz + D = 0

    I believe the eigenvector values are your A, B and C coefficients and
    then you use the mean of x,y and z to define a point on that plane in
    order to solve for D. Is my interpretation of your method correct ?

    Any help would be most appreciated

    yours faithfully

    ReplyDelete
  2. Let (n1, n2, n3) be the normal vector to the (least-squares) plane, (c1, c2, c3) be a known point in the plane, and (x, y, z) an arbitrary point in the plane, then the following condition holds:

    (x - c1) * n1 + (y - c2) * n2 + (z - c3) * n3 = 0

    which expands and simplifies to:
    n1*x + n2*y + n3*z - (n1*c1+n2*c2+n3*c3) = 0

    and matches to Ax + By + Cz + D = 0

    So your interpretation is right.

    To make the above discussion clear and concrete, try to work out the following example:

    Suppose you have three points [1, 0, 0], [0, 1, 0], and [0, 0, 1], which by definition lies in a plane, use the method outlined in this blog post, derive the plane equation. Please post your solution here.

    ReplyDelete

You are welcome to make a comment. Just remember to be specific and follow common-sense etiquette.