> I would love to learn from your work. I appreciate this opportunity. > > Here are some things I already know. R-trees store ranges in each > dimensions, e.g., rectangles or boxes in 2-D or 3-D, whereas Z-order B-trees > (or Hilbert or any other space-filling curve) store points. R-trees have > tricky search algorithms & cost functions because one might need to search > multiple descendents from any given internal tree node, since ranges > represented by child nodes might overlap. Splitting nodes can be quite > tricky because the overlap must be minimized heuristically. Also, building > an R-tree for a large pre-existing data set is tricky, whereas it is well > known how to sort & append the entries in B-trees. > > Hilbert is slightly more effective that Z because it doesn't have any > diagonal lines, but I think the math is more complex, and computing > precisely the ranges where the space filling curve enters and exits the > query box is more tricky. In fact, I have worked the entry & exit points out > for Z curves using a method different than the one patented by Bayer and > Markl. > > One can map ranges in N dimensions to points in twice as many (2N) > dimensions. Doing so will result in a situation in which only half the > domain is used. For example, if I have a 1-D space and want to store ranges, > I need a 2-D Z-curve. Presuming that the (0,0) point is at the bottom right, > and presuming that a 1-D range is mapped to a 2-D point by making the left > end of the range the x-value in the 2-D space, there will be a diagonal > through the indexed space from bottom left to top right. None of the ranges > will map to point above left of this diagonal, since for all ranges in 1-D > the left end point has a lower value than the right end point. > > In order to execute a "contains" query, i.e., search for rectangles that are > fully contained within a query range, say (lo, hi), I map the query range to > two points in 2-D, namely (x=lo, y=hi) and (x=hi, y=lo), and then search for > data in the square defined by these two points. This is a fairly efficient > operation in a Z-order index. For range intersection, I search the rectangle > defined by (x=hi, y=lo) and the bottom right corner of the 2-D space, again > fairly efficiently. If I want to, I can limit the search by a line parallel > to the diagonal discussed above, I think, but I am not sure about that yet. > > For nearest-neighbor problems, one has to define an initial bounding box (a > square) and search there. Actually, only the largest circle within the > square can be used. If this search doesn't find a result, e.g., because all > points in the square fail to meet some additional criterion (e.g., I am only > looking for Italian restaurants, not for all possible tourist facilities), > then I need to consider a larger bounding box (square). Within that box, I > must exclude element considered within the earlier circle, and I can > actually omit to fetch entries from the index that fall into an exclusion > box, i.e., the largest square within the earlier circle. > > Here is the key question we need to answer. Is it sufficient to implement > Z-order B-trees only, or are R-trees truly required? What optimizer smarts > are required for Z-order B-trees? What overall performance loss will we need > to tolerate, i.e., possibly explain to potential customers? > > A second question is this: Do we need transformations on the dimensions? For > example, if all actual data point fall to the left, maybe I should scale the > horizontal axis such that Z shapes are narrow on the left but wide on the > right. I am quite sure that only linear and very simple non-linear > transformations are realistically possible in a product. I am also quite > sure that each transformation would apply to one dimension only, i.e., > transformations would orthogonal to each other, because otherwise, a > rectangular query box would not be rectangular after the transformation. > > If you have any ideas or suggestions or advice on these key questions, I > would love to learn more from you. I greatly appreciate your help. > > Thank you, > Goetz > > > > -----Original Message----- > From: Chang-tien Lu [ mailto:ctlu@cs.umn.edu] > Sent: Tuesday, November 14, 2000 8:31 PM > To: Goetz Graefe > Subject: R-Tree v.s. B-Tree + Z-order > > > Dear Dr. Graefe: > > First of all, I would like to thank you > for the treat at the Old Spaghetti Factory yesterday. > Right now, I am working with Prof. Shashi Shekhar > on the spatial database group, and have done some > work on join-index based spatial join processing > as in the following link: > > http://www-users.cs.umn.edu/~ctlu/papertalk.html > > We are interested in the comparison between R-tree > and Z-order + B-tree, or related spatial join processing. > Could you give us some suggestion from your work > at Microsoft SQL server? > > > Thank you very much! > > > Chang-tien Lu > > > > > > >