Synopsis: This paper proposes a hierarchical structure called
HS-bitmap index to represent document-term matrix. The authors
implemented their data structure on PostgreSQL and observed it to perform
better than an inverted index. A short-coming might be that HS-bitmap
index takes more space than the inverted index even after compression.
Synopsis: By far the most important thing to get out of this
paper is that packing the bitmaps as tightly as possible is a good
approach. This is a big part of the improvements on the RIDBit code
during the project -- the text of the paper alluded to the improvements
twice. The second point to note is that it demonstrated again that
vertical partitioning is a good thing for read-only data.
Synopsis: This paper presents a set of multi-level binning
strategies. In their tests, the multi-level indexes were shown to be
about 10 times faster than the single-level indexes. Potentially, most
of the performance gain is from reduction in the number of bitmaps
accessed during query processing. The same authors
have published some timing measurements before [IPDPS 2006],
the key contribution of this paper is a systematic analysis of the
multi-level indexes. This is also a big part of Sinha's PhD thesis
on indexing scientific data.
The approach of using multi-level bitmap indexes was also reported
earlier in a tech report
in 2001.
Approximate encoding for direct access and query processing over compressed
bitmaps.
Tan Apaydin,
Guadalupe
Canahuate,
Hakan
Ferhatosmanoglu, and
Ali
Saman Tosun.
In Proceedings of the 32nd international conference on Very large
data bases, Seoul, Korea. Pages: 846 - 857. 2006.
Synopsis: This paper presents an indexing technique based on
multiple hashes. This indexing method can reduce the index sizes as
well as the query response time. The trade-off is that it answers
queries approximately. The paper also shows a way to predict the rate
of false positives.
Note that the technique presented is similar to
Bloom filters.
Synopsis: This presentation describes a bitmap index
implementation effort at GreenPlum. Since Bizgres is based on Postgres
and bitmap
is making some headway into PostgreSQL, we may see full-fledge
bitmap index in PostgreSQL yet.
Optimizing bitmap indices with efficient compression.
K. Wu,
E. Otoo,
and
A. Shoshani.
2006. ACM Transactions on Database Systems, v 31, pages 1-38, 2006.
Synopsis: This paper presents a formal analysis of the WAH
compressed (basic) bitmap index. It formally establishes the WAH
compressed index as one of the optimal indexing methods, an
exculsive club including the best of B-tree variants such as B*-tree and
B+tree. This puts the WAH compressed bitmap index at the same
theoretical standing as the best of the B-trees. However, compressed
bitmap indexes have an unique advantage. Because it is much easier to
combine the results from multiple bitmap indexes, it is more efficient
to answer a query using multiple bitmap indexes than using multiple
B-Tree indexes. Therefore, bitmap indexes are more appropriate in
query-intersive applications, such as data warehousing and business
intelligence.
Supporting RFID-based item tracking applications in Oracle DBMS
using a bitmap datatype.
Y. Hu,
S. SundaraT. ChormaJ.
Srinivasan. VLDB 2005. Pages: 1140-1151.
Synopsis: This paper introduces a new data type in ORACLE DBMS
called bitmap and a set of techniques to work with these bitmaps
efficiently. The Summary Bitmap Tree described in this paper is similar
to certaim multi-level bitmap indexes with equality encoding.
Synopsis: This paper describe the critical algorithm to make
compressed bitmap index answers queries in time that is no worse
than a linear function of the number of hits (i.e., the number of
records selected by the query condition). In terms of
computational
complexity, this places WAH compressed bitmap index in the same
category of optimal indexing methods as the best of the
B-tree indexes.
The critical algorithm is to perform bitwise logical OR operations on a
large number of small compressed bitmap in time that proportional to the
total size of all the bitmaps involved.
Hierarchical Bitmap Index: An Efficient and Scalable Indexing Technique for Set-Valued Attributes.
M. Morzy,
T. Morzy,
A. Nanopoulos,
Y. Manolopoulos.
Advances in Databases and Information Systems, 2003. Pages: 236-252.
Synopsis: The Hierarchical Bitmap Index proposed in this paper is
in fact of hierarchy of signatures that are designed to answer the
set-containment queries. It is a version of a
signature
tree. The signatures are organized into a tree structure with the
higher level nodes indicating the non-empty nodes at the next
level. The leaf nodes contains information about the presence of a
value in a set.
Synopsis: This is a brief write up about
WAH
compression and its performance on a small subset of data from a
high-energy project called STAR.
The work with STAR eventually lead to the development of
Grid
Collector.
Synopsis: This paper describes a number of algorithms for
operating on the bitmaps of a bit-sliced index to compute various
qualities such as SUM, MAX and so on. These operations are useful
for aggregate queries. This paper also addresses the issue of
Bit-Sliced Term-Match for
Information
Reterival.
BitCube: A Three-Dimensional Bitmap Indexing for XML
Documents.
J. P. Yoon,
V. Raghavan,
V. Chakilam,
and
L. Kerschberg.
Journal of Intelligent Information Systems, Vol. 17, November 2001.
Synopsis: In this paper, the authors describe a bitmap indexing
technique for clustering XML documents. The BitCube is formed from
three dimensions of documents (d), XML-elements (p), and terms (w). The
BitCube can be used to efficiently locate documents containing
XML-elements and terms. In addition, it can also be used to computer
aggregate values such as avereges and standard deviations.
Synopsis: This dissertation presents Spatial Bit-Sliced (SBS)
index based on Bit-Sliced indexing method. The main conclusion from
their studies is that the performance of SBS index structure is
“immune” to the query size, regardless of the query
size, while other indexing strategies such as R*-tree exhibit
significant variations depending on the query size.
Synopsis: This paper describes a set of performance
measurements comparing a number of different bitmap compression
techniques. In the tests, no compression technique was the clear
winner, even though the Byte-aligned Bitmap Code (BBC) was the most
efficient in many cases. The author eventually developed a strategy
to choose the most appropriate compression method based
characteristics of the bitmaps.
An efficient bitmap encoding scheme for selection queries.
C. Chan
and
Y. Ioannidis.
1999. SIGMOD '99. ACM Press, New York, NY,
215-226.
Synopsis: The Group Bitmap Index was designed to address the
association rule retrieval problem. Two variations are proposed in this
paper, the Simple Group Bitmap Index and Hash Group Bitmap Index . The
Simple Group Bitmap Index is essentially a bitmap join index between
item number and the group number (transcatioin ID). The Hash Group
Bitmap Index can be thought of as a one-component Bloom filter, thus
the same analysis apply.
Bitmap index design and evaluation.
C. Chan
and
Y.
Ioannidis. 1998.
SIGMOD '98. ACM Press, New York, NY, 355-366.
Synopsis: This paper proposes Range Encoding and
multi-component encodings. From their analysis, they conclude that
using two-component encodings provides the optimal space-time
trade-off among the multi-component encodings.
Improved query performance with variant indexes.
P. O'Neil
and
D.
Quass. 1997. SIGMOD '97. ACM Press, New York, NY, 38-49.
Synopsis: This paper presents a short review Bitmap indexes,
and Bit-Sliced index and Projection index. A Projection index
materializes all values of a column in RID order, and a Bit-Sliced
index essentially takes an orthogonal bit-by-bit view of the same
data, both of which are available in
Sybase
IQ. This paper also present analysis that demonstrates important
performance advantages for variant indexes in some types of SQL
aggregation, predicate evaluation, and grouping.
Multi-table joins through bitmapped join indices.
P. O'Neil
and
D. Quass.
ACM SIGMOD Record. Volume 24, Issue 3, Pages: 8-11. 1995.
Synopsis: This is the first paper to fully explain the bitmap
join indexes for multi-table joins based on foreign keys. Such indexes
are used in popular database management systems and are very effective
for data warehousing applications. This is an important reason why some
people claim that "the star schema and bitmap indexes are a marriage made in heaven."
Model 204 Architecture and Performance.
P.
O'Neil.
1987. In
Proceedings of the 2nd international Workshop on High Performance
Transaction Systems (September 28 - 30, 1987). Lecture Notes In
Computer Science, vol. 359. Pages 40-59, Springer-Verlag, London.
Synopsis: This paper introduces the Bit Transposed File as
"an extreme version of the (attribute) transposed file."
It stores the bulk of the data as compressed bit vectors and answers
queries with bitwise logical operations. This establishes two key
characteristics of a bitmap
index: using bit vectors to store information and use bitwise
logical operations to answer queries. They did not present an
implementation of bitmap indexes.
Synopsis: This article describes the Encoded Vector Index
(EVI) implemented in IBM DB2. EVI can be considered as a version of
the binary
encoded bitmap index. Another name used in literature is the
bit-sliced index.
Bitmap Index vs. B-tree Index: Which and When?
V. Sharma.
Synopsis: This series of articles explains the bitmap index
in ORACLE and how to structure the database to maximize the
advantage of using bitmap indexes.