Table of Contents
Over the years, processors and cache memories have become much faster, but mass storage based on rotating magnetic disks has not kept pace. While the storage capacity of disks has grown by about a factor of 1,000 in the past decade, random seek times and data transfer rates are still severely limited by mechanical constraints. Therefore, many workloads are i/o-bound. The idea of data compression is to pay a small cost in increased CPU utilization for the benefit of smaller databases and reduced i/o to improve throughput, potentially significantly.
The ability to compress user data is an important new capability of the InnoDB Plugin. Compressed tables reduce the size of the database on disk, resulting in fewer reads and writes needed to access the user data. For many InnoDB workloads and many typical user tables (especially with read-intensive applications where sufficient memory is available to keep frequently-used data in memory), compression not only significantly reduces the storage required for the database, but also improves throughput by reducing the i/o workload, at a modest cost in processing overhead. The storage cost savings can be important, but the reduction in i/o costs can be even more valuable.
The size of InnoDB data pages is 16K. Beginning with this
release of the InnoDB Plugin, you can use the attributes
ROW_FORMAT=COMPRESSED or
KEY_BLOCK_SIZE in the
CREATE TABLE and ALTER TABLE
commands to request InnoDB to compress each page to
1K, 2K, 4K, 8K, or 16K bytes.
(The term KEY_BLOCK_SIZE may be
confusing. It does not refer to a “key” at all, but simply
specifies the size of compressed pages that will be used for the
table. Likewise, in the InnoDB Plugin, compression is
applicable to tables, not to individual rows, so the option
ROW_FORMAT really should be
TABLE_FORMAT. Unfortunately, MySQL
does not permit a storage engine to add syntax to SQL statements,
so the InnoDB Plugin simply re-uses the clause originally defined
for MyISAM).
To create a compressed table, you might use a command like this:
CREATE TABLEname(column1 INT PRIMARY KEY) ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4;
If KEY_BLOCK_SIZE not specified, the
default compressed page size of 8K will be used. If
KEY_BLOCK_SIZE is specified, the attribute
ROW_FORMAT=COMPRESSED may be omitted.
Setting KEY_BLOCK_SIZE=16 most often will
not result in much compression, since the normal InnoDB page
size is 16K. However, this setting may be useful for tables with
many long BLOB, VARCHAR or
TEXT columns, because such data often do
compress well, and might therefore require fewer
“overflow” pages as described in
Section 3.3.2.2, “
Compressing BLOB, VARCHAR
and TEXT columns
”.
Note that compression is specified on a table-by-table basis.
All indexes of a table (including the clustered index) will be
compressed using the same page size, as specified on the
CREATE TABLE or ALTER TABLE
command. Table attributes such as ROW_FORMAT and
KEY_BLOCK_SIZE are not part of
the CREATE INDEX syntax, and are ignored if they
are specified (although you will see them in the output of the
SHOW CREATE TABLE command).
Compressed tables are stored in a format previous versions
of InnoDB cannot process. Therefore, to preserve downward
compatibility of database files, compression can be specified only
when the “Barracuda” data file format is enabled using the
configuration parameter innodb_file_format.
Furthermore, table compression is not available for the
InnoDB system tablespace. The system tablespace (space 0, the
ibdata* files) may contain user data, but it
also contains internal InnoDB system information, and therefore
is never compressed. Thus, compression applies only to tables
(and indexes) stored in their own tablespaces.
To use compression, you must enable the
“file per table” mode using the configuration
parameter innodb_file_per_table and enable the
“Barracuda” disk file format using the parameter
innodb_file_format. You can set these
parameters in the MySQL option file my.cnf or
my.ini, but both are dynamic parameters that
you can change with the SET command without
shutting down the MySQL server, as noted in
Section 8.3, “Configuring the Plugin”.
Specifying ROW_FORMAT=COMPRESSED or a KEY_BLOCK_SIZE in the
CREATE TABLE or ALTER TABLE commands if the “Barracuda”
file format has not been enabled will produce these warnings that
you can view with the SHOW WARNINGS command:
| Level | Code | Message |
|---|---|---|
| Warning | 1478 | InnoDB: KEY_BLOCK_SIZE requires
innodb_file_per_table. |
| Warning | 1478 | InnoDB: KEY_BLOCK_SIZE requires
innodb_file_format=1. |
| Warning | 1478 | InnoDB: ignoring
KEY_BLOCK_SIZE= |
| Warning | 1478 | InnoDB: ROW_FORMAT=COMPRESSED requires
innodb_file_per_table. |
| Warning | 1478 | InnoDB: assuming
ROW_FORMAT=COMPACT. |
Note that these messages are only warnings, not errors, and the table will be created as if the options were not specified. If you wish, you may enable InnoDB “strict mode” (see Section 7.6, “InnoDB Strict Mode”) to cause InnoDB to generate an error, not a warning. In strict mode, the table would not be created if the current configuration does not permit using compressed tables.
The “non-strict” behavior is intended to permit you to
import a mysqldump file into a database that does
not support compressed tables even if the source database contained
compressed tables. In that case the InnoDB Plugin will create the
table in ROW_FORMAT=COMPACT instead of preventing the operation.
Generally, however, when you import the dump file into a new
database, you should be sure the server is running the
InnoDB Plugin with the proper settings for the configuration
parameters innodb_file_format and innodb_file_per_table,
if you want to have the tables re-created as they exist in the
original database.
The attribute KEY_BLOCK_SIZE is permitted
only when ROW_FORMAT is specified as
COMPRESSED or is omitted. Specifying a
KEY_BLOCK_SIZE with any other
ROW_FORMAT will generate a warning that
you can view with SHOW WARNINGS. However, the
table will be as non-compressed (the specified
KEY_BLOCK_SIZE will be ignored).
| Level | Code | Message |
|---|---|---|
| Warning | 1478 |
InnoDB: ignoring KEY_BLOCK_SIZE=
|
If you are running in InnoDB strict mode, the combination of a
KEY_BLOCK_SIZE with any
ROW_FORMAT other than
COMPRESSED will generate an error, not a warning,
and the table will not be created.
The following tables summarize how the various options on
CREATE TABLE and ALTER TABLE
are handled.
Table 3.1.
Meaning of CREATE and ALTER TABLE options
| Option | Usage | Description |
|---|---|---|
ROW_FORMAT=REDUNDANT | Storage format used prior to MySQL 5.0.3 |
Less efficient than ROW_FORMAT=COMPACT;
for backward compatibility
|
ROW_FORMAT=COMPACT | Default storage format since MySQL 5.0.3 | Can store prefix of up to 768 bytes of long column values in the clustered index page, with the remaining bytes stored in an overflow page |
ROW_FORMAT=DYNAMIC |
Available only with innodb_file_format
“Barracuda”
| Will store values within the clustered index page if they fit; if not, will store only a 20-byte pointer to an overflow page (no prefix) |
ROW_FORMAT=COMPRESSED |
Available only with innodb_file_format
“Barracuda”
|
Compresses the table and indexes using zlib to default
compressed page size of 8K bytes; implies
ROW_FORMAT=DYNAMIC
|
KEY_BLOCK_SIZE=
|
Available only with innodb_file_format
“Barracuda”
|
Specifies compressed page size of 1, 2, 4, 8 or 16K bytes;
implies ROW_FORMAT=DYNAMIC and
ROW_FORMAT=COMPRESSED
|
The table below summarizes error conditions that occur with certain
combinations of configuration parameters and options on the
CREATE or ALTER TABLE commands,
and how the options appear in the output of
SHOW TABLE STATUS.
When “InnoDB strict mode” is OFF,
InnoDB will create or alter the table, but may ignore certain
settings, as shown below. You can see the warning messages in the
mysql error log. When innodb_strict_mode is
ON, these specified combinations of options
generate errors, and the table is not created or altered.
You can see the full description of the error condition with
SHOW ERRORS.
For example:
mysql> CREATE TABLE x (id INT PRIMARY KEY, c INT) -> ENGINE=INNODB KEY_BLOCK_SIZE=33333; ERROR 1005 (HY000): Can't create table 'test.x' (errno: 1478) mysql> SHOW ERRORS; +-------+------+-------------------------------------------+ | Level | Code | Message | +-------+------+-------------------------------------------+ | Error | 1478 | InnoDB: invalid KEY_BLOCK_SIZE=33333. | | Error | 1005 | Can't create table 'test.x' (errno: 1478) | +-------+------+-------------------------------------------+ 2 rows in set (0.00 sec)
Table 3.2.
Warnings and Errors with CREATE and ALTER TABLE options
| Syntax | Warning or Error Condition |
Resulting ROW_FORMAT,
as shown in SHOW TABLE STATUS
|
|---|---|---|
ROW_FORMAT=REDUNDANT | None | REDUNDANT |
ROW_FORMAT=COMPACT | None | COMPACT |
ROW_FORMAT=COMPRESSED or
DYNAMIC or
KEY_BLOCK_SIZE is specified
|
Ignored if innodb_file_format =
“Antelope” (default) or
innodb_file_per_table = 0
| COMPACT |
Invalid KEY_BLOCK_SIZE is specified
(not 1, 2, 4, 8 or 16)
|
Ignored, without warning (unless
innodb_strict_mode is ON);
default of 8K is used
|
COMPRESSED, even though the table is not
compressed
|
ROW_FORMAT=COMPRESSED and valid
KEY_BLOCK_SIZE are specified
|
None; KEY_BLOCK_SIZE specified is used,
not the 8K default
|
COMPRESSED
|
Valid KEY_BLOCK_SIZE is specified with
ROW_FORMAT=REDUNDANT,
COMPACT or DYNAMIC
|
KEY_BLOCK_SIZE is ignored, without
warning (unless innodb_strict_mode is
ON)
|
REDUNDANT,
COMPACT or
DYNAMIC
|
ROW_FORMAT is not one of
REDUNDANT,
COMPACT,
DYNAMIC or
COMPRESSED
| Error is issued by MySQL | N/A |
Note that it is not possible to see the chosen
KEY_BLOCK_SIZE using
SHOW TABLE STATUS. However,
SHOW CREATE TABLE will display the
KEY_BLOCK_SIZE (even if it was ignored).
This section describes some internal implementation details about compression in InnoDB. The information presented here may be helpful in tuning for performance, but is not necessary to know for basic use of compression.
Some operating systems implement compression at the file system level. Files are typically divided into fixed-size blocks that are compressed into variable-size blocks, which easily leads into fragmentation. Every time something inside a block is modified, the whole block is recompressed before it is written to disk. These properties make this compression technique unsuitable for use in an update-intensive database system.
The InnoDB Plugin implements a novel type of compression with the help of the well-known zlib library, which implements the LZ77 compression algorithm. This compression algorithm is mature, robust, and efficient in both CPU utilization and in reduction of data size. The algorithm is “lossless”, so that the original uncompressed data can always be reconstructed from the compressed form. LZ77 compression works by finding sequences of data that are repeated within the data to be compressed. The patterns of values in your data will determine how well it compresses, but typical user data will often compress by 50% or more.
Unlike compression performed by an application, or compression features of some other database management systems, InnoDB compression applies both to user data and to indexes. In many cases, indexes can constitute 40-50% or more of the total database size, so this difference is significant. When compression is working well for a data set, the size of the InnoDB data files (the .idb files) will be 25% to 50% of the uncompressed size or possibly smaller. Depending on the workload, this smaller database can in turn lead to a reduction in i/o, and an increase in throughput, at a modest cost in terms of increased CPU utilization.
All user data in InnoDB is stored in pages comprising a B-tree index (the so-called clustered index). In some other database systems, this type of index is called an “index-organized table”. Each row in the index node contains the values of the (user-specified or system-generated) primary key and all the other columns of the table.
Secondary indexes in InnoDB are also B-trees, containing pairs of values: the index key and a pointer to a row in the clustered index. The pointer is in fact the value of the primary key of the table, which is used to access the clustered index if columns other than the index key and primary key are required. Secondary index records must always fit on a single B-tree page.
The compression of B-tree nodes (of both clustered and
secondary indexes) is handled differently from compression of
overflow pages used to store long VARCHAR,
BLOB, or TEXT columns, as
explained in the following sections.
Because they are frequently updated, B-tree pages require special treatment. It is important to minimize the number of times B-tree nodes are split, as well as to minimize the need to uncompress and recompress their content.
One technique InnoDB uses is to maintain some system information in the B-tree node in uncompressed form, thus facilitating certain in-place updates. For example, this allows rows to be delete-marked and deleted without any compression operation.
In addition, InnoDB attempts to avoid unnecessary uncompression and recompression of index pages when they are changed. Within each B-tree page, the system keeps an uncompressed “modification log” to record changes made to the page. Updates and inserts of small records may be written to this modification log without requiring the entire page to be completely reconstructed.
When the space for the modification log runs out, InnoDB uncompresses the page, applies the changes and recompresses the page. If recompression fails, the B-tree nodes are split and the process is repeated until the update or insert succeeds.
If a row does not fit by itself on a compressed page, the
INSERT or UPDATE operation
will fail with
ERROR 42000: Row size too large. The only
remedy is to rebuild the table with ALTER TABLE,
in order to select a larger compressed page size
(KEY_BLOCK_SIZE) or to disable compression
entirely with
ROW_FORMAT=DYNAMIC or
ROW_FORMAT=COMPACT.
In a clustered index, BLOB,
VARCHAR and TEXT columns
that are not part of the primary key may be stored on
separately allocated (“overflow”) pages. We call
these “off-page columns” whose values are stored
on a singly-linked list of overflow pages.
For tables created in ROW_FORMAT=DYNAMIC or
ROW_FORMAT=COMPRESSED, the values of BLOB,
TEXT or VARCHAR columns may
be stored fully off-page, depending on their length and the length
of the entire row. For columns that are stored off-page, the
clustered index record only contains 20-byte pointers to the
overflow pages, one per column. Whether any columns are
stored off-page depends on the page size and the total size of
the row. When the row is too long to fit entirely within the
page of the clustered index, InnoDB will choose the longest
columns for off-page storage until the row fits on the
clustered index page. As noted above, if a row will not fit by
itself on a compressed page, an error will occur.
Tables created in previous versions of InnoDB use the
“Antelope” file format, which supports only
ROW_FORMAT=REDUNDANT and ROW_FORMAT=COMPACT. In these formats,
InnoDB stores the first 768 bytes of BLOB,
VARCHAR and TEXT columns in
the clustered index record along with the primary key. The 768-byte
prefix is followed by a 20-byte pointer to the overflow pages that
contain the rest of the column value.
When a table is in COMPRESSED format,
all data written to overflow pages is compressed
“as is”; that is, InnoDB applies the zlib
compression algorithm to the entire data item. Other than the data,
compressed overflow pages contain an uncompressed header and
trailer comprising a page checksum and a link to the next
overflow page, among other things. Therefore, very
significant storage savings can be obtained for longer
BLOB, TEXT or
VARCHAR columns if the data is highly
compressible, as is often the case with text data (but not
previously compressed images).
The overflow pages are of the same size as other pages. A row containing ten columns stored off-page will occupy ten overflow pages, even if the total length of the columns is only 8K bytes. In an uncompressed table, ten overflow pages occupy 160K bytes. In a compressed table with an 8K page size, they occupy 80K bytes. Thus, it is often more efficient to use compressed table format for tables with long column values.
Using a 16K compressed page size can reduce storage and
i/o costs for BLOB,
VARCHAR or TEXT columns,
because such data often will compress well, and might
therefore require fewer “overflow” pages, even
though the B-tree nodes themselves will take as many pages as
in the uncompressed form.
In a compressed InnoDB table, every compressed page (whether 1K, 2K, 4K or 8K) corresponds to an uncompressed page of 16K bytes. To access the data in a page, InnoDB must read the compressed page from disk (unless it is already in memory), and then uncompress the page to its original 16K byte form. This section describes how InnoDB manages the buffer pool with respect to pages of compressed tables.
To minimize i/o and to reduce the need to uncompress a page, at times the buffer pool will contain both the compressed and uncompressed form of a database page. However, to make room for other required database pages, InnoDB may “evict” from the buffer pool an uncompressed page, while leaving the compressed page in memory. Or, if a page has not been accessed in a while, the compressed form of the page may be written to disk, to free space for other data. Thus, at any given time, the buffer pool may not contain any copy of a given database page, may contain only the compressed form of the page, or may contain both the compressed and uncompressed forms of the page.
InnoDB keeps track of which pages to retain in memory and which to evict using a least-recently-used (LRU) list, so that “hot” or frequently accessed data tends to stay in memory. When compressed tables are accessed, InnoDB uses an adaptive LRU algorithm to achieve an appropriate balance of compressed and uncompressed pages in memory. This adaptive algorithm is sensitive to whether the system is running in an i/o-bound or CPU-bound manner.
The essential idea is to avoid spending too much processing time uncompressing pages when the CPU is busy, and to avoid doing excess i/o when the CPU has spare cycles that can be used for uncompressing compressed pages (that may already be in memory). When the system is i/o-bound, the algorithm will prefer to evict the uncompressed copy of a page rather than both copies, in order to make more room for other disk pages to become memory resident. When the system is CPU-bound, InnoDB will preferentially evict both the compressed and uncompressed page, so that more memory can be used for “hot” pages and reducing the need to uncompress data in memory only in compressed form.
Before (but not necessarily at the same time as) a compressed page is written to disk, InnoDB will write a copy of the page to the redo log (if it has been recompressed since the last time it was written to the database). This is done to ensure that redo logs will always be usable, even if a future version of InnoDB uses a slightly different compression algorithm. Therefore, some increase in the size of log files, or a need for more frequent checkpoints, can be expected when using compression. The amount of increase in the log file size or checkpoint frequency depends on the number of times compressed pages are modified in a way that requires reorganization and recompression.
Note that the redo log file format (and the database file format) are different from previous releases when using compression. The current release of InnoDB Hot Backup (version 3) therefore does not support databases that use compression. Only databases using the file format “Antelope” can be backed up online by InnoDB Hot Backup.
Most often, the internal optimizations in InnoDB described in section Section 3.3.2, “InnoDB Data Storage and Compression” will ensure that the system runs well with compressed data. However, because the efficiency of compression depends on the nature of your data, there are some factors you should consider to get best performance. You need to choose which tables to compress, and what compressed page size to use. You may also want to adjust the size of the buffer pool based on run-time performance characteristics such as the amount of time the system spends compressing and uncompressing data.
In general, compression will work best on tables that include a reasonable number of character string columns and where the data is read far more often than it is written. However, there are no universal and absolute ways to predict whether or not compression will benefit a particular situation. Only testing with a specific workload and data set running on representative configuration can lead to the appropriate decision. However, there are some general considerations that may help you decide when and how to use compression.
A key determinant of the efficiency of compression in
reducing the size of data files is the nature of the data
itself. Recall that compression works by identifying repeated
strings of bytes in a block of data. Completely randomized
data is the worst case. Typical data will often have repeated
values, and so will compress effectively. Character strings
will often compress well, whether defined in
CHAR, VARCHAR,
TEXT or BLOB columns.
On the other hand, tables containing mostly binary data
(integers or floating point numbers) or data that is
previously compressed (for example JPEG or
PNG images) may not generally compress
well, significantly or at all.
Compression is chosen on a table by table basis with the InnoDB Plugin, and a table and all of its indexes use the same (compressed) page size. It might be that the primary key (clustered) index, which contains the data for all columns of a table, will compress more effectively than the secondary indexes. For those cases where there are long rows, the use of compression may result in long column values being stored “off-page”, as discussed in Section 5.3, “DYNAMIC Row Format”. Those overflow pages may compress well. Given these considerations, for many applications, some tables will compress more effectively than others, and you may find that your workload performs best only with a subset of tables compressed.
Experimenting is the only way to determine whether or
not to compress a particular table. InnoDB compresses data
in 16K chunks corresponding to the uncompressed page size, and
in addition to user data, the page format includes some
internal system data that is not compressed. Compression
utilities compress an entire stream of data, and so may find
more repeated strings across the entire input stream than
InnoDB would find in a table compressed in 16K chunks. But
you can get a sense of how efficient compression will be by
using a utility that implements LZ77 compression (such as
gzip or WinZip) on your data file.
Another way to test compression on a specific table is
to copy some data from your uncompressed table to a similar,
compressed table (having all the same indexes) and look at the
size of the resulting file. When you do so (if nothing else
using compression is running), examine the ratio of successful
compression operations to overall compression operations. (In
the INNODB_CMP table, compare
COMPRESS_OPS to
COMPRESS_OPS_OK. See
Section 6.2.1, “INNODB_CMP and
INNODB_CMP_RESET
” for more
information.) If a high percentage of compression operations
complete successfully, the table might be a good candidate for
compression.
Generally, it is not useful to attempt to compress already compressed data, so you should choose between compressing data in your application and having InnoDB do so. For some applications, it may make sense to compress data (such as text) before it is inserted into the database. This is the only choice for storage engines that do not do compression, of course, but you may still prefer to do that in some cases, even when using the InnoDB Plugin. You may design your application to compress some data and not other data, saving overhead for data that will not compress well. This approach also has the advantage of using CPU cycles on the client machine rather than the database server for compression and uncompression. However, is usually not wise to store data that is compressed by the application in an InnoDB compressed table, as further compression is extremely unlikely.
One disadvantage of compressing data in the application is that the data values are opaque to InnoDB. You cannot use SQL to compare values of externally compressed data, nor create an effective index on such data, as InnoDB would not be able to recognize and sort the uncompressed data values.
An advantage of using internal compression is that InnoDB compresses the entire contents of a table (including all its columns and the clustered index and secondary indexes). This means that even data that is indexed or used in comparisons can be stored in compressed form. Indexes are often a significant fraction of the total size of a database. Depending on how many indexes your table has, and how effectively the data in those indexes compresses, compression could result in significant savings in storage, i/o or processor time.
Of course, it is possible to combine these approaches. For some applications, it may be appropriate to use some compressed tables and some uncompressed tables. It may be best to externally compress some data (and store it in uncompressed InnoDB tables) and allow InnoDB to compress (some of) the other tables in the application. Careful schema and application design are always important, and this is of course true when deciding how to use compression. Different use cases will have different best solutions.
In addition to choosing which tables to compress (and
the page size), the workload is another key determinant of
performance. If the application is dominated by reads, rather
than updates, fewer pages will need to be reorganized and
recompressed after the index page runs out of room for the
per-page “modification log” that InnoDB
maintains for compressed data. If the updates predominantly
change non-indexed columns or those containing
BLOBs or large strings that happen to be
stored “off-page”, the overhead of compression
may be acceptable. If the only changes to a table are
INSERTs that use a monotonically increasing
primary key, and there are few secondary indexes, there will
be little need to reorganize and recompress index pages.
Since InnoDB can “delete-mark” and delete rows
on compressed pages “in place” by modifying
uncompressed data, DELETE operations on a
table are relatively efficient.
For some environments, the time it takes to load data can be as important as run-time retrieval. Especially in data warehouse environments, many tables may be read-only or read-mostly. In those cases, it might or might not be acceptable to pay the price of compression in terms of increased load time, unless the resulting savings in fewer disk reads or in storage cost is significant.
Fundamentally, compression works best when the CPU time is available for compressing and uncompressing data. Thus, if your workload is i/o bound, rather than CPU-bound, you may find that compression can improve overall performance. Therefore when you test your application performance with different compression configurations, it is important to test on a platform similar to the configuration on which it will run in production.
Reading and writing database pages from and to disk is the slowest aspect of system performance. Therefore, compression attempts to reduce i/o by using CPU time to compress and uncompress data, and thus is most effective when i/o is a relatively scarce resource compared to processor cycles.
This is often especially the case when running in a multi-user environment with fast, multi-core CPUs. When a page of a compressed table is in memory, InnoDB will often use an additional 16K in the buffer pool for an uncompressed copy of the page. The adaptive LRU algorithm in the InnoDB Plugin attempts to balance the use of memory between compressed and uncompressed pages to take into account whether the workload is running in an i/o-bound or CPU-bound manner. Nevertheless, a configuration with more memory dedicated to the InnoDB buffer pool will tend to run better when using compressed tables than a configuration where memory is highly constrained.
The optimal setting of the compressed page size depends on the type and distribution of data that the table and its indexes contain. The compressed page size should always be bigger than the maximum record size, or operations may fail as noted in Section 3.3.2.1, “Compression of B-tree Pages”.
Setting the compressed page size too large will waste some space, but the pages will not have to be compressed as often. If the compressed page size is set too small, inserts or updates may require time-consuming recompression, and the B-tree nodes may have to be split more frequently, leading to bigger data files and less efficient indexing.
Typically, one would set the compressed page size to 8K
or 4K bytes. Given that the maximum InnoDB record size is
around 8K, KEY_BLOCK_SIZE=8 is usually a
safe choice.
The current version of the InnoDB Plugin provides only a limited means to monitor the performance of compression at runtime. Overall application performance, CPU and i/o utilization and the size of disk files are the best indicators of how effective compression is for your application.
Nevertheless, the InnoDB Plugin does include some Information
Schema tables (see
Example 6.1, “Using the Compression Information Schema Tables”)
that reflect the internal use of memory and the rates of compression
used overall. The INNODB_CMP tables report
information about compression activity for each compressed page
size (KEY_BLOCK_SIZE) in use. The
information in these tables is system-wide, and includes summary
data across all compressed tables in your database. You can use
this data to help decide whether or not to compress a table by
examining these tables when no other compressed tables are being
accessed.
The key statistics to consider are the number of, and
amount of time spent performing, compression and uncompression
operations. Since InnoDB must split B-tree nodes when they
are too full to contain the compressed data following a
modification, you should also compare the number of
“successful” compression operations with the number
of such operations overall. Based on the information in the
INNODB_CMP tables and overall application
performance and hardware resource utilization, you may decide to
make changes in your hardware configuration, adjust the size of
the InnoDB buffer pool, choose a different page size, or
select a different set of tables to compress.
If the amount of CPU time required for compressing and uncompressing is high, changing to faster CPUs, or those with more cores, can help improve performance with the same data, application workload and set of compressed tables. You may also benefit by increasing the size of the InnoDB buffer pool, so that more uncompressed pages can stay in memory, reducing the need to uncompress pages which exist in memory only in compressed form.
A large number of compression operations overall (compared
to the number of INSERT,
UPDATE and DELETE
operations in your application and the size of the database)
could indicate that some of your compressed tables are being
updated too heavily for effective compression. You may want to
choose a larger page size, or be more selective about which
tables you compress.
If the number of “successful” compression
operations (COMPRESS_OPS_OK) is a high
percentage of the total number of compression operations
(COMPRESS_OPS), then the system is likely
performing well. However, if the ratio is low, then InnoDB is
being caused to reorganize, recompress and split B-tree nodes
more often than is desirable. In this case, you may want to
avoid compressing some tables or choose a larger
KEY_BLOCK_SIZE for some of the tables for
which you are using compression. You may not want to compress
tables which cause the number of
“compression failures” in
your application to be more than 1% or 2% of the total (although
this may be acceptable during a data load, for example, if your
application does not encounter such a ratio during normal
operations).