
Third Party Contributions in InnoDB Plugin 1.0.4

The InnoDB team gratefully acknowledges the contributions of leading MySQL community members. These contributions have in some cases significantly improved the performance of the InnoDB Plugin, building on the gains demonstrated in InnoDB Plugin 1.0.3. This page briefly describes the key enhancements based on these third-party contributions. Chapter 7 of the InnoDB documentation contains more complete information.

Multiple Background Threads

InnoDB uses background threads to service various types of I/O requests. Starting from InnoDB Plugin 1.0.4, the number of background threads tasked with servicing read and write I/O requests on data pages is configurable. Both Google and Percona suggested this change. The implementation in InnoDB Plugin 1.0.4 is based on the Percona patch, which included a technique for balancing the load among the I/O threads. The InnoDB team also ensured that this change was appropriately ported to Windows, where the number of background threads was previously controlled by the configuration parameter innodb_file_io_threads, which has been removed in this release.

Master Thread I/O Capacity Tuning

The master thread in InnoDB performs various background tasks, most of which are I/O related, such as flushing dirty pages from the buffer cache or writing buffered inserts to the appropriate secondary indexes. Historically, InnoDB has used a hard-coded value of 100 IOPS (I/O operations per second) as the total I/O capacity of the server, and it attempts to tune its activities to take advantage of the available capacity. Google and Percona suggested similar changes to allow the DBA to specify the I/O capacity. We based the implementation in InnoDB Plugin 1.0.4 on the Google contribution, with the following changes: (a) if there is excess I/O capacity, InnoDB will always use it to flush dirty pages, and (b) we made the parameter innodb_io_capacity dynamic, so DBAs can experiment with different values without restarting the server.

Asynchronous Read Ahead

An asynchronous read ahead request is an I/O request to pre-fetch multiple pages from disk to the buffer cache in anticipation that these pages will be needed in the near future. InnoDB has historically used two read ahead algorithms (random read ahead and linear read ahead) to improve I/O performance.

Both Google and Percona have observed that generally InnoDB is too aggressive in doing read ahead, and have proposed patches. Percona proposed a patch that would allow the DBA to select either, neither or both types of read ahead algorithms. While we appreciate the contribution of Percona, our team took a different approach.

Instead of permitting the DBA to enable or disable random read ahead, we removed it from InnoDB Plugin 1.0.4, since this feature generally provided no benefit in our testing. And, rather than having a switch to enable or disable linear read ahead, we introduced the parameter innodb_read_ahead_threshold that controls the aggressiveness with which InnoDB will schedule asynchronous reads. This approach is better suited for mixed workloads, where some table scans are happening together with random reads on other tables. This new sensitivity knob allows the DBA to reduce read aheads due to random reads without affecting table scans.
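The effect of such a threshold can be illustrated with a small sketch. This is a simplified model, not InnoDB's actual code: the function names, the 64-page extent size, and the rule "prefetch when the longest sequential run of page reads reaches the threshold" are assumptions made for illustration only.

```python
EXTENT_PAGES = 64  # assumed extent size in pages, for illustration only


def longest_sequential_run(accesses):
    """Return the longest run of consecutive page numbers in access order."""
    longest = run = 0
    prev = None
    for page in accesses:
        # Extend the run only if this page directly follows the previous one.
        run = run + 1 if prev is not None and page == prev + 1 else 1
        longest = max(longest, run)
        prev = page
    return longest


def should_read_ahead(accesses, threshold):
    """Decide whether to prefetch the next extent (hypothetical model)."""
    return longest_sequential_run(accesses) >= threshold


table_scan = list(range(EXTENT_PAGES))   # pages 0..63 read in order
random_reads = [3, 41, 17, 18, 60, 5]    # mostly scattered page reads

print(should_read_ahead(table_scan, threshold=56))    # True
print(should_read_ahead(random_reads, threshold=56))  # False
print(should_read_ahead(random_reads, threshold=2))   # True
```

With a high threshold, only the table scan triggers a prefetch; lowering the threshold makes even a short sequential run among random reads trigger one, which is the over-aggressive behavior the parameter is meant to let the DBA tune away.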

Group Commit

Before a transaction can commit, InnoDB must flush the changes made by that transaction to the redo log. With group commit, InnoDB can issue a single write to the log file to complete the commit for multiple user transactions that commit at about the same time, significantly improving throughput. The implementation of group commit also allowed InnoDB Hot Backup to work properly, ensuring the same order of commit in the MySQL binlog and the InnoDB log file.

Historically, group commit was supported through MySQL 4.x. When MySQL 5.0 introduced distributed transactions and the two-phase commit protocol, group commit functionality inside InnoDB was broken, and throughput suffered.

Percona offered a patch that eliminated a mutex used at “prepare commit” time. For InnoDB Plugin 1.0.4, we took a different approach, based on a deep analysis of the code. We determined that there is no need to hold such a mutex during the prepare phase, nor while doing the flush of the commit phase. By splitting the write to the log file and the flush of the log file to disk in the commit phase, the second part can happen without holding the mutex. This yields the desired effect of committing multiple transactions with a single flush.
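The idea of separating the log write from the log flush can be sketched with a toy, single-threaded model. The class and method names here are hypothetical, and positions in the log are represented by a simple byte counter; the real implementation involves threads and a mutex around the write step, which this sketch only notes in comments.

```python
class RedoLog:
    """Toy redo log that tracks how far it has been written and flushed."""

    def __init__(self):
        self.written_lsn = 0   # bytes appended to the log so far
        self.flushed_lsn = 0   # bytes known to be durable on disk
        self.flush_count = 0   # number of (expensive) flushes performed

    def write(self, nbytes):
        """Step 1: append a commit record (done under a mutex in a real server)."""
        self.written_lsn += nbytes
        return self.written_lsn

    def flush_up_to(self, lsn):
        """Step 2: make the log durable up to lsn (no mutex held in this model)."""
        if lsn <= self.flushed_lsn:
            # Another transaction's flush already covered this commit record.
            return False
        self.flushed_lsn = self.written_lsn
        self.flush_count += 1
        return True


def commit_batch(log, record_sizes):
    """Write all commit records first, then let each transaction request a flush."""
    commit_lsns = [log.write(n) for n in record_sizes]
    return [log.flush_up_to(lsn) for lsn in commit_lsns]


log = RedoLog()
print(commit_batch(log, [120, 80, 200]))  # [True, False, False]
print(log.flush_count)                    # 1
```

In this model, three transactions that write their commit records at about the same time are made durable by one flush: the first flush request covers everything written so far, and the other two find their records already on disk.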

Adaptive Flushing

The InnoDB “master thread” flushes dirty pages (those pages changed but not yet written to the database files). It does so aggressively if the percentage of dirty pages in the buffer pool exceeds innodb_max_dirty_pages_pct. If a checkpoint becomes inevitable, InnoDB will flush the dirty pages within a user thread. This behavior can cause temporary reductions in throughput when excessive buffer pool flushing takes place, limiting the I/O capacity available for ordinary read and write activity.

Percona suggested a change whereby InnoDB would proactively watch for an approaching checkpoint and do a flush from within the master thread before the checkpoint was imminent. When the age of the most recent checkpoint reached one half of the maximum allowed age, the Percona patch would have flushed buffer pool pages using up to 10% of the I/O capacity. And, if the checkpoint age reached 75% of the allowed maximum, InnoDB would become more aggressive and do a flush using up to 100% of I/O capacity.
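The two thresholds described above can be written down as a small function. This is only a sketch of the rule as stated in the preceding paragraph, not the Percona patch itself; the function name, the parameter names, and the use of integer page counts are assumptions.

```python
def checkpoint_age_flush_budget(checkpoint_age, max_age, io_capacity):
    """Return the share of I/O capacity (in I/Os per second) to spend on flushing.

    Hypothetical helper modelling the threshold rule:
      - below 50% of the maximum checkpoint age: no proactive flushing
      - from 50% up to 75%: up to 10% of the I/O capacity
      - 75% and above: up to 100% of the I/O capacity
    """
    if max_age <= 0:
        raise ValueError("max_age must be positive")
    if 4 * checkpoint_age >= 3 * max_age:   # age >= 75% of maximum
        return io_capacity
    if 2 * checkpoint_age >= max_age:       # age >= 50% of maximum
        return io_capacity // 10
    return 0


for age in (100, 500, 750, 900):
    print(age, checkpoint_age_flush_budget(age, max_age=1000, io_capacity=200))
```

The step from 10% to 100% of capacity at the 75% mark shows why such a rule produces abrupt changes in flushing activity rather than a smooth ramp.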

We took a different, more mathematically grounded approach, using a heuristic based on the number of dirty pages in the buffer cache and the rate at which redo is being generated. Based on this heuristic, the master thread decides how many dirty pages to flush from the buffer cache each second. This self-adapting heuristic, which we call “adaptive flushing” (since it does not affect checkpointing per se), is able to deal with sudden changes in the workload, and provides the desired smoothing of I/O rates and transaction throughput.
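To make the idea concrete, here is one plausible form such a heuristic could take. It is an illustrative assumption, not the formula used in the InnoDB Plugin: the function name, its parameters, and the cap at the configured I/O capacity are all hypothetical. The reasoning is that if redo is generated at redo_rate bytes per second, the log fills in log_capacity / redo_rate seconds, so flushing all currently dirty pages within that time requires dirty_pages * redo_rate / log_capacity pages per second.

```python
def adaptive_flush_pages(dirty_pages, redo_rate, log_capacity, io_capacity):
    """Estimate how many dirty pages to flush in the next second (a sketch).

    dirty_pages  -- number of dirty pages currently in the buffer cache
    redo_rate    -- recent redo generation rate, in bytes per second
    log_capacity -- usable redo log space, in bytes
    io_capacity  -- upper bound on I/Os per second (assumed cap)
    """
    if log_capacity <= 0:
        raise ValueError("log_capacity must be positive")
    # Pages per second needed to clean every dirty page before the log fills.
    wanted = dirty_pages * redo_rate // log_capacity
    return max(0, min(wanted, io_capacity))


# Steady workload: a moderate, even flush rate.
print(adaptive_flush_pages(10_000, 1_000_000, 100_000_000, 200))  # 100
# Sudden write burst: the rate rises at once, up to the assumed cap.
print(adaptive_flush_pages(10_000, 5_000_000, 100_000_000, 200))  # 200
# Idle server: no redo is generated, so no flushing is requested.
print(adaptive_flush_pages(10_000, 0, 100_000_000, 200))          # 0
```

Because the result scales continuously with both inputs, the flush rate in this sketch follows the workload from second to second instead of jumping at fixed thresholds.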

Additional Patches

This release of the InnoDB Plugin also includes several small patches contributed by the MySQL team at Sun.

These patches include:

  • Inline handling of functions when compiling InnoDB with SunStudio
  • Allowing the Solaris compiler to generate data pre-fetch instructions
  • Use of the x86 PAUSE instruction in InnoDB spin loops
  • Improved default spin values

Most of the contributed patches are specific to Solaris or SunStudio. The InnoDB team made sure that the PAUSE instruction, or its equivalent, is used in spin loops on platforms other than Solaris.