In-memory processing for analytics has captivated the imagination of many by offering the extreme speed of RAM to processing problems that previously required extensive access to disks. But is RAM truly fast enough? At roughly 4,000 times faster than disk, and 1,000 times faster than flash, it’s seductively tempting to think so. However, in our work on Dynamic In-memory Processing for DB2 10.5 with BLU Acceleration, we came to believe that RAM is really far too slow, and that modern servers are capable of processing data faster than most in-memory algorithms would allow. Our achievement: RAM can be smaller than data, and data (the interesting bits) can be processed faster than RAM.
Is it possible to go faster than the maximum speed of the medium you’re traveling in? We think so. The late 1960s brought the dream of space exploration, and the rise of the Apollo missions inspired Hollywood and television franchises to imagine super-fast fictional starships for interstellar exploration. Space is a big place, so if you’re going to get around you’ll need a ship that’s not only faster than sound (roughly 330 m/s) but faster than light (300 million m/s). That’s because the nearest star is several light years away, and most are hundreds if not thousands of light years away. Star Trek popularized the notion of warp speed – the ability of a ship to travel several times the speed of light. In other words, several times faster than the fastest thing we know of. In BLU Acceleration we’ve achieved a similar kind of warp speed… a data analytics warp that allows modern servers to operate faster than the memory access speed (i.e. faster than RAM). Moreover, by dynamically determining at query runtime which data resides in RAM, BLU dramatically reduces the requirements for system memory; only the hot portions of active columns need to be memory resident. DB2 ensures the interesting data is either already in memory, or can be made so through its innovative in-memory-optimized prefetching algorithms. In essence, the database automatically loads the important data from disk into memory (RAM) just before it’s needed. With the active and interesting data in memory, the warp processing can begin.
Dynamic In-memory Processing in BLU Acceleration combines three novel technologies:
- Dynamic list prefetching. A specialized, in-memory-optimized columnar prefetching algorithm that determines a few milliseconds in advance what data should be loaded into RAM.
- Scan-friendly victim selection. An algorithm re-engineered for analytics that determines which data should stay in RAM.
- Cache-optimized processing. Algorithms designed so that every analytic operation spends as much of its processing time as possible on data in the L3 and L2 caches.
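To make the first idea a little more concrete, here is a minimal, purely illustrative sketch of list prefetching driven by per-page metadata. The names, the synopsis structure, and the lookahead scheme are assumptions for illustration only, not DB2’s actual implementation: the point is simply that per-page min/max metadata lets the engine decide, just ahead of the scan, which pages are worth bringing into RAM at all.

```python
# Hypothetical sketch of dynamic list prefetching. All names and data
# structures are illustrative assumptions, not DB2 internals.

def pages_to_prefetch(page_synopses, lo, hi):
    """Use per-page (min, max) metadata to find only the pages a range
    predicate lo <= value <= hi could possibly touch."""
    return [page_id
            for page_id, (pmin, pmax) in page_synopses.items()
            if pmax >= lo and pmin <= hi]

def scan_column(page_synopses, load_page, lo, hi, lookahead=4):
    """Scan only the qualifying pages, issuing loads a few pages ahead
    so data arrives in RAM just before it is processed."""
    wanted = pages_to_prefetch(page_synopses, lo, hi)
    results = []
    for i, page_id in enumerate(wanted):
        # In a real engine these would be asynchronous I/O requests
        # issued `lookahead` pages ahead of the consumer.
        for ahead in wanted[i:i + lookahead]:
            pass  # issue prefetch for `ahead` (a no-op in this sketch)
        for value in load_page(page_id):
            if lo <= value <= hi:
                results.append(value)
    return results
```

Note how a page whose min/max range cannot satisfy the predicate is never loaded at all, which is one reason only a fraction of the data ever needs to occupy RAM.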
It’s that last item of the three, cache-optimized processing, that allows BLU Acceleration to operate faster than RAM and exceed the speed of the medium. In addition to disks and RAM, modern servers deploy a hierarchy of memory layers to reduce the time spent accessing RAM. These commonly include the L3, L2, and L1 caches, along with the CPU’s own registers.
Access to RAM, while three orders of magnitude faster than disk, is actually still surprisingly slow compared to the speed of the CPU. That’s why CPU manufacturers like IBM and Intel have invested heavily in the memory hierarchy, rather than simply trusting RAM alone to be fast enough. When the CPU needs data that is not in its registers, it searches the L1, L2, and L3 caches for the data, and if unlucky is forced to visit RAM to retrieve it. While this appears in most CPU monitoring utilities as busy CPU time, it is not time spent processing the data… merely finding it. How much faster is access to L3, L2, and L1? Dramatically! Typical rates would be in the range of 15x, 30x, and 100x faster respectively. In the bar chart (left) are some real numbers for a current-generation CPU (make and model deliberately unstated) showing actual speed-ups of 15x, 41x, and 173x respectively. CPU caches alone are helpful, but not sufficient. Software makes naive use of them unless specialized algorithms are applied, and for data analytics the volumes involved are usually dramatically larger than cache sizes, which severely limits exploitation of CPU caches. The secret to the warp speed-up achieved in BLU Acceleration is that IBM has designed every algorithm to minimize access to RAM and maximize processing time in the L3 and L2 caches. For the processor shown at left, that provides a whopping 15x–41x speed-up over memory access.
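The essence of cache-optimized processing can be sketched in a few lines. This is an illustrative toy, not DB2 code, and the block size is an assumed tuning constant: instead of making one full pass over a column per operator (which forces every intermediate result through RAM), the whole pipeline of operators runs over one cache-sized block at a time, so intermediates stay resident in L2/L3.

```python
# Illustrative sketch of cache-conscious blocking; not DB2 code.
# BLOCK is an assumed constant, chosen so one block fits in L2 cache.
BLOCK = 32_768  # elements per block

def blocked_pipeline(column, operators):
    """Run every operator over each block before moving to the next
    block, so the block stays cache-resident across operators."""
    out = []
    for start in range(0, len(column), BLOCK):
        block = column[start:start + BLOCK]
        for op in operators:              # intermediates never leave cache
            block = [op(x) for x in block]
        out.extend(block)
    return out
```

The result is identical to running each operator over the whole column in turn; only the memory traffic changes, which is exactly where the 15x–41x speed-up comes from.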
When we combine these three powerful ideas (dynamic list prefetching, scan-friendly victim selection, and cache-optimized processing), we see that BLU Acceleration can operate with extreme speed even when the total compressed database size is much larger than RAM, because:
- Only fractions of the active columns need to be kept in RAM, and only a fraction of a system’s columns are typically active. RAM requirements are therefore a fraction of a fraction of the compressed data size.
- The database determines dynamically what data should be pre-loaded (prefetched) into RAM using a specialized in-memory optimized algorithm.
- The database determines dynamically which data should remain in RAM, using a specialized algorithm for column stores.
- Once in memory, the bulk of the analytic processing occurs on data while it is in the L2 and L3 caches, providing an order-of-magnitude speed-up over traditional processing approaches.
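The second point, scan-friendly victim selection, can also be sketched briefly. The policy and class below are hypothetical illustrations, not DB2’s buffer-pool code: the idea is that pages brought in by a large scan are tagged so that, when space is needed, they are evicted before the hot pages that repeated queries actually reuse.

```python
from collections import OrderedDict

class ScanFriendlyPool:
    """Toy buffer pool (illustrative only): pages brought in by large
    scans are tagged and evicted first, protecting hot pages."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # page_id -> True if brought in by a scan

    def access(self, page_id, is_scan=False):
        if page_id in self.pages:
            if not is_scan:                    # real reuse: promote to MRU
                self.pages[page_id] = False
                self.pages.move_to_end(page_id)
            return
        if len(self.pages) >= self.capacity:
            # Prefer to evict a scan-tagged page; fall back to plain LRU.
            victim = next((p for p, tag in self.pages.items() if tag),
                          next(iter(self.pages)))
            del self.pages[victim]
        self.pages[page_id] = is_scan
```

Under a plain LRU policy, one large scan would flush every hot page out of the pool; here the scan’s own pages are sacrificed instead, which is why the interesting fraction of the data stays in RAM.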
Best of all, DB2 determines all of this dynamically. No tuning required. CPU cache is the new RAM. RAM is the new disk. Disk is the new tape. Tape is the new… don’t bother.
Not simply in-memory processing, it’s better than in-memory processing. RAM can be smaller than data, and data can be processed faster than RAM.
Mr. Sulu, take us out of here. Warp factor 5.