Pushing Processing

In my earlier entry I spoke about the challenges of indexing the enterprise. The biggest challenge is the speed at which indexing has to occur: enterprises are creating data faster than traditional indexing methods can index it. To process billions of files at high speed, we needed to implement a new approach to scraping words. This method, which uses our advanced text scanning algorithms, works best on CPU architectures with very high memory bandwidth and low latency to that memory. After much analysis we found that the AMD Opteron CPU was the best fit. Because of Opteron’s Direct Connect Architecture, the latency for accessing random data from main memory has been minimized. We have also benefited from the gaming market, which has driven the CAS latency of DDR400 memory down to 2 clock cycles.
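To put the 2-cycle CAS figure in wall-clock terms, here is a rough back-of-the-envelope calculation. The function name is my own, and CAS latency is only one component of the total time it takes to fetch random data from main memory, so treat this as an illustration rather than a measurement.

```python
def cas_latency_ns(mega_transfers_per_sec, cas_cycles):
    """Convert a CAS latency in clock cycles to nanoseconds for DDR memory."""
    # DDR moves data on both clock edges, so DDR400 (400 MT/s)
    # runs on a 200 MHz clock.
    clock_mhz = mega_transfers_per_sec / 2
    cycle_ns = 1000.0 / clock_mhz
    return cas_cycles * cycle_ns


print(cas_latency_ns(400, 2))  # CL2 on DDR400
print(cas_latency_ns(400, 3))  # CL3 on DDR400, for comparison
```

Going from CL3 to CL2 on DDR400 takes the CAS portion of an access from 15 ns to 10 ns.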

What can we expect in the future? It is clear that the future of processors is multiple cores. Thermal issues have put a damper on increasing clock speeds, so any newly available real estate is being used to add more cores, which effectively increases the number of instructions executed per clock cycle. Existing CPU-intensive applications will need to be modified to take advantage of these new architectures. Even though multi-threading has been around for a long time, it is still worth examining on multi-core systems, and I will address it in my next blog entry.

What would I love to see in a future processor? Like everyone else's, our application would benefit from more L1 and L2 cache. This is obviously important to AMD too, as they recently licensed the Z-RAM high-density memory IP from Innovative Silicon. Hopefully some of that technology will significantly increase cache space. We could also use a simple built-in instruction for hashing strings. The best public-domain hash functions take about 20 operations per word. I would guess a silicon approach could deliver a 10- to 20-fold speedup.
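To give a feel for the work a software string hash does, here is a minimal sketch of 32-bit FNV-1a, one well-known public-domain hash. It is an illustration only, not the hash our indexer uses, and its operation count is not meant to match the figure above; the function name is my own.

```python
FNV_OFFSET_BASIS_32 = 0x811C9DC5
FNV_PRIME_32 = 0x01000193


def fnv1a_32(data):
    """Return the 32-bit FNV-1a hash of a bytes object."""
    h = FNV_OFFSET_BASIS_32
    for byte in data:
        h ^= byte                             # mix in the next byte
        h = (h * FNV_PRIME_32) & 0xFFFFFFFF   # multiply, keep the low 32 bits
    return h


# Hash each word scraped from a piece of text.
for word in "the quick brown fox".split():
    print(word, hex(fnv1a_32(word.encode("utf-8"))))
```

Every byte of every word costs an XOR and a multiply, plus the loop overhead around them. That per-byte work is what a dedicated hash instruction would collapse.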

Overall, I look forward to many more cores and integrated DDR2 memory controllers. Just keep in mind that your code has to be tuned to take advantage of the new CPUs; otherwise you will have a lot of idle cores.