
IBM offers two new products that support the Cell Broadband Engine architecture: XL C/C++ for Multicore Acceleration for Linux on x86 Systems, V9.0 and XL C/C++ for Multicore Acceleration for Linux on System p, V9.0. Both include all of the common XL C/C++ compiler features.
Program Optimization
XL C/C++ delivers several compiler options that allow you to:
Select different levels of compiler optimizations
Control optimizations for loops, floating-point, and other types of operations
XL C/C++ also includes specific optimization features tailored to exploit the unique performance capabilities of Cell Broadband Engine processors, including specialized data types and highly optimized built-in functions.

Cross-compilation
XL C/C++ for Multicore Acceleration for Linux on x86 Systems, V9.0 and XL C/C++ for Multicore Acceleration for Linux on System p, V9.0 are cross-compilers that generate 32-bit and 64-bit code for Cell Broadband Engine (Cell/B.E.) hardware. Compilation occurs on a host x86 system or IBM System p running Red Hat Enterprise Linux 5.1 (RHEL 5.1). The compiled application then runs on a Cell/B.E. system.

Invocation and linking commands
XL C/C++ for Multicore Acceleration for Linux, V9.0 compiles PPU and SPU program code in separate steps using compiler invocation commands targeted specifically for each type of program code. Several versions of PPU-specific compiler invocation commands are delivered. SPU-specific invocation commands are also provided.

Mathematical Acceleration Subsystem (MASS)
XL C/C++ for Multicore Acceleration for Linux includes the Mathematical Acceleration Subsystem (MASS). MASS consists of libraries of tuned mathematical intrinsic functions that offer improved performance over the standard mathematical library routines. The MASS libraries are thread-safe and support C, C++, and Fortran applications. Compilations for SPU, and both 32-bit and 64-bit compilations for PPU, are supported.

Automatic code overlay
The -qipa=overlay option lets developers create SPU programs that would otherwise be too large to fit in the local store of an SPU. It tells the compiler to automatically generate code overlays, which allow two or more code segments to be loaded at the same physical address at different times.

Automatic SIMD Vectorization of program code
When the compiler option -qhot=simd is in effect, certain operations that are performed in a loop on successive elements of an array are converted into vector instructions. A vector instruction calculates several results at one time, which is faster than calculating each result sequentially. This suboption is useful for applications with large image processing demands.
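As a rough illustration, the following C function contains the kind of loop that is a typical candidate for automatic SIMD vectorization: each iteration works on successive array elements and does not depend on any other iteration. The function name scale_pixels and its parameters are hypothetical and are not taken from the product documentation. Whether a particular loop is actually vectorized depends on the compiler options in effect and on the compiler's own analysis.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: apply a gain and bias to every pixel value.
 * Each iteration reads in[i] and writes out[i] only, so iterations are
 * independent and touch successive array elements. The restrict
 * qualifiers promise that the two arrays do not overlap, which removes
 * one common obstacle to vectorization. */
void scale_pixels(float *restrict out, const float *restrict in,
                  size_t n, float gain, float bias)
{
    assert(n == 0 || (out != NULL && in != NULL));

    for (size_t i = 0; i < n; i++) {
        out[i] = gain * in[i] + bias;
    }
}
```

Loops that carry a value from one iteration to the next, or that call functions the compiler cannot see into, are generally harder for a compiler to vectorize automatically.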

Interprocedural Analysis (IPA)
Interprocedural Analysis can result in significant performance improvements. Interprocedural analysis can be specified on the compile step only or on both compile and link steps in whole program mode. Whole program mode expands the scope of optimization to an entire program unit, which can be an executable or shared object.
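A minimal sketch of code that can benefit from interprocedural analysis is shown below. It assumes, purely for illustration, that the two functions would normally live in separate source files (the file names clamp.c and filter.c and both function names are hypothetical). When each file is compiled on its own, the compiler cannot see the body of the helper while it optimizes the caller's loop. Analysis across procedure boundaries gives it the chance to inline the helper and then optimize the loop as a whole.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetically defined in clamp.c: limit a value to the range 0..255. */
int clamp_to_byte(int v)
{
    if (v < 0) {
        return 0;
    }
    if (v > 255) {
        return 255;
    }
    return v;
}

/* Hypothetically defined in filter.c: brighten every pixel by delta.
 * The call inside the loop is a candidate for cross-file inlining when
 * the optimizer can analyze both procedures together. */
void brighten(int *pixels, size_t n, int delta)
{
    assert(n == 0 || pixels != NULL);

    for (size_t i = 0; i < n; i++) {
        pixels[i] = clamp_to_byte(pixels[i] + delta);
    }
}
```

The two functions appear in one listing here only so that the example is self-contained.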