IBM PowerPC processor news
Programming Tools

Advanced compiler technology reveals IBM PowerPC technology advantages

by Robert Redfield, Director of Partner Programs, Green Hills Software

Today, designers of high-performance embedded products face the challenge of choosing the best combination of hardware and supporting software application tools. After the hardware architecture is selected, care must be taken to evaluate and select software development tools that squeeze the highest performance out of the architecture. After all, what good is a hardware architecture feature if the software doesn’t take advantage of it?

IBM engineers recently concluded a study to help customers face this challenge for the IBM® PowerPC® 405 and 440 processors. Several issues were addressed:

  • Are there useful, honest benchmarks to evaluate hardware/software combinations?
  • Can I control how my C/C++ compiler exploits the special features of the PowerPC architecture?
  • How fast can I expect my application to run compared to the theoretical maximum?

Useful, honest performance benchmarks
The Embedded Microprocessor Benchmark Consortium (EEMBC) develops and certifies meaningful performance benchmarks for embedded processors and compilers. The EEMBC benchmarks are composed of dozens of algorithms organized into benchmark suites targeting telecommunications, networking, automotive and industrial, consumer, and office equipment products, as shown in Table 1. Although there is no better benchmark for a customer application than the application itself, the EEMBC benchmarks’ real-world nature offers a clear improvement over outdated synthetic benchmarks.

The scores
IBM tested various compilers with its PowerPC 4xx products on all five EEMBC suites and chose the compiler that produced the best scores. IBM then certified the scores at EEMBC Certification Labs (ECL) and published the scores on the public EEMBC Web site, www.eembc.org.

“EEMBC benchmarks are based on real-world code that indicates how our PowerPC 405GPr and 440GP processors work in our customers' applications,” said Kalpesh Gala, PowerPC strategic marketing manager at IBM Microelectronics. “With Green Hills Software's compilers, our PowerPC 440GP processor exceeded all other System-on-Chip processors on four of the five EEMBC benchmark suites.”

Table 1. Compiler mix for EEMBC certification

IBM PowerPC processor   Telecom       Office automation   Consumer      Automotive/industrial   Networking
405GPr – 266 MHz        Green Hills   Green Hills         Green Hills   Green Hills             Green Hills
405GPr – 400 MHz        Green Hills   Green Hills         Green Hills   Green Hills             Green Hills
440GP                   Green Hills   Green Hills         GNU           Green Hills             Green Hills

Note: Table 1 shows the IBM PowerPC architecture and compiler configurations that produced the highest scores on the EEMBC Web site.

Scores on the EEMBC Web site show that Green Hills Software’s compilers outperformed other compilers by as much as 20% on the PowerPC 440GP, a significant margin in a field where a single-digit improvement is often considered a breakthrough for a customer’s product design (see Table 2).

Table 2. Certified EEMBC scores for PowerPC 440GP

IBM PowerPC 440GP - 500 MHz        Green Hills Software   Next-best score                    Green Hills Software
                                   MULTI 3.6.1                                               faster by:
Telecom (Telemark)                 11.4                   9.5 (WRS-Diab 4.4a)                20%
Automotive/industrial (Automark)   264.2                  222.1 (MetaWare HighC/C++ 4.5b)    19%
Consumer (Consumermark)            42.5                   48.0 (GNU GCC 3.04)                -13%
Networking (Netmark)               9.4                    8.8 (GNU GCC 3.04)                 7%
Office automation (Oamark)         511.1                  501.3 (Green Hills Software 3.5)   2%

Note: Certified EEMBC scores are public and enable customers to compare compiler performance.
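
The "faster by" column can be checked with simple arithmetic. The sketch below is illustrative only: the function name percent_faster is hypothetical, and it assumes the margin is the score difference divided by the next-best score. Under that assumption the Telecom, Automotive/industrial, Networking, and Office automation rows round to the published figures. The Consumer row comes out near -11.5%; the published -13% matches the same difference taken relative to the Green Hills score instead.

```c
/* Relative margin of `score` over `baseline`, in percent.
   Assumption (not stated in the article): the baseline is the
   next-best score from Table 2. */
double percent_faster(double score, double baseline)
{
    return (score - baseline) / baseline * 100.0;
}
```

For example, percent_faster(11.4, 9.5) evaluates to about 20, the Telecom margin in Table 2.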

The technology behind the scores
Key compiler optimizations that produced IBM’s EEMBC scores fall into two categories:

  • IBM PowerPC architecture-specific optimizations
  • General optimization

The Green Hills Software compilers are one component of the MULTI integrated development environment, shown in Figure 1. Programmers can invoke the compiler within MULTI or from makefiles.

Figure 1. MULTI Integrated Development Environment (screen capture)

Multiply accumulate instructions
One of the keys to the high EEMBC scores was the compiler’s ability to effectively use the multiply-accumulate (MAC) instruction set extensions on the PowerPC 405 and 440 processors. The 16-bit multiply instructions included in the MAC extensions offer shorter latencies and higher throughputs than the 32-bit multiply instructions of the standard PowerPC architecture. Unfortunately, the C programming language is not always conducive to arithmetic on values smaller than type "int", because many operations promote their operands and results to larger sizes, and programmers often do not select the smallest data types for their variables. The compiler, however, overcame both of these obstacles: it automatically identified and used the smallest possible data sizes for these operations and maximized MAC usage.
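
The promotion problem is easy to see in a small kernel. The following is a minimal sketch, not code from the EEMBC suites; the function name dot16 is hypothetical. Both operands are 16-bit, yet C promotes them to int before multiplying, so the source reads like a 32-bit multiply. Whether a particular compiler maps this loop onto the MAC extensions depends on the compiler and its options, which the article does not list.

```c
#include <stddef.h>
#include <stdint.h>

/* Dot product of two 16-bit sample buffers with a 32-bit accumulator. */
int32_t dot16(const int16_t *a, const int16_t *b, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        /* a[i] and b[i] are promoted to int before the multiply.
           Because both values are known to fit in 16 bits, a compiler
           may be able to prove that a 16 x 16 -> 32 multiply-accumulate
           gives the same result as the promoted 32-bit multiply. */
        acc += a[i] * b[i];
    }
    return acc;
}
```

Declaring the buffers as int16_t rather than int gives the compiler the range information directly; the article's point is that a good compiler can sometimes recover that information even when the programmer chose a wider type.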

Divides and multiplies can often be performed using sequences of shifts, adds and subtracts, which can be faster than using actual divide and multiply instructions. The Green Hills Software compiler chooses an approach based on factors such as:

  • Latency and throughput of the particular divide and multiply instructions
  • Heuristics tuned by a database of performance characteristics from other real applications run on the particular PowerPC processor

Alternatively, the user can override the compiler’s choice by directing it through compiler flags.
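
As an illustration of the shift-and-add approach, the sketch below writes out by hand the kind of rewrite a compiler might apply to multiplication and division by constants. The function names are hypothetical, and the choice of constants is an assumption made for the example. Unsigned types are used so that each pair of functions returns identical results for every input.

```c
#include <stdint.h>

/* Multiply by 10 with a multiply instruction. */
uint32_t mul10(uint32_t x) { return x * 10u; }

/* Same result from two shifts and an add: 10x = 8x + 2x. */
uint32_t mul10_shift_add(uint32_t x) { return (x << 3) + (x << 1); }

/* Multiply by 7 from one shift and a subtract: 7x = 8x - x. */
uint32_t mul7_shift_sub(uint32_t x) { return (x << 3) - x; }

/* Divide by 8 with a divide instruction. */
uint32_t div8(uint32_t x) { return x / 8u; }

/* Same result from one shift, valid because x is unsigned and
   the divisor is a power of two. */
uint32_t div8_shift(uint32_t x) { return x >> 3; }
```

Signed division by a power of two needs an extra correction step for negative values, which is one reason the faster sequence is not always the obvious one.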

Pipeline scheduling
The high scores were also a result of the compiler effectively scheduling instructions for the PowerPC architecture pipelines. The PowerPC 405 uses a single-issue, five-stage pipeline. The PowerPC 440 processor uses superscalar, out-of-order execution with a seven-stage pipeline consisting of three parallel pipelines. In addition, it contains two integer units and one load/store unit. The challenge facing the compiler is to schedule instructions, often involving creative reordering, to make the most productive use of the PowerPC 405 and 440 processors’ execution units.
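
Instruction scheduling happens below the level of C source, but the underlying idea can be sketched at source level. In the hypothetical example below, sum_serial forms one long dependency chain, while sum_two_chains keeps two independent partial sums that a processor with two integer units could, in principle, advance in parallel. This is an analogy under stated assumptions, not a description of what the Green Hills scheduler actually emits.

```c
#include <stddef.h>
#include <stdint.h>

/* One dependency chain: each add waits for the previous one. */
uint32_t sum_serial(const uint32_t *v, size_t n)
{
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++) {
        s += v[i];
    }
    return s;
}

/* Two independent chains: the adds into s0 and s1 do not depend
   on each other, so they can be issued to separate integer units. */
uint32_t sum_two_chains(const uint32_t *v, size_t n)
{
    uint32_t s0 = 0, s1 = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += v[i];
        s1 += v[i + 1];
    }
    if (i < n) {
        s0 += v[i]; /* leftover element when n is odd */
    }
    return s0 + s1;
}
```

Unsigned arithmetic wraps, so reordering the additions cannot change the result; both functions return the same value for any input.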

Register allocation
The compiler’s advanced register allocation logic was also a positive factor. Register allocation tries to keep local variables in registers instead of in memory, because accessing a variable in memory requires additional load and store instructions. Several optimization techniques, including register coalescing, loop optimization analysis, and analysis of constant data and variable lifetimes, minimize the number of accesses to variables in memory.
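
The cost of keeping a value in memory can be sketched as follows. Both functions are hypothetical. In the first, the running total lives behind a pointer, so unless the compiler can prove that the pointer does not alias the input buffer, it may have to load and store the total on every iteration. The second copies the total into a local variable, which the register allocator can keep in a register for the whole loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Total is updated through memory on each iteration. */
uint32_t accumulate_via_memory(uint32_t *total, const uint32_t *v, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        *total += v[i]; /* possible load and store per iteration */
    }
    return *total;
}

/* Total is held in a local, with one load before the loop and
   one store after it. Assumes total does not point into v. */
uint32_t accumulate_via_local(uint32_t *total, const uint32_t *v, size_t n)
{
    uint32_t sum = *total;
    for (size_t i = 0; i < n; i++) {
        sum += v[i];
    }
    *total = sum;
    return sum;
}
```

When the two pointers do not overlap, both versions produce the same total; the analyses named above are what allow a compiler to make this transformation safely on the programmer's behalf.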

Many other advanced compiler optimizations were required to achieve the high scores, including:

  • Intermodule inlining
  • Loop unrolling
  • Constant folding
  • Register coalescing
  • Loop rotation
  • Static address elimination
  • Common subexpression elimination
  • Dead code elimination
  • Constant propagation
  • Strength reduction
  • Loop invariant removal
  • Tail recursion
  • Peephole optimization
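
Three of these optimizations can be shown together in a hand-written before-and-after sketch. The function names are hypothetical, and the second function only approximates what a compiler might produce internally. It applies common subexpression elimination (a + b is computed once), loop invariant removal (the computation moves out of the loop), and constant folding (2 * 8 becomes 16).

```c
#include <stddef.h>

/* As a programmer might write it. */
int sum_as_written(const int *in, size_t n, int a, int b)
{
    int sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += in[i] * (a + b) + (a + b) + 2 * 8;
    }
    return sum;
}

/* After the three optimizations, applied by hand. */
int sum_hand_optimized(const int *in, size_t n, int a, int b)
{
    const int k = a + b; /* computed once, outside the loop */
    int sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += in[i] * k + k + 16; /* 2 * 8 folded to 16 */
    }
    return sum;
}
```

Both functions return the same value for any inputs that do not overflow int; the optimized form simply performs fewer operations per iteration.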

Investment in compiler technology continues at Green Hills Software. The return on that investment is evident. Early tests on Green Hills’ next-generation PowerPC compiler, planned for release later this year, show it beating its predecessor on these same EEMBC benchmarks.

Conclusion
Many attributes of a compiler and hardware architecture must be considered when evaluating software development tools for a high-performance System-on-Chip (SOC) design based on IBM PowerPC technology. The EEMBC benchmarks are one valuable evaluation tool: they enabled IBM engineers to compare compiler performance on real-world code and publish winning, certified performance scores. Green Hills Software’s out-of-the-box compilers contain optimization technology that enables the user to unlock the highest performance and smallest code size for applications based on the IBM PowerPC 4xx family.
