Performance Optimization for Future Hardware: FPGA
The FPGA branch of this project aims to investigate hardware structures that optimally fit the
needs of scientific codes.
FPGAs are an interesting candidate for high-performance computing because the user can freely
configure the device's computational resources and their interconnection. Even manufacturers of supercomputers
and general-purpose CPUs are about to introduce FPGAs or reconfigurable parts into their products.
Consequently, many contributions on the use of FPGAs for high-performance computing have been published recently.
Nearly without exception, they conclude that FPGAs are not well suited for this community.
The reasons given are manifold, but most of them stem from too superficial an examination. In short,
FPGAs enable the creation of a perfectly fitting architecture, but the costs (in terms of both effort
and money) are judged too high. However, since even large general-purpose CPU manufacturers are considering
partially reconfigurable designs, an investigation of the principles will be worthwhile. The most common objections are:
- "FPGAs are of no use for floating-point operations"
The common opinion is that only integer or at most fixed-point arithmetic is really efficient on FPGAs.
However, FPGA technology has also advanced rapidly in recent years. There are devices with thousands of
basic floating-point multiplier units that can be combined to build even double-precision operators. The manufacturers
often offer libraries that make their use easy while remaining efficient at 75% of the device's clock speed.
Moreover, few scientists have examined how much accuracy their methods actually need. Mostly
they know that double precision is sufficiently accurate, and since it is not expensive
on a general-purpose CPU, they use double-precision floating-point operations throughout. When developing a special
hardware device for a specific method (i.e. using an FPGA), one is free to choose the accuracy of
each operation needed. By investing effort in an accuracy study of the method, one can optimize every single
operator for the trade-off between width (i.e. space) and speed.
- "FPGAs cannot evade the memory bottleneck"
Most scientists who try to port their application to an FPGA buy a standard evaluation board from the FPGA
manufacturer and hope to execute one iteration of their problem per cycle. But then they realize that the
data cannot be transferred to and from memory fast enough. What they forget is that they would have the
ability to evade the bottleneck: once again they have taken a preconfigured architecture and tried to adapt
their problem to that structure.
Completely free from such constraints, a specific architecture could be assembled for every scientific
problem. This requires not only the integrated circuit implemented in the FPGA, but also
an elaborate board design that exactly fits the needs of the algorithm. For example, to deal with the memory
bottleneck, several memory controllers or a memory bus hundreds of bits wide could be used.
Of course, designing an individual board demands even more electrical-engineering knowledge, so
the evaluation of FPGAs as an alternative architecture would take several years.
- "Programming FPGAs isn't like programming C++"
Since FPGAs emulate hardware circuits, the configuration is normally compiled from a description written
in a hardware description language (HDL) at the register transfer level (RTL). Although learning a
new language is no obstacle for programmers, HDLs are very different because they describe not only
processes (like a normal programming language) but also structure. Most scientists therefore conclude
that the average scientist cannot be expected to learn to write HDLs.
However, there are tools that try to smooth this path by translating code written in a programming language
like C++ into a hardware description. All current tools have drawbacks and limitations; however, with
enough effort and scientific motivation, better tools could be developed.
Therefore, this part of the project will address the limitations and possibilities of reconfigurable hardware in
general. Small studies on different evaluation boards will be used to derive principles that allow conclusions
about whether a scientific method would benefit from reconfigurable parts in a computer (or CPU), and about how
current architectures must be improved to better suit modern scientific problems.