Heterogeneous computer systems are ubiquitous in all areas of computing, from mobile
to high-performance computing. They promise to deliver increased performance
at lower energy cost than purely homogeneous, CPU-based systems. In recent years,
GPU-based heterogeneous systems have become increasingly popular. They combine
a programmable GPU with a multi-core CPU. GPUs have become flexible enough
to not only handle graphics workloads but also various kinds of general-purpose
algorithms. They are thus used as a coprocessor or accelerator alongside the CPU.
Developing applications for GPU-based heterogeneous systems involves several
challenges. Firstly, not all algorithms are equally suited for GPU computing. It is thus
important to carefully map the tasks of an application to the most suitable processor
in a system. Secondly, current frameworks for heterogeneous computing, such as
OpenCL, are low-level, requiring a thorough understanding of the hardware by the
programmer. This high barrier to entry could be lowered by automatically generating
and tuning such code from a high-level, and thus more user-friendly, programming
language. Both challenges are addressed in this thesis.
For the task-mapping problem, this thesis presents a machine-learning-based approach.
It combines static features of the program code with runtime information
on input sizes to predict the optimal mapping of OpenCL kernels. This approach is
further extended to also take contention on the GPU into account. Both methods are
able to outperform competing mapping approaches by a significant margin.
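The idea of combining static code features with a runtime input size can be illustrated with a small classifier. The sketch below is hypothetical and is not the thesis's model: the feature names (`compute_ops`, `memory_ops`, `input_bytes`), the derived "arithmetic intensity" feature, the 1-nearest-neighbour classifier, and the training points are all assumptions chosen only to show the shape of such a predictor. A contention-aware variant could, under the same assumptions, append a measure of current GPU load to the feature vector.

```python
import math
from dataclasses import dataclass


@dataclass
class KernelFeatures:
    """Hypothetical feature set describing one OpenCL kernel launch."""
    compute_ops: int   # static: arithmetic operations counted in the kernel source
    memory_ops: int    # static: global-memory accesses counted in the kernel source
    input_bytes: int   # runtime: size of the input passed to the kernel

    def vector(self) -> list[float]:
        # Merge static and runtime information into one numeric feature vector.
        intensity = self.compute_ops / max(self.memory_ops, 1)
        return [intensity, math.log2(max(self.input_bytes, 1))]


class NearestNeighbourMapper:
    """Toy 1-nearest-neighbour classifier that predicts 'CPU' or 'GPU'."""

    def __init__(self) -> None:
        self.samples: list[tuple[list[float], str]] = []

    def train(self, kernel: KernelFeatures, best_device: str) -> None:
        # Store a (feature vector, fastest device) pair from offline profiling.
        self.samples.append((kernel.vector(), best_device))

    def predict(self, kernel: KernelFeatures) -> str:
        # Return the label of the closest training sample (Euclidean distance).
        x = kernel.vector()
        _, label = min(self.samples, key=lambda s: math.dist(s[0], x))
        return label


if __name__ == "__main__":
    mapper = NearestNeighbourMapper()
    # Made-up training points, for illustration only.
    mapper.train(KernelFeatures(compute_ops=4, memory_ops=8, input_bytes=1 << 12), "CPU")
    mapper.train(KernelFeatures(compute_ops=64, memory_ops=8, input_bytes=1 << 26), "GPU")
    print(mapper.predict(KernelFeatures(compute_ops=48, memory_ops=8, input_bytes=1 << 24)))  # prints "GPU"
```

Because the input size is only known when the kernel is launched, the prediction is made at that point rather than at compile time; a real system would train on many profiled kernels and would likely use a stronger model than nearest neighbour.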
Furthermore, this thesis develops a method for targeting GPU-based heterogeneous
systems from OpenMP, a directive-based framework for parallel computing.
OpenMP programs are translated to OpenCL and optimized for GPU performance.
At runtime, a predictive model decides whether to execute the original OpenMP code
on the CPU or the generated OpenCL code on the GPU. This approach is shown to
outperform both a competing approach and hand-tuned code.
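The runtime choice between the original CPU code and the generated GPU code amounts to a dispatcher that consults a predictor on every call. The following is a minimal sketch under stated assumptions, not the thesis's implementation: `cpu_version` and `gpu_version` are hypothetical stand-ins (plain Python functions here, where a real system would run the OpenMP loop or enqueue an OpenCL kernel), and `LinearPredictor` with its weight and bias is an invented model over the logarithm of the input size.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence

Impl = Callable[[Sequence[float]], list[float]]


def cpu_version(data: Sequence[float]) -> list[float]:
    # Hypothetical stand-in for the original OpenMP code running on the CPU.
    return [x * 2.0 for x in data]


def gpu_version(data: Sequence[float]) -> list[float]:
    # Hypothetical stand-in for the generated OpenCL code running on the GPU;
    # it must compute the same result as the CPU version.
    return [x * 2.0 for x in data]


@dataclass
class LinearPredictor:
    """Invented linear model: prefer the GPU when weight*log2(n) + bias > 0."""
    weight: float
    bias: float

    def __call__(self, n: int) -> bool:
        return self.weight * math.log2(max(n, 1)) + self.bias > 0


class RuntimeDispatcher:
    """Chooses the CPU or GPU implementation each time it is invoked."""

    def __init__(self, cpu_impl: Impl, gpu_impl: Impl, prefer_gpu: Callable[[int], bool]) -> None:
        self.cpu_impl = cpu_impl
        self.gpu_impl = gpu_impl
        self.prefer_gpu = prefer_gpu
        self.last_choice = ""

    def __call__(self, data: Sequence[float]) -> list[float]:
        # Decide using information available only at runtime (here: input length).
        use_gpu = self.prefer_gpu(len(data))
        self.last_choice = "GPU" if use_gpu else "CPU"
        return (self.gpu_impl if use_gpu else self.cpu_impl)(data)


if __name__ == "__main__":
    run = RuntimeDispatcher(cpu_version, gpu_version, LinearPredictor(weight=1.0, bias=-16.0))
    run([1.0, 2.0])
    print(run.last_choice)  # prints "CPU"
    run([1.0] * (1 << 17))
    print(run.last_choice)  # prints "GPU"
```

Keeping both versions behind one callable means the calling program does not change; only the predictor decides, per invocation, which device is expected to be faster.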