dc.contributor.author: Guo, J.
dc.contributor.author: Thiyagalingam, J.
dc.contributor.author: Scholz, S.
dc.identifier.citation: Guo, J, Thiyagalingam, J & Scholz, S 2011, Breaking the GPU programming barrier with the auto-parallelising SAC compiler. in Procs of the 6th ACM Workshop on Declarative Aspects of Multicore Programming, DAMP'11, 83988. ACM Press, pp. 15-23.
dc.identifier.other: PURE: 101626
dc.identifier.other: PURE UUID: ae9d2313-4982-4748-b668-1b6c1940ec50
dc.identifier.other: dspace: 2299/5722
dc.identifier.other: Scopus: 79952162843
dc.description: Original article can be found at : Copyright ACM [Full text of this article is not available in the UHRA]
dc.description.abstract: Over recent years, the use of Graphics Processing Units (GPUs) for general-purpose computing has become increasingly popular. The main reasons for this development are the attractive performance/price and performance/power ratios of these architectures. However, substantial performance gains from GPUs come at a price: they require extensive programming expertise and, typically, a substantial re-coding effort. Although the programming experience has been significantly improved by existing frameworks like CUDA and OpenCL, it is still a challenge to effectively utilise these devices. Directive-based approaches such as hiCUDA or OPENMP-variants offer further improvements but have not eliminated the need for the expertise on these complex architectures. Similarly, special purpose programming languages such as Microsoft's Accelerator try to lower the barrier further. They provide the programmer with a special form of GPU data structures and operations on them which are then compiled into GPU code. In this paper, we take this trend towards a completely implicit, high-level approach yet another step further. We generate CUDA code from a MATLAB-like high-level functional array programming language, Single Assignment C (SAC). To do so, we identify which data structures and operations can be successfully mapped on GPUs and transform existing programs accordingly. This paper presents the first runtime results from our GPU backend and it presents the basic set of GPU-specific program optimisations that turned out to be essential. Despite our high-level program specifications, we show that for a number of benchmarks speedups between a factor of 5 and 50 can be achieved through our parallelising compiler. [en]
dc.publisher: ACM Press
dc.relation.ispartof: Procs of the 6th ACM Workshop on Declarative Aspects of Multicore Programming, DAMP'11, 83988
dc.title: Breaking the GPU programming barrier with the auto-parallelising SAC compiler [en]
dc.contributor.institution: School of Computer Science
dc.contributor.institution: Science & Technology Research Institute

Files in this item


There are no files associated with this item.
