Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Towards extending the SWITCH platform for time-critical, cloud-based CUDA applications: Job scheduling parameters influencing performance

Knight, Louise, Stefanic, Polona, Cigale, Matej, Jones, Andrew and Taylor, Ian 2019. Towards extending the SWITCH platform for time-critical, cloud-based CUDA applications: Job scheduling parameters influencing performance. Future Generation Computer Systems 100, pp. 542-556. 10.1016/j.future.2019.05.039
Item availability restricted.

PDF - Accepted Post-Print Version
Restricted to Repository staff only until 22 May 2020 due to copyright restrictions.

Abstract

SWITCH (Software Workbench for Interactive, Time Critical and Highly self-adaptive cloud applications) allows for the development and deployment of real-time applications in the cloud, but it does not yet support instances backed by Graphics Processing Units (GPUs). Wanting to explore how SWITCH might support CUDA (a GPU architecture) in the future, we have undertaken a review of time-critical CUDA applications, discovering that run-time requirements (which we call ‘wall time’) are in many cases regarded as the most important. We have performed experiments to investigate which parameters have the greatest impact on wall time when running multiple Amazon Web Services GPU-backed instances. Although a maximum of 8 single-GPU instances can be launched in a single Amazon Region, launching just 2 instances rather than 1 gives a 42% decrease in wall time. Also, instances are often wasted doing nothing, and there is a moderately-strong relationship between how problems are distributed across instances and wall time. These findings can be used to enhance the SWITCH provision for specifying Non-Functional Requirements (NFRs); in the future, GPU-backed instances could be supported. These findings can also be used more generally, to optimise the balance between the computational resources needed and the resulting wall time to obtain results.

Item Type: Article
Date Type: Publication
Status: Published
Schools: Computer Science & Informatics
Subjects: Q Science > QA Mathematics > QA76 Computer software
Publisher: Elsevier
ISSN: 0167-739X
Funders: European Union
Date of First Compliant Deposit: 17 June 2019
Date of Acceptance: 15 May 2019
Last Modified: 18 Oct 2019 18:29
URI: http://orca-mwe.cf.ac.uk/id/eprint/123498
