Tetzlaff D. A framework for machine learning based mapping of concurrent applications to parallel architectures

Technical University of Berlin, 2014. — 246 p.

In this thesis, we present a general framework for improving the automatic mapping of concurrent applications to parallel architectures, and we define its instantiation for mapping MPI programs to processor networks. The major challenge in scheduling an application's parallel tasks and allocating them to the processing elements of the target architecture is to analyze the application's expected run-time behavior in advance. Our proposed framework solves this problem by using Machine Learning (ML) techniques to derive precise predictions of this behavior. These predictions serve as fast and accurate heuristics for a cost model that rates the gain of candidate mappings. Cost models built on machine-learned heuristics that capture the run-time behavior of programs can be expected to outperform cost models based on purely static analyses, which must conservatively over-approximate that behavior. As a result, the potential for optimizing program performance increases. To improve the mapping, we define automatic analyses that statically determine the parallel structure of applications. Based on this structure, we introduce ML techniques that derive the information most needed for optimizing the mapping: the execution times of an application's parallel tasks and the communication volume between tasks. These ML techniques need to be trained only once per architecture, in a training phase that is decoupled from the compilation of applications. Hence, compile time does not increase, which preserves an efficient and continuous compilation flow. To rate the gain of alternative mapping schemes, we also define a general cost model that is based on machine-learned knowledge and parametrized by hardware-dependent information. This cost model enables power-efficient and communication-aware mapping of applications to any parallel architecture.
Our general framework is applicable to a wide range of parallel programming models and target architectures. In the second part of this thesis, we show how the general framework can be applied to improve the mapping of MPI programs to processor networks. We have fully implemented the instantiated framework and performed experiments to determine the accuracy of our approach. For these experiments, we used a considerable number of programs from various benchmark suites that cover different real-world application domains. This demonstrates both the general applicability and the high scalability of our framework. The evaluation shows that we are able to predict the run-time behavior under consideration more precisely than other heuristic approaches.
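
To make the idea of a cost model that rates alternative mappings more concrete, here is a minimal Python sketch. It is not taken from the thesis: the function names (mapping_cost, best_mapping), the cost formula (load of the busiest processor plus weighted communication volume times hop distance), and all numbers are assumptions chosen for illustration. The per-task execution times and the communication volumes stand in for the values that trained ML models would predict, and the distance matrix stands in for the hardware-dependent parameters.

```python
from itertools import product


def mapping_cost(mapping, exec_time, comm_volume, distance, comm_weight=1.0):
    """Rate one mapping (task -> processor); lower is better.

    Hypothetical cost formula: load of the busiest processor plus
    weighted communication (predicted volume times hop distance).
    """
    # Tasks placed on the same processor are assumed to run one after another.
    load = {}
    for task, proc in mapping.items():
        load[proc] = load.get(proc, 0.0) + exec_time[task]
    makespan = max(load.values())

    # Communication between tasks on the same processor costs nothing
    # because distance[p][p] is 0.
    comm = sum(
        volume * distance[mapping[src]][mapping[dst]]
        for (src, dst), volume in comm_volume.items()
    )
    return makespan + comm_weight * comm


def best_mapping(tasks, procs, exec_time, comm_volume, distance, comm_weight=1.0):
    """Exhaustively rate every mapping; only feasible for tiny examples."""
    best = None
    for assignment in product(procs, repeat=len(tasks)):
        mapping = dict(zip(tasks, assignment))
        cost = mapping_cost(mapping, exec_time, comm_volume, distance, comm_weight)
        if best is None or cost < best[0]:
            best = (cost, mapping)
    return best


# Illustrative inputs (assumed values, standing in for ML predictions).
tasks = ["A", "B", "C"]
procs = [0, 1]
exec_time = {"A": 4.0, "B": 3.0, "C": 2.0}        # predicted execution times
comm_volume = {("A", "B"): 6.0, ("B", "C"): 1.0}  # predicted communication volumes
distance = [[0, 1], [1, 0]]                       # hop distance between processors

cost, mapping = best_mapping(tasks, procs, exec_time, comm_volume, distance, 0.5)
print(cost, mapping)
```

In this toy setting the search places the heavily communicating tasks A and B on the same processor and moves C to the other one. A real instantiation would replace the exhaustive search with a scalable optimization strategy and would also account for power consumption, which this sketch leaves out.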