CUDA optimisation of Technical Analysis parameters

Start date October 2012

Client Institute of Technology, Tallaght
Investigator

Dr David Nugent
david.nugent@elucidare.co.uk


Abstract

The optimisation of Technical Trading parameters is a computationally intensive exercise. Models comprising even a modest number of Technical Indicators require many thousands of simulations to be executed over a sample period of data, with the best-performing sets of parameters then employed to generate future trading signals. Researchers at the Institute of Technology Tallaght have developed GPU Computing methodologies for parallelised simulations, and a working Prototype optimiser based on the CUDA architecture. The Prototype achieved a substantial speedup over both serial and parallel CPU implementations, and a number of key design strategies are proposed.


Simulation results

Instrument data (Bund future prices) were captured for a sample 20-year period using a Bloomberg terminal. The optimiser was tested on two NVIDIA GPUs alongside sequential and parallel versions running on the CPU. In the CUDA implementation, the optimisation algorithm is implemented as both a host and a device module, thereby ensuring that the same instructions are executed on the CPU as on the GPU. The serial CPU version can thus be considered a suitable baseline against which to compare parallel performance.
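One common way to keep host and device code identical is to mark the shared routine as both a host and a device function when it is built with the CUDA compiler. The following is a minimal sketch of that idiom, not the Prototype's actual code: the macro name HD, the helpers sma and crossover_pnl, and the simple long/short trading rule are all assumptions made for illustration. It assumes 1 <= fast < slow <= n.

```cpp
#include <cstddef>

// nvcc defines __CUDACC__, so the functions below are compiled for both CPU and
// GPU; with a plain C++ compiler the qualifier expands to nothing.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical helper: simple moving average of the `window` prices ending at index t.
HD inline double sma(const double* prices, std::size_t t, std::size_t window) {
    double sum = 0.0;
    for (std::size_t i = t + 1 - window; i <= t; ++i) sum += prices[i];
    return sum / static_cast<double>(window);
}

// Hypothetical backtest of one (fast, slow) parameter pair (assumed rule):
// hold a long position over the next bar when the fast average is above the
// slow average, otherwise hold a short position. Returns cumulative profit in
// price points. Assumes 1 <= fast < slow <= n.
HD inline double crossover_pnl(const double* prices, std::size_t n,
                               std::size_t fast, std::size_t slow) {
    double pnl = 0.0;
    for (std::size_t t = slow - 1; t + 1 < n; ++t) {
        const double position =
            sma(prices, t, fast) > sma(prices, t, slow) ? 1.0 : -1.0;
        pnl += position * (prices[t + 1] - prices[t]);
    }
    return pnl;
}
```

Because the same function body is used on both sides, a serial CPU loop over parameter pairs and a GPU kernel that assigns one pair to each thread would evaluate exactly the same arithmetic, which is what makes the serial run a fair baseline.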

A parallel version of the Prototype optimiser was also developed using OpenMP to allow comparison between CUDA and native CPU parallel processing. OpenMP is an API, supported by C/C++ compilers, which enables the development of shared-memory parallel applications. Sections of code (typically loops) can be marked for parallel processing, and their iterations are then distributed across multiple CPU cores.
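As an illustration of marking a loop for parallel processing, the sketch below distributes a brute-force parameter search across CPU cores with a single OpenMP pragma. It is a hedged example rather than the Prototype's implementation: the Candidate struct, the backtest and optimise functions, the flat parameter grid and the long/short crossover rule are assumptions introduced here.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical record of one evaluated parameter combination.
struct Candidate {
    int fast;
    int slow;
    double pnl;
};

// Hypothetical scoring function (assumed rule): long over the next bar when the
// fast moving average is above the slow one, short otherwise.
double backtest(const std::vector<double>& prices, int fast, int slow) {
    const int n = static_cast<int>(prices.size());
    double pnl = 0.0;
    for (int t = slow - 1; t + 1 < n; ++t) {
        double fast_sum = 0.0, slow_sum = 0.0;
        for (int i = t - fast + 1; i <= t; ++i) fast_sum += prices[i];
        for (int i = t - slow + 1; i <= t; ++i) slow_sum += prices[i];
        const double position = fast_sum / fast > slow_sum / slow ? 1.0 : -1.0;
        pnl += position * (prices[t + 1] - prices[t]);
    }
    return pnl;
}

// Evaluate every (fast, slow) pair with 1 <= fast < slow <= max_window.
Candidate optimise(const std::vector<double>& prices, int max_window) {
    const int n = static_cast<int>(prices.size());
    const int combos = max_window * max_window;
    const double worst = -std::numeric_limits<double>::infinity();
    std::vector<double> pnl(combos, worst);

    // Each iteration writes only its own slot, so iterations are independent
    // and OpenMP may hand them to different cores. Without OpenMP enabled the
    // pragma is ignored and the loop simply runs serially.
    #pragma omp parallel for schedule(dynamic)
    for (int k = 0; k < combos; ++k) {
        const int fast = k / max_window + 1;
        const int slow = k % max_window + 1;
        if (fast < slow && slow <= n) pnl[k] = backtest(prices, fast, slow);
    }

    // Serial pass to pick the best result keeps the outcome deterministic.
    Candidate best{0, 0, worst};
    for (int k = 0; k < combos; ++k) {
        if (pnl[k] > best.pnl) {
            best = Candidate{k / max_window + 1, k % max_window + 1, pnl[k]};
        }
    }
    return best;
}
```

With GCC or Clang the pragma takes effect when the code is compiled with -fopenmp. The dynamic schedule is chosen here because larger windows cost more per backtest, so a static split of the grid would leave some cores idle.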

Table 1 shows the results of backtesting the optimal implementation of a Moving Average Crossover combined with a Volatility Ratio indicator. This test evaluates almost 12.3 million permutations.

Despite the lower efficiency per CUDA core, the GPUs significantly outperform the OpenMP implementation. Wall-clock times range from 2.5 hours on a single CPU and 22 minutes using OpenMP to 5.5 minutes on the GT-430 GPU and just 48 seconds on the GTX-480 GPU. Relative to the single-CPU run, these times correspond to speedups of roughly 7, 27 and 190 times respectively.

Table 1: Performance analytics for the optimal implementation of Moving Average Crossover and Volatility Ratio, with 12.3 million solutions backtested against twenty years of data for a single underlying instrument (BUND).


Documents available for download

 

CUDA-enabled Optimisation of Technical Analysis Parameters, John O’Rourke (Allied Irish Banks), Dr. John Burns (School of Science and Computing, Institute of Technology, Tallaght)