cigpu 2009 large logo

Papers to be presented
at
Computational Intelligence on Consumer Games and Graphics Hardware CIGPU-2009.

Evolving Soft Robotic Locomotion in PhysX

John Rieffel

Given the complexity of the problem, genetic algorithms are one of the more promising methods of discovering control schemes for soft robotics. Since physically embodied evolution is time consuming and expensive, an outstanding challenge lies in developing fast and suitably realistic simulations in which to evolve soft robot gaits. We describe two parallel methods of using NVidia's PhysX, a hardware-accelerated (GPGPU) physics engine, in order to evolve and optimise soft bodied gaits. The first method involves the evolution of open-loop gaits using a reduced-order lumped parameter model. The second method involves harnessing PhysX's soft-bodied material simulation capabilities. In each case we discuss the the challenges and possibilities involved in using the PhysX for evolutionary soft robotics.
more

A Fast High Quality Pseudo Random Number Generator for nVidia CUDA

W. B. Langdon

Previously either due to hardware GPU limits or older versions of software, careful implementation of PRNGs was required to make good use of the limited numerical precision available on graphics cards. Newer nVidia G80 and Tesla hardware support double precision. This is available to high level programmers via CUDA. This allows a much simpler C++ implementation Park-Miller random numbers, which provides a four fold speed up compared to an earlier GPU implementation. Code will be available via http://www.cs.ucl.ac.uk/staff/W.Langdon/ftp/gp-code/random-numbers/cuda_park-miller.tar.gz
PDF
more

Evaluating the Cell Broadband Engine as a Platform to Run Estimation of Distribution Algorithms

Carlos Perez-Miguel

Current consumer-grade computers and game devices incorporate very powerful processors that can be used to accelerate many classes of scientific codes. However, programming multi-core chips, hybrid multi-processors or graphical processing units is not an easy task for those programmers that deal mainly with sequential codes. In this paper, we explore the ability of the Cell Broadband Engine to run a particular Estimation of Distribution Algorithm. From an initial sequential version, we develop a multi-threaded one that is afterwards reworked to run on a Cell. The multi-threaded version is capable of efficiently use current multi-core chips, such as those used in desktop PCs. However, the efficiency of the Cell version is very low. We analyse the causes of these discouraging results, and provide some clues about the class of problems that could be efficiently ported to the Cell.
more

Deployment of CPU and GPU-based Genetic Programming on Heterogenous Devices

Garnett Wilson

A widely available and economic means of increasing the computing power applied to a problem is to use modern graphics processing units (GPUs) for parallel processing. We present a new, optimised general methodology for deploying genetic programming (GP) to the PC, Xbox 360 video game console, and Zune portable media device. This work describes, for the first time, the implementation considerations necessary to maximise available CPU and GPU (where available) usage on the three separate hardware platforms. We demonstrate the first instance of GP using portable digital media device hardware. The work also presents, for the first time, an Xbox 360 implementation that uses the GPU for fitness evaluation. Implementations on each platform are also benchmarked on the basis of execution time for an established GP regression benchmark.
more

Parallel Multi-Objective Evolutionary Algorithms on Graphics Processing Units

Man Leung Wong

Most real-life optimisation problems or decision-making problems are multi-objective in nature, since they normally have several (possibly conflicting) objectives that must be satisfied at the same time. Multi-Objective Evolutionary Algorithms (MOEAs) have been gaining increasing attention among researchers and practitioners. However, they may execute for a long time for some difficult problems, because several evaluations must be performed. Moreover, the non-dominance checking and the non-dominated selection procedures are also very time consuming. From our experiments, more than 99% of the execution time is used in performing the two procedures. A promising approach to overcome this limitation is to parallelise these algorithms. In this paper, we propose a parallel MOEA on consumer-level Graphics Processing Units (GPU). We perform many experiments on two-objective and three-objective benchmark problems to compare our parallel MOEA with a sequential MOEA and demonstrate that the former is much more efficient than the latter.
more

Solving Quadratic Assignment Problems by Genetic Algorithms with GPU Computation: A Case Study

Shigeyoshi Tsutsui and Noriyuki Fujimoto

This paper describes designing a parallel GA with GPU computation to solve the quadratic assignment problem (QAP) which is one of the hardest optimisation problems in permutation domains. For the parallel method, a multiple-population, coarse-grained GA model was used. Each subpopulation is evolved by a multiprocessor in a GPU (NVIDIA GeForce GTX285). At predetermined intervals of generations all individuals in subpopulations are shuffled via the VRAM of the GPU. The instances on which this algorithm was tested were taken from the QAPLIB benchmark library. Results were promising, showing a speedup ration from 3 to 12 times, compared to the Intel i7 965 processor.
PDF
more

Design and Implementation of Real-time Parallel GA Operators on the IBM Cell Processor

Pascal Comte

We present a set of single-core designed parallel SIMD Genetic Algorithm (GA) operators aimed at increasing computational speed of genetic algorithms. We use a discrete-valued chromosome representation. The explored operators include: single gene mutation, uniform crossover and a fitness evaluation function. We discuss their low-level hardware implementations on the Cell Processor. We use the Knapsack problem as a proof of concept, demonstrating performances of our operators. We measure the scalability in terms of generations per second. Using the architecture of the Cell Processor and a static population size of 648 individuals, we achieved 11.6 million generations per second on one Synergetic Processing Element (SPE) core for a problem size n = 8 and 9.5 million generations per second for a problem size n = 16. Generality for problem of size n multiple of 8 is shown. Executing 6 independent concurrent GA runs, one per SPE core, allows for a rough overall estimate of 70 million generations per second and 57 million generations per second for problem sizes of 8 and 16 respectively.

Parallel Latent Semantic Analyses using a Graphics Processing Unit

Xiaohui Cui

Latent Semantic Analysis (LSA) can be used to reduce the dimensions of large Term-Document datasets using Singular Value Decomposition. However, with the ever expanding size of data sets, current implementations are not fast enough to quickly and easily compute the results on a standard PC. The Graphics Processing Unit (GPU) can solve some highly parallel problems much faster than the traditional sequential processor (CPU). Thus, a deployable system using a GPU to speedup large-scale LSA processes would be a much more effective choice (in terms of cost/performance ratio) than using a computer cluster. In this paper, we presented a parallel LSA implementation on the GPU, using NVIDIA Compute Unified Device Architecture (CUDA) and Compute Unified Basic Linear Algebra Subprograms (CUBLAS). The performance of this implementation is compared to traditional LSA implementation on CPU using an optimised Basic Linear Algebra Subprograms library. For large matrices that have dimensions divisible by 16, the GPU algorithm ran five to six times faster than the CPU version.
more

Design and Implementation of Parallel Linear GP for the IBM Cell Processor

Pascal Comte

We present two different single-core parallel SIMD linear genetic programming (LGP) systems aimed for the IBM Cell Processor on the Playstation3. Our algorithms harness their computational power from the SIMD capabilities of the Cell Processor. We implement two evolutionary algorithms and look at the classical problem of symbolic regression of functions. The first LGP generates a single offspring and selection from the population occurs randomly. The second algorithm generates two offspring and selection from the population is performed using k-tournament with k = 2. Mutation occurs at macro and micro levels. Both SIMD instructions and register operands are subject to mutation. We use a static population of 648 individuals and experiments are constrained to 300 seconds of computational time. Our results indicate that both EAs perform equally well though the first algorithm is faster and outperforms the 2nd algorithm in some cases. We speculate that the speed at which generations are iterated through is significantly greater than that of a typical tree-based GP and scalar linear GP.


back

W.B.Langdon 4 May 2009 (last update 4 July 2014)