Slides presented at NCAF 2010 20 June

GP evolves parallel graphics card kernel for gzip

W. B. Langdon

CREST lab,
Department of Computer Science

      

Introduction

Genetic Programming

Free PDF        Tree (A-10)*B Tree (A-10)*B

GP Generational Cycle

Creating new programs - Crossover

Some applications of Genetic Programming

GP to write source code

GP Automatic Bug Fixing

GP Automatic Coding

GP Automatic Coding

Proof of Concept: gzip

CUDA 2.3 Template

scan_naive_kernel.cu
WBL 30 Dec 2009 Revision: 1.11 Remove comments, blank lines. int g_odata, uch g_idata. Add strstart1 strstart2, const.
 move offset and n, rename n as num_elements
WBL 14 r1.11 Remove crosstalk between threads threadIdx.x, temp -> g_idata[strstart1/strstart2]
__device__ void scan_naive(int *g_odata, const uch *g_idata, const int strstart1,
const int strstart2)
{
    int thid = 0; //threadIdx.x;
    int pout = 0;
    int pin = 1;
    int offset = 0;
    int num_elements = 258;
    <3var> = (thid > 0) ? g_idata[thid-1] : 0;
    for (offset = 1; offset < num_elements; offset *= 2)
    {
        pout = 1 - pout;
        pin  = 1 - pout;
        <3var> =  g_idata[strstart+pin*num_elements+thid];
        if (thid >= offset)
        <3var> += g_idata[strstart+pin*num_elements+thid - offset];
    }
    return <3var> ;
}

BNF grammar

scan_naive_kernel.cu converted into grammar (169 rules) which generalises code.
<line10-18>	::=	"" |<line10-18a>
<line10-18a>	::=	<line10e> <line11> <forbody> <line18>
<line11>	::=	"{\n" "if(!ok()) break;\n"
<line18>	::=	"}\n"
<line10e>	::=	<line10> |<line10e1>
<line10e1>	::=	"for (offset =" <line10.1> ";" <line10e.2> ";offset" <line10.4> ")\n"
<line10.1>	::=	<line10.1.1> |<intexpr>
<line10.1.1>	::=	"1" |<intconst>
<line10e.2>	::=	<line10e.2.1> |<forcompexpr>
<line10e.2.1>	::=	"offset" <line10.2> <line10.3>
<line10.2>	::=	"<" |<compare>
<line10.3>	::=	<line10.3.1> |<intexpr>
<line10.3.1>	::=	"num_elements" |<intconst>
<line10.4>	::=	"*= 2" |<intmod>
<intmod>	::=	"++" |<intmod2>
<intmod2>	::=	"*=" <intconst>
Fragment of 4 page grammar

gzip

gzip longest_match()

Fitness

Number of Strings to Check

gzip hash means mostly longest_match() has few strings to check.
Training data more evenly spread.

Length of Strings to Check

1% 0 bytes
0% 1 bytes
0 2 bytes
30% 3 bytes
26% 4 bytes
25% 5 bytes
14% 6 bytes

gzip heuristics limit search ≤ 258

Fitness 2

Performance of Evolved Code

Fall in number of poor programs

71% useless constants in generation 0
7% constants in last generation

nVidia GeForce 9600 GT


64 Stream Processors
Clock 1.5 GHz
512 MByte

7.8×4⅜ inches

Available
4 Tesla up to 16GBytes
Fermi 64 bit
512 processors
3 billion transistors

Laptops too
108 CUDA computers

Evolution of program complexity

Evolution of gzip Kernel

Evolution of tree size and depth CUDA gzip matches kernel using BNF grammar based STGP. 
Grammar parse trees displayed as circular lattices.
1MB 101 frames

Strongly typed grammar based GP behaving like conventional tree GP

Evolved gzip matches kernel

Parse tree of solution evolved in gen 55.
Ovals are binary decision rules. Red 2nd alternative used.

Evolved gzip matches kernel

__device__ int kernel978(const uch *g_idata, const int strstart1, const int strstart2)
{
int thid = 0;
int pout = 0;
int pin = 0;
int offset = 0;
int num_elements = 258;
for (offset = 1 ; G_idata( strstart1+ pin ) == G_idata( strstart2+ pin ) ;offset ++ )
{
if(!ok()) break;
thid = G_idata( strstart2+ thid ) ;
pin = offset ;
}
return pin;
}
Blue -fixed by template.
Black -default
Red -evolved
Grey - evolved but no impact.

Conclusions

Langdon+Harman WCCI 2010