#WBL 25 May 2012 $Revision: 1.1 $

This tar file contains data for 251 women with breast cancer operated
upon in Uppsala.

x.dat contains a summary for each woman. The first column is the
anonymised patient name. This name is also the name of the file holding
the HG-U133A and HG-U133B Affymetrix GeneChip probe values for that
patient.

The 2*712*712 GeneChip values were taken (in Feb 2007) from NCBI's GEO
dataset GSE3494 and have been quantile normalised (natural log). They
are presented as continuous numbers with zero mean, one per line. The
first line starts with a hash '#' and is a comment, which should be
ignored.

Data preparation is described in:
GP on SPMD parallel Graphics Hardware for mega Bioinformatics Data
Mining, W.B. Langdon and A.P. Harrison, in Soft Computing, October 2008,
12(12) 1169-1183. doi:10.1007/s00500-008-0296-x

train2nd lists the patients used during the training phase of the
genetic programming runs described above.

This is the large gene expression data set referred to by
http://www.cs.ucl.ac.uk/staff/W.Langdon/ftp/gp-code/gpu_gp_2.tar.gz
and
http://www.cs.ucl.ac.uk/staff/W.Langdon/ftp/gp-code/gpu_gp_cuda.tar.gz

To reduce download time the larger text files have been compressed
using the Unix gzip utility. These files have names ending in .gz

File id.txt lists the GEO sample id (GSM) for each patient. There are
two lines for each anonymised name: one for the HG-U133A measurements
and one for the HG-U133B. This information is not needed unless you
wish to refer back to GEO or the clinical procedures. Similarly it is
only necessary to use the Affymetrix metadata if you wish to refer back
to the data sources.

The two dimensional (X,Y) layout of the HG-U133A and HG-U133B
Affymetrix GeneChips is given by the two Affymetrix files HG-U133A.cdf
and HG-U133B.cdf. These are the metadata files for the Affymetrix
"Human Genome U133 Set" and can be downloaded from the Affymetrix
NetAffx Analysis Center
http://www.affymetrix.com/Auth/support/downloads/library_files/hgu133_libraryfile.zip

gp2HG-U133.txt lists the identifiers used by gpu_gp_cuda along with
their X,Y coordinates on the A or B chip (A first) and the Affymetrix
probe name (if any). Notice identifiers start at A1001 and only 994934
probes are named.

However you are advised to use the BioConductor R-statistical package
www.bioconductor.org rather than use the .cdf files directly.
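
The per-patient file format described above (a leading '#' comment,
then one number per line, optionally gzip compressed) can be read with a
few lines of Python. This is only a minimal sketch, not part of the
original distribution: the function names parse_values and read_patient
are hypothetical, and the code skips every line starting with '#' and
every blank line, which is slightly more lenient than the format
requires.

```python
import gzip
from pathlib import Path

CHIP_SIDE = 712                                # each chip is a 712 x 712 grid
EXPECTED_VALUES = 2 * CHIP_SIDE * CHIP_SIDE    # HG-U133A plus HG-U133B


def parse_values(lines):
    """Return the floats found in an iterable of text lines.

    Lines starting with '#' (the comment header) and blank lines are
    skipped; every other line is expected to hold exactly one number.
    """
    values = []
    for line in lines:
        text = line.strip()
        if not text or text.startswith("#"):
            continue
        values.append(float(text))
    return values


def read_patient(path):
    """Read one patient file, using gzip if the name ends in .gz."""
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        return parse_values(handle)
```

A complete patient file should yield EXPECTED_VALUES numbers, so
comparing len(read_patient(name)) against EXPECTED_VALUES is a cheap
check that a download was not truncated.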