Problems Running bwa

Supplying more than one file with bwa index does not work.

E.g.:

bwa index hs_ref_GRCh37.p5_chr1.fa hs_ref_GRCh37.p5_chr2.fa hs_ref_GRCh37.p5_chr3.fa hs_ref_GRCh37.p5_chr4.fa hs_ref_GRCh37.p5_chr5.fa hs_ref_GRCh37.p5_chr6.fa hs_ref_GRCh37.p5_chr7.fa hs_ref_GRCh37.p5_chr8.fa hs_ref_GRCh37.p5_chr9.fa hs_ref_GRCh37.p5_chr10.fa hs_ref_GRCh37.p5_chr11.fa hs_ref_GRCh37.p5_chr12.fa hs_ref_GRCh37.p5_chr13.fa hs_ref_GRCh37.p5_chr14.fa hs_ref_GRCh37.p5_chr15.fa hs_ref_GRCh37.p5_chr16.fa hs_ref_GRCh37.p5_chr17.fa hs_ref_GRCh37.p5_chr18.fa hs_ref_GRCh37.p5_chr19.fa hs_ref_GRCh37.p5_chr20.fa hs_ref_GRCh37.p5_chr21.fa hs_ref_GRCh37.p5_chr22.fa hs_ref_GRCh37.p5_chrX.fa hs_ref_GRCh37.p5_chrY.fa hs_ref_GRCh37.p5_chrMT.fa hs_ref_GRCh37.p5_unlocalized.fa hs_ref_GRCh37.p5_unplaced.fa -p h_sapiens_37.5 -a is

creates a database (files h_sapiens_37.5*), which takes about 8 minutes, but the database only contains DNA sequences from the first file (hs_ref_GRCh37.p5_chr1.fa).

Work around

Concatenate the input FASTA files (hs_ref_GRCh37.p5_*.fa) into a single file and index that instead. (See also next error).
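A minimal sketch of the concatenation step. The helper name concat_fasta is made up for illustration and is not part of bwa. It also guards against an input file that lacks a final newline, which would otherwise glue the next header onto the end of a sequence line.

```shell
# concat_fasta OUT FILE...   (hypothetical helper, not part of bwa)
# Writes FILE... to OUT in the order given on the command line.
concat_fasta() {
    out=$1
    shift
    : > "$out"                        # create or truncate the output file
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "concat_fasta: cannot read $f" >&2
            return 1
        fi
        cat "$f" >> "$out"
        # If the file did not end with a newline, add one so that the
        # next '>' header starts on its own line.
        if [ -n "$(tail -c 1 "$f")" ]; then
            echo >> "$out"
        fi
    done
}

# Example (not run here). The glob expands in lexical order
# (chr1, chr10, chr11, ...); list the files explicitly if order matters.
#   concat_fasta /tmp/INPUTS.fa hs_ref_GRCh37.p5_*.fa
#   grep -c '^>' /tmp/INPUTS.fa      # number of sequences in the result
```

Counting the '>' header lines before and after is a cheap check that no input file was dropped.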

Gareth Wilson suggests tweaking the header lines so that they start with the chromosome number followed by a space and then the full heading.
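One possible way to do the header tweak, as a sketch only. The helper name prefix_headers is made up, and it assumes each input file holds a single chromosome, so the name can be supplied by hand. It is not suitable as-is for the unlocalized and unplaced files, where every sequence would get the same name.

```shell
# prefix_headers NAME FILE   (hypothetical helper, not part of bwa)
# Prints FILE with every '>' header rewritten as '>NAME original-header'.
prefix_headers() {
    awk -v name="$1" '
        /^>/ { print ">" name " " substr($0, 2); next }  # header line
             { print }                                   # sequence line
    ' "$2"
}

# Example (not run here):
#   prefix_headers 1 hs_ref_GRCh37.p5_chr1.fa > /tmp/chr1.fa
```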

bwa index /tmp/INPUTS.fa -p h_sapiens_37.5 -a bwtsw

now takes 2 hours 23 minutes.

`bwa index -a is`: Segmentation fault

status=139
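The status value can be decoded by hand. Shells conventionally report 128 + N when a command is killed by signal N, so 139 corresponds to signal 11, which is SIGSEGV on Linux. A small sketch (describe_status is a made-up name, and a program may also choose to exit with a code above 128 on its own, so this is only the usual reading):

```shell
# describe_status CODE   (hypothetical helper)
# Interprets a shell exit status using the 128 + signal convention.
describe_status() {
    if [ "$1" -gt 128 ]; then
        echo "killed by signal $(( $1 - 128 ))"
    else
        echo "exited with code $1"
    fi
}

describe_status 139    # prints: killed by signal 11
# 'kill -l 11' prints the name of signal 11 on the local system.
```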

See http://sourceforge.net/mailarchive/forum.php?thread_name=14941A7A-C42F-4729-A2A8-1C5E649E722D%40sanger.ac.uk&forum_name=bio-bwa-help

Work around

Tom Blackwell says "for a .fasta file over 2 Gb one must use -a bwtsw", i.e. replace -a is with -a bwtsw (see the bwa man page).
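The choice could be scripted, roughly as below. The helper name choose_bwa_algo is made up, and reading "2 Gb" as 2,000,000,000 bytes of FASTA file is an assumption; the optional second argument overrides that limit.

```shell
# choose_bwa_algo FILE [LIMIT_BYTES]   (hypothetical helper, not part of bwa)
# Prints 'bwtsw' if FILE is larger than the limit, otherwise 'is'.
choose_bwa_algo() {
    limit=${2:-2000000000}            # assumed meaning of "2 Gb"
    size=$(( $(wc -c < "$1") ))       # file size in bytes
    if [ "$size" -gt "$limit" ]; then
        echo bwtsw
    else
        echo is
    fi
}

# Example (not run here):
#   bwa index -a "$(choose_bwa_algo /tmp/INPUTS.fa)" -p h_sapiens_37.5 /tmp/INPUTS.fa
```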

Alternative prebuilt indexes

"Illumina built indexes are available at the iGenomes site http://cufflinks.cbcb.umd.edu/igenomes.html"
GenoMax

20GB

W.B.Langdon 19 October 2012 (last update 14 Nov)