Solutions to Turtle Problems

Segmentation fault

Command lines like
scTurtle32 -f ../1Mreads.fa.gz -o kmer_counts -k 31 -n 6000000 -t 3
cause Turtle 0.3.1 to crash with a segmentation fault almost immediately.

Workaround

Decompress 1Mreads.fa.gz.

Use the -i switch to tell Turtle the file is in FASTA format:

scTurtle32 -i ../1Mreads.fa -o kmer_counts -k 31 -n 6000000 -t 3
This should run without error and, after about ten seconds, give output like
Turtle Copyright (C) 2014 Rajat Shuvro Roy, Alexander Schliep.
This program comes with ABSOLUTELY NO WARRANTY.
This is free software, and you are welcome to redistribute it  under certain conditions. For details see the document COPYING.

Parameters received:
fasta input 	../1Mreads.fa
ouput prefix	kmer_counts
k-mer length 	31
Freq. k-mers 	6000000
no of threads 	3
STATs:
No of reads:	 1000000
No of k-mers:	 45613374
no of frequent k-mers found :5590403
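
The decompression step in the workaround can be done without losing the original archive. A minimal sketch, assuming the standard gunzip utility is installed; the function name decompress_keep is made up for this illustration and is not part of Turtle:

```shell
# Hypothetical helper (not part of Turtle): decompress FILE.gz to FILE,
# leave FILE.gz in place, and print the name of the decompressed file.
decompress_keep() {
    gz=$1
    out=${gz%.gz}          # strip the trailing .gz from the name
    gunzip -c "$gz" > "$out" && printf '%s\n' "$out"
}

# Usage with the file names from the example above:
#   fa=$(decompress_keep ../1Mreads.fa.gz)
#   scTurtle32 -i "$fa" -o kmer_counts -k 31 -n 6000000 -t 3
```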

Turtle gives help message all the time

It appears Turtle is very picky about its command line. Unless you give it all the parameters it wants, it prints its help text instead of running.

Workaround

Get the command line (including -t) right.

Turtle k-mer count not the same as grep's

Consider, as an example, the last 31-mer in kmer_counts0:
>2
GGGGTGGACCCAAAAACTCCCCACGCCCCCC
GGGGTGGACCCAAAAACTCCCCACGCCCCCC does not appear in ../1Mreads.fa at all, let alone twice.

But Turtle also counts matches to the reverse complement, and GGGGGGCGTGGGGAGTTTTTGGGTCCACCCC does occur twice in ../1Mreads.fa.

Workaround

gawk -f complement.awk GGGGTGGACCCAAAAACTCCCCACGCCCCCC
will generate the complementary strand (the reverse complement), which can then be searched for with grep as well.
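
complement.awk is not reproduced here. As a rough equivalent, assuming the common rev and tr utilities are available, the reverse complement can be sketched in plain shell. The function name revcomp is made up for this illustration:

```shell
# Hypothetical helper (not part of Turtle): print the reverse complement
# of a DNA string by reversing it and then swapping A<->T and C<->G.
revcomp() {
    printf '%s\n' "$1" | rev | tr 'ACGTacgt' 'TGCAtgca'
}

revcomp GGGGTGGACCCAAAAACTCCCCACGCCCCCC
# prints GGGGGGCGTGGGGAGTTTTTGGGTCCACCCC
```

The result can be handed straight to grep, for example grep -c "$(revcomp GGGGTGGACCCAAAAACTCCCCACGCCCCCC)" ../1Mreads.fa, bearing in mind that grep -c counts matching lines rather than individual occurrences.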

TGTGTGGGGGGCGTGGGGAGTTTTTGGGTCC not reported

If we shift the 31-mer found in the previous example five bases to give TGTGTGGGGGGCGTGGGGAGTTTTTGGGTCC, then neither it nor its reverse complement is reported in Turtle's output file kmer_counts0.

This is because scTurtle does not report unique k-mers, i.e. those with a count of exactly one.

Also, Turtle treats each sequence separately: it does not consider the tail of one sequence as being adjacent to the start of the next, even though they are in the same file.

For the purposes of experiment only, if sequences 2_489329 and 1_489330 are run together, then Turtle will find two cases where GGGGTGGACCCAAAAACTCCCCACGCCCCCC or its reverse complement matches, and so report a count of two for it.
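
To cross-check a single k-mer against these rules by hand, one can count it and its reverse complement inside each FASTA record, never across the boundary between two records. The following is only a sketch of that check, not a description of how Turtle works internally. It assumes rev, tr and a POSIX awk are available and that the k-mer and the file use the same letter case; the function name count_kmer_per_record is made up for this illustration.

```shell
# Hypothetical cross-check (not part of Turtle): print how often a k-mer
# or its reverse complement occurs, counting within each record only.
count_kmer_per_record() {
    kmer=$1
    fasta=$2
    rc=$(printf '%s\n' "$kmer" | rev | tr 'ACGTacgt' 'TGCAtgca')
    awk -v fwd="$kmer" -v rc="$rc" '
        # Count (possibly overlapping) occurrences of k in s.
        function occurrences(s, k,    n, p) {
            n = 0
            while ((p = index(s, k)) > 0) { n++; s = substr(s, p + 1) }
            return n
        }
        # Add the matches in the record collected so far, then reset it.
        function end_record() {
            total += occurrences(seq, fwd)
            if (rc != fwd) total += occurrences(seq, rc)
            seq = ""
        }
        /^>/ { end_record(); next }   # header line closes the previous record
             { seq = seq $0 }         # a record may span several lines
        END  { end_record(); print total + 0 }
    ' "$fasta"
}

# Usage with the names from the text above:
#   count_kmer_per_record TGTGTGGGGGGCGTGGGGAGTTTTTGGGTCC ../1Mreads.fa
```

A total of one from this check would correspond to a unique k-mer, which scTurtle leaves out of its output.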


W.B.Langdon 21 August 2015