scTurtle32 -f ../1Mreads.fa.gz -o kmer_counts -k 31 -n 6000000 -t 3
cause Turtle 0.3.1 to seg fault almost immediately.
Use the -i switch to tell Turtle the file is in fasta format:
scTurtle32 -i ../1Mreads.fa -o kmer_counts -k 31 -n 6000000 -t 3
should be fine and give output in about ten seconds, like:
Turtle Copyright (C) 2014 Rajat Shuvro Roy, Alexander Schliep.
This program comes with ABSOLUTELY NO WARRANTY.
This is free software, and you are welcome to redistribute it
under certain conditions. For details see the document COPYING.
Parameters received:
fasta input ../1Mreads.fa
ouput prefix kmer_counts
k-mer length 31
Freq. k-mers 6000000
no of threads 3
STATs:
No of reads: 1000000
No of k-mers: 45613374
no of frequent k-mers found :5590403
GGGGTGGACCCAAAAACTCCCCACGCCCCCC does not appear in
../1Mreads.fa at all, let alone twice.
But Turtle includes complementary matches:
the complement does occur twice in ../1Mreads.fa.
gawk -f complement.awk GGGGTGGACCCAAAAACTCCCCACGCCCCCC
will generate the complementary strand.
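For readers who do not have complement.awk to hand, the same idea can be sketched in a few lines of Python. This is only an illustration: it assumes that "complementary strand" means the reverse complement (each base complemented, then the order reversed), and the name reverse_complement is made up here, not taken from the awk script.

```python
# Sketch only: assumes "complementary strand" means the reverse complement.
# The function name is illustrative and is not part of Turtle or complement.awk.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(kmer: str) -> str:
    """Complement each base (A<->T, C<->G), then reverse the order."""
    return kmer.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    # Print the complementary strand of the 31-mer discussed above.
    print(reverse_complement("GGGGTGGACCCAAAAACTCCCCACGCCCCCC"))
```

The string it prints can then be searched for in ../1Mreads.fa with grep to check where the two matches come from.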
TGTGTGGGGGGCGTGGGGAGTTTTTGGGTCC
then neither it nor its complement is reported in Turtle's output file.
scTurtle does not report unique k-mers, i.e. those with a count of exactly one.
Also Turtle treats separate sequences as separate and does not consider the tail of the previous sequence as being adjacent to the start of the next even though they are in the same file.
For the purposes of experiment only,
1_489330 are run together,
then Turtle will find two cases where
GGGGTGGACCCAAAAACTCCCCACGCCCCCC or its complement matches,
and so will report two matches for it.
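
The behaviour described above (complements counted together, counts of one dropped, no k-mers spanning two sequences) can be imitated with a small exact counter. This is a hedged sketch, not Turtle's algorithm: the names canonical and count_kmers, the choice of the alphabetically smaller strand as the representative, and the toy reads with k = 3 are all assumptions made for illustration.

```python
from collections import Counter

# Sketch only: an exact toy counter, not Turtle's actual implementation.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Represent a k-mer and its reverse complement by one key.

    Picking the alphabetically smaller of the two is an assumption;
    Turtle may choose its representative differently.
    """
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def count_kmers(sequences, k):
    """Count k-mers within each sequence separately, then drop singletons."""
    counts = Counter()
    for seq in sequences:
        # Windows never cross from one sequence into the next.
        for i in range(len(seq) - k + 1):
            counts[canonical(seq[i:i + k])] += 1
    # Like scTurtle, report only k-mers seen more than once.
    return {kmer: n for kmer, n in counts.items() if n > 1}


if __name__ == "__main__":
    reads = ["TTGAT", "CAA"]  # made-up toy reads, not from 1Mreads.fa
    print(count_kmers(reads, 3))             # sequences kept separate
    print(count_kmers(["".join(reads)], 3))  # sequences run together
```

With the toy reads kept separate, only CAA is reported (once as CAA, once as its reverse complement TTG). Running the two reads together creates extra k-mers across the join, so ATC and TCA also reach a count of two, which mirrors the experiment described above.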