W. B. Langdon. 5 March 2014. 2013 papers, full list
W. B. Langdon, Technical Report RN/13/10.
Mapping next-generation DNA sequences from the 1000 Genomes Project against published genomes reveals many that match one or more Mycoplasma species but are not included in the reference human genome GRCh37.p5. Many of these are of low quality, but NCBI BLAST searches confirm that some high-quality, high-entropy sequences match Mycoplasma but no human sequences. This suggests that at least 7 percent of 1000G samples are contaminated.
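The abstract above distinguishes high-entropy sequences from low-quality ones without saying how entropy was measured. As a rough illustration only, the sketch below assumes per-base Shannon entropy of the nucleotide composition; the function name `shannon_entropy` and the example reads are hypothetical and are not taken from the report.

```python
import math
from collections import Counter


def shannon_entropy(seq):
    """Return Shannon entropy in bits per base of a DNA string.

    Assumption: entropy is computed from single-base frequencies.
    The report may have used a different complexity measure.
    """
    seq = seq.upper()
    if not seq:
        return 0.0
    total = len(seq)
    counts = Counter(seq)
    # Sum of -p * log2(p) over the bases that actually occur.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical reads: a homopolymer run scores 0 bits, while a read
# using all four bases equally scores the maximum of 2 bits per base.
low_complexity_read = "AAAAAAAAAAAA"
mixed_read = "ACGTACGTACGT"
print(shannon_entropy(low_complexity_read), shannon_entropy(mixed_read))
```

A filter built on such a score would discard reads near 0 bits (for example poly-A runs) before searching, since low-complexity reads can match many genomes by chance.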
Technical Report RN/12/11.
At least 473 Affymetrix HG-U133 +2 Homo sapiens probes match one or more species of Mycoplasma. Analysis of published data from thousands of human GeneChips finds correlations in Homo sapiens studies between different microbiology laboratories in different countries, which suggests that contamination with Mycoplasma is the common factor. This highlights the need for experts in evolutionary computation to apply due diligence before relying on public medical datasets. Caveat emptor, even if the data are free!
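The abstract above does not say how probe matches were found. As a simplified stand-in, the sketch below checks whether each probe, or its reverse complement, occurs exactly in a genome string. Exact substring search is an assumption made for illustration; the probe names, probe sequences, and genome fragment are toy data, not real Affymetrix or Mycoplasma sequences.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    complement = str.maketrans("ACGT", "TGCA")
    return seq.upper().translate(complement)[::-1]


def probes_matching(probes, genome):
    """Return names of probes found exactly on either strand of genome.

    probes: dict mapping a probe name to its sequence (hypothetical data).
    genome: the target sequence as one string.
    """
    genome = genome.upper()
    hits = []
    for name, sequence in probes.items():
        sequence = sequence.upper()
        # A probe can match the forward strand or the opposite strand.
        if sequence in genome or reverse_complement(sequence) in genome:
            hits.append(name)
    return hits


# Toy example: probe_a matches only via its reverse complement (AACGT).
toy_probes = {"probe_a": "ACGTT", "probe_b": "GGGGG"}
toy_genome = "TTAACGTCC"
print(probes_matching(toy_probes, toy_genome))
```

For real genomes an indexed aligner or BLAST would be the practical choice, and it would also allow near matches, which this exact search ignores.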