Leveraging the Crowd to Understand Your Genome

Earlier this week Peter Aldhous of New Scientist magazine recounted an unusual experience with DTC genomics provider deCODE genetics. In reviewing his genetic data on the deCODEme website, Aldhous uncovered what appeared to be significant and bizarre errors in his mitochondrial DNA. He turned to Blaine Bettinger, The Genetic Genealogist, for help in diagnosing the problem. Bettinger’s response: “This is a strange question, but are you sure this is Homo sapiens?”

Aldhous, Bettinger and deCODE investigated the problem and ultimately determined that the “errors” in the mitochondrial DNA were actually being introduced by a bug in the deCODEme software interface that allows users to browse their data. (Aldhous carefully points out that the software glitch was a rare one and that it did not seem to affect deCODEme’s disease-risk summaries or analysis.)

Aldhous’s experience points to more than a simple software error: it highlights the complexity inherent in consumer genomics. Translating an individual’s saliva sample into a description of genetically influenced traits and risks is a multi-stage process with potential for error at every step in the chain. Or, as Daniel MacArthur of Genetic Future cleverly puts it, “There’s many a slip ’twixt spit and SNP.”

Know thyself, but how?

Both Aldhous and MacArthur recognize the larger significance and offer sensible guidance for a problem that seems certain to become more prevalent as the number of personal genomics customers and patients increases. Aldhous argues that “meticulous bug-checking will be needed to ensure that health IT delivers on its promise of improving clinical decision-making and reducing human errors.” Error-free health IT is surely a quixotic goal, but Aldhous is quick to recognize that even if it were attained, it would be insufficient. Even a genome perfectly sequenced and displayed, without errors introduced by human or computer, is rife with errors of another sort: genetic mutations in the form of single nucleotide polymorphisms (SNPs), as well as copy-number and other structural changes, the majority of which have no clinical significance whatsoever. This is what Zak Kohane of Harvard Medical School has termed “the incidentalome.”

When errors abound, and from so many possible sources, what is the individual to do? MacArthur suggests a genomic adaptation of the ancient Greek aphorism “Know Thyself”:

…rather than being a passive recipient of genetic forecasts, dig into your data and see if it makes sense, and keep asking questions until it does. In addition to making it more likely that you’ll pick up any errors in your results, you’ll also develop a much deeper understanding both of the nature of genetics and of your own genome.

While progress in genomic research and the falling cost of genomic sequencing daily bring raw genomic data within reach of increasing numbers of individuals, comparatively few have the time, inclination or ability to dig as deeply into their own genomes as MacArthur suggests—to know themselves by themselves.

The logical and traditional source of guidance in this area remains the medical profession. But with a well-recognized deficiency in genetic understanding among many general practitioners, and a paucity of specially trained genetic counselors, DTC genomics companies (including deCODE) have emerged to provide consumers access to and interpretation of their genomic data.

When his own genetic data threw him an apparent curveball, Aldhous consulted a genetic genealogist-lawyer-blogger and bioinformatics experts, not his physician. His experience demonstrates that these traditional sources of guidance may come to play a complementary role within a much larger and more varied infrastructure available to individuals seeking personal genomic understanding.

A Future of Shared Genomes?

Beyond physicians and genetic counselors, and even beyond the emerging DTC market with its proprietary bioinformatics (which led Peter Aldhous to briefly wonder if he was “the product of some twisted genetic experiment”), there is the crowd.

I wrote last week that the primary difference between the concepts of crowd-sourcing and open-source in genomic research has to do with data availability:

“Crowd-sourcing” refers to using a large, often varied or undefined group or population to undertake a defined task. In the case of genomic research, this might entail using web-driven or other distributed modes of interaction to identify research populations, recruit participants and, ultimately, collect the data necessary to produce meaningful scientific research.

“Open-source,” although similar, means something different. It refers to the public accessibility of data, traditionally the source code for a particular piece of software. In genomics, this means public access to research data—whether collected through crowd-sourcing or other means. Those data can then be used by individuals, by companies or by scientists for whatever purposes they desire.

One of those purposes? Error-checking. Whether it’s verifying that you’re a member of the species Homo sapiens or attempting to determine the significance of a particular SNP, opening up your genome to public analysis is a novel but potentially powerful way to better know your genomic self.

It’s what Peter Aldhous did when he took his genetic data and handed it to the Genetic Genealogist. It’s an idea that personal genomics companies (if not yet a majority of their customers) have begun to embrace in the form of genomic data-sharing features, including the development of Illumina’s highly publicized iPhone app. In a much more expansive way it is one of the motivating principles and identifying features of the Personal Genome Project.

It’s difficult to say whether open-source genomics resources, especially for the interpretation and validation of individual genomic data, will develop and, if so, exactly what shape they will take. Still, some early clues are arriving. The Wikipedia-style SNPedia, a free, publicly editable database of SNPs and their effects, provides data from 34 public genomes, including the Experimental Man and the PGP-10, generated with the (also free) interpretive tool Promethease. The Personal Genome Project, which is enrolling its next 100 participants and has over 15,000 potential participants in its enrollment pipeline, is also developing an open-access interpretive tool, Trait-o-matic, that will be used to analyze the genomic and phenotypic data supplied by its participants. (The raw data itself is made available under the Creative Commons CC0 universal waiver.)

From unraveling bioinformatics errors, as Aldhous did, to adjusting medications, to uncovering unknown genetic variants, the upside of an open-access approach to personalized genomic interpretation is that an untold number of eyes can comb over your data in search of something important (or perhaps just interesting). It seems highly improbable that any combination of DTC genomics companies and open-source genomics resources will ever completely supplant a one-on-one consultation with a trained medical professional, particularly where clinical genetic guidance is required. And concerns over privacy and misuse of data may inhibit many from sharing their own genomic data, at least at present. But there appears to be a significant role for open-source genomics resources to play in the continuing expansion and democratization of personal genomic inquiry.