Kaiser’s Massive Genetic Database Leverages Its Patient Population (But It’s A One Way Street)

This week MIT’s Technology Review featured a story about Kaiser Permanente and its plans to use its Northern California patients to construct an enormous genetic database. The acronym-unfriendly Research Program on Genes, Environment, & Health, or RPGEH, is funded in large part by a $25 million NIH research grant courtesy of February’s stimulus bill. The program will genotype 100,000 patients using SNP array technology from Affymetrix. If all goes well, the project will expand to as many as 500,000 patients by 2013.

What makes the RPGEH proposal so exciting, from a research perspective, is not just the 700,000 SNPs that will be genotyped for 100,000 patients, although that alone would represent one of the largest genetic research databases currently in existence. The real value lies in the marrying of genetic information with robust medical, environmental and other phenotypic data that Kaiser already maintains as a health care provider. From the RPGEH’s official description:

Researchers are collecting medical, lifestyle, demographic, environmental and, in some cases, genetic information from up to 500,000 Northern California Kaiser Permanente members.

DNA from participants’ saliva and/or blood samples will be used to obtain information about genes and exposures to environmental contaminants. This information will be entered into the new, secure database that contains participants’ survey responses and medical record information. Once we have all this information in one secure place, we’ll have one of the largest databanks of its kind in the United States.

The combination of genomic and phenomic information is widely recognized as a key step in constructing the large, rich datasets that will enable researchers to continue to explore the complex pathways by which a genome is ultimately manifested as a unique human being.

Not long ago, the rate-limiting step in building comprehensive genomic research databases was the generation of genotypic data. After all, sequencing the first composite human genome took over a decade and cost several billion dollars. In recent years, however, the cost of genomic sequencing has declined precipitously. Continued scientific and technological advances seem certain to render genomic sequencing a commodity in short order, perhaps within as little as three to five years.

As the generation of genomic information is rendered faster, easier and less expensive, the rate-limiting step is increasingly at the other end: the accumulation of accurate and comprehensive phenomic data. Detailed information on factors like environmental exposures, medical histories and other physical and behavioral characteristics is necessary to construct a complete picture of the process by which genes are converted into traits in individuals. And that is where health care providers such as Kaiser, who are in the business of collecting detailed phenotypic information about their patients, have the potential to provide an invaluable resource.

As Cathy Schaefer, a Kaiser research scientist, puts it in the Technology Review, “The importance of this project is that it will, almost overnight–well, in two years–produce a very large amount of genetic and phenotypic data that a large number of investigators and scientists can begin asking questions of, rather than having to gather data first.”

But what do the 100,000 to 500,000 Kaiser patients receive in return for their participation in the RPGEH? The big-picture answer is that those patients will contribute to the creation of a scientific database of indisputable value for the study of human health and disease, unquestionably a sufficient reason for any individual to choose to participate.

But at an individual level, there’s not much else to look forward to. Although patients’ health records will be combined with newly generated genotype data for purposes of the RPGEH database, that genetic information will not be made available to the patients themselves, or to their doctors, except at the discretion of RPGEH scientists. From the RPGEH’s FAQ:

Will I receive any personal benefit from giving a saliva sample that may impact or improve my own health?

This research is not intended to benefit individual participants directly. However, we hope that results from the research will improve how clinicians diagnose, treat and maybe even prevent major illnesses in the future.

Will I be getting any test results?

You will not receive personal health or medical results from taking part in the RPGEH. We do not expect that results from the RPGEH will be the kind of information that can be used by you or your health care provider to make decisions about your current health care. However, if scientists discover information as a result of RPGEH research that we believe is of substantial medical importance to you, we may contact you and ask if you want to learn the results.

As Catherine McCarty touched on in her ELSI piece last week here at the GLR (“To Share or Not to Share: That is the Question”), the logistics of returning genetic results to research participants is a key issue facing the fields of genomics and personalized medicine. As McCarty notes, “research findings . . . that were initially tentative may become truly significant and ideally clinically relevant. Researchers will then need to consider the ethics of withholding information known to be clinically relevant . . . .” Although it depends on the exact SNP data generated by the RPGEH project, with 700,000 SNPs per participant and 100,000 participants, it seems a statistical certainty that some of the genotype data will be of clinical relevance.

The norm in the field of genetic research continues to be the RPGEH approach, with research projects generally declining to return information to participants or patients except in cases of “substantial medical importance,” the threshold for which is unlikely to be clearly defined. New projects such as the Coriell Personalized Medicine Collaborative also restrict the information that is returned to participants, but are more specific about what information should be returned, focusing on genetic variants that are deemed potentially medically actionable.

Still other projects, most notably the Personal Genome Project, are testing traditional genomic research norms by returning all research data directly to participants. That formula has required the PGP to take steps to ensure that participants are given some guidance as to the information content of that data and are reminded, repeatedly, of the importance of confirming any research findings with a clinical care provider before taking (or avoiding) any particular course of clinical action.

There is little question that as the cost of genomic sequencing falls, large-scale research databases such as Kaiser’s will continue to proliferate, helping researchers and scientists push the boundaries of genomic understanding. These datasets cannot exist without patients and participants willing to contribute their genomic and phenomic information. Whether or not we have now reached the point that McCarty and others have envisioned, at which a compelling argument can be made that research data is of such individual value that it should be returned to participants, the very size and promise of the RPGEH project signal that the point may not be far off.