ENCODE, CODIS, and the Urgent Need to Focus on What Is Scientifically and Legally Relevant to the DNA Fingerprinting Debate

Sara Huston Katsanis, MS is an Associate in Research at the Institute for Genome Sciences & Policy at Duke University.

On September 5, 2012, a coordinated release of 30 articles in Nature, Cell, Science, Genome Research, Genome Biology and other journals published the long-awaited findings of the Encyclopedia of DNA Elements (ENCODE) Consortium. The press coverage of ENCODE data is deafening at this point, yet ENCODE’s relevance to GLR readers may not be immediately apparent.

Across the U.S., numerous groups are challenging the integration of CODIS profiles (sometimes called “DNA Fingerprints”) into the routine booking procedures upon arrest for certain crimes (depending on the state), placing genetic profiling among other standard procedures such as fingerprinting and mug shot photographs. The GLR has covered these legal challenges previously (including here, here, and here).

On September 19, 2012, the Ninth Circuit sitting en banc heard oral arguments in Haskell v. Harris (formerly Haskell v. Brown) challenging California’s Proposition 69 (for the GLR’s previous coverage, see here). This week, the Electronic Frontier Foundation, or EFF, filed a brief with the Ninth Circuit asking the court to take the recently published ENCODE findings into consideration, explaining on the EFF blog that ENCODE represents:

ground-breaking new research that confirms for the first time that over 80% of our DNA that was once thought to have no function, actually plays a critical role in controlling how our cells, tissue, and organs behave…[ENCODE] should have broad ramifications for federal and state DNA collection programs.

With headline after headline after headline seemingly providing ample ammunition, opponents of DNA profiling of arrestees are taking aim at the constitutionality of CODIS profiles. Those opponents contend that the portions of the genome once disregarded as mere non-protein-coding DNA (ncDNA, or, as the unfortunate phrase goes, “junk DNA”) have been upgraded by ENCODE to “functional” portions of the genome. That claim is probably supported more robustly by the hype surrounding ENCODE than by ENCODE’s actual findings. Opponents of DNA profiling are nevertheless using those findings to argue that CODIS profiles relay far more than just identification information. Under a totality of the circumstances analysis (the legal test that multiple courts in Haskell v. Harris have, thus far, used to find California’s policy of DNA fingerprinting of felony arrestees constitutional), they argue, the interests of the government are far outweighed by the privacy interests of individuals not yet convicted.

As two scholars independent of the forensic science community but committed to the ethical, legal, and social implications of genome sciences, we investigated what is known about the current and recommended CODIS markers. Our research, conducted in October 2011 on the current draft of the human genome (hg19), was published in the Journal of Forensic Sciences within a week of the ENCODE publications and was cited by the California Attorney General in its reply to the EFF brief in Haskell v. Harris.

In our Technical Note, “Characterization of the Standard and Recommended CODIS Markers,” we found no evidence that CODIS genotypes are causative or predictive of known phenotypes. In the Letter, one of us urged the avoidance of the phrase “junk DNA.” It is of immediate importance and interest to try to reconcile the ENCODE publications with our understanding of CODIS, both because of the professional arguments that have followed the ENCODE publications and because, as the EFF’s brief demonstrates, ENCODE’s data and hype will inevitably be used in attempts to sway lawmakers, courts and jurors with respect to the appropriateness of DNA fingerprinting.1

A brief introduction to ENCODE and CODIS is necessary before we explain that, while ENCODE (and the large body of scientific work developed prior to and in parallel with ENCODE) may have revealed that the genomic regions including the CODIS markers have some functionality, ENCODE has not demonstrated that CODIS genotypes have any known functionality or that the resulting CODIS profiles have any known direct positive or negative predictive value for inferring phenotypes.

A brief introduction to ENCODE and CODIS. ENCODE, the Encyclopedia of DNA Elements, is a federally supported, international research collaboration aimed at better understanding and cataloging the “functional” elements of the genome and is an important follow-up and supplement to the Human Genome Project.

CODIS, the Combined DNA Index System, is a DNA database established by the FBI pursuant to congressional authorization by the DNA Identification Act of 1994. Since then, a number of federal statutes have authorized the collection of a DNA sample and creation of DNA profiles from individuals convicted of felonies. Similar databases and collection statutes exist worldwide, with Life Corporation ($LIFE) recently estimating that, “to date, 44 countries have now implemented criminal offender DNA database programs with a combined offender sample pool of 40 million and growing.” In the United States, the federal statutory authority expanding DNA profiling from those convicted of any federal felony to those arrested and charged with particular felonies is the DNA Fingerprint Act of 2005. Since its passage, more than half of the states have passed arrestee DNA database laws.

The DNA profiles used in CODIS consist of genotypes at a standard panel of 13 markers, all short tandem repeats or STRs. The consideration of an additional 11 STRs is ongoing, which could soon result in a standard profile consisting of genotypes at 24 markers. The statutory authorization for CODIS limits the DNA analysis to discovery of identification information and precludes DNA analysis that would disclose phenotypic information (such as an individual’s appearance or health conditions), a point reiterated by the Department of Justice’s Final Rule.

What does ENCODE currently mean for CODIS? The recent news of the functional significance of certain non-protein-coding regions of the genome has snowballed, in part thanks to some questionable constructions of ENCODE’s actual findings, into assumptions that the CODIS markers (which are located in non-protein-coding regions of the genome) are–or might be–predictive of individual traits. This is largely a result of the misguided equating of “marker” with “genotype.”
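The distinction between a marker and a genotype can be made concrete with a small sketch. The marker name, genomic region, and repeat counts below are hypothetical and chosen only for illustration; they are not taken from any CODIS record or from our Technical Note.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Marker:
    """A fixed location in the genome; it is the same site in every person."""
    name: str
    region: str  # genomic context, e.g. an intron of some gene


@dataclass(frozen=True)
class Genotype:
    """One person's pair of repeat counts observed at a marker."""
    marker: Marker
    repeats: tuple  # (repeat count on one chromosome, on the other)


# Hypothetical marker, for illustration only.
marker = Marker(name="EXAMPLE_STR", region="intron of a hypothetical gene")

# Two hypothetical people typed at the same marker.
person_a = Genotype(marker, (6, 9))
person_b = Genotype(marker, (7, 7))

same_marker = person_a.marker == person_b.marker      # both were typed at one site
same_genotype = person_a.repeats == person_b.repeats  # but their repeat counts differ
print(same_marker, same_genotype)
```

Whatever is known about the region surrounding a marker is shared by everyone typed at that marker. Only the repeat counts differ between people, so only the repeat counts could, even in principle, say something about an individual.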

Even prior to ENCODE, it was well known that many of the CODIS markers are positioned within genomic regions associated with known traits. Some markers are even located within introns of genes with demonstrated causal relationships to known traits. And some of these genes are listed in OMIM, the Online Mendelian Inheritance in Man database, which to geneticists means that the genes may be associated with disease.

However, the genotypes of the CODIS markers (that is, the specific repeat lengths of As, Cs, Gs and/or Ts at the CODIS markers) have no known association (either statistical or causal) with any phenotype. They remain understood as “polymorphisms,” even now that we know (or perhaps have re-learned, thanks to ENCODE) that the genes and genomic regions in which those markers occur may be largely functional (in the broad sense of that word). These genotypes could be in linkage disequilibrium with mutations associated with phenotypes, meaning that a genotype would be near enough to a mutation to be a statistical harbinger for that mutation. One marker, indeed, has a documented weak association between a particular repeat length and schizophrenia, but the association is so weak that the predictive value is negligible. That is, even if the police know that a person has this repeat-length genotype, the police could not predict whether that person has schizophrenia.
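A short calculation shows why a weak association carries negligible predictive value. Every number below (the population prevalence, the frequency of the genotype, and the relative risk) is a hypothetical value chosen for illustration and is not drawn from the study of the marker mentioned above.

```python
def risk_given_genotype(prevalence, carrier_freq, relative_risk):
    """Probability that a carrier of the genotype has the condition.

    Solves prevalence = carrier_freq * r1 + (1 - carrier_freq) * r0
    with r1 = relative_risk * r0, where r1 and r0 are the risks in
    carriers and non-carriers respectively.
    """
    baseline = prevalence / (carrier_freq * relative_risk + (1 - carrier_freq))
    return relative_risk * baseline


# Hypothetical inputs, for illustration only.
prevalence = 0.01     # assumed share of the population with the condition
carrier_freq = 0.20   # assumed share of the population with the genotype
relative_risk = 1.2   # assumed weak association

ppv = risk_given_genotype(prevalence, carrier_freq, relative_risk)
print(f"Risk for carriers: {ppv:.2%}; population risk: {prevalence:.2%}")
```

Under these assumed inputs, a person with the genotype has only a little over a 1 percent chance of having the condition, barely different from the population at large. Knowing the genotype therefore tells an observer almost nothing about the individual.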

CODIS genotype frequencies do vary by race and ethnicity, and to the extent that phenotypes also vary by race or ethnicity, CODIS profiles could indirectly be used to infer race and/or ethnicity and by extension could then be used to infer phenotypes. However, the statistical correlations are crude at best, as these markers are not robust ancestry informative markers (AIMs). Also of note, CODIS profiles routinely include a polymorphism in the amelogenin gene that differs between the X and Y chromosomes. The amelogenin genotype therefore indicates sex in most, but not all, cases. But most important to the legal challenges at hand, the limited phenotypic information inferred from amelogenin or ancestry estimation is already decipherable from mug shot photos, so its redundant availability in the form of CODIS profiles is, or at least should be, unlikely to cause any notable controversy.

The bottom line is that knowing a person’s unique 13- or 24-marker profile at the genomic sites used by CODIS does not, to the best of our current knowledge, allow reliable, valid inference of anything more than identity (aside from sex) without performing additional analyses and drawing additional inferences from those analyses (e.g., estimating ancestry from the CODIS genotypes and subsequently performing analyses to infer phenotypes from those ancestry estimates). Importantly, the statutes establishing CODIS expressly prohibit the use of CODIS profiles for analyses other than identification.

A shift in focus is needed. At this time, the courts (including the Ninth Circuit) hold the future of CODIS, and the expanded use of DNA-based technologies in law enforcement, in their hands. It is essential that the courts consider the constitutionality of DNA collection upon arrest within the context of existing scientific knowledge (and not conjectures driven by press releases). A clear understanding of the science, as presented by ENCODE and elsewhere, indicates that the real focus of the legal challenges to DNA profiling upon arrest should rest with the collection and retention of the DNA and the potential for future additional uses of those samples aside from CODIS and beyond identification, rather than with an obsession over the yet-to-be-discovered direct phenotypic predictive value of the current CODIS markers. That said, the choice of markers used by law enforcement must be flexible enough to respond to concerns of functionality that may be revealed through future scientific research; but, again, the existing set of markers – as far as we know today – does not directly convey information other than identity.

The expanded use of DNA in law enforcement remains a concern for some of us with regard to the perpetuation of racial disparities in the CODIS database, the questionable utility of the expanded database in solving crimes and the appropriate allocation of resources for solving crimes and reducing evidence backlogs.

Those are all important issues that should inform the ongoing conversation over the future of CODIS and similar databases worldwide. But to have that conversation effectively, we must first stop dwelling upon the currently nonexistent functionality of CODIS genotypes and shift the conversation back to the decision at hand – whether the collection and retention of DNA samples from innocent individuals both is constitutionally permissible and represents appropriate public policy.


1 The inevitable use of ENCODE’s data and hype is not limited to the issue of DNA fingerprinting but extends to opposition to evolution (for further reading, see here and here).