Revealed names expose flaw in de-identified patient data
■ A researcher finds a way to attach identities to nameless records, but information collected by physician practices is not likely to carry the same risks.
By Pamela Lewis Dolan — Posted May 20, 2013
A recent experiment by a Harvard researcher shows how names can be attached to health data that are thought to be anonymous.
Latanya Sweeney, PhD, a professor of government and technology and director of the Data Privacy Lab at Harvard University in Massachusetts, took a database of 1,130 de-identified participants of a genomic research study and correctly re-identified 241 participants. Those re-identified had submitted three key pieces of information: date of birth, gender and ZIP code. Combined with public records, such as voter registration, those details gave researchers enough clues to re-identify them.
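The matching described above can be illustrated with a short sketch. This is not Sweeney's actual code or data. The function name, field names and records below are made up for illustration, and the sketch assumes that a nameless record counts as re-identified only when exactly one public record shares its date of birth, gender and ZIP code.

```python
from collections import defaultdict

def link_records(deidentified, public_records):
    """Match nameless records to named ones on (date of birth, gender, ZIP)."""
    # Index the named public records by the three pieces of information.
    index = defaultdict(list)
    for person in public_records:
        key = (person["dob"], person["gender"], person["zip"])
        index[key].append(person["name"])

    matches = {}
    for record in deidentified:
        key = (record.get("dob"), record.get("gender"), record.get("zip"))
        names = index.get(key, [])
        # Assumption: only a unique match counts as a re-identification.
        if len(names) == 1:
            matches[record["id"]] = names[0]
    return matches

# Entirely fictional example data.
study = [
    {"id": "p1", "dob": "1970-03-14", "gender": "F", "zip": "02138"},
    {"id": "p2", "dob": "1982-11-02", "gender": "M", "zip": "02139"},
    {"id": "p3", "dob": None, "gender": "M", "zip": "02139"},
]
voters = [
    {"name": "Alice Example", "dob": "1970-03-14", "gender": "F", "zip": "02138"},
    {"name": "Bob Example", "dob": "1982-11-02", "gender": "M", "zip": "02139"},
    {"name": "Carl Example", "dob": "1982-11-02", "gender": "M", "zip": "02139"},
]

matches = link_records(study, voters)
print(matches)  # {'p1': 'Alice Example'}
```

In this toy data, participant p2 matches two voters and p3 withheld a birth date, so neither is re-identified. That mirrors why only participants who supplied all three details were exposed.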
Data security experts say that although there are no foolproof methods for de-identifying data, the circumstances involved with an opt-in research project, such as the Personal Genome Project, make it much easier for the data to be re-identified than it would be for other patient databases that are formed using routinely collected data, such as those from physician practices.
Unless a physician is personally involved in research using data on his or her patients, neither the physician nor the patients have control over the de-identification process, which is done outside the practice, said Angela Dinh Rose, director of the health information management practice at the American Health Information Management Assn. But because of the controls put in place under the Health Insurance Portability and Accountability Act privacy rule, re-identification isn’t something patients need to worry much about, she said. Even if the de-identified data were to fall into the hands of hackers, those hackers probably would need firsthand knowledge of the original data source to re-identify the records easily, she said.
The Personal Genome Project is a voluntary program at Harvard Medical School where individuals submit DNA and personal information to help researchers learn about certain health conditions. Sweeney said that by limiting the specificity of data, consumers can reduce their risk of being re-identified. Patients whose data are collected through the routine process of care delivery already would have those identifying bits of information stripped from their records because of HIPAA.
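A rough sketch shows why limiting the specificity of data helps. It counts how many people in a small, fictional list are the only ones with a given combination of details, first with full date of birth and five-digit ZIP code, then with only birth year and the first three ZIP digits. The function names, field names and that particular choice of coarsening are assumptions made for illustration.

```python
from collections import Counter

def unique_count(records, key_fn):
    """Count records whose combination of details appears exactly once."""
    counts = Counter(key_fn(r) for r in records)
    return sum(1 for r in records if counts[key_fn(r)] == 1)

def full_key(r):
    # Full date of birth, gender and five-digit ZIP code.
    return (r["dob"], r["gender"], r["zip"])

def coarse_key(r):
    # Assumed coarsening: birth year and first three ZIP digits only.
    return (r["dob"][:4], r["gender"], r["zip"][:3])

# Entirely fictional example data.
people = [
    {"dob": "1970-03-14", "gender": "F", "zip": "02138"},
    {"dob": "1970-09-30", "gender": "F", "zip": "02139"},
    {"dob": "1982-11-02", "gender": "M", "zip": "02139"},
    {"dob": "1982-01-20", "gender": "M", "zip": "02140"},
]

print(unique_count(people, full_key))    # 4
print(unique_count(people, coarse_key))  # 0
```

With the specific details, every person in the list is unique and therefore easy to single out. With the coarser details, each person shares a combination with someone else.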
The process of de-identification
Although it’s possible for patient data with identifying information stripped out to be re-identified, it’s difficult to do so, said Dixie Baker, PhD, senior partner of Martin, Blanck and Associates, a health care consulting firm in Alexandria, Va. Even so, physicians and patients should understand de-identification and the importance of its use.
Under the HIPAA privacy rule, when routinely collected patient data, such as those collected by a physician, are de-identified using the “safe harbor” method, 18 identifiers must be removed. They include birth dates and geographic indicators such as ZIP codes and cities where the population is fewer than 20,000. When insurers or health care organizations use data for research, they must, as HIPAA-covered entities, follow these guidelines.
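As a minimal sketch of what such stripping might look like in practice, the function below drops a handful of direct identifiers and coarsens the birth date and ZIP code. It covers only a few of the 18 identifier types and is not a compliant implementation. The field names, the function name and the set of populous three-digit ZIP prefixes are hypothetical. The sketch assumes the common reading of safe harbor in which the year of birth may be kept, and the first three ZIP digits may be kept only for areas with more than 20,000 residents. It does not handle special cases such as very advanced ages.

```python
# Hypothetical field names; only a few of the 18 identifier types are covered.
DIRECT_IDENTIFIERS = {"name", "phone", "email", "ssn", "medical_record_number"}

def safe_harbor_sketch(record, populous_zip3):
    """Return a copy of record with some identifiers removed or coarsened."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "dob" in out:
        # Keep only the year of birth.
        out["birth_year"] = out.pop("dob")[:4]
    if "zip" in out:
        zip3 = out.pop("zip")[:3]
        # Keep the prefix only if the caller lists it as a populous area.
        out["zip3"] = zip3 if zip3 in populous_zip3 else "000"
    return out

# Entirely fictional example record.
record = {
    "name": "Alice Example",
    "dob": "1970-03-14",
    "zip": "02138",
    "gender": "F",
    "diagnosis": "asthma",
}
cleaned = safe_harbor_sketch(record, populous_zip3={"021"})
print(cleaned)
```

The original record is left unchanged, and the caller supplies the list of populous ZIP prefixes because that depends on census data the sketch does not include.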
There also is an “expert determination method,” which relies on a person with appropriate knowledge of statistical and scientific methods to determine that the risk of the data being re-identified is very small.
Baker said that the more transparent health care organizations are about how they use de-identified data, the more comfortable patients will feel. They should understand that “advancements in medical science are dependent on the availability of quality health information,” she said. “So I think it’s in our best interest as individuals to see that doctors have the best information they can get. And it’s in both our personal best interest and society’s best interest that the researchers have access to quality information, as well.”
A discussion paper published April 15 by the Institute of Medicine’s Clinical Effectiveness Research Innovation Collaborative emphasized that gaining access to patient data for research will require public support for the idea and assurances that sharing data is safe, and that physicians need to collect and share data for the public good.
Sweeney set up a website, AboutMyInfo.org, to help consumers determine how easily their identities could be revealed by the same key pieces of information she used in her experiment.