Online "data scraping" sparks debate about patient privacy
■ After a company collected information from a health website where intimate details of illnesses are shared, a question arises: How much confidentiality can users expect?
By Pamela Lewis Dolan — Posted Oct. 25, 2010
A recent "data scraping" incident involving a patient advocacy site ignited a discussion about the value of patient data and efforts to protect it.
Social networking health site PatientsLikeMe became aware in May that the Nielsen Co. was using its automated data collection tool, BuzzMetrics, to obtain data on the site's community members, who often share intimate details of their illnesses, treatments and medication history online. Nielsen's automated system scours websites for specific keywords and topics. It is used to monitor online "buzz" about products and services for clients, which include major drug manufacturers. Patients must create a unique login to participate in discussions on the PatientsLikeMe website, and Nielsen's system was creating member accounts to gain access to the discussion boards.
After The Wall Street Journal published an article about data scraping that quoted PatientsLikeMe members who said they felt violated when they discovered what Nielsen was doing, Nielsen announced that it would no longer scrape sites that require a login without the website operators' permission.
The incident prompted a larger discussion about two major themes: the importance of transparency, and the value of patient data, such as that shared on PatientsLikeMe, to researchers who use it to develop new products, medicines and support tools for patients.
Patients felt their privacy had been violated, but for executives at PatientsLikeMe, the problem wasn't the use of deidentified data from the website. In fact, PatientsLikeMe sells deidentified data to several partners and is very open about that. The issue, the company said, was that Nielsen violated PatientsLikeMe's user agreement, which prohibits the use of data scraping tools.
Ben Heywood, co-founder and president of PatientsLikeMe, said that when a company like Nielsen takes data from a site such as his, none of the protections that govern PatientsLikeMe's relationships with the vendors it does business with are in place.
"Our patients know us as a company, they know our values, they put their trust in us that we will use the data responsibly and ethically, including choosing our partners," Heywood said. "They don't know that of any other random scraper, whether it's Nielsen or someone else."
Nielsen spokesman Matt Anchin said data scraping is high-level data collection that is not focused on individuals or groups. It is much broader and is meant to analyze general attitudes toward particular products and services. Nielsen's automated systems scrape 130 million blogs, more than 8,000 message boards and forums, and 40,000 Usenet groups. The company also scrapes Twitter and other social media sites.
Heywood hopes the recent developments, and the awareness they created about how data are used, don't dissuade patients from using such sites and posting information. The incident gave him an opportunity to explain to members how the company tries to protect their privacy. For example, Nielsen's scraping activity was detected by a system PatientsLikeMe uses to flag suspicious activity, such as viewing too many profiles or posts in a short period of time.
Gilles Frydman, founder of the Association of Cancer Online Resources (ACOR), discourages ACOR members from disclosing on the mailing lists any personal information they don't want made public, but he encourages the use of real names and information on the message boards.
"The conversations are of a much higher quality when people talk to real people," he said.
ACOR also has high-tech systems that help monitor for and prevent data scraping. Unlike PatientsLikeMe, however, it does not sell patient data.
Tena Friery, a staff member at the Privacy Rights Clearinghouse in San Diego, said patients can still participate in online groups without disclosing identifiable information. She said it's important that patients read each site's privacy policy before participating.
In a statement, Nielsen said that although it has stopped scraping login-protected sites without permission, it is actively pursuing data acquisition arrangements with a number of those sites. The company said it has already established several such relationships and will continue to do so.
Friery said patients always should assume that "information gathered for one purpose will ultimately be used for another."