Clinical researchers need to share data to support scientific validation and information reuse, and to comply with a host of regulations and directives from funders. Various organizations are constructing informatics resources in the form of centralized databases to ensure widespread availability of data derived from sponsored research. The widespread use of such open databases is contingent on the protection of patient privacy.
In this paper, we review several aspects of the privacy-related problems associated with data sharing for clinical research from technical and policy perspectives. We begin with a review of existing policies for secondary data sharing and privacy requirements in the context of data derived from research and clinical settings. In particular, we focus on policies specified by the U.S. National Institutes of Health and the Health Insurance Portability and Accountability Act and touch upon how these policies are related to current, as well as future, use of data stored in public database archives.
Next, we address aspects of data privacy and “identifiability” from a more technical perspective, and review how biomedical databanks can be exploited and seemingly anonymous records can be “re-identified” using various resources without compromising or hacking into secure computer systems. We highlight which data features specified in clinical research data models are potentially vulnerable or exploitable. In the process, we recount a recent privacy-related concern associated with the publication of aggregate statistics from pooled genome-wide association studies that has had a significant impact on the data sharing policies of NIH-sponsored databanks.
Finally, we conclude with a list of recommendations that cover various technical, legal, and policy mechanisms that open clinical databases can adopt to strengthen data privacy protections as they move toward wider deployment and adoption.
Link to access full article: