Clinical Data Is an Asset. Are AMCs Ready to Steward It?

By Monica Jang

The clinical data held by U.S. academic medical centers (AMCs) has quietly but verifiably become one of the most consequential assets in medicine and science. De-identified electronic health records, imaging archives, biobank genomic datasets, longitudinal patient cohorts — data of this depth is neither easily sourced nor synthesized. Its value is not only commercial. It is the raw material from which the next generation of diagnostics, therapies, learning health systems, and equity-focused population health work will be built. It is also, increasingly, in commercial demand: the global real-world evidence solutions market reached USD 4.74 billion in 2024 and is projected to exceed USD 10.8 billion by 2030.1

Recent regulatory developments have accelerated that demand. In December 2025, the Food and Drug Administration finalized guidance permitting the use of de-identified, aggregate real-world data in medical device submissions without always requiring patient-level data, and signaled that it is considering similar flexibility for drugs and biologics.2 Generative AI has sharpened the picture further: clinical foundation models require the longitudinally complete, high-quality datasets that only large academic health systems can realistically supply.3 The institutions holding this data are being courted by industry sponsors, technology companies, and data brokers at a pace not seen even two years ago.

The question is how to steward this asset in a way that advances the science, honors the consent of the patients whose data it is, and preserves institutional integrity. Finding the structural answer is complex but necessary. Patient authorization — through informed consent, IRB-approved protocols, or the narrow framework HIPAA provides for de-identified data — is the foundation on which any responsible model must be built, not a box to be checked after the fact. The consequences of getting that foundation wrong are foreseeable: erosion of patient trust, regulatory exposure, and the quiet transfer of scientific and commercial value from the institutions generating the data to the intermediaries aggregating it.

Much of the current framing treats clinical data licensing as an extension of technology transfer. The structural parallels are real (defined grants of rights, field-of-use restrictions, exclusivity provisions, royalty and subscription structures), but the analogy breaks down in three important ways.

First, the legal foundation is different. Patents and copyrights are statutory creations, with centuries of jurisprudence enforcing defined exclusionary rights. Clinical data, by contrast, enjoys no analogous statutory property right in U.S. law. Raw data is not copyrightable, there is no patent on a dataset as such, and trade secret protection is narrow and contested in the licensing context.4 What looks like a property transaction is in substance contractual, and its enforceability depends entirely on the quality of the agreement rather than on a background regime of property rights.

Second, the asset is not the product of an inventor’s labor. Clinical data is generated through the care of patients. That asymmetry introduces an ethical dimension with no direct parallel in traditional intellectual property transactions. The patient is not an inventor who has assigned rights; the patient is a person who sought medical care, and whose data was created as a byproduct. Respect for that relationship — operationalized through consent, de-identification, and purpose limitation — is not an obstacle to licensing but the precondition for doing it well.

Third, ownership itself is contested. AMCs control access to clinical data, but whether they own that data in any sense that cleanly supports a licensing transaction remains unsettled.5 IP licensing rests on holding title. Clinical data licensing rests on something softer — stewardship, custody, fiduciary obligation — and that difference should change how agreements are structured.

Structural differences aside, the practical consequences of getting clinical data licensing wrong are concrete and avoidable. Three risks in particular warrant close attention from general counsel in academic medicine.

The first is patient trust. Even where data has been appropriately consented and de-identified, public perception of commercial data flows can shape how patients engage with the health system. Surveys consistently find that large majorities of patients want clearer information about how their health data is used, and are uncomfortable when they learn of commercial uses only after the fact.6 When trust erodes, patients may withhold clinically important information, avoid care, or opt out of health information exchanges — outcomes that not only compromise care but also degrade the completeness of the very datasets institutions seek to license. Transparent, proactive communication about how data is used is therefore not a public-relations exercise; it is part of the stewardship obligation itself.

The second is re-identification. Seminal research by Latanya Sweeney estimated that the combination of five-digit ZIP code, date of birth, and sex uniquely identifies 87 percent of the U.S. population.7 More recent work has shown that even data redacted to the HIPAA Safe Harbor standard can be re-identified at meaningful rates when linked with external sources such as newspaper accounts: 3.2 percent in Maine and 10.6 percent in Vermont.8 A re-identification event in a licensed dataset would expose the institution to civil penalties, potential criminal liability under 42 U.S.C. § 1320d-6, class litigation, and reputational damage that could take a generation to repair.9
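The uniqueness problem described above can be made concrete with a short sketch. The Python below is illustrative only: the records are fabricated toy data, the field names (`zip`, `dob`, `sex`) and both helper functions are hypothetical, and the coarsening step (three-digit ZIP, year of birth) is only loosely modeled on Safe Harbor-style generalization rather than a full implementation of it. It measures what share of records is unique on a set of quasi-identifiers before and after coarsening.

```python
from collections import Counter


def unique_fraction(records, quasi_identifiers):
    """Share of records whose quasi-identifier combination appears exactly once."""
    if not records:
        return 0.0
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(records)


def coarsen(record):
    """Hypothetical generalization: keep 3-digit ZIP prefix and year of birth only."""
    return {
        "zip": record["zip"][:3],
        "dob": record["dob"][:4],  # ISO date string, so the first 4 chars are the year
        "sex": record["sex"],
    }


# Fabricated toy records, for illustration only.
records = [
    {"zip": "02115", "dob": "1980-03-14", "sex": "F"},
    {"zip": "02116", "dob": "1980-09-01", "sex": "F"},
    {"zip": "02115", "dob": "1975-07-02", "sex": "M"},
    {"zip": "02139", "dob": "1975-12-25", "sex": "M"},
    {"zip": "02139", "dob": "1990-11-30", "sex": "F"},
]

fields = ["zip", "dob", "sex"]
full = unique_fraction(records, fields)
coarse = unique_fraction([coarsen(r) for r in records], fields)
print(f"unique on full identifiers:      {full:.0%}")
print(f"unique after coarsening:         {coarse:.0%}")
```

In this toy set every record is unique on the full identifiers, and one in five remains unique after coarsening. That residual record is the kind an external source, such as a news story, could still single out, which is why a governance process would want this check rerun against current linkage techniques rather than performed once.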

The third is regulatory volatility. The HIPAA de-identification framework that underpins most current licensing practice was designed in a very different technological era. State laws are moving quickly to fill the gap. Washington’s My Health My Data Act took effect in March 2024; comparable statutes are now in force in Nevada and Connecticut, and similar proposals are pending in additional states.10

The answer to these risks cannot be avoidance. The market is real, the public-health benefits of well-governed real-world evidence generation are substantial,11 and the scientific case is even larger than the financial one: responsibly shared clinical data is how the field will generate evidence on rare diseases, underrepresented populations, and questions that randomized trials cannot answer alone. The answer is to build the governance infrastructure the practice requires before scaling the practice itself.

Some institutions have already begun this work, and their experience can shed light on what a functional framework looks like. It has five recognizable elements. The first is an institutional data governance committee that reviews commercial use against ethical, legal, and strategic criteria — modeled on an institutional review board but distinct in mandate, with explicit authority to verify that the consent and authorization basis for each proposed use is sound. The second is a standardized set of data use agreements that clearly define permitted and prohibited uses, rather than ad hoc negotiations that import inconsistent terms across deals. The third is a formal data access process adapted from the genomic data sharing community, which has spent two decades working out controlled access at scale.12 The fourth is layered technical safeguards, with de-identification protocols re-evaluated against current re-identification techniques and audit trails for every access event. The fifth is patient engagement that moves beyond the legal minimum toward genuine partnership — clear notice, meaningful channels for questions, and, where appropriate, broad consent frameworks that give patients real choice.
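Two of these elements, agreements that enumerate permitted and prohibited uses and an audit trail for every access event, lend themselves to a brief sketch. The Python below is a minimal illustration under stated assumptions: the class names, the purpose labels, and the deny-by-default rule are all hypothetical choices made for the example, not a description of any institution's actual system or of any standard agreement template.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class DataUseAgreement:
    """Hypothetical machine-readable summary of a data use agreement."""
    licensee: str
    permitted_uses: frozenset
    prohibited_uses: frozenset


@dataclass
class AuditTrail:
    """Append-only list of access decisions (in memory, for illustration)."""
    events: list = field(default_factory=list)

    def record(self, licensee, purpose, allowed):
        self.events.append({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "licensee": licensee,
            "purpose": purpose,
            "allowed": allowed,
        })


def request_access(dua, purpose, trail):
    """Allow only purposes the agreement lists as permitted; log every request."""
    # Assumed policy: anything not expressly permitted is denied, and an
    # express prohibition wins even if a purpose is also listed as permitted.
    allowed = purpose in dua.permitted_uses and purpose not in dua.prohibited_uses
    trail.record(dua.licensee, purpose, allowed)
    return allowed


dua = DataUseAgreement(
    licensee="Example Sponsor",  # fabricated name
    permitted_uses=frozenset({"model_training", "outcomes_research"}),
    prohibited_uses=frozenset({"re_identification", "resale"}),
)
trail = AuditTrail()

granted = request_access(dua, "outcomes_research", trail)
denied = request_access(dua, "re_identification", trail)
unlisted = request_access(dua, "marketing", trail)

for event in trail.events:
    print(event["purpose"], "->", "allowed" if event["allowed"] else "denied")
```

The design choice worth noting is that denied requests are logged alongside granted ones. A governance committee reviewing the trail would then see what licensees attempted, not only what they received.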

What makes this work genuinely hard is that it must cut across functions that are usually siloed — a privacy office here, a tech transfer office there, a research compliance committee somewhere else.13 The institutions that do this well have invested in cross-cutting governance bodies with real decision-making authority. That investment is non-negotiable where the stakes include patient privacy, institutional reputation, and research integrity.

The data exists. The market demands it. Science needs it. The question is whether this asset will be stewarded with the rigor the field brings to its other defining responsibilities, or whether institutions will discover too late that they treated a fiduciary obligation as a transaction.

Disclosure: Monica Jang, JD, CLP is Associate Director of AI Innovation and Data Strategy at Boston Children’s Hospital, where she works on licensing, data use, collaboration, consortium, and sponsored research agreements. The views expressed are the author’s alone and do not represent the positions of her employer. The author reports no financial conflicts of interest.

Endnotes

1. Fortune Business Insights. Real-world evidence solutions market size, share & industry analysis. 2025. Available from: https://www.fortunebusinessinsights.com/real-world-evidence-solutions-market-104252

2. Food and Drug Administration. Use of real-world evidence to support regulatory decision-making for medical devices: guidance for industry and Food and Drug Administration staff. Silver Spring (MD): FDA; 2025 Dec 18. Available from: https://www.fda.gov/regulatory-information/search-fda-guidance-documents/use-real-world-evidence-support-regulatory-decision-making-medical-devices. See also: FDA eliminates major barrier to using real-world evidence in drug and device application reviews [press release]. 2025 Dec 15.

3. Wornow M, Xu Y, Thapa R, Patel B, Steinberg E, Fleming S, et al. The shaky foundations of large language models and foundation models for electronic health records. npj Digit Med. 2023;6:135. doi:10.1038/s41746-023-00879-8.

4. Feist Publications, Inc. v. Rural Telephone Service Co., 499 U.S. 340 (1991) (holding that factual compilations lacking originality are not copyrightable). See generally Contreras JL, Reichman JH. Sharing by design: data and decentralized commons. Science. 2015;350(6266):1312–4.

5. Minssen T, Rajam N, Bogers M. Towards a paradigm shift in governing data access and related intellectual property rights in big data and health-related research. IIC Int Rev Intellect Prop Compet Law. 2019;50:1069–73. See also Piasecki J, Cheah PY. Ownership of individual-level health data, data sharing, and data governance. BMC Med Ethics. 2022;23:104.

6. Bioethics Today. Continued erosion of patient trust in electronic health records [Internet]. 2023. Available from: https://bioethicstoday.org/blog/continued-erosion-of-patient-trust-in-electronic-health-records/

7. Sweeney L. Simple demographics often identify people uniquely. Carnegie Mellon University, Data Privacy Working Paper 3. Pittsburgh (PA): Carnegie Mellon University; 2000.

8. Yoo JS, Thaler A, Sweeney L, Zang J. Risks to patient privacy: a re-identification of patients in Maine and Vermont statewide hospital data. Technology Science. 2018 Oct 8;2018100901.

9. Health Insurance Portability and Accountability Act, 42 U.S.C. § 1320d-6 (2024). See also U.S. Department of Health and Human Services. Guidance regarding methods for de-identification of protected health information in accordance with the HIPAA Privacy Rule. Washington (DC): HHS; 2012, updated 2022. Available from: https://www.hhs.gov/hipaa/for-professionals/special-topics/de-identification/

10. Washington My Health My Data Act, RCW ch. 19.373 (effective 2024 Mar 31); Nevada Consumer Health Data Privacy Law, SB 370 (2023); Connecticut Data Privacy Act, as amended by SB 3 (2023).

11. Sherman RE, Anderson SA, Dal Pan GJ, Gray GW, Gross T, Hunter NL, et al. Real-world evidence—what is it and what can it tell us? N Engl J Med. 2016;375(23):2293–7.

12. Shabani M, Dyke SOM, Joly Y, Borry P. Controlled access under review: improving the governance of genomic data access. PLoS Biol. 2015;13(12):e1002339. See also Global Alliance for Genomics and Health. Framework for responsible sharing of genomic and health-related data. Toronto: GA4GH; 2019.

13. Pellegrini VD, Guzick DS, Wilson DE, Evarts CM. Governance of academic health centers and systems: a conceptual framework for analysis. Acad Med. 2019;94(4):498–504.