On Tech Ethics Podcast – Understanding Big Health Data Research’s Unique Issues

Season 1 – Episode 8 – Understanding Big Health Data Research’s Unique Issues

This episode discusses some of the unique ethical, logistical, and regulatory issues associated with big health data research.


Episode Transcript


Daniel Smith: Welcome to On Tech Ethics with CITI Program. Our guest today is Sara Meeder, who is the Director of the Human Research Protections Office at the Maimonides Medical Center. Sara’s primary areas of expertise include ethics and regulatory compliance for big data, emerging technologies, social science, and biomedical research, and informed consent. Today we are going to discuss big health data research. Before we get started, I want to quickly note that this podcast is for educational purposes only. It is not designed to provide legal advice or legal guidance. You should consult with your organization’s attorneys if you have questions or concerns about the relevant laws and regulations that may be discussed in this podcast. In addition, views expressed in this podcast are solely those of our guest. And on that note, I want to welcome Sara to the podcast.

Sara Meeder: Hi. Thank you.

Daniel Smith: It’s great to have you. So I gave you just a brief introduction. Can you tell us a bit more about yourself and what you currently focus on?

Sara Meeder: Sure. So I’m at Maimonides Medical Center in Brooklyn, New York, and Maimonides is a healthcare network. We serve an incredibly diverse population, and we have a research profile that runs from simple resident retrospective chart reviews to big health data research and mobile health research. I am in charge of the institutional review board for Maimonides, and what I really kind of focus on right now is helping our administrators, researchers, and the public in our community to understand the ethics, regulations, and logistics involved in mobile health research and big health data research.

Daniel Smith: Wonderful. So on that note, since we’re here today to talk about big health data research, can you define that for our audience and can you also share an example of this type of research?

Sara Meeder: Absolutely. So big health data research, it kind of runs the gamut in terms of different types of studies, but big health data research involves the collection of large amounts of data for the purposes of research. The data typically is not generated specific to a study. It can be data collected from, say, the patients in a specific department in a hospital, or all the data from all the patients in a hospital, or all the patients in all the hospitals in a specific geographic region. It can also involve pulling in information, your kind of phenotypic data. So your information that you generate as a human being, all the bits of you that are out there, social media, your environmental data, data from any of your wearable devices, your internet searches, all that fun stuff can go into databases that are then used for the purposes of research.

And an example of big health data research that a lot of people may have heard of is the National Institutes of Health’s All of Us Project. So this is a project where their aim is to enroll 1 million people across the United States into a longitudinal study that involves collecting big health data. They are hoping that they manage to enroll a participant population that matches the diversity of the United States because one of the problems in research as a whole is that enrollment tends not to be very diverse, and so it’s not really representative of large parts of the population. So the All of Us Project collects medical record information, pharmacy data, and genetic testing. They have patient surveys throughout the course of this study, and they also collect information from wearable devices, environmental sensors, and all of that type of stuff. So that’s a pretty well-known example of big health data research.

Daniel Smith: So with all those different types of data and the amounts of data that they’re collecting in studies like All of Us, are there some unique ethical issues associated with that type of research?

Sara Meeder: Oh my, yes. So there are actually both ethical issues and logistic issues, and I’ll talk about the ethical issues, but I also want to talk briefly about the logistical issues. Because with any research, but particularly with big health data research, you can look at the ethical issues, you can minimize the risk, but if you don’t have the infrastructure to do the research or you have to make compromises in how the research is set up, that could potentially affect the outcomes, then that’s going to be a problem. With big health data research and mobile health research, it’s really important to think about the ethics, the regulations, and whether you can actually do the research.

So as far as ethical issues are concerned, there are a number of ethical issues. Key among them is that the quality of the research is only as good as the quality of the data that creates the database. So what that means is that when you’re doing these huge studies where you’re collecting data, say from a region’s healthcare networks like you would do if you were doing research using a health information exchange, that data is only as good as the healthcare networks that have put that data into the database.

So if you have hospitals where, say, they have a large part of their data that is actually still on paper, which some hospitals oddly enough have, then you’re losing that part of the data that goes into what makes the full record. And if you have a healthcare network that has issues with inaccuracies in their records, then that can cause a problem when you’re looking at doing a big health data research study. Because if you’re trying to figure out what’s going on with a certain disease or with a certain population in terms of overall health and wellness, and you’re looking at records that are not correct, then you’re not going to get a really good sense of what’s going on. So that’s one issue.

Another issue is that a lot of our records are not representative. So you can be pulling information for these studies and these databases that doesn’t include a large part of the population. Say you have records from a predominantly white middle class population and you’re trying to have research that has findings that would be applicable to non-white, non-middle class populations. It just doesn’t work well.

And the other big issue right now is … Well, there are a couple of issues. One is that we have kind of been going along the path to big health data research. And with a lot of these studies we’ve been thinking about them in terms of identifiability and saying, well, we’ve de-identified the data that goes into this database. So in theory, it’s not even human subject research because with human subject research, you have to have the intervention or interaction that’s systematically designed to contribute to generalizable knowledge, but you also have to have the human beings.

And historically when we have been looking at data that’s stripped of key identifiers like name, date of birth, social security number, that type of thing, we have said, okay, this is not data that contains information about human beings anymore. So it’s not human subject research. But, and this is not necessarily new, but people are just starting to really pay attention, almost everything is identifiable, especially now, especially with big health data research. It only takes, I think four or five days of [inaudible 00:08:28] to re-identify someone based on what they’ve put on their social media. Really everything is re-identifiable. So you can’t say, okay, this is minimal risk because nobody’s identifiable information is involved.

So that’s kind of on the ethical side of things. And also with big health data research, because so much of this research is trying to look at what’s going on with health and wellness for whole populations and looking at kind of trends and different groups of people, there’s a very real possibility of group harms. So when we think about human subject research and how we think about approving a research study, typically we’re thinking about the individual risk to a participant and kind of countering that with the potential benefit to the participant or the potential benefit to society. Which is kind of a weird thing because in our regulations we are told we can’t think about group harms.

And so we are allowed to think about group benefits when we’re thinking about approving a research study, but we can’t think about if a group involved in the research would be harmed either in the short term or in the longer term. And that’s a real problem with big health data research because there are findings that come out of this research that can have very real harms down the road for different groups and populations. So all of that’s on the ethics side of things.

On the logistics side, it’s really important when you’re thinking about big health data research to consider the quality of your data, and also to consider how your systems operate together. So we can have somebody who proposes a wonderful big health data research study that involves AI and all of the bells and whistles, and they want to use all the data from our healthcare network, but our healthcare network doesn’t have systems that speak to each other in that way. And so there’s no way to have all that data compiled into a data repository for this magnificent research. So you have to be aware of that and aware of what’s going to go into making the research work logistically when you’re thinking about these things.

Daniel Smith: You touched on a few issues that kind of get into the oversight of big health data research. But before we get into that, maybe we could just take a step back a moment and could you provide us with just a brief overview of the current regulatory framework for research involving human participants?

Sara Meeder: Sure. So there are levels of regulation that go into reviewing and approving human subject research. And there are different players that have different rules when it comes to what goes into human subject research. So if you are doing federally funded research, you have to follow federal regulations. If it is something that’s under the purview of the Food and Drug Administration, then you have to follow their regulations. If it’s under the purview of the Office for Human Research Protections, you have to follow their regulations. Their regulations are called the Common Rule, so it’s 45 CFR 46. And then underneath that you may have state or local regulations that layer on top of the federal regulations or take the place of the federal regulations. And then you have institutional policy. These are for organizations that have to have institutional review boards or ethics review boards, so those are the ones that are like your academic medical centers, your healthcare networks, that type of thing. We all have to follow those sets of regulations.

Private corporations, social media networks, that type of thing, they don’t have to follow those federal regulations. They do have state policies that they have to follow, but they’re not under the purview of OHRP or sometimes the FDA, and so they can go ahead and choose their own route to how they’re looking at the ethics of research.

Daniel Smith: That’s really helpful. So when you were thinking about all of the unique ethical and logistical issues that you mentioned previously, how do you think big health data research fits into that current regulatory framework? And similar to an earlier question, does it raise any unique issues that we should be thinking about?

Sara Meeder: Up until fairly recently, there was a lot of discussion and sometimes argument about whether big health data research, especially the big health data research with de-identified data, even fell under the purview of an institutional review board or required ethical review because it was considered not to involve human beings. And so therefore it didn’t have the same regulatory requirements or even a perceived need for ethical considerations. Because of that, we started seeing in the media reports of big health data research that were causing harms or that were being done in a manner that wasn’t necessarily ethical.

And so that caused this kind of clash between the people who wanted to have some sort of ethical and regulatory oversight over big health data research and those who really felt like that was stretching the mission of the institutional review boards or the ethical review boards. I think now you can’t really make the argument that any of this is not human subject research because if it comes from human beings and if almost everything is identifiable, you’ve got your human beings and you can’t pretend you don’t. So it really does mean that this is something that needs to go before an institutional review board. So that for me does mean that it falls squarely within the regulatory and ethical confines that we think about for regular research.

Now, not everybody has to follow the federal regulations. In New York, we do because the state mandates that we follow the Common Rule regardless of the funding for the research, for good or ill. But when that’s the case where you have to follow the federal regulations, it becomes a little bit problematic with big health data research because the federal regulations specifically say that you should not consider possible long-term negative effects when you are reviewing research as an institutional review board. It says that in the regulations, and my research on this has kind of indicated that the reason for that is that it was felt that you can’t necessarily predict what the long-term outcomes would be in, say, a traditional biomedical study where you’re testing a drug.

But the problem with big health data research is that you can sometimes see that there’s going to be potential negative outcomes. You may not know exactly what they are, but if you know that your data may be flawed or if you know that the proposed research is looking at something going on in a specific vulnerable population, then you can pretty much predict that there could be possible long-term negative outcomes. And that has to ethically go into your assessment of whether the study should be approved or if it needs mitigation to minimize the risk of those long-term outcomes.

And for quite some time, that was a comment that would’ve caused huge arguments at any conference I attended. People were very conflicted about it because there are people who read the regulations and they want specifically what the regulations say; they read the “should” as “do not,” and say that IRBs can’t look at possible long-term risk to groups or populations. But I just don’t see how we can avoid it now. And it does seem like more and more I’m hearing people say, “Yeah, this is a problem with the Common Rule, that it’s saying you can’t look at these outcomes. We have to look at these outcomes.” So that’s, I’d say, one of the biggest things that’s a regulatory issue at this point for people who have to follow the federal regs with these types of studies.

Daniel Smith: Absolutely. So I guess going off of that a bit, do you have any recommendations for IRB folks as they kind of navigate that issue?

Sara Meeder: Well, yes, navigate it, don’t look away from it. I think one of the things that has always been discussed as a potential issue with having IRBs really looking at these types of studies is the possibility that the IRBs don’t have the expertise to do that. So I’d say one of my biggest recommendations for IRBs is if you feel like you don’t have the expertise on your committee to look at these types of studies, reach out to people you know are experts and ask them if they can help or if they can recommend tools so that you can teach your committee members and learn how to look at these types of studies. It’s complex, but so is cancer research, and we have to learn that so we just have to learn this too.

There are a lot of resources out there. I mean, you could just go ahead and search the internet on big health data research. You’re going to get all sorts of articles that talk about the potential ethical issues and all of that, and also how IRBs could start thinking about these things. And then just kind of review it the same way you would normal research. If there’s a risk, think about it, think about how bad that possible outcome could be if that risk does happen, and act accordingly.

Daniel Smith: Certainly, I think that’s very helpful advice. And among those resources out there, are there any specific ones that you recommend that our listeners could check out?

Sara Meeder: Well, CITI has modules on wearable health research, big health data research, I believe. So that’s a good place to go for that. And there are different groups out there who are looking specifically at digital health, which includes this realm. So the Digital Medicine Society, groups like that. Some of the other large groups, like Public Responsibility in Medicine and Research, also have resources, and there are resource toolkits that can help with review of big health data and also artificial intelligence studies. So there’s a lot of information out there that I think is very useful. And just reading as much as you can get your hands on really.

Daniel Smith: Thank you, Sara, and I will definitely include links to some of those resources in our show notes so our listeners can learn more. And on that note, Sara, do you have any final thoughts you would like to share that we did not already touch on?

Sara Meeder: Ooh, yes. I just wanted to address this. This is something I’ve been thinking about for quite some time in terms of big health data research. For as long as I’ve been in this field, people have referred to the regulatory and ethical landscape for big health data research as the Wild West. And I started in this field a while ago now. We’re talking years. I think it’s time that we start reaching for something better and try and find a way to refer to the regulatory and ethical landscape that doesn’t involve the Wild West. We can do better. We actually are starting to look at these regulations. We understand more of the ethics. Let’s find a different term. I would say that’s my biggest thing right now.

Daniel Smith: I think that’s great, and I think that’s a great place to leave our conversation for today. So thank you again, Sara, and thanks to all our listeners for tuning in for today’s conversation. As Sara mentioned, we have quite a few resources out there related to this conversation that you may find interesting. So I invite you to visit to learn more about our courses that address big data research ethics. For instance, we have a new course on big data and data science research ethics, a course on tech and ethics that discusses wearable technology, AI, robotics, biometrics, and other technologies in research and healthcare. And we also have content on human subjects considerations for big data research. And with that, I want to thank Sara again, and I look forward to bringing you all more conversations on all things tech ethics.


How to Listen and Subscribe to the Podcast

You can find On Tech Ethics with CITI Program available from several of the most popular podcast services. Subscribe on your favorite platform to receive updates when episodes are newly released. You can also subscribe to this podcast by pasting “” into your podcast apps.


Meet the Guest

Sara Meeder, CIP – Maimonides Medical Center

Sara Meeder is the Director of Human Research Protections at Maimonides Medical Center in Brooklyn, New York. She specializes in the intersection between research ethics, regulations, infrastructure, and wearable technology.


Meet the Host

Daniel Smith, Associate Director of Content and Education and Host of On Tech Ethics Podcast – CITI Program

As Associate Director of Content and Education at CITI Program, Daniel focuses on developing educational content in areas such as the responsible use of technologies, humane care and use of animals, and environmental health and safety. He received a BA in journalism and technical communication from Colorado State University.