
Rochelle Tractenberg

Technology is not my “thang,” but communication kind of is. I am a researcher, but I really spend most of my time writing about biomedical science; teaching and learning in higher education; measurement considerations; and ethical practice in research, statistics, and data science. I wanted to create an accessible, community-focused forum for effective communication about what makes, or contributes to, “ethical practice of statistics and data science.” It has to be accessible in all dimensions: free, sharable (FAIR), understandable, and applicable to daily life/work. But it also has to be accessible to people who don’t necessarily see themselves as “statisticians” or “data scientists.”

I have argued for a decade that everyone who uses statistics or data science (or data!) has obligations to follow ethical guidelines for statistical practice, such as those of the American Statistical Association, Association for Computing Machinery, International Statistical Institute, Royal Statistical Society, or another professional body.

The ethical practice standards these organizations have articulated are intended for all practitioners, but just reading—or being aware of—the ethical practice standards is not enough to support ethical practice. All training or instruction in and around statistics and data science can be bolstered with support for ethical practice, whether initiated by the instructor or the self-directed learner. However, this could be challenging for someone who is starting from scratch.

Why a Blog About Ethical Practice?

Depending on your role, or the role of statistics and data science in your work or life, what it means to practice ethically will likely vary considerably. My hope is that this blog will help raise awareness of the ethical obligations everyone has whenever they use statistics, data science, and/or data. Beyond that, I hope this blog and site will spark conversations about ethical practice, helping to normalize people’s interest in it and making it easier for them to find useful resources that promote it.

Right now (July 2021), it seems people say, “there should be more ethical statistics” or “there should be more ethical data science,” but the next steps—how to be an ethical practitioner and what resources are useful for promoting ethical practice, for example—are hard to find.

Ethical guidelines have been developed over several decades to support ethical professional practice, including the application of tools, techniques, and methods from both statistics (American Statistical Association, ASA, 2021) and computing (Association for Computing Machinery, ACM, 2018). I hope to do the following:

  • Find and share useful resources (like these ethical practice standards)
  • Find and share help for integrating these into courses or training
  • Enable all instructors who teach researchers to use statistics and data science to also teach how to use this disciplinary knowledge ethically
  • Encourage any self-directed learner to access material on the ethical practice of statistics and data science

Professional ethical standards that can be leveraged to inform what constitutes ethical data science are not rules, but guidance. The ability to explain/justify why one or another ethical guideline or principle pertains to a situation or should be prioritized over another is the minimum required for engaging in “ethical data science practice.”

Similar complexity arises when assessing the potential harms and benefits for stakeholders, especially when these conflict. Thus, to be useful and effective, applying these practice standards requires judgment, and there must be training for that judgment to develop. On-the-job experience alone does not suffice.

Why Me?

I have been a practicing biostatistician working on federally funded biomedical research since 1997. I earned PhDs in cognitive sciences (1997) and statistics, measurement, and evaluation (2009) and have been training university-level instructors and providing faculty development around teaching and learning since 1993. I am actively engaged in research in basic science and rehabilitation medicine and have published consistently in ethical research training and ethical practice in biomedical research since 2012.

Throughout my 19 years at Georgetown University, I have developed and taught courses in ethical reasoning for scientists, as well as workshops on the ethical practice of statistics and data science for national and international researchers. I served as vice chair and chair of the American Statistical Association Committee on Professional Ethics, and I chaired or co-chaired the working groups that revised the ASA Ethical Guidelines for Statistical Practice in each of the three revision projects (2014–2016; 2018; 2021). Additionally, I have written two books about ethical reasoning using the ASA and ACM ethical practice standards.


These experiences have strengthened my commitment to both promoting the ethical practice of statistics and data science and sharing instructional and mentoring resources that can be used to effectively integrate ethical training into any course; program of courses; or curriculum where statistics, data science, or data are contemplated. You shouldn’t have to start from scratch if I can help at all!

More importantly, the possibly daunting task of finding suitable material/instructional approaches should not limit your interest in, or willingness to integrate, ethical training into your statistics or data science teaching.

Why Now?

In August 2021, the world is in the process of returning to “normal” after months or years of COVID-19–enforced isolation. Instructors across the world are reviewing their teaching and materials as they consider returning to face-to-face teaching, and workplaces are adjusting to diverse models of remote work and other strategies that could create new opportunities for teams (and supervisors) to engage in meaningful conversations about the ethical dimension of their work. In addition, the isolation has led many people to depend more on computing, generating massive amounts of data that could potentially be obtained and analyzed inappropriately, unethically, or even illegally.

Beyond the inappropriate, unethical, and illegal data mining happening all over the world, many governments and businesses/employers are initiating data-collection efforts to help ensure people are staying safe and can be contacted for epidemiologic and surveillance purposes.

Against this backdrop, a multitude of colleges, universities, and other organizations are preparing to offer, or are launching, new training programs in data science. The National Academies of Sciences, Engineering, and Medicine recommended that ethics be incorporated into undergraduate data science curricula, but they did not discuss how this could or should be accomplished.

There are many reasons why people are interested in the ethical practice of statistics and data science. This blog is my attempt to support these people!