A course using this book could (therefore) be useful for both the initiation of professional identity development and inculcation of ethical practice habits in statistics and data science, as well as satisfying current U.S. federal requirements for training in research ethics.
Rochelle Tractenberg
Ethical Reasoning for a Data-Centered World features a six-step paradigm for ethical reasoning and focuses on what people do and the choices they make, whether or not they realize/recognize/acknowledge them, during practice with data. This is true for statisticians, data scientists, people doing AI/machine learning and algorithm building, and those who just use these techniques and technologies in the course of their other work. Moreover, the book shows how to reason and practice ethically with professional practice standards from the Association for Computing Machinery and American Statistical Association. The reasoning paradigm can be applied to any practice standard, regulation, or law, but the ASA and ACM guidelines are specifically suited to helping any practitioner at any level promote a more ethical data-centered world.
This book could be used in a philosophy department (as applied ethics); in departments or programs in statistics or data science; and in math, computer science, education, or any other discipline’s courses encouraging or teaching students how to use data, statistics, and/or computational statistical methods in any way. Ethical reasoning is a universally useful component of critical thinking and other “general education” priorities across college and university curricula worldwide.
Rationale and Scope
According to the National Academies of Sciences, Engineering, and Medicine, “data science spans a broad(er) array of activities that involve applying principles for data collection, storage, integration, analysis, inference, communication, and ethics.”
With respect to “ethics,” the Committee on Envisioning the Data Science Discipline “… underscores the centrality of studying the many ethical considerations that arise as workers engage in data science. These considerations include deciding what data to collect, obtaining permissions to use data, crediting the sources of data properly, validating the data’s accuracy, taking steps to minimize bias, safeguarding the privacy of individuals referenced in the data, and using the data correctly and without alteration.” The committee maintains it is important for students to recognize ethical issues and apply a high ethical standard.
This echoes the 2013 undergraduate guidelines for computer science curricula: “Curricula must prepare students for lifelong learning and must include professional practice (e.g., communication skills, teamwork, ethics) as components of the undergraduate experience.”
Two of the 2018 NAS action items follow directly from the committee’s focus on the centrality of ethical professional practice in data science:
- Recommendation 2.4: Ethics is a topic that, given the nature of data science, students should learn and practice throughout their education. Academic institutions should ensure that ethics is woven into the data science curriculum from the beginning and throughout.
- Recommendation 2.5: The data science community should adopt a code of ethics; such a code should be affirmed by members of professional societies, included in professional development programs and curricula, and conveyed through educational programs. The code should be reevaluated often in light of new developments.
Data science is a discipline that has emerged at the intersection of computing and statistics, two disciplines with long-standing guidance for ethical practice that feature professional integrity and responsibility. The American Statistical Association and Association for Computing Machinery recently revised their professional ethical practice standards (ACM in 2018; ASA in 2022). Both sets of guidance represent the perspectives of experienced professionals in their respective domains, but both organizations explicitly state the guidelines apply to and should be used by all who employ the domain in their work, irrespective of job title or training/professional preparation. Given that both statistics and computing are essential foundations for data science, their ethical guidance should be a starting point for the community as it contemplates what “ethical data science” looks like.
Ethical Reasoning for a Data-Centered World offers an alternative/solution. Organized around the following seven easily recognized tasks, the book prioritizes ethical and professional behavior (featuring the ASA and ACM practice standards) when using statistics and computing, rather than the practice-neutral concepts on traditional topics lists for “training in responsible conduct of research”:
- Planning/Designing
- Data Collection/Munging/Wrangling
- Analysis (Perform or Program to Perform)
- Interpretation
- Reporting Your Results/Communication
- Documenting Your Work
- Engaging in Team Science/Team Work
This book and its companion, Ethical Practice of Statistics and Data Science, are the first books to focus on how to use the ASA and ACM practice standards, although ethical reasoning is independent of these guideline documents. The books are the first and only to integrate ethical reasoning, ideas of ethical practice, and a sense of responsibility for the application of the tools relating to data (e.g., statistics, data science, computational statistics, business analytics, etc.).
Whether the reader wants to demonstrate their engagement with these practice standards; learn how to be ethical in their practice of statistics, data science, or computing; or teach the standards to undergraduates, graduates, or mentees, this book supports all these goals.
Ethical Reasoning for a Data-Centered World offers a six-step ethical reasoning paradigm at its core, featured in Sections 1 and 3. However, “ethical reasoning” is a process for identifying an ethical problem and responding to it, which is a traditional approach to training in ethics or “responsible conduct of research.” By contrast, professional practice standards exist to enable ethical behavior on a daily basis.
Training, especially in the United States paradigm for “responsible conduct of research,” typically does not provide guidance in “how to do your statistics/data science ethically.” This book features that prominently through the focus, in Section 2, on how to use the ASA and ACM ethical practice standards in the seven daily tasks any statistician and/or data scientist might perform. There is some attention to the full six-step ethical reasoning process, with one case per task following all six steps (Section 3).
This book includes suggestions for alignment of ethical reasoning and the seven tasks with key topics required for training in responsible conduct of research (in the United States) and discussion questions tying the seven tasks to these topics. A course using this book could therefore be useful for both the initiation of professional identity development and inculcation of ethical practice habits in statistics and data science, as well as satisfying current U.S. federal requirements for training in research ethics.
Suggestions for integrating this material can be found in “Ten Simple Rules for Integrating Ethics into Statistics and Data Science Instruction, 2E.”
Table of Contents
List of Cases | ix |
Introduction | xi |
Section 1. Background and Introduction | |
Chapter 1.1 Introducing Frameworks for Ethical Practice in Statistics and Data Science | 1 |
Chapter 1.2 Ethical Reasoning – Learnable, Improvable Knowledge, Skills, and Abilities (KSAs) | 17 |
Chapter 1.3 ASA Ethical Guidelines for Statistical Practice | 22 |
Chapter 1.4 The ACM Code of Ethics and Professional Conduct | 48 |
Chapter 1.5 Prerequisite Knowledge That Is Common to Both ACM CE and ASA GLs | 69 |
Chapter 1.6 Stakeholder Analysis and the Utilitarian Decision-Making Framework | 87 |
Chapter 1.7 Returning to Ethical Reasoning: Summarizing KSAs 1&2 | 97 |
Chapter 1.8 Aligning Prior NIH/NSF Training to Promote Ethical Quantitative Practice | 106 |
Chapter 1.9 Identify or Recognize the Ethical Issue: KSA 3 | 125 |
Chapter 1.10 Identify Alternative Actions (on the Ethical Issue): KSA 4 | 135 |
Chapter 1.11 Make and Justify Decision and Reflect on That Decision: (KSAs 5-6) | 147 |
Section 2: Establishing Familiarity with ASA and ACM Principles/Elements as They Relate to the Seven Tasks of the Statistics and Data Science Pipeline: Anticipating What Problems May Arise | |
Chapter 2.1 Introduction to Section 2 | 163 |
Chapter 2.2 Planning/Designing | 165 |
Chapter 2.3 Data Collection/Munging/Wrangling | 178 |
Chapter 2.4 Analysis (Perform or Program to Perform) | 193 |
Chapter 2.5 Interpretation | 210 |
Chapter 2.6 Documenting Your Work | 224 |
Chapter 2.7 Reporting Your Results/Communication | 240 |
Chapter 2.8 Engaging in Team Science/Work | 255 |
Chapter 2.9 Summary of ASA and ACM Guidance on Six Tasks Plus Teamwork | 277 |
Section 3: Ethical Reasoning Using ASA and ACM Principles/Elements: Case Vignettes | |
Chapter 3.1 Introduction to Section 3 | 281 |
Chapter 3.2 Planning/Designing | 288 |
Chapter 3.3 Data Collection/Munging/Wrangling | 298 |
Chapter 3.4 Analysis (Perform or Program to Perform) | 310 |
Chapter 3.5 Interpretation | 321 |
Chapter 3.6 Documenting Your Work | 331 |
Chapter 3.7 Reporting Your Results/Communication | 340 |
Chapter 3.8 Engaging in Team Science/Work | 355 |
Chapter 3.9 Embracing Your Inner Ethical Practitioner: Engaging in Open Conversations | 367 |
Chapter 3.10 Summary of Section 3 and the Book: Career-Spanning Engagement in Professional and Ethical Practice | 370 |
References | 378 |