Ethics in Data Science

Ethics in data science is a complex and ever-evolving discussion. This section’s goal is to start you thinking about ethics in relation to data science, not to provide a set of rules or guidelines that must be followed on every data science project. As with any topic, the Data Mine team encourages you to research further on your own and to learn more about ethics in data science.

Ethics for Data Security

Often in a corporate setting, the first conversation around ethics in data science concerns data security. This is usually closely aligned with a company’s data governance policies. Strict and evolving laws, such as the European Union’s General Data Protection Regulation (GDPR), have brought data security to the forefront for many corporations.

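The penalties for violating these laws are often calculated as a percentage of the company’s revenue, which means that they can run into the hundreds of millions or even billions of dollars. Under GDPR, for example, the most serious violations can be fined up to 4% of a company’s worldwide annual turnover or 20 million euros, whichever is higher. More importantly, these laws are designed to keep users safe and to ensure that they know how their data is being used.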
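To make the scale of these penalties concrete, here is a minimal sketch of the upper-tier GDPR cap (Article 83(5)), which is the greater of 20 million euros or 4% of total worldwide annual turnover. The function name and the turnover figures are hypothetical and chosen only for illustration; actual fines are set case by case by regulators and are often well below the cap.

```python
# Upper-tier GDPR cap: the greater of EUR 20 million or 4% of
# total worldwide annual turnover for the preceding financial year.
FLAT_CAP_EUR = 20_000_000
TURNOVER_PERCENT = 4


def max_gdpr_fine(annual_turnover_eur: int) -> int:
    """Return the maximum possible upper-tier fine in euros (hypothetical helper)."""
    percent_cap = annual_turnover_eur * TURNOVER_PERCENT // 100
    return max(FLAT_CAP_EUR, percent_cap)


# Hypothetical turnover figures, not real companies.
for turnover in (100_000_000, 10_000_000_000, 50_000_000_000):
    print(f"turnover EUR {turnover:>14,} -> cap EUR {max_gdpr_fine(turnover):>13,}")
```

For the smallest hypothetical company the flat 20 million euro cap applies, while for the larger ones the percentage-based cap dominates.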

As a data practitioner, it is your job to think about how the data is being used, where it is being stored, and how it should be removed if the need arises. As with protection against viruses, everyone is involved in securing user data. This is even more important for data practitioners, as you may have access to data that an average user would not be allowed to see.

It is most important to follow the security policies of the company that you work for. However, it can also be helpful to ask yourself some of the questions below:

  • Could this data be easily accessed by a 3rd party that I don’t know? (Roommate, coffee shop stranger, etc.)

  • Is the data personal? (Address, Name, Health Conditions, Income, etc.)

    • If it is, can I ensure that I can remove all traces of it when needed?

    • Is there a way to anonymize the data?

  • Are the intended uses of the data documented?
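One of the questions above asks whether the data can be anonymized. The following is a minimal sketch of a first step in that direction: dropping direct identifiers and replacing a user ID with a keyed hash. The field names (`user_id`, `name`, `address`) and the `pseudonymize` helper are assumptions made for illustration. Note that this is pseudonymization rather than true anonymization: the remaining fields (income, health conditions, and so on) may still allow someone to be re-identified, so your company’s security policies still take priority.

```python
import hashlib
import hmac

# Hypothetical field names; adjust to match your own data.
DIRECT_IDENTIFIERS = {"name", "address"}


def pseudonymize(record: dict, secret_key: bytes, id_field: str = "user_id") -> dict:
    """Drop direct identifiers and replace the ID with a keyed SHA-256 hash."""
    # Remove fields that identify a person on their own.
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # A keyed hash (HMAC) keeps the same user linkable across records
    # without exposing the original ID to anyone who lacks the key.
    raw_id = str(record[id_field]).encode("utf-8")
    cleaned[id_field] = hmac.new(secret_key, raw_id, hashlib.sha256).hexdigest()
    return cleaned


# Made-up example record.
record = {"user_id": 1042, "name": "Jane Doe", "address": "1 Main St", "income": 52000}
safe = pseudonymize(record, secret_key=b"store-this-key-securely")
print(safe)
```

The key must be stored separately from the data. Anyone who holds both can link the tokens back to the original IDs.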

Ethics for User Perception

Data practitioners work in an interesting space. Analytics is already integrated into many aspects of human life. Product suggestions, driving directions, and the food stocked in stores are all largely driven by analytics. At the same time, the public is generally distrustful of analytics (AI) and wary of new applications.

A common saying in the analytics world is "don’t be creepy." While the original intention of this statement was good, the perspective around it has changed a bit, and we’ll discuss it in a larger context in the next section. For now, the important point is to think about how your customer base will perceive your analytics before you build them.

One common industry example is modeling for employee retention. These models use different factors from the company and the general workforce to attempt to predict whether someone is going to leave the company. Often these conversations start with good intentions: if it’s known why employees leave, then we can address those issues and help improve employee happiness. However, as soon as someone looks at the same prediction and says "I need to make cuts and they show as likely to leave" or "I would hire them, but they show a high probability of leaving soon," the ethics become much less clear.

It’s important to think through these different situations and decide how they align with your values as a practitioner. There is no single solution, but it’s our job to consider the options as part of our work.

Ethics for Society

As the conversation around ethics in data science continues to evolve, the discussion has started to focus on even larger systemic issues. Building on the section above, the question has shifted from "is this creepy" to "does this cause harm". In this case, harm has no clear definition. It could be physical, mental, emotional, spiritual, or something else.

In addition, predictive models are, at their core, propagators of patterns. They attempt to learn past patterns based on different criteria, and then make predictions based on those patterns and the criteria that they are fed in the future. If a pattern contains systemic injustice, bias, or racism (often more than one of these), the model will not decide whether or how that pattern should be propagated. It will simply reproduce it. When combined with the decision-making power and fast automation that models often have, this can make models powerful propagators of extraordinarily detrimental trends.
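The idea that a model simply propagates whatever pattern it is given can be illustrated with a deliberately tiny sketch. The "model" below only learns the most common past outcome for each group, and the history is synthetic, made-up data in which group B was mostly denied. The function name and the groups are hypothetical, and real models are far more complex, but the mechanism is the same: the model has no way to ask whether the past pattern was fair.

```python
from collections import defaultdict


def fit_majority_by_group(history):
    """Learn, for each group, the most common past outcome (a toy 'model')."""
    counts = defaultdict(lambda: [0, 0])  # [times denied, times approved]
    for group, approved in history:
        counts[group][int(approved)] += 1
    # Predict approval only if approvals outnumbered denials in the past.
    return {group: c[1] > c[0] for group, c in counts.items()}


# Synthetic history: past decisions favored group A over group B.
history = (
    [("A", True)] * 80 + [("A", False)] * 20
    + [("B", True)] * 30 + [("B", False)] * 70
)

model = fit_majority_by_group(history)
print(model)  # every future member of group B is predicted "denied"
```

Nothing in the code checks why group B was denied in the past. The historical imbalance becomes the rule that is applied automatically to every future case.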

The members of the Data Mine team are not experts in this area of research. However, we feel that it is an important point to discuss as you build skills focused on analytics. These discussions have far-reaching impacts in areas such as law enforcement, healthcare, education, and many others. We encourage you to learn more and share in discussions on these important topics.