While Data Science, like Software Engineering, is among the professions that are very friendly to remote working, face-to-face interaction is still highly valued as a way to exchange ideas. As the COVID-19 pandemic pushes us to transition to being fully remote, it is crucial that our productivity remain unaffected.

One way for me to maintain my Data Science team’s pre-pandemic productivity level is to improve the workflow of tasks that require collaboration between different team members. While searching for inspiration online, I could not find any article that fit. In fact, the article I found most helpful was about 7 Guidelines for Delegating Tasks to Employees.

However, it is still only a partial fit. So I decided to put my thoughts into a short article as a reference for my future self and anyone who would find this useful. This article will provide a simple, yet effective, framework for task delegation in a data science team, as well as examples to help you put it into your day-to-day workflow.

Why delegation in a data team?

First of all, why do we need to delegate at all? Many data scientists are not familiar with delegation because, from their perspective, most of their work is so complicated or domain-specific that it is faster to do it themselves than to explain it to someone else. Similarly, data analysts often think the analysis is just “a few lines of code away” and end up doing all the work themselves. That is true for some types of tasks in some specific contexts, but most of the time it is not. In an industry environment, unlike in academia, data scientists and data analysts do not have months to build and tune a machine learning model or to produce an analysis. The value of the output often depends on whether it is delivered on time. In addition, some data products require different skill sets to perform exploratory data analysis (EDA), build models, deploy to production, and present the results to stakeholders. Thus teamwork, and more specifically delegation, is becoming more and more important to the success of a data science team.

Who delegates to whom in a data team?

When delegation is mentioned in a traditional working environment, people often think of a supervisor delegating a task to a subordinate. However, in a product team, where the creativity of the solution and the time to production are valued as much as getting the job done, the traditional view no longer holds. The “delegator” is not necessarily the team leader: it is perfectly normal for team members to delegate tasks to each other to make the best use of the whole team’s resources. In the same way, the “delegatee” could be anyone who has the capacity and a suitable skill set for the task. For example, suppose you are a new data scientist and your team needs to build a prototype of a model quickly, but you are not very familiar with the data yet. In that case, delegating part of the EDA, or even the entire pre-processing pipeline, to an experienced data analyst would be a great way to get the job done on time and help you hit the ground running.

What to include in a delegation?

Finally, we get to the important question! Most of the time, delegation fails because of insufficient communication between team members. When you delegate a task to somebody else, it is important that you give that person enough information to accomplish it. That means you should already have spent time digging into the requirements and thinking about what needs to be done, what you will do yourself, and how your co-worker can help.

Generally, you should provide the following:

  • Context: Why is this task important? An easy way to provide it is with an “As a … I want … So that …” statement (similar to writing a user story).
  • Objectives: What should this task achieve? Usually there is more than one objective, so it is important that the delegatee know all of them.
  • Methodology (suggestion): Your ideas, if any, of how the task could be conducted.
  • Expected output: What is the definition of done for the task? What form should the result take?
  • Timeline: When will the task need to be partially or completely finished? This should be based on other parts of the bigger picture that your team is building. It is also pivotal for you to check in regularly with your co-worker along those milestones to make sure things are moving in the right direction.
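
The five items above can also be kept as a small structured record, so that an incomplete brief is easy to spot before it is handed over. The following is only a minimal sketch: the class name `DelegationBrief`, its field names, the `missing_items` helper and the sample text are all hypothetical, and methodology is treated as optional because it is only a suggestion.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List


@dataclass
class DelegationBrief:
    """Hypothetical container for the five items of a delegation brief."""
    context: str = ""
    objectives: List[str] = field(default_factory=list)
    methodology: List[str] = field(default_factory=list)  # suggestions only
    expected_output: List[str] = field(default_factory=list)
    timeline: Dict[str, str] = field(default_factory=dict)  # milestone date -> deliverable

    def missing_items(self) -> List[str]:
        """Return the names of the required items that are still empty."""
        optional = {"methodology"}
        return [
            f.name
            for f in fields(self)
            if f.name not in optional and not getattr(self, f.name)
        ]


# Made-up example: context and objectives are filled in, the rest is not.
brief = DelegationBrief(
    context="As a team lead, I want a churn overview so that we can prioritise retention work.",
    objectives=["Identify the customer segments with the highest churn"],
)
print(brief.missing_items())  # ['expected_output', 'timeline']
```

A check like this does not replace the conversation with your co-worker; it only makes gaps in the brief visible early.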

Apart from the timeline, which is universal to any type of collaboration, the other items can be described more specifically in the context of a Data Science team. Broadly speaking, there are two main types of tasks: analysis tasks and modeling tasks. Below are a few guiding questions on what to include in the description you give your colleagues for each type.

A. Analysis task

Context

  • What stakeholders are related to this analysis? What are their problems/agendas?
  • How is this analysis useful in the bigger picture? How will the results be leveraged by others?

Objectives

  • What question(s) should the analysis answer?
  • What relationship should the analysis discover?

Methodology

  • What type of segmentation could be used?
  • What metrics or types of metrics might carry hidden insights?
  • What data sources should be looked into?

Expected output

  • Format of the output: Spreadsheets, slide decks, a visualization, a table on the company’s database.
  • More details: Any particular types of visualization, Schema of the output table in the database.

B. Modeling task

Context

  • What business problem will the model help solve?
  • How will the model be used if one day it gets to production?

Objectives

  • What metrics will be used to measure model performance? What is the goal for the model (e.g. roc_auc > 0.85)?
  • How will the model be measured in a production environment (e.g. response rate)?
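
An offline goal such as roc_auc > 0.85 is easier to agree on when both sides can check it the same way. Here is a dependency-free sketch of such a check. The function names `roc_auc` and `meets_goal` and the toy labels and scores are made up for illustration; in practice a library routine such as scikit-learn's `roc_auc_score` would normally be used instead.

```python
def roc_auc(y_true, y_score):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def meets_goal(y_true, y_score, target=0.85):
    """Check the offline goal from the brief (0.85 is the example threshold used above)."""
    return roc_auc(y_true, y_score) > target


# Toy labels and scores, for illustration only.
y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
print(roc_auc(y_true, y_score), meets_goal(y_true, y_score))
```

The pairwise loop is quadratic in the number of examples, which is fine for a quick sanity check but not for large evaluation sets.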

Methodology

  • Algorithms that may be suitable for the job with the explanation of your reasoning
  • Features that, in your opinion, may have good predictive power
  • Hyper-parameters that your co-workers could focus on tuning (again, explain your reasoning)
  • Any pitfalls that you foresee that should be avoided

Expected output

  • Just a prototype in a notebook, or a fully functioning, deployable pipeline?
  • Any log / documentation of the model performance that should be included?

Example

To help you integrate the framework above into your team’s workflow, below is an example of the information needed to delegate a modeling task.

Context

As a business owner

I would like to find more customers for one of our rising insurance products, “Unicorn’s Hope”

So that we can target those customers and increase sales.

If the effectiveness of the model is proven, it will be used to identify potential customers in the recurring monthly campaign.

Objectives

  • Offline evaluation metric: roc_auc; f1_score.
  • Online evaluation: Response rate to the campaign. “Response” means the customer responds to the advertisement and ultimately buys the insurance policy.

Methodology

  • Since we do not have any marketing campaign data for this particular product, you might want to use current policyholders of “Unicorn’s Hope” as a proxy for the label.
  • Also, because the number of labels is limited at the moment, please take a look at some semi-supervised techniques such as label spreading.
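
For a delegatee who has not used label spreading before, a short starting point can be attached to the brief. The sketch below uses scikit-learn's `LabelSpreading` on synthetic data. Everything about the data is an assumption: the features are generated with `make_classification`, only 30 of 300 rows keep their label, and both classes are assumed to appear among the labeled rows (for example, policyholders plus some customers known not to be interested). The `n_neighbors=7` setting is an arbitrary choice, not a recommendation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.semi_supervised import LabelSpreading

# Synthetic stand-in for customer features; real data would come from the team's own sources.
X, y_full = make_classification(n_samples=300, n_features=8, random_state=0)

# Keep labels for a small subset only; scikit-learn marks unlabeled rows with -1.
rng = np.random.default_rng(0)
y = np.full_like(y_full, -1)
labeled_idx = rng.choice(len(y_full), size=30, replace=False)
y[labeled_idx] = y_full[labeled_idx]

# Spread the few known labels to nearby rows through a k-nearest-neighbours graph.
model = LabelSpreading(kernel="knn", n_neighbors=7)
model.fit(X, y)

# Estimated weight of the positive class for every row, labeled or not.
positive_score = model.label_distributions_[:, 1]
top_customers = np.argsort(positive_score)[::-1][:10]  # indices of the 10 highest-scoring rows
print(top_customers)
```

The ranked indices correspond to the second expected output of the task, a list of likely customers; how well the proxy label works on real data is something the delegatee would still need to validate.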

Expected output

  • An ML model ready to deploy to production with reports on its offline performance.
  • A list of the most promising potential customers for “Unicorn’s Hope”.

Timeline

  • Feb 25: Methodology and First prototype.
  • Mar 09: Final model with full expected outputs.

Conclusion

Collaboration is important in any team, especially a Data Science team. One pivotal part of collaboration is delegating your work effectively to your fellow team members. I hope this article helps you achieve that goal.

Note: The original version of this article can be found on Medium. I created this copy because access to Medium is restricted in some countries.


Let’s get in touch

This is Kien, I’m enthusiastic about data science and how it could help companies grow. Have a chat with me!
