- Summary
- What Is A Data Science Project?
- Context: Case Company & Project
- Outcomes
- People
- Process
- Key Learnings
Summary:
- Data science projects are a new beast. Very few companies understand the difference between a data science project and an analytical project. Analytical projects require stakeholders to make a decision before value is unlocked; data science projects build digital products that deliver value via fully automated micro-decisions.
- The case study follows the development of a recommendation engine for a leading Australian online business and shares the results of phase 1 of the project.
- Setting up the right team and the right process is key. The team should combine creative, technical and commercial talent. The process should start with clear success measures, stimulate enough challenge to the current way of thinking about the problem, release a hypothesised solution as soon as possible and continue to iterate.
- Key learnings were to communicate the vision for the project clearly, bring a diverse team together in a non-hierarchical ideation process, and deliver value incrementally.
What Is A Data Science Project?
Before we jump into the case study, I felt it was important to briefly address a common misconception about what a data science project is, with a side-by-side comparison. A lot of Australian companies currently misuse the term, referring to business analytics projects as data science or big data projects. Data science is a very different beast. Here is how the projects differ:
What is it?
- Analytical project: any project that uses statistical analysis or a logic-driven process supported by data. The outcomes are a set of recommendations or conclusions that stakeholders need to make a decision on to unlock value.
- Data science project: a project that uses large volumes of data to build self-learning and self-improving algorithms that require no human input to unlock value.
Examples:
- Analytical project: pricing strategy; choosing the right products for promotional discounting; supply chain optimisation models.
- Data science project: recommendation engines; search optimisation bidding models; fraud detection and prediction models.
Capability:
- An analyst usually needs solid logic, a basic statistical toolkit, basic (and often advanced) commercial acumen, and the ability to build models in Excel.
- A data scientist usually needs an advanced statistical toolkit, basic commercial acumen, and basic to advanced technical ability (at least R, or better still, general coding).
Measuring success:
- The success of an analytical project is often an effective strategic recommendation that requires a decision. As a result of the decision, certain business benefits are expected to be gained, e.g. setting a different price for products within a category delivering a 10% increase.
- The success of a data science project is often the establishment of a self-running data product that delivers value via micro-decisions that require no human input. These products automatically get better over time as more data is collected, and you can improve them by tweaking the algorithms. A/B testing is often used to validate success, e.g. a recommendation engine consistently delivering better product recommendations over time.
In our case study, we will focus on a data science project: improving a recommendation engine for an online business.
Context: Case Company & Project
Company: our case company is a sizeable online retailer that sells merchandise to consumers all over the world. The company has a strong acquisition strategy that has consistently delivered traffic growth. The key business challenge was loyalty.
Project: an improved recommendation engine was meant to deliver better product suggestions that would lead to improved conversion at the time of purchase. However, the key success measure for the project was retention: both revisitation and repeat purchase rates were key metrics of success. Loyalty was a bigger goal than one-off conversion. This affected design and delivery, and made the project different from many recommendation engines that target immediate behaviour changes.
Outcomes Of The Data Science Project
A data science project is rarely ever final. I was personally involved in phase 1 of the delivery; the project is still ongoing. The measures of success, in order of priority, were the following (a sketch of how such uplifts can be computed appears after the list):
- Repeat purchase: how many more customers came back to the site and bought again, compared to the old recommendation experience
- Revisitation: how many more customers came back to the site, compared to the old recommendation experience
- Favouriting: how many more customers favourited the recommended products, compared to the old recommendation experience
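To make these measures concrete, here is a minimal sketch of how such uplifts could be computed from a raw event log. The schema, group labels and action names are illustrative assumptions, not the case company's actual tracking setup.

```python
# Minimal sketch: relative uplift of the new recommendation experience over
# the old one, per success measure. Event schema and labels are assumptions.
def unique_users(events, group, action=None):
    return {e["user"] for e in events
            if e["group"] == group and (action is None or e["action"] == action)}

def rate(events, group, action):
    """Share of users in a test group who performed the action at least once."""
    users = unique_users(events, group)
    return len(unique_users(events, group, action)) / len(users) if users else 0.0

def uplift(events, action):
    """Relative improvement of the new experience over the old one."""
    new, old = rate(events, "new_recs", action), rate(events, "old_recs", action)
    return (new - old) / old if old else float("nan")

# toy data: one 'exposed' record per user, plus a record per action taken
events = [
    {"user": "u1", "group": "new_recs", "action": "exposed"},
    {"user": "u1", "group": "new_recs", "action": "repeat_purchase"},
    {"user": "u2", "group": "new_recs", "action": "exposed"},
    {"user": "u3", "group": "old_recs", "action": "exposed"},
    {"user": "u3", "group": "old_recs", "action": "repeat_purchase"},
    {"user": "u4", "group": "old_recs", "action": "exposed"},
    {"user": "u5", "group": "old_recs", "action": "exposed"},
]

print(f"repeat purchase uplift: {uplift(events, 'repeat_purchase'):+.0%}")
```

The same `uplift` function works for revisitation and favouriting by swapping the action name.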
Phase 1. Email test
We first tested the recommender via email recommendations, and these are the movements in core metrics we observed:
- Repeat purchase: +5% better
- Revisitation: +20% better
- Favouriting: 5x better
- Conversion: 3x better
We also observed much more impressive movement in the earlier stages of the funnel. Our open rates were five times better than an average email campaign, and click-through from email was 80% better than average. We discovered that our loyalty metrics (revisitation and repeat purchase) had a more significant delay and were therefore understated at the time of measurement.
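Since results like these are typically validated with an A/B test, a natural companion step is a significance check. Below is a minimal sketch using a two-proportion z-test; all counts are invented for illustration and are not the case study's actual numbers.

```python
# Sketch: is an uplift like +5% repeat purchase statistically significant?
# Two-proportion z-test on control vs test conversion counts.
from math import sqrt
from statistics import NormalDist

def two_proportion_z(hits_old, n_old, hits_new, n_new):
    p_old, p_new = hits_old / n_old, hits_new / n_new
    pooled = (hits_old + hits_new) / (n_old + n_new)
    se = sqrt(pooled * (1 - pooled) * (1 / n_old + 1 / n_new))
    z = (p_new - p_old) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided
    return z, p_value

# e.g. 2,000 of 100,000 control users repurchased vs 2,100 of 100,000 test users
z, p = two_proportion_z(2_000, 100_000, 2_100, 100_000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

Note that a 5% relative uplift on a rare event like repeat purchase needs a large sample to reach significance, which is consistent with the delayed, understated loyalty metrics described above.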
Phase 2. Onsite test
After the email recommendation test proved successful, we moved the A/B test onto the site. At this point the results are still being measured, and the company continues to work on further improvements to its recommendation engine.
People
Choosing the team to deliver the project was the most important factor for us. We found that bringing creative talent to collaborate with technical talent from the very beginning was the absolute key to innovation, and it led to a step change in recommendation methodology.
Commercial: having someone on the team with a commercial skill set and perspective was key, particularly at the beginning, to get key leadership stakeholders to set project goals. For us, a product manager played that role. They also helped the team make impact-related trade-offs along the journey, e.g. should we invest more time in refining the algorithm, or release it in small parts to test the impact? How much time can we afford to invest in marginal algorithm improvements given the benefits we expect to see?
Technical: this is the default capability in any data science project and is unlikely to be missed. The point of difference for us was that we had a data scientist as well as engineers involved. The data scientist was able to focus on the algorithm logic while the engineers focused on making the solution scalable and fast.
Creative: design or customer insights people rarely join a data science project team. This was a key differentiator for us. Our designer not only brought more creativity into redefining the algorithm; he really challenged the team to take more risk and adopt a customer point of view. It was the designer who kept insisting that we could deliver a better recommendation experience for a smaller group of customers if we used different signals in our engine.
Process
Step 1. Define success
Setting up your project with clear metrics of success is key; they are pretty much the only thing that will stay constant. For us, it was important to keep in mind that loyalty was always the number one goal. A win in conversion was a positive stepping stone, but not the final prize. Even when we had exceptional open rates and email engagement, we had to keep working until we got to meaningful loyalty gains. The most meaningful data science projects have a strong, direct connection to a top company goal.
Step 2. Push the limits
With projects like a recommendation engine, it is easy to fall into the trap of marginal improvements. It is particularly difficult to step outside the signals that are already clearly linked to the outcome. For us, the options were:
- Option 1. Iterate on transactional signals by changing their weights in the algorithm (product sales, add-to-cart, product page visits)
- Option 2. Use low-coverage signals: those that could potentially be higher quality but work for a smaller number of people. Favouriting, for example, is a much sparser signal than add-to-cart, and had therefore been ignored in the original algorithm design. We brought it back. We also discovered a completely new, unconventional signal that had to do with the design and pattern of the product (e.g. recommend red products if a customer was looking at red).
Putting together such an unconventional combination of signals was a risky move, but it paid off. These options would not have been on the table if not for the diversity within the team and the process we followed. A simplified sketch of how such a blended signal scorer could work follows.
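This is not the production engine's actual logic, which is not public; it is a minimal sketch assuming a weighted-sum blend of conventional and low-coverage signals, with invented signal names and weights.

```python
# Simplified sketch: blending transactional signals (Option 1) with sparse,
# high-quality signals (Option 2) into one recommendation score.
# Signal names and weights are illustrative assumptions.
SIGNAL_WEIGHTS = {
    "purchases": 1.0,     # transactional signals (Option 1)
    "add_to_cart": 0.6,
    "page_visits": 0.3,
    "favourites": 1.5,    # sparse but high-quality (Option 2)
    "colour_match": 0.8,  # unconventional design/pattern signal (Option 2)
}

def score(candidate, user_profile):
    """Weighted sum of per-signal affinities between a user and a product."""
    return sum(weight * user_profile.get(signal, {}).get(candidate, 0.0)
               for signal, weight in SIGNAL_WEIGHTS.items())

def recommend(candidates, user_profile, k=5):
    """Top-k products by blended score."""
    return sorted(candidates, key=lambda c: score(c, user_profile), reverse=True)[:k]

# user_profile maps each signal to per-product affinities, e.g. favouriting
# counts, or colour similarity between a product and recently viewed items
profile = {
    "favourites": {"red_dress": 1.0},
    "colour_match": {"red_scarf": 0.9, "blue_hat": 0.1},
}
print(recommend(["red_dress", "red_scarf", "blue_hat"], profile, k=2))
```

Because sparse signals only exist for some users, a design like this naturally serves a better experience to the smaller group that has them, which is exactly the trade-off the designer kept pushing for.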
Step 3. Release quick
Data science is really not that different from any digital product project in the way it uses MVP (Minimum Viable Product) thinking: the quickest possible way to test your product always wins. That does not mean compromising on the things that really matter. For instance, we took the time in step 2 to really challenge the status quo, but we also looked for the quickest possible way to test: an email test meant half a day of work versus a week for an onsite release.
Step 4. Learn and refine
Constant iteration is another digital product delivery concept that applies to data science and makes it very different from an analytical project. The only way a data science model or algorithm gets better is by collecting positive and negative signals. For us, exposing the recommendation engine via different channels and to different audiences was key to making sure the model was refined and could deal with a variety of situations.
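A minimal sketch of that feedback loop might look like the following: log every recommendation impression, label it with a positive or negative outcome, and summarise performance per channel so the next iteration can be tuned. The field names, channels and outcome labels are assumptions for illustration.

```python
# Sketch: collecting positive and negative signals per channel so the
# recommender can be refined. Names and labels are illustrative assumptions.
from collections import defaultdict

feedback_log = []

def log_impression(user, product, channel, outcome):
    """outcome: 'purchase'/'favourite' count as positive, 'ignored' as negative."""
    feedback_log.append({"user": user, "product": product,
                         "channel": channel, "positive": outcome != "ignored"})

def channel_hit_rates(log):
    """Share of impressions per channel that got a positive response."""
    shown, hits = defaultdict(int), defaultdict(int)
    for rec in log:
        shown[rec["channel"]] += 1
        hits[rec["channel"]] += rec["positive"]
    return {channel: hits[channel] / shown[channel] for channel in shown}

log_impression("u1", "red_dress", "email", "favourite")
log_impression("u2", "red_dress", "onsite", "ignored")
print(channel_hit_rates(feedback_log))
```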
Key Learnings
So what have we learnt? If I were to sum up the three most important things:
Communicate vision
Once we discovered that a certain group of users would get better recommendations from signals that have low coverage across the board, it helped us paint a picture of a truly personalised recommendation experience. Communicating the vision for where a data science project can go is critical: it inspires the team, it helps stakeholders back you, and it keeps you focused on the most important milestones.
Empower the team
Assuming that your product owner will own the vision and your data scientist will build the product is the wrong way to start ideation. If you start the ideation process with everyone having an equal role, you may be surprised by the level of insight coming from the most unlikely team members.
Deliver incrementally
While having a vision as a goalpost is great, the day-to-day focus should be on proving it. Find the quickest and most feasible way to test the vision. If your aspiration is to build a truly personal recommendation experience, start by delivering a user-cluster version of it. If your aspiration is to deliver recommendations to 10 million visitors on site, start by sending recommendation emails to 500,000 people to test it. A data product is just another digital product. You can read more on agile delivery in my guide to agile transformation.