Prepare your eCommerce for ML. Now!

The big eCommerce guys, like Amazon and Walmart, already make extensive use of Machine Learning (ML) to drive their marketing efforts and to personalize their interactions with users. The main question is how long it will take for these technologies to trickle down to smaller sellers and eventually become a standard component of every eCommerce platform. Nobody knows the exact date, but experts expect the entire eCommerce market to transition to fully integrated ML within 3 to 5 years.

Small sellers might run into issues that prevent or compromise the integration of ML technologies. These issues are linked to what data the seller’s store collects and how. My recommendation is to start preparing for the inevitable arrival of ML, even if you don’t yet have a plan to adopt it. Start by revising your data collection processes, making sure you collect all the data in the right way. In 2-4 years, when ML is mainstream even for small players, you will be ready to jump on the ML bandwagon and take full advantage of it. The small investment you make now in preparation will give you a significant ROI and a competitive advantage in your market segment. Getting the data right is also where most of the effort in an ML project goes: The New York Times estimated that up to 80% of a data scientist’s time is spent on “data wrangling”, and CrowdFlower puts the share of time spent on “data preparation” at 80%.

What do data scientists spend the most time doing?

These are the potential issues in the way you collect and store customer data that you should double-check:

  • Not Enough Data. To provide meaningful and useful results about a user, ML needs both data breadth (the number of features/fields collected for each user) and data depth (the number of users/records and the length of time over which the data has been collected). According to many ML professionals, you need at least about 10,000 users to run even a linear regression. The more complex and sophisticated your ML model, the more data you are going to need.
  • Unaggregated Data. Data should reside in a centralized repository, or there should be a sustainable and repeatable process to aggregate all data into a single source, usually a database. For example, some data might reside in the mailing system, while the core information sits in a SQL database. Some data might have to be preprocessed before it can be added to the central source and fed to the ML system. For example, email conversations can be instrumental in creating an ML model, but they have to be preprocessed to extract critical features.
  • Empty Data. Over time, we should be collecting the same information for each user. To train an ML system on a data set, we need the same features for every user. This can become an issue, for example, if the seller moved to a different mailing system or a different credit card processing service. Another case is when you recently added a field to a form: all users who filled out the form before the update will be missing that information.
  • Inconsistent Data. For example, if you have purchasing agents buying from your store, their data can confuse the ML model behind your recommendation system. This happens because a purchasing agent appears as an individual user while, in reality, aggregating the purchases of several buyers.
  • Not Clean Data. You might have missing or wrongly recorded data. Also, watch out for exceptional data (outliers) and duplicates. A subset of the data might have to be excluded to keep the entire set consistent.
  • Unbalanced Data. The most common example is when you have male and female customers, and the female segment is only 5%-10% of your total customer base. If this is the nature of your business, there is not much you can do at the data collection stage. This problem will have to be addressed later by a data scientist in the data preparation stage.
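To make the checklist above more concrete, here is a minimal sketch of a data audit you could run on an export of your customer records. It assumes the records are plain Python dictionaries (for example, rows exported from the store database or the mailing system) and uses only the standard library. The function names `merge_sources` and `audit_customers`, the `MIN_USERS` threshold (taken from the rule of thumb quoted above), and the field names in the example are all hypothetical; adapt them to your own data rather than treating this as a complete data-quality tool.

```python
from collections import Counter
from typing import Any

Record = dict[str, Any]

# Hypothetical threshold based on the ~10,000-user rule of thumb above.
MIN_USERS = 10_000


def merge_sources(core: list[Record], extra: list[Record], key: str) -> list[Record]:
    """Unaggregated Data: fold fields from a secondary source (e.g. a mailing
    system export) into the core records, matching on a shared key."""
    by_key = {r[key]: r for r in extra if r.get(key) is not None}
    merged = []
    for r in core:
        combined = dict(r)
        # Core values win; the extra source only adds fields the core record lacks.
        for field, value in by_key.get(r.get(key), {}).items():
            combined.setdefault(field, value)
        merged.append(combined)
    return merged


def audit_customers(records: list[Record], features: list[str], label: str) -> dict[str, Any]:
    """Run a few basic checks from the list above on a set of customer records."""
    n = len(records)
    # Not Enough Data: compare the record count with the threshold.
    enough = n >= MIN_USERS
    # Empty Data: fraction of records where each feature is absent or blank.
    missing = {
        f: sum(1 for r in records if r.get(f) in (None, "")) / n if n else 0.0
        for f in features
    }
    # Not Clean Data: extra copies of records identical on the chosen features
    # (e.g. the same person registered twice under different IDs).
    seen = Counter(tuple(r.get(f) for f in features) for r in records)
    duplicates = sum(count - 1 for count in seen.values() if count > 1)
    # Unbalanced Data: share of records for each value of the label field.
    counts = Counter(r.get(label) for r in records)
    balance = {value: c / n for value, c in counts.items()} if n else {}
    return {"n_records": n, "enough_data": enough, "missing": missing,
            "duplicates": duplicates, "balance": balance}


# Example usage with made-up records.
customers = [
    {"id": 1, "email": "a@example.com", "gender": "F", "signup_source": "web"},
    {"id": 2, "email": "b@example.com", "gender": "M", "signup_source": ""},
    {"id": 3, "email": "b@example.com", "gender": "M", "signup_source": ""},
    {"id": 4, "email": "c@example.com", "gender": "M"},
]
report = audit_customers(customers, ["email", "gender", "signup_source"], "gender")
# report["missing"]["signup_source"] is 0.75 and report["duplicates"] is 1
```

The duplicate check deliberately ignores the `id` field: a customer who registered twice gets two different IDs, so comparing only the descriptive fields is what exposes the duplicate. Merging before auditing lets you see how many fields the secondary source actually fills in.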

There are ways to overcome some of these issues, but they are going to take time and resources. It’s important to start collecting and storing the right data and to establish a process to ensure its quality and consistency. Start now, even if you don’t have a plan or a date to implement an ML system. Your competitors are already doing it!


Franco Folini lives and works in the eCommerce territory, a wild area between the Kingdom of Technology and the Kingdom of Marketing. He speaks the languages of both realms fluently. For many years, Franco has been helping people bridge the divide and successfully collaborate.

If you want to find out more about Franco, visit his LinkedIn profile or send him an email at folini[at]gmail.com.
