Is your eCommerce ready for the Machine Learning era?

Do you know if your eCommerce is ready for the Machine Learning era?
You need to be ready to transition your eCommerce to Machine Learning technologies. The big eCommerce players, like Amazon and Walmart, already use Machine Learning (ML) extensively to process customer data, steer their marketing efforts, and personalize interactions with their users. The main question is how long it will take for those technologies to trickle down to smaller sellers and eventually become a standard component of every eCommerce platform. Nobody knows the specific date, but experts expect the entire eCommerce market to transition to fully integrated ML within 3 to 5 years.

What you need to do now

Small online sellers expect to be able to take advantage of the most recent Artificial Intelligence (AI) tools. In the process, they might encounter issues that prevent or compromise the integration of ML technologies with their current eCommerce platforms and systems. Those issues originate in the type and amount of data the seller’s store collects and in how that data is collected. My recommendation is to start preparing your tools and data now for the inevitable arrival of ML, even if you don’t yet have a plan to adopt it.

One of the most common questions in the eCommerce industry is “Where do I start preparing my data for a technology I’m not yet familiar with?” Start by reviewing your data collection processes, making sure you collect all the data in the right way. In 2-4 years, when ML is mainstream even for small players, you will be ready to jump on the ML bandwagon and take full advantage of it. The small investment you make now in preparation will give you a significant ROI and a competitive advantage in your market segment.

The New York Times estimated that up to 80% of a data scientist’s time is spent on “data wrangling”. CrowdFlower likewise estimates “data preparation” at 80%.

What do data scientists spend the most time doing?

Prepare your Customer Data for Machine Learning

I prepared a list of potential issues, related to the way you collect and store customers’ data, that can affect the integration of a Machine Learning system into your eCommerce platform. These are the issues you should double-check:

  • Not Enough Data. To provide meaningful and useful results about a user, ML needs both data breadth (the number of features/fields collected for each user) and data depth (the number of users/records and the length of time over which the data has been collected). According to many ML professionals, the minimum number of users needed to run a linear regression is about 10K+. The more complex and sophisticated you want your ML model to be, the more data you are going to need.
  • Unaggregated Data. Customer Data should be stored in a centralized repository, or there should be a sustainable and repeatable process to aggregate all data into a single source, usually a database. For example, some data reside on the mailing system, while the core information is on a SQL database. Some Customer Data might have to be pre-processed before it can be added to the central source and fed to the ML system. For example, email conversations can be instrumental in creating an ML model, but they have to be preprocessed to extract critical features and identify patterns.
  • Empty Data. Over time, we should be collecting the same information for each user. For a data set to be used to train a Machine Learning algorithm, we need the same features for every user. This can become an issue, for example, if the seller moved to a different mailing system or a different credit card processing service. Another case is when you recently added a field to a form: all users who filled in the form before the update will be missing that information.
  • Inconsistent Data. For example, if you have purchasing agents buying from your store, their data can confuse the ML of the recommendation system. It happens because each purchasing agent appears to the ML algorithm as an individual user, while in reality, the agent is aggregating the purchases of several buyers under a single account.
  • Not Clean Data. You might have missing or wrongly recorded data. Also, watch out for exceptional records and duplicates. A subset of the data might have to be excluded to keep the entire set consistent.
  • Unbalanced Data. The most common example is when you have male and female customers, and the female segment is only 5%-10% of your total customer base. If this is the nature of your business, or it is the result of specific products in your catalog, there is not much you can do to improve the data collection stage. This problem will have to be addressed later by a data scientist in the data preparation stage.
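
Some of the checks in this list can be automated with a few lines of code. The following is only a minimal sketch in Python, not a definitive implementation: the `customers` records, the field names (`email`, `gender`, `phone`), and the `audit` function are all hypothetical and used for illustration. The 10,000-record threshold simply mirrors the rule of thumb quoted above. The sketch covers the Not Enough Data, Empty Data, Not Clean Data (duplicates only), and Unbalanced Data checks.

```python
from collections import Counter

# Hypothetical customer records; the field names are illustrative only.
customers = [
    {"email": "ann@example.com", "gender": "F", "phone": "555-0101"},
    {"email": "bob@example.com", "gender": "M", "phone": None},
    {"email": "bob@example.com", "gender": "M", "phone": None},
    {"email": "carl@example.com", "gender": "M", "phone": ""},
    {"email": "dan@example.com", "gender": "M"},
]


def audit(records, key_field, balance_field, min_records=10_000):
    """Return a small report on size, empty fields, duplicates, and balance."""
    total = len(records)
    fields = sorted({name for rec in records for name in rec})

    # Empty data: share of records where a field is absent, None, or blank.
    missing = {
        name: sum(1 for rec in records if rec.get(name) in (None, "")) / total
        for name in fields
    }

    # Not clean data: key values that appear in more than one record.
    key_counts = Counter(rec.get(key_field) for rec in records)
    duplicates = sorted(k for k, n in key_counts.items() if k and n > 1)

    # Unbalanced data: share of each value of the chosen field.
    values = Counter(
        rec.get(balance_field) for rec in records if rec.get(balance_field)
    )
    seen = sum(values.values())
    balance = {value: n / seen for value, n in values.items()}

    return {
        "total": total,
        "enough_data": total >= min_records,  # assumed rule-of-thumb threshold
        "missing": missing,
        "duplicates": duplicates,
        "balance": balance,
    }


report = audit(customers, key_field="email", balance_field="gender")
for name, value in report.items():
    print(name, value)
```

Running a report like this on a regular schedule is one possible way to notice when a form change or a system migration starts leaving fields empty, long before an ML project depends on them.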

There are ways to overcome and fix some of the issues I listed. Fixing each potential issue is going to take time and resources. It’s essential to start ASAP. Start collecting and storing the right data and establish a process to ensure its quality and consistency. Start now, even if you don’t have a plan or a date to implement or integrate an ML system. Your competitors are already doing it!

This article was first posted on LinkedIn on November 1st, 2019.


Franco Folini lives and works in the eCommerce territory, a wild area between the Kingdom of Technology and the Kingdom of Marketing. He fluently speaks the languages of both realms. For many years, Franco has been helping people bridge the divide and successfully collaborate.

If you want to find out more about Franco, visit his LinkedIn profile or send him an email at folini[at]gmail.com