Do you need "all the data" to build and run your algorithm?

Albert Derasse is the founder of the Belgian subsidiary of D-AIM, a marketing agency that puts AI and data science at the service of marketers. Having written a book about data and algorithms, Albert raises the question: do we really need all the data to run our algorithm?


A year ago, I was interviewed by the Belgian news channel LN24 about the use of predictive algorithms and artificial intelligence in marketing. The first question from the journalist (who had grasped the essence of the subject in 24 hours, congrats!):

Do we need to have all the consumer data to build a predictive model for a marketing purpose?

No, certainly not… fortunately!

To understand this, let’s quickly go back to a methodological point related to the construction of an algorithm:

- Briefly, the algorithm bases its observation on the individuals who have adopted the behaviour “to be predicted” in the past. It looks at their data and carefully notes their characteristics in order to assign a coefficient to each one, some characteristics being more strongly discriminating (in the statistical sense, without any value judgement) in the adoption of the behaviour to be predicted.

- It then looks at the extent to which these discriminating characteristics are shared by individuals who have not adopted the behaviour in question. Put simply, the sum of the coefficients attributed to the shared discriminating characteristics makes up the score.
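
The two steps above can be sketched in a few lines of Python. This is only a simplified illustration, not the method used in any real project: the customer data is invented, the names `learn_coefficients` and `score` are hypothetical, and the coefficient is assumed here to be the difference between the share of adopters and the share of non-adopters who have a characteristic. A real model (a logistic regression, for example) would estimate its coefficients differently.

```python
from collections import Counter


def learn_coefficients(adopters, non_adopters):
    """Give each characteristic seen among adopters a coefficient.

    Assumption for this sketch: the coefficient is the share of adopters
    with the characteristic minus the share of non-adopters with it.
    """
    adopter_counts = Counter(c for person in adopters for c in person)
    other_counts = Counter(c for person in non_adopters for c in person)
    coefficients = {}
    for characteristic, count in adopter_counts.items():
        share_adopters = count / len(adopters)
        share_others = other_counts[characteristic] / len(non_adopters)
        coefficients[characteristic] = share_adopters - share_others
    return coefficients


def score(person, coefficients):
    """Sum the coefficients of the characteristics this person has."""
    return sum(coefficients.get(c, 0.0) for c in person)


# Invented toy data: each customer is a set of known characteristics.
adopters = [
    {"opted_in", "buys_online", "loyalty_member"},
    {"opted_in", "loyalty_member"},
    {"opted_in", "buys_online"},
    {"loyalty_member"},
]
non_adopters = [
    {"opted_in"},
    {"buys_online"},
    {"loyalty_member", "buys_online"},
    set(),
]

coefficients = learn_coefficients(adopters, non_adopters)

# Rank the customers who have not adopted the behaviour by their score.
ranked = sorted(non_adopters, key=lambda p: score(p, coefficients), reverse=True)
for person in ranked:
    print(sorted(person), score(person, coefficients))
```

In this toy data, buying online is equally common in both groups, so its coefficient is zero and it does not move the score, whereas the opt-in and the loyalty membership each add to it.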

On what data will the algorithm base its observation? We can classify the data into four categories:

1. The descriptive or socio-demographic data known to the advertiser: date of birth, gender, city of residence, etc.
2. Transactional data: the length of time as a customer, products purchased, at what price, method of payment, purchase channel (e-shop or physical shop), etc.
3. Relational type data: known e-mail, given opt-in (which can be a commitment factor), behaviour in relation to e-mails received (opening, clicking, unsubscribing, etc.), participation in contests, number of points in the loyalty programme, calls, visits, etc.
4. Operational type data: personally, I like to include here the data that is specific to the sector of activity under consideration. If we work for an assistance company, we will include information such as the waiting time for the tow truck, access to a replacement car, …

The data to be taken into account must be considered broadly: the aim is to gather as much data as possible that could prove discriminating, that is, that could herald the behaviour to be predicted… without knowing before the algorithmic exercise which variables will actually be discriminating.

But let’s be realistic: we will never have all this data, the holistic view… which is ethically preferable! We remain consumers with a good deal of unpredictability, and that’s good! It is very difficult (but not impossible, thanks to neuroscience) to probe our emotions, our moods and all the feelings that guide most of our (sometimes impulsive) purchases.

“Do I have enough discriminating data on the behaviours I want to predict?”

This is one of the questions that advertisers ask themselves (or me) before taking the big leap into algorithms… I can reassure them on the basis of a similar experience or an intuition, but only the results of the algorithm will reassure them completely.

I was at the head office of a perfume and personal care company when I was confronted with this same question from the marketing director at the end of 2019:

- Marketing Director: “Do you think I have the right data to predict my customers’ buying behaviour?”

- Albert: “You have a series of descriptive, transactional and relational data… but you are missing some information. You don’t know that the average customer who enters one of your perfume shops on Tuesday has been in love since Saturday… and you don’t have to know it! And yet, you can imagine that this feeling will guide most of her purchases this week… and for as long as the magic works. Therefore, you need to accept it.” The exhaustive view, the famous 360° view of the consumer, is an illusion and would be an ethical catastrophe.

The systematic lack of completeness of data has a direct consequence on the performance of models, which is summed up in a well-known aphorism usually attributed to the statistician George Box: “All models are wrong, but some are useful”.

You don’t have expertise in Data Science but would like to know more?

I wrote the book “Algorithmes & Blues du Directeur Marketing. L’IA au service du Marketing Moderne” for you in 2020. It presents numerous cases of the use of algorithms in marketing and commercial practices. The author’s royalties from the first editions are donated to the non-profit organisation Les Amis des Aveugles… Another good reason to read it…

