The Risks of Blending Customer Signals from Disparate Sources
May 16, 2016
Originally Posted on Data Informed
One of the more perilous steps in building a data model is determining the right signals to include. When it comes to business-to-business customer analytics, there’s a wide range of signals to choose from – a company’s business model, technology vendors, relevant job postings, public filings, social presence, website activities, marketing engagement, third-party intent data, and other attributes.
But some data scientists forget that not all of these signals are created equal, and they shouldn't be treated the same.
Several important considerations can have a dramatic impact on the accuracy of your predictive model if ignored. Timeliness, for example, is especially critical with behavior signals, which offer insight into how much a customer or prospect is engaged with your company in a given day, week, or month. These signals could include a lead’s website visits, form completes, email clicks, and maybe even free trial application usage data.
Fit signals, on the other hand, focus on how much an incoming prospect resembles a likely buyer, and don’t change much over time. For example, you might look at the lead’s company size, geographic location, industry, and job title to determine whether the lead is a fit for your product.
Our team is often asked about the various signals that go into customer scoring models, and one question that's been coming up a lot lately is whether behavior or engagement signals should be blended into existing fit models. We have evaluated this approach methodically and determined that it's crucial to keep fit and behavior models separate. Here are some reasons for this best practice, along with background on the risks of merging the two types of signals into a single predictive model.
The Importance of Time
When it comes to data about your customer or prospect’s behaviors and activities, the time component is of fundamental importance to extracting the signal from the data. That’s because people’s engagement comes in a stream of dynamic activities that may be very old or very recent.
Fit modeling data, on the other hand, is generally static in nature – providing a persistent profile of each lead – and these two signal types do not mix well.
It can be tempting to merge them for a more comprehensive profile, but that approach brings some major pitfalls that many vendors overlook. To merge fit and behavior datasets together, you'd have to find a way to "flatten" the behavior data – for example, by aggregating activity counts across a lead's lifetime – which is impossible to do without stripping valuable information from your signals and throwing away data that could ultimately be key to your model.
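To make the information loss concrete, here is a minimal sketch (the lead names, visit dates, and half-life parameter are hypothetical, not from the article): two leads with identical lifetime visit counts become indistinguishable once the behavior data is flattened, even though a time-aware signal separates them clearly.

```python
from datetime import date

# Hypothetical activity logs: each entry is the date of one website visit.
# Both leads have the same lifetime total (4 visits) but very different recency.
lead_a = [date(2016, 5, 10), date(2016, 5, 11),
          date(2016, 5, 12), date(2016, 5, 13)]   # active this week
lead_b = [date(2015, 11, 1), date(2015, 11, 3),
          date(2015, 11, 7), date(2015, 11, 9)]   # active six months ago

def flattened(activities):
    """Lifetime aggregate count -- what merging into a static fit model forces."""
    return len(activities)

def recency_weighted(activities, as_of, half_life_days=14):
    """Time-aware signal: a visit's weight halves every `half_life_days`."""
    return sum(0.5 ** ((as_of - d).days / half_life_days) for d in activities)

today = date(2016, 5, 16)
print(flattened(lead_a), flattened(lead_b))   # identical: 4 4
print(recency_weighted(lead_a, today) > recency_weighted(lead_b, today))  # True
```

The flattened counts are equal, so any model built on them cannot tell an in-market lead from a long-dormant one; the time component carried exactly the signal that aggregation discarded.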
Sometimes 1 + 1 ≠ 2
Combining fit and behavioral scores can blind a data model to the difference between a lead that is showing a lot of buying activity but is a poor fit for your business, and a lead that is a good fit for your business but has relatively little activity. By separating the data into two distinct models, you can make sure that the signal in each of these data sources can shine through. Adding more signals into a predictive model does not always lead to an increase in quality. Separating fit from behavioral can help you keep these two independent sources of predictive power from inadvertently interfering with each other. In addition, you’ll have clearer visibility into why each lead is or isn’t likely to convert.
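A toy calculation illustrates the blindness described above (the scores and the simple-average blend are hypothetical, chosen only to make the point): two opposite leads collapse onto the same combined score, while the separate scores keep them distinguishable.

```python
# Hypothetical 0-100 scores for two very different leads.
# Lead X: strong fit, little recent activity.  Lead Y: weak fit, heavy activity.
lead_x = {"fit": 90, "behavior": 20}
lead_y = {"fit": 20, "behavior": 90}

def combined(lead):
    """A naive blended score (simple average) -- one number per lead."""
    return (lead["fit"] + lead["behavior"]) / 2

print(combined(lead_x), combined(lead_y))   # 55.0 55.0 -- the difference vanishes

# Kept separate, the two leads plainly call for different follow-up:
print(lead_x["fit"], lead_x["behavior"])    # 90 20 -- good fit, dormant
print(lead_y["fit"], lead_y["behavior"])    # 20 90 -- poor fit, very active
```

Any single blending function has to commit to some trade-off between the two axes, and whatever it picks, some pair of very different leads will land on the same number.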
How to Operationalize Distinct Fit and Behavior Scores
Consider, for example, a lead that entered your Marketo system by visiting your website six months ago. A single visit six months back is a weak indicator that the lead is looking to purchase this week, so this lead would get a low behavior score. However, if you can look at that same lead through a completely separate lens, the lead might still receive a high fit score, indicating that it's a good fit to buy your product. Your strategy for pursuing these high-fit, low-engagement leads may be to reach out to them with tailored content and watch whether their behavior scores rise. A rising behavior score can tell you when the lead is back in the market to buy.
Another example is a lead that has a high behavior score from many website visits and downloads, but a low fit score because the lead is a student, analyst, or in an industry that your product doesn’t serve. Your strategy for dealing with this type of lead would look very different than in the above example. In this scenario, you might not want to spend too much sales effort despite the person’s high behavior score, because you know the lead is unlikely to turn into a real opportunity.
If your model produced only a combined fit and behavior score, these two different types of leads would look very similar, leading you to waste your effort on less effective follow-up strategies. However, with two distinct models, you can look at each lead through multiple dimensions and easily see your different types of leads so that you leave no stone unturned. This insight is crucial not only for helping your business figure out how to achieve the greatest impact with each prospect or customer, but also for accurately evaluating your model performance.
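The two examples above amount to a simple routing policy keyed on both scores. Here is one possible sketch (the threshold value and the action labels are illustrative assumptions, not a prescription from the article):

```python
def route(fit, behavior, threshold=50):
    """Map separate fit and behavior scores (0-100) to a follow-up strategy.

    This is a hypothetical policy: the threshold and actions are placeholders
    for whatever a real sales team would configure.
    """
    if fit >= threshold and behavior >= threshold:
        return "engage sales now"                      # good fit, in-market
    if fit >= threshold:
        return "nurture with tailored content"         # good fit, dormant
    if behavior >= threshold:
        return "low-touch response"                    # active but poor fit
    return "deprioritize"

print(route(90, 20))   # the six-months-ago lead: nurture with tailored content
print(route(20, 90))   # the student/analyst lead: low-touch response
```

With a single blended score, both example leads would fall through the same branch; with two scores, each lands on the strategy the article describes for it.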
Joel Dodge joined predictive software leader Infer Inc. after several years of teaching and research as a postdoc at Binghamton University. He has a Ph.D. in mathematics from University of California, San Diego, and has published papers in algebraic number theory and algorithmic combinatorics.
Transform Your Pipeline Today
See Firsthand How Infer Uses Your Own Data To Create Custom Scoring Models