
Auto-optimization models

An Auto-optimization model aims to serve offers that maximize the return (KPIs) set by business clients. These KPIs could be in the form of conversion rates, revenue, etc. At this point, Auto-optimization focuses on optimizing offer clicks with offer conversion as our target. Auto-optimization is non-personalized and optimizes based on "global" performance of the offers.

Limitations

The use of Auto-optimization models for decision management is subject to the limitations below:

  • Auto-optimization models do not work with the Batch Decisioning API.
  • The feedback needed to build a model must be sent in as an experience event. It should not be sent in automatically in Journey Optimizer channels.

Terminology

The following terms are useful when discussing Auto-optimization:

  • Multi-armed bandit: A multi-armed bandit approach to optimization balances exploratory learning and exploitation of that learning.

  • Thompson sampling: Thompson sampling is an algorithm for online decision problems where actions are taken sequentially in a manner that must balance between exploiting what is known to maximize immediate performance and investing to accumulate new information that may improve future performance.

  • Beta distribution: Set of continuous probability distributions defined on the interval [0, 1], parameterized by two positive shape parameters.

Thompson Sampling

The algorithm that underlies Auto-optimization is Thompson sampling. In this section, we discuss the intuition behind Thompson sampling.

Thompson sampling, or Bayesian bandits, is a Bayesian approach to the multi-armed bandit problem. The basic idea is to treat the average reward μ from each offer as a random variable and use the data we have collected so far to update our "belief" about the average reward. This "belief" is represented mathematically by a posterior probability distribution: essentially a range of values for the average reward, along with the plausibility (or probability) that the reward has that value for each offer. Then, for every decision, we sample a point from each of these posterior reward distributions and select the offer whose sampled reward has the highest value.

This process is illustrated in the figure below, where we have 3 different offers. Initially we have no evidence from the data and we assume all offers have a uniform posterior reward distribution. We draw a sample from each offer's posterior reward distribution. The sample selected from Offer 2's distribution has the highest value. This is an example of exploration. After showing Offer 2, we collect any potential reward (for example conversion/no-conversion) and update the posterior distribution of Offer 2 using Bayes' theorem as explained below. We continue this process and update the posterior distributions each time an offer is shown and the reward is collected. In the second figure, Offer 3 is selected: despite Offer 1 having the highest average reward (its posterior reward distribution is furthest to the right), the process of sampling from each distribution has led us to choose an apparently suboptimal Offer 3. In doing so, we give ourselves the opportunity to learn more about Offer 3's true reward distribution.

As more samples are collected, the confidence increases, and a more accurate estimate of the possible reward is obtained (corresponding to narrower reward distributions). This process of updating our beliefs as more evidence becomes available is known as Bayesian inference.

Eventually, if one offer (e.g. Offer 1) is a clear winner, its posterior reward distribution will be separated from the others. At this point, for each decision, the sampled reward from Offer 1 is likely to be the highest, and we will choose it with a higher probability. This is exploitation: we have a strong belief that Offer 1 is the best, and so it is being chosen to maximize rewards.

Figure 1: For every decision, we sample a point from the posterior reward distributions. The offer with the highest sample value (conversion rate) is chosen. In the initial phase, all offers have a uniform distribution since we don't have any evidence about the conversion rates of the offers from the data. As we collect more samples, the posterior distributions get narrower and more accurate. Ultimately, the offer with the highest conversion rate is chosen nearly every time.
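The sample-select-update loop described above can be sketched in a few lines of Python. This is a minimal simulation under stated assumptions, not the production implementation: the function names `choose_offer` and `simulate`, the true conversion rates, and the random seed are all hypothetical and chosen only for illustration.

```python
import random


def choose_offer(posteriors, rng):
    """Draw one sample from each offer's Beta posterior; return the index of the largest."""
    samples = [rng.betavariate(alpha, beta) for alpha, beta in posteriors]
    return max(range(len(samples)), key=samples.__getitem__)


def simulate(true_rates, n_decisions, seed=0):
    """Run Thompson sampling against simulated offers with fixed (hypothetical) true rates."""
    rng = random.Random(seed)
    # Every offer starts with Beta(1, 1), i.e. a uniform belief over [0, 1].
    posteriors = [[1, 1] for _ in true_rates]
    shown = [0] * len(true_rates)
    for _ in range(n_decisions):
        i = choose_offer(posteriors, rng)
        shown[i] += 1
        # Simulated binary reward: conversion with probability true_rates[i].
        if rng.random() < true_rates[i]:
            posteriors[i][0] += 1  # one more success
        else:
            posteriors[i][1] += 1  # one more failure
    return posteriors, shown


if __name__ == "__main__":
    posteriors, shown = simulate([0.30, 0.10, 0.05], n_decisions=5000)
    for idx, ((a, b), n) in enumerate(zip(posteriors, shown), start=1):
        print(f"Offer {idx}: shown {n} times, posterior Beta({a}, {b})")
```

Early on the choices are spread across offers (exploration). As evidence accumulates, impressions typically concentrate on the offer with the highest true rate (exploitation), although individual runs vary.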

Technical Details

To calculate and update the distributions, we use Bayes' theorem. For each offer i, we want to calculate P(μ_i | data), that is, how likely a reward value μ_i is given the data we have collected so far for that offer.

From Bayes' theorem:

Posterior ∝ Likelihood × Prior

The prior probability is the initial guess about the probability of producing an outcome (here, a conversion). The probability updated after some evidence has been collected is known as the posterior probability.

Auto-optimization is designed to consider binary rewards (click/no-click). In this case, the likelihood represents the number of successes from N trials and is modeled by a Binomial distribution. For some likelihood functions, if you choose a certain prior, the posterior ends up in the same family of distributions as the prior. Such a prior is called a conjugate prior, and it makes calculating the posterior distribution very simple. The Beta distribution is a conjugate prior to the Binomial likelihood (binary rewards), and so is a convenient and sensible choice for the prior and posterior probability distributions. The Beta distribution takes two parameters, α and β. These parameters can be thought of as the counts of successes and failures, and the mean value is given by:

$$\text{mean} = \frac{\alpha}{\alpha + \beta}$$

The likelihood function, as explained above, is modeled by a Binomial distribution with s successes (conversions) and f failures (no-conversions), where q is a random variable with a Beta distribution:

$$P(s, f \mid q) = \binom{s+f}{s}\, q^{s} (1-q)^{f}$$

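The prior is modeled by a Beta distribution, and the posterior distribution takes the following form:

$$P(q \mid s, f) \propto q^{s}(1-q)^{f} \cdot q^{\alpha-1}(1-q)^{\beta-1} = q^{s+\alpha-1}(1-q)^{f+\beta-1}$$

This is again a Beta distribution, with parameters $(s+\alpha,\ f+\beta)$.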

The posterior is calculated by simply adding the number of successes and failures to the existing parameters α, β.

For Auto-optimization, as shown in the example above, we start with a prior distribution Beta(1, 1) (uniform distribution) for all offers. After getting s successes and f failures for a given offer, the posterior becomes a Beta distribution with parameters (s+α, f+β) for that offer, that is, Beta(s+1, f+1).
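As a small worked example of the conjugate update, the sketch below starts from the Beta(1, 1) prior and adds observed counts to the parameters. The helper names `update_posterior` and `beta_mean` and the counts (3 conversions, 7 non-conversions) are hypothetical, chosen only to make the arithmetic concrete.

```python
def update_posterior(alpha, beta, successes, failures):
    """Conjugate Beta-Binomial update: add the observed counts to the parameters."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Start from the uniform prior Beta(1, 1) and observe 3 conversions and 7 non-conversions.
alpha, beta = update_posterior(1, 1, successes=3, failures=7)
print(alpha, beta)                       # 4 8
print(round(beta_mean(alpha, beta), 3))  # 0.333
```

The posterior mean of 4/12 sits slightly above the raw rate of 3/10 because the uniform prior still carries a little weight. That weight shrinks as more data arrives.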

Related topics:

For a deeper dive on Thompson sampling, read the following research papers:

Cold-start problem

The "cold-start" problem occurs when a new offer is added to a campaign and there is no data available about the new offer's conversion rate. During this period, we need a strategy for how often the new offer is chosen, so that the performance drop is minimized while we collect information about its conversion rate. There are multiple solutions available to tackle this problem. The key is to balance exploration of the new offer against exploitation of the existing offers, without sacrificing much of the latter. Currently we use a uniform distribution as our initial guess about the new offer's conversion rate (the prior distribution). In other words, we give all conversion rate values an equal probability of occurrence.

Figure 2: Consider a campaign with 3 offers. While the campaign is live, Offer 4 is added to the campaign. Initially we have no data about the conversion rate of Offer 4 and we have to deal with the cold-start problem. We use a uniform distribution as our initial guess about the conversion rate of Offer 4 while we collect data for this new offer. As explained in the Thompson sampling section, to choose which offer is shown to a user, we sample points from the posterior reward distributions of the offers and select the offer with the highest sample value. In the example above, Offer 4 is chosen, and later, based on the reward collected, the posterior distribution of this offer is updated as explained in the Thompson sampling section.
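To make the cold-start behavior concrete, the sketch below adds a fourth offer with a uniform Beta(1, 1) prior next to three offers that already have data. It is an illustration under stated assumptions only: the offer names, the existing posterior parameters, and the seed are hypothetical, and the posteriors are deliberately frozen (no updates) to isolate the effect of the prior.

```python
import random

rng = random.Random(42)

# Hypothetical posteriors (alpha, beta) for three offers with about 1000 observations each.
posteriors = {
    "offer_1": (121, 881),
    "offer_2": (61, 941),
    "offer_3": (31, 971),
}
# The newly added offer has no data yet, so it starts from the uniform prior Beta(1, 1).
posteriors["offer_4"] = (1, 1)


def pick(posteriors, rng):
    """Sample once from every offer's posterior and return the name with the largest sample."""
    samples = {name: rng.betavariate(a, b) for name, (a, b) in posteriors.items()}
    return max(samples, key=samples.get)


# Count how often each offer would win if nothing were updated between decisions.
counts = {name: 0 for name in posteriors}
for _ in range(1000):
    counts[pick(posteriors, rng)] += 1

print(counts)
```

Because a uniform sample often exceeds the low conversion rates of the established offers, the new offer tends to win many of these frozen-posterior draws. In a live loop its posterior is updated after every impression, so it narrows quickly and the selection frequency settles toward what the collected data supports.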

Lift Measurement

"Lift" is the metric used to measure the performance of any strategy deployed in the ranking service, in comparison with the baseline strategy (serving offers at random).

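For example, if we're interested in measuring the performance of a Thompson Sampling (TS) strategy used in the ranking service, and the KPI is conversion rate (CVR), the "lift" of the TS strategy against the baseline strategy is defined as:

$$\text{Lift} = \frac{\text{CVR}_{\text{TS}} - \text{CVR}_{\text{baseline}}}{\text{CVR}_{\text{baseline}}}$$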
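A lift calculation can be sketched as follows, assuming the usual definition of relative lift: the difference between the strategy's CVR and the baseline CVR, divided by the baseline CVR. The function names and the impression and conversion counts are hypothetical and serve only as an illustration.

```python
def conversion_rate(conversions, impressions):
    """Conversion rate (CVR) as conversions divided by impressions."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return conversions / impressions


def lift(cvr_strategy, cvr_baseline):
    """Relative improvement of a strategy's CVR over the baseline CVR (assumed definition)."""
    if cvr_baseline <= 0:
        raise ValueError("baseline CVR must be positive")
    return (cvr_strategy - cvr_baseline) / cvr_baseline


# Hypothetical traffic: 60 conversions in 1000 impressions for TS, 50 in 1000 for random serving.
cvr_ts = conversion_rate(60, 1000)
cvr_baseline = conversion_rate(50, 1000)
print(f"Lift: {lift(cvr_ts, cvr_baseline):.1%}")  # Lift: 20.0%
```

A positive value means the strategy converts better than random serving, and a negative value means it converts worse.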
