# A Guide to Dynamic Pricing Algorithms | Grid Dynamics

Price setting is one of the most important problems in retail because any price setting error directly results in lost profit. However, traditional price management methods almost never achieve optimal pricing because they are designed for traditional environments, where the frequency of price changes is inherently limited (e.g., brick-and-mortar stores), and the complexity of pricing models is constrained by the capabilities of off-the-shelf tools and manual processes.

Dynamic pricing algorithms help to increase the quality of pricing decisions in e-commerce environments by leveraging the ability to change prices frequently and collect feedback data in real time. These capabilities enable a company to respond to demand changes more efficiently, reduce forecasting errors, and automate price management for catalogs with hundreds of millions of items.

This article is a deep dive into dynamic pricing algorithms that leverage reinforcement learning and Bayesian inference ideas and have been tested at scale by companies like Walmart and Groupon. We focus on the engineering aspects through code snippets and numerical examples; the theoretical details can be found in the referenced articles.

## Overview

Traditional price optimization requires knowing or estimating the dependency between the price and demand. Assuming that this dependency is known (at least at a certain time interval), the revenue-optimal price can be found by employing the following equation:
$$
p^* = \underset{p}{\operatorname{argmax}}\ \ p \times d(p)
$$

where $p$ is the price and $d(p)$ is a demand function. This basic model can be further extended to incorporate item costs, cross-item demand cannibalization, competitor prices, promotions, inventory constraints and many other factors. The traditional price management process assumes that the demand function is estimated from the historical sales data, that is, by doing some sort of regression analysis for observed pairs of prices and corresponding demands $(p_i, d_i)$. Since the price-demand relationship changes over time, the traditional process typically re-estimates the demand function on a regular basis. This leads to some sort of dynamic pricing algorithm that can be summarized as follows:
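As a minimal numerical illustration of the optimization above, the sketch below assumes a hypothetical linear demand function $d(p) = a - b \cdot p$ with made-up coefficients and searches a discrete grid of candidate prices (the coefficients, grid, and function names are illustrative, not from a real dataset):

```python
import numpy as np

# Hypothetical linear demand model d(p) = a - b*p, clipped at zero.
def demand(p, a=50.0, b=7.0):
    return np.maximum(a - b * p, 0.0)

# Candidate price points from $0.50 to $6.00 in $0.10 increments.
prices = np.linspace(0.5, 6.0, 56)

# Revenue at each candidate price: p * d(p).
revenues = prices * demand(prices)

# Revenue-optimal price on the grid.
p_opt = prices[np.argmax(revenues)]
```

For this demand curve the continuous optimum is $a / (2b) \approx 3.57$, so the grid search lands on the nearest allowed price point, \$3.60.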

1. Collect historical data on different price points offered in the past as well as the observed demands for these points.
2. Estimate the demand function.
3. Solve the optimization problem similar to the problem defined above to find the optimal price that maximizes a metric like revenue or profit, and meets the constraints imposed by the pricing policy or inventory.
4. Apply this optimal price for a certain time period, observe the realized demand, and repeat the above process.
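The four steps above can be sketched as a simple simulation loop. Everything here is hypothetical: a "true" linear demand function with noise stands in for the market, and the demand estimate is an ordinary least-squares fit over the observed (price, demand) pairs:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground-truth demand (unknown to the algorithm) plus noise.
true_demand = lambda p: 50.0 - 7.0 * p

# Allowed price levels.
price_grid = np.linspace(1.0, 6.0, 51)

# Step 1: historical data — two previously offered price points.
history_p = [2.0, 5.0]
history_d = [true_demand(p) + rng.normal(0, 2.0) for p in history_p]

for t in range(10):
    # Step 2: estimate the demand function d(p) ~ a + c*p by least squares.
    A = np.vstack([np.ones(len(history_p)), history_p]).T
    a_hat, c_hat = np.linalg.lstsq(A, history_d, rcond=None)[0]

    # Step 3: pick the price on the grid that maximizes estimated revenue.
    est_revenue = price_grid * (a_hat + c_hat * price_grid)
    p_t = price_grid[np.argmax(est_revenue)]

    # Step 4: apply the price, observe the realized demand, and repeat.
    history_p.append(p_t)
    history_d.append(true_demand(p_t) + rng.normal(0, 2.0))
```

Note that this loop only learns from whatever price variation happens to occur; it never deliberately tests informative price points, which is exactly the limitation discussed next.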

The fundamental limitation of this approach is that it passively learns the demand function without actively exploring the dependency between the price and demand. This may or may not be a problem depending on how dynamic the environment is:

• If the product life cycle is relatively long and the demand function changes relatively slowly, the passive learning approach combined with organic price changes can be efficient, as the price it sets will be close to the true optimal price most of the time.
• If the product life cycle is relatively short or the demand function changes rapidly, the difference between the price produced by the algorithm and the true optimal price can become significant, and so will the lost revenue. In practice, this difference is substantial for many online retailers, and critical for retailers and sellers that extensively rely on limited-time offers or flash sales (Groupon, Rue La La, etc.).

The second case represents a classical exploration-exploitation problem: in a dynamic environment, it is important to minimize the time spent on testing different price levels and collecting the corresponding demand points to accurately estimate the demand curve, and maximize the time used to sell at the optimal price calculated based on the estimate. Consequently, we want to design a solution that optimizes this trade-off, and also supports constraints that are common in real-life environments. More specifically, let’s focus on the following design goals:

• Optimize the exploration-exploitation trade-off given that the seller does not know the demand function in advance (for example, the product is new and there is no historical data on it). This trade-off can be quantified as the difference between the actual revenue and the hypothetically possible revenue given that the demand function is known.
• Provide the ability to limit the number of price changes during the product life cycle. Although the frequency of price changes in digital channels is virtually unlimited, many sellers impose certain limitations to avoid inconsistent customer experiences and other issues.
• Provide the ability to specify valid price levels and price combinations. Most retailers restrict themselves to a certain set of price points (e.g., \$25.90, \$29.90, …, \$55.90), and the optimization process has to support this constraint.
• Enable the optimization of prices under inventory constraints, or given dependencies between products.
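The first design goal above is usually quantified as *regret*. The toy calculation below illustrates the idea under an assumed demand function and an assumed sequence of tested prices (all numbers are made up for illustration):

```python
# Hypothetical demand function known only to the clairvoyant seller.
true_demand = lambda p: 40.0 - 5.0 * p

# For d(p) = 40 - 5p, revenue p*d(p) peaks at p = 4.0.
optimal_price = 4.0
optimal_revenue = optimal_price * true_demand(optimal_price)  # 80.0

# Suppose an algorithm spends 3 periods exploring off-optimal prices,
# then 7 periods exploiting the optimal price it has found.
prices_tried = [2.0, 6.0, 3.0] + [4.0] * 7
realized = [p * true_demand(p) for p in prices_tried]

# Regret: total revenue lost relative to the clairvoyant seller.
regret = sum(optimal_revenue - r for r in realized)
```

Here the three exploration periods cost 20 + 20 + 5 = 45 units of revenue, and the exploitation periods cost nothing; a good algorithm keeps this cumulative gap small.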

In the remainder of this article, we discuss several techniques that help to achieve the above design goals, starting with the simplest ones and gradually increasing the complexity of the scenarios.