The 8 Best Papers on eCommerce Search Algorithms

1. Improving Web Search Ranking by Incorporating User Behavior Information 

microsoft.com/en-us/research/publication/improving-web-search-ranking-by-incorporating-user-behavior/

The first paper on our list is one of the most foundational to how successful eCommerce search systems are optimized today. It was one of the first to introduce the concept of “incorporating feedback into the ranking process” (e.g., using clicks, conversions, and views to inform ranking), and it’s a great starting point for anyone looking to increase the effectiveness of their site search.

2. Optimizing Search Engines using Clickthrough Data

https://www.cs.cornell.edu/people/tj/publications/joachims_02c.pdf

This paper is similar to the previous one, but it focuses specifically on optimizing search with clickthrough data. In the past, site search results were trained to achieve “relevance” with the help of human judgments. As the paper states, those training methods are difficult and expensive to apply (and, most importantly, ineffective). If you’re looking to learn even more about the foundations behind how winning eCommerce companies optimize their search results automatically, this paper is also a great place to start.
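To make the idea concrete, here is a minimal sketch of how clicks can be turned into training signal without human labeling. It assumes the "a clicked result is preferred over the unclicked results ranked above it" reading of clickthrough data that the paper describes; the function name and the product IDs are ours, purely for illustration.

```python
from typing import List, Set, Tuple


def preference_pairs(ranking: List[str], clicked: Set[str]) -> List[Tuple[str, str]]:
    """Return (preferred, less_preferred) pairs implied by one search session.

    A clicked result is treated as preferred over every unclicked result
    that was shown above it in the ranking.
    """
    pairs = []
    for position, item in enumerate(ranking):
        if item not in clicked:
            continue
        for shown_above in ranking[:position]:
            if shown_above not in clicked:
                pairs.append((item, shown_above))
    return pairs


# Hypothetical session: five products shown, three of them clicked.
ranking = ["p1", "p2", "p3", "p4", "p5"]
clicked = {"p1", "p3", "p5"}
pairs = preference_pairs(ranking, clicked)
print(pairs)  # [('p3', 'p2'), ('p5', 'p2'), ('p5', 'p4')]
```

Pairs like these are relative judgments ("p3 beat p2 for this query"), not absolute relevance scores, which is why they pair naturally with ranking models trained on preferences.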

3. Amazon Search: The Joy of Ranking Products

https://assets.amazon.science/89/cd/34289f1f4d25b5857d776bdf04d5/amazon-search-the-joy-of-ranking-products.pdf 

Even the smallest improvements in speed and search result quality can have massive impacts on revenue and customer experience, and Amazon’s learnings are perhaps the best example of that. This paper dives deeper into the algorithms and ML frameworks Amazon’s A9 team has implemented to rank products in categories and blend separate rankings in All Product Search, as well as the NLP techniques used for matching queries and products, and more.

4. Real-time Personalization using Embeddings for Search Ranking at Airbnb

Real-time personalization is one of the most advanced (and increasingly necessary) features of eCommerce search systems today. Airbnb is one of the best at serving users personalized search and browse results across a wide variety of content, and the solutions they write about in this paper will serve as an incredibly effective starting point for marketplaces (and any other business) looking to implement personalized search.
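As a rough illustration of what embedding-based personalization looks like in practice, the sketch below re-ranks candidates by how similar their embeddings are to the items a user clicked recently. This is a simplified stand-in, not the paper's production system: the two-dimensional vectors, the item names, and the choice to average recent clicks into a single profile are all assumptions made for the example.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_vector(vectors: List[Sequence[float]]) -> List[float]:
    """Element-wise average of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Hypothetical item embeddings; real ones would be learned from session data.
embeddings: Dict[str, List[float]] = {
    "beach_house": [0.9, 0.1],
    "beach_condo": [0.8, 0.2],
    "city_loft": [0.1, 0.9],
    "ski_cabin": [0.3, 0.7],
}

recent_clicks = ["beach_house"]
candidates = ["city_loft", "beach_condo", "ski_cabin"]

# Summarize the session as one vector, then sort candidates by similarity to it.
profile = mean_vector([embeddings[item] for item in recent_clicks])
personalized = sorted(
    candidates,
    key=lambda item: cosine(embeddings[item], profile),
    reverse=True,
)
print(personalized)  # ['beach_condo', 'ski_cabin', 'city_loft']
```

Because the profile is rebuilt from the latest clicks, the ordering can change within a single session, which is the "real-time" part of the idea.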

5. Learning to Rank for Information Retrieval

https://dl.acm.org/doi/10.1561/1500000016

Learning to rank is “a task to automatically construct a ranking model using training data, such that the model can sort new objects according to their degrees of relevance, preference, or importance.” This paper provides a solid framework and direction you can use to optimize both your clickstream and personalization rankings, and it also provides detailed comparisons between the major learning-to-rank algorithms available today to help you choose the most viable solution for your use case.
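If the definition above feels abstract, a tiny example may help. The sketch below trains a linear scoring function from preference pairs using a simple perceptron-style update, which is one basic member of the pairwise family of learning-to-rank approaches and not a method taken from the paper. The feature layout ([text_match, click_rate]) and all numbers are made up for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[Sequence[float], Sequence[float]]  # (better item, worse item)


def score(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear ranking score: higher means the item should rank higher."""
    return sum(w * x for w, x in zip(weights, features))


def train_pairwise(pairs: List[Pair], epochs: int = 20, lr: float = 0.1) -> List[float]:
    """Learn weights so that each 'better' item outscores its 'worse' item."""
    weights = [0.0] * len(pairs[0][0])
    for _ in range(epochs):
        for better, worse in pairs:
            # Update only when the current model orders the pair incorrectly.
            if score(weights, better) - score(weights, worse) <= 0:
                weights = [
                    w + lr * (b - c) for w, b, c in zip(weights, better, worse)
                ]
    return weights


# Hypothetical training data: each item is [text_match, click_rate].
training_pairs: List[Pair] = [
    ([0.9, 0.8], [0.9, 0.1]),
    ([0.7, 0.6], [0.8, 0.2]),
    ([0.5, 0.9], [0.6, 0.3]),
]
weights = train_pairwise(training_pairs)

# Use the learned model to sort new, unseen items.
new_items: Dict[str, List[float]] = {
    "a": [0.8, 0.1],
    "b": [0.7, 0.9],
    "c": [0.6, 0.5],
}
ranked = sorted(new_items, key=lambda k: score(weights, new_items[k]), reverse=True)
print(ranked)  # ['b', 'c', 'a']
```

The model is never told an absolute relevance score; it only learns from "this one should rank above that one," and then applies what it learned to sort items it has not seen before.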

6. On Application of Learning to Rank for E-Commerce Search 

https://arxiv.org/abs/1903.04263 

While the previous paper dives into the details of building a learning-to-rank system for broad information retrieval purposes, this paper focuses specifically on how learning-to-rank methods can be trained using clickthrough data for eCommerce companies. As we stated before, these concepts are vital to every advanced eCommerce search engine that exists today (including Constructor).

7. Overlapping Experiment Infrastructure: More, Better, Faster Experimentation 

https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/36500.pdf

Implementing advanced machine learning algorithms in search is no easy task. Experimentation is inevitable. Hardly anyone knows this better than Google. This paper will shine a light on exactly how Google “tackles the problem of how to run more experiments, how to run experiments that produce better decisions, and how to run them faster.”
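One core idea behind running more experiments at once is to split the system into layers and assign traffic independently in each layer, so a single user can be in one ranking experiment and one UI experiment at the same time. The sketch below shows that idea with hash-based bucketing. The layer names, experiment names, bucket count, and the use of SHA-256 are all our assumptions for illustration, not details of Google's infrastructure.

```python
import hashlib
from typing import Dict, Optional, Tuple

NUM_BUCKETS = 1000

# Hypothetical layers; each experiment owns a half-open range of buckets.
LAYERS: Dict[str, Dict[str, Tuple[int, int]]] = {
    "ranking": {"control": (0, 500), "new_ranker": (500, 1000)},
    "ui": {"control": (0, 900), "bigger_images": (900, 1000)},
}


def bucket(user_id: str, layer: str) -> int:
    """Deterministically map a user to a bucket, salted by the layer name.

    Salting by layer makes the assignment in one layer independent of the
    assignment in another, which is what lets experiments overlap.
    """
    digest = hashlib.sha256(f"{layer}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_BUCKETS


def assign(user_id: str, layer: str) -> Optional[str]:
    """Return the experiment this user falls into for the given layer."""
    user_bucket = bucket(user_id, layer)
    for name, (start, end) in LAYERS[layer].items():
        if start <= user_bucket < end:
            return name
    return None


def assignments(user_id: str) -> Dict[str, Optional[str]]:
    """One experiment per layer for this user."""
    return {layer: assign(user_id, layer) for layer in LAYERS}


print(assignments("user-42"))
```

The same user always lands in the same bucket for a given layer, so their experience stays consistent across visits, while the per-layer salt keeps the ranking split from lining up with the UI split.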

8. Hidden Technical Debt in Machine Learning Systems

https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf

While machine learning is a fantastic tool for optimizing search, if left alone, the maintenance costs of these tactics can grow at a rapid rate. This paper refers to this as “technical debt.” When designing machine learning systems, it’s important to account for the risk factors associated with machine learning, and this paper will teach you about many of them — from boundary erosion to entanglement, hidden feedback loops, and more.
