Amy wrote about running an experiment on our feed, and it’s time to revisit that experiment and make a decision.
In our previous feed experiments, we established six goals to track:
- User creates a comment.
- User creates comments on at least 4 different days within a week.
- User views pages on at least 4 different days within a week.
- User views pages on at least 4 different hours within a day.
- User views pages on at least 9 different days within 2 weeks.
- User views pages on at least 12 different hours within five days.
For the current experiment, which we’re wrapping up, we reused those goals.
Here’s a link to the code that captures “conversions” for each of the goals.
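The linked code is the authoritative implementation. As a rough illustration of what a goal such as "views pages on at least 4 different days within a week" involves, here is a minimal sketch. The helper name `active_on_distinct_days?` is hypothetical, and treating "within a week" as a sliding 7-day window (rather than a calendar week) is an assumption on my part.

```ruby
require "date"

# Hypothetical helper, not Forem's actual code.
# Returns true when the timestamps cover at least `min_days` distinct
# calendar days inside some sliding window of `window_days` days.
def active_on_distinct_days?(timestamps, min_days: 4, window_days: 7)
  days = timestamps.map(&:to_date).uniq.sort
  days.each_index.any? do |i|
    window_end = days[i] + (window_days - 1)
    # Count the distinct days from this start day through the end of the window.
    days.drop(i).take_while { |day| day <= window_end }.size >= min_days
  end
end

page_views = [
  Time.utc(2022, 5, 2, 9),
  Time.utc(2022, 5, 2, 17), # same day as above, counted once
  Time.utc(2022, 5, 3, 8),
  Time.utc(2022, 5, 5, 12),
  Time.utc(2022, 5, 7, 20)
]

puts active_on_distinct_days?(page_views)
```

The hour-based goals could follow the same shape, counting distinct hours inside a window of days instead of distinct days.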
We use the field_test gem to facilitate the Bayesian A/B hypothesis testing. As part of the experiment, I added an AbExperiment model to Forem, which provides numerous mechanisms to test and toggle experiments. That proved fortuitous when I broke production.
We then introduced the code to select which feed algorithm to use. Aside from the minor outages I introduced (and we corrected), we sat back and let the experiment run.
Below is a summary of the experiment’s results:
| Scenario | Incumbent Conversion | Challenger Conversion | Likely Winner | Probability of Winner |
|---|---|---|---|---|
| Creates a comment. | 5.58% | 5.87% | Challenger | 90% |
| Creates comments on at least 4 different days within a week. | 0.23% | 0.19% | Incumbent | 78% |
| Views pages on at least 4 different days within a week. | 23.98% | 23.52% | Incumbent | 86% |
| Views pages on at least 4 different hours within a day. | 14.17% | 13.62% | Incumbent | 94% |
| Views pages on at least 9 different days within 2 weeks. | 9.60% | 9.41% | Incumbent | 73% |
| Views pages on at least 12 different hours within five days. | 2.24% | 2.13% | Incumbent | 73% |
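One way to read the table is as the relative change of the challenger against the incumbent. The sketch below does only that arithmetic on the conversion rates copied from the table; the `relative_lift` helper is a name I made up for illustration, and it says nothing about statistical certainty (that is what the "Probability of Winner" column is for).

```ruby
# Conversion rates in percent, copied from the results table:
# scenario => [incumbent, challenger]
RESULTS = {
  "Creates a comment" => [5.58, 5.87],
  "Creates comments on at least 4 different days within a week" => [0.23, 0.19],
  "Views pages on at least 4 different days within a week" => [23.98, 23.52],
  "Views pages on at least 4 different hours within a day" => [14.17, 13.62],
  "Views pages on at least 9 different days within 2 weeks" => [9.60, 9.41],
  "Views pages on at least 12 different hours within five days" => [2.24, 2.13]
}.freeze

# Hypothetical helper: relative change of the challenger versus the
# incumbent, expressed in percent of the incumbent's rate.
def relative_lift(incumbent, challenger)
  (challenger - incumbent) / incumbent * 100.0
end

RESULTS.each do |scenario, (incumbent, challenger)|
  puts format("%-60s %+7.2f%%", scenario, relative_lift(incumbent, challenger))
end
```

Keep in mind that a goal with a very small base rate, such as commenting on 4 different days in a week, can show a large relative change from a tiny absolute difference.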
First and foremost, both feed strategies appear to encourage close to the same engagement. That is reassuring: the experiment likely did not adversely affect the DEV.to experience.
Second, I’m prepared to call this first experiment in favor of the incumbent.
Third, it appears that the challenger encouraged initial conversations, but those conversations dwindled over time.
Why do I think this is the behavior? My hypothesis is that two primary changes in the challenger are responsible:
- The daily_decay_factor, the numeric multiplier we assign to the publication date, overly favored more recently published articles.
- Sorting the relevant feed entries by publication date instead of by relevance score.
Let’s look at the change in publication date decay rate.
| Days Since Published | Challenger #1 Weight | Challenger #2 Weight |
|---|---|---|
| 15 or more | 0.001 | 0.9 |
For the original challenger, I chose a more aggressive decay rate. For the second challenger, I’m significantly easing off the decay.
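To make the difference concrete, here is a small sketch of how the two weights for articles published 15 or more days ago would affect a score. The scoring function is hypothetical (a base relevance score multiplied by the decay weight), the base scores are made up, and giving newer articles a weight of 1.0 is an assumption for illustration only, not a value from the experiment.

```ruby
# Weights for articles published 15 or more days ago, from the table.
CHALLENGER_1_OLD_WEIGHT = 0.001
CHALLENGER_2_OLD_WEIGHT = 0.9

# Hypothetical scoring: base relevance multiplied by the decay weight.
# Articles newer than 15 days get 1.0 here purely for illustration.
def decayed_score(base_score, days_since_published, old_weight)
  weight = days_since_published >= 15 ? old_weight : 1.0
  base_score * weight
end

# Made-up example: a highly relevant older article versus a weakly
# relevant article published yesterday.
OLD_ARTICLE = { base_score: 50.0, days: 20 }.freeze
NEW_ARTICLE = { base_score: 1.0, days: 1 }.freeze

[CHALLENGER_1_OLD_WEIGHT, CHALLENGER_2_OLD_WEIGHT].each do |old_weight|
  old_score = decayed_score(OLD_ARTICLE[:base_score], OLD_ARTICLE[:days], old_weight)
  new_score = decayed_score(NEW_ARTICLE[:base_score], NEW_ARTICLE[:days], old_weight)
  winner = old_score > new_score ? "older article" : "newer article"
  puts "weight #{old_weight}: older=#{old_score.round(3)} newer=#{new_score.round(3)} -> #{winner} ranks first"
end
```

Under these assumed numbers, the 0.001 weight pushes even a strongly relevant older article below a weak new one, while the 0.9 weight leaves most of its relevance intact.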
I’m also removing the order by publication date, so the upcoming feed experiment will now sort things in relevance order.
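The effect of the sorting change can be sketched in a few lines. Everything here is illustrative: the `Entry` struct, the titles, dates, and scores are made up, and the real feed query is more involved.

```ruby
require "date"

# Hypothetical feed entry; the field names are assumptions for this sketch.
Entry = Struct.new(:title, :published_on, :relevance_score)

ENTRIES = [
  Entry.new("Older, highly relevant", Date.new(2022, 5, 1), 9.0),
  Entry.new("Recent, barely relevant", Date.new(2022, 5, 9), 2.0),
  Entry.new("Mid-week, fairly relevant", Date.new(2022, 5, 5), 6.0)
].freeze

# First challenger: newest entries first, regardless of score.
def sort_by_publication_date(entries)
  entries.sort_by(&:published_on).reverse
end

# Upcoming experiment: highest relevance score first.
def sort_by_relevance(entries)
  entries.sort_by { |entry| -entry.relevance_score }
end

puts "By publication date: #{sort_by_publication_date(ENTRIES).map(&:title).join(' | ')}"
puts "By relevance score:  #{sort_by_relevance(ENTRIES).map(&:title).join(' | ')}"
```

With date ordering, the relevance score only decides which entries make it into the feed, not where they appear, which may be part of why the first challenger favored fresh articles so heavily.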
I’ve begun the proposal for our next feed experiment. It introduces a few minor tweaks and is intended as a starting point for a conversation about how to configure the challenger’s case statements.