Forem Feed Experiment One

January Results

Background

Amy wrote about running an experiment on our feed. It’s time to revisit that experiment and make a decision.

The Goals

In our previous feed experiments, we established six goals to track:

  1. User creates a comment.
  2. User creates comments on at least 4 different days within a week.
  3. User views pages on at least 4 different days within a week.
  4. User views pages on at least 4 different hours within a day.
  5. User views pages on at least 9 different days within 2 weeks.
  6. User views pages on at least 12 different hours within five days.

For this current experiment, which we’re wrapping up, we re-used those goals.

Here’s a link to the code that captures “conversions” for each of the goals.
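To make the shape of these goals concrete, here is a minimal sketch of how a goal such as “views pages on at least 4 different days within a week” could be checked. This is not the Forem code linked above. The method name, its parameters, and the choice of a rolling window are my assumptions for illustration.

```ruby
require "date"

# Hypothetical helper (not Forem's implementation): returns true when the
# timestamps fall on at least `min_days` distinct calendar days inside some
# window of `window_days` consecutive days.
def active_on_distinct_days?(timestamps, min_days:, window_days:)
  days = timestamps.map(&:to_date).uniq.sort
  days.each_with_index.any? do |start_day, index|
    window_end = start_day + (window_days - 1)
    # Count the distinct days from this start day through the end of its window.
    days.drop(index).take_while { |day| day <= window_end }.size >= min_days
  end
end

# Made-up page views for one user, spread across four calendar days.
page_views = [
  Time.utc(2023, 1, 2, 9), Time.utc(2023, 1, 2, 17),
  Time.utc(2023, 1, 3, 8), Time.utc(2023, 1, 5, 12), Time.utc(2023, 1, 7, 20)
]
active_on_distinct_days?(page_views, min_days: 4, window_days: 7) # => true
```

The hour-based goals would follow the same pattern with distinct hours in place of distinct days.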

The Methodology

We use the field_test gem to facilitate the Bayesian A/B hypothesis testing. As part of the experiment, I added an AbExperiment model to Forem. This provides numerous mechanisms to test and toggle experiments, which proved fortuitous when I broke production.

We then introduced the code to select which feed algorithm to use. Aside from the minor outages I introduced (and we corrected), we sat back and let the experiment run.
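For readers curious about what a “probability of winning” means in a Bayesian A/B test, here is a minimal sketch. It assumes a uniform Beta(1, 1) prior on each variant’s conversion rate and uses the well-known closed-form sum for comparing two Beta posteriors. I am not claiming this is how field_test computes its numbers. The method names are hypothetical, and the counts in the usage line are invented, not the experiment’s data.

```ruby
# Log of the Beta function. Math.lgamma returns [log(|gamma(x)|), sign].
def log_beta(a, b)
  Math.lgamma(a).first + Math.lgamma(b).first - Math.lgamma(a + b).first
end

# Probability that the challenger's true conversion rate exceeds the
# incumbent's, assuming a uniform Beta(1, 1) prior for both variants.
def probability_challenger_wins(inc_conversions, inc_participants,
                                ch_conversions, ch_participants)
  a_inc = inc_conversions + 1
  b_inc = inc_participants - inc_conversions + 1
  a_ch = ch_conversions + 1
  b_ch = ch_participants - ch_conversions + 1

  # Closed-form sum over the challenger's alpha, evaluated in log space
  # so that large counts do not overflow.
  (0...a_ch).sum do |i|
    Math.exp(
      log_beta(a_inc + i, b_inc + b_ch) -
      Math.log(b_ch + i) -
      log_beta(1 + i, b_ch) -
      log_beta(a_inc, b_inc)
    )
  end
end

# Invented counts: 50 of 1,000 incumbent users converted, 65 of 1,000 challenger users.
probability_challenger_wins(50, 1_000, 65, 1_000)
```

The probability that the incumbent wins is one minus this value, which is why each row of the results reports a single likely winner and its probability.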

Results

Below is a summary of the experiment’s results:

Table 235: Forem Feed Experiment One Results

| Scenario | Incumbent Conversion | Challenger Conversion | Likely Winner | Probability of Winner |
|----------|----------------------|-----------------------|---------------|-----------------------|
| Creates a comment. | 5.58% | 5.87% | Challenger | 90% |
| Creates comments on at least 4 different days within a week. | 0.23% | 0.19% | Incumbent | 78% |
| Views pages on at least 4 different days within a week. | 23.98% | 23.52% | Incumbent | 86% |
| Views pages on at least 4 different hours within a day. | 14.17% | 13.62% | Incumbent | 94% |
| Views pages on at least 9 different days within 2 weeks. | 9.60% | 9.41% | Incumbent | 73% |
| Views pages on at least 12 different hours within five days. | 2.24% | 2.13% | Incumbent | 73% |

Conjecture

First and foremost, it appears that both feed strategies encourage close to the same engagement: on every goal, the two conversion rates differ by at most 0.55 percentage points. That is reassuring, because it suggests the experiment likely did not adversely affect the DEV.to experience.

Second, I’m prepared to call this first experiment in favor of the incumbent, which is the likely winner on five of the six goals.

Third, it appears that the challenger encourages initial conversations, but those conversations dwindled over time.

Why do I think that this is the behavior? My hypothesis is that two primary changes in the challenger are responsible:

  • The daily_decay_factor, the numeric multiplier we assign to the publication date, overly favored more recently published articles.
  • Sorting the relevant feed entries by publication date, instead of the relevance score.

Let’s look at the change in publication date decay rate.

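Table 236: Forem Feed Publication Decay

| Days Since Published | Challenger #1 Weight | Challenger #2 Weight |
|----------------------|----------------------|----------------------|
| 0 | 1 | 1 |
| 1 | 0.95 | 0.99 |
| 2 | 0.9 | 0.985 |
| 3 | 0.85 | 0.98 |
| 4 | 0.8 | 0.975 |
| 5 | 0.75 | 0.97 |
| 6 | 0.7 | 0.965 |
| 7 | 0.65 | 0.960 |
| 8 | 0.6 | 0.955 |
| 9 | 0.55 | 0.95 |
| 10 | 0.5 | 0.945 |
| 11 | 0.4 | 0.94 |
| 12 | 0.3 | 0.935 |
| 13 | 0.2 | 0.93 |
| 14 | 0.1 | 0.925 |
| 15 or more | 0.001 | 0.9 |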

For the original challenger, I chose a more aggressive decay rate. For the second challenger, I’m significantly easing off of the decay.
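To see how differently the two schedules treat the same article, here is a small lookup sketch. The weights are copied from the decay table. The constant and method names are hypothetical, and this is not how Forem’s feed code expresses the decay.

```ruby
# Weights for days 0 through 14, copied from the decay table.
CHALLENGER_ONE_WEIGHTS = [
  1, 0.95, 0.9, 0.85, 0.8, 0.75, 0.7, 0.65, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1
].freeze
CHALLENGER_TWO_WEIGHTS = [
  1, 0.99, 0.985, 0.98, 0.975, 0.97, 0.965, 0.96, 0.955, 0.95, 0.945, 0.94,
  0.935, 0.93, 0.925
].freeze

# Weights applied to anything published 15 or more days ago.
CHALLENGER_ONE_FALLBACK = 0.001
CHALLENGER_TWO_FALLBACK = 0.9

# Hypothetical helper: look up the decay weight for an article's age in days.
def decay_weight(weights, fallback, days_since_published)
  raise ArgumentError, "age must not be negative" if days_since_published.negative?

  weights.fetch(days_since_published, fallback)
end

# A week-old article keeps 65% of its weight under challenger #1
# and 96% under challenger #2.
decay_weight(CHALLENGER_ONE_WEIGHTS, CHALLENGER_ONE_FALLBACK, 7) # => 0.65
decay_weight(CHALLENGER_TWO_WEIGHTS, CHALLENGER_TWO_FALLBACK, 7) # => 0.96
```

Under challenger #1, an article older than two weeks is effectively removed from contention (0.001), while challenger #2 still leaves it at 0.9.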

I’m also removing the order by publication date, so the upcoming feed experiment will now sort things in relevance order.
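The difference between the two orderings is easy to see with a toy example. The struct, method names, scores, and dates below are all invented for illustration and do not reflect Forem’s feed query.

```ruby
# Hypothetical stand-in for a feed entry that has already been scored.
FeedEntry = Struct.new(:title, :relevance_score, :published_at)

# First challenger: newest entries first, regardless of score.
def order_by_publication_date(entries)
  entries.sort_by(&:published_at).reverse
end

# Upcoming challenger: highest relevance score first.
def order_by_relevance(entries)
  entries.sort_by { |entry| -entry.relevance_score }
end

entries = [
  FeedEntry.new("Older, highly relevant", 9.2, Time.utc(2023, 1, 3)),
  FeedEntry.new("Newest, barely relevant", 1.4, Time.utc(2023, 1, 10)),
  FeedEntry.new("Recent, moderately relevant", 5.0, Time.utc(2023, 1, 8))
]

order_by_publication_date(entries).map(&:title)
# => ["Newest, barely relevant", "Recent, moderately relevant", "Older, highly relevant"]
order_by_relevance(entries).map(&:title)
# => ["Older, highly relevant", "Recent, moderately relevant", "Newest, barely relevant"]
```

Sorting by publication date pushes the most relevant entry to the bottom whenever it happens to be the oldest, which compounds the effect of an aggressive decay.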

Next Steps

I’ve begun the proposal for our next feed experiment. It introduces a few minor tweaks and is intended as a starting point for a conversation about how to configure the challenger’s case statements.