3 power-up ideas for your webhooks

If your platform supports webhooks, here are 3 power-up ideas that make it easy for other applications to extend your service for your users.

The true beauty of webhooks comes from other applications using that event data for different purposes, such as analytics, intelligence, reporting, security and messaging. It is how Engage can connect to Amazon SES and Mailgun for analytics and messaging. It is how Baremetrics and ChartMogul can connect to Stripe for metrics. If you let consumers subscribe to events on your platform through webhooks, here are 3 ideas to improve the experience and let other applications build on top of it to extend your service for users.

1. URL update through API

Applications that want to use data from your webhooks do so by setting a webhook URL that receives and processes the event data. If the webhook URL can be updated through an API, these applications can hide the webhook setup behind a simpler onboarding process for the user. Without that API, the application has to ask the user to update the webhook URL manually.
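Here is a minimal sketch of what the platform side of such an API might do. It validates the URL and stores it per account and event. The route in the comment, the in-memory `webhookUrls` store and `updateWebhookUrl` are all hypothetical. A real platform would persist the URL and authenticate the caller.

```javascript
// Hypothetical in-memory store: account ID -> (event name -> webhook URL).
const webhookUrls = new Map()

// Sketch of the handler behind a URL-update endpoint such as
// `PUT /accounts/:accountId/webhooks/:event` (route is illustrative only).
function updateWebhookUrl(accountId, event, url) {
  let parsed
  try {
    parsed = new URL(url)
  } catch {
    throw new Error(`Invalid webhook URL: ${url}`)
  }
  // Require HTTPS so event data is not sent in plain text.
  if (parsed.protocol !== 'https:') {
    throw new Error('Webhook URLs must use https')
  }
  if (!webhookUrls.has(accountId)) webhookUrls.set(accountId, new Map())
  // Setting the same event again replaces the previous URL.
  webhookUrls.get(accountId).set(event, parsed.href)
  return { accountId, event, url: parsed.href }
}

// A connecting app calls the endpoint during onboarding instead of
// asking the user to paste the URL into a dashboard.
console.log(updateWebhookUrl('acct_123', 'delivered', 'https://app.example.com/hooks/acct_123'))
```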

2. Support for multiple URLs

Allow more than one URL per webhook, so that multiple applications can each set a URL to receive event data. Take an ESP (email service provider) like Mailgun, for example. The user might have already set a URL to get delivery events. Connecting Engage should not mean overriding that URL. In fact, what we do is create an additional webhook that also receives the event data, which we then process for analytics and reporting.

On the flip side, the more URLs your platform sends event data to, the more resources it uses. You may therefore want to limit the number of URLs. Mailgun, for example, allows a maximum of 3 URLs per webhook, and Amazon SES lets you set a maximum of 10 event destinations per configuration set.
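The following sketch shows how a platform might store several URLs per webhook while enforcing a cap. `WebhookRegistry` and its methods are hypothetical, and the limit of 3 simply mirrors the Mailgun figure above.

```javascript
// Hypothetical per-webhook limit (mirrors the Mailgun example; pick your own).
const MAX_URLS_PER_WEBHOOK = 3

class WebhookRegistry {
  constructor(maxUrls = MAX_URLS_PER_WEBHOOK) {
    this.maxUrls = maxUrls
    this.urlsByEvent = new Map() // event name -> Set of URLs
  }

  // Add a URL without touching the ones already registered.
  addUrl(event, url) {
    const urls = this.urlsByEvent.get(event) ?? new Set()
    if (urls.has(url)) return [...urls] // already registered: no-op
    if (urls.size >= this.maxUrls) {
      throw new Error(`Limit of ${this.maxUrls} URLs reached for "${event}"`)
    }
    urls.add(url)
    this.urlsByEvent.set(event, urls)
    return [...urls]
  }

  removeUrl(event, url) {
    this.urlsByEvent.get(event)?.delete(url)
  }

  // Every URL that should receive a copy of this event, in insertion order.
  urlsFor(event) {
    return [...(this.urlsByEvent.get(event) ?? [])]
  }
}

const registry = new WebhookRegistry()
registry.addUrl('delivered', 'https://user-app.example.com/hooks')      // set by the user
registry.addUrl('delivered', 'https://engage.example.com/hooks/mailgun') // added by a connecting app
console.log(registry.urlsFor('delivered'))
```

Because the connecting app's URL is added alongside the user's, both keep receiving the event.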

3. Exponential backoff retries

Things go wrong, and a webhook URL will not be available at all times. Resend the event data if the URL is unreachable or returns an HTTP response code other than 2xx.
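Below is a minimal sketch of the "did this delivery succeed?" check. It assumes a `send` function that POSTs the payload and resolves to the HTTP status code. `attemptDelivery` and `fetchSend` are hypothetical names, and `fetchSend` assumes Node 18+ for the global `fetch`.

```javascript
// One delivery attempt. `send` is any function that POSTs the payload and
// resolves to the HTTP status code (a hypothetical interface for this sketch).
async function attemptDelivery(send, url, payload) {
  try {
    const status = await send(url, payload)
    // Only a 2xx status counts as delivered; anything else gets retried.
    return { delivered: status >= 200 && status < 300, status }
  } catch (err) {
    // Network-level failures (DNS, refused connection, timeout) mean the
    // URL was unreachable, so the event should be retried too.
    return { delivered: false, status: null, error: err.message }
  }
}

// A `send` built on the global fetch (Node 18+). fetch only rejects on
// network errors, so HTTP error statuses reach the 2xx check above.
async function fetchSend(url, payload) {
  const res = await fetch(url, {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify(payload),
    signal: AbortSignal.timeout(10000), // give up on slow endpoints after 10 s
  })
  return res.status
}
```

Any result with `delivered: false` is what gets scheduled for a retry.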

How often should you retry? And for how long?

For how often, use exponential backoff. Retrying the URL incessantly will overwhelm your system. Instead of retrying at fixed intervals, exponential backoff increases the wait time after every retry. Here is a simple formula for this:

interval = initial_delay * (base^number_of_retries)

Where

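  • base is the exponential base (should be greater than 1). Remember, the higher the base, the longer the wait between your retries.
  • initial_delay is the initial delay value. This is optional. If not used (equivalent to setting it to 1), the first interval will be 1 time unit, meaning "retry almost immediately".
  • number_of_retries is the number of retries already made, so it is 0 when scheduling the first retry.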

Let's put this to the test. Suppose we hit an unreachable webhook URL at 9:00 am and use a base of 2 and an initial_delay of 10 minutes. Here are the first 5 attempts (the original try plus 4 retries):

  • First try ⇒ 9:00 am
  • Second try ⇒ interval (min) = 10 * 2^0 = 10 mins ⇒ 9:00 am + 10 mins = 9:10 am
  • Third try ⇒ interval (min) = 10 * 2^1 = 20 mins ⇒ 9:10 am + 20 mins = 9:30 am
  • Fourth try ⇒ interval (min) = 10 * 2^2 = 40 mins ⇒ 9:30 am + 40 mins = 10:10 am
  • Fifth try ⇒ interval (min) = 10 * 2^3 = 80 mins ⇒ 10:10 am + 80 mins = 11:30 am

You can see the interval growing exponentially: with a base of 2, it doubles after every retry.
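You can reproduce the schedule above in a few lines. This sketch uses minutes as the time unit and an arbitrary UTC date. The function names are illustrative. In JavaScript `^` is bitwise XOR, not exponentiation, so the formula is written with `**`.

```javascript
// interval = initial_delay * base^number_of_retries  (in minutes here)
function backoffInterval(retriesSoFar, initialDelay = 10, base = 2) {
  return initialDelay * base ** retriesSoFar
}

// Times of the first `attempts` tries, starting with the original try.
function retrySchedule(start, attempts, initialDelay = 10, base = 2) {
  const times = [start]
  let current = start
  for (let n = 0; n < attempts - 1; n++) {
    const waitMs = backoffInterval(n, initialDelay, base) * 60 * 1000
    current = new Date(current.getTime() + waitMs)
    times.push(current)
  }
  return times
}

const start = new Date(Date.UTC(2024, 0, 1, 9, 0)) // 9:00 am, arbitrary date
console.log(retrySchedule(start, 5).map((t) => t.toISOString().slice(11, 16)))
// [ '09:00', '09:10', '09:30', '10:10', '11:30' ]
```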

How long should we retry? Well, it depends. Here are some questions to consider:

  • How mission-critical is the event data you are sending?
  • How long do you expect consumers to take to fix a broken URL?

Mailgun, for example, retries 7 times across 8 hours, while Stripe retries for 3 days. I recommend retrying at least 6 times. How long 6 retries take depends on your initial_delay and base values. Also, after the first few retries, notify the account owner that the URL is unavailable.
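With the worked example's values (initial_delay of 10 minutes, base of 2), a quick calculation shows how long N retries last. The total is initial_delay * (base^N - 1) / (base - 1) for base > 1. The same sum also tells you how many retries a target window, such as Stripe's 3 days, would need. The helper names below are illustrative.

```javascript
// Total minutes from the first failed attempt to the last retry: the sum of
// the first `retries` intervals, initialDelay * (base^0 + ... + base^(retries - 1)).
function totalRetryWindow(retries, initialDelay = 10, base = 2) {
  let total = 0
  for (let n = 0; n < retries; n++) {
    total += initialDelay * base ** n
  }
  return total
}

// Smallest number of retries whose window covers `targetMinutes`.
// Assumes initialDelay > 0 and base >= 1, otherwise the loop never ends.
function retriesForWindow(targetMinutes, initialDelay = 10, base = 2) {
  let retries = 0
  while (totalRetryWindow(retries, initialDelay, base) < targetMinutes) {
    retries++
  }
  return retries
}

console.log(totalRetryWindow(6))           // 630 (minutes, i.e. 10.5 hours)
console.log(retriesForWindow(3 * 24 * 60)) // 9 (covers a 3-day window)
```

So 6 retries with these values span 10.5 hours. A larger base or initial_delay stretches the same number of retries over a longer period.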

What happens if the URL is still not available after the maximum period? Disable it and notify the account owner.

Extras

Here are two additional features. Treat them as must-haves rather than optional extras.

Security

Provide a way for consumers to verify that webhook data is coming from you. One option is to send events from a fixed set of IP addresses and tell consumers to whitelist them. A simpler option is to include a signature that the consumer can also generate using a shared key: the consumer computes the signature from the received data and compares it to the signature sent with the event. See Mailgun's example.
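Here is a minimal sketch of the shared-key signature idea using an HMAC-SHA256 from Node's built-in `crypto` module. The signed layout (`timestamp.body`), the 5-minute tolerance and the function names are assumptions for illustration only. Providers, Mailgun included, define their own fields and header names. Verify against the raw request body, not re-serialized JSON, or the signature will not match.

```javascript
const crypto = require('crypto')

// Platform side: sign the raw request body together with a Unix timestamp.
// Signing `${timestamp}.${body}` is an assumption of this sketch.
function signPayload(secret, timestamp, body) {
  return crypto
    .createHmac('sha256', secret)
    .update(`${timestamp}.${body}`)
    .digest('hex')
}

// Consumer side: recompute the signature with the shared key and compare it
// in constant time. Stale timestamps are rejected to limit replay attacks.
function verifyPayload(secret, timestamp, body, signature, toleranceSec = 300,
                       nowSec = Math.floor(Date.now() / 1000)) {
  if (Math.abs(nowSec - Number(timestamp)) > toleranceSec) return false
  const expected = Buffer.from(signPayload(secret, timestamp, body), 'hex')
  const received = Buffer.from(signature, 'hex')
  // timingSafeEqual throws when lengths differ, so compare lengths first.
  return expected.length === received.length &&
    crypto.timingSafeEqual(expected, received)
}

const sharedSecret = 'example-shared-secret'
const rawBody = JSON.stringify({ type: 'delivered', account: 'acct_123' })
const sentAt = Math.floor(Date.now() / 1000)
const signature = signPayload(sharedSecret, sentAt, rawBody)

console.log(verifyPayload(sharedSecret, sentAt, rawBody, signature))   // true
console.log(verifyPayload('wrong-secret', sentAt, rawBody, signature)) // false
```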

Include parameters to identify the account

When sending event data, include one or more parameters that identify the user's account. Connecting applications may set the same webhook URL for multiple users on your platform, so a parameter that identifies the account tells them which user the data belongs to. For example, Amazon SES includes a ses:from-domain tag in event data to specify the domain that owns the data, and Stripe includes an account parameter containing the ID of the Connect account.
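On the consumer side, the account parameter is what lets one shared endpoint serve many users. Here is a sketch, assuming a hypothetical payload with `account` and `type` fields (loosely modelled on Stripe's `account` parameter) and a hypothetical lookup table kept by the connecting application.

```javascript
// Hypothetical mapping kept by the connecting application:
// platform account ID -> the application's own user ID.
const usersByAccount = new Map([
  ['acct_123', 'user-a'],
  ['acct_456', 'user-b'],
])

// Every user's events arrive at the same webhook URL; the `account` field
// in each payload decides which user the event belongs to.
function routeEvent(event) {
  const userId = usersByAccount.get(event.account)
  if (userId === undefined) {
    // Unknown or disconnected account: skip it rather than guess.
    return null
  }
  return { userId, type: event.type }
}

console.log(routeEvent({ account: 'acct_456', type: 'delivered' }))
// { userId: 'user-b', type: 'delivered' }
```

Without such a parameter, the application would need a separate URL per user just to tell events apart.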


Appendix: Implementation idea for exponential backoff

One easy way to implement exponential backoff is to use a message queue that supports delayed messages, together with a worker that consumes them.

1. When an event is triggered, send the event data for each subscribed URL to the queue.
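// Webhook trigger: enqueue one message per subscribed URL so that each
// URL is delivered (and retried) independently.
// sendtoQueue, webhookURLS and eventData are placeholders for your own code.
for (const url of webhookURLS) {
  sendtoQueue({
    url: url,
    event_data: eventData
  })
}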

2. In the queue worker, send the event data to the webhook URL. If the request fails, send the message back to the queue with a delay.

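import got from 'got'

// sendtoQueue(message, options) is a placeholder for your queue's
// "enqueue with delay" call; worker(data) is called for each message.
const MAX_RETRIES = 8
// 10 minutes in milliseconds; adjust to the delay unit your queue expects.
const INITIAL_DELAY = 10 * 60 * 1000
const BASE = 2

async function worker(data) {
  // Retries already made for this message (0 on the first attempt).
  const retries = data.retries ?? 0
  try {
    // got rejects on network errors and, by default, on error status codes
    await got.post(data.url, {
      json: data.event_data
    })
  } catch (e) {
    if (retries === MAX_RETRIES) {
      // Disable the webhook URL
      // Send notification to the account owner
      return
    }
    if (retries === 3) {
      // Send warning notification
    }
    // `^` is bitwise XOR in JavaScript; `**` is exponentiation.
    const delay = INITIAL_DELAY * BASE ** retries
    sendtoQueue({
      url: data.url,
      event_data: data.event_data,
      retries: retries + 1 // carry the count so the next delay grows
    }, {
      delay: delay
    })
  }
}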
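To try the queue-and-worker flow without a real message queue, a minimal in-process stand-in built on `setTimeout` can play the role of the delayed queue. This is a sketch for experimentation only. `DelayedQueue`, the fake endpoint and the millisecond-scale delays are assumptions, and the stand-in is not durable, so pending messages are lost if the process exits.

```javascript
// In-process stand-in for a message queue with delayed delivery.
// Hypothetical and not durable: pending messages vanish if the process exits.
class DelayedQueue {
  constructor() {
    this.handler = null
  }

  // Register the worker that consumes messages.
  process(handler) {
    this.handler = handler
  }

  // Hand `message` to the worker after `delay` milliseconds.
  send(message, { delay = 0 } = {}) {
    setTimeout(() => this.handler(message), delay)
  }
}

const MAX_RETRIES = 4
const INITIAL_DELAY_MS = 5 // tiny delays so the demo finishes quickly
const BASE = 2

// Fake endpoint that fails twice, then succeeds.
let failuresLeft = 2
const fakePost = async () => {
  if (failuresLeft > 0) {
    failuresLeft--
    throw new Error('503 Service Unavailable')
  }
}

const attempts = [] // retry count seen on each attempt
const queue = new DelayedQueue()

queue.process(async (data) => {
  const retries = data.retries ?? 0
  attempts.push(retries)
  try {
    await fakePost(data.url, data.event_data)
    console.log(`delivered after ${retries} retries`)
  } catch (e) {
    if (retries === MAX_RETRIES) {
      console.log('giving up: disable the URL and notify the owner')
      return
    }
    queue.send(
      { ...data, retries: retries + 1 },
      { delay: INITIAL_DELAY_MS * BASE ** retries }
    )
  }
})

queue.send({ url: 'https://consumer.example.com/hook', event_data: { type: 'delivered' } })
```

Running it prints `delivered after 2 retries`: the first attempt and the first retry fail, and the second retry (5 ms and then 10 ms later) succeeds. In production, swap `DelayedQueue` for a queue that persists delayed messages.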