2022-10-24

The Push-Pull Publish-Subscribe Pattern (PuPuPubSub)

Johannes Ernst

https://reb00ted.org/tech/20221024-pupupubsub/

Preface

The British government clearly has more tolerance for humor when naming important things than the W3C does. Continuing in the original fashion, thus this name.

The Problem

The publish-subscribe pattern is well known, but in some circumstances, it suffers from two important problems:

  1. When a subscriber is temporarily not present, or cannot be reached, sent events are often lost. This can happen, for example, if the subscriber computer reboots, falls off the network, goes to sleep, has DNS problems and the like. Once the subscriber recovers, it is generally not clear what needs to happen for the subscriber to catch up to the events it may have missed. It is not even clear whether it has missed any. Similarly, it is unclear for how long the publisher needs to retry to send a message; it may be that the subscriber has permanently gone away.

  2. Subscriptions are often set up as part of the following pattern:

    • A resource on the Web is accessed. For example, a user reads an article on a website, or a software agent fetches a document.
    • Based on the content of the obtained resource, a decision is made to subscribe to updates to that resource. For example, the user may decide that they are interested in updates to the article on the website they just read.
    • There is a time lag between the time the resource has been accessed, and when the subscription becomes active, creating a race condition during which update events may be missed.

While these two problems are not always significant, there are important circumstances in which they are, and this proposal addresses those circumstances.

Approach to the solution

We augment the publish-subscribe pattern by:

  1. have all events, as well as the content of the resource that whose changes are supposed to be tracked, be time-stamped; alternatively a monotonically increasing sequence number could be used;

  2. having the publisher store the history of events emitted so far; for efficiency reasons, this may be shortened to some time window reaching to the present, as appropriate for the application; for example, all events in the last month;

  3. having the publisher provide a query interface to the subscriber to that history, with a “since” time parameter, so the subscriber can obtain the sequence of events emitted since a certain time;

  4. in addition to the callback address, providing to the publisher when subscribing

    • a time stamp, and
    • a subscription id;
  5. considering the actual sending if an event from the publisher to the subscriber to be a performance optimization, which when failing, is only an inconvenience rather than a cause of lost data.

Details

About the race condition

  1. The future subscriber accesses resource R and finds time stamp T0. For example, a human reads a web page whose publication date is April 23, 2021, 23:00:00 UTC.

  2. After some time passes, the subscriber decides to subscribe. It does this with the well-known subscription pattern, but in addition to providing a callback address, it also provides time stamp T0 and a unique (can be random) subscription id. For example, a human’s hypothetical news syndication app may provide an event update endpoint to the news website, plus time T0.

  3. The publisher sets up the subscription, and immediately checks whether any events should have been sent between T0 and the present. If so, it emits those events to the subscriber, in sequence, before continuing with regular operations. As a result, there is no more race condition between subscription and event.

  4. When sending an event, the publisher also sends the subscription id.

About temporary unavailability of the subscriber

  1. After a subscription is active, the subscriber disappears and new events cannot be delivered. The publisher may continue to attempt to deliver events for as long as it likes, or stop immediately.

  2. When the subscriber re-appears, it finds the time of the last event it had received from the publisher, say time T5. It queries the event history published by the publisher with parameter T5 to find out what events it missed. It can then re-subscribe with a later starting time stamp (say T10), or with T5, causing the publisher to send all events from T5. When it re-subscribes, it uses a different subscription id.

  3. After the subscriber has re-appeared, it ignores/rejects all incoming events with the old subscription id.

Observations

  • Publishers do not need to remember subscriber-specific state. (Thanks, Kafka, for showing us!) That makes it easy to implement the publisher side.

  • From the perspective of the publisher, delivery of events to clients that can receive callbacks, and those that need to poll, both works. (It sort of emulates RSS except that a starting time parameter is provided by the client, instead of a uniform window decided on by the publisher as in RSS)

  • Clients only need to keep a time stamp as state, something they probably have already anyway.

  • Clients can implement a polling or push strategy, or dynamically change between those, without the risk of losing data.

Feedback?

Would love your thoughts!