Tech

By Johannes Ernst

https://reb00ted.org/tech/

  • 2022-10-24

    The Push-Pull Publish-Subscribe Pattern (PuPuPubSub, or shorter: P3Sub)

    (Updated Dec 14, 2022 with clarifications and a subscriber implementation note.)

    Preface

    The British government clearly has more tolerance for humor when naming important things than the W3C does. Continuing in the original fashion, thus this name.

    The Problem

    The publish-subscribe pattern is well known, but in some circumstances, it suffers from two important problems:

    1. When a subscriber is temporarily not present, or cannot be reached, sent events are often lost. This can happen, for example, if the subscriber computer reboots, falls off the network, goes to sleep, has DNS problems and the like. Once the subscriber recovers, it is generally not clear what needs to happen for the subscriber to catch up to the events it may have missed. It is not even clear whether it has missed any. Similarly, it is unclear for how long the publisher needs to retry to send a message; it may be that the subscriber has permanently gone away.

    2. Subscriptions are often set up as part of the following pattern:

      • A resource on the Web is accessed. For example, a user reads an article on a website, or a software agent fetches a document.
      • Based on the content of the obtained resource, a decision is made to subscribe to updates to that resource. For example, the user may decide that they are interested in updates to the article on the website they just read.
      • There is a time lag between the time the resource has been accessed, and when the subscription becomes active, creating a race condition during which update events may be missed.

    While these two problems are not always significant, there are important circumstances in which they are, and this proposal addresses those circumstances.

    Approach to the solution

    We augment the publish-subscribe pattern in the following way:

    1. All events, as well as the content of the resource whose changes are supposed to be tracked are time-stamped. Also, each event identifies the event that directly precedes it (that way, the subscriber can detect if it missed something). Alternatively, a monotonically increasing sequence number could be used.

    2. The publisher stores the history of events emitted so far. For efficiency reasons, this may be shortened to some time window reaching to the present, as appropriate for the application; for example, all events in the last month. (Similar to how RSS/Atom feeds are commonly implemented.)

    3. The publisher provides a query interface to the subscriber to that history, with a “since” time parameter, so the subscriber can obtain the sequence of events emitted since a certain time. (Actually, since “right after” the provided time not including the provided time itself.)

    4. When subscribing, in addition to the callback address, the subscriber provides to the publisher:

      • a time stamp, and
      • a subscription id.

    Further, the actual sending of an event from the publisher to the subscriber is considered to be a performance optimization, rather than core to the functionality. This means that if the event cannot be successfully conveyed (see requirements above), it is only an inconvenience and inefficiency rather than a cause of lost data.

    Details

    About the race condition

    1. The future subscriber accesses resource R and finds time stamp T0. For example, a human reads a web page whose publication date is April 23, 2021, 23:00:00 UTC.

    2. After some time passes, the subscriber decides to subscribe. It does this with the well-known subscription pattern, but in addition to providing a callback address, it also provides time stamp T0 and a unique (can be random) subscription id. For example, a human’s hypothetical news syndication app may provide an event update endpoint to the news website, and time T0.

    3. The publisher sets up the subscription, and immediately checks whether any events should have been sent between (after) T0 and the present. (It can do that because it stores the update history.) If so, it emits those events to the subscriber, in sequence, before continuing with regular operations. As a result, there is no more race condition between subscription and event.

    4. When sending an event, the publisher also sends the subscription id.

    About temporary unavailability of the subscriber

    1. After a subscription is active, assume the subscriber disappears and new events cannot be delivered. The publisher may continue to attempt to deliver events for as long as it likes, or stop immediately.

    2. When the subscriber re-appears, it finds the time of the last event it had received from the publisher, say time T5. It queries the event history published by the publisher with parameter T5 to find out what events it missed. It processes those events and then re-subscribes with a later starting time stamp corresponding to the last event it received (say T10). When it re-subscribes, it uses a different subscription id and cancels the old subscription.

    3. After the subscriber has re-appeared, it ignores/rejects all incoming events with the old subscription id.

    Subscriber implementation notes

    • The subscriber receives events exclusively through a single queue for incoming events. This makes implementing an incoming-event handler very simple, as it can simply process events in order.

    • The event queue maintains the timestamp of the last event it successfully added. When a new event arrives, the queue accepts this event but only if the new event is the direct follower of the last event it successfully added. If it is not, the incoming event is discarded. (This covers both repeatedly received events and when some events were missed.)

    • The subscriber also maintains a timer with a countdown from the last time an event was successfully added to the incoming queue. (The time constant of the timer is application-specific, and may be adaptive.) When the timeout occurs, the subscriber queries the publisher, providing the last successful timestamp. If no updates are being found, nothing happens. If updates are being found, it is fair to consider the existing subscription to have failed. Then:

      • The subscriber itself inserts the obtained “missed” events into its own incoming event queue from where they are processed.
      • The subscriber cancels the existing subscription.
      • The subscriber creates a new subscription, with the timestamp of the most recent successfully-inserted event.

    Observations

    • Publishers do not need to remember subscriber-specific state. (Thanks, Kafka, for showing us!) That makes it easy to implement the publisher side.

    • From the perspective of the publisher, delivery of events to subscribers that can receive callbacks, and those that need to poll, both works. (It sort of emulates RSS except that a starting time parameter is provided by the client, instead of a uniform window decided on by the publisher as in RSS)

    • Subscribers only need to keep a time stamp as state, something they probably have already anyway.

    • Subscribers can implement a polling or push strategy, or dynamically change between those, without the risk of losing data.

    • Publishers are not required to push out events at all. If they don’t, this protocol basically falls back to polling. This is inefficient but much better than the alternative and can also be used in places where, for example, firewalls prevent event pushing.

    Feedback?

    Would love your thoughts!

  • 2022-08-30

    The 5 people empowerment promises of web3

    Over at Kaleido Insights, Jessica Groopman, Jaimy Szymanski, and Jeremiah Owyang (the former Forrester “Open Social” analyst) describe Web3 Use Cases: Five Capabilities Enabling People.

    I don’t think this post has gotten the attention it deserves. At the least, it’s a good starting framework to understand why so many people are attracted to the otherwise still quite underdefined web3 idea. Hint: it’s not just getting rich quick.

    I want to riff on this list a bit, by interpreting some of the categories just a tad differently, but mostly by comparing and contrasting to the state of the art (“web2”) in consumer technology.

    Empowerment promise State of the art ("web2") The promise ("web3")
    Governance How much say do you, the user, have in what the tech products do that you use? What about none! The developing companies do what they please, and very often the opposite of what their users want. Users are co-owners of the product, and have a vote through mechanisms such as DAOs.
    Identity You generally need at least an e-mail address hosted by some big platform to sign up for anything. Should the platform decide to close your account, even mistakenly, your identity effectively vanishes. Users are self-asserting their identity in a self-sovereign manner. We used to call this "user-centric identity", with protocols such as my LID or OpenID before they were eviscerated or co-opted by the big platforms. Glad to see the idea is making a come-back.
    Content ownership Practically, you own very little to none of the content you put on-line. While theoretically, you keep copyright of your social media posts, for example, today it is practically impossible to quit social media accounts without losing at least some of your content. Similarly, you are severely limited in your options for privacy, meaning where your data goes and does not go. You, and only you, decide where and how to use your content and all other data. It is not locked into somebody else's system.
    Ability to build Ever tried to add a feature to Facebook? It's almost a ridiculous proposition. Of course they won't let you. Other companies are no better. Everything is open, and composable, so everybody can build on each other's work.
    Exchange of value Today's mass consumer internet is largely financed through Surveillance Capitalism, in the form of targeted advertising, which has led to countless ills. Other models generally require subscriptions and credit cards and only work in special circumstances. Exchange of value as fungible and non-fungible tokens is a core feature and available to anybody and any app. An entirely new set of business models, in addition to established ones, have suddently become possible or even easy.

    As Jeremiah pointed out when we bumped into each other last night, public discussion of “web3” is almost completely focused on this last item: tokens, and the many ill-begotten schemes that they have enabled.

    But that is not web3’s lasting attraction. The other four promises – participation in governance, self-sovereign identity, content ownership and the freedom to build – are very appealing. In fact, it is hard to see how anybody (other than an incumbent with a turf to defend) could possible argue against any of them.

    If you don’t like the token part? Just don’t use it. 4 out of the 5 web3 empowerment promises for people, ain’t bad. And worth supporting.

  • 2022-08-15

    Levels of information architecture

    I’ve been reading up on what is apparently called information architecture: the “structural design of shared information environments”.

    A quite fascinating discipline, and sorely needed as the amount of information we need to interact with on a daily basis keeps growing.

    I kind of think of it as “the structure behind the design”. If design is the what you see when looking at something, information architecture are the beams and struts and foundations etc that keeps the whole thing standing and comprehensible.

    Based on what I’ve read so far, however, it can be a bit myopic in terms of focusing just on “what’s inside the app”. That’s most important, obviously, but insufficient in the age of IoT – where some of the “app” is actually controllable and observable through physical items – and the expected coming wave of AR applications. Even here and now many flows start with QR codes printed on walls or scanned from other people’s phones, and we miss something in the “design of shared information environments” if we don’t make those in-scope.

    So I propose this outermost framework to help us think about how to interact with shared information environments:

    Universe-level:
    Focuses on where on the planet where a user could conceivably be, and how that changes how they interact with the shared information environment. For example, functionality may be different in different regions, use different languages or examples, or not be available at all.
    Environment-level:
    Focuses on the space in which the user is currently located (like sitting on their living room couch), or that they can easily reach, such as a bookshelf in the same room. Here we can have a discussion about, say, whether the user will pick up their Apple remote, run the virtual remote app on their iOS device, or walk over to the TV to turn up the volume.
    Device-level:
    Once the user has decided which device to use (e.g. their mobile phone, their PC, their AR goggles, a button on the wall etc), this level focuses on what they user does on the top level of that device. On a mobile phone or PC, that would be the operating-system level features such as which app to run (not the content of the app, that’s the next level down), or home screen widgets. Here we can discuss how the user interacts with the shared information space given that they also do other things on their device; how to get back and forth; integrations and so forth.
    App-level:
    The top-level structure inside an app: For example, an app might have 5 major tabs reflecting 5 different sets of features.
    Page-level:
    The structure of pages within an app. Do they have commonalities (such as all of them have a title at the top, or a toolbox to the right) and how are they structured.
    Mode-level:
    Some apps have “modes” that change how the user interacts with what it shown on a page. Most notably: drawing apps where the selected tool (like drawing a circle vs erasing) determines different interaction styles.

    I’m just writing this down for my own purposes, because I don’t want to forget it and refer to it when thinking of design problems. And perhaps it is useful for you, the reader, as well. If you think it can be improved, let me know!

  • 2022-08-07

    An autonomous reputation system

    Context: We never built an open reputation system for the internet. This was a mistake, and that’s one of the reasons why we have so much spam and fake news.

    But now, as governance takes an ever-more prominent role in technology, such as for the ever-growing list of decentralized projects e.g. DAOs, we need to figure out how to give more power to “better” actors within a given community or context, and disempower or keep out the detractors and direct opponents. All without putting a centralized authority in place.

    Proposal: Here is a quite simple, but as I think rather powerful proposal. We use an on-line discussion group as an example, but this is a generic protocol that should be applicable to many other applications that can use reputation scores of some kind.

    1. Let’s call the participants in the reputation system Actors. As this is a decentralized, non-hierarchical system without a central power, there is only one class of Actor. In the discussion group example, each person participating in the discussion group is an Actor.

      An Actor is a person, or an account, or a bot, or anything really that has some ability to act, and that can be uniquely identified with an identifier of some kind within the system. No connection to the “real” world is necessary, and it could be as simple as a public key. There is no need for proving that each Actor is a distinct person, or that a person controls only one Actor. In our example, all discussion group user names identify Actors.

    1. The reputation system manages two numbers for each Actor, called the Reputation Score S, and the Rating Tokens Balance R. It does this in a way that it is impossible for those numbers to be changed outside of this protocol.

      For example, these numbers could be managed by a smart contract on a blockchain which cannot be modified except through the outlined protocol.

  1. The Reputation Score S is the current reputation of some Actor A, with respect to some subject. In the example discussion group, S might express the quality of content that A is contributing to the group.

    If there is more than one reputation subject we care about, there will be an instance of the reputation system for each subject, even if it covers the same Actors. In the discussion group example, the reputation of contributing good primary content might be different from reputation for resolving heated disputes, for example, and would be tracked in a separate instance of the reputation system.

  1. The Reputation Score S of any Actor automatically decreases over time. This means that Actors have a lower reputation if they were rated highly in the past, than if they were rated highly recently.

    There’s a parameter in the system, let’s call it αS, which reflects S’s rate of decay, such as 1% per month.

  1. Actors rate each other, which means that they take actions, as a result of which the Reputation Score of another Actor changes. Actors cannot rate themselves.

    It is out of scope for this proposal to discuss what specifically might cause an Actor to decide to rate another, and how. This tends to be specific to the community. For example, in a discussion group, ratings might often happen if somebody reads newly posted content and reacts to it; but it could also happen if somebody does not post new content because the community values community members who exercise restraint.

  1. The Rating Tokens Balance R is the set of tokens an Actor A currently has at their disposal to rate other Actors. Each rating that A performs decreases their Rating Tokens Balance R, and increases the Reputation Score S of the rated Actor by the same amount.

  2. Every Actor’s Rating Tokens Balance R gets replenished on a regular basis, such as monthly. The regular increase in R is proportional to the Actor’s current Reputation Score S.

    In other words, Actors with high reputation have a high ability to rate other Actors. Actors with a low reputation, or zero reputation, have little or no ability to rate other Actors. This is a key security feature inhibiting the ability for bad actors to take over.

  1. The Rating Token Balance R is capped to some maximum value Rmax, which is a percentage of the current reputation of the Actor.

    This prevents passive accumulation of rating tokens that then could be unleashed all at once.

  1. The overall number of new Ratings Tokens that is injected into the system on a regular basis as replenishment is determined as a function of the desired average Reputation Score of Actors in the system. This enables Actors’ average Reputation Scores to be relatively constant over time, even as individual reputations increase and decrease, and Actors join and leave the system.

    For example, if the desired average Reputation Score is 100 in a system with 1000 Actors, if the monthly decay reduced the sum of all Reputation Scores by 1000, 10 new Actors joined over the month, and 1000 Rating Tokens were eliminated because of the cap, 3000 new Rating Tokens (or something like that, my math may be off – sorry) would be distributed, proportional to their then-current Reputation Scores, to all Actors.

  1. Optionally, the system may allow downvotes. In this case, the rater’s Rating Token Balance still decreases by the number of Rating Tokens spent, while the rated Actor’s Reputation also decreases. Downvotes may be more expensive than upvotes.

    There appears to be a dispute among reputation experts on whether downvotes are a good idea, or not. Some online services support them, some don’t, and I assume for good reasons that depend on the nature of the community and the subject. Here, we can model this simply by introducing another coefficient between 0 and 1, which reflects the decrease of reputation of the downvoted Actor given the number of Rating Tokens spent by the downvoting Actor. In case of 1, upvotes cost the same as downvotes; in case of 0, no amount of downvotes can actually reduce somebody’s score.

To bootstrap the system, an initial set of Actors who share the same core values about the to-be-created reputation each gets allocated a bootstrap Reputation Score. This gives them the ability to receive Rating Tokens with which they can rate each other and newly entering Actors.

Some observations:

Known issues:

I would love your feedback.

This proposal probably should have a name. Because it can run autonomously, I’m going to call it Autorep. And this is version 0.5. I’ll create new versions when needed.

  • 2022-07-27

    Is this the end of social networking?

    Scott Rosenberg, in a piece with the title “Sunset of the social network”, writes at Axios:

    Mark last week as the end of the social networking era, which began with the rise of Friendster in 2003, shaped two decades of internet growth, and now closes with Facebook’s rollout of a sweeping TikTok-like redesign.

    A sweeping statement. But I think he’s right:

    Facebook is fundamentally an advertising machine. Like other Meta products are. There aren’t really about “technologies that bring the world closer together”, as the Meta homepage has it. At least not primarily.

    This advertising machine has been amazingly successful, leading to a recent quarterly revenue of over $50 per user in North America (source). And Meta certainly has driven this hard, otherwise it would not have been in the news for overstepping the consent of its users year after year, scandal after scandal.

    But now a better advertising machine is in town: TikTok. This new advertising machine is powered not by friends and family, but by an addiction algorithm. This addiction algorithm figures out your points of least resistance, and pours down one advertisement after another down your throat. And as soon as you have swalled one more, you scroll a bit more, and by doing so, you are asking for more advertisements, because of the addiction. This addiction-based advertising machine is probably close to the theoretical maximum of how many advertisements one can pour down somebody’s throat. An amazing work of art, as an engineer I have to admire it. (Of course that admiration quickly changes into some other emotion of the disgusting sort, if you have any kind of morals.)

    So Facebook adjusts, and transitions into another addiction-based advertising machine. Which does not really surprise anybody I would think.

    And because it was never about “bring[ing] the world closer together”, they drop that mission as if they never cared. (That’s because they didn’t. At least MarkZ didn’t, and he is the sole, unaccountable overlord of the Meta empire. A two-class stock structure gives you that.)

    With the giant putting their attention elsewhere, where does this leave social networking? Because the needs and the wants to “bring[ing] the world closer together”, and to catch up with friends and family are still there.

    I think it leaves social networking, or what will replace it, in a much better place. What about this time around we build products whose primary focus is actually the stated mission? Share with friends and family and the world, to bring it together (not divide it)! Instead of something unrelated, like making lots of ad revenue! What a concept!

    Imagine what social networking could be!! The best days of social networking are still ahead. Now that the pretenders are leaving, we can actually start solving the problem. Social networking is dead. Long live what will emerge from the ashes. It might not be called social networking, but it will be, just better.

  • 2022-07-26

    A list of (supposed) web3 benefits

    I’ve been collecting a list of the supposed benefits of web3, to understand how the term is used these days. Might as well post what I found:

    • better, fairer internet
    • wrest back power from a small number of centralized institutions
    • participate on a level playing field
    • control what data a platform receives
    • all data (incl. identities) is self-sovereign and secure
    • high-quality information flows
    • creators benefit
    • reduced inefficiencies
    • fewer intermediaries
    • transparency
    • personalization
    • better marketing
    • capture value from virtual items
    • no censorship (content, finance etc)
    • democratized content creation
    • crypto-verified information correctness
    • privacy
    • decentralization
    • composability
    • collaboration
    • human-centered
    • permissionless

    Some of this is clearly aspirational, perhaps on the other side of likely. Also not exactly what I would say if asked. But nevertheless an interesting list.