Dynamic quarantine: a proposal for combatting COVID-19 with pinpointed action based on real-time information

This post provides more details on the “Dynamic Quarantine” exit path from the COVID-19 pandemic that I listed in a previous post.

The problem

We need to reduce transmission of the virus to a level where the number of infected people at any time shrinks, rather than grows.

Absent vaccines or other medications, this requires reduction of in-person contact between people (“social distancing”).

However, this makes normal functioning of the economy largely impossible. For example, the state of California just ordered all “non-essential businesses” to be closed. While this may work in the short term, the longer the lock-down continues, the more things “break”: from mass unemployment and resulting poverty/defaults/bankruptcies, to the availability of replacement parts and eventually essentials such as food.

Such “social distancing” may need to continue until a vaccine is available, which may take many months (12-18 months is a common estimate). It is unclear how to keep the economy functioning enough for such an extended period of time.

We need better ideas.

The basic idea

Instead of a blanket shutdown of all “non-essential” businesses that confines “everybody” to their residence, we could shut down only those businesses in which infection is likely, and isolate only those people whose likelihood of infecting somebody is higher than a certain threshold. In this approach, those likelihoods are determined dynamically, from data collected mostly through mobile phones and an algorithm that computes a corresponding score for each person from that data.

The likelihood of a subject infecting somebody is determined as a function of what is known about the health of the subject so far, plus a history of the subject’s interactions with other people and those people’s likelihood of infecting somebody.

By tracking this information in real time, we can avoid the blanket closure of businesses and the blanket shelter-in-place of the population, and replace them with a sharp, pinpointed focus on isolating those who are most likely contributing to the spread of the disease. The remainder of the economy and population can continue to function.

Certain parameters in the algorithm can be tuned to provide different tradeoffs between reducing spread and inhibiting (or not) the economy.

Infectiousness score

The infectiousness score in this approach is an estimate for the likelihood that one person infects another when exposed for a certain time period (e.g. 5 min).

For our purposes here, the infectiousness score is a number between 0 and 1, where 0 means: not infectious (e.g. because a highly reliable test has just cleared the subject) and 1 means: known to be maximally infectious (e.g. because viral loads have been found to be high, and the subject behaves promiscuously).

Details

A few definitions first:

  • P: a person (aka subject)
  • S(P,t): the infectiousness score of person P at time t. Ranges between 0 (not infectious) and 1 (maximally infectious).

The core algorithm is as follows. It deals with direct infection between two people only, but an extension is discussed below.

  • At each time unit (e.g. every hour), S(P,t) is calculated as a function of:
    • S(P,t-1): the infectiousness score of the person at the time prior;
    • S(Pi,τ): the infectiousness score of all people Pi that the subject interacted with in the time period τ = (t-tw) ... (t-1) (where tw is a parameter that determines the length of the time window that’s being considered; selection of this parameter depends on characteristics of the disease, such as incubation times, as well as the characteristics of enacted community interventions such as availability, frequency and accuracy of testing);
    • a rating of the subject’s current health derived from the subject’s self-assessment;
    • a rating of the subject’s current health based on information from the future (see below).
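The proposal deliberately leaves open the function that combines these inputs. As one possible instantiation, here is a minimal Python sketch that assumes, purely for illustration, that scores behave like independent probabilities and combines them with a noisy-OR rule, reading S as the per-5-minute transmission likelihood from the definition above. The names `Contact`, `transmission_prob` and `update_score`, the distance/space `weight` and the per-step `decay` of the prior score are hypothetical choices, not part of the proposal.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Contact:
    score: float         # S(Pi, tau): the other person's score at the time of the encounter
    minutes: float       # duration of the encounter
    weight: float = 1.0  # hypothetical factor for distance and space (1.0 = close, enclosed)


def transmission_prob(contact: Contact, base_minutes: float = 5.0) -> float:
    """Chance this contact infected the subject, treating S as the per-5-minute
    transmission likelihood and each 5-minute slice as independent."""
    per_slice = min(1.0, max(0.0, contact.score * contact.weight))
    slices = contact.minutes / base_minutes
    return 1.0 - (1.0 - per_slice) ** slices


def update_score(prev: float, contacts: Sequence[Contact], symptom_rating: float,
                 test_rating: Optional[float] = None, decay: float = 0.99) -> float:
    """Compute S(P, t) from S(P, t-1), recent contacts and health ratings."""
    if test_rating is not None:
        # A test result (possibly arriving later, see "Rewriting history") wins.
        return test_rating
    # Noisy-OR: the subject stays "clear" only if no source made them infectious.
    clear = 1.0 - prev * decay
    for c in contacts:
        clear *= 1.0 - transmission_prob(c)
    clear *= 1.0 - symptom_rating
    return 1.0 - clear
```

Parameters such as `decay`, `base_minutes` and the weights are the kind of knobs that could be tuned to trade off reducing spread against inhibiting the economy.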

Rewriting history:

  • Test results come in with a delay (e.g. one day between tTest and current time t). Once available, the estimate for the infectiousness of the subject between tTest and t will be “overwritten” with an updated, more accurate estimate for that already-passed time period that takes the results of the test into account.
  • Similarly, subjects may be infectious prior to experiencing any symptoms. Once symptoms are apparent, all prior estimates of infectiousness of the subject will be recalculated over some time window whose length is determined by some assumptions about the disease (incubation time, time of infectiousness prior to symptoms etc).
  • When subject P's history is rewritten, the histories (and current scores) of all subjects that previously took subject P's history into account for their own scores need to be recalculated and rewritten, now using P's rewritten history. This may happen recursively. History may be overwritten repeatedly for a given subject, which again triggers rewrites for other subjects. (More efficient algorithms producing the same result can be found.)
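A naive version of this recursive rewrite can be sketched as a worklist: overwrite one subject's score, recompute that subject's history forward, then enqueue everyone who met the subject at or after the first changed time unit. Because a score at time t depends only on scores at t-1, the process terminates. This is a hypothetical sketch: the data layout (`scores`, `encounters`, `overrides`), the placeholder `step` rule and the function names are assumptions, and the worklist is the simple approach, not one of the more efficient algorithms alluded to above.

```python
from collections import deque
from typing import Dict, List, Tuple

# Hypothetical data model:
#   scores[p][t]  -> S(p, t) for time units t = 0 .. T-1
#   encounters[p] -> list of (t, other): p met `other` during time unit t
#                    (stored symmetrically for both people)
#   overrides     -> {(p, t): score} fixed by later information such as a test
Scores = Dict[str, List[float]]
Encounters = Dict[str, List[Tuple[int, str]]]
Overrides = Dict[Tuple[str, int], float]


def step(prev: float, contact_scores: List[float]) -> float:
    """Placeholder update rule (noisy-OR); any S(P, t) function could be used."""
    clear = 1.0 - prev
    for s in contact_scores:
        clear *= 1.0 - s
    return 1.0 - clear


def propagate(start: Tuple[str, int], scores: Scores, encounters: Encounters,
              overrides: Overrides, eps: float = 1e-12) -> None:
    """Recompute histories from `start` on, rewriting anyone affected, recursively."""
    queue = deque([start])
    while queue:
        p, t0 = queue.popleft()
        changed_from = None
        for t in range(t0, len(scores[p])):
            if (p, t) in overrides:
                new = overrides[(p, t)]
            else:
                prev = scores[p][t - 1] if t > 0 else 0.0
                # encounters during time unit t-1 feed into the score at t
                contacts = [scores[o][t - 1] for (te, o) in encounters.get(p, []) if te == t - 1]
                new = step(prev, contacts)
            if abs(new - scores[p][t]) > eps:
                scores[p][t] = new
                if changed_from is None:
                    changed_from = t
        if changed_from is not None:
            # everyone who met p at or after the first changed time unit
            # used an outdated score of p and must be rewritten too
            for (te, o) in encounters.get(p, []):
                if te >= changed_from:
                    queue.append((o, te + 1))


def apply_test_result(p: str, t: int, value: float, scores: Scores,
                      encounters: Encounters, overrides: Overrides) -> None:
    """A delayed test result overwrites S(p, t) and triggers the rewrite."""
    overrides[(p, t)] = value
    propagate((p, t), scores, encounters, overrides)
```

For example, if a delayed test shows that A was already infectious at time 0, then B (who met A during time unit 1) is rewritten from time 2 on, and C (who met B during time unit 2) from time 3 on.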

Additional potential inputs to the algorithm:

  • A rating of a subject’s interventions that may modify their infectiousness, such as:
    • wearing a mask;
    • intentionally exhaling at others;
    • etc.

Extension to other forms of transmission

So far, we have assumed that transmission can only occur between two people in the same location. However, there are other forms of transmission, such as:

  • transmission via a contaminated surface within a certain time interval that the virus remains active on that surface;
  • transmission via air droplets in an enclosed space within a certain time interval.

To account for these forms of transmission, the algorithm is extended to also include estimates of the infectiousness of objects in certain locations. Similar to people, these objects have an infectiousness score that is a function of which people (and their scores) have interacted with it in times prior, its previous infectiousness score and the passage of time.

The score of objects in the vicinity is considered as part of the algorithm to update S(P,t) in a corresponding manner to that of people.
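Object scores can be handled like people's scores plus the passage of time. The sketch below assumes exponential decay of an object's score with a fixed half-life and a noisy-OR update whenever someone interacts with the object; the 6-hour half-life and the `shedding` fraction are placeholder values, not measured properties of the virus, and `Surface`, `touch` and `score_at` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Surface:
    score: float = 0.0        # infectiousness score of the object
    last_update: float = 0.0  # time of last update, in hours


def decayed(score: float, hours: float, half_life_hours: float) -> float:
    """The virus on the object loses activity over time (assumed exponential decay)."""
    return score * 0.5 ** (hours / half_life_hours)


def score_at(surface: Surface, t: float, half_life_hours: float = 6.0) -> float:
    """Object score at time t, as a person interacting with it at t would see it."""
    return decayed(surface.score, t - surface.last_update, half_life_hours)


def touch(surface: Surface, person_score: float, t: float,
          half_life_hours: float = 6.0, shedding: float = 0.5) -> float:
    """Update the object score after a person with `person_score` interacts with it."""
    current = score_at(surface, t, half_life_hours)
    deposit = shedding * person_score  # hypothetical fraction that transfers to the object
    # noisy-OR: the object is clean only if it was clean and this person did not contaminate it
    surface.score = 1.0 - (1.0 - current) * (1.0 - deposit)
    surface.last_update = t
    return surface.score
```

A person's exposure to the object could then enter the S(P,t) update like a contact whose score is `score_at(surface, t)`.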

User experience

  • Users run an app on their mobile phones.

  • From time to time, the app asks the user how they feel. Specifically, it asks about symptoms related to COVID-19, such as fever, fatigue, cough etc.

  • The app’s main screen shows an easy-to-understand visual representation of the likely infectiousness score, such as a color code (e.g. green: unlikely to infect).

  • When the app reports a score above a certain threshold, the subject goes into shelter-in-place or quarantine. (Legal questions about whether this is voluntary or legally required are out of scope for this discussion; certainly regulations such as “must be sheltered-in-place unless score is green” would be possible.)

  • Before two (or more) people meet in person, they can agree on the maximum score a participant may have in order to attend. (Such a maximum score may also be legally mandated.) The participants check each other’s scores before the meeting.

  • Before a business admits a customer (or employee) onto the premises, it requires the customer or employee to share their score, and denies access if the score is above a certain threshold. The business may also deny access to visitors who do not have, or are unwilling to display, a score.

  • When the user gets tested, they enable the testing provider to add the test results to their record so it can be used to calculate the score going forward.

  • Depending on the implementation choices made, the mobile phone may need to be connected to the internet, to a local WiFi network and/or have Bluetooth on as sender or receiver or both.

Assumptions / challenges

  • Test results can be brought into the system in a way that defeats tampering: we cannot allow a subject to fake negative test results, for example, or to keep positive test results from being considered.

  • Individuals may be tempted to fake their scores in order to enter a certain venue, for example by displaying a static screenshot on their phone instead of their live score. Technical means (e.g. timestamping the display, or simultaneously broadcasting the score via wireless networking) can be employed to make this more difficult. This approach would also use technical means (e.g. public keys, app stores) to prevent “rogue apps” with false scores from participating.

  • In a naive implementation, the entire record of each subject (e.g. the entire world population) would be centrally collected. This would create a privacy nightmare and enable substantial future harm from dangers that are not biological in nature. So we assume that the implementation would need to be performed in a fashion that does not have a central point of data collection.

  • Location accuracy for this app is paramount. Absolute coordinates are less important, but the relative positions of two subjects need to be determined as accurately as possible, as a distance of 2ft vs 8ft has a substantially different impact on the likelihood of transmission. This could be addressed with technical means (e.g. Bluetooth, NFC), user input (e.g. verify / enter into the app the people currently in close proximity) or a combination.

  • The space in which an encounter occurs is highly relevant. For example, a 10 min contact at 6ft inside a small, enclosed space without ventilation has dramatically different transmission characteristics than contact of the same duration and distance in open nature with a slight wind. This also could be addressed with technical means (e.g. mapping information), user input (e.g. entering into the app whether the surroundings are an enclosed space, ventilated, open window, city street, open nature etc.) or a combination.
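Binding the displayed score to a timestamp makes a static screenshot useless after a short window. The sketch below is a minimal illustration under stated assumptions: it uses an HMAC with a shared key from Python's standard library as a stand-in for the public-key signatures mentioned above (a real deployment would let a venue verify with a public key, without holding the app's secret), and the 30-second freshness window, the clock-skew tolerance and the names `issue_token` and `admit` are hypothetical. It also applies the venue's threshold check described under "User experience".

```python
import hashlib
import hmac
import json
import time
from typing import Optional, Tuple

MAX_AGE_SECONDS = 30.0  # hypothetical freshness window for a displayed score
MAX_CLOCK_SKEW = 5.0    # hypothetical tolerance for clocks running slightly ahead


def issue_token(score: float, key: bytes, now: Optional[float] = None) -> Tuple[bytes, str]:
    """Phone side: bind the current score to a timestamp and authenticate both."""
    ts = time.time() if now is None else now
    payload = json.dumps({"score": score, "ts": ts}, sort_keys=True).encode()
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return payload, tag


def admit(payload: bytes, tag: str, key: bytes, threshold: float,
          now: Optional[float] = None) -> bool:
    """Venue side: reject forged, stale (e.g. screenshot) or above-threshold scores."""
    now = time.time() if now is None else now
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, tag):
        return False  # altered payload, or token from an unknown ("rogue") app
    data = json.loads(payload)
    age = now - data["ts"]
    if age > MAX_AGE_SECONDS or age < -MAX_CLOCK_SKEW:
        return False  # too old (or too far in the future) to be a live score
    return data["score"] <= threshold
```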

Approach to Privacy

It appears possible to keep most information needed for the functioning of the system on individual users’ mobile phones without requiring a centralized data repository:

  • The algorithm can run locally on local data.
  • Detection of other people in the neighborhood can be performed via local wireless networking (e.g. WiFi, zeroconf, Bluetooth).
  • The communication between mobile phones of people in an encounter to exchange scores can be performed using secure end-to-end encryption between the phones using any networking technology including through a centralized backend. This would not compromise privacy significantly.
  • To trigger history rewrites in other phones, those connections to other phones can be remembered and re-activated (including identity / encryption keys). This may use some existing centralized communication network (e.g. instant messenger) or a decentralized alternative with a distributed hash table for lookup, for example.
  • None of the functionality or communications requires more than pseudonymous identity. No centralized account or identity verification is required, with the potential exception of entering verified test results. However, in this case, the identity correlation remains local on the user’s device and is never shared beyond it.

Public health reporting and management

  • The app can report scores to the public health authorities, who can then track both actual data and best-guess estimates of the spread of the disease in real time.
  • For privacy reasons, scores do not need to be associated with other identifying attributes, although it may be advantageous to share demographic info such as age, and approximate (maybe rasterized) geographic location of the subject.
  • Key parameters of the algorithm – e.g. thresholds for “acceptable” scores for certain activities – could be centrally updated by the public health authorities, in order to “shape” the progression of the disease in real time.

Algorithmic improvements

  • Distributing data and computation for privacy reasons, instead of centrally collecting it all, needs to be weighed against the need to continually debug and improve the algorithm.
  • To be able to understand the functioning of the algorithm in the field, and to make improvements, it appears sufficient to report the time histories of scores centrally, including rewritten histories. It does not appear necessary to identify the specific other people whose scores were used as input to the algorithm, nor the locations where encounters took place.
  • Should more detailed information be required, collecting such more detailed information from a relatively small sample of volunteers should be sufficient.

How would one apply self-sovereign identity to ration face masks?

I’m reading that Taiwan has been limiting, to two a week, the number of face masks people are allowed to buy. The thesis is that it is better that most people have a few, instead of few people hoard a lot, to protect the population as a whole from COVID-19.

They implement it the straightforward way: would-be purchasers must show a national identity card, and there is a centralized database that tracks the purchases against the national identifier. You only get to buy one if you haven’t exhausted your allotment for this week. Other places, like California, do the same thing for certain medications (e.g. Sudafed), and certainly would apply it to face masks, too, if they felt like they needed to ration them. (We’ll see about that.)

Obviously, from a privacy perspective, this system is terrible. The pharmacy has no business knowing my full name, my address, my date of birth, and all the other things that tend to come with centralized ID cards: information I am forced to hand over before I can buy my cold meds in California, or a less-than-one-dollar mask in Taiwan. On the other hand, whatever system is implemented must be reasonably hard to circumvent, otherwise it is pointless.

So, friends in the self-sovereign identity community: how would you guys solve this problem in a privacy-preserving way that nevertheless has the same effect?

Hint: This is a great PR opportunity (in spite of the calamity), and perhaps a tech deployment opportunity, because we can be sure that others will soon follow Taiwan's lead in rationing masks. (I notice that the prices for face masks – in particular the shipping charges! – on Amazon seem to increase by the day.) And why not help people get better access to protective equipment while also giving them privacy? There are worse use cases for identity technology than that!

MyData Stakeholder Segmentation (Draft)

For the purposes of our ongoing discussions in the MyData Silicon Valley Hub about a potential North America conference later this year, here is my attempt at segmenting the stakeholders.

| | Prime mover | Follower | Neutral | Adversary |
|---|---|---|---|---|
| Product & Services | | | | |
| Channel & Distribution | | | | |
| Catalysts | | | | |
| Customers | | | | |

where:

  • Prime movers: innovators, inventors, people and organizations that proactively push the vision forward and do things the first time they have ever been done, not waiting for others.
  • Followers: people and organizations who are willing to do things consistent with the vision but only after others have pioneered the way first.
  • Neutral: people and organizations who don’t care about the vision.
  • Adversaries: people and organizations whose vision is fundamentally different and whose agenda is opposed to ours.

and:

  • Product & Services: creators of apps, platforms, integration products, support and the like.
  • Channel & Distribution: systems integrators, value-added resellers, app stores, retail etc.
  • Catalysts: press, analysts, event organizers, activists, MyData Global itself, governments / regulators, investors.
  • Customers: buyers and users of products and services (consumers, enterprises, governments).

What do you think?

German data for German firms, according to ... Microsoft?

In the US, we think of our struggle over data ownership as a conflict between large, unaccountable companies (like Facebook) and us as individuals. But it is more complex than that as soon as you look beyond the US.

Take the German federal government, for example. How does your sovereignty as a nation look to you, if data is the new oil in the 21st century, but most of that data ends up on clouds operated by American (or Chinese) companies? Critical infrastructure entirely dependent on the goodwill of one (or two) other countries? Who can see anything you do there? Or turn it off in case of a conflict? Sounds like a disaster waiting to happen.

So what do you do? You might team up with fellow nations, like other EU members, and pass regulations such as the GDPR, which erodes the exclusivity US companies have over data. Or spearhead a project called Gaia-X, which is intended to be a European alternative to American (and Chinese) “clouds” with the stated goal of regaining data sovereignty.

And into this fight steps Brad Smith, now Microsoft president, who is being quoted (in German) in the press saying:

German data should serve German companies – not just a handful of companies on the US west coast or the Chinese east coast.

(I will ignore here that this comes across as quite racist, and in case of Germany, one should not make that mistake, even if it comes from an American.)

Clearly, Microsoft has identified an opportunity to make a bundle here by selling to countries like Germany that are attempting to set up their own clouds, and we know this because the quote comes from the very top of the company, not some regional sales manager.

But the striking part of the quote, “should serve German companies” (not “people”, or “Germany, the country”), tells us clearly what the German government, to whom it is directed, has in mind here: using data to bolster German companies in international competition.

While we all benefit from new rules such as the GDPR, and their enforcement in Europe as in the recent case of the Irish against Facebook, it’s clear that we, as individuals, are merely accidental beneficiaries.

It’s really about big company competition, supported by national governments. Let’s not forget that. If they were to accommodate each other somehow, I bet the push for privacy and GDPR-like things would evaporate in a heartbeat.

When privacy and agency are in conflict

In the privacy-related communities I hang out in, we often use the phrase “privacy and agency” as a label for the totality of what we want.

But what if those two cannot be had at the same time? What if more privacy, in practice, means I need third parties to take a larger role, thereby reducing my agency? Or what if I have more agency and can do more things in more ways that I solely determine, but only at the cost of less privacy?

Unbelievable?

If so, then look no further than the recent public discussion (dispute?) between the founders of the Signal and Matrix messaging systems, Moxie Marlinspike and Matthew Hodgson. The essence of their arguments, and I paraphrase:

  • Moxie: you can’t build a private messaging system that’s competitive as a consumer app unless a single party, such as the Signal project, takes responsibility and ownership of the whole thing. Lots of privacy, but for the user it’s take it or leave it. Link to full post.
  • Matthew: decentralization, on all levels including code forks and potentially insecure (non-private!) deployments, is an essential requirement to avoid single points of failure: critical people or components turning bad. Link to full post.

This is a high-quality conversation and we can all be very happy that it is conducted openly, and in a spirit of finding the truth. Go read both pieces, and ponder the arguments, it’s very much worth your while.

Who is right?

IMHO, both are. I don’t know whether all the tradeoffs described are as unavoidable and unmitigatable as they are made out to be in those posts; maybe more innovation in technology, and in particular in governance, could alleviate some of them.

However, the basic idea of a tradeoff between them is valid. The Signal and Matrix projects make different choices on that spectrum, both for valid reasons.

If they need to do that, chances are, everybody else who cares about providing products and services with privacy and agency for the user, faces similar tradeoffs. It would serve us well to acknowledge that in every discussion on those points, and respect others who have the same goals as we do, but make different tradeoffs.

The most important point, however, is this: it shows how important it is to have both projects, or a plurality of projects addressing similar requirements but making different tradeoffs. Because that gives us, the users, you and me, the agency to make our own choices based on our own preferences. Including the choice to forego some agency in some aspects in favor of more privacy.

Which is the most important aspect of agency of them all.

Comments and questions on the JLINC protocol for Information Sharing Agreements

Updated 2020-01-24 with answers from Victor, slightly edited for formatting purposes. Thanks, =vg!

My friends Victor and Jim at JLINC have published a set of technical documents that show how to implement “Information Sharing Agreements” – contractual agreements between two parties, where one party receives information, such as personal information, from the other party and commits to only use the received data in accordance with the agreement.

This is basically a respectful, empowering form of today’s widespread, one-sided “I must consent to anything” click-through agreement every website forces us to sign. It’s respectful because:

  • it is negotiated, rather than unilaterally imposed as is the default on the internet today;
  • the existence of the agreement, and which parties it binds, can be cryptographically proven by both parties;
  • there’s a full audit log on both sides, and so it would be difficult to “wiggle out of” the agreement;
  • it can’t be unilaterally changed after the fact, only terminated.

So as I read through the documents, I had some questions, and as usual, I blog them :-) in random sequence. I will add answers to this post as I find out about them. Answers in-lined.

  • Q: Why is a separate DID method required? I don’t quite understand what JLINC DIDs can do that other forms of DIDs can’t.

    • A: The W3C DID working group has specified a “common data model, a URL format, and a set of operations for DIDs, DID documents, and DID methods.” This by itself does nothing - individual DID methods conforming to this model then need to be specified and implemented. See here. There are various DID methods (including `did:jlinc`) listed in the DID method registry. We believe our method is better for -our- needs and use cases – and besides, we understand that one ;-)
  • Q: To create a JLINC DID, I need to post something to which URL? The spec says /register but doesn’t identify a hostname. Can it be any? Or is that intended to be a centralized service, perhaps run by JLINC, the company?

    • A: Anyone could read our public spec and create their own registry, but we have put up a testnet and made it available via an open source Node module (https://github.com/jlinclabs/jlinc-did-client). The example config file in the above repo contains the correct testnet URL. When we feel the W3C DID model has stabilized sufficiently we will make available a production-version public registry.
  • Q: How do the identifiers that the two parties use for the JLINC protocol relate to identifiers they may use for other types of interaction, e.g. some other protocols within the decentralized / self-sovereign identity universe? Is a given user supposed to have a variety of them for different purposes?

    • A: This is a question that is being addressed by the W3C DID-resolver community group, in which we are participating. We will make available a JLINC DID resolver when that spec has been published. Every DID contains a (presumably registered) DID method as its second colon-separated value (e.g. “did:jlinc:SOME-UNIQUE-STRING”) so you will be able to resolve any DID whose method your resolver is configured for.
  • Q: Why is a ledger and its associated ledger provider required? (Actually, maybe it is optional. But the spec says “may submit it to a Ledger of their choice to establish non-repudiation”, so that implies the ledger is required for that purpose.)

    • A: Supporting audit ledgers is part of our plan but has not yet been implemented.
  • Q: There is already a previousId in each exchange. Wouldn’t that be sufficient for non-repudiation if the two parties keep their own records?

    • A: Theoretically yes, but a third-party audit record contemporaneous with each data-transfer event would guard against any nefarious record manipulation that might become possible if some cryptographic weakness were discovered.
  • Q: There is also the role of an “audit provider”. How is it different from a “ledger provider”? And if it is, why do we need both?

    • A: Those are two names for the same thing.
  • Q: Does the ledger make the Information Sharing Agreements themselves essentially public, or at least leak them to an uninvolved third party? Can I use JLINC to privately agree on an Information Sharing Agreement without telling others about it? If so, what functionality do I lose?

    • A: For most purposes we envision using Standard Information Sharing Agreements (SISAs) that are published publicly, and we are looking for a suitable standards body to work out a format for those and perhaps publish some useful ones, modeled along the lines of Creative Commons. But JLINC will work fine with any agreement, most likely identified with a content-addressed URL, but conceivably even a private legal agreement between two parties, identified only by its hash value.
  • Q: When an AgreementURI is used to merely point to the legal text that defines the agreement, rather than incorporating it into the exchanged JSON, would it make sense to also at least include a hash of the agreement text? That way, a party cannot so easily wiggle out of the agreement by causing the hoster of the agreement text to make modifications, or claim to have agreed to a different version of the agreement.

    • A: Yes, ISAs are always identified by their hashes, usually via a content-addressed URL like IPFS or some similar scheme that includes a hash of the content as part of the address.
  • Q: There’s a field descendedFrom in various examples, which isn’t documented and is always the text string null. What might that be for?

    • A: The JLINC protocol has been rapidly evolving as we build stuff and discover ambiguities and possible efficiencies in it. That field is obsolete.
  • Q: How would a permissionEvent work in practice? Wouldn’t that require the underlying legal text to change? Is there a description somewhere?

    • A: The ISA should specify that the data-custodian agrees to respect the rights-holder’s choices as they are transmitted via permission events. Then each permission change event is transmitted under the existing ISA, same as with data events.
  • Q: Could one use JLINC to govern data that’s much longer, or much more complex, than the typical small set of name-value pairs used for user registration data on consumer websites? Can I use it, say, for the first chapter of my Great American Novel I am sending to a publisher, permitting them to only read it themselves but not publish it yet, or to send my MRIs to a new doctor?

    • A: Yes, absolutely.
  • Q: In a successful relationship between a Me and a B, to use the Me2B Alliance‘s terminology, it appears that the “data kimono” is gradually opened by the Me to the B. For example, the Me may first visit a website without an account, then register (and provide their name and e-mail address) and a month later, buy something (which requires a shipping address and a credit card number, but only until the purchase is delivered and the data can be deleted again). In the JLINC world, does this require a different Information Sharing Agreement on each step? (particularly for the deletion after shipment?)

    • A: No – see the permissionEvent question above.
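To make the answers about agreement hashes and `previousId` concrete, here is a hypothetical sketch of a hash-linked exchange log in which every record commits to the SHA-256 hash of the agreement text and to the identifier of the previous record. It is not the JLINC wire format: the field name `agreementHash`, deriving `id` from a hash of the record, and the function names are assumptions, and a real protocol would add signatures by both parties. It does illustrate why a changed agreement text, an altered record, or a reordered log is detectable, and how a permission change can travel under the same agreement as a data event.

```python
import hashlib
import json
from typing import List, Optional


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def canonical(obj: dict) -> bytes:
    # Deterministic serialization so both parties compute identical hashes.
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def make_event(agreement_text: str, payload: dict, previous_id: Optional[str]) -> dict:
    """Create one exchange record bound to the agreement text and the prior record."""
    body = {
        "agreementHash": sha256_hex(agreement_text.encode("utf-8")),  # hypothetical field name
        "previousId": previous_id,
        "payload": payload,
    }
    body["id"] = sha256_hex(canonical(body))  # hash computed before "id" is added
    return body


def verify_chain(events: List[dict], agreement_text: str) -> bool:
    """Check every record against the agreement hash and its predecessor."""
    expected = sha256_hex(agreement_text.encode("utf-8"))
    prev_id = None
    for ev in events:
        body = {k: v for k, v in ev.items() if k != "id"}
        if sha256_hex(canonical(body)) != ev["id"]:
            return False  # record altered after the fact
        if ev["agreementHash"] != expected or ev["previousId"] != prev_id:
            return False  # different agreement version, or a gap/reordering in the log
        prev_id = ev["id"]
    return True
```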