This post provides more details on the “Dynamic Quarantine” exit path from the COVID-19
pandemic that I listed in a previous post.
We need to reduce transmission of the virus to a level where the number of infected
people at any time shrinks, rather than grows.
Absent vaccines or other medications, this requires reduction of in-person contact
between people (“social distancing”).
However, this makes normal functioning of the economy largely impossible. For example, the
state of California just ordered all “non-essential businesses” to be closed. While this
may work in the short term, the longer the lock-down continues, the more things “break”:
from mass unemployment and resulting poverty/defaults/bankruptcies, to the availability of
replacement parts and eventually essentials such as food.
Such “social distancing” may need to continue until a vaccine is available, which may take
many months (12-18 months is a common estimate). It is unclear how to keep the economy
functioning enough for such an extended period of time.
We need better ideas.
The basic idea
Instead of a blanket shutdown of all “non-essential” businesses, confining “everybody”
to their residence, we could shut down only those businesses in which infection is likely,
and confine only those people to isolation whose likelihood of infecting somebody is
higher than a certain threshold. In this approach, those likelihoods are dynamically
determined by means of data collection mostly through mobile phones, and an algorithm that
produces a corresponding score for each person from the collected data.
The likelihood of a subject infecting somebody is determined as a function of what is known about
the health of the subject so far, plus a history of the subject’s interactions with other people
and those people’s likelihood of infecting somebody.
By tracking this information in real time, the blanket closure of businesses and blanket
shelter-in-place of the population can be avoided, and instead be replaced with a sharp,
pinpointed focus on isolating those that are most likely contributing to the spread of the
disease. The remainder of the economy and population can continue to function.
Certain parameters in the algorithm can be tuned to provide different tradeoffs between
reducing spread and inhibiting (or not) the economy.
The infectiousness score in this approach is an estimate for the likelihood that one
person infects another when exposed for a certain time period (e.g. 5 min).
For our purposes here, the infectiousness score is a number between 0 and 1, where
0 means: not infectious (e.g. because a highly reliable test has just cleared the subject)
and 1 means: known to be maximally infectious (e.g. because viral loads have been found to
be high, and the subject behaves promiscuously).
A few definitions first:
P: a person (aka subject)
S(P,t): the infectiousness score of person P at time t. Ranges between 0 (not infectious) and 1 (maximally infectious).
The core algorithm is as follows. It deals with direct infection between two people only,
but an extension is discussed below.
- At each time unit (e.g. every hour), S(P,t) is calculated as a function of:
- S(P,t-1): the infectiousness score of the person at the prior time;
- S(Pi,τ): the infectiousness scores of all people Pi the subject interacted with in the time period τ = (t-tw) ... (t-1), where tw is a parameter that determines the length of the time window under consideration; selection of this parameter depends on characteristics of the disease, such as incubation times, as well as on the characteristics of enacted community interventions, such as the availability, frequency and accuracy of testing;
- a rating of the subject’s current health derived from the subject’s self-assessment;
- a rating of the subject’s current health based on information that only becomes available later (see the next points).
- Test results come in with a delay (e.g. one day between the time of the test, tTest, and the current time, t). Once available, the estimate for the infectiousness of the subject between tTest and t will be “overwritten” with an updated, more accurate estimate for that already-passed time period that takes the results of the test into account.
- Similarly, subjects may be infectious prior to experiencing any symptoms. Once symptoms
are apparent, all prior estimates of infectiousness of the subject will be
recalculated over some time window whose length is determined by some assumptions
about the disease (incubation time, time of infectiousness prior to symptoms etc).
- When subject P's history is rewritten, the histories (and current scores) of all subjects that have previously taken the history of P into account for their own scores need to be recalculated and rewritten, so that they now use the rewritten history. This may happen recursively: history may be overwritten repeatedly for a given subject, which again triggers rewrites for other subjects. (More efficient algorithms producing the same result can be found.)
Additional potential inputs to the algorithm:
- A rating of a subject’s behaviors that may modify their infectiousness, such as:
- wearing a mask (reducing it);
- intentionally exhaling at others (increasing it).
Extension to other forms of transmission
So far, we have assumed that transmission can only occur between two people in the same
location. However, there are other forms of transmissions, such as:
- transmission via a contaminated surface within a certain time interval that the
virus remains active on that surface;
- transmission via air droplets in an enclosed space within a certain time interval.
To account for these forms of transmission, the algorithm is extended to also include
estimates of the infectiousness of objects in certain locations. Similar to people,
these objects have an infectiousness score that is a function of which people (and
their scores) have interacted with it in times prior, its previous infectiousness
score and the passage of time.
The score of objects in the vicinity is considered as part of the algorithm to update
S(P,t) in a corresponding manner to that of people.
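A corresponding update for an object or location could look like the following sketch. The exponential decay and the half-life constant are assumptions for illustration only; the real values depend on the surface material and the disease.

```python
# Assumed half-life of viral viability on a surface, in hours (illustrative).
SURFACE_HALF_LIFE_H = 6.0

def update_object_score(prev_score, visitor_scores, hours_elapsed):
    """Infectiousness score of an object/location: it rises with the scores
    of the people who interacted with it, and decays with the passage of
    time. Illustrative sketch only."""
    decayed = prev_score * 0.5 ** (hours_elapsed / SURFACE_HALF_LIFE_H)
    contribution = max(visitor_scores, default=0.0)
    return min(max(decayed, contribution), 1.0)
```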
Users run an app on their mobile phones.
From time to time, the app asks the user about how they feel. Specifically it asks
about symptoms related to COVID-19, such as fever, fatigue, cough etc.
The app’s main screen shows an easy-to-understand visual representation of the
likely infectiousness score, such as a color code (e.g. green: unlikely to infect).
When the app reports a score above a certain threshold, the subject goes into
shelter-in-place or quarantine. (Legal questions about whether this is voluntary
or legally required are out of scope for this discussion; certainly regulations such
as “must be sheltered-in-place unless score is green” would be possible.)
Before two (or more) people meet in person, they can agree on a maximum score that
each participant may have in order to join the meeting. (Such a
maximum score may also be legally mandated.) The participants in the meeting check each
others’ scores before the meeting.
Before a business admits a customer (or employee) onto the premises, they require the
customer or employee to share their score. They will be denied access if the score
is above a certain threshold. They may also deny access to those visitors who do not
have, or are unwilling to display, their score.
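These checks could be sketched as follows. The color boundaries and admission threshold are hypothetical; in the proposal they would be set (and updated) by public health authorities.

```python
# Hypothetical score thresholds for the app's color code.
GREEN_MAX = 0.2
YELLOW_MAX = 0.5

def color_code(score):
    """Map a score in [0, 1] to an easy-to-understand color."""
    if score <= GREEN_MAX:
        return "green"    # unlikely to infect
    if score <= YELLOW_MAX:
        return "yellow"
    return "red"

def admit(score, venue_threshold):
    """A venue's admission check: deny entry if the score is missing
    (None) or above the venue's threshold."""
    return score is not None and score <= venue_threshold
```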
When the user gets tested, they enable the testing provider to add the test results
to their record so it can be used to calculate the score going forward.
Depending on the implementation choices made, the mobile phone may need to be
connected to the internet, to a local WiFi network and/or have Bluetooth on as sender
or receiver or both.
Assumptions / challenges
Test results can be brought into the system in a way that defeats tampering: we cannot
allow a subject to fake negative test results, for example, or to eliminate positive
test results from consideration.
Individuals may be tempted to fake their scores in order to enter a certain venue,
for example, such as by displaying a static screen shot on their phone instead of their
live score. Technical means (e.g. timestamping the display, or simultaneously broadcasting
the score via wireless networking) can be employed to make this more difficult. This
approach would also use technical means (e.g. public keys, app stores) to prevent “rogue apps”
with false scores from participating.
In a naive implementation, the entire record of each subject (e.g. the entire world
population) would be centrally collected. This would create a privacy nightmare and
enable substantial future harm from dangers that are not biological in nature. So we
assume that the implementation would need to be performed in a fashion that does not have
a central point of data collection.
Location accuracy for this app is paramount. Absolute coordinates are less important,
but relative coordinates between two subjects need to be determined as well as possible,
as a distance of 2ft vs 8ft has substantially different impact on likelihood of
transmission. This could be addressed with technical means (e.g. Bluetooth, NFC), user
input (e.g. verifying / entering into the app the people currently in close proximity), or a
combination of these.
The space in which an encounter occurs is highly relevant. For example, a 10 min
contact at 6ft inside a small, enclosed space without ventilation has dramatically
different transmission characteristics than contact of the same duration and distance
in open nature with a slight wind. This also could be addressed with technical means
(e.g. mapping information), user input (e.g. entering into the app whether the surroundings
are an enclosed space, ventilated, open window, city street, open nature etc), or a
combination of these.
Approach to Privacy
It appears possible to keep most information needed for the functioning of the system
on individual users’ mobile phones without requiring a centralized data repository:
- The algorithm can run locally on local data.
- Detection of other people in the neighborhood can be performed via local wireless networking
(e.g. WiFi, zeroconf, Bluetooth).
- The communication between mobile phones of people in an encounter to exchange scores can be
performed using secure end-to-end encryption between the phones using any networking technology
including through a centralized backend. This would not compromise privacy significantly.
- To trigger history rewrites in other phones, those connections to other phones can be
remembered and re-activated (including identity / encryption keys). This may use
some existing centralized communication network (e.g. instant messenger) or a decentralized
alternative with a distributed hash table for lookup, for example.
- None of the functionality or communications requires more than pseudonymous identity.
No centralized account or identity verification is required, with the potential exception
of entering verified testing results. However, in this case, the identity correlation
remains local on the user’s device and is never shared beyond.
Public health reporting and management
- The app can report scores to the public health authorities, who thereby gain the ability
to track the actual spread of the disease – and best-guess estimates of it – in real time.
- For privacy reasons, scores do not need to be associated with other identifying
attributes, although it may be advantageous to share demographic info such as age,
and approximate (maybe rasterized) geographic location of the subject.
- Key parameters of the algorithm – e.g. thresholds for “acceptable” scores for
certain activities – could be centrally updated by the public health authorities,
in order to “shape” the progression of the disease in real time.
- The intentional distribution of data and computation for privacy reasons, instead of
centrally collecting it all, needs to be weighed against the need to continually
debug and improve the algorithm.
- To be able to understand the functioning of the algorithm in the field, and to make
improvements, it appears sufficient to report the time histories of scores centrally,
including rewritten histories. It does not appear necessary to identify the specific other
people whose scores were used as input to the algorithm, nor the locations where
encounters took place.
- Should more detailed information be required, collecting it
from a relatively small sample of volunteers should be sufficient.
I’m reading that Taiwan has been
limiting, to two a week, the number of face masks people are allowed to buy. The
thesis is that it is better that most people have a few, instead of few people
hoard a lot, to protect the population as a whole from COVID-19.
They implement it the straightforward way: would-be purchasers must show a national
identity card, and there is a centralized database that tracks the purchases against
the national identifier. You only get to buy one if you haven’t exhausted your
allotment for this week. Other places, like California, do the same thing for
certain medications (e.g. Sudafed), and certainly would apply it to face masks,
too, if they felt like they needed to ration them. (We’ll see about that.)
Obviously, from a privacy perspective, this system is terrible. The pharmacy
has no business to know whatsoever what my full name is, my address, my date
of birth, and all of those things that tend to come with centralized ID cards:
all information I am forced to hand over before I can buy my cold meds in
California, or a less than one-dollar mask in Taiwan. On the other hand,
whatever system is implemented must be reasonably hard to circumvent, otherwise
it is pointless.
So, friends in the self-sovereign identity community: how would you guys solve
this problem in a privacy-preserving way that nevertheless has the same effect?
Hint: This is a great PR opportunity (in spite of the calamity), and perhaps a
tech deployment opportunity, because we can be sure that what Taiwan started
about masks here will be followed by others in short order. (I notice that the
prices for face masks – in particular the shipping charges! – on Amazon seem
to increase by the day.) And why not help out with helping people have more
access to protection equipment, while also giving them privacy? There are
worse use cases for identity technology than that!
For the purposes of our ongoing discussions in the MyData Silicon Valley Hub
about a potential North America conference later this year, here is my attempt at
segmenting the stakeholders.
- Prime movers: innovators, inventors, people and organizations that proactively push the
vision forward and do things the first time they have ever been done, not waiting for others.
- Followers: people and organizations who are willing to do things consistent with the
vision but only after others have pioneered the way first.
- Neutral: people and organizations who don’t care about the vision.
- Adversaries: people and organizations whose vision is fundamentally different and whose
agenda is opposed to ours.
- Product & Services: creators of apps, platforms, integration products, support and the like.
- Channel & Distribution: systems integrators, value-added resellers, app stores, retail etc.
- Catalysts: press, analysts, event organizers, activists, MyData Global itself,
governments / regulators, investors.
- Customers: buyers and users of products and services (consumers, enterprises, governments).
What do you think?
In the US, we think of our struggle over data ownership as a conflict between large,
unaccountable companies (like Facebook) versus us as individuals. But it is more
complex than that as soon as you look beyond the US.
Take the German federal government, for example. How does your sovereignty as
a nation look to you, if data is the new oil in the 21st century, but most of that data
ends up on clouds operated by American (or Chinese) companies? Critical infrastructure
entirely dependent on the goodwill of one (or two) other countries? Who can see anything
you do there? Or turn it off in case of a conflict? Sounds like a disaster
waiting to happen.
So what do you do? You might team up with fellow nations, like other EU members, and pass
regulations such as the GDPR
which erodes the exclusivity US companies have over data. Or
spearhead a project called Gaia-X, which is intended to be a European alternative to
American (and Chinese) “clouds” with the stated goal of regaining data sovereignty.
And into this fight steps Brad Smith, now Microsoft president, who is quoted
(in German) in the press as saying:
German data should serve German companies – not just a handful of companies on the
US west coast or the Chinese east coast.
(I will ignore here that this comes across as quite racist, and in case of Germany, one
should not make that mistake, even if it comes from an American.)
Clearly, Microsoft has identified an opportunity to make a bundle here, by selling to
countries like Germany that are attempting to set up their own clouds; we know this because
the quote comes from the very top of the company, not some regional sales manager.
But the striking part of the quote – “should serve German companies” (not “people”,
or “Germany, the country”) – tells us clearly what the German government has in mind here,
and to whom the pitch is directed: use data to bolster German companies in international competition.
While we all benefit from new rules such as the GDPR, and their enforcement in Europe as in the
recent case of the Irish against Facebook,
it’s clear we, individuals, are merely an accidental beneficiary.
It’s really about big company competition, supported by national governments. Let’s not
forget that. If they were to accommodate each other somehow, I bet the push for privacy
and GDPR-like things would evaporate in a heartbeat.
In the privacy-related communities
out there, we often use the phrase “privacy
and agency” as a label for the totality of what we want.
But what if those two cannot be had at the same time? What if more privacy, in
practice, means I need third parties to take a larger role, thereby reducing my
agency? Or what if I have more agency and can do more things in more ways that I
solely determine, but only at the cost of less privacy?
If so, then look no further than the recent public discussion (dispute?) between the
founders of the Signal and Matrix
messaging systems, Moxie Marlinspike and
Matthew Hodgson. The essence of
their arguments, and I paraphrase:
- Moxie: you can’t build a private messaging system that’s competitive as a consumer
app unless a single party, such as the Signal project, takes responsibility and ownership
of the whole thing. Lots of privacy, but for the user it’s take it or leave it.
Link to full post.
- Matthew: decentralization, on all levels including code forks and potentially
insecure (non-private!) deployments, is an essential requirement to avoid
single points of failure: critical people or components turning bad.
Link to full post.
This is a high-quality conversation and we can all be very happy that it is
conducted openly, and in a spirit of finding the truth. Go read both pieces, and ponder
the arguments, it’s very much worth your while.
Who is right?
IMHO, both are. I don’t know whether all the tradeoffs described are as unavoidable
and unmitigatable as they are made out to be in those posts; maybe more innovation in
technology and in particular governance could alleviate some of them.
However, the basic idea of a tradeoff between them is valid. The Signal and Matrix
projects make different choices on that spectrum, both for valid reasons.
If they need to do that, chances are, everybody else who cares about providing
products and services with privacy and agency for the user, faces similar tradeoffs.
It would serve us well to acknowledge that in every discussion on those points,
and respect others who have the same goals as we do, but make different tradeoffs.
The most important point, however, is this: it shows how important it is to have both
projects, or a plurality of projects addressing similar requirements but making
different tradeoffs. Because that gives us, the users, you and me, the agency to make
our own choices based on our own preferences. Including the choice to forego some
agency in some aspects in favor of more privacy.
Which is the most important aspect of agency of them all.
Updated 2020-01-24 with answers from Victor, slightly edited for formatting
purposes. Thanks, =vg!
My friends Victor and Jim at JLINC have published
a set of technical documents that show how to implement
“Information Sharing Agreements” – contractual agreements between two parties,
where one party receives information, such as personal information, from the other
party and commits to only use the received data in accordance with the agreement.
This is basically a respectful, empowering form of today’s widespread, one-sided
“I must consent to anything” click-through agreement every website forces us to
sign. It’s respectful because:
- it is negotiated, rather than unilaterally imposed as is the default on the internet;
- the existence of the agreement, and which parties it binds, can be cryptographically
proven by both parties;
- there’s a full audit log on both sides, and so it would be difficult to “wiggle out of”
- it can’t be unilaterally changed after the fact, only terminated.
So as I read through the documents, I had some questions, and as usual, I blog them :-)
in random sequence, with the answers I have received so far
in-lined. I will add further answers to this post as I find out about them.
Q: Why is a separate DID method required? I don’t
quite understand what is unique about JLINC DIDs that other forms of DIDs can’t do, too.
- A: The W3C DID working group has specified a “common data model, a URL format,
and a set of operations for DIDs, DID documents, and DID methods.” This by itself does
nothing - individual DID methods conforming to this model then need to be specified
and implemented. See here. There are
various DID methods (including `did:jlinc`) listed in the
DID method registry. We believe
our method is better for -our- needs and use cases – and besides, we understand that
Q: To create a JLINC DID, I need to post something to which URL? The spec says
but doesn’t identify a hostname. Can it be any? Or is that intended to be a centralized
service, perhaps run by JLINC, the company?
- A: Anyone could read our public spec and create their own registry, but we have put
up a testnet and made it available via an open source
[Node module](https://github.com/jlinclabs/jlinc-did-client). The example config file
in the above repo contains the correct testnet URL. When we feel the W3C DID model
has stabilized sufficiently we will make available a production-version public registry.
Q: How do the identifiers that the two parties use for the JLINC protocol relate to
identifiers they may use for other types of interaction, e.g. some other protocols
within the decentralized / self-sovereign identity universe? Is a given user supposed
to have a variety of them for different purposes?
- A: This is a question that is being addressed by the W3C DID-resolver community group,
in which we are participating. We will make available a JLINC DID resolver when that
spec has been published. Every DID contains a (presumably registered) DID method as its
second colon-separated value (e.g. “did:jlinc:SOME-UNIQUE-STRING”) so you will be able
to resolve any DID whose method your resolver is configured for.
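Dispatching on the method is straightforward, since it is always the second colon-separated value. A minimal sketch (the set of supported methods is hypothetical):

```python
def did_method(did):
    """Extract the method from a DID such as 'did:jlinc:SOME-UNIQUE-STRING'.

    Per the W3C DID syntax, the method is the second colon-separated value.
    """
    parts = did.split(":", 2)
    if len(parts) != 3 or parts[0] != "did":
        raise ValueError(f"not a valid DID: {did!r}")
    return parts[1]

# Hypothetical: the methods this particular resolver is configured for.
SUPPORTED_METHODS = {"jlinc"}

def can_resolve(did):
    return did_method(did) in SUPPORTED_METHODS
```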
Q: Why is a ledger and its associated ledger provider required? (Actually, maybe it is
optional. But the spec says “may submit it to a Ledger of their choice to establish
non-repudiation”, so that implies the ledger is required for that purpose.)
- A: Supporting audit ledgers is part of our plan but has not yet been implemented.
Q: There is already a
previousId in each exchange. Wouldn’t that be sufficient for
non-repudiation if the two parties keep their own records?
- A: Theoretically yes, but a third-party audit record contemporaneous with each
data-transfer event would guard against any nefarious record manipulation that might
become possible if there should turn out to be some cryptographic weakness discovered.
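The previousId chaining mentioned in the question can be sketched as a simple hash chain over each party's own log. The record format and hashing scheme below are illustrative assumptions, not JLINC's actual wire format:

```python
import hashlib
import json

def record_hash(record):
    """Stable hash of an exchange record (illustrative scheme:
    canonicalized JSON fed to SHA-256)."""
    return hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()
    ).hexdigest()

def append_exchange(log, payload):
    """Append an exchange whose previousId is the hash of the last record,
    so that later tampering with any earlier record breaks the chain."""
    previous_id = record_hash(log[-1]) if log else None
    record = {"previousId": previous_id, "payload": payload}
    log.append(record)
    return record

def verify_chain(log):
    """Each party can independently re-check its own copy of the log."""
    for i in range(1, len(log)):
        if log[i]["previousId"] != record_hash(log[i - 1]):
            return False
    return True
```

This illustrates why two self-kept logs already give some non-repudiation, and why a contemporaneous third-party audit record adds a safety margin on top.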
Q: There is also the role of an “audit provider”. How is it different from a “ledger provider”?
And if it is, why do we need both?
- A: Those are two names for the same thing.
Q: Are, by virtue of the ledger, the Information Sharing Agreements themselves, essentially
public or at least leaked to an uninvolved third party? Can I use JLINC to privately agree
on an Information Sharing Agreement without telling others about it? If so, what
functionality do I lose?
- A: For most purposes we envision using Standard Information Sharing Agreements (SISAs)
that are published publicly, and we are looking for a suitable standards body to work
out a format for those and perhaps publish some useful ones, modeled along the lines of
Creative Commons. But JLINC will work fine with any agreement, most likely identified
with a content-addressed URL, but conceivably even a private legal agreement between
two parties, identified only by its hash value.
Q: When an AgreementURI is used to merely point to the legal text that defines the agreement,
rather than incorporating it into the exchanged JSON, would it make sense to also at least
include a hash of the agreement text? That way, a party cannot so easily wiggle out of
the agreement by causing the hoster of the agreement text to make modifications, or
claim to have agreed to a different version of the agreement.
- A: Yes, ISAs are always identified by their hashes, usually via a content-addressed
URL like IPFS or some similar scheme that includes a hash of the content as part of the address.
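Identifying an agreement by the hash of its text can be sketched as follows. SHA-256 is chosen here for illustration; the actual scheme (e.g. IPFS content addressing) may differ:

```python
import hashlib

def agreement_id(agreement_text):
    """Identify an ISA by the hash of its text, so neither party can later
    substitute a modified version. Illustrative sketch only."""
    return hashlib.sha256(agreement_text.encode("utf-8")).hexdigest()

isa = "Both parties agree to ..."
original_id = agreement_id(isa)

# The same text always yields the same id; any change is detectable.
assert agreement_id(isa) == original_id
assert agreement_id(isa + " (modified)") != original_id
```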
Q: There’s a field
descendedFrom in various examples, which isn’t documented and is
always the text string
null. What might that be for?
- A: The JLINC protocol has been rapidly evolving as we build stuff and discover
ambiguities and possible efficiencies in it. That field is obsolete.
Q: How would a
permissionEvent work in practice? Wouldn’t that require the underlying
legal text to change? Is there a description somewhere?
- A: The ISA should specify that the data-custodian agrees and will respect the
rights-holder’s choices as they are transmitted via permission events. Then each
permission change event is transmitted under the existing ISA, same as with data events.
Q: Could one use JLINC to govern data that’s much longer, or much more complex, than
the typical small set of name-value pairs used for user registration data on consumer
websites? Can I use it, say, for the first chapter of my Great American Novel I am
sending to a publisher, permitting them to only read it themselves but not publish it
yet, or to send my MRIs to a new doctor?
Q: In a successful relationship between a Me and a B, to use the Me2B Alliance’s
terminology, it appears that the “data kimono” is gradually opened by the Me to the B.
For example, the Me may first visit a website without an account, then register (and provide
their name and e-mail address) and a month later, buy something (which requires a shipping
address and a credit card number, but only until the purchase is delivered and the data
can be deleted again). In the JLINC world, does this require a different Information
Sharing Agreement on each step? (particularly for the deletion after shipment?)
- A: No – see the permissionEvent question above.