Tech

By Johannes Ernst

https://reb00ted.org/tech/

2020-10-09

Three Scenarios for Rolling Back Surveillance Capitalism
Are we stuck with Surveillance Capitalism? I hope not.

But what are realistic alternatives? Alternatives that keep the amazing wonders that are consumer technologies in 2020, but don’t invade our privacy, don’t spread misinformation, give us back a measure of control over our electronic lives, don’t set us up for manipulation and help rather than hurt our mental health?

Here are three scenarios how we could get out.

Scenario 1: Regulation Bites

Building on the success of GDPR and buoyed by a growing data sovereignty movement supported by the political right and left, the European Union intensifies regulating cyberspace, and in short order:
- disallows all businesses to move any personal information pertaining to its residents to data centers outside of the European Union;
- broadly disallows user tracking except for very narrow circumstances; in particular, cross-site and cross-app user tracking becomes prohibited; advertising networks cannot target audiences smaller than 100,000 members any more;
- requires all social and communications apps to implement full data portability (including loss-less transmission to a new provider) similar to phone number portability.
The dominant, American social networking giants focus their efforts in the courts to roll back these regulations, but in the meantime, nimble European upstarts simply copy the feature sets of the dominant platforms and implement them consistent with European regulations. Local politicians mention these apps at every opportunity.

By marketing their products through schools, privacy-conscious German parents switch over an entire new generation of users to the European apps, and when e-government initiatives enable citizens to much more easily and securely interact with governments through the new apps, the network effect starts hurting instead of favoring the American surveillance platforms.

As integration has become easy, a European startup figures out how to game-ify fact checking on this new open platform, and on-line misinformation drops rapidly. This increases user engagement and user confidence, and few people ever want to go back to the old apps.

Other countries outside the EU concerned about data sovereignty have been watching carefully and quickly follow the European model, through regulation and targeted industrial policy. Facebook and friends are playing catch-up and are forced to play by the new rules to keep at least some of their user base in those countries.

And when they started to market their apps internationally, even large swathes of the American population moved over, because they don’t want to be surveilled either.

Scenario 2: A Global Disinvestment Campaign Leads to a Vibrant Good Technology Market

With the slogan “Facebook is just as bad as burning oil”, digital rights activists have partnered with veterans of the divestiture campaigns against South African apartheid, tobacco and fossil fuels for an international public relations campaign targeting investment and retirement funds that invest in companies monetizing surveillance.

Being reminded of the impact of previous disinvestment campaigns and sensing a business opportunity, fund managers globally are rapidly rolling out new niche funds that promise to only invest in companies that use personal data responsibly. Their initial target markets are minorities and parents saving for retirement who are concerned about their kids’ safety when using technology.

Upstart VCs jump on the opportunity that this new, focused capital represents and funnel it via special-purpose “Good Tech Only” venture funds to eager entrepreneurs world-wide to build next-generation social networking, commerce and virtual/augmented reality companies, without fear that VCs will pressure them to monetize customer data anyway when the company hits a difficult patch.

Having made a clean break from the surveillance business model, these upstarts are able to innovate rapidly both on business model and technology. For example, enabled by new business models, interoperability with other vendors has now become a value driver rather than a leak in the enterprise’ moat. This completely changes the dynamics of the marketplace.

As a result, entirely new product categories no longer prevented by vendors’ data hoarding strategies explode on the scene, including, for example, much better targeted advertising because users can volunteer personal data without fear of privacy violations, proactive maintenance of consumer products by an army of service providers no longer inhibited by hermetically sealed cloud castle products, and far more reuse and upcycle of previously discarded products.

As the Good Tech brand rises, and unprecedented features become available, more and more technology users are willing to make a clean break with surveillance legacy platforms, and shame their friends to move from the legacy social networks into moving to Good Tech as well.

Ultimately, the legacy vendors practicing surveillance capital face shrinking users bases, less access to capital, and structurally cannot compete with the new generation of Good Tech companies.

Scenario 3: Frustrated Users and Open-Source Developers Start Cooperating for Mutual Benefit

It started small, with a few technically-competent digital rights activists pooling their expertise and a little bit of money to operate their own Mastodon server, so they could stay in touch just like on Twitter, but without an unaccountable third party in the loop. (Note: this, of course, already has happened; there are many Mastodon deployments like this all around the world, some of which have already progressed further along the lines outlined below.)

As interest and user numbers grew, the previously informal collaborations started to be formalized: users not contributing their labor would pay a monthly fee, from which systems administrators would be paid to keep the deployment up and running reliably. Over time, the initial collaborative decision making process for the project morphed into a formal cooperative governance structure in which all stakeholders – users and maintainers – have equal rights. They decided on all matters affecting the project democratically, although different cooperatives employ different styles of governance including direct, liquid and representative democracy.

Soon users started to ask for additional tools provided to them in a similar manner, like document sharing, calendaring, e-mail, and more. Accountants would ask: “Microsoft charges me $6.99 per month to access Excel. If I pay the same amount to the coop, can’t we host something like Excel ourselves, and I can be certain that my clients’ financials stay private instead of whatever Microsoft does?” Some other users in the coop declared that they had similar needs, and banded together, money in hand, to fund a project. Which attracted open-source software developers who committed to porting open-source collaborative document editing software into the cooperative’s environment and keep it maintained for a monthly fee paid for by its users.

Of course, the apps operated by the various cooperatives always interoperated, because that’s what users want and no vendor subject to the coop’s rules has the opportunity (or desire) to lock in anybody. So leaving one cooperative to join another became as simple as moving banks today, with no money or data lost in the process.

Some projects didn’t work out. Some money was wasted. Some coops imploded. Some users left because initially, the quality of the coops’ products was below the quality of social networking products of today’s dominant internet platforms funded by billions of Wall Street dollars. However, because the cooperative structure relates the needs and wants of the users directly to the revenue opportunity of the vendors, with no independent shareholders to satisfy, ultimately the match between needs and features became much better than in pure capitalistic for-profit models, creating legions of fanatically happy users and profitable vendors completely outside the need or desire for surveillance capitalism.

Some final thoughts

Of course, there are other scenarios; elements of these scenarios could be combined in different ways or shake out differently, and predictions are hard, particularly about the future :-)

But there are people working on each of those scenarios today (myself included!), and it is not obvious to me that those projects are doomed. In other words, they have promise! How can we help them be more likely to succeed? Because I want out from surveillance capitalism, and chances are, you do, too!

(Please get in touch.)
2020-06-17

We need best practice templates for tech governance (just like we have a library of open-source licenses)
- privacy
- goverance
Ever learn things while being an invited “expert” on a panel at some conference? It just happened to me, again, at today’s Transparency By Design Summit.

We were discussing how to collect consumer healthcare data responsibly, for COVID-19 and beyond, and the challenges how to (justly) gain the trust of people whose data we would like to collect. Because if they don’t trust us, they won’t let us collect the data, or even poison what they give us. The core question is:

How do I know that you, the data collector/medical researcher/public health system, will indeed do what you promised? (About privacy, data retention, anonymization, sharing etc)

And the answer, as always, is “good governance”, followed by a bunch of hand waving: just what exactly does this mean? What is that thing called “good governance” of a system that includes a lot of technology and a lot of humans developing and operating that technology? Take a COVID-19 contact tracing app: there’s the code, and the release process, and the data sharing, and the employment agreements of people who touch the code or the data that hopefully will oblige them not do “bad things” and the legal enforcement and the audit trails and what have you. It’s not simple, and goes far beyond just “the code”.

First of all we have few examples where good governance is actually practiced. So we are not used to it. Worse, we have nothing resembling agreement on what that actually means, in detail. Just my example enumeration above is woefully lacking in detail.

It occurs to me it’s a bit like open-source licensing of code was about 20+ years ago, with everybody having their own software license (or none at all), many of which were homegrown and not very professional. Fortunately, the open-source world has since coalesced around a fairly small number of primary open-source licenses (like GPL, AGPL, Apache, MIT and a few more), which are fairly well understood.

We need the same thing for technology governance: a bunch of governance templates, which can be used by technology systems. They could, for example, include open-source licensing for their code component (but they don’t necessarily need to), but need to go far beyond, including questions such as:
- What is the data retention period?
- What’s the process to make sure the data is deleted after the data retention period?
- How do we find out whether the process is or isn’t being followed?
… and many other related questions. If we had such a series of templates, innovation in governance was still possible (just create another template) but we could collectively understand what governance looks like for a given system, and, for example, fix governance problems one bug at a time. Something not possible at all today.

It would go a long way towards us all regaining trust in technology. By public health systems pushing COVID apps just as much as Facebook pushing the latest “trust us, we won’t spy on you” update.

Anybody working on anything like that? Would love to hear about it.

2020-05-07

Trust through Transparency for Apps

What do you need to know so you can confidently trust a piece of technology, such as an app supposedly helping fight COVID-19?

That question is at the heart of Project App Assay. It applies to all technology, but is particularly important for the COVID-19 apps, because many of them collect so much information about our health, our friends, our locations and activities around the clock.

Here is a proposal.

First: the key questions that need answering, I think, are:

Is the app effective? If it is not effective in what it does, such as help fight the virus, there is no point, and you should not trust it to help with your life or the lives of your fellow people. Specifically:
- Does it do what it says it does and is it good at it? E.g. if it says it tracks contacts via Bluetooth, does it do that and do it well (and nothing else)?
- Does that help with the virus? E.g. if the app provides medical advice, it would be pointless if the advice it dispensed made no difference to your health or the health of the people around you.
What are the downsides of me using the app? These range from the mundane, like will it drain my phone’s battery quickly, to the profound: e.g. will the people promoting the app use the collected personal data for purposes other than fighting the virus? Perhaps even use it against me now or at some point in the future, e.g. by jacking up insurance rates or finding other members of my persecuted religious minority?

These are critical questions we all ask ourselves when faced with the decision to use or not use an app.

As we analyze COVID-19 apps at Project App Assay, we have observed that the authors of those apps make many claims about their apps answering these questions, but that’s all they are: claims by the creators of the app who obviously have a self-interest. Can those claims be trusted? Clearly, it would be nice if we had more to go on.

So I have come up with the following rating scheme. It looks like this:

	Self-asserted, few details	Self-asserted, comprehensive	Comprehensively audited	Demonstrably uses best practices
Effectiveness
Technology
Operations
Governance

Let me explain:

Effectiveness: what do we know about whether the app is effective? This includes whether its advertised features work, and what we know about whether it indeed helps and pushes back the virus.
Technology: what do we know about the technology, including algorithms, which data is collected, what protocols and cryptography does it use and the like?
Operations: what do we know about how the deployed system is operated, e.g. how often are security reviews being performed, who has access to cryptographic secrets, or are systems administrators vetted?
Governance: who makes decisions, and how are they made, about all aspects of the app and the data it generates? How is dissent handled on the governance team? (E.g. is there a whistleblower process?)

We then rate each dimension with the possible values of:

Self-asserted, few details: the app creator provides no or few details on the subject; no third party has validated those claims.
Self-asserted, comprehensive: the app creator provides comprehensive information on the subject; but no independent, credible third party has validated those claims.
Comprehensively audited by an independent, credible third party: the claims have been validated by an independent, credible third party, and found to be largely correct with no major discrepancies.
Follows best industry practices: the third-party validation confirms that the app follows best industry practices.

As an example, the evaluation of a simple hypothetical app (only) dispensing health advice that gained high marks might look like this:

	Self-asserted, few details	Self-asserted, comprehensive	Comprehensively audited	Demonstrably uses best practices
Effectiveness
Technology
Operations
Governance

This would be the evaluation for the health advice app, if, for example:

the health advice was sourced from respectable medical sources (e.g. CDC) with back links to the source, and had been reviewed for correctness by the CDC.
it was developed in the open, such as open-source, with a large and diverse developer community. If the developer community is large and diverse and functional, it effectively performs the audit function itself, and gravitates to following best technology practices.
for this app, operations are minimal and transparent, so this is a non-issue.
governance of the app was performed in the open, such as in public meetings or on public mailing lists.

On the contrary, the evaluation for a similar app with low marks could look like this:

	Self-asserted, few details	Self-asserted, comprehensive	Comprehensively audited	Demonstrably uses best practices
Effectiveness
Technology
Operations
Governance

This would happen if, for example:

the health advice had no discernable source, and no review had been performed by medical professionals.
the app was provided as a “black box” of which nothing is known other than what the developers claim about it, and they have publicly said little.
there is no knowledge about who is involved in operations or governance of the app, and what decisions are being made on an ongoing basis.

Of course, it is entirely possible that an app could receive low marks although it is effective and does not harm users in any way.

However, for a public health emergency like COVID-19, I can think of few good reasons why apps should keep their technology or governance secret. And as large-scale adoption by many users is required for most to be effective, I can think of few ways to better gain user trust than evaluations all to the right in this matrix.

I would love your feedback on Twitter.

2020-04-28

Virtual Unconference
- iiw
- unconference
It’s the week for the 30th (yes!) Internet Identity Workshop (IIW) and it just started. IIW, since the very first time, has been an unconference since the very first time, with the agenda driven by the participants, and conversation replacing presentations to a large extent.

Because of the pandemic, IIW has gone virtual using a website called qiqochat.com, which integrates collaboration tools such as Zoom, Google Docs and chat into a virtual venue for unconferences.

There is an opening circle, and there are breakout rooms. I’m fascinated about the potential of unconferences in cyberspace … it shows that crises often cause leaps in innovation.

So far, so very good! There have been over 200 people on video in the opening circle, seamless transition into much smaller breakout rooms, and collaborative document editing.
2020-04-06

New Privacy, Surveillance, Anonymity Podcast
- privacy
- security
My friends Seth Goldstein and Kaliya Young have started a new, timely podcast. First episode is up!
2020-03-20

Dynamic quarantine: a proposal for combatting COVID-19 with pinpointed action based on real-time information
This post provides more details on the “Dynamic Quarantine” exit path from the COVID-19 pandemic that I listed in a previous post.

The problem

We need to reduce transmission of the virus to a level where the number of infected people at any time shrinks, rather than grows.

Absent vaccines or other medications, this requires reduction of in-person contact between people (“social distancing”).

However, this makes normal functioning of the economy largely impossible. For example, the state of California just ordered all “non-essential businesses” to be closed. While this may work in the short term, the longer the lock-down continues, the more things “break”: from mass unemployment and resulting poverty/defaults/bankruptcies, to the availability of replacement parts and eventually essentials such as food.

Such “social distancing” may need to continue until a vaccine is available, which may take many months (12-18 months is a common estimate). It is unclear how to keep the economy functioning enough for such an extended period of time.

We need better ideas.

The basic idea

Instead of a blanket shutdown of all “non-essential” businesses, confining “everybody” to their residence, we could shut down only those businesses in which infection is likely, and confine only those people to isolation whose likelihood of infecting somebody is higher than a certain threshold. In this approach, those likelihoods are dynamically determined by means of data collection mostly through mobile phones, and an algorithm that produces a corresponding score for each person from the collected data.

The likelihood of a subject infecting somebody is determined as a function of what is known about the health of the subject so far, plus a history of the subject’s interactions with other people and those people’s likelihood of infecting somebody.

By tracking this information in real time, the blanket closure of businesses and blanket shelter-in-place of the population can be avoided, and instead be replaced with a sharp, pinpointed focus on isolating those that are most likely contributing to the spread of the disease. The remainder of the economy and population can continue to function.

Certain parameters in the algorithm can be tuned to provide different tradeoffs between reducing spread and inhibiting (or not) the economy.

Infectiousness score

The infectiousness score in this approach is an estimate for the likelihood that one person infects another when exposed for a certain time period (e.g. 5 min).

For our purposes here, the infectiousness score is a number between 0 and 1, where 0 means: not infectious (e.g. because a highly reliable test has just cleared the subject) and 1 means: known to be maximally infectious (e.g. because viral loads have been found to be high, and the subject behaves promiscuously).

Details

A few definitions first:
- P: a person (aka subject)
- S(P,t): the infectiousness score of person P at time t. Ranges between 0 (not infectious) and 1 (maximally infectious).
The core algorithm is as follows. It deals with direct infection between two people only, but an extension is discussed below.
- At each time unit (e.g. every hour), S(P,t) is calculated as a function of:
  - S(P,t-1): the infectiousness score of the person at the time prior;
  - S(Pi,τ): the infectiousness score of all people Pi that the subject interacted with in the time period τ = (t-tw) ... (t-1) (where tw is a parameter that determines the length of the time window that’s being considered; selection of this parameter depends on characteristics of the disease, such as incubation times, as well as the characteristics of enacted community interventions such as availability, frequency and accuracy of testing);
  - a rating of the subject’s current health derived from the subject’s self-assessment;
  - a rating of the subject’s current health based on information from the future (see below).
Rewriting history:
- Test results come in with a delay (e.g. one day between tTest and current time t). Once available, the estimate for the infectiousness of the subject between tTest and t will be “overwritten” with an updated, more accurate estimate for that already-passed time period that takes the results of the test into account.
- Similarly, subjects may be infectious prior to experiencing any symptoms. Once symptoms are apparent, all prior estimates of infectiousness of the subject will be recalculated over some time window whose length is determined by some assumptions about the disease (incubation time, time of infectiousness prior to symptoms etc).
- When subject P’s history is rewritten, the histories (and current score) need to be recalculated and rewritten of all subjects that have previously taken the history of subject P into account for their own scores. They need to now use the rewritten history. This may happen recursively. History may be overwritten repeatedly for a given subject, which again triggers rewrites for other subjects. (More efficient algorithms producing the same result can be found.)
Additional potential inputs to the algorithm:
- A rating of a subject’s interventions that may modify their infectiousness, such as:
  - wearing a mask;
  - intentionally exhaling at others;
  - etc.
Extension to other forms of transmission

So far, we have assumed that transmission can only occur between two people in the same location. However, there are other forms of transmissions, such as:
- transmission via a contaminated surface within a certain time interval that the virus remains active on that surface;
- transmission via air droplets in an enclosed space with a certain time interval.
To account for these forms of transmission, the algorithm is extended to also include estimates of the infectiousness of objects in certain locations. Similar to people, these objects have an infectiousness score that is a function of which people (and their scores) have interacted with it in times prior, its previous infectiousness score and the passage of time.

The score of objects in the vicinity is considered as part of the algorithm to update S(P,t) in a corresponding manner to that of people.

User experience
- Users run an app on their mobile phones.
- From time to time, the app asks the user about how they feel. Specifically it asks about symptoms related to COVID-19, such as fever, fatigue, cough etc.
- The app’s main screen shows an easy-to-understand visual representation of the likely infectiousness score, such as a color code (e.g. green: unlikely to infect).
- When the app reports a score above a certain threshold, the subject goes into shelter-in-place or quarantine. (Legal questions about whether this is voluntary or legally required are out of scope for this discussion; certainly regulations such as “must be sheltered-in-place unless score is green” would be possible.)
- Before two (or more) people meet in person, they can agree on a maximum score that participants are allowed to have to be allowed to participate in the meeting. (Such a maximum score may also be legally mandated.) The participants in the meeting check each others’ scores before the meeting.
- Before a business admits a customer (or employee) onto the premises, they require the customer or employee to share their score. They will be denied access if the score is above a certain threshold. They may also deny access to those visitors who do not have, or are unwilling to display their score.
- When the user gets tested, they enable the testing provider to add the test results to their record so it can be used to calculate the score going forward.
- Depending on the implementation choices made, the mobile phone may need to be connected to the internet, to a local WiFi network and/or have Bluetooth on as sender or receiver or both.
Assumptions / challenges
- Test results can be brought into the system in a way that defeats tampering: we cannot allow a subject to fake negative test results, for example, or eliminate from consideration positive test results.
- Individuals may be tempted to fake their scores in order to enter a certain venue, for example, such as by displaying a static screen shot on their phone instead of their live score. Technical means (e.g. timestamping the display, or simultaneously broadcasting the score via wireless networking) can be employed to make this more difficult. This approach would also use technical means (e.g. public keys, app stores) to prevent “rogue apps” with false scores to participate.
- In a naive implementation, the entire record of each subject (e.g. the entire world population) would be centrally collected. This would create a privacy nightmare and enable substantial future harm from dangers that are not biological in nature. So we assume that the implementation would need to be performed in a fashion that does not have a central point of data collection.
- Location accuracy for this app is paramount. The absolute coordinates are less important; but relative coordinates between two subjects need to be determined as well as possible, as a distance of 2ft vs 8ft has substantially different impact on likelihood of transmission. This could be addressed with technical means (e.g. Bluetooth, NFC), user input (e.g. verify / enter into the app the people currently in close proximity) or a combination.
- The space in which an encounter occurs is highly relevant. For example, a 10 min contact at 6ft inside a small, enclosed space without ventilation has dramatically different transmission characteristics than contact of the the same duration and distance in open nature with a slight wind. This also could be addressed with technical means (e.g. mapping information), use input (e.g. enter into the app whether the surroundings are enclosed space, ventilated, open window, city street, open nature etc) or a combination.
Approach to Privacy

It appears possible to keep most information needed for the functioning of the system on individual users’ mobile phones without requiring a centralized data repository:
- The algorithm can run locally on local data.
- Detection of other people in the neighborhood can be performed via local wireless networking (e.g. WiFi, zeroconf, Bluetooth).
- The communication between mobile phones of people in an encounter to exchange scores can be performed using secure end-to-end encryption between the phones using any networking technology including through a centralized backend. This would not compromise privacy significantly.
- To trigger history rewrites in other phones, those connections to other phones can be remembered and re-activated (including identity / encryption keys). This may use some existing centralized communication network (e.g. instant messenger) or a decentralized alternative with a distributed hash table for lookup, for example.
- None of the functionality, or communications require more than pseudonymous identity. No centralized account, or identity verification is required, with the potential exception of entering verified testing results. However, in this case, the identity correlation remains local on the user’s device and is never shared beyond.
Public health reporting and management
- The app can report scores to the public health authorities, who have the ability to track actual – and best-guess estimates – of the spread of the disease in real time.
- For privacy reasons, scores do not need to be associated with other identifying attributes, although it may be advantageous to share demographic info such as age, and approximate (maybe rasterized) geographic location of the subject.
- Key parameters of the algorithm – e.g. thresholds for “acceptable” scores for certain activities – could be centrally updated by the public health authorities, in order to “shape” the progression of the disease in real time.
Algorithmic improvements
- The intentional distribution of data and computation, instead of centrally collecting it all, for privacy reasons, needs to be weighed against the need to continually debug, and improve the algorithm.
- To be able to understand the functioning of the algorithm in the field, and to make improvements, it appears sufficient to report the time histories of scores centrally, including rewritten histories. It does not appear necessary to identify the specific other people whose scores were used as input to the algorithm, nor the locations where encounters took place.
- Should more detailed information be required, collecting such more detailed information from a relatively small sample of volunteers should be sufficient.

Tech

Three Scenarios for Rolling Back Surveillance Capitalism

Scenario 1: Regulation Bites

Scenario 2: A Global Disinvestment Campaign Leads to a Vibrant Good Technology Market

Scenario 3: Frustrated Users and Open-Source Developers Start Cooperating for Mutual Benefit

Some final thoughts

We need best practice templates for tech governance (just like we have a library of open-source licenses)

Trust through Transparency for Apps

Virtual Unconference

New Privacy, Surveillance, Anonymity Podcast

Dynamic quarantine: a proposal for combatting COVID-19 with pinpointed action based on real-time information

The problem

The basic idea

Infectiousness score

Details

Extension to other forms of transmission

User experience

Assumptions / challenges

Approach to Privacy

Public health reporting and management

Algorithmic improvements