Comments and questions on the JLINC protocol for Information Sharing Agreements

My friends Victor and Jim at JLINC have published a set of technical documents that show how to implement “Information Sharing Agreements” – contractual agreements between two parties, where one party receives information, such as personal information, from the other party and commits to only use the received data in accordance with the agreement.

This is basically a respectful, empowering form of today’s widespread, one-sided “I must consent to anything” click-through agreement every website forces us to sign. It’s respectful because:

  • it is negotiated, rather than unilaterally imposed as is the default on the internet today;
  • the existence of the agreement, and which parties it binds, can be cryptographically proven by both parties;
  • there’s a full audit log on both sides, and so it would be difficult to “wiggle out of” the agreement;
  • it can’t be unilaterally changed after the fact, only terminated.

So as I read through the documents, I had some questions, and as usual, I blog them :-) in random sequence. I will add answers to this post as I find out about them.

  • Q: Why is a separate DID method required? I don’t quite understand what is unique about JLINC DIDs that other forms of DIDs can’t do, too.

  • Q: To create a JLINC DID, I need to post something to which URL? The spec says /register but doesn’t identify a hostname. Can it be any? Or is that intended to be a centralized service, perhaps run by JLINC, the company?

  • Q: How do the identifiers that the two parties use for the JLINC protocol relate to identifiers they may use for other types of interaction, e.g. some other protocols within the decentralized / self-sovereign identity universe? Is a given user supposed to have a variety of them for different purposes?

  • Q: Why is a ledger and its associated ledger provider required? (Actually, maybe it is optional. But the spec says “may submit it to a Ledger of their choice to establish non-repudiation”, so that implies the ledger is required for that purpose.)

  • Q: There is already a previousId in each exchange. Wouldn’t that be sufficient for non-repudiation if the two parties keep their own records?

  • Q: There is also the role of an “audit provider”. How is it different from a “ledger provider”? And if it is, why do we need both?

  • Q: Are, by virtue of the ledger, the Information Sharing Agreements themselves, essentially public or at least leaked to an uninvolved third party? Can I use JLINC to privately agree on an Information Sharing Agreement without telling others about it? If so, what functionality do I lose?

  • Q: When an AgreementURI is used to merely point to the legal text that defines the agreement, rather than incorporating it into the exchanged JSON, would it make sense to also at least include a hash of the agreement text? That way, a party cannot so easily wiggle out of the agreement by causing the hoster of the agreement text to make modifications, or claim to have agreed to a different version of the agreement.

  • Q: There’s a field descendedFrom in various examples, which isn’t documented and is always the text string null. What might that be for?

  • Q: How would a permissionEvent work in practice? Wouldn’t that require the underlying legal text to change? Is there a description somewhere?

  • Q: Could one use JLINC to govern data that’s much longer, or much more complex, than the typical small set of name-value pairs used for user registration data on consumer websites? Can I use it, say, for the first chapter of my Great American Novel I am sending to a publisher, permitting them to only read it themselves but not publish it yet, or to send my MRIs to a new doctor?

  • Q: In a successful relationship between a Me and a B, to use the Me2B Alliance’s terminology, it appears that the “data kimono” is gradually opened by the Me to the B. For example, the Me may first visit a website without an account, then register (and provide their name and e-mail address) and a month later, buy something (which requires a shipping address and a credit card number, but only until the purchase is delivered and the data can be deleted again). In the JLINC world, does this require a different Information Sharing Agreement on each step? (particularly for the deletion after shipment?)
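The hash suggestion in the AgreementURI question above could be sketched roughly as follows. This is only an illustration of the idea, not how JLINC does it: the field name agreementHash, the example URL and the helper functions are all my assumptions, and I am assuming SHA-256 over the UTF-8 encoded agreement text.

```python
import hashlib
import json


def agreement_hash(agreement_text: str) -> str:
    """Return the SHA-256 hex digest of the agreement text (UTF-8 encoded)."""
    return hashlib.sha256(agreement_text.encode("utf-8")).hexdigest()


def make_reference(agreement_uri: str, agreement_text: str) -> dict:
    # "agreementHash" is a hypothetical field name, not taken from the JLINC spec.
    return {
        "agreementURI": agreement_uri,
        "agreementHash": agreement_hash(agreement_text),
    }


def text_matches(reference: dict, fetched_text: str) -> bool:
    """Check that the text found at the URI is the text that was agreed to."""
    return agreement_hash(fetched_text) == reference["agreementHash"]


# Hypothetical agreement text and URL, for illustration only.
original = "The recipient may use the data only to fulfil the order."
ref = make_reference("https://example.com/agreements/1", original)
print(json.dumps(ref, indent=2))

# The unmodified text verifies; a silently edited version does not.
print(text_matches(ref, original))
print(text_matches(ref, original + " And for marketing."))
```

With something like this, either party can later re-fetch the text behind the URI and check that it is still the version both sides signed.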

Want to buy an aged Twitter account?

From a spam e-mail:

Aged Twitter 2009 to 2015 Accounts For Sale - check new thread for new prices

The accounts are empty or with less than 50 followers.

2008 - 10$ Per Account
2009 - 9$ Per Account
2010 - 8$ Per Account
2011 - 7$ Per Account
2012 - 6$ Per Account
2013 - 5$ Per Account
2014 - 4$ Per Account
2015 - 3$ Per Account

Assuming those accounts actually exist, I can think of some political maneuverers who would likely be interested. I’m a bit surprised at the prices.

Downloading all your data and new security risks

I’ve been playing around with the new data download features major on-line providers like Twitter, Facebook or Google have been forced to provide to us Californians since January 1, 2020, under the California Consumer Privacy Act.

It’s amazing what kinds of data they have. For example, from the Facebook download I learned that dozens of car dealerships all over the country (like, say, in Texas, where I definitely have never gone car shopping) have my name and address. How – I have no idea.

But talk about putting all your eggs in one basket. In Google’s case, a single ZIP file contains all your e-mail over a decade or more, your pictures, your private messages, your location history – everything you ever used any Google product for, and many things you never thought Google recorded about you.

If this one file fell into the hands of somebody nefarious, you’d probably be in serious trouble – from possible financial fraud to blackmail on multiple levels, in particular in less-liberal countries, of which there are unfortunately more and more. The trouble would likely be much bigger than if somebody “merely” logged into your account: because all the info is there in one place, you don’t have to look for it, you can write scripts against it and immediately analyze it.

As Andrew Carnegie, and then Mark Twain, said: “And Then Watch That Basket!”. The trouble is … do we? I mean … before I started jotting down this post, I don’t recall having seen a single discussion of this threat anywhere, and I usually pay attention to this kind of thing.

It’s hard to secure that kind of access. To be sure, I’m all in favor of me and you being able to know every last bit of what big companies record about us, and get that data and use it somewhere else. But that power sure comes with a lot of potential dangers.

I fully expect a wave of “GDPR attacks” and “CCPA attacks” to occur, all focused on getting your full archive from major service providers and “monetizing” this in various ways, plus enabling whatever any secret police in some jurisdiction – and I use the word “jurisdiction” loosely here – can come up with.

What’s the alternative? Well, those service providers not having all that data about me in the first place! Instead, they should only be “borrowing” it from me; well, the parts they need for something I agree to for as long as they actually need it. Then, no bulk upload or download is necessary, and we don’t have this high-risk security problem in the first place.

Can you trust Facebook? Who paid $40m for overstating their numbers by up to 900%?

Sometimes we think it’s all overstated and a matter of opinion. Surely they can’t be so untrustworthy, otherwise they wouldn’t have a business?

Well, this article, with a link to the settlement, describes what Facebook calls an “error” in calculating how much time viewers spent watching certain videos on Facebook. As a result, I’m told, all sorts of media outlets moved their videos from YouTube to Facebook, and then promptly imploded because the number of views was much lower than expected.

Between 150% and 900%. The error, of course, was in the “up” direction.

Read.

Personal Data Organization Landscape

Personal data is becoming a thing in 2020. Not just startups, but also not-for-profit organizations have been popping up everywhere … by some counts, there are now literally hundreds (!) that are involved in it somehow. It’s hard not to get lost.

To order my own thoughts, and for the purposes of some organizations that I’m involved in, I’ve been working on a 2x2 or 3x3-style matrix diagram that is similar to what many startups are using to position themselves with potential investors.

Here is my currently best draft. Would love your feedback and ideas!

The first question is: if we can only pick two axes by which to classify organizations, which are the most important ones? I’ve picked:

  • whether organizations are for-profit, or help build the commons. This is obviously a big difference. I’m also distinguishing between organizations working broadly across the space, or focused on a particular aspect of it.
  • who is the primary customer of the organization? As personal data touches both individuals and businesses (or Me’s and B’s as we call it in Me2BA), organizations might focus on either, or both, and that has many practical differences just like B2B and B2C businesses are different. For this diagram, “customers” means the entities that provide money to the organization, through membership fees, or who buy the product or services. (They may also have benefits for the other side, but that’s probably common for most of them so it’s not shown here.)

Now, let’s put some example organizations into the diagram and see how they fit.

  • MyData has both business and individual members. It is a very broad umbrella organization, and so I am putting it stretching from B to Me, with a somewhat fuzzy border between general and specific focus. (One could have specific tendrils going out and up in the diagram, like the recent MyData Operators group.)
  • The Sovrin Foundation, which runs the eponymous digital identity network, has only businesses as its members, so I put them on the left. It is focused on something much more specific than, say, MyData, and so it goes further up in the diagram.
  • Customer Commons is an advocacy organization for (non-business) customers. It has a specific focus in mission and only focuses on consumers, thus it is in the middle on the right.
  • The Me2B Alliance strives to improve the relationship between businesses and individuals, and – once its membership structure is fully defined – will have both business and individual members, so I put it in the middle.

Does this make sense?