Loaf is an email filtering and verification system from Maciej and a partner, still in its early days. For the record, I love the ‘cantbedone.org’ URL.

I nearly didn’t blog this at all, until I realized that the underlying concept bothered me and that I could explain why in non-technical terms. It also fits broadly into my theme for the day: identity is the face we choose to show others, and privacy is the set of concerns that arises when that identity is challenged for one reason or another. Frustratingly, I’m in a hurry, so I’m going to have to cover this very broadly; I hope I don’t misrepresent anything or misstate a fact. If I do, I’ll clean it up as soon as I’m made aware of it.

The way Loaf is described as working: an encrypted (or disguised, or hashed; at any rate, not human-readable) copy of your whole email address book is appended to each of your outbound email messages. When it’s received and parsed by another Loaf-using email system, the sender (you) is rated based, essentially, on your degree of familiarity to the recipient (or really, of course, to Loaf). The more familiar you are, the likelier it is that your message will get through.
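
As a rough sketch of the mechanism (assuming a Bloom-filter-style digest, which is what the comments below say Loaf uses; the sizes, hashing scheme, and names here are my own invention, not Loaf’s actual format), the idea looks something like this:

```python
# Sketch of a Loaf-style address-book digest using a Bloom filter.
# All parameters and the double-hashing scheme are illustrative guesses,
# not Loaf's actual wire format.
import hashlib

class BloomFilter:
    def __init__(self, num_bits=8192, num_hashes=7):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, item):
        # Derive the k bit positions from one SHA-1 digest via double hashing.
        digest = hashlib.sha1(item.lower().encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # Answers "maybe present" (with some false-positive rate) or "definitely absent".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

# Sender: hash the whole address book into a filter and attach it to the outgoing message.
senders_filter = BloomFilter()
for address in ["alice@example.com", "bob@example.org"]:
    senders_filter.add(address)

# Recipient: rate the sender by how many of *my* contacts show up in *their* filter.
my_contacts = ["bob@example.org", "carol@example.net"]
overlap = sum(1 for addr in my_contacts if addr in senders_filter)
print(f"{overlap} apparent mutual contact(s)")  # more overlap => more familiar
```

Note that the same membership test is available to anyone who ends up holding a copy of the filter, which is what makes the privacy question I raise below more than academic.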

It’s a pretty neat idea, and I can’t think of any reason, functionally, why this would be problematic.

However, I think there is a very good reason to mistrust the concept, one grounded both in legal approaches to privacy and in the ethical concerns underlying them. Forgive me a moment of digression.

Generally speaking, in the US, organizations that gather and manage personally identifiable information (PII) are required to follow a specific set of practices with regard to how that information is gathered, stored, and made accessible for correction or deletion by the original source of that data, generally the consumer. An example is COPPA, a law that effectively requires online data gatherers either to collect no PII from children under 13 or to ensure that parental permission has been granted before that data is gathered.

It’s my opinion that PII is the property of the consumer, and that the data gatherer has an ethical obligation to the consumer to provide some kind of error-correction or feedback mechanism. Additionally, there is an obligation on the part of the data maintainer to follow a ‘best practices’ level of security with regard to the data, and practices which allow the data to move to a different organization with different privacy practices, while legal, are frowned upon. Of course, such data transfers happen all the time, notably in corporate acquisitions.

In practice, the response of most commercial organizations has been driven by a desire to minimize the ancillary data-management costs of PII while making every effort to allow that data to be utilized within the business. It’s effectively a business asset, and as such is perceived as adding value to the organization. Thus your level of access to the data may be limited to writing a letter to the company to request that your record be deleted.

This is unsatisfactory for any number of reasons; adding to the problems with the current approach are the rumblings we hear about the possibility that data collections and methodologies may become eligible for proprietary protection under U.S. intellectual property law. This may mean, for example, that if I cited a sample record – or the structure of a specific PII database – in the context of a discussion of privacy management methodology, I might be in violation of someone’s proprietary concept or data object. But I’ll leave that bone for the EFF to worry at for the moment, as vexing as it is.

Returning to Loaf: the concept relies on individual email users exposing their email address books to anyone they send email to. That information may or may not be unpackable in a way that reconstitutes the specific PII it contains and is maliciously or unethically useful. Given the lack of absolute language on the descriptive page I link to above, I’d be very surprised if it were impossible to do so.

Moreover, by deliberately placing the PII into a sharing-oriented environment, the strategy violates the legal and ethical guidelines I just sketched (however fuzzy my sketch might be), primarily by sharing a specific element of that PII (your correspondent’s email address).

Therefore, it will be very difficult to deploy any solution based on this approach into commercial organizations that have been working to ensure compliance with the guidelines and regulations.

I am by no means an expert either in the sort of programming that Maciej (a good guy, by all accounts, and a hell of an online writer to boot) does, or, honestly, in online privacy. I do think that I have raised some valid points for discussion. I hope that Maciej or his partner can take the time to address them.

2 thoughts on “A slice of privacy”

  1. This is a fascinating topic and one that I think will crop up again and again with social software stuff. For want of true insight, I’ll post some fake insight.

    The core difficulty seems to be deciding who the edges in a social network belong to. We can all agree about information that pertains to nodes – my SSN is a piece of information that describes me, just like my name, and address, and credit card number do. But if I send Mike an email, the fact that it was sent somehow belongs to both of us – there’s no way Mike can tell you he got a letter from me without compromising his privacy, and I can’t say that I wrote to him without revealing something about Mike. We have social mechanisms for dealing with this, since people have been having conversations for as long as we’ve been human, but I have no idea how it maps onto a legal or regulatory context.

    In the absence of stunning immediate insight, I can offer a little more background information to frame the debate. First, while we avoid the term ‘unbreakable’ on the Loaf page to avoid setting off crank detectors, there really is no way to get emails back out of a Bloom filter once they are hashed in. The algorithm itself is lossy, so there is no way even in principle to extract the data.

    That said, the filter allows you the opportunity to ask ‘yes/no’ questions with a user-configurable level of accuracy. So you could construct a Bloom filter for yourself that would have a false positive rate of 0.0000000001%, or one that had a false positive rate of 50% (it would claim to have seen every other email address you asked it about) or anything in between. This just exacerbates the regulatory problem, because it introduces gradations of privacy leakage, which, if you will excuse a horrible pun, makes a complete hash of things.
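
    To make that trade-off concrete, here is a back-of-the-envelope sketch (the address book size is invented; only the standard Bloom filter sizing formulas are real):

    ```python
    # How many bits, and how many hash functions, does the filter need
    # for a target false positive rate? Standard Bloom filter math;
    # the 500-entry address book is just an example.
    import math

    def sizing(n_addresses, false_positive_rate):
        # m = -n * ln(p) / (ln 2)^2,  k = (m / n) * ln 2
        m = math.ceil(-n_addresses * math.log(false_positive_rate) / math.log(2) ** 2)
        k = max(1, round(m / n_addresses * math.log(2)))
        return m, k

    for p in (0.5, 0.01, 1e-12):
        m, k = sizing(500, p)
        print(f"p={p:g}: about {m} bits ({m / 8 / 1024:.1f} KiB), {k} hash functions")
    ```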

  2. erm.

    it’s late and I yam drinkied.

    Macsz (if I may) sez:

    a) sending email violates extant privacy guidelines but the extant guidelines don’t meet basic use reqs: “there’s no way Mike can tell you he got a letter from me without compromising his privacy, and I can’t say that I wrote to him without revealing something about Mike.” I’m not sure I agree with this as either idea or statement. More after sleep.

    b) I think he b’leeves the Bloom filter representations of that data to be assuredly secure (to invent a security term, I think): “there really is no way to get emails back out of a Bloom filter once they are hashed in. The algorithm itself is lossy, so there is no way even in principle to extract the data.”

    c) none of these defenses necessarily resolves the ‘regulatory problem.’

    more when conscious, hopefully demanding focused responses.

    (do you have any idea how tricky it is to type responses to this sort of thing on an ad hoc basis? Mondays offer some obvious advantages.)
