Hey fellow nerds, I have an idea that I’d like to discuss with you. All feedback – positive or negative – is welcome. Consider this a baby RFC (Request for Comments).

So. I’ve been having a think on how to implement the right to be forgotten (one of the cornerstones of eg. the GDPR) in the context of federated services. Currently, it’s not possible to remove your comments, posts, etc., from the whole Fediverse, and not just from your “home instance”, without manually contacting every node in the network. In my opinion, this is a fairly pressing problem, and there would already be a GDPR case here if someone were to bring the “eye of Sauron” (ie. a national data protection authority) upon us.

Please note that this is very much a draft and it does have some issues and downsides, some of which I’ve outlined towards the end.

The problem

In a nutshell, the problem I’m trying to solve is how to guarantee that “well-behaved” instances, which support this proposal, will delete user content even in the most common exceptional cases, such as changes in network topology, network errors, and server downtime. These are situations where you’d typically expect messages about content or user deletion to be lost. It’s important to note that I’ve specifically approached this from the “right to be forgotten” perspective, so the current version of the proposal solely deals with “mass deletion” when user accounts are deleted. It doesn’t currently integrate or work with the normal content deletion flow (I’ll further discuss this below).

While I understand that in a federated or decentralized network it’s impossible to guarantee that your content will be deleted (and the Wayback Machine exists), we can’t let “perfect be the enemy of good enough”. Making a concerted effort to ensure that in most cases user content is deleted (initially this could even just be a Lemmy thing and not a wider Fediverse thing) from systems under our control when the user so wishes would already be a big step in the right direction.

I haven’t yet looked into “prior art” beyond some very cursory searches, and I had banged out the outline of this proposal before I even went looking, but I now know that eg. Mastodon has the ability to set TTLs on posts. This proposal is sort of adjacent and could be massaged a bit to support this on Lemmy (or whatever other service) too.

1. The proposal: TTLs on user content

  1. Every comment, post etc. (content) must by default have an associated TTL (eg. a live_until timestamp). This TTL can be long, on the order of weeks or even a couple of months. Users can also opt out (see below)
  2. well before the content’s TTL runs out (eg. even halfway through the TTL, with some random jitter to prevent “thundering herds”), an instance asks the “home instance” of the user who created the content whether the user account is still live. If it is, great, update the TTL and go on with life
    1. in cases where the “home instance” of a content creator can’t be reached due to eg. network problems, this “liveness check” must be repeated at random long-ish intervals (eg. every 20–30h) until an answer is gotten or the TTL runs out
    2. information about user liveness should be cached, but with a much shorter TTL than content
    3. liveness check requests to other instances should be batched, with some sensible time limit on how long to wait for the batch to fill up, and an upper limit for the batch size
    4. in cases where the user’s home instance isn’t in an instance’s linked instance list or is in their blocked instance list, this liveness check may be skipped
  3. when a user liveness check hasn’t succeeded and a content’s TTL runs out, or when a user liveness check specifically comes back as negative, the content must be deleted
    1. when a liveness check comes back as negative and the user has been removed, instances must delete the rest of that user’s content and not just the one whose TTL ran out
    2. when a liveness check fails (eg. the user’s home instance doesn’t respond), instances may delete the rest of that user’s content. Or maybe should? My reason for handling this differently from an explicit negative liveness check is to prevent the spurious deletion of all of a user’s content in cases where their home instance experiences a long outage, but I’m not sure if this distinction really matters. Needs more thinkifying
  4. user accounts must have a TTL, on the order of several years
    1. when a user performs any activity on the instance, this TTL must be updated
    2. when this TTL runs out, the account must be deleted. The user’s content must be deleted if the user hasn’t opted out of the content deletion (see below)
    3. instances may eg. ping users via email to remind them about their account expiring before the TTL runs out
  5. users may opt out of the content deletion mechanism, either on a per-user basis or on a per-content basis
    1. if a user has opted out of the mechanism completely, their content must not be marked with a TTL. However, this does present a problem if they later change their mind
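
To make points 2 and 3 a bit more concrete, here is a rough Python sketch of when an instance would schedule its liveness checks and what it would do with the answer. Everything in it is made up for illustration: the names (`Content`, `Liveness`, `Action`, `first_check_at`, `next_retry_at`, `decide`), the 60-day TTL, and the ±10% jitter are my assumptions and not an existing Lemmy or ActivityPub API.

```python
import random
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional

# All names and numbers here are illustrative assumptions.
CONTENT_TTL = timedelta(days=60)   # "weeks or even a couple of months"
RETRY_MIN = timedelta(hours=20)    # retry window from point 2.1
RETRY_MAX = timedelta(hours=30)


class Liveness(Enum):
    ALIVE = "alive"      # home instance confirmed the account still exists
    DELETED = "deleted"  # home instance explicitly said the account is gone
    UNKNOWN = "unknown"  # no answer: network error, downtime, ...


class Action(Enum):
    KEEP = "keep"                              # opted-out content, nothing to do
    REFRESH_TTL = "refresh_ttl"
    RETRY_LATER = "retry_later"
    DELETE_CONTENT = "delete_content"          # only this one item
    DELETE_ALL_BY_USER = "delete_all_by_user"  # everything by this user


@dataclass
class Content:
    author: str
    created_at: datetime
    live_until: Optional[datetime]  # None means the user opted out of TTLs


def first_check_at(content: Content, rng=random) -> Optional[datetime]:
    """Schedule the first liveness check near the TTL halfway point, with jitter."""
    if content.live_until is None:
        return None
    ttl = content.live_until - content.created_at
    # Jitter of +/-10% of the TTL (an assumed value) to avoid thundering herds.
    return content.created_at + ttl * (0.5 + rng.uniform(-0.1, 0.1))


def next_retry_at(now: datetime, rng=random) -> datetime:
    """Pick a random retry time 20 to 30 hours from now."""
    seconds = rng.uniform(RETRY_MIN.total_seconds(), RETRY_MAX.total_seconds())
    return now + timedelta(seconds=seconds)


def decide(content: Content, result: Liveness, now: datetime) -> Action:
    """Map a liveness check result to what the instance does with this content."""
    if content.live_until is None:
        return Action.KEEP
    if result is Liveness.ALIVE:
        return Action.REFRESH_TTL
    if result is Liveness.DELETED:
        return Action.DELETE_ALL_BY_USER  # point 3.1: explicit negative answer
    if now >= content.live_until:
        return Action.DELETE_CONTENT      # point 3: TTL ran out without an answer
    return Action.RETRY_LATER
```

In this sketch a failed check after the TTL has run out only removes that one piece of content, which is the cautious reading of point 3.2. Whether it should remove everything by the user instead is exactly the open question there.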

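The batching in point 2.3 could look roughly like the sketch below: pending liveness checks are grouped per home instance, and a batch goes out either when it is full or when it has waited long enough. The class name `LivenessBatcher`, its methods, and the default limits are hypothetical, and time is passed in as plain seconds to keep the example self-contained.

```python
from collections import defaultdict


class LivenessBatcher:
    """Groups pending liveness checks per home instance (illustrative only)."""

    def __init__(self, max_size=100, max_wait_seconds=300.0):
        self.max_size = max_size          # upper limit for the batch size
        self.max_wait = max_wait_seconds  # how long to wait for a batch to fill up
        self._pending = defaultdict(dict)  # instance -> {user: None}, an ordered set
        self._opened_at = {}               # instance -> time the batch was started

    def add(self, instance, user, now):
        """Queue a user; return the batch to send if it just filled up, else None."""
        batch = self._pending[instance]
        if not batch:
            self._opened_at[instance] = now
        batch[user] = None  # duplicates collapse, one check per user is enough
        if len(batch) >= self.max_size:
            return self._flush(instance)
        return None

    def flush_due(self, now):
        """Return {instance: [users]} for every batch that has waited long enough."""
        due = [i for i, t in self._opened_at.items() if now - t >= self.max_wait]
        return {i: self._flush(i) for i in due}

    def _flush(self, instance):
        users = list(self._pending.pop(instance))
        self._opened_at.pop(instance, None)
        return users
```

An instance would call `add` whenever a check becomes due and call `flush_due` from a periodic timer, sending one request per returned home instance.
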
2. Advantages of this proposal

  1. guarantees that user content is deleted from “well behaved” instances, even in the face of changing network topologies when instances defederate or disappear, hiccups in message delivery, server downtime and so on
  2. would allow supporting Mastodon-like general content TTLs with a little modification, hence why it has TTLs per content and not just per user. Maybe something like a refresh_liveness boolean field on content that says whether an instance should do user liveness checks and refresh the content’s TTL based on it or not?
  3. with some modification this probably could (and should) be made to work with and support the regular content deletion flow. Something for draft v0.2 in case this gets any traction?
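
As a rough illustration of the refresh_liveness idea from point 2, the Python sketch below shows how that one flag could separate “refresh the TTL while the author is alive” from a fixed, general content TTL. The names `ContentRecord`, `on_author_alive`, and `is_expired`, as well as the 60-day default, are assumptions for the example only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ContentRecord:
    live_until: datetime
    # True: liveness checks keep pushing live_until forward.
    # False: live_until is a fixed expiry chosen by the user.
    refresh_liveness: bool = True


def on_author_alive(record: ContentRecord, now: datetime,
                    ttl: timedelta = timedelta(days=60)) -> ContentRecord:
    """Called when the author's home instance confirms the account is live."""
    if record.refresh_liveness:
        record.live_until = now + ttl
    return record


def is_expired(record: ContentRecord, now: datetime) -> bool:
    """Content is due for deletion once its TTL has run out."""
    return now >= record.live_until
```

With the flag off, a positive liveness check changes nothing and the content simply expires at its original timestamp.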

3. Disadvantages of this proposal

  1. more network traffic, DB activity, and CPU usage, even during “normal” operation and not just when something gets deleted. Not a huge amount but the impact should probably be estimated so we’d have at least an idea of what it’d mean
    1. however, considering the nature of the problem, some extra work is to be expected
  2. as noted, the current form of this proposal does not support or work with the regular deletion flow for individual comments or posts, and only addresses the more drastic scenario when a user account is deleted or disappears
  3. spurious deletions of content are theoretically possible, although with long TTLs and persistent liveness check retries they shouldn’t happen except in rare cases. Whether this is actually a problem requires more thinkifying
  4. requires buy-in from the rest of the Fediverse as long as it’s not a protocol-level feature (and there are more protocols than just ActivityPub). This same disadvantage would naturally apply to all proposals that aren’t protocol-level. The end goal would definitely be to have this feature be a protocol thing and not just a Lemmy thing, but one step at a time
  5. need to deal with the case where a user opts out of having their content deleted when they delete their account (whether they did this for all of their content or specific posts/comments) and then later changes their mind. Will have limitations, such as not having any effect on instances that are no longer federated with their home instance

3.1 “It’s a feature, not a bug”

  1. when an instance defederates or otherwise leaves the network, content from users on that instance will eventually disappear from instances no longer connected to its network. This is a feature: when you lose contact with an instance for a long time, you have to assume that it’s been “lost at sea” to make sure that the users’ right to be forgotten is respected. As a side note, this would also help prune content from long gone instances
  2. content can’t be assumed to be forever. This is by design: in my opinion Lemmy shouldn’t try to be a permanent archive of all content, like the Wayback Machine
  3. content can be copied to eg. the Wayback Machine (as noted above), so you can’t actually guarantee deletion of all of a user’s content from the whole Internet. As noted in the problem statement this is absolutely true, but what I’m looking for here is best effort to make sure content is deleted from compliant instances. Just because it’s impossible to guarantee total deletion of content from everywhere does not mean no effort at all should be made to delete it from places that are under our control
  4. this solution is more complex than simply actually deleting content when the user so wishes, instead of just hiding it from view like it’s done now in Lemmy. While “true deletion” definitely needs to also be implemented, it’s not enough to guarantee eventual content deletion in cases like defederation, or network and server errors leading to an instance not getting the message about content or a user being deleted
  • bjornsno@lemm.ee
    1 year ago

    TTL on all content scales extremely poorly. You touch on this but I don’t think you appreciate just how big of a SELECT * WHERE TTL ... this would be in just a few months/years. As an alternative, every instance sync should come with a list of newly deleted users. Retrying would not need to be reimplemented. If a user who wishes to be forgotten has had their home instance go dark, there will need to be a way for them to prove ownership over the original account (signup confirmation email perhaps) so a delete can be started from a foreign instance.