
Provide mechanism to unwedge wedged Olm sessions #7428

@ara4n


One remaining cause of UISIs is that if an Olm session gets out of sync due to a device being cloned (#3822) or due to missing OTKs (#3309), there is no way to recover short of logging out and back in.

Splitting this out from #2996, @richvdh said:

Yes, we can certainly improve this. We can give the user feedback about failing to decrypt to_device messages (though they tend to get replayed at initial sync, so we'd have to think how to avoid false positives). element-hq/riot-android#800, randomly, covers that. If we can get it reliable, we can start a new Olm session to try and unwedge things. We can also consider giving better feedback from the sender's end (#2494).

(The reason that to_device messages may get replayed on initial sync is that they will include the messages from the most recent incremental sync, which haven't otherwise been acknowledged.)

So, better late than never, I'm wondering what we can do to fix this.

  • Whatever we do, we need to provide some kind of feedback mechanism to the sender when we haven't been able to decrypt their messages, when we think they should be able to start a new Olm session and try again. (Or can we just reset the Olm session from our side somehow?)
  • We should probably prompt the user to confirm if we should reestablish the Olm session with the sender, as we might want to first check OOB if the sender's device has been cloned by an attacker. We might want to make this a paranoid-mode only UX though. Also, any other bug which causes a wedged Olm session would get lumped in with the "check for clones" warning, which could cause undue panic if there's no evidence of cloning.
  • I wonder if we can get away with ignoring undecryptable to-device messages on initial sync, and rely on 'live' undecryptable ones arriving via incremental sync to trigger the recovery mechanism. Alternatively, I guess we'd need to look at seqnums for to-device messages or similar to filter out duplicates on the receiving client.
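
To make the last bullet concrete, here is a rough sketch of the receiver-side filtering, not a proposal for the actual implementation. Everything in it is hypothetical: `ToDeviceFailure`, `UnwedgePolicy`, and the `request_new_session` callback are illustrative names, and `message_id` stands in for whatever seqnum or similar identifier to-device messages might carry. It combines both ideas (skip failures seen during initial sync, and drop duplicates by identifier) and leaves open the question of who actually starts the new Olm session.

```python
from dataclasses import dataclass, field
from typing import Callable, Set


@dataclass
class ToDeviceFailure:
    """An undecryptable to-device message (hypothetical shape)."""
    sender_user: str
    sender_device: str
    message_id: str          # assumed unique per message; stands in for a seqnum
    from_initial_sync: bool  # True if it arrived in the initial sync response


@dataclass
class UnwedgePolicy:
    """Decides whether a decryption failure should trigger session recovery."""
    # Placeholder for "start a new Olm session" or "tell the sender to do so".
    request_new_session: Callable[[str, str], None]
    seen_ids: Set[str] = field(default_factory=set)

    def on_failure(self, failure: ToDeviceFailure) -> bool:
        """Return True if recovery was triggered for this failure."""
        # Filter duplicates: the same message may be delivered more than once.
        if failure.message_id in self.seen_ids:
            return False
        self.seen_ids.add(failure.message_id)

        # Failures during initial sync may just be replays of messages from
        # the last incremental sync, so they do not count as evidence.
        if failure.from_initial_sync:
            return False

        # A 'live' failure from incremental sync: trigger recovery.
        self.request_new_session(failure.sender_user, failure.sender_device)
        return True


if __name__ == "__main__":
    requested = []
    policy = UnwedgePolicy(
        request_new_session=lambda user, device: requested.append((user, device))
    )
    replay = ToDeviceFailure("@alice:example.org", "DEVICEA", "m1", True)
    live = ToDeviceFailure("@alice:example.org", "DEVICEA", "m2", False)
    policy.on_failure(replay)  # ignored: arrived during initial sync
    policy.on_failure(live)    # triggers recovery
    policy.on_failure(live)    # ignored: duplicate
    print(requested)
```

In this sketch `seen_ids` lives only in memory. A client that wanted duplicate filtering to survive a restart would have to persist it, and avoiding that is part of the appeal of simply ignoring initial-sync failures.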

What I'm most unclear about here is whether the new Olm session would be started by the sender or the receiver, and whether there's actually a good reason not to do this transparently on error.
