Description
Current state
Only the iOS and Java/Android SDKs have offline caching, and neither SDK is unified. The feature has been requested for .NET and JavaScript, and a PR for JS is under review. It would be useful to every SDK that is not server-only, in other words any SDK that can be used in desktop apps, mobile apps, or games.
iOS
`captureEvent` always persists the event first, then attempts to send it (the same instance, from memory) to Sentry. On success, the event is deleted from disk immediately.
Event persistence: iOS apps are sandboxed, and Sentry creates a folder on the device. To account for jailbroken devices, a hash of the DSN is used as the directory name. The number of files is capped at 10.
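As a language-neutral illustration of the directory naming described above, here is a minimal Python sketch. The hash algorithm is not stated in this issue, so SHA-256 is an assumption, and `cache_dir_for_dsn` and `base_dir` are hypothetical names, not part of any SDK.

```python
import hashlib
from pathlib import Path


def cache_dir_for_dsn(base_dir: Path, dsn: str) -> Path:
    """Return a per-DSN cache directory under base_dir.

    The directory name is a hash of the DSN, so the DSN itself never
    appears in the path. SHA-256 is an assumption made for this sketch.
    """
    digest = hashlib.sha256(dsn.encode("utf-8")).hexdigest()
    return base_dir / digest
```

Because the name depends only on the DSN, two apps (or two instances of one app) configured with the same DSN and the same base directory would resolve to the same folder, which is the race discussed in the list below.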
Java
`captureEvent` always stores the event to disk; if sending succeeds, the event is discarded from the buffer. A TTL is controlled via settings. The buffer is capped at 10 items by default.
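The persist-first flow that both SDKs share could be sketched roughly as follows, in Python for brevity. Everything named here is hypothetical: `capture_event`, the `send` callback (returning `True` on successful delivery), and the one-JSON-file-per-event layout. The issue does not say what happens when the cap is reached, so evicting the oldest file is an assumption of this sketch.

```python
import json
import uuid
from pathlib import Path
from typing import Callable

MAX_CACHED_EVENTS = 10  # default cap mentioned in the text


def capture_event(event: dict, cache_dir: Path,
                  send: Callable[[dict], bool]) -> bool:
    """Persist the event first, then try to send it; delete on success."""
    cache_dir.mkdir(parents=True, exist_ok=True)

    # Assumption: when the buffer is full, evict the oldest cached events
    # so the new one fits within the cap.
    cached = sorted(cache_dir.glob("*.json"), key=lambda p: p.stat().st_mtime)
    for stale in cached[: max(0, len(cached) - MAX_CACHED_EVENTS + 1)]:
        stale.unlink()

    path = cache_dir / f"{uuid.uuid4().hex}.json"
    path.write_text(json.dumps(event), encoding="utf-8")

    # Send the in-memory instance; the file on disk is only a safety net.
    if send(event):
        path.unlink()
        return True
    return False
```

Writing before sending means a crash between the two steps still leaves the event on disk for a later attempt, at the cost of one disk write per event.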
Needs to be taken into account:
- When building the folder name, account for:
- Multiple instances of the same app
- In which case two instances of the SDK would be racing on the cached events
- Same limit to number of events shared across the apps
- Multiple apps with same DSN
- Again possible race condition, as above.
- Directory clean up
- If ephemeral information like the PID is used, ensure the clean-up task accounts for that
- Happy to live with these edge cases:
- If caching is turned off, events would not be cleaned up on the next installation.
- If the DSN changed, old crashes/events will not be sent.
- Event TTL / Retry
- Max event queue depth (aka: Max cached events)
- Event size could vary. Capping it by size in bytes is an option but might be overkill.
- Don't attempt to send events older than N (data older than 90 days is deleted from Sentry anyway)
- A fixed number of attempts could be defined to avoid bugs where we retry until expiry is reached
- Take the `retry-after` header into consideration
- Every error code besides 429 (rate limiting) should drop the event and not re-queue it
- A back pressure strategy needs to be in place
- The transport layer needs to ensure it slows down, which is less than ideal but might already be in place
- The worker needs a backoff strategy to avoid taking up resources on the client while reattempting to connect.
- Delivery should be attempted at least once, as opposed to at most once
- Delete the event only after getting a response that is not a connection timeout error.
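The TTL, attempt limit, `retry-after`, and backoff points above can be combined into a single decision function. The sketch below is one possible reading, not a specification: `next_step`, `Decision`, the limit of 5 attempts, and the backoff bounds are all hypothetical, N is assumed to be 90 days, and `retry_after` is assumed to be already parsed into seconds.

```python
from dataclasses import dataclass
from typing import Optional

MAX_AGE_SECONDS = 90 * 24 * 3600  # assumption: N = 90 days
MAX_ATTEMPTS = 5                  # hypothetical fixed attempt limit
BASE_BACKOFF = 1.0                # seconds, hypothetical
MAX_BACKOFF = 300.0               # seconds, hypothetical


@dataclass
class Decision:
    keep: bool                        # keep the event in the cache?
    retry_in: Optional[float] = None  # seconds until the next attempt


def next_step(age_seconds: float, attempts: int, status: Optional[int],
              retry_after: Optional[float] = None) -> Decision:
    """Decide what to do with a cached event after a send attempt.

    status is the HTTP status code, or None when no response was
    received (connection failure or timeout).
    """
    # Expired or retried too often: drop regardless of the outcome.
    if age_seconds > MAX_AGE_SECONDS or attempts >= MAX_ATTEMPTS:
        return Decision(keep=False)
    backoff = min(MAX_BACKOFF, BASE_BACKOFF * 2 ** attempts)
    if status is None:
        # No response: keep the event and back off exponentially.
        return Decision(keep=True, retry_in=backoff)
    if status == 429:
        # Rate limited: honor the server's retry-after if it sent one.
        delay = retry_after if retry_after is not None else backoff
        return Decision(keep=True, retry_in=delay)
    # Success, or any other error code: remove from the cache.
    return Decision(keep=False)
```

Only a missing response or a 429 keeps the event queued here; every other outcome deletes it, which keeps a permanently rejected event from being retried until it expires.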
OfflineCaching Mode:
- Off: Never cache events
- OnConnectionFailed: Cache events that failed to be delivered due to connectivity issues
- Always: Write event to disk first, send later.
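As a rough sketch, the three modes could be modeled as an enum with two predicates that say when an event is written to disk. The mode names come from the list above; `OfflineCachingMode` as a Python class and the two helper functions are hypothetical.

```python
from enum import Enum


class OfflineCachingMode(Enum):
    OFF = "Off"
    ON_CONNECTION_FAILED = "OnConnectionFailed"
    ALWAYS = "Always"


def persist_before_send(mode: OfflineCachingMode) -> bool:
    """True when the event must be written to disk before any send attempt."""
    return mode is OfflineCachingMode.ALWAYS


def persist_after_failure(mode: OfflineCachingMode,
                          connection_failed: bool) -> bool:
    """True when a failed send should leave the event cached on disk."""
    return mode is OfflineCachingMode.ON_CONNECTION_FAILED and connection_failed
```

With `Always`, the event is already on disk when a send fails, so only `OnConnectionFailed` needs the second predicate.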
Relates #80