Offline caching implementation #280

@bruno-garcia

Current state

Only the iOS and Java/Android SDKs have offline caching, and neither SDK is unified. The feature has been requested for .NET and JavaScript, and a PR for JS is under review. It would be useful to every SDK that is not server-only, in other words, SDKs that can be used in desktop apps, mobile apps, and games.

iOS

captureEvent always persists the event to disk first, then attempts to send it to Sentry using the same in-memory instance. On success, the event is deleted from disk immediately.

Event persistence: iOS apps are sandboxed, and Sentry creates a folder on the device. To account for jailbroken devices, a hash of the DSN serves as the directory name. The number of files is capped at 10.
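
As an illustration of the directory naming and file cap described above, here is a minimal sketch in Python (not the iOS SDK's actual code). The function names, the choice of SHA-256, and the "delete oldest first" eviction rule are assumptions; the issue only states that a hash of the DSN names the directory and that the file count is capped at 10.

```python
import hashlib
from pathlib import Path

MAX_CACHED_FILES = 10  # cap stated in the issue


def cache_dir_for(base: Path, dsn: str) -> Path:
    """Return a per-DSN cache directory named after a hash of the DSN.

    SHA-256 is an assumption; the issue does not say which hash is used.
    """
    digest = hashlib.sha256(dsn.encode("utf-8")).hexdigest()
    return base / digest


def enforce_cap(directory: Path, max_files: int = MAX_CACHED_FILES) -> None:
    """Delete the oldest files until at most max_files remain.

    Evicting the oldest file first is an assumption; dropping the
    newest event instead would be an equally valid policy.
    """
    files = sorted(directory.iterdir(), key=lambda p: p.stat().st_mtime)
    excess = max(0, len(files) - max_files)
    for stale in files[:excess]:
        stale.unlink()
```

Because the directory name depends only on the DSN, two apps (or two instances of one app) configured with the same DSN would resolve to the same folder under the same base path.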

Java

captureEvent always stores the event to disk; if sending succeeds, the event is discarded from the buffer. A TTL is controlled via settings. The buffer is capped at 10 items by default.
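
The store-then-send flow with a TTL and a capped buffer could look roughly like the sketch below. It is written in Python for illustration and is not the Java SDK's implementation: `DiskBuffer`, `capture_event`, and the `send` callback are hypothetical names, and both the "refuse new events when full" rule and the use of file modification time for the TTL are assumptions.

```python
import json
import time
import uuid
from pathlib import Path
from typing import Callable, Optional


class DiskBuffer:
    """Hypothetical on-disk event buffer with a size cap and a TTL."""

    def __init__(self, directory: Path, ttl_seconds: float, max_items: int = 10):
        self.directory = directory
        self.ttl_seconds = ttl_seconds  # no default: the issue says it comes from settings
        self.max_items = max_items      # 10 by default, as stated in the issue
        directory.mkdir(parents=True, exist_ok=True)

    def _files(self) -> list:
        return list(self.directory.glob("*.json"))

    def add(self, event: dict) -> Optional[Path]:
        """Write the event to disk; return None when the buffer is full (assumption)."""
        if len(self._files()) >= self.max_items:
            return None
        path = self.directory / f"{uuid.uuid4().hex}.json"
        path.write_text(json.dumps(event), encoding="utf-8")
        return path

    def discard(self, path: Path) -> None:
        path.unlink(missing_ok=True)

    def purge_expired(self, now: Optional[float] = None) -> int:
        """Delete events whose file is older than the TTL; return how many were removed."""
        now = time.time() if now is None else now
        expired = [p for p in self._files() if now - p.stat().st_mtime > self.ttl_seconds]
        for path in expired:
            path.unlink(missing_ok=True)
        return len(expired)


def capture_event(buffer: DiskBuffer, event: dict, send: Callable[[dict], bool]) -> bool:
    """Persist first, then send; discard the stored copy only on success."""
    path = buffer.add(event)
    delivered = send(event)
    if delivered and path is not None:
        buffer.discard(path)
    return delivered
```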

Needs to be taken into account:

  • Building the folder name; account for:
    • Multiple instances of the same app
      • In which case two instances of the SDK would be racing on the cached events
      • The same limit on the number of events would be shared across the instances
    • Multiple apps with same DSN
      • Again possible race condition, as above.
  • Directory cleanup
    • If ephemeral information such as the pid is used, ensure the cleanup task accounts for it
    • Happy to live with these edge cases:
      • If caching is turned off, events would not be cleaned up on the next installation.
      • If the DSN changed, old crashes/events will not be sent.
  • Event TTL / Retry
    • Max event queue depth (aka: Max cached events)
      • Event size could vary. Capping it by size in bytes is an option but might be overkill.
    • Don't attempt to send events older than N (data older than 90 days is deleted from Sentry anyway)
    • A fixed number of attempts could be defined to avoid bugs where we retry until expiry is reached
    • Take the Retry-After header into consideration
    • Every error code besides 429 (rate limiting) should drop the event and not re-queue it
  • A back pressure strategy needs to be in place
    • The transport layer needs to ensure it slows down, which is less than ideal but might already be in place
    • The worker needs a backoff strategy to avoid consuming client resources on repeated connection attempts.
  • At-least-once delivery should be attempted, as opposed to at-most-once
    • Wait for an error other than a connection timeout before deleting the event.
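
The TTL, retry, and backoff points above can be combined into a single decision function. The following Python sketch is one possible reading of the list, not an agreed design: `Policy`, `Action`, and `decide` are hypothetical names, exponential backoff is an assumed strategy (the issue only asks for "a backoff strategy"), and `retry_after` is assumed to be the Retry-After header already parsed into seconds.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Action(Enum):
    DELETE = "delete"  # delivered, rejected, expired, or out of attempts
    RETRY = "retry"    # keep the event on disk and try again later


@dataclass
class Policy:
    max_age_seconds: float               # "N" in the list above
    max_attempts: int                    # fixed cap on attempts
    base_backoff_seconds: float = 1.0    # assumed value
    max_backoff_seconds: float = 300.0   # assumed value


def decide(
    policy: Policy,
    age_seconds: float,
    attempts: int,                # attempts made so far, including the one that just finished
    status: Optional[int],        # HTTP status, or None if the connection failed or timed out
    retry_after: Optional[float], # Retry-After in seconds, if the server sent one
) -> Tuple[Action, float]:
    """Return what to do with a cached event and how long to wait before retrying."""
    # Any server response other than 429 settles the event: success or a
    # non-retryable error both mean it should not be re-queued.
    if status is not None and status != 429:
        return Action.DELETE, 0.0
    # Too old or retried too often: give up.
    if age_seconds > policy.max_age_seconds or attempts >= policy.max_attempts:
        return Action.DELETE, 0.0
    # Rate limited: honor Retry-After when present.
    if status == 429 and retry_after is not None:
        return Action.RETRY, retry_after
    # Connection failure (or 429 without Retry-After): exponential backoff.
    delay = policy.base_backoff_seconds * (2 ** (attempts - 1))
    return Action.RETRY, min(policy.max_backoff_seconds, delay)
```

For example, with `Policy(max_age_seconds=90 * 86400, max_attempts=5)`, a third consecutive connection failure yields `(Action.RETRY, 4.0)`, while any 4xx response other than 429 yields `(Action.DELETE, 0.0)`.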

OfflineCaching Mode:

Off: Never cache events

OnConnectionFailed: Cache events that failed to be delivered due to connectivity issues

Always: Write event to disk first, send later.
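
The three modes map naturally onto an enum. The sketch below, in Python, is only an illustration of how an SDK might branch on the setting; the member names mirror the proposal, while the two helper functions are hypothetical.

```python
from enum import Enum


class OfflineCaching(Enum):
    OFF = "off"                                    # never cache events
    ON_CONNECTION_FAILED = "on_connection_failed"  # cache only after a connectivity failure
    ALWAYS = "always"                              # write to disk first, send later


def persist_before_send(mode: OfflineCaching) -> bool:
    """True when the event must be written to disk before any send attempt."""
    return mode is OfflineCaching.ALWAYS


def persist_after_failure(mode: OfflineCaching, connection_failed: bool) -> bool:
    """True when a failed send should cause the event to be written to disk.

    In ALWAYS mode the event is already on disk, so nothing new is written here.
    """
    return mode is OfflineCaching.ON_CONNECTION_FAILED and connection_failed
```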

Relates #80
