I've been working on making zarrita work with async stores and use concurrency throughout the library. However, I think that many users will still want to use synchronous methods. So, the implementation uses only async methods internally and provides sync methods by running the async methods on an event loop on a separate thread and blocking the main thread (inspired by fsspec).
I am looking for feedback on how to design the API to accommodate both sync and async functions. Here are the main options that came to my mind:
1. Separate classes
The class methods create either the sync or the async variant of the Array class. Users need to decide upfront whether to use async or sync methods.
# sync
a = zarrita.Array.create_sync(
store,
'array',
shape=(6, 10),
dtype='int32',
chunk_shape=(2, 5),
)
a[:, :] = np.ones((6, 10), dtype='int32') # set
a[:, :] # get
a.reshape((10, 10))
assert isinstance(a, zarrita.ArraySync)
# async
a = await zarrita.Array.create_async(
store,
'array',
shape=(6, 10),
dtype='int32',
chunk_shape=(2, 5),
)
await a[:, :].set(np.ones((6, 10), dtype='int32')) # set
await a[:, :].get() # get
await a.reshape((10, 10))
assert isinstance(a, zarrita.Array)
2. Separate methods and properties
Both sync and async methods are available through the same class. There are still separate create and create_async class methods because the creation of an array is async under the hood (i.e. writing metadata to storage).
# sync
a = zarrita.Array.create(
store,
'array',
shape=(6, 10),
dtype='int32',
chunk_shape=(2, 5),
)
# async
a = await zarrita.Array.create_async(
store,
'array',
shape=(6, 10),
dtype='int32',
chunk_shape=(2, 5),
)
2a. Property-based async
This is a sync-first API, with the async methods available through the async_ property.
# sync
a[:, :] = np.ones((6, 10), dtype='int32') # set
a[:, :] # get
a.reshape((10, 10))
# async
await a.async_[:, :].set(np.ones((6, 10), dtype='int32')) # set
await a.async_[:, :].get() # get
await a.async_.reshape((10, 10))
2b. Async methods
Similar to 2a, but with _async-suffixed async methods. This feels unpleasant, because the slice syntax [:, :] cannot be used.
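One way to soften this a little would be an index-expression helper that turns slice syntax into the tuple of `slice` objects that the `_async` methods take. This is only a sketch: the `s_` object here is a made-up stand-in (numpy ships a similar helper as `np.s_`), and it assumes the `_async` methods accept a tuple of slices as their selection argument.

```python
class _IndexExpression:
    """Hypothetical helper that returns whatever key is passed to []."""

    def __getitem__(self, key):
        return key


s_ = _IndexExpression()

# Slice syntax is converted into plain slice objects by Python itself.
full = s_[:, :]          # same as (slice(None), slice(None))
partial = s_[1:3, ::2]   # same as (slice(1, 3), slice(None, None, 2))
```

With that, the async getter could be called as `await a.get_async(s_[:, :])`, which is closer to the sync syntax but still an extra step.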
# sync
a[:, :] = np.ones((6, 10), dtype='int32') # set
a[:, :] # get
a.reshape((10, 10))
# async
await a.set_async((slice(None), slice(None)), np.ones((6, 10), dtype='int32')) # set
await a.get_async((slice(None), slice(None))) # get
await a.reshape_async((10, 10))
3. Async-first API
Implemented through future objects that can either be awaited or resolved synchronously with .result(). Inspired by tensorstore.
# sync
a[:, :].set(np.ones((6, 10), dtype='int32')).result() # set
a[:, :].get().result() # get
a.reshape((10, 10)).result()
# async
await a[:, :].set(np.ones((6, 10), dtype='int32')) # set
await a[:, :].get() # get
await a.reshape((10, 10))
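For reference, a future object that supports both styles could look roughly like the sketch below. `OpFuture`, `ToyArray`, and `_fetch` are hypothetical names for illustration; this is not how tensorstore or zarrita implement it. The sketch assumes operations are scheduled on a background event loop thread as soon as they are created.

```python
import asyncio
import threading

# Background event loop that executes all operations.
_loop = asyncio.new_event_loop()
threading.Thread(target=_loop.run_forever, daemon=True).start()


class OpFuture:
    """Hypothetical handle for a pending operation: await it or call .result()."""

    def __init__(self, coro):
        # Schedule the coroutine right away on the background loop.
        self._future = asyncio.run_coroutine_threadsafe(coro, _loop)

    def result(self, timeout=None):
        # Sync usage: block the calling thread until the operation is done.
        return self._future.result(timeout)

    def __await__(self):
        # Async usage: bridge the concurrent future into the caller's loop.
        return asyncio.wrap_future(self._future).__await__()


async def _fetch(value):
    await asyncio.sleep(0.01)  # stand-in for an async store request
    return value


class ToyArray:
    """Toy stand-in for an array whose methods return futures."""

    def get(self):
        return OpFuture(_fetch([1, 2, 3]))


toy = ToyArray()

# sync
sync_value = toy.get().result()


# async
async def main():
    return await toy.get()


async_value = asyncio.run(main())
```

One consequence of this design is that an operation starts when the method is called, not when it is awaited, which differs from plain coroutines.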