DBMStore #186

alimanfoo · 2017-11-16T01:58:23Z

This PR adds a DBMStore class, a compatibility wrapper around any DBM-style database object. That covers the DBM-style databases available from the standard library, as well as Berkeley DB and others. Resolves #133.
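To illustrate the idea of a compatibility wrapper around a DBM-style object, here is a minimal sketch using only the standard library. It is not the DBMStore implementation from this PR: the class name `SimpleDBMStore`, its `open` and `flag` parameters, and the choice of `dbm.dumb` as the default backend are assumptions made for illustration. The point is only that a DBM object stores bytes under bytes keys, and a thin `MutableMapping` layer can present string keys on top of it.

```python
import dbm.dumb
import os
import tempfile
from collections.abc import MutableMapping


class SimpleDBMStore(MutableMapping):
    """Hypothetical minimal wrapper; NOT zarr's actual DBMStore."""

    def __init__(self, path, open=dbm.dumb.open, flag="c"):
        # `open` is any callable returning a DBM-style object.
        self.db = open(path, flag)

    @staticmethod
    def _encode(key):
        # DBM databases work with bytes keys; accept str for convenience.
        return key.encode("ascii") if isinstance(key, str) else key

    def __getitem__(self, key):
        return self.db[self._encode(key)]

    def __setitem__(self, key, value):
        self.db[self._encode(key)] = value

    def __delitem__(self, key):
        del self.db[self._encode(key)]

    def __iter__(self):
        # Decode the stored bytes keys back to str.
        return (k.decode("ascii") for k in self.db.keys())

    def __len__(self):
        return len(self.db)

    def close(self):
        self.db.close()


# Usage: store two entries, read them back, then close the database.
tmpdir = tempfile.mkdtemp()
store = SimpleDBMStore(os.path.join(tmpdir, "example.db"))
store["foo/.zarray"] = b"{}"
store["foo/0"] = b"\x00\x01\x02"
keys = sorted(store)
value = store["foo/0"]
store.close()
```

Passing a different `open` callable is how another DBM-style backend would be swapped in under this sketch.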

alimanfoo · 2017-11-16T10:01:16Z

Added tests against Berkeley DB. These run on Travis (Linux) only; I don't think it's worth trying to get bsddb3 built on AppVeyor.

alimanfoo · 2017-11-16T10:15:23Z

OK, I think this is ready to go.

alimanfoo · 2017-11-16T16:15:29Z

Here's an example using Berkeley DB B-tree:

In [1]: import zarr

In [2]: import bsddb3

In [3]: store = zarr.DBMStore('example.bdb', open=bsddb3.btopen)

In [4]: grp = zarr.group(store)

In [5]: z = grp.create_dataset('foo', shape=100000000, dtype='i8')

In [7]: import numpy as np

In [8]: z[:] = np.arange(z.shape[0])

In [9]: z[:]
Out[9]: array([       0,        1,        2, ..., 99999997, 99999998, 99999999])

In [10]: store.close()

cc @jeromekelleher, @jakirkham - this PR adds support for storing data in any DBM-style database, including Berkeley DB. It should provide an alternative to zip files, without the issues around replacing existing entries. I haven't figured out if/how this works under parallel reads or parallel writes to an array. I know Berkeley DB supports various concurrency options, but I don't know which is enabled by default or which is most appropriate for use with zarr. In any case, all the tests pass, so I will probably merge this and add some caveats to the docs around the unknowns for parallel usage. I'd be interested to hear how it goes if you do try it.

jakirkham · 2017-11-16T16:32:33Z

Sounds like a good idea. Don't have time to review it, but do like the idea of having this option.

alimanfoo · 2017-11-16T17:21:05Z

No problem, just thought you'd be interested.

jakirkham · 2017-11-16T19:54:31Z

Indeed. It is interesting. Thanks for the ping. :)

jeromekelleher · 2017-11-17T09:27:08Z

Thanks @alimanfoo, I'll have a play with this when I get a chance and let you know how it goes.

jeromekelleher · 2017-11-17T14:29:49Z

FYI, I'm trying this out instead of Zip containers. Working great so far!

alimanfoo · 2017-11-17T14:48:02Z

Cool! Have you tried any concurrent reads or writes? From what I've been able to glean so far, Berkeley DB supports several different locking modes internally, but it's not immediately obvious how to use them via the bsddb3 Python API. It would be great to know which (if any) should be initialised when using it with zarr if you're expecting to do concurrent reads or concurrent writes to an array.

jeromekelleher · 2017-11-17T14:55:47Z

Yeah, I'm doing both concurrent reads and writes and it seems to work fine. I'm not sure how bsddb3 supports concurrency in BDB. As far as I know, it's pretty solid once it goes through the correct DBEnv object, though.

alimanfoo · 2017-11-17T15:16:12Z

Good to know. FWIW it looks like if you use one of the shortcut functions like bsddb3.btopen, then it uses a DBEnv with the locking subsystem initialized (DB_INIT_LOCK). I gather that means it is safe to attempt concurrent writes, as long as there's some way to detect deadlocks, which it looks like bsddb3 tries to do (_DeadlockWrap is used everywhere). The next interesting question is whether you actually get some concurrent throughput, i.e., whether you see multiple CPUs being utilised while doing concurrent writes, or whether the database locking subsystem prevents that entirely.

jeromekelleher · 2017-11-17T15:26:55Z

It's hard to know in my case as I'm doing a lot of compression with concurrent writes, so that's dominating my CPU time. I have 4 cores doing compression, and one core feeding them and it's all looking like it should. As far as I can tell the locking is pretty fine grained and allowing everything to go ahead pretty nicely.
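The pattern described above (several workers compressing chunks, with writes going into one database) can be sketched with the standard library alone. This is an illustration under stated assumptions, not a demonstration of Berkeley DB's locking: `dbm.dumb` stands in for the database, the key layout `foo/<index>` and the function name `compress_and_store` are made up, and a plain `threading.Lock` serialises writes because `dbm.dumb` makes no thread-safety guarantees of its own.

```python
import dbm.dumb
import os
import tempfile
import threading
import zlib
from concurrent.futures import ThreadPoolExecutor

tmpdir = tempfile.mkdtemp()
db = dbm.dumb.open(os.path.join(tmpdir, "chunks.db"), "c")

# Assumption: the backend is not thread-safe, so writes are serialised.
write_lock = threading.Lock()


def compress_and_store(index):
    """Compress one synthetic chunk and write it under the lock."""
    raw = bytes([index % 256]) * 100_000
    compressed = zlib.compress(raw)  # the CPU-bound step, done outside the lock
    with write_lock:
        db[f"foo/{index}"] = compressed
    return index


# Four workers compress chunks; only the short write step is serialised.
with ThreadPoolExecutor(max_workers=4) as pool:
    done = list(pool.map(compress_and_store, range(16)))

n_chunks = len(db)
first = zlib.decompress(db["foo/3"])
db.close()
```

Because compression happens outside the lock, the workers only contend for the brief write step, which is one reason a compression-heavy workload can hide any serialisation in the storage layer.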

alimanfoo · 2017-11-17T16:35:05Z

That's great to know, thanks.

alimanfoo added this to the v2.2 milestone Nov 16, 2017

alimanfoo added 4 commits November 16, 2017 02:09

add support for DBM databases

ff6ebea

dbm py2 compat

5bc73d6

doco and reduce line length

a144c27

flake8

f1fb432

alimanfoo force-pushed the dbm branch from d2c7f62 to f1fb432 Compare November 16, 2017 02:09

alimanfoo added 3 commits November 16, 2017 09:14

try testing against berkeley

5b20d54

test bsddb on linux only

0267009

test coverage

b1dc0a3

alimanfoo merged commit 0a0fb1a into master Nov 16, 2017

alimanfoo deleted the dbm branch November 16, 2017 20:31

alimanfoo mentioned this pull request Nov 16, 2017

WIP: Zarr backend pydata/xarray#1528

Merged

4 tasks

alimanfoo added the enhancement and release notes done labels Nov 20, 2017

jakirkham mentioned this pull request Jul 15, 2023

remove bsddb3 #1464

Closed
