BigTable: Cell.from_pb() performance improvement #4745

zakons · 2018-01-12T05:30:05Z

This change has Cell store the microsecond timestamp from the Cell protobuf and expose the timestamp as a datetime through a property, computed only when requested. This moves the conversion cost to the code that actually reads the timestamp, which may be only a small amount of code. Reading rows with 10 cells is more than 5% faster. See Issue #4714.
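
The idea described above can be sketched roughly as follows. This is a minimal illustration, not the library's actual code: the `Cell` class here is a hypothetical stand-in, and `_EPOCH` is a helper constant introduced only for this example.

```python
import datetime

# Assumption: a module-level UTC epoch constant, introduced for this sketch.
_EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)


class Cell:
    """Hypothetical stand-in for the Bigtable Cell class."""

    def __init__(self, value, timestamp_micros):
        self.value = value
        # Keep the raw integer from the protobuf; no datetime work here.
        self.timestamp_micros = timestamp_micros

    @property
    def timestamp(self):
        # The conversion runs only when a caller asks for the datetime.
        return _EPOCH + datetime.timedelta(microseconds=self.timestamp_micros)


cell = Cell(b"value", 1515735005000000)
print(cell.timestamp_micros)  # cheap: plain attribute access
print(cell.timestamp)         # pays the conversion cost on demand
```

Code that only needs cell values never triggers the datetime arithmetic, which is where the read-path saving would come from.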

theacodes

This LGTM, but I'd like @tseaver or @dhermes to also give it a quick glance.

tseaver · 2018-01-16T16:20:23Z

A couple of questions:

1. Do we need to consider backward-compatibility? E.g., what is the chance that client code is manually constructing Cell instances, passing timestamps? @sduskis do you have a sense of that likelihood?
2. Should the micros->timestamp calculation be memoized?

zakons · 2018-01-17T05:57:30Z

@tseaver the code that a client uses to create a cell calls the following method in row.py:

    def _set_cell(self, column_family_id, column, value, timestamp=None,
                  state=None):
        column = _to_bytes(column)
        if isinstance(value, six.integer_types):
            value = _PACK_I64(value)
        value = _to_bytes(value)
        if timestamp is None:
            # Use -1 for current Bigtable server time.
            timestamp_micros = -1
        else:
            timestamp_micros = _microseconds_from_datetime(timestamp)
            # Truncate to millisecond granularity.
            timestamp_micros -= (timestamp_micros % 1000)

        mutation_val = data_v2_pb2.Mutation.SetCell(
            family_name=column_family_id,
            column_qualifier=column,
            timestamp_micros=timestamp_micros,
            value=value,
        )
        mutation_pb = data_v2_pb2.Mutation(set_cell=mutation_val)
        self._get_mutations(state).append(mutation_pb)

SetCell is a Mutation type, and all changes to BigTable go through Mutations. At the Mutation level the timestamp is expressed in microseconds. So the Cell class is really a read-only class - there is no ORM magic going on for the persistence. ;-)
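
To make the microsecond handling concrete, here is a small worked sketch of the conversion and millisecond truncation that `_set_cell` performs on a caller-supplied timestamp. The helper names `micros_from_datetime` and `truncate_to_millis` are hypothetical stand-ins written for this example; they are not the library's functions, and the sketch assumes a timezone-aware datetime.

```python
import datetime

# Assumption: a UTC epoch constant, introduced for this sketch.
_EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)


def micros_from_datetime(value):
    """Hypothetical stand-in: microseconds since the epoch for an aware datetime."""
    return (value - _EPOCH) // datetime.timedelta(microseconds=1)


def truncate_to_millis(timestamp_micros):
    """Drop the sub-millisecond part, as the snippet above does."""
    return timestamp_micros - (timestamp_micros % 1000)


stamp = datetime.datetime(
    2018, 1, 12, 5, 30, 5, 123456, tzinfo=datetime.timezone.utc)
micros = micros_from_datetime(stamp)
truncated = truncate_to_millis(micros)
print(micros, truncated)
```

In the quoted method the sentinel value -1 (server time) is assigned on a separate branch, so it never passes through the truncation step.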

I hope this answers your first question above.

As for memoization, I thought of that, but I suspect that the timestamp will be read as a datetime only once.
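
For reference, a memoized variant of the lazy conversion might look like the sketch below. This is an illustration of the alternative that was asked about, not what the patch does; `MemoizedCell`, `_EPOCH`, and the `_timestamp` cache attribute are all hypothetical names.

```python
import datetime

# Assumption: a UTC epoch constant, introduced for this sketch.
_EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)


class MemoizedCell:
    """Hypothetical Cell variant that caches the converted datetime."""

    def __init__(self, value, timestamp_micros):
        self.value = value
        self.timestamp_micros = timestamp_micros
        self._timestamp = None  # filled in on first access

    @property
    def timestamp(self):
        if self._timestamp is None:
            self._timestamp = _EPOCH + datetime.timedelta(
                microseconds=self.timestamp_micros)
        return self._timestamp


cell = MemoizedCell(b"value", 1515735005000000)
first = cell.timestamp
second = cell.timestamp  # returns the cached object, no recomputation
```

The trade-off is one extra attribute per cell in exchange for skipping repeated conversions, which only pays off if callers read the same cell's timestamp more than once.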

zakons · 2018-01-18T04:49:39Z

Can this be merged now?

tseaver · 2018-01-18T18:56:05Z

@zakons I'm fine to merge if @sduskis doesn't veto it before tomorrow.

sduskis · 2018-01-19T13:22:41Z

LGTM

tseaver · 2018-01-19T13:47:15Z

@zakons Thanks for the patch!

chemelnucfin · 2018-01-19T20:55:15Z

@tseaver Looks like something is wrong in the system tests. Looking into it. But feel free to find the mistake(s) if you see it first.

chemelnucfin · 2018-01-19T21:16:58Z

@tseaver looks like I found the cause and it's not a big deal. I'll check again to make sure but I'll submit a PR for it.

zakons added 4 commits January 10, 2018 00:13

Performance enhancement for Cell timestamps. (2b9876b)

Unit test case fixups. (588b057)

Unit test case fixup timestamp_micros. (1943d72)

Line length. (bd2c79c)

zakons requested a review from lukesneeringer as a code owner January 12, 2018 05:30

googlebot added the cla: yes label Jan 12, 2018

theacodes approved these changes Jan 12, 2018

zakons mentioned this pull request Jan 14, 2018

BigTable: Cell.from_pb() performance improvement #4714

Closed

chemelnucfin assigned tseaver, dhermes and chemelnucfin Jan 15, 2018

chemelnucfin added the api: bigtable, performance, and type: process labels Jan 15, 2018

tseaver merged commit 1ef5db6 into googleapis:master Jan 19, 2018

zakons deleted the feature/timestamp_performance branch January 23, 2018 03:17

tseaver mentioned this pull request May 31, 2018

Updated Happy Base framework to the latest version of Bigtable 0.29.0 googleapis/google-cloud-python-happybase#36

Merged

theacodes unassigned dhermes and chemelnucfin Sep 28, 2018

parthea pushed a commit that referenced this pull request Nov 22, 2025

BigTable: Cell.from_pb() performance improvement (#4745)

32235a0
