Skip to content

Conversation

@zakons
Copy link
Contributor

@zakons zakons commented Jan 12, 2018

Change to have Cell store the microseconds from the Cell protobuf and to use a property annotation to get the timestamp as a datetime, when requested. This moves the performance penalty to only the code which needs to access this timestamp, which may actually be a small amount of code. There is better than a 5% performance improvement for reading rows with 10 cells. See Issue #4714.

@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Jan 12, 2018
Copy link
Contributor

@theacodes theacodes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM, but I'd like @tseaver or @dhermes to also give it a quick glance.

@chemelnucfin chemelnucfin added api: bigtable Issues related to the Bigtable API. performance type: process A process-related concern. May include testing, release, or the like. labels Jan 15, 2018
@tseaver
Copy link
Contributor

tseaver commented Jan 16, 2018

A couple of questions:

  • Do we need to consider backward-compatibility? E.g., what is the chance that client code is manually constructing Cell instances, passing timestamps? @sduskis do you have a sense of that likelihood?
  • Should the micros->timestamp calculation be memoized?

@zakons
Copy link
Contributor Author

zakons commented Jan 17, 2018

@tseaver the code that a client uses to create a cell calls the following method on row.py:

    def _set_cell(self, column_family_id, column, value, timestamp=None,
                  state=None):
        column = _to_bytes(column)
        if isinstance(value, six.integer_types):
            value = _PACK_I64(value)
        value = _to_bytes(value)
        if timestamp is None:
            # Use -1 for current Bigtable server time.
            timestamp_micros = -1
        else:
            timestamp_micros = _microseconds_from_datetime(timestamp)
            # Truncate to millisecond granularity.
            timestamp_micros -= (timestamp_micros % 1000)

        mutation_val = data_v2_pb2.Mutation.SetCell(
            family_name=column_family_id,
            column_qualifier=column,
            timestamp_micros=timestamp_micros,
            value=value,
        )
        mutation_pb = data_v2_pb2.Mutation(set_cell=mutation_val)
        self._get_mutations(state).append(mutation_pb)

SetCell is a Mutation type and all changes to BigTable go through Mutations. At the Mutation level the timestamp is expressed in units of microseconds. So the Cell class is really a read only class - there is no ORM magic going on for the persistence. ;-)

I hope this answers your first question above.

As far as memoization, I thought of that, but suspect that the timestamp as a datetime type will be read once.

@zakons
Copy link
Contributor Author

zakons commented Jan 18, 2018

Can this be merged now?

@tseaver
Copy link
Contributor

tseaver commented Jan 18, 2018

@zakons I'm fine to merge if @sduskis doesn't veto it before tomorrow.

@sduskis
Copy link
Contributor

sduskis commented Jan 19, 2018

LGTM

@tseaver tseaver merged commit 1ef5db6 into googleapis:master Jan 19, 2018
@tseaver
Copy link
Contributor

tseaver commented Jan 19, 2018

@zakons Thanks for the patch!

@chemelnucfin
Copy link
Contributor

chemelnucfin commented Jan 19, 2018

@tseaver Looks like something is wrong in the system tests. Looking into it. But feel free to find the mistake(s) if you see it first.

@chemelnucfin
Copy link
Contributor

@tseaver looks like I found the cause and it's not a big deal. I'll check again to make sure but I'll submit a PR for it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api: bigtable Issues related to the Bigtable API. cla: yes This human has signed the Contributor License Agreement. performance type: process A process-related concern. May include testing, release, or the like.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants