
Bigtable API not at feature parity with HBase #16

@dhermes

Description

I "discovered" some issues while implementing the happybase functionality on top of the Bigtable API. (I put "discovered" in quotes because some of these issues may just be that I don't grok how to do the same thing with the Bigtable API.)

These were mostly discovered because I wrote a system test for happybase that can run against both HBase and the Bigtable backend. It is switched from one to the other by flipping the USING_HBASE boolean.

Many other differences are enumerated in the documentation for our custom Bigtable happybase package.


Issues / Differences

  • When committing a batch of mutations, the happybase method Batch.send() uses Thrift/HBase's mutateRows / mutateRowsTs method to send all mutations at once. With the Bigtable API this is not possible; we have to commit row by row. (This comes up in the system test as well.)
  • Bigtable garbage collection is not as immediate as HBase's. In HBase, a column with a single max version immediately evicts the old value when a new one is added; similarly, with a TTL of 3 seconds, the value has been evicted after sleeping for 3.5 seconds. Neither of these happens (at least not consistently) in Bigtable. (I don't really see this as a problem, but users coming from HBase may have different expectations.)
  • A row scan with sorted_columns is not possible in Bigtable.
  • Using an HBase filter string is not possible in Bigtable. (Also, some of the filter-string concepts don't map to Bigtable filters, e.g. KeyOnlyFilter.)
  • The Bigtable Mutation.DeleteFromRow mutation also does not support timestamps. Even attempting to send one conditionally (via CheckAndMutateRowRequest) deletes the entire row.
  • Bigtable can't use a timestamp when deleting a column family, since Mutation.DeleteFromFamily does not include a timestamp range.
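To make the first bullet concrete, here is a plain-Python sketch (no client libraries, names are illustrative) of the semantic difference: happybase's Batch.send() pushes every queued row in a single mutateRows call, while a Bigtable-backed batch has to loop and commit one row at a time.

```python
# Illustrative stand-ins, NOT real happybase or Bigtable client code.
commits = []

def send_hbase_style(row_keys):
    # One Thrift mutateRows call carrying all rows at once.
    commits.append(("mutateRows", sorted(row_keys)))

def send_bigtable_style(row_keys):
    # The Bigtable API has no multi-row commit: one commit per row.
    for row_key in sorted(row_keys):
        commits.append(("commit", row_key))

pending = {b"row-1", b"row-2"}
send_hbase_style(pending)
send_bigtable_style(pending)
print(commits)
```

The practical consequence is that a Bigtable-backed Batch.send() is not atomic across rows: a failure partway through the loop leaves some rows committed and others not.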

Differences that are Upgrades

  • Writes to HBase (via Thrift) with a timestamp simply drop the timestamp, whereas the Bigtable API respects it.

  • The Thrift API fails to retrieve the TTL information from a column family, while the Bigtable API succeeds in returning this information. (We have to work around this in a few system tests.)

  • When the Thrift API does a row read with columns cf1 and cf1:qual1 (in that order), only the results from cf1:qual1 are returned (even though they are a subset of all the columns in column family cf1). If the columns are given in the opposite order (cf1:qual1 then cf1), the correct results are returned. In Cloud Bigtable, it works as expected in either order. (We use a union filter: one branch has only family_name_regex_filter='cf1' and the other combines that with column_qualifier_regex_filter='qual1'.) (This happens for both single-row and multi-row reads.)

  • HBase counter_get doesn't actually populate the data, even though the docstring says:

    This method retrieves the current value of a counter column. If the counter column does not exist, this function initialises it to 0
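The union-filter behavior from the cf1 / cf1:qual1 bullet above can be sketched in plain Python (regex matching only; the real code builds Bigtable RowFilter protos): a cell passes if it matches the family-only branch OR the family-plus-qualifier branch, so the order in which columns are requested cannot matter.

```python
import re

# Plain-Python sketch of the union filter described above; the column
# specs and cell tuples are illustrative, not client-library objects.
def union_filter(cells, requested=("cf1", "cf1:qual1")):
    branches = []
    for spec in requested:
        if ":" in spec:
            fam, qual = spec.split(":", 1)
            # Analogue of family_name_regex_filter + column_qualifier_regex_filter.
            branches.append(lambda c, f=fam, q=qual:
                            re.fullmatch(f, c[0]) and re.fullmatch(q, c[1]))
        else:
            # Analogue of family_name_regex_filter alone.
            branches.append(lambda c, f=spec: re.fullmatch(f, c[0]))
    # Union: keep a cell if ANY branch matches it.
    return [c for c in cells if any(branch(c) for branch in branches)]

cells = [("cf1", "qual1"), ("cf1", "qual2"), ("cf2", "qual1")]
print(union_filter(cells))
print(union_filter(cells, requested=("cf1:qual1", "cf1")))  # same result
```

Because the union keeps anything the broader family-only branch matches, the qualifier branch being a subset of it is harmless in either order, which is exactly the behavior Thrift gets wrong.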

Neither Good/Bad

  • HBase reads (via Table.row, Table.rows, Table.cells, Table.scan) all use exclusive end timestamps, which matches the behavior of a Bigtable TimestampRange. On the other hand, HBase deletes use inclusive end timestamps, while Bigtable deletes still use a TimestampRange (only for deleting specific columns, though, since column family and row deletes can't send a timestamp range, as noted above). We address this by incrementing the passed-in timestamp by 1 millisecond (the lowest allowed granularity).
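The adjustment in that last bullet is just one addition; a minimal sketch (function name is illustrative):

```python
# Bigtable timestamps have millisecond granularity, so bumping the end
# by 1 ms turns HBase's inclusive delete bound into Bigtable's
# exclusive TimestampRange end.
GRANULARITY_MS = 1

def inclusive_to_exclusive_end(timestamp_ms):
    # "delete up to and including timestamp_ms" -> exclusive range end
    return timestamp_ms + GRANULARITY_MS

print(inclusive_to_exclusive_end(1460000000000))
```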
