Description
I "discovered" some issues when implementing the happybase functionality on top of the Bigtable API. (I put discovered in quotes, because some of the issues may just be that I don't grok how to do the same thing with the Bigtable API).
These were mostly discovered because I wrote a system test for happybase that could work both with HBase and with the Bigtable backend. It can be switched from one to another by changing the USING_HBASE boolean.
Many other differences have been enumerated in the documentation for our custom Bigtable happybase package.
Issues / Differences
- When committing a batch of mutations, the `happybase` method `Batch.send()` uses Thrift/HBase's `mutateRows` / `mutateRowsTs` method to send all mutations at once. With the Bigtable API this is not possible; we have to commit row-by-row. (This comes up in the system test as well.)
- Bigtable garbage collection is not as immediate as HBase's. In HBase, a column with one `max_version` immediately evicts the old value when a new one is added. Similarly, with a TTL of 3 seconds, after sleeping for 3.5 seconds, the value has been evicted. Neither of these occurs in Bigtable (at least not consistently). (I don't really see this as a problem, but users coming from HBase may have different expectations.)
- A row scan with `sorted_columns` is not possible in Bigtable.
- Using an HBase filter string is not possible in Bigtable. (Also, some of the filter string concepts don't map to Bigtable filters, e.g. `KeyOnlyFilter`.)
- The Bigtable `Mutation.DeleteFromRow` mutation does not support timestamps. Even attempting to send one conditionally (via `CheckAndMutateRowRequest`) deletes the entire row.
- Bigtable also can't use a timestamp when deleting column families, since `Mutation.DeleteFromFamily` does not include a timestamp range.
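To make the first point concrete, here is a minimal sketch of what committing row-by-row looks like. It is plain Python and not the actual package code: `group_mutations_by_row`, `send_row_by_row`, and the `commit_row` callable are hypothetical names, and mutations are assumed to be simple `(row_key, column, value)` tuples.

```python
from collections import OrderedDict


def group_mutations_by_row(mutations):
    """Group (row_key, column, value) tuples by row key, preserving order."""
    grouped = OrderedDict()
    for row_key, column, value in mutations:
        grouped.setdefault(row_key, []).append((column, value))
    return grouped


def send_row_by_row(mutations, commit_row):
    """Commit each row's cells separately.

    `commit_row` is a hypothetical callable standing in for whatever
    sends a single row's mutations to the backend.
    """
    committed = []
    for row_key, cells in group_mutations_by_row(mutations).items():
        commit_row(row_key, cells)  # one request per row
        committed.append(row_key)
    return committed
```

One consequence of this shape is that if a commit fails partway through, the rows committed before the failure have already been written.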
Differences that are Upgrades
- Writes to HBase (via Thrift) with a timestamp just drop the timestamp, whereas the Bigtable API respects it.
- The Thrift API fails to retrieve the TTL information from a column family, while the Bigtable API succeeds in returning this information. (We have to work around this in a few system tests.)
- When the Thrift API does a row read with columns `cf1` and `cf1:qual1` (in that order), only the results from `cf1:qual1` are returned (even though they are a subset of all the columns in the column family `cf1`). If the columns are given in the opposite order (`cf1:qual1` then `cf1`), the correct results are returned. In Cloud Bigtable, it works as expected in either order. (We use a union filter, one which has only `family_name_regex_filter='cf1'` and another which has that combined with `column_qualifier_regex_filter='qual1'`.) (This happens for a single-row read and for multiple rows.)
- HBase `counter_get` doesn't actually populate the data even though the docstring says: "This method retrieves the current value of a counter column. If the counter column does not exist, this function initialises it to `0`."
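The order-independence of the union approach can be illustrated with a small model. This is a plain-Python sketch of the matching semantics only, not the Bigtable filter API: `column_matches` and `select_cells` are hypothetical helpers, and cells are reduced to `(family, qualifier)` pairs.

```python
def column_matches(selector, family, qualifier):
    """Return True if a cell in family:qualifier matches one selector.

    A selector is either 'cf' (whole family) or 'cf:qual' (one column).
    """
    sel_family, _, sel_qualifier = selector.partition(":")
    if sel_family != family:
        return False
    # An empty qualifier means "every column in the family".
    return not sel_qualifier or sel_qualifier == qualifier


def select_cells(selectors, cells):
    """Keep cells matched by ANY selector (a union), so order is irrelevant."""
    return [
        (family, qualifier)
        for family, qualifier in cells
        if any(column_matches(s, family, qualifier) for s in selectors)
    ]
```

Because a cell is kept when any selector matches, passing `["cf1", "cf1:qual1"]` or `["cf1:qual1", "cf1"]` selects the same cells in this model.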
Neither Good/Bad
- HBase reads (via `Table.row`, `Table.rows`, `Table.cells`, `Table.scan`) all use exclusive end timestamps, which matches the behavior of a Bigtable `TimestampRange`. On the other hand, HBase deletes use inclusive end timestamps, while Bigtable deletes still use a `TimestampRange` (though only when deleting specific columns, since column family or row deletes can't send a timestamp range, as noted above). We address this by incrementing the passed-in timestamp by 1 millisecond (which is the lowest allowed granularity).
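The adjustment for deletes amounts to turning an inclusive end into an exclusive one. A minimal sketch, assuming integer millisecond timestamps; the helper names `inclusive_end_to_exclusive` and `in_exclusive_range` are hypothetical and not taken from the package:

```python
def inclusive_end_to_exclusive(timestamp_ms):
    """Convert an inclusive end timestamp (ms) to an exclusive one."""
    if timestamp_ms is None:
        return None  # no upper bound requested
    return timestamp_ms + 1  # 1 ms is the smallest allowed step


def in_exclusive_range(cell_timestamp_ms, end_ms):
    """Check a cell against an exclusive end, as a timestamp range would."""
    return end_ms is None or cell_timestamp_ms < end_ms
```

With this shift, a delete requested at timestamp `t` still covers a cell written exactly at `t`, matching the inclusive behavior HBase users expect.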