Description
Hi, I'm using Heroku to run a python based ETL process where I'm pushing the contents of a Pandas dataframe into Google BQ using to_gbq. However, it's generating a UnicodeEncodeError with the following stack trace, due to some non-latin characters.
Strangely this works fine on my Mac but when I try to run it on Heroku, it's failing. It seems that for some reason, http.client.py is getting an un-encoded string rather than bytes and therefore, it's trying to encode with latin-1, which is the default but obviously would choke on anything non-latin, like Chinese chars.
2018-01-08T04:54:17.307496+00:00 app[run.2251]:
Load is 100.0% Complete044+00:00 app[run.2251]:
2018-01-08T04:54:20.443238+00:00 app[run.2251]: Traceback (most recent call last):
2018-01-08T04:54:20.443267+00:00 app[run.2251]: File "AllCostAndRev.py", line 534, in
2018-01-08T04:54:20.443708+00:00 app[run.2251]: main(yaml.dump(data=ads_dict))
2018-01-08T04:54:20.443710+00:00 app[run.2251]: File "AllCostAndRev.py", line 475, in main
2018-01-08T04:54:20.443915+00:00 app[run.2251]: private_key=environ['skynet_bq_pk']
2018-01-08T04:54:20.443917+00:00 app[run.2251]: File "/app/.heroku/python/lib/python3.6/site-packages/pandas_gbq/gbq.py", line 989, in to_gbq
2018-01-08T04:54:20.444390+00:00 app[run.2251]: connector.load_data(dataframe, dataset_id, table_id, chunksize)
2018-01-08T04:54:20.444391+00:00 app[run.2251]: File "/app/.heroku/python/lib/python3.6/site-packages/pandas_gbq/gbq.py", line 590, in load_data
2018-01-08T04:54:20.444653+00:00 app[run.2251]: job_config=job_config).result()
2018-01-08T04:54:20.444656+00:00 app[run.2251]: File "/app/.heroku/python/lib/python3.6/site-packages/google/cloud/bigquery/client.py", line 748, in load_table_from_file
2018-01-08T04:54:20.445248+00:00 app[run.2251]: response = upload.transmit_next_chunk(transport)
2018-01-08T04:54:20.445250+00:00 app[run.2251]: File "/app/.heroku/python/lib/python3.6/site-packages/google/resumable_media/requests/upload.py", line 395, in transmit_next_chunk
2018-01-08T04:54:20.444942+00:00 app[run.2251]: file_obj, job_resource, num_retries)
2018-01-08T04:54:20.445457+00:00 app[run.2251]: retry_strategy=self._retry_strategy)
2018-01-08T04:54:20.444943+00:00 app[run.2251]: File "/app/.heroku/python/lib/python3.6/site-packages/google/cloud/bigquery/client.py", line 777, in _do_resumable_upload
2018-01-08T04:54:20.445458+00:00 app[run.2251]: File "/app/.heroku/python/lib/python3.6/site-packages/google/resumable_media/requests/_helpers.py", line 101, in http_request
2018-01-08T04:54:20.445592+00:00 app[run.2251]: func, RequestsMixin._get_status_code, retry_strategy)
2018-01-08T04:54:20.445594+00:00 app[run.2251]: File "/app/.heroku/python/lib/python3.6/site-packages/google/resumable_media/_helpers.py", line 146, in wait_and_retry
2018-01-08T04:54:20.445725+00:00 app[run.2251]: response = func()
2018-01-08T04:54:20.445726+00:00 app[run.2251]: File "/app/.heroku/python/lib/python3.6/site-packages/google/auth/transport/requests.py", line 186, in request
2018-01-08T04:54:20.445866+00:00 app[run.2251]: method, url, data=data, headers=request_headers, **kwargs)
2018-01-08T04:54:20.445867+00:00 app[run.2251]: File "/app/.heroku/python/lib/python3.6/site-packages/requests/sessions.py", line 508, in request
2018-01-08T04:54:20.446099+00:00 app[run.2251]: resp = self.send(prep, **send_kwargs)
2018-01-08T04:54:20.446101+00:00 app[run.2251]: File "/app/.heroku/python/lib/python3.6/site-packages/requests/sessions.py", line 618, in send
2018-01-08T04:54:20.446456+00:00 app[run.2251]: r = adapter.send(request, **kwargs)
2018-01-08T04:54:20.446457+00:00 app[run.2251]: File "/app/.heroku/python/lib/python3.6/site-packages/requests/adapters.py", line 440, in send
2018-01-08T04:54:20.446728+00:00 app[run.2251]: timeout=timeout
2018-01-08T04:54:20.446730+00:00 app[run.2251]: File "/app/.heroku/python/lib/python3.6/site-packages/urllib3/connectionpool.py", line 601, in urlopen
2018-01-08T04:54:20.446969+00:00 app[run.2251]: chunked=chunked)
2018-01-08T04:54:20.446970+00:00 app[run.2251]: File "/app/.heroku/python/lib/python3.6/site-packages/urllib3/connectionpool.py", line 357, in _make_request
2018-01-08T04:54:20.447229+00:00 app[run.2251]: conn.request(method, url, **httplib_request_kw)
2018-01-08T04:54:20.447231+00:00 app[run.2251]: File "/app/.heroku/python/lib/python3.6/http/client.py", line 1239, in request
2018-01-08T04:54:20.447690+00:00 app[run.2251]: File "/app/.heroku/python/lib/python3.6/http/client.py", line 1284, in _send_request
2018-01-08T04:54:20.448232+00:00 app[run.2251]: body = _encode(body, 'body')
2018-01-08T04:54:20.448234+00:00 app[run.2251]: File "/app/.heroku/python/lib/python3.6/http/client.py", line 161, in _encode
2018-01-08T04:54:20.448405+00:00 app[run.2251]: UnicodeEncodeError: 'latin-1' codec can't encode characters in position 553626-553628: Body ('信用卡') is not valid Latin-1. Use body.encode('utf-8') if you want to send it encoded in UTF-8.
2018-01-08T04:54:20.447689+00:00 app[run.2251]: self._send_request(method, url, body, headers, encode_chunked)
2018-01-08T04:54:20.448396+00:00 app[run.2251]: (name.title(), data[err.start:err.end], name)) from None
2018-01-08T04:54:20.621819+00:00 heroku[run.2251]: State changed from up to complete
2018-01-08T04:54:20.609814+00:00 heroku[run.2251]: Process exited with status 1