Skip to content

read_sql chokes on mysql when using labels with queries due to unnecessary quoting #7826

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
maxgrenderjones opened this issue Jul 23, 2014 · 15 comments · Fixed by #7961
Closed
Labels
Bug IO SQL to_sql, read_sql, read_sql_query
Milestone

Comments

@maxgrenderjones
Copy link
Contributor

Not sure if this is a pandas bug or an upstream one, but here's an example of the bug (pandas-0.14.1, mariadb 10, sqlalchemy-0.9.4)

engine=create_engine('mysql://{username}:{password}@{host}/{database}?charset=utf8'.format(**db))
pandas.io.sql.read_sql('SELECT onlinetransactions.id FROM onlinetransactions LIMIT 1', engine)   #Does what you'd expect
pandas.io.sql.read_sql('SELECT onlinetransactions.id as firstid FROM onlinetransactions LIMIT 1', engine)  #Fails

The error you get back is:

C:\Anaconda\lib\site-packages\pandas\io\sql.pyc in read_sql(sql, con, index_col, coerce_float, params, parse_dates, columns)
    421             coerce_float=coerce_float, parse_dates=parse_dates)
    422 
--> 423     if pandas_sql.has_table(sql):
    424         pandas_sql.meta.reflect(only=[sql])
    425         return pandas_sql.read_table(

C:\Anaconda\lib\site-packages\pandas\io\sql.pyc in has_table(self, name)
    847 
    848     def has_table(self, name):
--> 849         return self.engine.has_table(name)
    850 
    851     def get_table(self, table_name):

C:\Anaconda\lib\site-packages\sqlalchemy\engine\base.pyc in has_table(self, table_name, schema)
   1757 
   1758         """
-> 1759         return self.run_callable(self.dialect.has_table, table_name, schema)
   1760 
   1761     def raw_connection(self):

C:\Anaconda\lib\site-packages\sqlalchemy\engine\base.pyc in run_callable(self, callable_, *args, **kwargs)
   1661         """
   1662         with self.contextual_connect() as conn:
-> 1663             return conn.run_callable(callable_, *args, **kwargs)
   1664 
   1665     def execute(self, statement, *multiparams, **params):

C:\Anaconda\lib\site-packages\sqlalchemy\engine\base.pyc in run_callable(self, callable_, *args, **kwargs)
   1188 
   1189         """
-> 1190         return callable_(self, *args, **kwargs)
   1191 
   1192     def _run_visitor(self, visitorcallable, element, **kwargs):

C:\Anaconda\lib\site-packages\sqlalchemy\dialects\mysql\base.pyc in has_table(self, connection, table_name, schema)
   2274         try:
   2275             try:
-> 2276                 rs = connection.execute(st)
   2277                 have = rs.fetchone() is not None
   2278                 rs.close()

C:\Anaconda\lib\site-packages\sqlalchemy\engine\base.pyc in execute(self, object, *multiparams, **params)
    710         """
    711         if isinstance(object, util.string_types[0]):
--> 712             return self._execute_text(object, multiparams, params)
    713         try:
    714             meth = object._execute_on_connection

C:\Anaconda\lib\site-packages\sqlalchemy\engine\base.pyc in _execute_text(self, statement, multiparams, params)
    859             statement,
    860             parameters,
--> 861             statement, parameters
    862         )
    863         if self._has_events or self.engine._has_events:

C:\Anaconda\lib\site-packages\sqlalchemy\engine\base.pyc in _execute_context(self, dialect, constructor, statement, parameters, *args)
    945                                 parameters,
    946                                 cursor,
--> 947                                 context)
    948 
    949         if self._has_events or self.engine._has_events:

C:\Anaconda\lib\site-packages\sqlalchemy\engine\base.pyc in _handle_dbapi_exception(self, e, statement, parameters, cursor, context)
   1106                                         self.dialect.dbapi.Error,
   1107                                         connection_invalidated=self._is_disconnect),
-> 1108                                     exc_info
   1109                                 )
   1110 

C:\Anaconda\lib\site-packages\sqlalchemy\util\compat.pyc in raise_from_cause(exception, exc_info)
    183             exc_info = sys.exc_info()
    184         exc_type, exc_value, exc_tb = exc_info
--> 185         reraise(type(exception), exception, tb=exc_tb)
    186 
    187 if py3k:

C:\Anaconda\lib\site-packages\sqlalchemy\engine\base.pyc in _execute_context(self, dialect, constructor, statement, parameters, *args)
    938                                      statement,
    939                                      parameters,
--> 940                                      context)
    941         except Exception as e:
    942             self._handle_dbapi_exception(

C:\Anaconda\lib\site-packages\sqlalchemy\engine\default.pyc in do_execute(self, cursor, statement, parameters, context)
    433 
    434     def do_execute(self, cursor, statement, parameters, context=None):
--> 435         cursor.execute(statement, parameters)
    436 
    437     def do_execute_no_params(self, cursor, statement, context=None):

C:\Anaconda\lib\site-packages\MySQLdb\cursors.pyc in execute(self, query, args)
    203             del tb
    204             self.messages.append((exc, value))
--> 205             self.errorhandler(self, exc, value)
    206         self._executed = query
    207         if not self._defer_warnings: self._warning_check()

C:\Anaconda\lib\site-packages\MySQLdb\connections.pyc in defaulterrorhandler(***failed resolving arguments***)
     34     del cursor
     35     del connection
---> 36     raise errorclass, errorvalue
     37 
     38 re_numeric_part = re.compile(r"^(\d+)")

ProgrammingError: (ProgrammingError) (1103, "Incorrect table name 'SELECT onlinetransactions.id as firstid FROM onlinetransactions LIMIT 1'") 'DESCRIBE `SELECT onlinetransactions.id as firstid FROM onlinetransactions LIMIT 1`' ()

So it never gets as far as running the actual query, because it's tried to run a DESCRIBE query with ``` quotes which fails. i.e.

MariaDB [transactions]> DESCRIBE `SELECT onlinetransactions.id as firstid FROM onlinetransactions LIMIT 1`;
ERROR 1103 (42000): Incorrect table name 'SELECT onlinetransactions.id as firstid FROM onlinetransactions LIMIT 1'

But

MariaDB [transactions]> DESCRIBE SELECT onlinetransactions.id as firstid FROM onlinetransactions LIMIT 1;
+------+-------------+--------------------+-------+---------------+--------------------------------+---------+------+----------+-------------+
| id   | select_type | table              | type  | possible_keys | key                            | key_len | ref  | rows     | Extra       |
+------+-------------+--------------------+-------+---------------+--------------------------------+---------+------+----------+-------------+
|    1 | SIMPLE      | onlinetransactions | index | NULL          | ix_OnlineTransactions_Rpt_Year | 5       | NULL | 11485535 | Using index |
+------+-------------+--------------------+-------+---------------+--------------------------------+---------+------+----------+-------------+
1 row in set (0.00 sec)

Ok. So looking at the stacktrace, I reckon this is a pandas bug as it seems to be calling has_table on my sql query, which doesn't seem to make any sense?

@jreback
Copy link
Contributor

jreback commented Jul 23, 2014

you need to use read_sql_query; read_sql uses a heurestic to figure out which of read_sql_query or read_sql_table needs to be called; its 'wrong' in that since you have an embedded select I think it thinks this is a table.

I am not sure this is a bug in the heurestic or user error (in that you need to explicity call a different function here).

@jreback jreback added the SQL label Jul 23, 2014
@maxgrenderjones
Copy link
Contributor Author

I call it a bug (but feel free to disagree), and here's the one liner to fix it:

was

if pandas_sql.has_table(sql):
        pandas_sql.meta.reflect(only=[sql])
        return pandas_sql.read_table(
            sql, index_col=index_col, coerce_float=coerce_float,
            parse_dates=parse_dates, columns=columns)

Works with:

if pandas_sql.get_table(sql) is not None:
        pandas_sql.meta.reflect(only=[sql])
        return pandas_sql.read_table(
            sql, index_col=index_col, coerce_float=coerce_float,
            parse_dates=parse_dates, columns=columns)

Note that the docs just say that the first argument is a query (maybe I'm missing something, but what else would it be other than a SELECT statement?) - it doesn't have to be a table.

@jreback
Copy link
Contributor

jreback commented Jul 23, 2014

ok, seems reasonable. want to do a pull-request (with a tests) for this?

@jreback jreback added the Bug label Jul 23, 2014
@jreback jreback added this to the 0.15.0 milestone Jul 23, 2014
@maxgrenderjones
Copy link
Contributor Author

Will try to. I suck at git, and I have to wait 'til I have access to my linux box so I can build pandas, but I'll do my best :)

In terms of tests, what do we need? I don't have access to all the different database types that SQLAlchemy supports, and so I can't be sure that I'll be writing correct SQL. Is it enough to just have a test in pandas.io.tests.test_sql.TestMySQLAlchemy that makes sure you can do a read_sql query with an as foo subclause?

@jreback
Copy link
Contributor

jreback commented Jul 23, 2014

I think u can make a reasonable generic one yes?

this should be tested on all engines

@maxgrenderjones
Copy link
Contributor Author

OK - we'll see what I come up with :)

@jorisvandenbossche
Copy link
Member

@maxgrenderjones Thanks for the report. However, I can't reproduce this.
Can you see if you still get the error with this small test case?

df = pd.DataFrame({'a': range(3), 'b':list('abc')})
df.to_sql('test', engine, index=False, if_exists='replace')
pd.read_sql('test', engine)
pd.read_sql('SELECT test.a FROM test LIMIT 1', engine)
pd.read_sql('SELECT test.a AS testa FROM test LIMIT 1', engine)

I am running this with sqlalchemy 9.6, MySQL 5.6 and pymysql 0.6.1. Would it be possible it has to do something wih MariaDB or your python driver?

And for the docs, if you look at the current one: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_sql.html, the text should already be updated to reflect the current situation.

@jorisvandenbossche
Copy link
Member

And BTW, it seems to me like a bug in has_table, no? As this should just return False if the table does not exist, or if it is an "incorrect" table name (as the error report says). So maybe a bug in sqlalchemy or the python driver you are using (but difficult to say as I couldn't reproduce it)

@maxgrenderjones
Copy link
Contributor Author

Thanks @jorisvandenbossche, looks like there's more to this one than would appear at first glance

Before I begin, my versions:

  • SQLAlchemy: 0.9.4
  • Pandas: 0.14.1
  • PyMySQL: 0.6.2.None
  • MySQLdb: 1.2.5
  • oursql: 0.9.3.1

My test code

reload(pandas)
reload(pandas.io)
reload(pandas.io.sql)
from IPython.display import display
for engine in [model.getOurSQLEngine(), model.getMySQLEngine(), model.getPyMySQLEngine()]:
    print(engine.dialect.driver)
    metadata=MetaData(bind=engine)
    metadata.reflect()
    df = pandas.DataFrame({'a': range(3), 'b':list('abc')})
    testtable='test'
    df.to_sql(testtable, engine, index=False, if_exists='replace')
    display(pandas.io.sql.read_sql(testtable, engine))
    try:
        display(pandas.read_sql(testtable, engine))
    except Exception as e:
        print(e)
    display(pandas.io.sql.read_sql('SELECT %s.a FROM %s LIMIT 1' % (testtable, testtable), engine))
    display(pandas.io.sql.read_sql('SELECT %s.a AS aid FROM %s LIMIT 1' % (testtable, testtable), engine))

Output:

oursql
a   b
0   0   a
1   1   b
2   2   c
(ProgrammingError) (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'test' at line 1", None) 'test' ()
a
0   0
aid
0   0
mysqldb
a   b
0   0   a
1   1   b
2   2   c
(ProgrammingError) (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'test' at line 1") 'test' ()
a
0   0
aid
0   0
pymysql
a   b
0   0   a
1   1   b
2   2   c
(ProgrammingError) (1064, u"You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'test' at line 1") 'test' ()
a
0   0
aid
0   0

So note that pandas.read_sql throws an exception, but pandas.io.sql.read_sql does not. Either way, the issue I was having before, seems to have vanished.

Now run the same code, changing the for loop to:

for engine in [model.getOurSQLEngine(), model.getMySQLEngine(), model.getPyMySQLEngine()]:
    print(engine.dialect.driver)
    metadata=MetaData(bind=engine)
    metadata.reflect()
    df = pandas.DataFrame({'a': range(3), 'b':list('abc')})
    testtable='tablewithaverylongname'
    df.to_sql(testtable, engine, index=False, if_exists='replace')
    display(pandas.io.sql.read_sql(testtable, engine))
    try:
        display(pandas.read_sql(testtable, engine))
    except Exception as e:
        print(e)
    try:
        display(pandas.io.sql.read_sql('SELECT %s.a FROM %s LIMIT 1' % (testtable, testtable), engine))
        display(pandas.io.sql.read_sql('SELECT %s.a AS aid FROM %s LIMIT 1' % (testtable, testtable), engine))
    except Exception as e:
        print(e)

and our issue comes back again

oursql
a   b
0   0   a
1   1   b
2   2   c
(ProgrammingError) (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'tablewithaverylongname' at line 1", None) 'tablewithaverylongname' ()
(ProgrammingError) (1103, "Incorrect table name 'SELECT tablewithaverylongname.a FROM tablewithaverylongname LIMIT 1'", None) 'DESCRIBE `SELECT tablewithaverylongname.a FROM tablewithaverylongname LIMIT 1`' ()
mysqldb
a   b
0   0   a
1   1   b
2   2   c
(ProgrammingError) (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'tablewithaverylongname' at line 1") 'tablewithaverylongname' ()
(ProgrammingError) (1103, "Incorrect table name 'SELECT tablewithaverylongname.a FROM tablewithaverylongname LIMIT 1'") 'DESCRIBE `SELECT tablewithaverylongname.a FROM tablewithaverylongname LIMIT 1`' ()
pymysql
a   b
0   0   a
1   1   b
2   2   c
(ProgrammingError) (1064, u"You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'tablewithaverylongname' at line 1") 'tablewithaverylongname' ()
(ProgrammingError) (1103, u"Incorrect table name 'SELECT tablewithaverylongname.a FROM tablewithaverylongname LIMIT 1'") 'DESCRIBE `SELECT tablewithaverylongname.a FROM tablewithaverylongname LIMIT 1`' ()

P.S. fwiw, tablewithaverylongname was created successfully, and contains what we expect it to contain

@jorisvandenbossche
Copy link
Member

Hmm, very strange:

  • first, pd.read_sql and pd.io.sql.read_sql are exactly the same, so that should not give a difference (just look what is printed when you type 'pd.read_sql')
  • If I run your example code on my pymysql engine, I don't get any error for the first example, and I get another error for the second example code (I get "Identifier name 'SELECT tablewithaverylongname.a FROM tablewithaverylongname LIMIT 1' is too long" -> the 'table name' (the select statement) has become too long because of the long table name)

Are you sure you are picking the correct pandas version? (pandas.__version__ gives 0.14.1?)

@jorisvandenbossche
Copy link
Member

@maxgrenderjones See #7961. Can you check if this solves the issue? (difficult to check for me, as I couldn't reproduce it, but this fix should least prevent an error when has_table fails).

@maxgrenderjones
Copy link
Contributor Author

Will do

@jorisvandenbossche
Copy link
Member

@maxgrenderjones Have you been able to test it?

@maxgrenderjones
Copy link
Contributor Author

Fixes the issue. (Note that my previous pandas.io is different from pandas result was a red herring - I'd called reload on one but not the other so they weren't the same thing).

@jorisvandenbossche
Copy link
Member

@maxgrenderjones Thanks for testing!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug IO SQL to_sql, read_sql, read_sql_query
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants