Commit d49cb5a

Merge e554cd7 into b13ddd5
2 parents b13ddd5 + e554cd7 commit d49cb5a

2 files changed

+146
-8
lines changed

.github/CONTRIBUTING.md renamed to CONTRIBUTING.md

Lines changed: 47 additions & 8 deletions
@@ -109,9 +109,9 @@ For a python 3 environment:
109109

110110
conda create -n pandas_dev python=3 --file ci/requirements_dev.txt
111111

112-
If you are on Windows, then you will also need to install the compiler linkages:
113-
114-
conda install -n pandas_dev libpython
112+
> **warning**
113+
>
114+
> If you are on Windows, see the section *Creating a Windows development environment* below for a fully compliant Windows environment.
115115
116116
This will create the new environment, and not touch any of your existing environments, nor any existing python installation. It will install all of the basic dependencies of *pandas*, as well as the development and testing tools. If you would like to install other dependencies, you can install them as follows:
117117

@@ -143,6 +143,28 @@ See the full conda docs [here](http://conda.pydata.org/docs).
143143

144144
At this point you can easily do an *in-place* install, as detailed in the next section.
145145

146+
### Creating a Windows development environment
147+
148+
To build on Windows, you need compilers installed to build the extensions. Install the Visual Studio compiler that matches your Python version: VS 2008 for Python 2.7, VS 2010 for Python 3.4, and VS 2015 for Python 3.5.
149+
150+
For Python 2.7, you can install the `mingw` compiler which will work equivalently to VS 2008:
151+
152+
conda install -n pandas_dev libpython
153+
154+
or use the [Microsoft Visual Studio VC++ compiler for Python](https://www.microsoft.com/en-us/download/details.aspx?id=44266). Note that you have to check the `x64` box to install the `x64` extension building capability as this is not installed by default.
155+
156+
For Python 3.4, you can download and install the [Windows 7.1 SDK](https://www.microsoft.com/en-us/download/details.aspx?id=8279). Read the references below as there may be various gotchas during the installation.
157+
158+
For Python 3.5, you can download and install the [Visual Studio 2015 Community Edition](https://www.visualstudio.com/en-us/downloads/visual-studio-2015-downloads-vs.aspx).
159+
160+
Here are some references and blogs:
161+
162+
- <https://blogs.msdn.microsoft.com/pythonengineering/2016/04/11/unable-to-find-vcvarsall-bat/>
163+
- <https://github.com/conda/conda-recipes/wiki/Building-from-Source-on-Windows-32-bit-and-64-bit>
164+
- <https://cowboyprogrammer.org/building-python-wheels-for-windows/>
165+
- <https://blog.ionelmc.ro/2014/12/21/compiling-python-extensions-on-windows/>
166+
- <https://support.enthought.com/hc/en-us/articles/204469260-Building-Python-extensions-with-Canopy>
167+
146168
### Making changes
147169

148170
Before making your code changes, it is often necessary to build the code that was just checked out. There are two primary methods of doing this.
@@ -258,17 +280,26 @@ Contributing to the code base
258280

259281
### Code standards
260282

261-
*pandas* uses the [PEP8](http://www.python.org/dev/peps/pep-0008/) standard. There are several tools to ensure you abide by this standard.
283+
*pandas* uses the [PEP8](http://www.python.org/dev/peps/pep-0008/) standard. There are several tools to ensure you abide by this standard. Here are *some* of the more common `PEP8` issues:
284+
285+
> - we restrict line-length to 80 characters to promote readability
286+
> - passing arguments should have spaces after commas, e.g. `foo(arg1, arg2, kw1='bar')`
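
The comma rule above can be illustrated with a minimal sketch. The function `foo` is a hypothetical helper that exists only for this illustration; the error code in the comment is the one flake8 normally reports for a missing space after a comma.

```python
def foo(arg1, arg2, kw1=None):
    """Hypothetical helper used only to illustrate call formatting."""
    return arg1, arg2, kw1


# Preferred: a space after each comma, no spaces around '=' in keyword arguments.
result = foo(1, 2, kw1='bar')

# flake8 would flag the following call (E231, missing whitespace after ','):
# result = foo(1,2,kw1='bar')
```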
287+
288+
Travis-CI will run the [flake8](http://pypi.python.org/pypi/flake8) tool and report any stylistic errors in your code. Any warning causes the build to fail, so passing these checks is a requirement for submitting code to *pandas*.
289+
290+
It is helpful before submitting code to run this yourself on the diff:
291+
292+
git diff master | flake8 --diff
262293

263-
We've written a tool to check that your commits are PEP8 great, [pip install pep8radius](https://github.com/hayd/pep8radius). Look at PEP8 fixes in your branch vs master with:
294+
Furthermore, we've written a tool to check that your commits are PEP8 great, [pip install pep8radius](https://github.com/hayd/pep8radius). Look at PEP8 fixes in your branch vs master with:
264295

265-
pep8radius master --diff
296+
pep8radius master --diff
266297

267298
and make these changes with:
268299

269300
pep8radius master --diff --in-place
270301

271-
Alternatively, use the [flake8](http://pypi.python.org/pypi/flake8) tool for checking the style of your code. Additional standards are outlined on the [code style wiki page](https://github.com/pydata/pandas/wiki/Code-Style-and-Conventions).
302+
Additional standards are outlined on the [code style wiki page](https://github.com/pydata/pandas/wiki/Code-Style-and-Conventions).
272303

273304
Please try to maintain backward compatibility. *pandas* has lots of users with lots of existing code, so don't break it if at all possible. If you think breakage is required, clearly state why as part of the pull request. Also, be careful when changing method signatures and add deprecation warnings where needed.
274305

@@ -315,6 +346,14 @@ The tests suite is exhaustive and takes around 20 minutes to run. Often it is wo
315346
nosetests pandas/tests/[test-module].py:[TestClass]
316347
nosetests pandas/tests/[test-module].py:[TestClass].[test_method]
317348

349+
Furthermore, one can run
350+
351+
```python
352+
pd.test()
353+
```
354+
355+
with an imported pandas to run tests similarly.
356+
318357
#### Running the performance test suite
319358

320359
Performance matters and it is worth considering whether your code has introduced performance regressions. *pandas* is in the process of migrating to the [asv library](https://github.com/spacetelescope/asv) to enable easy monitoring of the performance of critical *pandas* operations. These benchmarks are all found in the `pandas/asv_bench` directory. asv supports both python2 and python3.
@@ -356,7 +395,7 @@ It can also be useful to run tests in your current environment. You can simply d
356395

357396
This command is equivalent to:
358397

359-
asv run --quick --show-stderr --python=same
398+
asv run --quick --show-stderr --python=same
360399

361400
This will launch every test only once, display stderr from the benchmarks, and use your local `python` that comes from your `$PATH`.
362401

doc/source/comparison_with_sql.rst

Lines changed: 99 additions & 0 deletions
@@ -372,10 +372,109 @@ In pandas, you can use :meth:`~pandas.concat` in conjunction with
372372
373373
pd.concat([df1, df2]).drop_duplicates()
374374
375+
Pandas equivalents for some SQL analytic and aggregate functions
376+
----------------------------------------------------------------
377+
Top N rows with offset
378+
379+
.. code-block:: sql
380+
381+
-- MySQL
382+
SELECT * FROM tips
383+
ORDER BY tip DESC
384+
LIMIT 10 OFFSET 5;
385+
386+
In pandas:
387+
388+
.. ipython:: python
389+
390+
tips.nlargest(10+5, columns='tip').tail(10)
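
As a rough cross-check of the expression above, the same rows can be obtained by sorting and slicing by position. This is only a sketch: the small frame below is a hypothetical stand-in for the ``tips`` dataset, and its tip values are distinct so that tie ordering does not matter.

```python
import pandas as pd

# Hypothetical stand-in for the tips dataset, with distinct tip values.
tips = pd.DataFrame({"tip": [float(i) for i in range(1, 21)]})

limit, offset = 10, 5

# Approach from the text: take the top (limit + offset) rows,
# then keep only the last `limit` of them.
via_nlargest = tips.nlargest(limit + offset, columns="tip").tail(limit)

# Alternative: sort descending and slice by position.
via_slice = tips.sort_values("tip", ascending=False).iloc[offset:offset + limit]

print(via_nlargest["tip"].tolist())
print(via_slice["tip"].tolist())
```

Both variants skip the five largest tips and return the next ten. With tied values the two approaches may order the ties differently.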
391+
392+
Top N rows per group
393+
394+
.. code-block:: sql
395+
396+
-- Oracle's ROW_NUMBER() analytic function
397+
SELECT * FROM (
398+
SELECT
399+
t.*,
400+
ROW_NUMBER() OVER(PARTITION BY day ORDER BY total_bill DESC) AS rn
401+
FROM tips t
402+
)
403+
WHERE rn < 3
404+
ORDER BY day, rn;
405+
406+
Let's add a helper column, ``rn`` (row number):
407+
408+
.. ipython:: python
409+
410+
(tips.assign(rn=tips.sort_values(['total_bill'], ascending=False)
411+
.groupby(['day'])
412+
.cumcount() + 1)
413+
.query('rn < 3')
414+
.sort_values(['day', 'rn'])
415+
)
416+
417+
The same result using the ``rank(method='first')`` function:
418+
419+
.. ipython:: python
420+
421+
(tips.assign(rnk=tips.groupby(['day'])['total_bill']
422+
.rank(method='first', ascending=False))
423+
.query('rnk < 3')
424+
.sort_values(['day', 'rnk'])
425+
)
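
When the row number itself is not needed, a shorter route to "top N rows per group" is to sort first and then take the head of each group. The sketch below assumes a small hypothetical frame in place of ``tips``; it is an alternative to the helper-column approach, not a replacement for it.

```python
import pandas as pd

# Hypothetical stand-in for the tips dataset.
tips = pd.DataFrame({
    "day": ["Sat", "Sat", "Sat", "Sun", "Sun", "Sun"],
    "total_bill": [10.0, 30.0, 20.0, 15.0, 5.0, 25.0],
})

# Sort by total_bill descending, then keep the first two rows of each day.
top2 = (tips.sort_values("total_bill", ascending=False)
            .groupby("day")
            .head(2)
            .sort_values(["day", "total_bill"], ascending=[True, False]))

print(top2)
```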
426+
427+
.. code-block:: sql
428+
429+
-- Oracle's RANK() analytic function
430+
SELECT * FROM (
431+
SELECT
432+
t.*,
433+
RANK() OVER(PARTITION BY sex ORDER BY tip) AS rnk
434+
FROM tips t
435+
WHERE tip < 2
436+
)
437+
WHERE rnk < 3
438+
ORDER BY sex, rnk;
439+
440+
Let's find tips with (rank < 3) per gender group for (tips < 2).
441+
Notice that when using the ``rank(method='min')`` function,
442+
``rnk_min`` remains the same for the same ``tip``
443+
(as Oracle's RANK() function)
444+
445+
.. ipython:: python
446+
447+
(tips[tips['tip'] < 2]
448+
.assign(rnk_min=tips.groupby(['sex'])['tip']
449+
.rank(method='min'))
450+
.query('rnk_min < 3')
451+
.sort_values(['sex', 'rnk_min'])
452+
)
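
To see how the ``method`` argument of ``rank`` treats ties, the sketch below ranks a tiny hypothetical frame three ways. As a rough correspondence, ``'first'`` behaves like ``ROW_NUMBER()``, ``'min'`` like ``RANK()``, and ``'dense'`` like ``DENSE_RANK()``.

```python
import pandas as pd

# Hypothetical data with tied tips inside each group.
tips = pd.DataFrame({
    "sex": ["F", "F", "F", "M", "M"],
    "tip": [1.0, 1.0, 1.5, 1.25, 1.25],
})

grouped = tips.groupby("sex")["tip"]

ranks = tips.assign(
    rnk_first=grouped.rank(method="first"),  # ties broken by order of appearance
    rnk_min=grouped.rank(method="min"),      # ties share the lowest rank, gaps follow
    rnk_dense=grouped.rank(method="dense"),  # ties share a rank, no gaps
)

print(ranks)
```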
453+
375454
376455
UPDATE
377456
------
378457

458+
.. code-block:: sql
459+
460+
UPDATE tips
461+
SET tip = tip*2
462+
WHERE tip < 2;
463+
464+
.. ipython:: python
465+
466+
tips.loc[tips['tip'] < 2, 'tip'] *= 2
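
The ``.loc`` assignment above modifies ``tips`` in place, much like the SQL ``UPDATE``. A minimal sketch on a hypothetical four-row frame shows that form next to a non-mutating variant built on ``Series.mask``, which returns a new frame and leaves the original untouched.

```python
import pandas as pd

# Hypothetical stand-in for the tips dataset.
tips = pd.DataFrame({"tip": [1.0, 1.5, 2.0, 3.5]})

# In-place form, applied to a copy so the original stays available.
updated = tips.copy()
updated.loc[updated["tip"] < 2, "tip"] *= 2

# Non-mutating form: replace values where the condition holds.
via_mask = tips.assign(tip=tips["tip"].mask(tips["tip"] < 2, tips["tip"] * 2))

print(updated["tip"].tolist())
print(via_mask["tip"].tolist())
```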
379467
380468
DELETE
381469
------
470+
471+
.. code-block:: sql
472+
473+
DELETE FROM tips
474+
WHERE tip > 9;
475+
476+
In pandas, we select the rows that should remain instead of deleting them:
477+
478+
.. ipython:: python
479+
480+
tips = tips.loc[tips['tip'] <= 9]
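
One subtlety is worth sketching: selecting with ``<= 9`` also drops rows whose tip is missing, whereas the SQL ``DELETE ... WHERE tip > 9`` leaves ``NULL`` rows in place. The hypothetical frame below contains one missing value to show the difference. Negating the delete condition keeps such rows.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the tips dataset, including one missing tip.
tips = pd.DataFrame({"tip": [1.0, 9.0, 9.5, np.nan]})

# Selecting the rows to keep with <= also removes the NaN row.
kept_le = tips.loc[tips["tip"] <= 9]

# Negating the delete condition keeps the NaN row, as SQL would.
kept_not_gt = tips.loc[~(tips["tip"] > 9)]

# Equivalent to the negated form: drop the matching rows by index label.
dropped = tips.drop(tips[tips["tip"] > 9].index)

print(len(kept_le), len(kept_not_gt), len(dropped))
```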
