Commit 689d491

DOC: move data reader docs to Remote Data Access top-level section
1 parent 9209224 commit 689d491

3 files changed, +218 -181 lines changed

doc/source/index.rst

Lines changed: 1 addition & 0 deletions
@@ -125,6 +125,7 @@ See the package overview for more detail about what's in the library.
     visualization
     rplot
     io
+    remote_data
     enhancingperf
     sparse
     gotchas

doc/source/io.rst

Lines changed: 0 additions & 181 deletions
@@ -2523,184 +2523,3 @@ Alternatively, the function :func:`~pandas.io.stata.read_stata` can be used
 
    import os
    os.remove('stata.dta')
-
-Data Reader
------------
-
-.. _io.data_reader:
-
-Functions from :mod:`pandas.io.data` extract data from various Internet
-sources into a DataFrame. Currently the following sources are supported:
-
-- Yahoo! Finance
-- Google Finance
-- St. Louis FED (FRED)
-- Kenneth French's data library
-
-Note that different sources support different kinds of data, so not all sources implement the same methods, and the data elements returned may also differ.
-
-Yahoo! Finance
-~~~~~~~~~~~~~~
-
-.. ipython:: python
-
-   import pandas.io.data as web
-   start = datetime.datetime(2010, 1, 1)
-   end = datetime.datetime(2013, 1, 27)
-   f = web.DataReader("F", 'yahoo', start, end)
-   f.ix['2010-01-04']
-
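Reader's note: ``.ix`` was later deprecated and removed from pandas, so the label-based row lookup in the example above can be sketched with ``.loc`` instead. The frame below is a synthetic stand-in with made-up prices (not real quotes and not the output of ``DataReader``), and its column names are assumptions chosen only for illustration.

```python
import datetime

import pandas as pd

# Synthetic stand-in for a price frame: the values are invented for
# illustration and are NOT real quotes for any ticker.
index = pd.date_range(datetime.datetime(2010, 1, 4), periods=3, freq="D")
f = pd.DataFrame(
    {"Open": [10.0, 10.5, 11.0], "Close": [10.2, 10.8, 11.1]},
    index=index,
)

# Label-based selection of a single day on a DatetimeIndex.
row = f.loc["2010-01-04"]
print(row)
```

Because the index has daily resolution, the date string selects exactly one row and the result is a ``Series`` keyed by column name.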
-Google Finance
-~~~~~~~~~~~~~~
-
-.. ipython:: python
-
-   import pandas.io.data as web
-   start = datetime.datetime(2010, 1, 1)
-   end = datetime.datetime(2013, 1, 27)
-   f = web.DataReader("F", 'google', start, end)
-   f.ix['2010-01-04']
-
-FRED
-~~~~
-
-.. ipython:: python
-
-   import pandas.io.data as web
-   start = datetime.datetime(2010, 1, 1)
-   end = datetime.datetime(2013, 1, 27)
-   gdp = web.DataReader("GDP", "fred", start, end)
-   gdp.ix['2013-01-01']
-
-
-Fama/French
-~~~~~~~~~~~
-
-The dataset names are listed at the `Fama/French Data Library
-<http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html>`_.
-
-.. ipython:: python
-
-   import pandas.io.data as web
-   ip = web.DataReader("5_Industry_Portfolios", "famafrench")
-   ip[4].ix[192607]
-
-
-World Bank panel data in Pandas
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-``Pandas`` users can easily access thousands of panel data series from the
-`World Bank's World Development Indicators <http://data.worldbank.org>`_
-by using the ``wb`` I/O functions.
-
-For example, to compare Gross Domestic Product per capita in constant dollars
-across North America, you would first use the ``search`` function to find the indicator:
-
-.. code:: python
-
-    In [1]: from pandas.io.wb import search, download
-
-    In [2]: search('gdp.*capita.*const').iloc[:,:2]
-    Out[2]:
-                         id                                               name
-    3242            GDPPCKD             GDP per Capita, constant US$, millions
-    5143     NY.GDP.PCAP.KD                 GDP per capita (constant 2005 US$)
-    5145     NY.GDP.PCAP.KN                      GDP per capita (constant LCU)
-    5147  NY.GDP.PCAP.PP.KD  GDP per capita, PPP (constant 2005 internation...
-
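The pattern passed to ``search`` is a regular expression. As a rough offline sketch of that kind of lookup, assuming case-insensitive matching against the indicator names (consistent with "GDP per Capita" being returned for a lowercase pattern above), the filter could look like the following. ``search_names`` and the small ``catalog`` table are hypothetical; the table is built from the ids and display-truncated names shown in this section, not fetched from the World Bank's servers.

```python
import pandas as pd

# Ids and (display-truncated) names copied from the outputs in this section.
catalog = pd.DataFrame(
    {
        "id": [
            "GDPPCKD",
            "NY.GDP.PCAP.KD",
            "NY.GDP.PCAP.KN",
            "NY.GDP.PCAP.PP.KD",
            "IT.CEL.SETS.FE.ZS",
            "IT.CEL.SETS.MA.ZS",
        ],
        "name": [
            "GDP per Capita, constant US$, millions",
            "GDP per capita (constant 2005 US$)",
            "GDP per capita (constant LCU)",
            "GDP per capita, PPP (constant 2005 internation...",
            "Mobile cellular telephone users, female (% of ...",
            "Mobile cellular telephone users, male (% of po...",
        ],
    }
)


def search_names(pattern, table=catalog):
    """Hypothetical helper: rows whose name matches `pattern`, ignoring case."""
    mask = table["name"].str.contains(pattern, case=False, regex=True)
    return table[mask]


hits = search_names("gdp.*capita.*const")
print(hits["id"].tolist())
```

Only the four GDP indicators survive the filter; the two cellphone series do not match the pattern.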
-Then you would use the ``download`` function to acquire the data from the World
-Bank's servers:
-
-.. code:: python
-
-    In [3]: dat = download(indicator='NY.GDP.PCAP.KD', country=['US', 'CA', 'MX'], start=2005, end=2008)
-
-    In [4]: print(dat)
-                          NY.GDP.PCAP.KD
-    country       year
-    Canada        2008  36005.5004978584
-                  2007  36182.9138439757
-                  2006  35785.9698172849
-                  2005  35087.8925933298
-    Mexico        2008  8113.10219480083
-                  2007  8119.21298908649
-                  2006  7961.96818458178
-                  2005  7666.69796097264
-    United States 2008  43069.5819857208
-                  2007  43635.5852068142
-                  2006   43228.111147107
-                  2005  42516.3934699993
-
-The resulting dataset is a properly formatted ``DataFrame`` with a hierarchical
-index, so it is easy to apply ``.groupby`` transformations to it:
-
-.. code:: python
-
-    In [6]: dat['NY.GDP.PCAP.KD'].groupby(level=0).mean()
-    Out[6]:
-    country
-    Canada           35765.569188
-    Mexico            7965.245332
-    United States    43112.417952
-    dtype: float64
-
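The grouping step can be tried without network access. The sketch below rebuilds the (country, year) hierarchical index by hand from the values printed above and then groups on each index level; it assumes a reasonably recent pandas and is an illustration, not the code ``download`` runs internally.

```python
import pandas as pd

# Values copied from the download output shown above, ordered 2008 -> 2005.
values = {
    "Canada": [36005.5004978584, 36182.9138439757, 35785.9698172849, 35087.8925933298],
    "Mexico": [8113.10219480083, 8119.21298908649, 7961.96818458178, 7666.69796097264],
    "United States": [43069.5819857208, 43635.5852068142, 43228.111147107, 42516.3934699993],
}
years = [2008, 2007, 2006, 2005]

# Two-level index: every country paired with every year, in the order above.
index = pd.MultiIndex.from_product([list(values), years], names=["country", "year"])
dat = pd.DataFrame(
    {"NY.GDP.PCAP.KD": [v for country in values for v in values[country]]},
    index=index,
)

# level=0 groups on the outer level (country); a level name works as well.
means = dat["NY.GDP.PCAP.KD"].groupby(level=0).mean()
by_year = dat["NY.GDP.PCAP.KD"].groupby(level="year").mean()
print(means)
```

Grouping on ``level="year"`` instead averages across the three countries for each year, which is the other natural cut of the same panel.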
-Now imagine you want to compare GDP to the share of people with cellphone
-contracts around the world.
-
-.. code:: python
-
-    In [7]: search('cell.*%').iloc[:,:2]
-    Out[7]:
-                         id                                               name
-    3990  IT.CEL.SETS.FE.ZS  Mobile cellular telephone users, female (% of ...
-    3991  IT.CEL.SETS.MA.ZS  Mobile cellular telephone users, male (% of po...
-    4027      IT.MOB.COV.ZS  Population coverage of mobile cellular telepho...
-
-Notice that this second search was much faster than the first one because
-``Pandas`` now has a cached list of available data series.
-
-.. code:: python
-
-    In [13]: ind = ['NY.GDP.PCAP.KD', 'IT.MOB.COV.ZS']
-    In [14]: dat = download(indicator=ind, country='all', start=2011, end=2011).dropna()
-    In [15]: dat.columns = ['gdp', 'cellphone']
-    In [16]: print(dat.tail())
-                            gdp  cellphone
-    country   year
-    Swaziland 2011  2413.952853       94.9
-    Tunisia   2011  3687.340170      100.0
-    Uganda    2011   405.332501      100.0
-    Zambia    2011   767.911290       62.0
-    Zimbabwe  2011   419.236086       72.4
-
-Finally, we use the ``statsmodels`` package to assess the relationship between
-our two variables using ordinary least squares regression. Unsurprisingly,
-populations in rich countries tend to use cellphones at a higher rate:
-
-.. code:: python
-
-    In [17]: import numpy as np
-    In [18]: import statsmodels.formula.api as smf
-    In [19]: mod = smf.ols("cellphone ~ np.log(gdp)", dat).fit()
-    In [20]: print(mod.summary())
-                                OLS Regression Results
-    ==============================================================================
-    Dep. Variable:              cellphone   R-squared:                       0.297
-    Model:                            OLS   Adj. R-squared:                  0.274
-    Method:                 Least Squares   F-statistic:                     13.08
-    Date:                Thu, 25 Jul 2013   Prob (F-statistic):            0.00105
-    Time:                        15:24:42   Log-Likelihood:                -139.16
-    No. Observations:                  33   AIC:                             282.3
-    Df Residuals:                      31   BIC:                             285.3
-    Df Model:                           1
-    ===============================================================================
-                      coef    std err          t      P>|t|      [95.0% Conf. Int.]
-    -------------------------------------------------------------------------------
-    Intercept      16.5110     19.071      0.866      0.393       -22.384    55.406
-    np.log(gdp)     9.9333      2.747      3.616      0.001         4.331    15.535
-    ==============================================================================
-    Omnibus:                       36.054   Durbin-Watson:                   2.071
-    Prob(Omnibus):                  0.000   Jarque-Bera (JB):              119.133
-    Skew:                          -2.314   Prob(JB):                     1.35e-26
-    Kurtosis:                      11.077   Cond. No.                         45.8
-    ==============================================================================
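To make the formula ``cellphone ~ np.log(gdp)`` concrete, here is a minimal NumPy-only sketch of the same model shape: an intercept column plus ``log(gdp)``, solved by least squares. The ``gdp`` values are synthetic (made up for illustration), and the responses are generated exactly from the fitted line reported in the summary above, so the solver simply recovers those two coefficients. It does not reproduce the 33-observation regression or its standard errors.

```python
import numpy as np

# Synthetic GDP-per-capita values (invented), with noise-free responses on the
# fitted line from the summary above: 16.5110 + 9.9333 * log(gdp).
gdp = np.array([400.0, 800.0, 2500.0, 3700.0, 40000.0])
cellphone = 16.5110 + 9.9333 * np.log(gdp)

# Design matrix for "cellphone ~ np.log(gdp)": a column of ones for the
# intercept and a column holding log(gdp).
X = np.column_stack([np.ones_like(gdp), np.log(gdp)])

# Ordinary least squares via the standard linear least-squares solver.
coef, *_ = np.linalg.lstsq(X, cellphone, rcond=None)
intercept, slope = coef
print(intercept, slope)
```

Because the regressor is logged, the slope is read per unit of ``log(gdp)``: with a coefficient of 9.9333, doubling GDP per capita corresponds to roughly 9.9333 × ln 2 ≈ 6.9 percentage points more coverage.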
