Python for Data Science – Importing table data from a web page

This is another blog post about using the Pandas package. This time, I’ll show you how to import table data from a web page. For this to work, the page we access must contain a real HTML table built with table tags (table, tr, td). Unfortunately, many web sites no longer use “tables” and prefer “div” tags instead, so if this code doesn’t work, check the HTML source code of the page.
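If you are not sure whether a page has any table tags at all, a quick check with the standard library can help before you reach for Pandas. The following is only a minimal sketch written for Python 3: the class name TableFinder, the function find_table_ids and the sample HTML are made up for illustration, and you would pass in the real page source instead.

```python
from html.parser import HTMLParser


class TableFinder(HTMLParser):
    """Collects the id attribute (or None) of every <table> tag seen."""

    def __init__(self):
        super().__init__()
        self.table_ids = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag names before calling this method
        if tag == "table":
            self.table_ids.append(dict(attrs).get("id"))


def find_table_ids(html):
    """Return one entry per <table> tag found in the given HTML string."""
    finder = TableFinder()
    finder.feed(html)
    return finder.table_ids


# Hypothetical page source, used here instead of a real download
sample_html = """
<html><body>
  <div class="rates">No table here</div>
  <table id="rates_table"><tr><td>1.27</td></tr></table>
  <table><tr><td>no id</td></tr></table>
</body></html>
"""

print(find_table_ids(sample_html))
```

An empty list means the page has no table tags, so there is nothing for read_html to pick up. Otherwise, the ids in the list are candidates for selecting a single table.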

For testing purposes, I’ll try to fetch exchange rates from the CNN Money International web site. There are two tables on the page, one for the exchange rates and one for the world markets.

The Python code is very simple:

import pandas as pd

df_list = pd.read_html( "http://money.cnn.com/data/currencies/" )

print(df_list)

Output:

[         Currencies       $1=  Change inU.S. dollars % Change  \
0  Argentinean Peso   17.6299                 0.0099  +0.056%   
1    Brazilian Real    3.2817                -0.0244  -0.738%   
2   Canadian Dollar    1.2760                 0.0001  +0.005%   
3      Chilean Peso  631.8000                -2.2000  -0.347%   
4    Dominican Peso   47.3280                -0.1750  -0.368%   
5      Mexican Peso   19.1027                -0.0645  -0.337%   

          52-week range  
0    14.85Today|||17.83  
1      3.04Today|||3.58  
2      1.21Today|||1.38  
3  604.04Today|||670.90  
4    45.29Today|||48.06  
5    17.45Today|||22.03  ,                    Index 1 day change     Level
0 NaN  Nikkei 225  Japan       +0.04%  22548.35
1 NaN   Hang Seng  China       -0.02%  28596.80
2 NaN  FTSE 100  England       +0.02%   7561.64
3 NaN     CAC 40  France       -0.23%   5505.17]

I examined the HTML code of the page and saw that these tables have different IDs. The ID of the exchange rates table is “wsod_currencyExhangeRatesTable”. I use this ID to fetch only the exchange rates table:

import pandas as pd

df_list = pd.read_html( "http://money.cnn.com/data/currencies/", attrs = {'id': 'wsod_currencyExhangeRatesTable'} )

print(df_list)

Output:

[         Currencies       $1=  Change inU.S. dollars % Change  \
0  Argentinean Peso   17.6300                 0.0100  +0.057%   
1    Brazilian Real    3.2838                -0.0223  -0.675%   
2   Canadian Dollar    1.2755                -0.0004  -0.031%   
3      Chilean Peso  631.8300                -2.1700  -0.342%   
4    Dominican Peso   47.3280                -0.1750  -0.368%   
5      Mexican Peso   19.0832                -0.0841  -0.439%   

          52-week range  
0    14.85Today|||17.83  
1      3.04Today|||3.58  
2      1.21Today|||1.38  
3  604.04Today|||670.90  
4    45.29Today|||48.06  
5    17.45Today|||22.03  ]

The read_html function returns a list of DataFrames even if there’s only one table. We need to use an index (i.e. df_list[0]) to access the first table.
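To see this list behaviour without depending on a live web page, here is a small sketch for Python 3 that feeds read_html an inline HTML string. The HTML, the table ids ("rates", "markets") and the variable names are invented for the example, and it assumes that an HTML parser Pandas can use (lxml, or BeautifulSoup with html5lib) is installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical page with two tables; the ids are made up for this example
html = """
<table id="rates">
  <thead><tr><th>Currency</th><th>Rate</th></tr></thead>
  <tbody>
    <tr><td>Canadian Dollar</td><td>1.2760</td></tr>
    <tr><td>Mexican Peso</td><td>19.1027</td></tr>
  </tbody>
</table>
<table id="markets">
  <thead><tr><th>Index</th><th>Level</th></tr></thead>
  <tbody>
    <tr><td>CAC 40</td><td>5505.17</td></tr>
  </tbody>
</table>
"""

# Without attrs, every table on the page comes back as its own DataFrame
all_tables = pd.read_html(StringIO(html))

# With attrs, only the matching table is returned, but still inside a list
only_rates = pd.read_html(StringIO(html), attrs={"id": "rates"})

print(len(all_tables), len(only_rates))

# Index into the list to get the DataFrame itself
rates = only_rates[0]
print(rates)
```

Wrapping the string in StringIO makes read_html treat it as a file-like object, which avoids any ambiguity between a URL and literal HTML.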

You probably noticed that the last column contains both the min and max values, and it would be better to split them into separate columns. Here’s the script:

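# -*- coding: utf-8 -*-
"""
Created on Tue Nov  6 16:01:21 2017

@author: Gokhan Atil
"""

import pandas as pd

def main():
    """ main """
    # Fetch only the exchange rates table; read_html returns a list
    df = pd.read_html("http://money.cnn.com/data/currencies/",
                      attrs={'id': 'wsod_currencyExhangeRatesTable'})[0]
    # Assumes the last column looks like "14.85Today|||17.83", as in the
    # output above; the pipes are escaped because the pattern is a regex
    parts = df['52-week range'].str.split(r'Today\|\|\|', expand=True)
    df['min'] = parts[0]
    df['max'] = parts[1]
    # Drop the combined column now that min and max have their own columns
    df = df.drop(columns=['52-week range'])
    print(df)

main()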

and the output:

         Currencies       $1=  Change inU.S. dollars % Change     min     max
0  Argentinean Peso   17.6290                 0.0090  +0.051%   14.85   17.83
1    Brazilian Real    3.2899                -0.0162  -0.490%    3.04    3.58
2   Canadian Dollar    1.2761                 0.0002  +0.016%    1.21    1.38
3      Chilean Peso  632.3000                -1.7000  -0.268%  604.04  670.90
4    Dominican Peso   47.3280                -0.1750  -0.368%   45.29   48.06
5      Mexican Peso   19.1031                -0.0641  -0.334%   17.45   22.03
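If you want to try the column split without a network connection, the same idea can be sketched on a few hand-typed rows. This is an illustrative variant for Python 3, not necessarily the approach used for the output above: it uses str.extract with a regular expression and also converts the results to floats, and the sample DataFrame only copies three values from the output shown earlier.

```python
import pandas as pd

# Sample rows copied from the output shown earlier in this post
df = pd.DataFrame({
    "Currencies": ["Argentinean Peso", "Brazilian Real", "Canadian Dollar"],
    "52-week range": ["14.85Today|||17.83", "3.04Today|||3.58", "1.21Today|||1.38"],
})

# Capture the number before and after the literal "Today|||" separator;
# the pipes are escaped because "|" means "or" in a regular expression
parts = df["52-week range"].str.extract(r"^([\d.]+)Today\|\|\|([\d.]+)$")

# Convert the captured text to floats so the columns can be used in arithmetic
df["min"] = parts[0].astype(float)
df["max"] = parts[1].astype(float)

# The combined column is no longer needed
df = df.drop(columns=["52-week range"])

print(df)
```

Converting to floats is optional, but it lets you compute things such as df["max"] - df["min"] directly.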

So we successfully fetched the table data from a web site and parsed it. Did you see how easy it is to manipulate the columns of Pandas DataFrames? See you in the next blog post!
