Turning pages into data ready for Excel and Stata
phk
phknrocket1k
Posted on: August 08, 2010, 07:53:38 PM

There is a lot of data on the web that is meant to be looked at by people. How do you turn it into a spreadsheet that you can actually analyze statistically?

The technique for turning web pages intended for people into structured data sets intended for computers is called "screen scraping." It has just been made easier by ScraperWiki, a wiki-style community site for scrapers: http://scraperwiki.com/

They provide libraries to extract information from PDF and Excel files, to fill in forms automatically, and to handle similar tasks. Moreover, the community aspect of it should help researchers who are doing similar things get connected. It's very good.

Two example scrapers hosted there:
Road Accidents - http://scraperwiki.com/scrapers/show/sefton-mbc-road-accidents/
Port of London Arrivals - http://scraperwiki.com/scrapers/show/port-of-london-arrivals/
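
To give a rough idea of what a scraper does, here is a minimal sketch in Python that pulls the rows out of an HTML table and writes them as CSV, which Excel opens directly and Stata can read with its "import delimited" command ("insheet" in older versions). It uses only the Python standard library, not ScraperWiki's own libraries. The sample page, the TableScraper class and the html_table_to_csv function are all made up for illustration; a real scraper would download the page first and would need to cope with messier HTML.

```python
import csv
import io
from html.parser import HTMLParser

# Made-up sample page (hypothetical data); a real scraper would
# download the HTML from the site first.
SAMPLE_HTML = """
<html><body>
<table>
  <tr><th>Ship</th><th>Arrived</th></tr>
  <tr><td>Example One</td><td>2010-08-01</td></tr>
  <tr><td>Example Two</td><td>2010-08-02</td></tr>
</table>
</body></html>
"""


class TableScraper(HTMLParser):
    """Collects the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row being built, or None outside <tr>
        self._cell = None   # text pieces of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def html_table_to_csv(html):
    """Return the table rows found in `html` as CSV text."""
    scraper = TableScraper()
    scraper.feed(html)
    scraper.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(scraper.rows)
    return out.getvalue()


csv_text = html_table_to_csv(SAMPLE_HTML)
print(csv_text)
```

Saving csv_text to a file ending in .csv gives you something a spreadsheet or statistics package can load. The header row from the page becomes the first line of the file, so it can serve as variable names on import.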

You can already find collections of structured data online. One example is Infochimps ("find the world's data"): http://infochimps.org/datasets

Another is Freebase ("An entity graph of people, places and things, built by a community that loves open data."): http://www.freebase.com/

There's also a repository system for data, TheData ("An open-source application for publishing, citing and discovering research data"). http://thedata.org/home