
Scraping Alpha is a Scrapy-powered web spider that (might) trawl the Scraping Alpha website and collate earnings call transcripts in an SQL database. I'm not sure how well it works.

scraping-websites

This repository was archived on 2022-08-01. You can view files and clone it, but you cannot change its state, for example by pushing or by creating new issues, pull requests, or comments.


rumps e4d86d9cdf h		2016-02-22 10:24:39 +00:00
.gitignore	Initial commit	2016-02-12 19:55:18 +00:00
firstfew.xml	Moving	2016-02-21 14:07:05 +00:00
firstfew.xml~	Moving	2016-02-21 14:07:05 +00:00
firstfewbackup.xml	Moving	2016-02-21 14:07:05 +00:00
patent_slurper.pl	Moving	2016-02-21 14:07:05 +00:00
patentlast.xml	Initial	2016-02-12 23:17:15 +00:00
patentsfirst.xml	Initial	2016-02-12 23:17:15 +00:00
README.md	h	2016-02-22 10:24:39 +00:00
step1.pl	Moving	2016-02-21 14:07:05 +00:00

README.md

PatentSlurp

Patent slurper for Dr Lars Hass, LUMS

TODO

Add a stripper for redundant XML tags
Harvest the data below from the Google dumps, 2001-2015:

                storage   display    value
variable name   type      format     label      variable label

sta             str2      %2s                   assg/state
cnt             str3      %3s                   assg/country
assgnum         byte      %8.0g                 assg/assignee seq. number (imc)
cty             str72     %72s                  assg/city
pdpass          long      %12.0g                Unique assignee number
ptype           str1      %9s                   patent type
patnum          long      %12.0g                patent number

Compare data with NBER data (http://eml.berkeley.edu/~bhhall/NBER06.html)
...
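
As a rough illustration of the first two TODO items, here is a minimal sketch of harvesting the listed variables while ignoring every other tag. It is written in Python for brevity, although the scripts in this repository are Perl. The sample record, the element names (`patent`, `patnum`, `ptype`, `assignee`, `city`, `state`, `country`), the `seq` attribute and the `harvest` function are all hypothetical; the real dump files use a larger schema that differs between years. `pdpass` is left out because it is not assumed to appear in the raw XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified record. The tag names are assumptions made for
# this sketch and do not reflect the actual layout of the dump files.
SAMPLE = """<patents>
  <patent>
    <patnum>6000001</patnum>
    <ptype>U</ptype>
    <drawings>3</drawings>
    <assignee seq="1">
      <city>Lancaster</city>
      <state>PA</state>
      <country>US</country>
    </assignee>
  </patent>
</patents>"""


def harvest(xml_text):
    """Return one dict per (patent, assignee) pair, keeping only the
    wanted variables and dropping every other tag (e.g. <drawings>)."""
    rows = []
    root = ET.fromstring(xml_text)
    for patent in root.iter("patent"):
        for assignee in patent.iter("assignee"):
            rows.append({
                "patnum": int(patent.findtext("patnum")),
                "ptype": patent.findtext("ptype", default=""),
                "assgnum": int(assignee.get("seq", "0")),
                "cty": assignee.findtext("city", default=""),
                "sta": assignee.findtext("state", default=""),
                "cnt": assignee.findtext("country", default=""),
            })
    return rows


if __name__ == "__main__":
    for row in harvest(SAMPLE):
        print(row)
```

Keying each row by the variable names from the table above should make the later comparison with the NBER data a plain column-by-column check.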

Useful Tools

http://codebeautify.org/xmlviewer
http://www.regexr.com/