Scraping Alpha is a Scrapy-powered web spider that (might) trawl the Scraping Alpha website and collate earnings call transcripts in an SQL database. I'm not sure how well it works.
This repository has been archived on 2022-08-01. You can view files and clone it, but cannot push or open issues or pull requests.

PatentSlurp

Patent slurper for Dr Lars Hass, LUMS

TODO

  1. Add a stripper for redundant XML tags
  2. Harvest the variables below from the Google dumps, 2001-2015:

                  storage  display  value
    variable name   type   format   label   variable label

    sta             str2   %2s              assg/state
    cnt             str3   %3s              assg/country
    assgnum         byte   %8.0g            assg/assignee seq. number (imc)
    cty             str72  %72s             assg/city
    pdpass          long   %12.0g           Unique assignee number
    ptype           str1   %9s              patent type
    patnum          long   %12.0g           patent number

  3. Compare data with NBER data (http://eml.berkeley.edu/~bhhall/NBER06.html)
  4. ...
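
The first two TODO items could be sketched roughly as follows. This is a minimal illustration in Python, not the repository's Perl scripts, and it rests on several assumptions: the sample record, its tag names (`us-patent-grant`, `assignee`, `doc-number` and so on) and the `REDUNDANT_TAGS` set are hypothetical and may not match every year of the dumps. The helpers `strip_tags` and `harvest` are names invented for this example. `pdpass` and `ptype` are left empty because how they would be derived is not stated here.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: the tag names are assumptions for illustration only.
SAMPLE = """
<us-patent-grant>
  <us-bibliographic-data-grant>
    <publication-reference>
      <document-id><country>US</country><doc-number>07654321</doc-number></document-id>
    </publication-reference>
    <assignees>
      <assignee>
        <addressbook>
          <orgname>Example Corp</orgname>
          <address><city>Lancaster</city><state>PA</state><country>US</country></address>
        </addressbook>
      </assignee>
    </assignees>
  </us-bibliographic-data-grant>
  <description>Long text that is not needed.</description>
  <claims>More text that is not needed.</claims>
</us-patent-grant>
"""

# Assumed set of tags to discard; adjust once the real dumps are inspected.
REDUNDANT_TAGS = {"description", "claims"}


def strip_tags(root, unwanted):
    """Remove every descendant element whose tag is in `unwanted`, in place."""
    # Snapshot the tree first so removals do not disturb the iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag in unwanted:
                parent.remove(child)
    return root


def harvest(root):
    """Return one dict per assignee, keyed by the variable names in the table."""
    text = root.findtext(".//publication-reference/document-id/doc-number", default="")
    patnum = int(text) if text.isdigit() else None  # non-numeric numbers are skipped
    rows = []
    for seq, assignee in enumerate(root.iter("assignee"), start=1):
        rows.append({
            "patnum": patnum,
            "assgnum": seq,  # position of the assignee within the patent
            "sta": assignee.findtext(".//address/state"),
            "cnt": assignee.findtext(".//address/country"),
            "cty": assignee.findtext(".//address/city"),
            "ptype": None,   # derivation not specified; left empty
            "pdpass": None,  # derivation not specified; left empty
        })
    return rows


root = strip_tags(ET.fromstring(SAMPLE.strip()), REDUNDANT_TAGS)
rows = harvest(root)
print(rows)
```

Stripping before harvesting keeps each record small, which may matter when a single dump file holds many thousands of patents. The resulting rows could then be written out as CSV for the NBER comparison.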

Useful Tools

http://codebeautify.org/xmlviewer

http://www.regexr.com/