for people with large data sets
WARC: A developing standard for web archiving. WARC provides a convenient, future-proof way to store web pages and their associated metadata while preserving as much information as possible. There is currently no reference implementation or any other implementation, though at least one is forthcoming.
ARC: A standard for web archiving, used by Heritrix. ARC is a simple stream of web pages stored in a single document with minimal metadata; it has been superseded by WARC.
Heritrix: An open-source, extensible web crawler developed by the Internet Archive.
Get more IPs: Servers will often cut you off if you hit them too hard from the same IP. You can get more IPs from Tor and anonymous proxies.
Note: AOL's proxies now send the X-Forwarded-For header, which reveals the originating IP to the server, so they are no longer useful for this purpose.
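To illustrate why an X-Forwarded-For header defeats a proxy as a source of "new" IPs, here is a minimal sketch of how a server might recover the originating address. The function name `client_ip` and the sample addresses (taken from the reserved documentation ranges) are assumptions for illustration, not any particular server's implementation.

```python
def client_ip(headers, peer_ip):
    """Return the address a server would likely attribute a request to.

    headers: dict of request headers (hypothetical input shape).
    peer_ip: address of the socket peer, i.e. the proxy if one is used.
    """
    # HTTP header names are case-insensitive, so normalize the keys first.
    normalized = {name.lower(): value for name, value in headers.items()}
    forwarded = normalized.get("x-forwarded-for", "")

    # The header holds a comma-separated chain of addresses; by convention
    # the left-most entry is the original client.
    hops = [hop.strip() for hop in forwarded.split(",") if hop.strip()]

    # Without the header, the server only sees the socket peer.
    return hops[0] if hops else peer_ip


# Through a proxy that adds the header, the original client is exposed.
print(client_ip({"X-Forwarded-For": "203.0.113.7"}, "198.51.100.2"))

# Through a proxy that does not add it, only the proxy address is visible.
print(client_ip({}, "198.51.100.2"))
```

The header is supplied by whoever sends the request, so a server that trusts it blindly can be misled; real deployments usually honor it only when the peer is a known proxy.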
last modified 1 day ago