for people with large data sets
This is a site for large data sets and the people who love them: the scrapers and crawlers who collect them, the academics and geeks who process them, the designers and artists who visualize them. It's a place where they can exchange tips and tricks, develop and share tools together, and begin to integrate their particular projects.
| get: scrapers, crawlers, phone calls, buyouts | process: conversions, queries, regressions, collaborative filtering | view: tables, graphs, maps, web sites | 
| Mailing list. Join it today! There you can swap tips, questions, and success stories with others who are trying to get big data sets. | Mailing list. Please join us! Find an ongoing discussion with others trying to make sense of big data sets. | Mailing list. We need you! Join the conversation of people building tools for visualizing data. | 
| Tips and tricks. Share tricks for extracting data from those who don't want to give it up. | Tips and tricks. Help us document the best ways to make sense of big data dumps. | Tips and tricks. You know how to make sense of this stuff -- share your techniques in our wiki. | 
| Tools of the trade. Tell us about the things you found to help you solve your problems. | Tools of the trade. Found an amazing tool for processing data? Add it to our wiki. | Tools of the trade. What software do you turn to when you want to make sense of the data? | 
| Data sets. Help us build the most comprehensive list of big data sets available on the Web. | Results. What cool things have you found during your hours of analysis? | Visualizations. Add your site to our gallery -- you know we'd love to see it. | 
| How you can help. Got mad scraping skills? Find interesting projects that need your expert assistance. | How you can help. Find interesting projects that match your processing talents. | How you can help. Searching for some interesting data to look at? We'd love your help visualizing these data sets. | 
The bigger picture:
Some of us have spent years scraping news sites. Others have spent them downloading government data. Others have spent them grabbing catalog records for books. And each time, in each community, we reinvent the same things over and over again: scripts for doing crawls and notifying us when things are wrong, parsers for converting the data to RDF and XML, visualizers for plotting it on graphs and charts.
It's time to start sharing our knowledge and our tools. But more than that, it's time for us to start building a bigger picture together. To write robust crawl harnesses that deal gracefully with errors and notify us when a regexp breaks. To start converting things into common formats and making links between data sets. To build visualizers that will plot numbers on graphs or points on maps, no matter what the source of the input.
We've all been helping to build a Web of data for years now. It's time we acknowledge that and start doing it together.
last modified March 28, 2008