20-node Hadoop cluster running Cloudera Hadoop 5.12.0
Project | Collection name | Total # of tweets | Started at | Collection tool | Analysis service |
---|---|---|---|---|---|
IDEAL | Archive DB | 2,018,076,369 | 2012 | yTK 1) | |
IDEAL | 1% sampling | 74,807,578 | 2015 | DMI-TCAT 2) | Analysis |
IDEAL | User following | 11,212,862 | 2015 | DMI-TCAT 2) | Analysis |
IDEAL | Keyword tracking | 21,646,879 | 2015 | DMI-TCAT 2) | Analysis |
GETAR | Collection | 316,389,863 | 2015 | yTK 1) | |
GETAR | Collection | 2,058,471,000 | 2016.9 | SFM 3) | |
NIH | Keyword tracking | 660,660 | 2015 | DMI-TCAT 2) | Analysis |
Total | Google Table | 4,501,265,211 | | | |
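
The total in the last row can be checked against the per-collection counts. The sketch below is only an illustration: the counts are copied from the table, while the variable names and the year labels added to tell the two GETAR collections apart are assumptions made for this example.

```python
# Tweet counts copied from the table above.
# The labels (including the years appended to the two GETAR rows)
# are illustrative only, not official collection identifiers.
collections = [
    ("IDEAL", "Archive DB", 2_018_076_369),
    ("IDEAL", "1% sampling", 74_807_578),
    ("IDEAL", "User following", 11_212_862),
    ("IDEAL", "Keyword tracking", 21_646_879),
    ("GETAR", "Collection (2015)", 316_389_863),
    ("GETAR", "Collection (2016.9)", 2_058_471_000),
    ("NIH", "Keyword tracking", 660_660),
]

# Grand total across all collections.
total = sum(count for _, _, count in collections)

# Subtotal per project.
per_project = {}
for project, _, count in collections:
    per_project[project] = per_project.get(project, 0) + count

print(f"Total: {total:,}")  # Total: 4,501,265,211
for project, count in per_project.items():
    print(f"{project}: {count:,}")
# IDEAL: 2,125,743,688
# GETAR: 2,374,860,863
# NIH: 660,660
```

The computed sum matches the 4,501,265,211 reported in the Total row.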
Open source tools for collecting tweets
Project | Collection name | Hosted by | Service | Location |
---|---|---|---|---|
IDEAL | IA webpage collection | Internet Archive | Archive-it | IA link |
IDEAL | IA webpage collection (downloaded) | Virginia Tech (DLRL) | Hadoop | /data/IACollections in head node |