DLRL Hadoop cluster

A 20-node Hadoop cluster running Cloudera Hadoop 5.12.0.

1. Hadoop Service

2. Tweet Collections and Services

Project | Collection name  | Total # of tweets | Started at | Collection tool | Analysis service
IDEAL   | Archive DB       | 2,093,744,376     | 2012       | yTK [1]         | -
IDEAL   | 1% sampling      | 74,807,578        | 2015       | DMI-TCAT [2]    | Analysis
IDEAL   | User following   | 11,212,862        | 2015       | DMI-TCAT [2]    | Analysis
IDEAL   | Keyword tracking | 21,646,879        | 2015       | DMI-TCAT [2]    | Analysis
GETAR   | Collection       | 353,211,619       | 2015       | yTK [1]         | -
GETAR   | Collection       | 2,434,150,825     | 2016.9     | SFM [3]         | -
NIH     | Keyword tracking | 660,660           | 2015       | DMI-TCAT [2]    | Analysis
Total   | Google Table     | 4,989,434,799     | -          | -               | -
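
The total in the last row can be cross-checked against the per-collection counts. The following is a minimal Python sketch with the counts copied from the table above; the variable names and the "(yTK)" / "(SFM)" labels that tell the two GETAR rows apart are illustrative, not part of the original table.

```python
# Tweet counts copied from the collection table.
# Each entry is (project, collection label, number of tweets).
collections = [
    ("IDEAL", "Archive DB", 2_093_744_376),
    ("IDEAL", "1% sampling", 74_807_578),
    ("IDEAL", "User following", 11_212_862),
    ("IDEAL", "Keyword tracking", 21_646_879),
    ("GETAR", "Collection (yTK)", 353_211_619),
    ("GETAR", "Collection (SFM)", 2_434_150_825),
    ("NIH", "Keyword tracking", 660_660),
]

# Total as reported in the last row of the table.
reported_total = 4_989_434_799

# Sum over all collections.
computed_total = sum(count for _, _, count in collections)

# Subtotals per project.
per_project = {}
for project, _, count in collections:
    per_project[project] = per_project.get(project, 0) + count

print(f"computed total: {computed_total:,}")
print(f"matches reported total: {computed_total == reported_total}")
for project, subtotal in per_project.items():
    print(f"{project}: {subtotal:,}")
```

The same list can be extended when a new collection is added, so the reported total stays in sync with the individual rows.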

Open-source tools used to collect the tweets (numbers match the references in the Collection tool column):

  1. yourTwapperKeeper (yTK)
  2. DMI-TCAT
  3. Social Feed Manager (SFM)

3. Web Collections

Project | Collection name                    | Hosted by            | Service    | Location
IDEAL   | IA webpage collection              | Internet Archive     | Archive-it | IA link
IDEAL   | IA webpage collection (downloaded) | Virginia Tech (DLRL) | Hadoop     | /data/IACollections on the head node
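
As a rough illustration of how the downloaded collection might be inspected, the sketch below lists /data/IACollections with the standard `hdfs dfs -ls` command. It assumes the path refers to HDFS and that the `hdfs` client is on the PATH of the machine where it runs; if the path is instead a local directory on the head node, a plain `ls` would be the equivalent. The function names are hypothetical.

```python
import shutil
import subprocess

# Location taken from the table above; assumed here to be an HDFS path.
IA_COLLECTIONS_PATH = "/data/IACollections"


def build_ls_command(path):
    """Return the argument list for `hdfs dfs -ls <path>`."""
    return ["hdfs", "dfs", "-ls", path]


def list_collections(path=IA_COLLECTIONS_PATH):
    """Return the listing as text, or None if the hdfs client is not installed."""
    if shutil.which("hdfs") is None:
        return None
    result = subprocess.run(
        build_ls_command(path),
        capture_output=True,
        text=True,
        check=False,  # do not raise if the path is missing; just return the output
    )
    return result.stdout


if __name__ == "__main__":
    listing = list_collections()
    if listing is None:
        print("hdfs client not found on this machine")
    else:
        print(listing)
```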