DLRL Hadoop cluster

A 20-node Hadoop cluster running Cloudera Hadoop 5.12.0.

1. Hadoop Service

2. Tweet Collections and Services

Project  Collection name   Total # of tweets  Started at  Collection tool  Analysis service
IDEAL    Archive DB            1,739,346,273  2012        yTK              1) Analysis using Hadoop
IDEAL    1% sampling              74,807,578  2015        DMI-TCAT         2) Analysis
IDEAL    User following           11,212,862  2015        DMI-TCAT         2) Analysis
IDEAL    Keyword tracking         21,646,879  2015        DMI-TCAT         2) Analysis
GETAR    Collection              164,682,889  2015        yTK              1) Analysis using Hadoop
GETAR    Collection              573,224,979  2016.9      SFM              3) Analysis
NIH      Keyword tracking            660,660  2015        DMI-TCAT         2) Analysis
Total    Google Table          2,585,582,120
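
The total in the last row can be checked by summing the per-collection counts. The following is a minimal Python sketch using only the figures listed above. The dictionary keys and the helper name `totals_by_project` are illustrative, not part of any DLRL tooling, and the two GETAR collections are distinguished here by their collection tool.

```python
from collections import defaultdict

# Tweet counts per collection, copied from the table above.
# Keys are (project, collection label); the labels are illustrative only.
tweet_counts = {
    ("IDEAL", "Archive DB"): 1_739_346_273,
    ("IDEAL", "1% sampling"): 74_807_578,
    ("IDEAL", "User following"): 11_212_862,
    ("IDEAL", "Keyword tracking"): 21_646_879,
    ("GETAR", "Collection (yTK)"): 164_682_889,
    ("GETAR", "Collection (SFM)"): 573_224_979,
    ("NIH", "Keyword tracking"): 660_660,
}

# Total reported in the last row of the table.
reported_total = 2_585_582_120


def totals_by_project(counts):
    """Sum tweet counts per project (hypothetical helper for this sketch)."""
    totals = defaultdict(int)
    for (project, _collection), n in counts.items():
        totals[project] += n
    return dict(totals)


computed_total = sum(tweet_counts.values())
project_totals = totals_by_project(tweet_counts)

print(f"computed total: {computed_total:,}")
print(f"matches reported total: {computed_total == reported_total}")
for project, n in project_totals.items():
    print(f"{project}: {n:,}")
```

The per-collection counts add up to the reported total, so the Total row is consistent with the rows above it.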

Open source tools for collecting tweets

  1. yourTwapperKeeper (yTK)
  2. Digital Methods Initiative Twitter Capture and Analysis Toolset (DMI-TCAT)
  3. Social Feed Manager (SFM)

3. Web Collections

Project  Collection name        Hosted by         Service     Location
IDEAL    IA webpage collection  Internet Archive  Archive-It  IA link
IDEAL    IA webpage collection  Virginia Tech     Hadoop      /data/IACollections in head node
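
As a rough illustration of how the locally hosted copy might be inspected, the sketch below lists `/data/IACollections` with the standard `hdfs dfs -ls` command. It assumes the path refers to HDFS and that the `hdfs` client is on the PATH of the head node; if the path is instead a local directory on the head node, a plain directory listing would be used. The function names are hypothetical.

```python
import shutil
import subprocess

# Location taken from the table above; assumed here to be an HDFS path.
COLLECTION_PATH = "/data/IACollections"


def build_listing_command(path=COLLECTION_PATH):
    """Return the HDFS shell command that lists the given path."""
    return ["hdfs", "dfs", "-ls", path]


def list_collections(path=COLLECTION_PATH):
    """Run the listing and return its text output.

    Returns None when the `hdfs` client is not installed, so the sketch
    can be run safely on a machine outside the cluster.
    """
    if shutil.which("hdfs") is None:
        return None
    result = subprocess.run(
        build_listing_command(path),
        capture_output=True,
        text=True,
        check=False,  # do not raise if the path is missing
    )
    return result.stdout


if __name__ == "__main__":
    output = list_collections()
    if output is None:
        print("hdfs client not found; run this on the cluster head node")
    else:
        print(output)
```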