DLRL Hadoop cluster

A 20-node Hadoop cluster running Cloudera Hadoop 5.12.0.

1. Hadoop Service

2. Tweet Collections and Services

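| Project | Collection name  | Total # of tweets | Started at | Collection tool | Analysis service      |
|---------|------------------|-------------------|------------|-----------------|-----------------------|
| IDEAL   | Archive DB       | 1,957,682,304     | 2012       | yTK (1)         | Analysis using Hadoop |
| IDEAL   | 1% sampling      | 74,807,578        | 2015       | DMI-TCAT (2)    | Analysis              |
| IDEAL   | User following   | 11,212,862        | 2015       | DMI-TCAT (2)    | Analysis              |
| IDEAL   | Keyword tracking | 21,646,879        | 2015       | DMI-TCAT (2)    | Analysis              |
| GETAR   | Collection       | 284,554,189       | 2015       | yTK (1)         | Analysis using Hadoop |
| GETAR   | Collection       | 1,374,154,780     | 2016.9     | SFM (3)         | Analysis              |
| NIH     | Keyword tracking | 660,660           | 2015       | DMI-TCAT (2)    | Analysis              |
| Total   | Google Table     | 3,724,719,252     |            |                 |                       |

Numbers in parentheses after a collection tool refer to the tool list below.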
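
As a quick consistency check on the table above, the following minimal sketch re-derives the grand total and a per-tool breakdown from the listed counts. The counts are copied from the table; the names `COLLECTIONS` and `total_by_tool` are illustrative only and are not part of any cluster tooling.

```python
# Tweet counts copied from the table above: (project, collection, tool, count).
COLLECTIONS = [
    ("IDEAL", "Archive DB", "yTK", 1_957_682_304),
    ("IDEAL", "1% sampling", "DMI-TCAT", 74_807_578),
    ("IDEAL", "User following", "DMI-TCAT", 11_212_862),
    ("IDEAL", "Keyword tracking", "DMI-TCAT", 21_646_879),
    ("GETAR", "Collection", "yTK", 284_554_189),
    ("GETAR", "Collection", "SFM", 1_374_154_780),
    ("NIH", "Keyword tracking", "DMI-TCAT", 660_660),
]


def total_by_tool(rows):
    """Sum tweet counts per collection tool (hypothetical helper)."""
    totals = {}
    for _project, _name, tool, count in rows:
        totals[tool] = totals.get(tool, 0) + count
    return totals


grand_total = sum(count for *_, count in COLLECTIONS)
per_tool = total_by_tool(COLLECTIONS)

print(f"Grand total: {grand_total:,}")
for tool, count in sorted(per_tool.items()):
    print(f"{tool}: {count:,}")
```

The grand total computed this way matches the "Total" row of the table.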

Open source tools for collecting tweets

  1. yourTwapperKeeper (yTK)
  2. Digital Methods Initiative Twitter Capture and Analysis Toolset (DMI-TCAT)
  3. Social Feed Manager (SFM)

3. Web Collections

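| Project | Collection name       | Hosted by        | Service    | Location                         |
|---------|-----------------------|------------------|------------|----------------------------------|
| IDEAL   | IA webpage collection | Internet Archive | Archive-It | IA link                          |
| IDEAL   | IA webpage collection | Virginia Tech    | Hadoop     | /data/IACollections in head node |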