This document shows you what data is available from a search of historical tweets. See demo tweet EDA for more demonstrations of how you can explore and visualize this data.

The search query used was "lang:en #CampFire OR #campfire OR #fire OR paradise OR @CALFIRE_ButteCo OR concow OR Pulga OR Magalia"

That is: only English-language tweets containing any of the listed hashtags or keywords, or mentioning CAL FIRE's Butte County account (@CALFIRE_ButteCo).

Let’s read in the sample data.

Amount of data

We currently have 72748 sampled tweets between 2013-11-07 15:31:57 and 2018-12-31 21:12:06.

Okay, the early start date is because we pulled full timelines, going back to around 2009, for the Town of Paradise and Chico FD. This matters because I’m interested in how frequent users like these tweeted before, during, and after the fire. For now, let’s look at post-fire.

This means you will have to filter by date before analysis to get tweets about the actual Camp Fire!
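
As a minimal sketch of that filtering step, the base R code below keeps only rows on or after a cutoff. The toy data frame `tweets` and the cutoff date are illustrative assumptions, not the real sample. Only the column name created_at_pst comes from the variable list in this document.

```r
# Toy stand-in for the real sample; only created_at_pst is needed here.
tweets <- data.frame(
  status_id = c("a", "b", "c", "d"),
  created_at_pst = as.POSIXct(
    c("2013-11-07 15:31:57", "2018-11-07 06:58:30",
      "2018-11-08 09:15:00", "2018-12-31 21:12:06"),
    tz = "America/Los_Angeles"
  ),
  stringsAsFactors = FALSE
)

# How many tweets are there, and what period do they cover?
nrow(tweets)
range(tweets$created_at_pst)

# Hypothetical cutoff: keep tweets from the first sampled day onward.
cutoff <- as.POSIXct("2018-11-07 00:00:00", tz = "America/Los_Angeles")
fire_tweets <- tweets[tweets$created_at_pst >= cutoff, ]
nrow(fire_tweets)
```

The same filter could be written with dplyr's filter(). Base R is used here only to keep the sketch free of package dependencies.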

This sample contains tweets from the following time ranges:

  • 2018-11-07
    • 12 noon - 12:59pm
  • 2018-11-08
    • 5:56 am to 12 noon
    • 3:50 pm to 4pm
  • 2018-11-09
    • 7:09 am to 7:58 am
    • 2:56 pm to 2:59 pm

Plus a random sample of 22 hours between 11/09/18 and 12/31/18 (more to come).

Disclaimer: The hour blocks were randomly sampled. Not all tweets were obtained from each hour, e.g. 11/18/18 16:54 - 16:59, 12/29 10:39 - 10:59.

Full variable list

We can use the glimpse() function from dplyr to see the variable names, data types, and examples of data values.

## Rows: 72,748
## Columns: 97
## $ user_id                 <chr> "141875161", "779816688644526080", "22...
## $ status_id               <chr> "1060184727515226112", "10601844918035...
## $ created_at              <dttm> 2018-11-07 14:58:30, 2018-11-07 14:57...
## $ screen_name             <chr> "munashe12", "VisionsOfNapa", "jaying7...
## $ text                    <chr> "Is #fire a solid, a liquid, or a gas?...
## $ source                  <chr> "Twitter for Android", "Twitter Web Cl...
## $ reply_to_status_id      <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ reply_to_user_id        <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ reply_to_screen_name    <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ is_quote                <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FAL...
## $ is_retweet              <lgl> TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, ...
## $ favorite_count          <int> 0, 0, 0, 0, 12, 0, 0, 0, 0, 0, 4, 2, 0...
## $ retweet_count           <int> 0, 0, 0, 0, 8, 0, 0, 0, 0, 0, 2, 4, 0,...
## $ quote_count             <int> 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 1, 0, 0,...
## $ reply_count             <int> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ hashtags                <list> ["fire", <"fire", "weather">, <"fire"...
## $ symbols                 <list> [NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ urls_url                <list> ["youtu.be/YV8TT9LRBrY", "weather.gov...
## $ urls_t.co               <list> ["https://t.co/yxZtJ0R0cv", "https://...
## $ urls_expanded_url       <list> ["https://youtu.be/YV8TT9LRBrY", "htt...
## $ media_url               <list> [NA, NA, "http://pbs.twimg.com/media/...
## $ media_t.co              <list> [NA, NA, "https://t.co/nejQho8rc6", N...
## $ media_expanded_url      <list> [NA, NA, "https://twitter.com/KiingDG...
## $ media_type              <list> [NA, NA, "photo", NA, NA, NA, NA, "ph...
## $ ext_media_url           <list> [NA, NA, "http://pbs.twimg.com/media/...
## $ ext_media_t.co          <list> [NA, NA, "https://t.co/nejQho8rc6", N...
## $ ext_media_expanded_url  <list> [NA, NA, "https://twitter.com/KiingDG...
## $ ext_media_type          <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ mentions_user_id        <list> ["1435717328", "910623276778405888", ...
## $ mentions_screen_name    <list> ["Txtxndx", "ai6yrham", "KiingDG_", "...
## $ quoted_status_id        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ quoted_text             <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ quoted_created_at       <dttm> NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ quoted_source           <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ quoted_favorite_count   <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ quoted_retweet_count    <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ quoted_user_id          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ quoted_screen_name      <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ quoted_name             <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ quoted_followers_count  <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ quoted_friends_count    <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ quoted_statuses_count   <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ quoted_location         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ quoted_description      <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ quoted_verified         <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ retweet_status_id       <chr> "1060095946887979008", "10601825853246...
## $ retweet_text            <chr> "Is #fire a solid, a liquid, or a gas?...
## $ retweet_created_at      <dttm> 2018-11-07 09:05:44, 2018-11-07 14:50...
## $ retweet_source          <chr> "Twitter Web Client", "Twitter Web Cli...
## $ retweet_favorite_count  <int> 0, 2, 4, 135, NA, NA, NA, NA, 8, 117, ...
## $ retweet_retweet_count   <int> 1, 4, 2, 63, NA, NA, NA, NA, 20, 85, N...
## $ retweet_user_id         <chr> "1435717328", "910623276778405888", "2...
## $ retweet_screen_name     <chr> "Txtxndx", "ai6yrham", "KiingDG_", "sr...
## $ retweet_name            <chr> "Tatenda", "AI6YR", "DG \U0001f60e\U00...
## $ retweet_followers_count <int> 815, 13312, 886, 1722, NA, NA, NA, NA,...
## $ retweet_friends_count   <int> 5002, 837, 963, 201, NA, NA, NA, NA, 5...
## $ retweet_statuses_count  <int> 3584, 43535, 7346, 20370, NA, NA, NA, ...
## $ retweet_location        <chr> "Lille, France", "DM04", "Chesapeake, ...
## $ retweet_description     <chr> "\"Music is Life. That’s why our heart...
## $ retweet_verified        <lgl> FALSE, FALSE, FALSE, FALSE, NA, NA, NA...
## $ place_url               <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ place_name              <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ place_full_name         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ place_type              <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ country                 <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ country_code            <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ coords_coords           <list> [<NA, NA>, <NA, NA>, <NA, NA>, <NA, N...
## $ bbox_coords             <list> [<NA, NA, NA, NA, NA, NA, NA, NA>, <N...
## $ status_url              <chr> "https://twitter.com/munashe12/status/...
## $ name                    <chr> "munashe musoni", "Visions of Napa", "...
## $ location                <chr> "Zimbabwe", "Napa, California", "Somew...
## $ description             <chr> "Doctor of pharmacy(pharmd), MBA Marke...
## $ url                     <chr> NA, NA, NA, NA, NA, "http://www.Living...
## $ protected               <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FAL...
## $ followers_count         <int> 105, 603, 707, 163, 279, 293, 397, 39,...
## $ friends_count           <int> 139, 598, 633, 292, 254, 7, 488, 319, ...
## $ listed_count            <int> 1, 2, 1, 0, 3, 253, 1, 0, 5, 52, 1, 41...
## $ statuses_count          <int> 667, 7135, 1874, 23455, 334, 131307, 4...
## $ favourites_count        <int> 507, 16213, 1189, 71897, 875, 0, 41, 1...
## $ account_created_at      <dttm> 2010-05-09 08:25:26, 2016-09-24 22:56...
## $ verified                <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FAL...
## $ profile_url             <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ profile_expanded_url    <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ account_lang            <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ profile_banner_url      <chr> NA, "https://pbs.twimg.com/profile_ban...
## $ profile_background_url  <chr> "http://abs.twimg.com/images/themes/th...
## $ profile_image_url       <chr> "http://pbs.twimg.com/profile_images/6...
## $ lat                     <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ lng                     <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ created_at_pst          <dttm> 2018-11-07 06:58:30, 2018-11-07 06:57...
## $ year                    <dbl> 2018, 2018, 2018, 2018, 2018, 2018, 20...
## $ month                   <dbl> 11, 11, 11, 11, 11, 11, 11, 11, 11, 11...
## $ day                     <int> 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,...
## $ hour                    <int> 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,...
## $ minute                  <int> 58, 57, 57, 56, 56, 55, 55, 54, 53, 52...
## $ tweet_min               <dttm> 2018-11-07 06:59:00, 2018-11-07 06:58...
## $ tweet_hour              <dttm> 2018-11-07 07:00:00, 2018-11-07 07:00...
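
For reference, here is a small sketch of how a listing like the one above can be produced. The data frame `tweets` is a made-up stand-in with four of the column names from the real data. The fallback to str() is an assumption added so the sketch also runs where dplyr is not installed.

```r
# Made-up miniature version of the tweet data.
tweets <- data.frame(
  screen_name    = c("user_a", "user_b"),
  favorite_count = c(0L, 12L),
  is_retweet     = c(TRUE, FALSE),
  stringsAsFactors = FALSE
)
# A list column holds a vector of values for each row.
tweets$hashtags <- list("fire", c("fire", "weather"))

# glimpse() prints one line per column: name, type, first values.
if (requireNamespace("dplyr", quietly = TRUE)) {
  dplyr::glimpse(tweets)
} else {
  str(tweets)  # base R fallback with similar information
}

# The column types on their own, as a named character vector.
col_types <- vapply(tweets, function(x) class(x)[1], character(1))
col_types
```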

Below is a list of data types contained in this data set, and example variables that are that type.

  • <chr> character variables, or strings: screen_name and source
  • <dttm> date-time: created_at and retweet_created_at
  • <dbl>, <int> numeric (numbers): favorite_count, retweet_count
  • <lgl> logical (TRUE/FALSE): is_retweet, verified
  • <list> list: hashtags, coords_coords

Lists are special data types that can contain multiple entries per record. Here is what the hashtags variable looks like for the first 6 records.

## [[1]]
## [1] "fire"
## 
## [[2]]
## [1] "fire"    "weather"
## 
## [[3]]
## [1] "fire" "va"  
## 
## [[4]]
## [1] "NandhaGopalaKumaran" "NGKFeverBegins"     
## 
## [[5]]
## [1] "HudsonValley"   "fire"           "newyork"        "pleasantvalley"
## [5] "library"       
## 
## [[6]]
## [1] "Fire"    "recalls"
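
Because each record holds a vector rather than a single value, list columns usually need to be flattened or summarised before you can count anything. The sketch below shows a few base R ways to do that. The `hashtags` object is a hand-typed list loosely based on the output above, not the real column.

```r
# Hand-typed example list, loosely based on the records printed above.
hashtags <- list(
  "fire",
  c("fire", "weather"),
  c("fire", "va"),
  c("HudsonValley", "fire", "newyork"),
  c("Fire", "recalls")
)

# Number of hashtags attached to each tweet.
n_tags <- lengths(hashtags)

# Flatten to one long vector and count, ignoring capitalisation.
tag_counts <- sort(table(tolower(unlist(hashtags))), decreasing = TRUE)
head(tag_counts)

# Per-tweet check: does this tweet use #fire in any capitalisation?
has_fire <- vapply(hashtags, function(x) "fire" %in% tolower(x), logical(1))
```

Lowercasing first matters here because "fire" and "Fire" would otherwise be counted as different hashtags.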

Dates and Times

Date-time measures <dttm> can be tricky to work with, so I have already created a few simpler date/time variables for you. First note that Twitter stores all dates and times in Coordinated Universal Time (UTC, also known as GMT).

All times are stored in 24-hour (military) time. Midnight is 00:00, noon is 12:00, and your class runs from 19:00 to 22:00.

## # A tibble: 6 x 8
##   created_at          created_at_pst      month   day  hour minute
##   <dttm>              <dttm>              <dbl> <int> <int>  <int>
## 1 2013-11-07 23:31:57 2013-11-07 15:31:57    11     7    15     31
## 2 2013-11-08 01:29:53 2013-11-07 17:29:53    11     7    17     29
## 3 2013-11-08 03:09:23 2013-11-07 19:09:23    11     7    19      9
## 4 2013-11-09 16:53:45 2013-11-09 08:53:45    11     9     8     53
## 5 2013-11-19 23:20:40 2013-11-19 15:20:40    11    19    15     20
## 6 2013-11-26 22:40:05 2013-11-26 14:40:05    11    26    14     40
## # ... with 2 more variables: tweet_min <dttm>, tweet_hour <dttm>

  • created_at: this is in UTC. 15:00 (3pm) UTC is 07:00 (7am) Pacific Standard Time.
  • created_at_pst: this is our local time. This is the one we want to use.
  • I have not yet tested what happens across the daylight saving time change. For now I’m going to assume the functions in the lubridate package handle it appropriately, and will check later when we get tweets that span the time change.
  • year/month/day/hour/minute: separated components of the local date and time.
  • tweet_min: tweets are recorded to the second. This variable rounds the tweet time to the nearest minute.
  • tweet_hour: judging from the values shown earlier, the same idea rounded to the hour.
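
To make these variables concrete, here is a sketch of how they could be derived from a single UTC timestamp using base R only. This is not the code used to build the data set (which may have used lubridate). It assumes the "America/Los_Angeles" time zone name is available on your system. The example timestamp is the first created_at value in the variable list.

```r
# A timestamp as Twitter returns it, in UTC.
created_at <- as.POSIXct("2018-11-07 14:58:30", tz = "UTC")

# Same instant, displayed on the Pacific clock: only the tzone attribute changes.
created_at_pst <- created_at
attr(created_at_pst, "tzone") <- "America/Los_Angeles"
format(created_at_pst, "%Y-%m-%d %H:%M:%S")

# Separate components of the local time.
hour   <- as.integer(format(created_at_pst, "%H"))
minute <- as.integer(format(created_at_pst, "%M"))

# Round to the nearest minute and the nearest hour.
tweet_min  <- as.POSIXct(round(created_at_pst, "mins"))
tweet_hour <- as.POSIXct(round(created_at_pst, "hours"))

# A summer timestamp, for comparison: the offset from UTC is 7 hours, not 8.
summer <- as.POSIXct("2018-07-01 15:00:00", tz = "UTC")
format(summer, "%H:%M", tz = "America/Los_Angeles")
```

Using a named zone rather than a fixed 8-hour offset lets the time zone database decide between standard and daylight time, which is why the summer example comes out one hour later on the clock than a fixed offset would give.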

Geotagging

The location of the tweet is stored in the variables lat and lng.

How much location data do we even have?

## 
## FALSE  TRUE 
##  1501 71247

Very little. Only 1501 records out of the 72748 (about 2%) contain location data. This is important to remember when we think about generalizability: not everyone has their location tracking turned on.
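
A count like this can be sketched as follows, assuming a tweet counts as located when both lat and lng are present. The coordinates in the toy data frame are made up for illustration.

```r
# Toy data: two of five tweets carry coordinates (values are made up).
tweets <- data.frame(
  lat = c(NA, 39.76, NA, NA, 39.73),
  lng = c(NA, -121.62, NA, NA, -121.84)
)

# TRUE when both coordinates are present.
has_location <- !is.na(tweets$lat) & !is.na(tweets$lng)
table(has_location)

# Share of tweets with location data, as a percentage.
pct_located <- round(100 * mean(has_location), 1)
pct_located
```

Note that tabulating is.na(lat) instead flips the labels, so TRUE would then mean the location is missing.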

Media Variables

The media variables let you look at any videos, pictures or gifs that are attached to tweets. It’s important to note that not all of the tweets have media information.

For this particular version of the dataset, we have 72748 tweets and only 34.84% have media data.
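
One way to arrive at a percentage like this is to flag tweets whose media_url entry is not entirely missing. The sketch below assumes that definition, which may differ from the one used for the figure above. The URLs are made-up placeholders.

```r
# media_url is a list column: NA when there is no media, otherwise one or more urls.
# The urls below are made-up placeholders.
media_url <- list(
  NA,
  NA,
  "http://pbs.twimg.com/media/example1.jpg",
  NA,
  c("http://pbs.twimg.com/media/example2.jpg",
    "http://pbs.twimg.com/media/example3.jpg")
)

# A tweet has media if at least one entry is not NA.
has_media <- !vapply(media_url, function(x) all(is.na(x)), logical(1))

# Percentage of tweets with media.
pct_media <- round(100 * mean(has_media), 2)
pct_media
```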

The following examples use one media story in particular that I found interesting.

  • media_url: contains the url for the media (picture or video)
    • Can be viewed online

## [1] "http://pbs.twimg.com/media/Driv9H9U4AAMdoV.jpg"

http://pbs.twimg.com/media/Driv9H9U4AAMdoV.jpg

  • ext_media_url: contains the same url for the media (picture or video)
    • Same page as media_url
## [1] "http://pbs.twimg.com/media/Driv9H9U4AAMdoV.jpg"

http://pbs.twimg.com/media/Driv9H9U4AAMdoV.jpg

  • media_t.co: contains the shortened t.co url for the tweet that contains the media
    • Can be accessed online
## [1] "https://t.co/OdPj8DGgs8"

https://t.co/OdPj8DGgs8

  • ext_media_t.co: contains the same url as media_t.co
    • Can be accessed online
## [1] "https://t.co/OdPj8DGgs8"

https://t.co/OdPj8DGgs8

  • media_expanded_url: contains a url for the same tweet as media_t.co, but uses the full url instead of the shortened one.
## [1] "https://twitter.com/abcWNN/status/1060791248095600640/video/1"

https://twitter.com/abcWNN/status/1060791248095600640/video/1

  • ext_media_expanded_url: Another copy of the media_expanded_url
## [1] "https://twitter.com/abcWNN/status/1060791248095600640/video/1"

https://twitter.com/abcWNN/status/1060791248095600640/video/1

  • media_type: describes the type of media (video, photo, gif)
## [1] "photo"

Verified accounts

Verified accounts are accounts of public interest whose authenticity is denoted by a blue checkmark or “badge”. Typically, these accounts are maintained by users in music, acting, fashion, government, politics, religion, journalism, media, sports, business, and other key interest areas.

## 
## FALSE  TRUE 
## 67802  4946
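
The two counts above sum to 72748, which suggests the table counts tweets rather than accounts. The sketch below shows the difference between the two on a toy data frame with made-up user ids: an account that tweets often is counted once per tweet in the first table and once overall in the second.

```r
# Toy data: three accounts, six tweets (ids are made up).
tweets <- data.frame(
  user_id  = c("1", "1", "2", "3", "3", "3"),
  verified = c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Counts tweets: every row contributes.
table(tweets$verified)

# Counts accounts: keep one row per user first.
accounts <- unique(tweets[, c("user_id", "verified")])
table(accounts$verified)
```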

Official News Organizations

rtweet itself does not return any field indicating whether an account is an official news organization. To make that determination we can consider the number of followers an account has, whether it is verified, and whether its account description and/or username indicates that it is a news outlet. While this will give us decent results, there is room for error: we may miss some news organizations and include some accounts that are not news organizations.

## # A tibble: 6 x 20
##   user_id screen_name name  location description url   protected
##   <chr>   <chr>       <chr> <chr>    <chr>       <chr> <lgl>    
## 1 759251  CNN         CNN   <NA>     "It’s our ~ http~ FALSE    
## 2 807095  nytimes     The ~ New Yor~ "News tips~ http~ FALSE    
## 3 287854~ ABC         ABC ~ New Yor~ "All the n~ http~ FALSE    
## 4 487118~ XHNews      Chin~ Headqua~ "China ins~ <NA>  FALSE    
## 5 145119~ HuffPost    Huff~ <NA>     "At HuffPo~ http~ FALSE    
## 6 6017542 BreakingNe~ Brea~ NYC, LA~  <NA>       http~ FALSE    
## # ... with 13 more variables: followers_count <int>, friends_count <int>,
## #   listed_count <int>, statuses_count <int>, favourites_count <int>,
## #   account_created_at <dttm>, verified <lgl>, profile_url <chr>,
## #   profile_expanded_url <chr>, account_lang <lgl>,
## #   profile_banner_url <chr>, profile_background_url <chr>,
## #   profile_image_url <chr>
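
A heuristic along those lines could be sketched as below. The follower threshold, the keyword pattern, and the toy accounts are all hypothetical choices for illustration. They are not the rule that produced the table above.

```r
# Toy accounts (made up); column names follow the variable list.
users <- data.frame(
  screen_name     = c("BigNewsNet", "local_fan", "CityHerald", "verified_star"),
  description     = c("Breaking news from around the world", "I love hiking",
                      "Your daily news source", "Actor and musician"),
  followers_count = c(2500000L, 150L, 40000L, 900000L),
  verified        = c(TRUE, FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

min_followers <- 10000L                     # hypothetical threshold
news_pattern  <- "news|herald|times|post"   # hypothetical keywords

# Does the description or the username hint at a news outlet?
# grepl() returns FALSE for a missing description, so NAs are treated as no match.
mentions_news <- grepl(news_pattern, users$description, ignore.case = TRUE) |
  grepl(news_pattern, users$screen_name, ignore.case = TRUE)

# Combine the three signals.
likely_news <- users$verified &
  users$followers_count >= min_followers &
  mentions_news

users$screen_name[likely_news]
```

Requiring all three signals keeps false positives down at the cost of missing small or unverified local outlets. Loosening any one condition trades in the other direction.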

## R version 3.6.2 (2019-12-12)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 18362)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] ggplot2_3.2.1     stringr_1.4.0     rtweet_0.7.0      dplyr_0.8.99.9002
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.4        pillar_1.4.3      compiler_3.6.2   
##  [4] tools_3.6.2       digest_0.6.25     lubridate_1.7.4  
##  [7] jsonlite_1.6.1    evaluate_0.14     lifecycle_0.2.0  
## [10] tibble_3.0.0      gtable_0.3.0      pkgconfig_2.0.3  
## [13] rlang_0.4.5.9000  cli_2.0.2         yaml_2.2.1       
## [16] xfun_0.9          withr_2.1.2       httr_1.4.1       
## [19] knitr_1.24        generics_0.0.2    vctrs_0.2.99.9011
## [22] grid_3.6.2        tidyselect_1.0.0  glue_1.4.0       
## [25] R6_2.4.1          fansi_0.4.1       rmarkdown_1.15   
## [28] purrr_0.3.3       magrittr_1.5      scales_1.0.0     
## [31] ellipsis_0.3.0    htmltools_0.3.6   assertthat_0.2.1 
## [34] colorspace_1.4-1  labeling_0.3      utf8_1.1.4       
## [37] stringi_1.4.6     lazyeval_0.2.2    munsell_0.5.0    
## [40] crayon_1.3.4