Data Cleaning

What data processing steps were taken to prepare the data for analysis?

Data information

Kinda like a codebook. What information is available?

Data access

Where is the data? How can I get it into R to start exploring?

The processed data files are in our class Google Drive under Twitter project docs/data. A copy of these data files dated 03-29-2020, along with starter_code.Rmd, has been placed in a new RStudio Cloud project called Twitter project.

  • Click Dr. D’s Twitter project link in RStudio Cloud to make a copy of this workspace for yourself.
  • Delete the data set for the topic you are not studying.
  • Download the most recent version of the data from Google Drive and upload it to RStudio Cloud.

Hydration station

Do you want to help hydrate tweets for COVID-19? Requirements:

  • A Mac, Windows, or Linux computer. No iPads or Chromebooks.
  • A Twitter account.
  • Attention to detail and the ability to follow instructions.

If you’re in, here’s what to do.

  1. Read the tutorial from the Programming Historian.
  • Read only the linked section. Stop at “Outputs and how to use them” (the map). You can read the entire tutorial, but you’ll get your data a different way (see step 2).
  2. Go to the raw covid data folder in our class Google Drive.
  3. Pick a .txt file containing IDs to work with.
  4. Download this file to your computer. Open it with Notepad, TextEdit, or a similar plain-text editor.
  5. Go through and remove any truncated IDs that look like 1.24534345349534E+18 instead of 1234345934593845735.
  6. Follow the Programming Historian tutorial to hydrate those tweets!
  7. When done, upload the .json and the .txt files to the /have_been_hydrated folder, and the .csv to the main /covid-raw folder.
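
If the ID file is long, the clean-up step that removes truncated IDs can be tedious to do by hand. Here is a minimal Python sketch of one way to do it. It is not part of the tutorial, and it assumes the file holds one ID per line and that a complete ID is made up only of digits. The function names split_ids and clean_id_file are made up for this example.

```python
import re
from pathlib import Path

# Assumption: a complete tweet ID is a line made up only of digits.
# Anything else (for example 1.24534345349534E+18) is treated as truncated.
FULL_ID = re.compile(r"[0-9]+")


def split_ids(lines):
    """Separate complete tweet IDs from everything else."""
    kept, dropped = [], []
    for line in lines:
        value = line.strip()
        if not value:
            continue  # skip blank lines
        if FULL_ID.fullmatch(value):
            kept.append(value)
        else:
            dropped.append(value)
    return kept, dropped


def clean_id_file(src, dst):
    """Write the complete IDs from src to dst, one per line."""
    # utf-8-sig also reads files that some editors save with a leading BOM.
    lines = Path(src).read_text(encoding="utf-8-sig").splitlines()
    kept, dropped = split_ids(lines)
    Path(dst).write_text("".join(f"{i}\n" for i in kept), encoding="utf-8")
    return kept, dropped


if __name__ == "__main__":
    # Demo on the two example values from the step above.
    kept, dropped = split_ids(["1234345934593845735", "1.24534345349534E+18"])
    print("kept:", kept)
    print("dropped:", dropped)
```

To clean a downloaded file, you could call clean_id_file("ids.txt", "ids_clean.txt"), where both file names are placeholders for your own. A truncated ID cannot be repaired because its trailing digits are already lost, so the sketch only drops it. Check the dropped list before you hydrate so you know how many IDs were removed.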

Post any questions in the #coding-help Slack channel.