I’m working on a Computer Science exercise and need support.
For each data source listed below, do the following:
1. Write out the steps you’d use to clean/wrangle the data for your database. This can be a simple step-by-step process, pseudocode, SQL, Python, Perl, or another language. These steps can be used to clean the data, remove unnecessary information, or normalize it. If it’s just a website, write down what data you’d scrape from it.
2. Find one other dataset, public or private, that you could use in conjunction with this dataset.
3. What inferences could be made by using these two datasets together?
a. 2017-2014 Candy Hierarchy Data – Data from a survey across 4 years showing people’s preferences in Halloween candy. https://www.scq.ubc.ca/so-much-candy-data-seriousl…
b. FDA’s National Drug Code Directory – https://www.fda.gov/drugs/drug-approvals-and-datab…
c. The Avengers Death Database – https://github.com/fivethirtyeight/data/tree/maste…
d. Bachelor/Bachelorette Dataset – https://github.com/fivethirtyeight/data/blob/maste…
e. Daily Show Guests – https://github.com/fivethirtyeight/data/blob/maste…
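
To illustrate the kind of answer step 1 is asking for, here is a minimal Python sketch of some generic cleaning steps: normalizing column names, trimming whitespace, turning blanks into missing values, and dropping empty and duplicate rows. The function names, column names, and sample rows are all made up for illustration and are not taken from any of the datasets above; each real file would likely need its own rules.

```python
import csv
import io
import re


def normalize_header(name):
    """Lowercase a column name and replace runs of non-alphanumerics with '_'."""
    return re.sub(r"[^a-z0-9]+", "_", name.strip().lower()).strip("_")


def clean_rows(csv_text):
    """Return cleaned rows (a list of dicts) from raw CSV text."""
    reader = csv.reader(io.StringIO(csv_text))
    header = [normalize_header(h) for h in next(reader)]
    seen = set()
    cleaned = []
    for raw in reader:
        # Trim whitespace; empty strings become None (NULL in a database).
        values = [v.strip() or None for v in raw]
        # Skip rows that are entirely empty.
        if not any(values):
            continue
        # Skip exact duplicate rows.
        key = tuple(values)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(dict(zip(header, values)))
    return cleaned


# Hypothetical sample data, not taken from any of the listed datasets.
SAMPLE = """Guest Name , Show Date,Occupation
 Jane Doe ,2015-01-05, actor
 Jane Doe ,2015-01-05, actor
,,
John Roe,2015-01-06,
"""

rows = clean_rows(SAMPLE)
for row in rows:
    print(row)
```

The cleaned rows could then be loaded into a database table whose columns match the normalized header names.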