Part 1, Project Overview Data Acquisition Report (Part 1 of 4)
Due at end of week 2, Sunday at 11:59 p.m. ET
Submit a two-page (minimum) report describing your analysis focus area, your initial set of requirements, your hypothesis, and a description of your data acquisition. Describe your data investigations, sources looked at, initial review of data obtained, and data formats encountered. Provide initial impressions of data validity and quality. Consider the following:
1. Research potential data sources in your area of interest. Many sources are available from the U.S. Government and private organizations. For instance, check out and select Datasets from the top.
2. Determine the focus area.
- Determine an area of interest in which to perform analysis and for which a data source is available.
3. Develop Initial Requirements.
Assess the results that the analysis is meant to achieve and develop specific requirements for the outcomes that you expect.
Identify the data and information needed as inputs to the analysis, based on the requirements of those directing the analysis or of the customers who will use the finished product.
Develop a set of questions that you will attempt to answer with your analysis.
Develop specific variables regarding a population that you will attempt to obtain.
Data may be numerical or categorical.
Avoid textual data unless you plan to perform some form of Natural Language Processing or word vectorization. This type of analysis is highly technical and not recommended.
4. Develop a hypothesis of what you expect to determine in your analysis.
A hypothesis is a supposition or proposed explanation based on limited evidence as a starting point for further investigation.
An initial hypothesis helps to guide your investigation and your search for data to support the hypothesis and/or the null hypothesis.
5. Collect Datasets and Information based on your chosen area of interest.
Collect data and information from a variety of sources, as required.
Evaluate your data sources for validity and quality.
Consider the following data characteristics:
Defined, Measurable, Unitized, Relatable, Normalized, and Quality.
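
As one possible way to begin the initial review of data validity and quality described above, the following minimal Python sketch counts missing values per column and labels each column as numerical, categorical, or mixed. It uses only the standard library. The sample data, the column names, and the helper names (is_number, profile_columns) are made up for illustration and are not part of the assignment; substitute the contents of your own dataset.

```python
import csv
import io

# Made-up sample standing in for a downloaded CSV file (illustration only).
SAMPLE_CSV = """region,population,median_income
A,1200,61000
B,3400,
C,560,79000
,1000,abc
"""


def is_number(value):
    """Return True if the text can be parsed as a number."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def profile_columns(csv_text):
    """Summarize each column: row count, missing count, and apparent type."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    report = {}
    for name in reader.fieldnames:
        # Treat absent or blank cells as missing values.
        values = [(row[name] or "").strip() for row in rows]
        present = [v for v in values if v]
        numeric = [v for v in present if is_number(v)]
        if not present:
            kind = "empty"
        elif len(numeric) == len(present):
            kind = "numerical"
        elif numeric:
            kind = "mixed"  # some numbers, some text: worth a closer look
        else:
            kind = "categorical"
        report[name] = {
            "rows": len(values),
            "missing": len(values) - len(present),
            "kind": kind,
        }
    return report


report = profile_columns(SAMPLE_CSV)
for name, stats in report.items():
    print(name, stats)
```

A column reported as "mixed" or with many missing values is a candidate for discussion in the report's section on initial impressions of data quality. This sketch does not check units, normalization, or relationships between datasets; those characteristics still need to be reviewed by hand.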