CSY-014. An exploratory data analysis of public school demographics.
PROJECT COMPLETED: January 2021
Exploratory Data Analysis
U.S. Department of Education Database | Public School Data by State
What is Exploratory Data Analysis?
Exploratory data analysis (EDA) is a technique used by data scientists to inspect, characterize and briefly summarize the contents of a dataset. EDA is often the first step when encountering a new or unfamiliar dataset. EDA helps the data scientist become acquainted with a dataset and test some basic assumptions about the data. By the end of the EDA process, some initial insights can be drawn from the dataset and a framework for further analysis or modeling is established.
Work Process:
Here’s the process I used to complete this project:
Pre-Work
In this exploratory data analysis, I explored a dataset of information on public schools in the United States. The underlying data is freely available to the public; I obtained it from the U.S. Department of Education website.
Step 1 - Prepare the Workspace
Since I was working with a fairly large dataset, I decided that Python was the best tool to use to analyze it. I used a Jupyter Notebook to run my code.
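In Python, the usual starting point for tabular data like this is the pandas library. Below is a minimal sketch of loading the data, assuming the download was saved as a CSV file with a header row. The helper name `load_school_data` and the example file name are hypothetical, since the project does not say how the file was stored.

```python
from io import StringIO

import pandas as pd


def load_school_data(source) -> pd.DataFrame:
    """Read a state-level table (one row per state) into a DataFrame.

    `source` may be a file path or any file-like object that pd.read_csv accepts.
    """
    df = pd.read_csv(source)
    # Exported spreadsheets often carry stray spaces in header cells; trim them.
    df.columns = df.columns.str.strip()
    return df


if __name__ == "__main__":
    # Hypothetical stand-in for the real file, e.g. load_school_data("schools_by_state.csv").
    sample = StringIO(" State , Male , Female \nAlabama,1,2\nAlaska,3,4\n")
    df = load_school_data(sample)
    print(df.columns.tolist())  # ['State', 'Male', 'Female']
```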
Step 2 - Describe the Characteristics of the Dataset
First, I wanted to wrap my head around the dataset by finding out exactly what size it was and specifically what kind of information (variables) it contained.
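Finding the size and contents of a dataset comes down to a few pandas attributes: `shape` for rows and columns, `columns` for the variable names, `dtypes` for their types, and a missing-value check for blank rows. The helper `describe_structure` is a hypothetical name, and the toy frame uses made-up column names and values.

```python
import pandas as pd


def describe_structure(df: pd.DataFrame) -> dict:
    """Return the basic facts that characterize a DataFrame."""
    n_rows, n_cols = df.shape
    return {
        "n_rows": n_rows,
        "n_cols": n_cols,
        "columns": df.columns.tolist(),
        # Data type of each column as a string, e.g. "int64" or "object".
        "dtypes": df.dtypes.astype(str).to_dict(),
        # Number of rows in which every value is missing.
        "blank_rows": int(df.isna().all(axis=1).sum()),
    }


if __name__ == "__main__":
    # Toy frame with made-up values; the third row is entirely blank.
    toy = pd.DataFrame({"State": ["A", "B", None], "Male": [10, 20, None]})
    info = describe_structure(toy)
    print(info["n_rows"], info["n_cols"], info["blank_rows"])  # 3 2 1
```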
Step 3 - Summarize the Dataset
Next, I did some basic calculations to produce a set of summary statistics.
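For a table of counts, pandas' `describe()` gives the standard summary statistics for each numeric column, and a column sum gives the national total. Below is a sketch, assuming the count columns are stored as numbers and any non-numeric column (such as the state name) should be skipped. The function name and toy data are illustrative only.

```python
import pandas as pd


def summary_statistics(df: pd.DataFrame) -> pd.DataFrame:
    """Summary statistics for each numeric (count) column of `df`.

    Each row of the result is one numeric column; the result's columns are
    count, mean, std, min, 25%, 50%, 75%, max, plus a national 'total'.
    """
    numeric = df.select_dtypes(include="number")
    stats = numeric.describe().T
    # describe() does not report the sum, which is what a national total needs.
    stats["total"] = numeric.sum()
    return stats


if __name__ == "__main__":
    # Toy values, not real enrollment figures.
    toy = pd.DataFrame({"State": ["A", "B", "C"], "Male": [10, 20, 30], "Female": [12, 18, 30]})
    print(summary_statistics(toy)[["mean", "50%", "total"]])
```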
Step 4 - Visualize the Dataset
Then, I made a number of visualizations that helped me to better understand what the data was saying.
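The project does not list which charts were made. A sorted horizontal bar chart of one demographic column is one common choice for state-level counts. Below is a sketch using matplotlib, assuming a `State` column plus one count column per group (hypothetical names).

```python
import matplotlib

# Render off-screen so this runs as a plain script; omit in a notebook to keep inline charts.
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd


def plot_top_states(df: pd.DataFrame, column: str, n: int = 10, state_col: str = "State"):
    """Horizontal bar chart of the n states with the largest values in `column`."""
    top = df.nlargest(n, column).set_index(state_col)[column]
    fig, ax = plt.subplots(figsize=(8, 0.4 * len(top) + 1))
    # Reverse the order so the largest bar is drawn at the top of the chart.
    ax.barh(top.index[::-1], top.values[::-1])
    ax.set_xlabel(f"{column} students")
    ax.set_title(f"Top {len(top)} states by {column} enrollment")
    fig.tight_layout()
    return fig, ax
```

Calling `plot_top_states(df, "Hispanic")` on the loaded table would draw one such chart. Looping over the demographic columns produces one chart per group.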
Step 5 - Identify Insights
Finally, I could draw some insights from the dataset. Here are the main takeaways:
The dataset consists of:
student enrollment
school staffing
student demographic information
There are 51 rows and 42 columns in the dataset. None of the rows are blank.
The dataset contains per-state totals of the number of students in two gender categories and seven race/ethnicity categories.
Total 2018-19 U.S. public school enrollment by demographic group:
25.8 million male students
24.4 million female students
473K American Indian/Alaska Native students
2.6 million Asian or Asian/Pacific Islander students
13.7 million Hispanic students
7.6 million Black students
23.7 million White students
176K Native Hawaiian/Pacific Islander students
2 million multiracial students
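The national totals above come from summing each demographic column across all state rows. Below is a sketch, assuming one numeric column per group (the column names are hypothetical). `humanize` is an illustrative helper that rounds counts into the same "25.8 million" / "473K" style.

```python
import pandas as pd


def national_totals(df: pd.DataFrame, columns: list) -> pd.Series:
    """Sum each demographic column across all states (rows)."""
    return df[columns].sum()


def humanize(count: float) -> str:
    """Format a count as '25.8 million', '2 million', or '473K'."""
    thousands = round(count / 1_000)
    if thousands >= 1_000:
        # One decimal place, dropping a trailing '.0' (so 2.0 becomes '2').
        millions = f"{count / 1_000_000:.1f}".rstrip("0").rstrip(".")
        return f"{millions} million"
    if thousands >= 1:
        return f"{thousands}K"
    return str(int(count))
```

As a sanity check, each student should appear in exactly one gender category and one race/ethnicity category. If so (an assumption about how the data was reported), the two sets of totals should agree. The rounded figures are consistent with that: 25.8 + 24.4 = 50.2 million by gender, versus 0.473 + 2.6 + 13.7 + 7.6 + 23.7 + 0.176 + 2 ≈ 50.25 million by race/ethnicity.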
The states with the highest number of Black public school students are Florida, Georgia and Texas.
The states with the highest number of Hispanic public school students are California and Texas.
The state with the highest number of Asian or Asian/Pacific Islander public school students is California. New York and Texas are a distant second and third.
The state with the highest number of American Indian/Alaska Native public school students is, by far, Oklahoma.
The state with the highest number of Native Hawaiian/Pacific Islander public school students is, by far, Hawaii.
The states with the highest number of White public school students are California and Texas.
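Rankings like these come from sorting each demographic column. Below is a sketch, assuming a `State` column plus one count column per group (hypothetical names). `lead_ratio` is an illustrative way to put a number on phrases like "by far" and "a distant second": it divides the leader's count by the runner-up's.

```python
import pandas as pd


def top_states(df: pd.DataFrame, column: str, n: int = 3, state_col: str = "State") -> list:
    """Return (state, count) pairs for the n largest values in `column`, largest first."""
    top = df.nlargest(n, column)
    return list(zip(top[state_col], top[column]))


def lead_ratio(df: pd.DataFrame, column: str) -> float:
    """How many times larger the leading state's count is than the runner-up's."""
    first, second = df[column].nlargest(2)
    return first / second


if __name__ == "__main__":
    # Toy values, not real enrollment figures.
    toy = pd.DataFrame({"State": ["A", "B", "C", "D"], "AIAN": [10, 130, 40, 5]})
    print(top_states(toy, "AIAN", n=2))
    print(lead_ratio(toy, "AIAN"))  # 3.25
```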