Project 3: 911 Call Data Mining
Data Mining
Association:
Apriori Algorithm
Supervised learning (classification)
Decision Tree
Bayesian Classification
Support Vector Machine (SVM)
K-Nearest Neighbor
Random Forest
Two approaches to avoid overfitting: prepruning, postpruning
Unsupervised learning (clustering)
Challenges:
This is a spatiotemporal data mining problem. We have LiDAR data for downtown Lincoln, consisting of 734,000 points; 911 service call data crawled from government sites; and NOAA weather data. We also contacted the electricity and water utilities, but they told us their data are sensitive and cannot be disclosed.
The big problem is that we have different kinds of location-based data, but the datasets do not fully overlap with one another. If the overlapping part of the datasets is small, the result is very likely to be overfitted. On the other hand, the datasets do not overlap exactly, so we cannot simply use all of them as they are.
So we first wrote a program to clean the data and to find the areas where the data points overlap the most. However, the data were so large that the program crashed on my computer. We then considered using the computer center to process the data, but it allocates only a small amount of memory to students. Finally, we came up with the idea of plotting each dataset in a different color and seeing which areas have high density. It worked!
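The density idea can also be checked numerically. The following is a minimal sketch, not the program we actually wrote: it bins each dataset into a coarse grid and keeps the cells where every dataset has at least a few points. The function names, the cell size, the threshold, and the toy coordinates are all assumptions made for illustration. Because only one counter per grid cell is kept, the points can be read one at a time, which may help when the full data do not fit in memory.

```python
from collections import Counter


def grid_counts(points, cell_size):
    """Count points per grid cell; `points` is any iterable of (x, y) pairs."""
    counts = Counter()
    for x, y in points:
        # Floor division maps a coordinate to the index of its grid cell.
        counts[(int(x // cell_size), int(y // cell_size))] += 1
    return counts


def dense_overlap_cells(datasets, cell_size, min_count):
    """Return the cells where every dataset has at least `min_count` points."""
    per_dataset = [grid_counts(pts, cell_size) for pts in datasets.values()]
    # Start from the cells of the first dataset and keep only shared cells.
    shared = set(per_dataset[0])
    for counts in per_dataset[1:]:
        shared &= set(counts)
    return sorted(
        cell for cell in shared
        if all(counts[cell] >= min_count for counts in per_dataset)
    )


# Hypothetical toy coordinates, not the project's data.
datasets = {
    "lidar": [(0.1, 0.2), (0.4, 0.9), (0.5, 0.5), (3.2, 3.1)],
    "calls_911": [(0.3, 0.3), (0.8, 0.1), (5.0, 5.0)],
    "weather": [(0.6, 0.6), (0.2, 0.7), (3.5, 3.5)],
}

cells = dense_overlap_cells(datasets, cell_size=1.0, min_count=2)
print(cells)  # → [(0, 0)]
```

Here only cell (0, 0) contains at least two points from all three toy datasets, so it is the one area a plot would show as dense in every color.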