Wednesday, April 5, 2017

Exercise 6: Normalizing, Geocoding, and Error Assessment

Goals and Objectives

This lab required tedious normalizing of Excel data about sand mine locations, because many of the addresses were incomplete or filled out incorrectly. The goal of the lab was to take those addresses and geocode them. Working with this data gave us a better understanding of why normalization is key to producing correct reference data: the more accurate and consistent the data is at the start, the easier it is to geocode and locate the mines on the map.

Methods

The first step in this lab was to obtain the sand mine data assigned to me and normalize it in an Excel spreadsheet. I had to add new fields such as street address, ZIP code, and state so that the table would work with the geocoder (Figure 1).

Figure 1. The Excel data given to me, normalized for the specific mines I had to geocode and locate.
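The normalization was done by hand in Excel, but the same idea can be sketched in Python. This is only a rough illustration: the raw strings, the assumed "number street, city, state ZIP" layout, the output field names, and the `NeedsManual` flag are all made up for the example and are not taken from the actual spreadsheet.

```python
import re

# Hypothetical raw entries; the real spreadsheet values are not shown in this post.
RAW_RECORDS = [
    "123 main st., Eau Claire, wi 54701",
    "NW 1/4 of Sec 12, T28N R9W",  # PLSS description only, no street address
]

# Assumed layout: "<number> <street>, <city>, <state> <5-digit ZIP>"
ADDRESS_RE = re.compile(
    r"^\s*(?P<street>\d+[^,]*),\s*(?P<city>[^,]+),"
    r"\s*(?P<state>[A-Za-z]{2})\s+(?P<zip>\d{5})\s*$"
)


def tidy(text):
    """Collapse extra whitespace and capitalize each word."""
    return " ".join(word.capitalize() for word in text.split())


def normalize(raw):
    """Split one raw address string into separate geocoder fields."""
    match = ADDRESS_RE.match(raw)
    if match is None:
        # No street address found (for example a PLSS description):
        # leave the fields blank and flag the row for manual placement.
        return {"Address": "", "City": "", "State": "", "ZIP": "",
                "NeedsManual": True}
    return {
        "Address": tidy(match.group("street")),
        "City": tidy(match.group("city")),
        "State": match.group("state").upper(),
        "ZIP": match.group("zip"),
        "NeedsManual": False,
    }


for record in RAW_RECORDS:
    print(normalize(record))
```

Rows that come back flagged correspond to the mines that could not be geocoded from an address and had to be placed on the map manually.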

After the data was normalized, I brought the table into ArcMap and used the geocoding tool. I had to set up the tool by matching each input to the correct column, for example, Address to the field Address, City to the field City, and so on. The geocoder used to find and match the addresses was the "World Geocoding Service". Even though a high percentage of locations were matched, I still checked the points and manually relocated over half of them. I was able to geocode most of the addresses, but some of the mines were only described in the Public Land Survey System (PLSS) (Figure 2). These PLSS-only mines had to be located manually using the description in the Excel spreadsheet and the Wisconsin DNR data. To place the points manually, I used "Pick Address from Map" in the Interactive Rematch window to move each point to its correct location (Figure 3).

Figure 2. The results window after geocoding the sand mine data.

Figure 3. Interactive Rematch window where the "Pick Address from Map" tool is located near the bottom.

Lastly, I compared my geocoded addresses with those of my classmates and with the actual locations given to us by the WDNR. This required the Merge tool, which combines multiple input datasets into a single new output dataset. Next, I created layers using queries to select the mines I was assigned, so they could be matched with the same mines in my classmates' data and in the actual locations. The distances between my points, my classmates' points, and the actual locations were found using the Near tool, and the average error was found using Statistics on the Near_Dist field.
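The distance and average error calculation can also be sketched outside of ArcMap. The example below is a simplified stand-in, not what the Near tool does internally: it pairs points by a hypothetical mine ID instead of searching for the nearest feature, it uses the haversine great-circle formula on latitude and longitude, and the coordinates are invented for illustration.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def error_distances(geocoded, actual):
    """Return per-mine error distances and their mean, matching on mine ID."""
    dists = {
        mine_id: haversine_m(*geocoded[mine_id], *actual[mine_id])
        for mine_id in geocoded
        if mine_id in actual
    }
    if not dists:
        raise ValueError("no mine IDs in common between the two datasets")
    return dists, sum(dists.values()) / len(dists)


# Hypothetical mine IDs and coordinates, for illustration only.
my_points = {"mine_1": (44.8100, -91.5000), "mine_2": (44.9000, -91.4000)}
true_points = {"mine_1": (44.8110, -91.5000), "mine_2": (44.9000, -91.4200)}

dists, mean_error = error_distances(my_points, true_points)
for mine_id, dist in dists.items():
    print(f"{mine_id}: {dist:.0f} m")
print(f"average error: {mean_error:.0f} m")
```

The mean here plays the same role as the Statistics summary on the Near_Dist field: one number describing how far, on average, the geocoded points sit from the reference points.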

Results

Map 1. Map of western Wisconsin showing the differences between the geocoded mine locations and the true locations.

In the map above, there were both location errors of a large magnitude and some that were less than 200 meters off. This is also shown in the data in Figure 4 below. The comparison with my classmates' points had a larger average error distance than the comparison with the actual locations (Figure 5).

Figure 4. Error distances between my queried mine locations and the actual mine locations.

Figure 5. Error distances between my queried mine locations and some of my classmates' mine locations.

Discussion

There are multiple reasons why the distances between the geocoded points and the actual mine locations differ so much. Lo and Yeung (2003) list and discuss these error types in a table. Gross errors are one type, but they should not occur with this data because every record is assumed to be a sand mine location. However, there could be some systematic errors in the data. One reason is that not all of the mines were visible on the base map, which made it harder to place points accurately. There could also be some random errors, which would come from either making a simple mistake or moving a point that was not supposed to be moved. This lab was a great lesson in how the different kinds of error can really change the results. It's very important to have accurate data when you're using it for real-life decision making.

Inherent errors can be a large source of error in geocoding. These are errors made while digitizing the data. Because each dataset was created differently by each of my classmates, the data can be inconsistent and therefore misinterpreted. Operational errors were also a considerable part of the lab, because many of the students may have used the data incorrectly or made a mistake when entering or using it.

To tell the correct points from the incorrect ones, a good approach would be to obtain a complete list of addresses for every mine. Without this information, however, we could go through each point individually as a group and discuss where the correct location is on the map. Another helpful measure would be acquiring latitude and longitude data for each of the mines.

Conclusion

Normalizing the data, understanding it, and using it accurately are all extremely helpful when geocoding in ArcMap. Having a set of rules for how the data is entered would make the process easier and reduce the chance of error later on.


References:
Lo, C., & Yeung, A. (2003). Data Quality and Data Standards. In Concepts and Techniques in Geographic Information Systems (pp. 104-132). Pearson Prentice Hall.
