Highlands County Address Import

From OpenStreetMap Wiki

Goals and Status

Map of Highlands County in FL

This import aims to add street addresses throughout Highlands County, FL, US for most residential and commercial areas.

Highlands County is a mostly rural county in Central Florida with adequately (though not completely) corrected TIGER data. Mapped locations are scattered and mostly limited to select commercial centers and throughways, and named natural areas. Address data will help simplify the process of adding POIs across the county.

Source

Geographical data comes from the Highlands County Property Appraiser Office in the form of a parcel shapefile and an attribute table containing:

  • Parcel numbers
  • Department of Revenue use codes
  • Neighborhood codes
  • Owner names and mailing addresses
  • (Possible) addresses of the parcel
  • A bunch of other tax and legal jargon

Only parcel numbers, use codes, and parcel addresses will be kept; all other fields will be stripped for privacy and usability.

As written on the OSM wiki page for Florida and this Wikipedia page, publicly created data is public domain and able to be used for OpenStreetMap.

Data Preprocessing

The code I used to process the attributes is at the end of this page. I processed the data from the source above in R, which I preferred for its data.frame structure over a Python list of lists.

Here's my methodology with QGIS 3.4 LTR:

  1. Load parcel shapefile
  2. Fix geometries
  3. Create centroids from each parcel
  4. Join address data from processed attributes above
  5. Remove centroids with DOR codes of:
    • 00 - Vacant
    • 09 - Common Elements Area
    • 28 - Park Lots
    • 38 - Golf Course
    • 50 - Improved Agriculture
    • 53 - Cropland
    • 58 - Timberland
    • 63 - Grazing
    • 66 - Groves
    • 67 - Poultry/Fish/Bees
    • 69 - Ornamental
    • 70 - Vac Institutional
    • 81 - Military
    • 82 - Forest
    • 84 - Colleges
    • 88 - Federal
    • 89 - Municipal
    • 94 - Rights of Way
    • 97 - Rec and Park
    • 9802
    • 99 - Non-Ag Acreage
    • N.
  6. Manually move centroids onto buildings, or remove them where the filter above did not resolve the one-address-for-a-large-area problem I'm trying to avoid. I'm also removing centroids for buildings I know have multiple addresses, e.g. mixed-use developments. For this I'm relying on ground-level knowledge and ESRI satellite imagery.
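The DOR filter in step 5 could also be scripted before the data reaches QGIS. The following is a minimal Python sketch, not the procedure used for this import. It assumes each joined row is a dict whose DOR value is a zero-padded string starting with the two-digit use code; the function names and sample rows are made up for illustration. Only the two-digit codes from the list are included; the 9802 and N. entries are left out because their exact form in the data is not clear from this page.

```python
# Two-digit DOR use codes removed in step 5.
# The "9802" and "N." entries from the list are deliberately omitted here.
EXCLUDED_DOR = {
    "00", "09", "28", "38", "50", "53", "58", "63", "66", "67",
    "69", "70", "81", "82", "84", "88", "89", "94", "97", "99",
}


def keep_centroid(row):
    """Return True when the row's DOR code is not in the excluded set.

    Assumption: row["DOR"] is a zero-padded string whose first two
    characters are the use code.
    """
    code = str(row.get("DOR", "")).strip()[:2]
    return code not in EXCLUDED_DOR


def filter_centroids(rows):
    """Keep only the rows whose use code suggests an addressable parcel."""
    return [row for row in rows if keep_centroid(row)]


# Hypothetical sample rows, not real parcels
sample = [
    {"PARCEL": "A", "DOR": "01"},
    {"PARCEL": "B", "DOR": "00"},
    {"PARCEL": "C", "DOR": "66"},
]
kept = filter_centroids(sample)
```

A filter like this only handles the bulk removal; the manual review in step 6 is still needed for large single-address parcels and multi-address buildings.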

Merging In

I will validate my data against the existing OSM data and give preference to what is already in OSM. I am dividing the county into blocks based on major roads and boundaries, and comparing the import's address data with the existing OSM data by hand, block by block. Where an existing feature already has address data, I'll delete my import's node. Where a place is mapped but has no address data, I will copy the tags from my import's address node onto the existing feature without overriding any of its existing tags (this is the case for a lot of commercial chains in the county).
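The decision rule above can be summarized in a few lines of Python. This is only an illustrative sketch of the logic, since the actual comparison is done by hand in an editor. It assumes tags are plain dicts and that the import nodes use standard addr:* keys such as addr:housenumber and addr:street; the function names and the sample values are hypothetical.

```python
def has_address(tags):
    """True if the feature already carries any addr:* tag."""
    return any(key.startswith("addr:") for key in tags)


def merge_address_tags(existing_tags, import_tags):
    """Fill in missing addr:* keys; existing values are never overwritten."""
    merged = dict(existing_tags)
    for key, value in import_tags.items():
        if key.startswith("addr:") and key not in merged:
            merged[key] = value
    return merged


def resolve(existing_tags, import_tags):
    """Return (action, tags) for one import node.

    existing_tags is None when nothing is mapped at that location.
    """
    if existing_tags is None:
        return "import", dict(import_tags)   # upload the new address node
    if has_address(existing_tags):
        return "drop", dict(existing_tags)   # OSM data wins, delete the import node
    return "merge", merge_address_tags(existing_tags, import_tags)


# Hypothetical example: a mapped restaurant without an address
existing = {"amenity": "fast_food", "name": "Example Burger"}
incoming = {"addr:housenumber": "100", "addr:street": "Main Street"}
action, tags = resolve(existing, incoming)
```

In the example, the action is "merge" and the result keeps the existing amenity and name tags while gaining the two address tags.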

Timeline

November - work with parcel and address data

early December - proposal to OSM mailing list

late December - import into OSM

LATE DECEMBER - address data fully imported

January - manual data verification, conversion/removal of hcpaogis DOR codes into OSM amenities

Code

### Address.r
### Address preprocessing

library(tidyverse)

# Read your data from wherever it is
# data <- read.csv("vac_impr.txt", sep = "|", col.names = c(1:80), stringsAsFactors = FALSE)

# Read the replacement table
# repl <- read.csv("RepTable.csv")

# Data preprocessing
data <- data[,c(1,2,4,13,14,15,16,17,18,19)]
colnames(data) <- c("PARCEL","DOR","NEIGHBORHOOD","HOUSE","ST_PR","ST_NAME","ST_SUF","ST_SUFDIR","CITY","ZIP")

# Expand abbreviations
for (r in 1:nrow(data)) {
  for (col in c(4:7)) {
    if (data[r,col] == "" | is.na(data[r,col]) | is.null(data[r,col])) {
      next
    }
    for (r2 in 1:nrow(repl)) {
      if (as.character(data[r,col]) == as.character(repl[r2,1])) {
        data[r,col] <- as.character(repl[r2,2])
        
      }
    }
  }
}

# Join the street parts with "_" (unite's default separator)
data <- unite(data, "Street", starts_with("ST_"))

# Empty parts leave stray underscores behind; clean them up here
data$Street <- gsub("^_+|_+$", "", data$Street)  # drop leading and trailing underscores
data$Street <- gsub("_+", " ", data$Street)      # collapse runs of underscores into one space
data$Street <- str_to_title(data$Street)
data$Street <- gsub("Us ", "US ", data$Street)
data$Street <- gsub("Sr ", "SR ", data$Street)
data$Street <- gsub("Cr ", "CR ", data$Street)

data$NEIGHBORHOOD <- str_to_title(data$NEIGHBORHOOD)
data$CITY <- str_to_title(data$CITY)

# Write the data out

write.csv(data, "Address.csv")

Address.r pre-formats the address data. One flaw I've found is that it wrongly expands "SUN N LAKES BLVD" to "Sun North Lakes Boulevard", and turns St into Street in names where it stands for Saint (e.g. Saint Andrews). I caught these cases after upload and corrected them quickly.
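One way to avoid both mistakes would be to expand abbreviations by field: directions only in the prefix and suffix-direction fields, street types only in the suffix field, and never inside the street name itself. The Python sketch below illustrates that idea; it is not the code used for the import. It assumes the source fields are cleanly split the way the ST_PR, ST_NAME, ST_SUF, and ST_SUFDIR column names suggest, and the function names are hypothetical.

```python
DIRECTIONS = {
    "N": "North", "S": "South", "E": "East", "W": "West",
    "NE": "Northeast", "NW": "Northwest", "SE": "Southeast", "SW": "Southwest",
}
SUFFIXES = {
    "BLVD": "Boulevard", "ST": "Street", "AVE": "Avenue", "WY": "Way",
    "DR": "Drive", "CT": "Court", "LN": "Lane", "RD": "Road",
    "PL": "Place", "CIR": "Circle", "TER": "Terrace",
}


def _expand(value, table):
    """Expand a whole-field abbreviation, or return the field unchanged."""
    cleaned = value.strip()
    return table.get(cleaned.upper(), cleaned)


def build_street(prefix, name, suffix, suffix_dir):
    """Assemble a street name, expanding only the non-name fields."""
    parts = [
        _expand(prefix, DIRECTIONS),
        name.strip(),                 # the name itself is never expanded
        _expand(suffix, SUFFIXES),
        _expand(suffix_dir, DIRECTIONS),
    ]
    words = " ".join(part for part in parts if part).split()
    # Capitalize word by word; route prefixes such as US or SR would still
    # need the same upper-casing fix that Address.r applies afterwards.
    return " ".join(word.capitalize() for word in words)


print(build_street("", "SUN N LAKES", "BLVD", ""))  # Sun N Lakes Boulevard
print(build_street("", "ST ANDREWS", "DR", ""))     # St Andrews Drive
```

With this approach "St Andrews" is left as written rather than expanded to Saint, so names like that would still need a manual decision.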

STRING,REPL
NW,Northwest
NE,Northeast
SE,Southeast
SW,Southwest
BLVD,Boulevard
ST,Street
AVE,Avenue
WY,Way
DR,Drive
CT,Court
LN,Lane
RD,Road
PL,Place
CIR,Circle
TER,Terrace
N,North
E,East
S,South
W,West

String replacement table used in Address.r. Note that this table is missing the abbreviations TRL, PKWY, and HWY.
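Gaps like these could be found before upload by counting the suffix values that the table does not cover. The following Python sketch shows one way to do that; it assumes the suffix column has been read into a list of strings, and the function name and sample values are hypothetical.

```python
from collections import Counter

# Street-type abbreviations present in the replacement table
KNOWN_SUFFIXES = {
    "BLVD", "ST", "AVE", "WY", "DR", "CT", "LN", "RD", "PL", "CIR", "TER",
}


def unmapped_suffixes(suffix_values, known=KNOWN_SUFFIXES):
    """Return (suffix, count) pairs for suffixes missing from the table.

    Results are sorted by descending count, then alphabetically.
    """
    counts = Counter(
        value.strip().upper()
        for value in suffix_values
        if value and value.strip()
    )
    missing = [(s, n) for s, n in counts.items() if s not in known]
    return sorted(missing, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical sample of the ST_SUF column
sample = ["TRL", "ST", "PKWY", "TRL", "", " hwy "]
report = unmapped_suffixes(sample)
```

Here report is [("TRL", 2), ("HWY", 1), ("PKWY", 1)], which lists exactly the abbreviations that would need new rows in the table.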

### ParcelMiner.py

import requests
import urllib.request
import time
import csv
import re
from bs4 import BeautifulSoup

with open("STRAP.csv", "r") as csvFile :
	reader = csv.reader(csvFile)
	strap_list = list(reader)

for strap in strap_list :
	url_base = "https://www.hcpao.org/Search/Parcel/"
	STRAP = strap[0]
	url = url_base + STRAP
	
	print('Now searching', STRAP)

	response = requests.get(url)
	if response.status_code != 200 :
		continue
	
	soup = BeautifulSoup(response.text, "html.parser")

	address = soup.find(class_ = "col-xs-12 col-sm-7")
	address = str(address.get_text(",", strip=True))
	
	other = soup.find(class_ = "col-xs-12 col-sm-5")
	
	i=0
	for content in other.contents :
		if re.search("Owners", str(content)) :
			lower = i + 2
		if re.search("Mailing", str(content)) :
			upper = i - 4
			mail_limit = i + 2
			break
		i += 1
	
	temp_str = ""
	for content in other.contents[lower:upper] :
		if str(content) != "\n" :
			temp_str = temp_str + str(content)
	temp_str = temp_str.replace("\r\n", "")
	owners = temp_str
	
	mailOwners = str(other.contents[mail_limit][4:]) + ',' + str(other.contents[mail_limit+2][8:])
	
	otherSub = other.findAll('a')
	DOR = str(otherSub[0].get_text())
	neighborhood = str(otherSub[1].get_text())
	
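	# newline="" stops the csv module from writing blank rows on Windows;
	# the with block closes the file, so no explicit close is needed
	with open("Parcel_Info.csv", "a", newline="") as csvFile :
		writer = csv.writer(csvFile)
		row = [STRAP, address, owners, mailOwners, DOR, neighborhood]
		writer.writerow(row)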
	
	time.sleep(1)

I wrote this Python script to scrape parcel data for Highlands County from the HCPAO website. Some of this information is available in more accessible forms, though not to the extent of the data gathered this way. To use the script, generate a list of STRAPs (saved as STRAP.csv) from the freely available parcel shapefile on HCPAO's website.