Hamilton County Building Import

From OpenStreetMap Wiki

This proposal describes the import of roughly 300,000 building footprints in Hamilton County, Ohio, United States; the first phase of the import is complete and being validated. Hamilton County is the most populous county in the Cincinnati metropolitan area. Data is available from the Cincinnati Area Geographic Information System (CAGIS) via the City of Cincinnati, which lists it as public domain. Imported building footprints include some address, height, and building use information.

Source

There are two datasets involved in this import. Both are from the same source.

  • Building footprints [1] contains attributes like estimated height, number of units, zoning use category and number of stories. The dataset contains 358,167 features.
  • Parcel polygons [2] contains address information. We are not importing parcel boundaries but rather using them to add addresses to building footprints (prior to import) where buildings can be matched to parcels unambiguously. The dataset contains 419,342 features.

The data is projected in NAD83 / Ohio South State Plane (EPSG:3735), whose unit is the US survey foot.

License

On its website, CAGIS doesn't explicitly indicate the copyright status of the building and parcel datasets but only requires that a disclaimer be acknowledged. The City of Cincinnati, a CAGIS member agency, lists both datasets as being in the public domain.

Bogdan Petrea (Telenav), Nate_Wessel, and Minh Nguyen independently e-mailed CAGIS and each received this response:

If you agree with the disclaimer provided on our data site and provide CAGIS with a data creator credit, then you may use our data.

So, please make sure that if you want to use CAGIS data (available GIS layers), refer us properly and provide CAGIS disclaimer on your website.

CAGIS Disclaimer

THE PROVIDER MAKES NO WARRANTY OR REPRESENTATION, EITHER EXPRESSED OR IMPLIED WITH RESPECT TO THIS INFORMATION, ITS QUALITY, PERFORMANCE, MERCHANTABILITY, OR FITNESS A PARTICULAR PURPOSE. AS A RESULT THIS INFORMATION IS PROVIDED 'AS IS'. AND YOU, THE REQUESTER, ARE ASSUMING THE ENTIRE RISK AS TO ITS QUALITY AND PERFORMANCE.

IN NO EVENT WILL THE PROVIDER BE LIABLE FOR DIRECT, INDIRECT, INCIDENTAL, OR CONSEQUENTIAL DAMAGES RESULTING FROM ANY DEFECT IN THE INFORMATION. EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. IN PARTICULAR, THE PROVIDER SHALL HAVE NO LIABILITY FOR ANY OTHER INFORMATION, PROGRAMS OR DATA USED WITH OR COMBINED WITH THE REQUESTED INFORMATION, INCLUDING THE COST OF RECOVERING SUCH INFORMATION, PROGRAMS OR DATA.

"Available GIS layers" refers to the shapefiles available for download on CAGIS's main website. We use a subset from the CAGIS Open Data Portal and credit CAGIS in the form of source=CAGIS on changesets and have added their disclaimer to Contributors#CAGIS.

Preprocessing

Code used to preprocess the data prior to import, as described in the following sections, is available on GitHub.

Conflation

Sample map showing conflation of buildings with existing OSM data. Buildings to be imported initially are shown in blue, those already in OSM in green.

The import has two phases, each with a separate tasking manager project:

  1. (Completed) Add CAGIS buildings that do not intersect with existing or recently deleted OSM buildings.
  2. (Planned) Manually conflate the remaining CAGIS buildings with OSM buildings. Where the CAGIS geometry and tags are superior to the OSM data, a JOSM plugin will be used to transfer the geometry and tags to the existing way, preserving its history.

When the import began, OSM users had already mapped about 62,000 building footprints in the county. However, 82.5% of the import data (~295,000 buildings) did not intersect any of them. Because buildings in the central neighborhoods were already largely complete, the first phase has mostly affected suburban parts of the county. This simplified the work, since most of the imported buildings do not share nodes with existing geometries.

Very often, a building is deleted from OSM when the physical building is demolished. To avoid restoring demolished buildings, the conflation process accounts for deletions that occurred after the CAGIS dataset was last updated. Because such a deletion may instead represent the replacement of one way with another, CAGIS buildings that intersect recently deleted OSM buildings are deferred to the second phase for manual conflation. Participants are also tagging many demolished buildings with demolished:building=* to keep them from being accidentally restored, whether by the import or by armchair mappers.
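
The phase split described above can be sketched in Python. This is a minimal illustration, not the project's published preprocessing code: it assumes footprints are lists of (x, y) vertices in one projected CRS, that recently deleted OSM buildings are available together with their deletion dates, and it uses a plain edge-crossing plus point-in-polygon test. The function names (`split_phases`, `polygons_intersect`) are hypothetical.

```python
from datetime import date

Ring = list[tuple[float, float]]  # polygon vertices, first vertex not repeated


def _orient(a, b, c) -> int:
    # Sign of the cross product (b - a) x (c - a): 1 left turn, -1 right turn, 0 collinear.
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)


def _on_segment(a, b, p) -> bool:
    # For p already known to be collinear with a-b: does it fall within the segment?
    return (min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))


def _segments_intersect(p1, p2, q1, q2) -> bool:
    o1, o2 = _orient(p1, p2, q1), _orient(p1, p2, q2)
    o3, o4 = _orient(q1, q2, p1), _orient(q1, q2, p2)
    if o1 != o2 and o3 != o4:
        return True
    return ((o1 == 0 and _on_segment(p1, p2, q1)) or (o2 == 0 and _on_segment(p1, p2, q2))
            or (o3 == 0 and _on_segment(q1, q2, p1)) or (o4 == 0 and _on_segment(q1, q2, p2)))


def _point_in_ring(pt, ring: Ring) -> bool:
    # Even-odd ray casting toward +x.
    x, y = pt
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside


def _bbox(ring: Ring):
    xs, ys = zip(*ring)
    return min(xs), min(ys), max(xs), max(ys)


def polygons_intersect(a: Ring, b: Ring) -> bool:
    ax0, ay0, ax1, ay1 = _bbox(a)
    bx0, by0, bx1, by1 = _bbox(b)
    if ax1 < bx0 or bx1 < ax0 or ay1 < by0 or by1 < ay0:
        return False  # cheap reject: bounding boxes are disjoint
    edges_a = list(zip(a, a[1:] + a[:1]))
    edges_b = list(zip(b, b[1:] + b[:1]))
    if any(_segments_intersect(p1, p2, q1, q2) for p1, p2 in edges_a for q1, q2 in edges_b):
        return True  # boundaries cross or touch
    # No boundary contact: one polygon may still lie entirely inside the other.
    return _point_in_ring(a[0], b) or _point_in_ring(b[0], a)


def split_phases(cagis: dict[str, Ring], osm_current: list[Ring],
                 osm_deleted: list[tuple[Ring, date]], cagis_updated: date):
    """Phase 1: CAGIS buildings touching no current or recently deleted OSM building.
    Phase 2: everything else, left for manual conflation."""
    # Only deletions after the CAGIS snapshot matter: the building may since have been demolished.
    recent = [ring for ring, deleted_on in osm_deleted if deleted_on > cagis_updated]
    blockers = list(osm_current) + recent
    phase1, phase2 = [], []
    for bid, ring in cagis.items():
        if any(polygons_intersect(ring, other) for other in blockers):
            phase2.append(bid)
        else:
            phase1.append(bid)
    return phase1, phase2
```

Buildings that merely touch an OSM building along an edge also count as intersecting here, which errs toward manual review. At the scale of this import, a spatial index or a geometry library such as Shapely would replace the pairwise loop.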

Address assignment

Sample map showing tentative identification of minor outbuildings that would not be assigned addresses.

Assigning addresses based on parcels is simple in some cases and more complex in others. To avoid matching buildings to sliver parcels, we consider a building to belong to a parcel only if at least 90% of its footprint overlaps that parcel. We start by assigning addresses in the simplest case, where there is a clear 1:1 correspondence between parcels and buildings.
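
A sketch of the 90% overlap rule and the 1:1 step follows. It is hypothetical code, not the import's own: intersection areas are computed with Sutherland-Hodgman clipping, which is only correct when the parcel is convex (many parcels are, but not all), so a real pipeline would more likely rely on a full polygon-overlay library.

```python
from collections import Counter


def signed_area(poly):
    # Shoelace formula: positive for counter-clockwise rings.
    s = 0.0
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        s += x1 * y2 - x2 * y1
    return s / 2.0


def clip_to_convex(subject, clip):
    """Sutherland-Hodgman clipping of `subject` by a CONVEX polygon `clip`."""
    if signed_area(clip) < 0:
        clip = clip[::-1]  # the inside test below assumes counter-clockwise order

    def inside(p, a, b):
        # p lies on or to the left of the directed edge a -> b
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def cross_point(p, q, a, b):
        # Intersection of line pq with line ab (only called when p and q straddle ab)
        (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p, q, a, b
        den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
        t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / den
        return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))

    output = list(subject)
    for a, b in zip(clip, clip[1:] + clip[:1]):
        if not output:
            break
        input_list, output = output, []
        prev = input_list[-1]
        for cur in input_list:
            if inside(cur, a, b):
                if not inside(prev, a, b):
                    output.append(cross_point(prev, cur, a, b))
                output.append(cur)
            elif inside(prev, a, b):
                output.append(cross_point(prev, cur, a, b))
            prev = cur
    return output


def overlap_fraction(building, parcel):
    piece = clip_to_convex(building, parcel)
    if len(piece) < 3:
        return 0.0
    return abs(signed_area(piece)) / abs(signed_area(building))


def one_to_one_matches(buildings, parcels, threshold=0.9):
    # A building belongs to a parcel only if >= 90% of its footprint lies inside it.
    owners = {bid: [pid for pid, parcel in parcels.items()
                    if overlap_fraction(poly, parcel) >= threshold]
              for bid, poly in buildings.items()}
    per_parcel = Counter(pid for pids in owners.values() for pid in pids)
    # Keep pairs where the building has one parcel and that parcel has one building.
    return {bid: pids[0] for bid, pids in owners.items()
            if len(pids) == 1 and per_parcel[pids[0]] == 1}
```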

Next we move to parcels that match to multiple buildings. Most of these contain minor outbuildings like sheds or garages that are subsidiary to a major building like a house. We want to assign an address only to the major (largest) building in these cases. Some large parcels have many buildings (e.g. > 20), so in this step we only look at parcels with 2 or 3 buildings.
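
The 2-3 building rule can be expressed as a small function. This is an illustrative sketch with hypothetical names; it assumes building-to-parcel membership has already been computed with the 90% rule, and it takes the largest footprint as the major building, as described above.

```python
def assign_major_buildings(parcel_buildings, areas, address_of):
    """parcel_buildings: parcel id -> building ids that belong to it (90% rule);
    areas: building id -> footprint area; address_of: parcel id -> address or None."""
    assigned, outbuildings = {}, []
    for pid, bids in parcel_buildings.items():
        # 1:1 parcels were handled earlier; parcels with more than 3 buildings are left for later.
        if not 2 <= len(bids) <= 3 or not address_of.get(pid):
            continue
        major = max(bids, key=lambda b: areas[b])  # largest footprint = main building
        assigned[major] = address_of[pid]
        outbuildings.extend(b for b in bids if b != major)  # sheds, garages: no address
    return assigned, outbuildings
```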

What remains are cases where address assignment is more ambiguous.

  • Parcels with more than three buildings
  • Buildings that sit across multiple parcels
  • Multiple parcels with identical geometry underlying one or more buildings

This last case appears to indicate multiple ownership of a single parcel, such as condos or duplexes. In these cases, we assign multiple semicolon-separated addresses to a single building sitting on multiple identical parcels.
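
One way to detect identical parcels and build the semicolon-separated value is sketched below. The exact-coordinate comparison and the choice to skip groups whose parcels name different streets are assumptions made for illustration; the page does not say how either case was handled. All names are hypothetical.

```python
from collections import defaultdict


def canonical_ring(ring):
    """Same key for the same vertex ring regardless of start vertex or direction."""
    pts = [tuple(p) for p in ring]

    def rotate(seq):
        i = seq.index(min(seq))
        return tuple(seq[i:] + seq[:i])

    return min(rotate(pts), rotate(pts[::-1]))


def identical_parcel_groups(parcels):
    """parcels: parcel id -> vertex ring. Returns groups of 2+ parcels with identical geometry."""
    groups = defaultdict(list)
    for pid, ring in parcels.items():
        groups[canonical_ring(ring)].append(pid)
    return [pids for pids in groups.values() if len(pids) > 1]


def merged_address(addresses):
    """addresses: (housenumber, street) pairs from one group of identical parcels."""
    streets = {street for _, street in addresses}
    if len(streets) != 1:
        return None  # assumption: mixed streets are left for manual review
    # Numeric order for plain digit strings: shorter first, then lexicographic.
    numbers = sorted({num for num, _ in addresses}, key=lambda n: (len(n), n))
    return {"addr:housenumber": ";".join(numbers), "addr:street": streets.pop()}
```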

As a final step, a building that sits across multiple parcels is assigned the address of one of them if only that parcel has an address and the building is larger than 1,000 square feet (to exclude outbuildings).
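
The fallback rule reduces to a few checks. In this hypothetical sketch, `parcel_ids` lists every parcel the building overlaps, and areas are computed in the source CRS; EPSG:3735 uses US survey feet, so a planar area computed in those coordinates is already in square feet.

```python
def footprint_area(ring):
    # Shoelace formula; in EPSG:3735 coordinates (US survey feet) the result is in square feet.
    return abs(sum(x1 * y2 - x2 * y1
                   for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]))) / 2


def fallback_address(area_sqft, parcel_ids, address_of, min_area_sqft=1000.0):
    """area_sqft: building footprint area; parcel_ids: every parcel the building overlaps;
    address_of: parcel id -> address string or None."""
    if len(parcel_ids) < 2:
        return None  # the rule only applies to buildings sitting across multiple parcels
    addressed = [pid for pid in parcel_ids if address_of.get(pid)]
    if len(addressed) == 1 and area_sqft > min_area_sqft:  # size filter skips outbuildings
        return address_of[addressed[0]]
    return None
```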

Applied in order from simplest to most complex, these techniques produce address assignments (or correct non-assignments) for 86.5% of all buildings. The contribution of each method is shown in the table below.

Methods of assigning addresses to buildings
Address assignment category | Number of buildings | % of total | Cumulative %
1:1 match with parcel | 166,900 | 46.6% | 46.6%
Multiple buildings per parcel (major building, address assigned) | 62,300 | 17.4% | 63.9%
Multiple buildings per parcel (minor outbuilding, no address) | 67,795 | 18.9% | 82.9%
Single building on multiple identical parcels (semicolon-separated housenumbers) | 900 | 0.25% | 83.2%
Building larger than 1,000 sq ft across multiple parcels, only one with an address | 11,875 | 3.3% | 86.5%

The remaining unassigned buildings are scattered and present many different issues. They will need to be handled manually at a later date, or by a future import of a better address dataset.

Simplification

Many buildings contain nodes partway along essentially straight edges. These nodes are removed by simplification with a tolerance of 0.2 meters, and topology is preserved where buildings share nodes.
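
The page does not name the simplification tool, so the sketch below only illustrates the idea with a simple filter (not Douglas-Peucker): it repeatedly drops any node lying within the tolerance of the line through its two neighbors, and never drops a node listed as shared with another building. Because EPSG:3735 coordinates are in US survey feet, the 0.2 m tolerance is converted to about 0.656 ft. Names are hypothetical.

```python
import math

US_SURVEY_FOOT_M = 1200 / 3937            # meters per US survey foot
TOLERANCE_FT = 0.2 / US_SURVEY_FOOT_M     # 0.2 m in EPSG:3735 units, about 0.656 ft


def _distance_to_line(p, a, b):
    # Perpendicular distance from p to the infinite line through a and b.
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    if length == 0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    return abs(dx * (p[1] - a[1]) - dy * (p[0] - a[0])) / length


def drop_straight_nodes(ring, shared=frozenset(), tol=TOLERANCE_FT):
    """ring: building vertices (first vertex not repeated); shared: vertices also used
    by other buildings, which are always kept so shared nodes survive."""
    pts = list(ring)
    changed = True
    while changed and len(pts) > 3:
        changed = False
        for i, p in enumerate(pts):
            if p in shared:
                continue
            a, b = pts[i - 1], pts[(i + 1) % len(pts)]
            if _distance_to_line(p, a, b) < tol:
                del pts[i]  # remove one node, then rescan from the start
                changed = True
                break
    return pts
```

Dropping nodes one at a time lets small deviations accumulate along a long, gently curved edge; Douglas-Peucker or a topology-aware simplifier avoids that.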

Tag mapping

Only the building footprints are being imported, but some tags are drawn from the parcel dataset. The buildings dataset has information on building height (including levels) and use category; the parcel dataset has address information. The address information goes into two tags, addr:housenumber=* and addr:street=*. addr:street=* is derived from two fields: the street name and the street suffix (e.g. Avenue, Road). Street names were expanded from abbreviated forms and checked for Title Case capitalization. Details on how addresses are assigned from parcels are under Address assignment above.

Source field | OSM tag
addrno (parcels) | addr:housenumber=*
addrst + addrsf (parcels) | addr:street=*
storyabove (buildings) | building:levels=*
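
A sketch of how the fields in the table above might be turned into tags. The suffix abbreviations in `SUFFIXES` are hypothetical examples, since the actual CAGIS codes are not listed here, and the naive title-casing turns names like McMillan into "Mcmillan", which is why results still need checking.

```python
# Hypothetical sample of suffix abbreviations; the real CAGIS code list is not given here.
SUFFIXES = {
    "AV": "Avenue", "AVE": "Avenue", "RD": "Road", "ST": "Street", "DR": "Drive",
    "LN": "Lane", "CT": "Court", "PL": "Place", "BLVD": "Boulevard", "CIR": "Circle",
}


def title_case(text):
    # Capitalize the first letter of each word and lowercase the rest ("3RD" -> "3rd").
    return " ".join(word.capitalize() for word in text.split())


def tags_from_fields(addrno=None, addrst=None, addrsf=None, storyabove=None):
    tags = {}
    if addrno and addrst:
        suffix = (addrsf or "").strip()
        suffix = SUFFIXES.get(suffix.upper(), suffix)  # unknown suffixes pass through
        tags["addr:housenumber"] = str(addrno).strip()
        tags["addr:street"] = title_case(f"{addrst} {suffix}")
    if storyabove and storyabove > 0:
        tags["building:levels"] = str(int(storyabove))  # assumes whole-number story counts
    return tags
```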

Buildings also carry use information in the cwwuse field, which is mapped to building=* tags as follows:

cwwuse
Source value | OSM value | Building count | % of total
APART | building=apartments | 6,958 | 1.94%
INDUST | building=industrial | 1,412 | 0.39%
MNFTRG | building=industrial | 555 | 0.15%
MLTFM | building=residential | 12,674 | 3.54%
RESDNT | building=residential | 156,048 | 43.57%
SCHOOL | building=school | 260 | 0.07%
GENBUS | building=commercial | 7,859 | 2.19%
NULL | building=yes | 168,422 | 47.02%
anything else | building=yes | 4,006 | 1.12%
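
The cwwuse mapping is a lookup with a fallback to building=yes. The sketch below (hypothetical function names) also recomputes counts and percentages per building=* value for a list of cwwuse values, similar to the last two columns of the table.

```python
from collections import Counter

# cwwuse code -> building=* value, as listed in the table above.
CWWUSE_TO_BUILDING = {
    "APART": "apartments", "INDUST": "industrial", "MNFTRG": "industrial",
    "MLTFM": "residential", "RESDNT": "residential", "SCHOOL": "school",
    "GENBUS": "commercial",
}


def building_value(cwwuse):
    # NULL (missing) and any unlisted code fall back to building=yes.
    if not cwwuse:
        return "yes"
    return CWWUSE_TO_BUILDING.get(cwwuse.strip().upper(), "yes")


def tally(cwwuse_values):
    """Return {building value: (count, percent of total)}."""
    counts = Counter(building_value(v) for v in cwwuse_values)
    total = sum(counts.values())
    return {value: (n, round(100 * n / total, 2)) for value, n in counts.items()}
```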

Known Quality Issues

This is a list of data quality issues discovered during import. Keep an eye out for them as you validate.

  • Some larger sheds/garages are tagged building:levels=2 or building:levels=3, which often seems implausible.
  • Many buildings retain one extra node that simplification should have removed. Remove it where possible.
  • Some buildings need squaring; orthogonality varies considerably by neighborhood.

There is occasional disagreement between addr:street values and the names of streets, for example whether a way is named '...Street' or '...Road'. These mismatches can be hard to catch while editing, so a query after the import may be the best way to find them. A mismatch usually affects all buildings on a street.
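
A post-import check for such mismatches could look like the sketch below. It assumes the addr:street values of imported buildings and the name=* values of highway ways have already been extracted from OSM (the extraction itself is not shown), and it uses Python's difflib to suggest the closest street name for each unmatched value. The function name is hypothetical.

```python
import difflib
from collections import defaultdict


def street_mismatches(building_streets, street_names, cutoff=0.6):
    """building_streets: (way id, addr:street) pairs for imported buildings;
    street_names: name=* values of highway ways in the county.
    Returns {unmatched addr:street: (closest street name or None, [way ids])}."""
    unmatched = defaultdict(list)
    for way_id, street in building_streets:
        if street not in street_names:
            unmatched[street].append(way_id)
    candidates = sorted(street_names)
    report = {}
    for street, way_ids in unmatched.items():
        guess = difflib.get_close_matches(street, candidates, n=1, cutoff=cutoff)
        report[street] = (guess[0] if guess else None, way_ids)
    return report
```

Because a mismatch usually affects a whole street, grouping by the addr:street value lists each problem once, with every affected way attached.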

Workflow

Tasking manager projects:

  • Initial import (Complete and being validated)
  • Manual conflation (planned)

Schedule

The first phase of the import began in December 2018 and is mostly complete. Some validation work still remains.

Minh Nguyen began the second phase manually on December 25, 2021. This phase can move to the tasking manager if others express interest in collaborating on it. There are 62,106 ways to conflate with existing ways.

  • December 31, 2021: Finished conflating 6,150 buildings to the northeast of I-71 and I-275.
  • January 10, 2022: Finished conflating 440 buildings north of I-275 between I-75 and I-71.

Contributors

Contributors will use special-purpose import accounts with names ending in _cincyimport.