Import/Catalogue/Queensland GTFS Data

From OpenStreetMap Wiki

Goal

Complete an import of GTFS stop data maintained by the Queensland Government Department of Transport and Main Roads (TMR). This import will also conflate the existing South East Queensland (SEQ) manually surveyed bus stop nodes imported in 2009. See the QROTI page for details; there are currently 6,600 nodes imported from this privately owned dataset.

Schedule

The project is currently focused on importing bus stop data; extending it to include routes is a future consideration. Should routes be brought within the scope of this import, they will only be created after stop node importing is complete. Route and Route Master creation may or may not use the GTFS data from this dataset. A Wiki page dedicated specifically to route creation is being considered (the Bus Routes in London Wiki page is the reference here).

Import Data

Background

Data source site: http://data.qld.gov.au
Data license: Creative Commons Attribution
Link to permission/attribution: Queensland Government data (Department of Transport and Main Roads - TMR)

Data description

There are two recognised regions in Queensland for the datasets concerned. Both are in GTFS format.

South East Queensland (SEQ)

South East Queensland (SEQ) contains data for the Capital City area and surrounding regions (Sunshine Coast, Gold Coast & Ipswich). This Mapcraft pie shows the extent of the single GTFS file for the SEQ region. It will be used for importing and QA of this region given the high number of bus stop nodes already existing within this geography.

Regional Queensland

In the context of this import, the second dataset covers the remaining regional centres throughout Queensland and consists of files named by town. Note: Mapcraft will not be used for this town-based dataset. Each town is geographically separate, so each importer will work on and complete a town in isolation, without risk of duplicating effort. The table below shows the status of each town import along with any notes the mapper chooses to provide:

Name Import Status Notes (Mappers to include a link to the changeset(s) of completed imports in this section)
airlie node (view, Github) Incomplete This data, sourced from a prior version of this dataset, was manually imported and reviewed prior to this exercise (see changeset 19628395). This was done by the author of this Wiki page before understanding the correct import process. If necessary it can be reverted; @imports will be asked to advise on this point. If deemed okay, the tagging will need to be updated to reflect the information in this Wiki page.
bowen node (view, Github) Incomplete
bundaberg node (view, Github) Incomplete
cairns node (view, Github) Incomplete #139700396
gladstone node (view, Github) Incomplete
gympie node (view, Github) Incomplete
innisfail node (view, Github) Incomplete
kilcoy node (view, Github) Incomplete
mackay node (view, Github) Incomplete
magnetic-island node (view, Github) Incomplete
maleny-landsborough node (view, Github) Incomplete
maryborough-herveybay node (view, Github) Incomplete
rockhampton node (view, Github) Incomplete
sealink node (view, Github) Incomplete This is a Ferry Terminal import. There are only 2 nodes.
toowoomba node (view, Github) Incomplete #139699966
townsville node (view, Github) Incomplete #139699017
yeppoon node (view, Github) Incomplete

Import Type

This is a one-time import, but it will require periodic reviews to keep the information up to date. JOSM is the editor being used for this import. It will be used for checking, validating, uploading and, if needed, reverting changesets.

GO-Sync

A modified version of GO-Sync will be used to correlate the agency data with the existing OSM data. From GO-Sync, a changeset will be exported using the dummy upload option. This will be verified and uploaded using JOSM.
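
GO-Sync's matching logic is not described on this page, but the workflow section mentions a 400 m inclusion zone. As a rough illustration of distance-based correlation only, the sketch below checks whether a GTFS stop falls within a given radius of an existing OSM node using the haversine formula. The function names, the default radius and the sample coordinates are assumptions for illustration and are not taken from GO-Sync.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def is_potential_match(gtfs_stop, osm_node, radius_m=400.0):
    """True if the OSM node lies within radius_m of the GTFS stop.

    Both arguments are (lat, lon) tuples. The 400 m default is an
    assumption based on the inclusion zone mentioned in the workflow.
    """
    return haversine_m(*gtfs_stop, *osm_node) <= radius_m


# Hypothetical coordinates, roughly 111 m apart along a meridian.
print(is_potential_match((-27.470, 153.020), (-27.471, 153.020)))
```

A candidate found this way would still need manual review, since nearby stops on opposite sides of a road can sit well inside the same radius.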

Data Preparation

Data Reduction & Simplification

The GitHub repo [1] that holds the source files includes the following file variants in each folder:

  • stops.txt (The raw, unedited source file)
  • stops.csv (File extension renamed and file header changed to allow for a JOSM import; see the tables under Tagging Plans for the header transformation)
  • stops.osm (The saved file after importing into JOSM using the opendata plugin)
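
The stops.txt to stops.csv step can be sketched as a header rename. The example below is a minimal illustration, not the script actually used for this import; the mapping follows the tagging tables on this page (stop_id to gtfs_id, stop_name to name, stop_url to url, stop_lat to lat, stop_lon to lon), while the function name and the sample row are hypothetical.

```python
import csv
import io

# Header renames as listed in the tagging tables on this page.
HEADER_MAP = {
    "stop_id": "gtfs_id",
    "stop_name": "name",
    "stop_url": "url",
    "stop_lat": "lat",
    "stop_lon": "lon",
}


def rename_headers(src, dst):
    """Copy CSV rows from src to dst, renaming mapped header fields.

    Columns that are not in HEADER_MAP keep their original name.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader)
    writer.writerow([HEADER_MAP.get(col, col) for col in header])
    writer.writerows(reader)


# Illustrative row only, not taken from the real dataset.
sample = io.StringIO(
    "stop_id,stop_name,stop_lat,stop_lon\n"
    "1001,Example St,-20.1,148.2\n"
)
out = io.StringIO()
rename_headers(sample, out)
print(out.getvalue())
```

For real files, the same function could be called with two handles from open(), for example reading stops.txt with encoding="utf-8-sig" in case the file starts with a byte order mark.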

Tagging Plans

Public transport schema 2 will be used for tagging. The tables below map source attributes to OSM tags.

Bus stops (qconnect dataset - stops.txt > stops.csv > stops.osm)

File GTFS attribute CSV Header OSM tag
stops.txt stop_id gtfs_id gtfs_id=*
stops.txt stop_name name name=*
stops.txt stop_url url url=*
stops.txt stop_lat lat
stops.txt stop_lon lon
highway=bus_stop
public_transport=platform
bus=yes

Bus stops (SEQ dataset - stops.txt > stops.csv > stops.osm)

The station data will be removed from this file and retained in a new file, stations.csv, which does not come with the dataset (see the Bus & Rail Stations section for how this content will be handled). Nodes that have parent_station=* or platform_code=* will not have highway=bus_stop, public_transport=platform or bus=yes applied. During manual verification of these nodes, each will be determined to be a Bus Station or Rail Station node and the appropriate tags will be applied. These nodes have been imported and saved to the stops.osm file for the SEQ dataset.

File GTFS attribute CSV Header OSM tag
stops.txt stop_id gtfs_id gtfs_id=*
stops.txt stop_name name name=*
stops.txt stop_url url url=*
stops.txt stop_lat lat
stops.txt stop_lon lon
highway=bus_stop
public_transport=platform
bus=yes
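
As a rough sketch of the SEQ tagging rule described above, the function below builds the OSM tags for one stops.txt row and withholds the three bus stop tags when parent_station or platform_code is set. The function name, the return shape and the sample rows are assumptions for illustration.

```python
# Fixed tags applied to ordinary bus stops, per the table above.
BUS_STOP_TAGS = {
    "highway": "bus_stop",
    "public_transport": "platform",
    "bus": "yes",
}


def stop_tags(row):
    """Return (tags, needs_review) for one stops.txt row.

    row is a dict keyed by GTFS field name. Rows that carry a
    parent_station or platform_code value get no bus stop tags and
    are flagged for manual verification instead.
    """
    tags = {"gtfs_id": row["stop_id"], "name": row["stop_name"]}
    if row.get("stop_url"):
        tags["url"] = row["stop_url"]
    needs_review = bool(row.get("parent_station") or row.get("platform_code"))
    if not needs_review:
        tags.update(BUS_STOP_TAGS)
    return tags, needs_review


# Hypothetical rows, not taken from the real dataset.
plain = {"stop_id": "1", "stop_name": "Example Rd", "stop_url": ""}
platform = {"stop_id": "2", "stop_name": "Example Stn, platform 1",
            "parent_station": "place_example", "platform_code": "1"}
print(stop_tags(plain))
print(stop_tags(platform))
```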

Bus & Rail Stations (SEQ dataset - stops.txt > stations.csv > stations.osm)

The stations.csv file is a subset of the original stops.txt source file. The location_type field is used to distinguish stops from stations. (See the table below for how this content will be handled.)

File GTFS attribute CSV Header OSM tag
stops.txt stop_id gtfs_id gtfs_id=*
stops.txt stop_name name name=*
stops.txt stop_lat lat
stops.txt stop_lon lon
public_transport=station
railway=station OR amenity=bus_station ^
agency.txt agency_name network=TransLink SEQ

^ The railway=station tag will be applied, as a node, on the track way it is associated with as part of a manual verification of each station. The SEQ geography has a rail network as well as a Busway network, so the two will be distinguished manually rather than programmatically. Where no track way exists but Bing aerial imagery suggests a rail station, the node will be placed at the coordinates provided and railway=station will be applied.
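
The split of stops.txt into stop rows and station rows can be sketched as below. This is a minimal illustration assuming that location_type equal to 1 marks a station (as in the GTFS reference) and that every other value, including an empty one, is treated as a stop; the function name and sample rows are hypothetical.

```python
import csv
import io


def split_stops(src):
    """Split GTFS stop rows into (stops, stations) by location_type."""
    stops, stations = [], []
    for row in csv.DictReader(src):
        # location_type 1 is a station; empty or 0 is a stop or platform.
        if (row.get("location_type") or "").strip() == "1":
            stations.append(row)
        else:
            stops.append(row)
    return stops, stations


# Hypothetical rows, not taken from the real dataset.
sample = io.StringIO(
    "stop_id,stop_name,stop_lat,stop_lon,location_type\n"
    "1,Example Rd,-27.1,153.1,0\n"
    "place_example,Example station,-27.2,153.2,1\n"
    "3,Example Ave,-27.3,153.3,\n"
)
stops, stations = split_stops(sample)
print(len(stops), len(stations))
```

Whether a station row becomes railway=station or amenity=bus_station is still decided manually, as described in the note above.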

Changeset Tags

We will use the following changeset tags.

Data Transformation

TBA

Data Transformation Results

TBA

Data Merge Workflow

There is a significant amount of existing bus stop data already in OSM for the SEQ region of the state (>10,000 Nodes with "highway"="bus_stop" tagging). The merging and/or deprecation of these nodes will be considered as part of this workflow.

Within this existing dataset there are nodes that were created from an import in 2009 (approximately 7,000). The sourcing and accuracy of this dataset have been verified, and the decision is to retain the existing stops and conflate the new tags into them. The action taken for each existing QROTI tag is documented in the table below. The existing QROTI Wiki page will be updated with a reference back to this section so that future maintainers are aware of the veracity of this dataset and the changes made as a result of this import.

QROTI Tag Handling

Counts are sourced from taginfo and are current as of 2014-03-19 23:58 UTC.

Count @ 2014-03-19 Current Count Key Action
6,622 qroti:place_id Remove - Marker was for a site that no longer exists
6,529 qroti:mode Remove - No documentation exists
6,526 qroti:mode_name Remove - No documentation exists
6,366 qroti:surveyed Retain - This is when the bus stop was last surveyed in the field
6,331 qroti:url Remove - These URLs no longer exist
4,646 qroti:stop_num Remove - Imported gtfs_id deprecates this value
2,014 qroti:name Remove - Imported name deprecates this value
1,245 qroti:name_onsite Remove - Imported name deprecates this value
182 qroti:part Remove - Individual platforms are included in the import dataset and will be mapped
41 qroti:major_location Remove - Stations are now being mapped (See Bus and Rail Stations)
1 qroti:fare_zone Remove
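
The actions in the table above can be expressed as a small filter over a node's tags. This is only a sketch of the rule set; the function name and the sample tag values are hypothetical.

```python
# Keys marked "Remove" in the QROTI tag handling table.
QROTI_REMOVE = {
    "qroti:place_id",
    "qroti:mode",
    "qroti:mode_name",
    "qroti:url",
    "qroti:stop_num",
    "qroti:name",
    "qroti:name_onsite",
    "qroti:part",
    "qroti:major_location",
    "qroti:fare_zone",
}


def apply_qroti_actions(tags):
    """Return a copy of tags without the QROTI keys marked for removal.

    qroti:surveyed is not in QROTI_REMOVE, so it is retained.
    """
    return {k: v for k, v in tags.items() if k not in QROTI_REMOVE}


# Hypothetical existing node tags.
existing = {
    "highway": "bus_stop",
    "qroti:surveyed": "2009-01-01",
    "qroti:url": "http://example.invalid/stop",
    "qroti:stop_num": "000000",
}
print(apply_qroti_actions(existing))
```

Listing the keys explicitly, rather than stripping everything that starts with qroti:, leaves any undocumented QROTI key in place for a mapper to review.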

Team Approach

TBA

References

This import was discussed on talk-au.

Workflow

  1. For the initial SEQ import, a separate OSC file will be created, using the dummy upload option, for each of the applicable GO-Sync "Stops to view" categories ("New GTFS stops with Potential Matches in OSM", "New GTFS stops with No OSM Matches", "Existing stops with Updates"). For the "New GTFS stops with Potential Matches in OSM" category, the GTFS stops will be matched to the existing OSM nodes.
    In GO-Sync, stops can be added for export individually or a whole category at once ("Upload All"). During the matching process, notes will be taken of stops that will need to be created after the existing stops have been matched.
  2. The OSC files will be converted to OSM using osmconvert: osmconvert DUMMY_OSM_CHANGE.txt > category.osm
  3. These will be imported into JOSM, and the JOSM search function will be used with the following queries to find the stops and stations that require manual verification:
    query results
    gtfs_location_type=1 Bus & Rail Stations
    parent_station|platform_code platforms and sublocations
    Once this is done, the network tag will be verified. All remaining extraneous tags will then be removed.
  4. The set will then be exported and uploaded to the github repository.
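
The two JOSM searches in step 3 can be approximated outside JOSM for a quick count of what needs manual verification. The sketch below parses OSM XML with the Python standard library; it assumes the relevant tag keys contain the strings parent_station or platform_code, which this page does not confirm, and the sample XML is hypothetical.

```python
import xml.etree.ElementTree as ET

PLATFORM_HINTS = ("parent_station", "platform_code")


def classify_nodes(osm_xml):
    """Return (station_ids, platform_ids) for nodes needing manual review."""
    root = ET.fromstring(osm_xml)
    stations, platforms = [], []
    for node in root.iter("node"):
        tags = {t.get("k"): t.get("v") for t in node.findall("tag")}
        if tags.get("gtfs_location_type") == "1":
            # Equivalent of the gtfs_location_type=1 search.
            stations.append(node.get("id"))
        elif any(hint in key for key in tags for hint in PLATFORM_HINTS):
            # Rough equivalent of the parent_station|platform_code search.
            platforms.append(node.get("id"))
    return stations, platforms


# Hypothetical sample, not real import data.
SAMPLE = """<osm version="0.6">
  <node id="-1" lat="-27.0" lon="153.0">
    <tag k="gtfs_location_type" v="1"/>
  </node>
  <node id="-2" lat="-27.1" lon="153.1">
    <tag k="platform_code" v="A"/>
  </node>
  <node id="-3" lat="-27.2" lon="153.2">
    <tag k="highway" v="bus_stop"/>
  </node>
</osm>"""

print(classify_nodes(SAMPLE))
```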

"New stops with no matches" should be run again after importing "New GTFS stops with Potential Matches in OSM" into OpenStreetMap as there may be new stops that were unable to be matched (as they may have been stops within the 400m inclusion zone).

Bus stops

Other more detailed workflow is TBA.

Quality Assurance

TBA