User:EmericusPetro/sandbox/OpenStreetMap CLDR equivalent/Ideas

Package ideas for OpenStreetMap CLDR equivalent

Main text at User:EmericusPetro/sandbox/OpenStreetMap_CLDR_equivalent

Main list

A. Specification of file containers

Publishing the specifications of at least the most common file types used to transport OpenStreetMap data (see also OSM file formats) seems to be a good starting point.

Formats such as XML, JSON and Protobuf can each be specified in a machine-readable way using one or more strategies (for example, a schema encoded so that existing tools can test the validity of data transferred in that format). This is not to say that Level0 would be infeasible, but regardless of how the data is tagged, the schemas of the files that transport it are great candidates.
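As a rough illustration of what "testing the validity" of a container could mean, the sketch below checks a small OSM XML string against a hand-written table of required attributes. The table `REQUIRED_ATTRIBUTES` and the function `check_osm_xml` are hypothetical names invented for this example; they are not an official schema, and a real package would more likely ship a proper schema file.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written rules for illustration; not an official OSM schema.
REQUIRED_ATTRIBUTES = {
    "node": {"id", "lat", "lon"},
    "way": {"id"},
    "relation": {"id"},
}


def check_osm_xml(text):
    """Return a list of human-readable problems found in an OSM XML string."""
    problems = []
    root = ET.fromstring(text)
    if root.tag != "osm":
        problems.append(f"unexpected root element: {root.tag}")
    for element in root:
        required = REQUIRED_ATTRIBUTES.get(element.tag)
        if required is None:
            continue  # other elements (e.g. <bounds>) are ignored in this sketch
        missing = required - set(element.attrib)
        if missing:
            problems.append(f"<{element.tag}> missing {sorted(missing)}")
        for tag in element.findall("tag"):
            if "k" not in tag.attrib or "v" not in tag.attrib:
                problems.append(f"<tag> without k/v inside <{element.tag}>")
    return problems


SAMPLE = """<osm version="0.6">
  <node id="1" lat="0.0" lon="0.0"><tag k="name" v="Example"/></node>
  <node id="2" lat="0.0"/>
</osm>"""

print(check_osm_xml(SAMPLE))  # reports that the second node lacks "lon"
```

The point of the sketch is only that the rules live in data rather than in code, which is what a published container specification would make possible for existing validators.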

B. Literal specifications of tagging

On feasibility: several distinct projects have already mined, or still continuously mine, tagging documentation from the OSM wiki, such as Taginfo and (at some point) the bot used by Data Items. The idea is to do the same, but in a strictly documented machine-readable file format which could then be published.

The term "literal" in the title means that, unless decided otherwise, the suggestion is to apply little to no manual intervention to "correct" what is intentionally that way (for example, this package would not be used to resolve conflicting definitions).

This package would be a great candidate for incremental schema changes over the years and for automation. Its name or code might be more neutral (e.g. not stating that it is a wiki tagging dump).

C. Enhanced specifications of tagging

As a complement to B, any significant post-processing, especially adding, removing or, for the sake of consistency, opting for one convention over others, would be a different objective. The temporary name "enhanced" (versus "literal") mostly reflects that these tags would likely carry more metadata than is available in the full list of documented tags, while possibly focusing on the primary features listed in Map features.

Compared to B, this package may not be feasible for the first global release unless some consensus among the target audience can be reached.

Note: packages in this proposal that might be usable by both B and C would be decoupled.

D. Literal specification of presets

See Preset.

There is at least one good candidate, the Name Suggestion Index, which already has a strong community of volunteers curating its content at https://github.com/osmlab/name-suggestion-index/ .

The existing data is a candidate for automated merging into the global distribution. However, its name (within the distribution alongside other packages), its schema and which data to export need planning.

E. Enhanced specification of presets

While the literal publication of presets is immediately useful inside editors, from a more ontological point of view presets are close to a concept (plus additional suggested fields for the user to add). This means that an enhanced version of presets could be useful beyond suggestions for users, and would also come close to describing how a schema is mapped inside OpenStreetMap and how it could be represented outside. Note that even B (literal specifications of tagging) would not fulfill this role: something like "highway=service" could still serve as a permanent identifier, but this gets very complex when the concept actually uses two or more tags.

No direct equivalent exists today and, unlike D, this is not feasible for direct automation. Consider this proposal less feasible for the first release of packages.

F. Structured tagging key patterns

Some tagging does not use exact key strings but a pattern, often with a suffix (the "en" of "name:en=*") or a prefix (such as "source=*" in "source:name=*"). The target audience would want a full list; however, generating every combination (or compiling the ones in use) is unlikely to be an efficient approach.

Data mining of the OSM wiki alone may not be a viable source for this data, or the patterns may be too complex to rely only on generic data mining. The actual list of patterns might be better handcrafted from the start, but at least part of it could be backported to specifications on the OSM wiki.
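To make the idea of key patterns concrete, here is a minimal sketch of how a handcrafted pattern table could be expressed and applied, instead of enumerating every key. The table `KEY_PATTERNS`, its two entries and the function `classify_key` are assumptions made for illustration; the regular expressions are deliberately simplistic and are not an agreed list of OpenStreetMap patterns.

```python
import re

# Hypothetical pattern table; entries are illustrative, not an agreed list.
KEY_PATTERNS = {
    # A base key followed by a language code suffix, e.g. "name:en".
    "language_suffix": re.compile(
        r"^(?P<base>name|description|note|fixme)"
        r":(?P<lang>[a-z]{2,3}(?:-[A-Za-z0-9]+)*)$"
    ),
    # The "source" prefix applied to another key, e.g. "source:name".
    "source_prefix": re.compile(r"^source:(?P<base>[a-z_]+(?::[a-z_]+)*)$"),
}


def classify_key(key):
    """Return (pattern_name, captured_parts) for the first matching pattern."""
    for name, pattern in KEY_PATTERNS.items():
        match = pattern.match(key)
        if match:
            return name, match.groupdict()
    return None


print(classify_key("name:en"))      # ('language_suffix', {'base': 'name', 'lang': 'en'})
print(classify_key("source:name"))  # ('source_prefix', {'base': 'name'})
print(classify_key("highway"))      # None
```

A package built this way stays small, because each pattern stands for an open-ended family of keys rather than a precomputed list of combinations.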

G. Specification of geometry-related feature of OpenStreetMap data model

While some challenges are very specific to parsing data at OpenStreetMap scale (such as Coastline, or very detailed precision) and are still related to geometry, it makes sense for the distribution of packages to also cover how the geometry-related meaning of OpenStreetMap data is typically interpreted.

The initial sublist of items is likely incomplete; the focus here is how to interpret more complex geometries in at least the most common cases. Ideally, more software should be able to understand data in the OpenStreetMap data model directly, without downgrading to intermediary formats.

G.1. Specification on multipolygon, multilines and areas

See Relation:multipolygon, Relation:multilinestring and Areas.

The deliverable here is likely to be text.

G.2. Machine readable list of taggings that imply geometry

As one example of how OpenStreetMap software works, a tag such as area=yes on a Closed way is assumed to be redundant in the presence of one or more semantic tags, to the point that even quality assurance tools complain about it. Such implicit implications for geometry exist and are used by popular OpenStreetMap tools, but are not published explicitly.

The work on this package will likely be handcrafted, without automation, but the results could also be backported to the literal B or become part of the enhanced C.
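A machine-readable list of this kind could be as simple as a published set of keys, which tools then consult. The sketch below assumes a hypothetical set `AREA_IMPLYING_KEYS` with two illustrative entries; the real list would have to be handcrafted and is certainly longer and more nuanced (some keys imply an area only for certain values).

```python
# Hypothetical rule table for illustration only; the real list would be
# handcrafted and may depend on tag values, not just keys.
AREA_IMPLYING_KEYS = {"building", "landuse"}


def interpret_closed_way(tags):
    """Classify a closed way as 'area' or 'line' from its tags."""
    if tags.get("area") == "no":
        return "line"  # an explicit opt-out wins
    if tags.get("area") == "yes":
        return "area"  # an explicit opt-in wins
    if any(key in AREA_IMPLYING_KEYS for key in tags):
        return "area"  # implied by a semantic tag
    return "line"


def is_redundant_area_tag(tags):
    """True when area=yes adds nothing because another tag already implies it."""
    return tags.get("area") == "yes" and any(
        key in AREA_IMPLYING_KEYS for key in tags
    )


print(interpret_closed_way({"building": "yes"}))                   # area
print(interpret_closed_way({"highway": "residential"}))            # line
print(is_redundant_area_tag({"building": "yes", "area": "yes"}))   # True
```

With the rules published as data, a renderer and a quality assurance tool could share the same table instead of each hard-coding its own.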

H. (Re)publication of specifications for tagging values with complex (but predictable) semantics

In specific situations, OpenStreetMap tagging neither uses a controlled vocabulary for values nor follows a machine-readable specification from an external standards organization. This means that implementers either use the documentation inside OpenStreetMap or rely on libraries written in the past. While the threshold for needing this level of focus is up for discussion, at a minimum the cases whose values or semantics are already implemented by a sufficient number of different tools should get extra attention.

Some structured values (such as date-time values, or those that could be expressed as a regex) can be expressed as part of B or C and do not need to become dedicated packages.

H.1 Opening hours

See Key:opening_hours and Key:opening_hours/specification.

Its semantics and the compactness of the opening hours specification make it a good candidate to be shared as a dedicated package (even if the first version is just the literal text contained in the OpenStreetMap Wiki). As for relevance, it is already used beyond OpenStreetMap, for example as part of specifications like the Indoor Mapping Data Format (Apple).

H.2 Conditionals on tagging values

See Conditional restrictions, although there may be more uses (with different semantics) than access restrictions or changes of value based on a condition (which may not be limited to time or day).

This group is a catch-all for the fact that some of the documented expected values in OpenStreetMap tagging (sometimes on keys with suffixes such as ":conditional") can actually carry conditions, for example ones based on variations of Opening Hours.
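As an illustration of why such values need a specification, the sketch below splits a ":conditional" value of the documented form "value @ (condition); value @ (condition)" into pairs. The function name `parse_conditional` is hypothetical, and the sketch makes no attempt to interpret the condition itself (which may be an opening hours expression); it only shows that semicolons inside parentheses must not be treated as separators.

```python
def parse_conditional(value):
    """Split 'v1 @ (c1); v2 @ (c2)' into [(v1, '(c1)'), (v2, '(c2)')].

    Semicolons inside parentheses belong to the condition and are kept.
    Conditions are returned verbatim, including their parentheses.
    """
    parts, current, depth = [], [], 0
    for ch in value:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == ";" and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))

    pairs = []
    for part in parts:
        if "@" not in part:
            continue  # not a 'value @ condition' pair; skipped in this sketch
        restriction, condition = part.split("@", 1)
        pairs.append((restriction.strip(), condition.strip()))
    return pairs


example = "no @ (Mo-Fr 07:00-19:00; Sa 09:00-12:00); destination @ (wet)"
print(parse_conditional(example))
# [('no', '(Mo-Fr 07:00-19:00; Sa 09:00-12:00)'), ('destination', '(wet)')]
```

A naive split on ";" would break the first condition in two, which is the kind of detail a shared, machine-readable specification would spare each implementer from rediscovering.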

I. Natural Languages

A package listing the natural languages known to be used on OpenStreetMap, with additional metadata (such as links to external references), makes sense because very often at least the language codes are necessary to understand other data.

Natural language is so important that one special kind of suffix (the "en" of "Key:name:en") is a reference to a natural language. The actual number of key patterns that can take this suffix is quite large (beyond Names: "note:en=*", "description:en=*", "fixme:en=*" and more), and explicitly specifying the natural language a field contains allows tools to act on its content (for example, as a source to translate from, or to prioritize what to show to the user).

The deliverable of this package would be a handcrafted table.

J. Tagging used on changesets

See Changeset.

Package A would contain specifications that are strongly tied to the transport of files; however, there are additional meta tags that, while not strictly required, have become common practice for at least a group of tools in that area of use. We could start with, for example, the most common ones used by Editors. In the future, the package could also include conventions used by tools focused on more specific tasks (such as Reverts or Imports).

This package (which may simply be part of B or C, since this kind of feedback on the chosen tags could actually be backported to the Wiki) becomes very relevant (or, in other terms, is of a different kind than the taxonomy used for internal data) because it works as an extension both for uploaded changesets and for files exchanged between tools for data that is not uploaded.

L. Semantics of key=value pairs encoded in the free text of OpenStreetMap Notes

See Notes.

Notes are not even part of the discussions of the OpenStreetMap data model; however, they are still a type of feedback that is not stored as metadata on elements (unlike, for example, fixme=*). Some OpenStreetMap editors, especially mobile ones, append text that could also be parsed by machine to the user's natural language comment. The result is ultimately equivalent to a key=value pair, where the value is often machine-readable (such as the date of the OpenStreetMap data release, which may be weeks older than the submission date of the note). The proposal here starts with the de facto most popular existing key-value conventions, documented in a way that could help automated data mining from the text.

At some point in the future this could evolve into inviting both the tools that already submit notes and potential new ones to discuss conventions among themselves. The ideal use case would be software that already understands the package schema and, as soon as a new version is published, learns the new types of meta information in notes.
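The kind of data mining described above can be sketched as follows. Everything about the format here is an assumption made for illustration: the idea that metadata appears as whole lines of the form "key=value", the key names `app` and `osm_data_date`, and the function `split_note_text` are all hypothetical, and real editors append their metadata in differing, undocumented ways.

```python
import re

# Hypothetical convention: a metadata line is exactly "key=value",
# with a lowercase key. Real tools differ; this is only an illustration.
METADATA_LINE = re.compile(r"^(?P<key>[a-z_][a-z0-9_:]*)=(?P<value>\S.*)$")


def split_note_text(text):
    """Separate the human comment from appended key=value metadata lines."""
    comment_lines, metadata = [], {}
    for line in text.splitlines():
        match = METADATA_LINE.match(line.strip())
        if match:
            metadata[match.group("key")] = match.group("value")
        else:
            comment_lines.append(line)
    return "\n".join(comment_lines).strip(), metadata


note = "The bridge is closed.\n\napp=ExampleEditor 1.2\nosm_data_date=2023-01-15"
comment, metadata = split_note_text(note)
print(comment)   # The bridge is closed.
print(metadata)  # {'app': 'ExampleEditor 1.2', 'osm_data_date': '2023-01-15'}
```

If the conventions were published as a package, the regular expression and the list of known keys would come from that package rather than being guessed by each consumer.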