Osmfilter version 0

From OpenStreetMap Wiki

osmfilter version 0 is a tool that shrinks a .osm XML file in situations where you need only a few tags. It has proven useful for filtering the OSM XML file before importing it into a PostgreSQL database with osm2pgsql. Although the result will be the same, filtering first can speed up the import significantly, in particular because osmfilter is written in C and is quite fast.

Note that there is a newer filter program (osmfilter version 1) which offers more options and is easier to use when object dependencies need to be taken into account.

Download

These Downloads are available:

(As usual: There is no warranty, to the extent permitted by law.)

Program Description

This program operates as a filter for OSM XML data. Only sections containing certain tags will be copied from standard input to standard output. Use the command-line parameter -k to determine which sections you want to have in standard output.
For example:

-k"key1=val1 key2=val2 key3=val3"
-k"amenity=restaurant =bar =pub =cafe =fast_food =food_court =nightclub"
-k"barrier="
-K/ -k"description=something with blanks/name=New York"

Limitations: the maximum number of key/value pairs is 1000, the maximum length of keys and/or values is 100. The -t option invokes a test mode which prints a list of accepted search strings to standard output.
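The second example above suggests that a bare =value reuses the most recent key. Assuming that reading is correct, a string of this form can be assembled from one key and a list of values. The following is only a sketch; the function name build_k_arg is hypothetical and not part of osmfilter.

```shell
# Hypothetical helper, not part of osmfilter: joins one key with several
# values into the "key=val1 =val2 ..." form used in the examples above.
build_k_arg() {
    key=$1
    shift
    # With no values at all, this yields "key=", as in the barrier= example.
    arg="$key=${1-}"
    if [ "$#" -gt 0 ]; then
        shift
    fi
    for val in "$@"; do
        arg="$arg =$val"
    done
    printf '%s\n' "$arg"
}

# Possible use (assumes ./osmfilter and a.osm exist in the current directory):
#   ./osmfilter -k"$(build_k_arg amenity restaurant bar pub)" <a.osm >gastro.osm
```

Values containing blanks would still need the -K separator shown in the last example, which this helper does not handle.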

To suppress certain records, please use the -d option. For example:

 -d"highway=path =footway =cycleway railway=rail"

All objects containing at least one of the mentioned values will be dropped, even if they are part of a relation which is not dropped. In other words, key/val pairs in the -d parameter overrule the pairs which have been defined in the -k parameter.

Considering Dependencies

To get dependent elements, e.g. nodes of a selected way or ways of a selected relation, you need to feed the input OSM XML file to the program more than once. You need to do this at least 3 times to get the nodes of a way which is referred to by a relation.
If you want to ensure that relations which are referred to by other relations are also processed correctly, you must input the file a 4th time. If there is more than one inter-relational hierarchy to be considered, you will need to do this a 5th or 6th time.

If you feed the input file into osmfilter more than once, you must tell the program the exact beginning and end of the pre-processing sequence. For example:

cat lim a.osm a.osm a.osm a.osm lim a.osm | ./osmfilter -k"lit=yes" >new.osm

where 'lim' is a file containing this sequence as a delimiter:

<osmfilter_pre/>

If you have a compressed input file, you can use bzcat instead of cat. In this case, be sure to compress the 'lim' file as well.
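The input sequence for any number of pre-processing passes can be generated rather than typed by hand. The sketch below uses two hypothetical helpers, make_delimiter and input_sequence, which are not part of osmfilter. Whether the delimiter file needs a trailing newline is an assumption here.

```shell
# Hypothetical helper: writes the delimiter line to the given file.
# Assumption: a trailing newline after the delimiter is acceptable.
make_delimiter() {
    printf '%s\n' '<osmfilter_pre/>' >"$1"
}

# Hypothetical helper: prints the file list for cat. The input file is
# repeated $2 times between two delimiter files, then listed once more
# for the actual filtering run. $3 is the delimiter file (default: lim).
input_sequence() {
    file=$1
    passes=$2
    lim=${3:-lim}
    list=$lim
    i=0
    while [ "$i" -lt "$passes" ]; do
        list="$list $file"
        i=$((i + 1))
    done
    printf '%s\n' "$list $lim $file"
}

# Possible use (relies on word splitting, so file names must not contain blanks):
#   make_delimiter lim
#   cat $(input_sequence a.osm 4) | ./osmfilter -k"lit=yes" >new.osm
```

With 4 passes, input_sequence produces the same file list as the example command above.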

To speed up the filter process, the program uses some main memory for a hash table. By default, it uses 320 MiB for storing a flag for every possible node, 60 MiB for the way flags, and 20 MiB for the relation flags.
Every byte holds the flags for 8 ID numbers, i.e., in 320 MiB the program can store 2684 million flags. As there are fewer than 1000 million node IDs at present (Oct 2010), 120 MiB would suffice. So you can decrease the hash sizes to, for example, 130, 12 and 2 MiB using this option:

-h130-12-2
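The figures above follow from simple arithmetic: one byte holds 8 flags, so one MiB holds 8 x 1048576 flags. The following sketch reproduces them; the function names are hypothetical and serve only to illustrate the calculation.

```shell
# Illustrative helpers (the names are not part of osmfilter).

# Number of flags that fit into the given number of MiB (8 flags per byte).
flags_for_mib() {
    echo $(( $1 * 1024 * 1024 * 8 ))
}

# Smallest whole number of MiB that holds one flag per ID.
mib_for_ids() {
    per_mib=$(( 1024 * 1024 * 8 ))
    echo $(( ($1 + per_mib - 1) / per_mib ))
}

flags_for_mib 320        # prints 2684354560
mib_for_ids 1000000000   # prints 120
```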

But keep in mind that the OSM database is continuously expanding. For this reason the program's default value is higher than in the -h130-12-2 example, and it may be appropriate to increase it in the future. If you do not want to bother with the details, you can enter the amount of memory as a single sum, and the program will divide it up itself. For example:

-h1000

These 1000 MiB will be split in three parts: 800 for nodes, 150 for ways, and 50 for relations.
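Both the 320-60-20 default and the 800-150-50 split correspond to a ratio of 16:3:1 between nodes, ways and relations. Assuming the program always applies this ratio (the source does not state it explicitly), the explicit form of a summed value can be estimated as follows. The function name split_hash is hypothetical.

```shell
# Assumption: the total is divided 16:3:1 (nodes:ways:relations), which
# matches both the 320-60-20 default and the 800-150-50 example.
split_hash() {
    total=$1
    nodes=$(( total * 16 / 20 ))
    ways=$(( total * 3 / 20 ))
    # Relations get the remainder, so the three parts always add up to the total.
    rels=$(( total - nodes - ways ))
    printf '%s\n' "-h$nodes-$ways-$rels"
}

split_hash 1000   # prints -h800-150-50
split_hash 400    # prints -h320-60-20
```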

Because we are taking hashes, it is not necessary to provide all the suggested memory; the program will operate with less hash memory too. But, in this case, the filter will be less effective, i.e., some nodes and some ways will be left in the output file although they should have been excluded.
The maximum value the program accepts for the hash size is 4000 MiB. If you exceed the maximum amount of memory available on your system, the program will try to reduce this amount and display a warning message.
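Why a smaller hash leaves unwanted objects in the output can be illustrated with a simplified model. The sketch assumes that the flag position is the ID modulo the number of available flags; the actual hash function used by osmfilter is not described here and may differ. The function names are hypothetical.

```shell
# Assumed mapping, for illustration only:
# flag position = ID modulo number of flags in the table.
slot_of() {
    id=$1
    mib=$2
    flags=$(( mib * 1024 * 1024 * 8 ))
    echo $(( id % flags ))
}

# Succeeds if two IDs share one flag in a table of the given size in MiB.
same_slot() {
    [ "$(slot_of "$1" "$3")" -eq "$(slot_of "$2" "$3")" ]
}

# In a 1 MiB table (8388608 flags), IDs 42 and 8388650 share a flag, so
# marking node 42 as needed would also keep node 8388650 in the output.
# In a 2 MiB table they no longer collide.
```

Under this model a collision can only cause an extra object to be kept, never a needed one to be dropped, which matches the behaviour described above.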

Optimizing the Performance

As there are no nodes which refer to other objects, preprocessing does not need the node section of the OSM XML file. Nearly the same applies to ways, so the ways are needed only once in preprocessing - in the last run.
If you want to enhance performance, you should consider pre-filtering the OSM XML file. Pre-filtering can be done using the --drop-nodes and --drop-ways options. For example:

cat a.osm | ./osmfilter --drop-nodes >wr.osm
cat wr.osm | ./osmfilter --drop-ways >r.osm
cat lim r.osm r.osm wr.osm lim a.osm | ./osmfilter -k"lit=yes" >new.osm

If you are using pre-filtering, there will be no other filtering, i.e., the parameter -k will be ignored.

Example of Use

The project OpenGastroMap.de uses osmfilter to speed up the database import. That makes it possible to run the application on a small virtual Internet server. Here are the details: OpenGastroMap/install#Tool_osmfilter.