Experimental Overpass API fork

👋 Welcome!

NEWS

Read more about the 2022 planet import challenge:

Overview

  • Performance optimized experimental Overpass API fork 0.7.59_mmd
  • Based on 0.7.57.0, plus selected functional additions and bugfixes included in 0.7.58.* releases

🎉 New features

  • Supporting CMake based build, in addition to autotools.

    • Unity builds (USE_UNITY_BUILD = ON) for faster compilation
    • cpack: creates debian packages
    • Various HAVE_* flags to control libraries and features in use
  • Libosmium-based output modes:

    • PBF/OPL format output: setting [out:pbf] and [out:opl]
    • LZ4 compression for PBF output: [out:pbf(lz4)]
    • Libosmium based XML output: [out:osmxml] - faster output; Overpass API-specific OSM XML format extensions are not supported
  • Regular expressions:

    • ICU regular expressions: setting [regexp:ICU]
    • PCRE2 (+JIT) regular expression engine: setting [regexp:PCRE] or [regexp:PCREJIT]
  • New/improved statements:

    • XAPI-like union operation for tag values (syntactic sugar): node[place=city|town|village]; query syntax can be used instead of node['place'~'^(city|town|village)$'];

    • New query statement filter 'has no key like ...' - node[amenity=recycling][!~"^recycling:"]; (Related upstream issue: drolbr#589)

    • New function all_vertex() which evaluates to true, if all vertices of a way fulfill a given expression.

      Query

      way({{bbox}})[building] (if:lrs_in(1,per_vertex(abs(angle()) > 170)));

      can now be rewritten as:

      way({{bbox}})[building] (if:!all_vertex(abs(angle()) <= 170));

    • Ad-hoc area creation on any closed way/relation

      Example query for ad-hoc area creation:
      ```
      way[landuse=residential]({{bbox}});
      
      foreach ->.pivot {
        (
          .pivot;
          node(w.pivot);
        );
      
        ( make_area [.pivot]; .result;)->.result;
      }
      
      rel[type=multipolygon][landuse=residential]({{bbox}});
      foreach ->.pivot {
        (
          way(r.pivot);
          node(w);
        );
        ( make_area [.pivot]; .result;)->.result;
      }
      
      foreach .result -> .area {
      
        way[building](area.area);
      
        if(count(ways) == 0) {
          way(pivot.area);
          out geom meta;
          rel(pivot.area);
          out geom meta;
        }
      }
      ```
      
  • Dockerfile to facilitate building 0.7.59_mmd and 0.7.56 binaries (see docker/ directory)

  • PBF planet and diff files can be imported without any external PBF->XML conversion tools. By avoiding expensive XML parsing, and leveraging libosmium parallel file processing, imports see a significant speedup. To enable this new file importer, add command line parameter --use-osmium when calling update_from_dir or update_database. Command line parameter -f can be used to override the default input file format (PBF). See https://osmcode.org/file-formats-manual/ for permitted values.

  • PBF planet initial load supports the LocationsOnWays extension. When using the osmium tool to create the PBF file, it is essential to pass the --keep-untagged-nodes option (short form -n, as in the command below) so that untagged nodes are kept in the output file.

    osmium add-locations-to-ways -n extract.osm.pbf -o extract_low.osm.pbf
    
  • map_demo: alternative implementation for API 0.6 /map call for very large areas with > 100 million nodes. Returns PBF file format.

  • Improved dispatcher process security for use in multi-user environments (upstream issue 247)

  • Improved dispatcher signal handling for SIGTERM and SIGINT, same behavior as --terminate command line parameter

  • Environment variables:

    • OVERPASS_MAX_TIMEOUT: global override for maximum permitted [timeout:...] value.
    • OVERPASS_MAX_ELEMENT_LIMIT: global override for maximum permitted [maxsize:...] value.
    • OVERPASS_FCGI_MAX_REQUESTS: number of FastCGI requests after which the interpreter process is terminated (when idle)
    • OVERPASS_FCGI_MAX_ELAPSED_TIME: maximum time after which the FastCGI process is terminated (when idle)
    • OVERPASS_REGEXP_ENGINE: default regexp engine to use if none is specified in Overpass QL settings. Possible values include: POSIX, ICU, PCRE and PCREJIT. PCREJIT is recommended for best performance.
    • OVERPASS_LOG_LEVEL: define transactions.log log level. Available levels: 0 (error), 1 (warn), 2 (info), 3 (debug, default value), 4 (trace)
    • OVERPASS_SHARED_NAME_SUFFIX: define /dev/shm/osm3s*... shared memory file suffix, allowing multiple parallel Overpass instances on one system
    • OVERPASS_MAX_SPACE_LIMIT: maximum size of a FastCGI process's virtual memory (address space), in bytes. Default value: 2^33 (=8 GiB); memory will be unlimited when the parameter value is set to 0.
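
As a rough illustration of the OVERPASS_MAX_SPACE_LIMIT semantics listed above, the following Python sketch resolves the effective limit from a set of environment variables. The helper name `effective_space_limit` is hypothetical and not part of the fork, and rejecting negative values is an assumption. Only the variable name, the 2^33 default and the meaning of 0 are taken from the list.

```python
import os

# Documented default: 2^33 bytes (8 GiB).
DEFAULT_SPACE_LIMIT = 2 ** 33


def effective_space_limit(environ=None):
    """Return the address-space limit in bytes, or None for "unlimited".

    Hypothetical helper that only mirrors the documented semantics;
    it is not the fork's actual implementation.
    """
    if environ is None:
        environ = os.environ
    raw = environ.get("OVERPASS_MAX_SPACE_LIMIT")
    if raw is None:
        return DEFAULT_SPACE_LIMIT
    value = int(raw)
    if value < 0:
        # Assumption: negative values are treated as a configuration error.
        raise ValueError("OVERPASS_MAX_SPACE_LIMIT must not be negative")
    # A value of 0 disables the limit.
    return None if value == 0 else value


if __name__ == "__main__":
    print(effective_space_limit({}))
    print(effective_space_limit({"OVERPASS_MAX_SPACE_LIMIT": "0"}))
```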

🐛 Bugfixes

  • Reject unsupported area filter for areas (area(area))
  • Reject empty poly statement ((poly))
  • Print error messages in CSV output mode. Errors can be detected as an empty line followed by the error message.
  • Fix replication scripts apply_osc_to_db.sh and fetch_osc.sh which now handle the global state.txt file correctly.
  • Fix a segmentation fault that the Timestamp constructor could trigger (upstream issue 625)
  • Fix Change_Entry comparison operator bug (upstream issue 623)
  • Avoid accumulating area_blocks in foreach loop (upstream issue 568)
  • Fixed some memory leaks in test classes
  • Validate object ids during import (when using osmium based importer), rejecting object ids which are too large to be stored in 40 bits / 32 bits respectively.
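
The id range check from the last bullet can be pictured with a small Python sketch. The text does not spell out which limit applies to which object type. The mapping below (40 bits for nodes, 32 bits for ways and relations) is an assumption, and `validate_object_id` is a hypothetical name rather than a function of the importer.

```python
# Assumed mapping of object type to id width in bits.
ID_BITS = {"node": 40, "way": 32, "relation": 32}


def validate_object_id(kind, object_id):
    """Return object_id if it fits the storage width for its type.

    Raises ValueError for ids that are negative or too large,
    which corresponds to rejecting the object during import.
    """
    limit = 1 << ID_BITS[kind]
    if not 0 <= object_id < limit:
        raise ValueError(
            f"{kind} id {object_id} does not fit into {ID_BITS[kind]} bits"
        )
    return object_id


if __name__ == "__main__":
    print(validate_object_id("node", 10_000_000_000))  # fits into 40 bits
    try:
        validate_object_id("way", 10_000_000_000)      # exceeds 32 bits
    except ValueError as err:
        print(err)
```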

💥 Incompatible database format changes

  • Use of 40 bit node ids.
    • Due to upstream issue 465, node ids were already limited to 42 bits before.
    • LZ4 can no longer compress 40 bit node ids, resulting in a slightly worse compression ratio (average 69% before --> 75% fixed now).
  • Way nodes stored as varint using protozero (original idea: drolbr#250)
  • Node changelog is replaced by node change packages. They form a timestamp based delta encoded list of node ids, similar to way nodes. Also, old_idx and new_idx fields in Change_Entry have been removed (upstream issue 654). Both changes result in an overall size reduction from >60GB down to 6.5GB for node changelog details.
  • Timestamp data type has been changed from 40 bit to 32 bit, with year 2000 as baseline. Supporting 16384 years for OSM object metadata seemed a bit excessive. The new data type still covers years 2000..2063.
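
The year ranges quoted for the Timestamp type follow from simple bit arithmetic. The Python sketch below assumes a packed layout with 26 bits for month, day, hour, minute and second (4 + 5 + 5 + 6 + 6), which leaves 14 bits for the year in a 40 bit value (16384 years) and 6 bits in a 32 bit value (64 years, 2000..2063). The exact field layout is an assumption inferred from those numbers, and the function names are hypothetical.

```python
from datetime import datetime

BASE_YEAR = 2000
YEAR_SHIFT = 26                  # assumed: 4+5+5+6+6 bits below the year field
YEAR_BITS_32 = 32 - YEAR_SHIFT   # 6 bits  -> 64 years
YEAR_BITS_40 = 40 - YEAR_SHIFT   # 14 bits -> 16384 years


def pack_timestamp(dt):
    """Pack a datetime into a 32 bit integer with year 2000 as baseline."""
    year_offset = dt.year - BASE_YEAR
    if not 0 <= year_offset < (1 << YEAR_BITS_32):
        raise ValueError(f"year {dt.year} is outside 2000..2063")
    return (
        (year_offset << YEAR_SHIFT)
        | (dt.month << 22)
        | (dt.day << 17)
        | (dt.hour << 12)
        | (dt.minute << 6)
        | dt.second
    )


def unpack_timestamp(value):
    """Inverse of pack_timestamp."""
    return datetime(
        BASE_YEAR + (value >> YEAR_SHIFT),
        (value >> 22) & 0xF,
        (value >> 17) & 0x1F,
        (value >> 12) & 0x1F,
        (value >> 6) & 0x3F,
        value & 0x3F,
    )


if __name__ == "__main__":
    stamp = pack_timestamp(datetime(2023, 1, 15, 12, 30, 45))
    print(hex(stamp), unpack_timestamp(stamp))
```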

Three conversion tools are available for easy conversion from the official database file format to the custom one.

  • Conversion tool to convert 0.7.56 clone database into 0.7.59_mmd format (runtime ~3h). See upstream/README_0_7_57_1_patched.md for details.
  • Conversion tool to create the tagged nodes table that is absent in a 0.7.56 database (create_tagged_nodes [db_dir])
  • Conversion tool to create node_changepack.bin based on an existing node_changelog.bin (create_node_changepack [db_dir])

All conversion tools need to be executed without an active dispatcher instance.

A full attic database using lz4 compression for bin+map needs about 400G on 0.7.59 (mmd), based on 01/2023 data.
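
To give an idea why varint storage and delta encoding shrink id lists, here is a minimal Python sketch that combines delta encoding, zigzag encoding and base-128 varints (the wire format protozero implements). It is not the fork's on-disk layout, which is not described here. The function names are hypothetical and the choice of zigzag for negative deltas is an assumption.

```python
def zigzag(n):
    """Map signed integers to unsigned ones: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return (n << 1) if n >= 0 else ((-n) << 1) - 1


def unzigzag(z):
    return (z >> 1) ^ -(z & 1)


def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def encode_ids(ids):
    """Delta-encode a list of ids and store each delta as a varint."""
    out = bytearray()
    previous = 0
    for current in ids:
        out += encode_varint(zigzag(current - previous))
        previous = current
    return bytes(out)


def decode_ids(data):
    ids, previous, value, shift = [], 0, 0, 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7          # continuation bit set: more bytes follow
            continue
        previous += unzigzag(value)
        ids.append(previous)
        value, shift = 0, 0
    return ids


if __name__ == "__main__":
    way_nodes = [4_000_000_001, 4_000_000_005, 4_000_000_003]
    packed = encode_ids(way_nodes)
    print(len(packed), "bytes instead of", 5 * len(way_nodes))
```

Neighbouring node ids in a way tend to be close to each other, so most deltas fit into one or two bytes instead of a fixed 5 bytes per 40 bit id.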

⌛ Performance

(List does not include some rather technical changes)

  • FastCGI support: avoid starting a new interpreter binary for each request, thereby enabling further caching options (upstream pull request 383)

  • Index and username caching: further taking advantage of FastCGI, database indices and usernames are only updated once per minute, and can be reused for many queries.

  • epoll based dispatcher processing for better scalability. This replaces round robin based unix domain socket polling with 10ms..100ms time intervals.

  • Hybrid array/bitmap container for better memory efficiency of large query statements (subset of features found in roaring bitmaps, https://arxiv.org/pdf/1603.06549.pdf)

  • Avoid object instantiation to access node/way/relation properties wherever possible in osm3s template db backend by leveraging CRTP.

  • Attic: speed up object reconstruction, avoid expensive copying of objects

  • Parallel processing support for update_from_dir/update_database. New parameter --parallel=n, where n denotes the max. number of parallel processes

  • around statement:

    • bbox based pruning. Calculate bounding box for each way and avoid O(n²) great circle distance calculations, if bounding boxes don't intersect. (upstream issue 167, > 120x speedup).
    • Avoid excessive object allocations by replacing vector by tuple.
    • condense_ranges as standalone function: improve range expansion for global queries (example: https://overpass-turbo.eu/s/1azD -> 180x speedup).
    • Reduce memory consumption by releasing temporary structure immediately after statement execution.
  • Area caching, use of memoization to avoid expensive calculation

  • Lazy way geometry store loading for "out qt" to reduce memory consumption, also in attic mode. In addition avoid recalculation of loop invariants in Way_Geometry_Store.

  • Ignore bounding box if it covers the whole globe.

  • Lazy (if: ) filter evaluation to short circuit boolean expression evaluation.

  • (if: ) expression evaluation now uses std::variant for intermediate results to avoid back and forth conversions from number to string types, resulting in significant speedups for angle() et al.

  • Changeset based (changed: ) filtering to process huge changesets in Achavi. Additional caching in changed to avoid recalculation of already available results.

Example syntax for changeset 46503970: (syntax is subject to change)

[adiff:"2017-03-01T20:28:34Z","2017-03-01T20:28:42Z"];
(node(changed!46503970);way(bn);way(changed!46503970);relation(changed!46503970););out meta geom qt;
  • Separate nodes table for fast lookup of tagged nodes. This way only ~3% of all nodes need to be processed for tag node queries.
  • union statement optimization to avoid unnecessary copying of so called "stack frames". Speeds up typically used (._; .result;)->.result; statements inside foreach loops.
  • Use lz4 compression by default for bin and map files
  • LZ4: fallback to uncompressed input, if result size increases during compression
  • Using fmt library to print fixed decimal values
  • Reduce memory consumption for attic tag queries (collect_attic_kregv et al.)
  • Significant reduction in memory consumption for out qt; based output mode using index based prefetching.
  • Expensive health check: use linear complexity approach in collect_items_range and similar functions.
  • Use of SharedDataPointer to reduce cost of copying large objects (ways, relations, areas, etc.)
  • Parallel clone generation. Number of parallel threads can be set using the --clone-parallel=n command line parameter.
  • Use of C++17 string_view to execute key value checks on raw data, avoiding copying altogether. This includes ICU and PCRE(JIT) regular expressions. Note: POSIX regular expressions still require additional copying due to API limitations.
  • Key value queries to validate first element of a new index only. Immediately skip all elements, in case key value pairs are not matching.
  • Replace std::map based user data cache by std::vector, also avoid expensive sorting operations for millions of user id/display_name pairs.
  • Use monobound_binary_search instead of std::binary_search for some hot code paths.
  • Make updater and backend use std::vector instead of std::set, and std::unordered_map instead of std::map
  • Added multi-threaded processing in updater
  • Many more micro-optimizations, see git changelog
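
The bbox based pruning mentioned for the around statement can be sketched in a few lines of Python. This is a simplified model and not the fork's C++ code: it only measures distances to way vertices (not to segments), ignores bounding boxes that cross the antimeridian, and all function names are hypothetical. The idea is to expand each way's bounding box by the search radius and to run the expensive great circle calculation only for points inside that box.

```python
import math

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(p, q):
    """Great circle distance in metres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def expanded_bbox(vertices, radius_m):
    """Bounding box (south, west, north, east) padded by radius_m."""
    lats = [v[0] for v in vertices]
    lons = [v[1] for v in vertices]
    pad_lat = math.degrees(radius_m / EARTH_RADIUS_M)
    south, north = min(lats) - pad_lat, max(lats) + pad_lat
    # Conservative longitude padding, taken at the latitude nearest a pole.
    cos_lat = math.cos(math.radians(min(90.0, max(abs(south), abs(north)))))
    half_angle = min(radius_m / (2 * EARTH_RADIUS_M), math.pi / 2)
    ratio = 1.0 if cos_lat < 1e-12 else min(1.0, math.sin(half_angle) / cos_lat)
    pad_lon = math.degrees(2 * math.asin(ratio))
    return south, min(lons) - pad_lon, north, max(lons) + pad_lon


def ways_around(points, ways, radius_m):
    """Ids of ways having a vertex within radius_m of any point."""
    hits = []
    for way_id, vertices in ways.items():
        south, west, north, east = expanded_bbox(vertices, radius_m)
        # Cheap rejection: only points inside the padded box can be close.
        candidates = [p for p in points
                      if south <= p[0] <= north and west <= p[1] <= east]
        if any(haversine_m(p, v) <= radius_m
               for p in candidates for v in vertices):
            hits.append(way_id)
    return hits


if __name__ == "__main__":
    ways = {1: [(52.5, 13.4), (52.501, 13.401)], 2: [(48.1, 11.5), (48.2, 11.6)]}
    print(ways_around([(52.5005, 13.4005)], ways, 100.0))
```

For most point and way pairs the padded box test fails immediately, so the number of trigonometric distance calculations drops sharply.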

❌ Removed features

  • [out:custom] and [out:popup] output modes
  • Public transport diagrams (sketch route tools)
  • local (localize)
  • XAPI compatibility layer (obsolete, no real world usage anymore)
  • Compile time option --disable-overpassxml to remove support for Overpass XML dialect
  • Support for jsonp trick. Obsoleted nowadays by CORS.

♻️ Cleanup

  • clang tidy: modernize/performance (see commit log for details)
  • Removed some old content in doc/

⚑ Performance metrics

  • Full attic database creation 4-5 days (currently: > 24 days)
  • Initial area creation (using 0.7.57 area creation rules): 28.5 min
  • Dispatcher process can handle 5000-7000 requests/s.

Performance test results

Test setup:

  • Reprocess lz4 transactions.log, dated 2021/12/17
  • 1'837'079 queries in total
  • 1 CPU used to execute queries, no parallel processing
  • Areas created beforehand
  • No database updates

Results:

  • Roughly 7x speedup on average vs. 0.7.57
  • Total processing time: 100'054 s (about 27 hours, 47.6 minutes), based on runtime measurement in transactions.log
  • Average: 1017 queries/minute, based on start/end times in Apache log
  • Distribution
    • 495 queries (=0.027%) account for 20% of the total processing time (queries w/ > 20s runtime)
    • 8916 queries (=0.485%) account for 50% of the total processing time (queries w/ > 1.1s runtime)
  • Errors during execution
    • Oversized: 199 queries (vs. 110 in original transactions.log file)
    • Timeout: 274 queries (vs. 1402 in original transactions.log file)
    • Frequently triggered by expensive Achavi queries still lacking filtering based on changeset id, as well as a single umap map without proper limit on zoom level

Query runtime:

mean           0.054 s
min            0.000 s
max          205.513 s
quantile     runtime (in s)
---------------------------
10%            0.002
50%            0.007
90%            0.042
95%            0.111
99%            0.601
99.5%          1.065
99.9%          5.829
99.95%        12.425
99.99%        30.380
99.995%       56.768
99.999%      120.544
99.9999%     148.805

(-> 99.5% of all queries take roughly 1 second or less)
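
The mean runtime and the distribution percentages can be cross-checked from the totals given in the test setup and results. The short Python sketch below only redoes that arithmetic with the numbers quoted above. The constant and function names are chosen for illustration.

```python
# Figures quoted in the test setup and results above.
TOTAL_QUERIES = 1_837_079
TOTAL_RUNTIME_S = 100_054


def share_percent(count, total=TOTAL_QUERIES):
    """Share of `count` queries among all queries, in percent."""
    return 100.0 * count / total


# Mean runtime per query, derived from total runtime and query count.
mean_runtime_s = TOTAL_RUNTIME_S / TOTAL_QUERIES

print(f"mean runtime:        {mean_runtime_s:.3f} s")
print(f"queries with > 20 s:  {share_percent(495):.3f} %")
print(f"queries with > 1.1 s: {share_percent(8916):.3f} %")
```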

Used supervisord config settings

environment=
    OVERPASS_FCGI_MAX_REQUESTS=10000,
    OVERPASS_FCGI_MAX_ELAPSED_TIME=900,
    OVERPASS_REGEXP_ENGINE="PCREJIT",
    OVERPASS_DEFAULT_TIMEOUT=60,
    OVERPASS_MAX_TIMEOUT=120,
    OVERPASS_MAX_SPACE_LIMIT=8589934592,
    OSMIUM_POOL_THREADS=1,
    OVERPASS_LOG_LEVEL=2

Installation

Base image: Ubuntu 20.04

Clone source code

git clone https://github.com/mmd-osm/Overpass-API.git
cd Overpass-API
git checkout test7591
git submodule update --init

Install dependencies

sudo apt-get update -qq || true
sudo apt-get install -y g++ git make autoconf automake ca-certificates libtool \
       libfcgi-dev libxml2-dev zlib1g-dev \
       expat libexpat1-dev liblz4-dev libbz2-dev libicu-dev \
       libfmt-dev libpcre2-dev libcereal-dev libgoogle-perftools-dev \
      --no-install-recommends

Compiling and installing binaries

C++17 compiler support is mandatory to build binaries.

pushd src/
chmod u+x test-bin/*.sh
autoscan
aclocal
autoheader
libtoolize
automake --add-missing
autoconf
popd
mkdir -p build
cd build
../src/configure CXXFLAGS="-Werror=implicit-function-declaration  -D_FORTIFY_SOURCE=2 -fexceptions -fpie -Wl,-pie -fpic -shared -fstack-protector-strong -Wl,--no-as-needed -pipe -Wl,-z,defs -Wl,-z,now -Wl,-z,relro -fno-omit-frame-pointer -flto -fwhole-program -march=native -O2 -ftree-vectorize -g3 -ggdb" LDFLAGS="-ltcmalloc -flto -fwhole-program -lpcre2-8 -lfmt" --prefix=$EXEC_DIR --enable-lz4 --enable-fastcgi --enable-tests
make V=0 -j3
make install

(also see docker/ directory for examples)

supervisord config

/etc/supervisor/conf.d/overpass.conf

[fcgi-program:interpreter]
socket=unix:///var/run/interpreter.socket
socket_owner=www-data
socket_mode=0660
environment=
    OVERPASS_FCGI_MAX_REQUESTS=10000,
    OVERPASS_FCGI_MAX_ELAPSED_TIME=900,
    OVERPASS_REGEXP_ENGINE="PCREJIT"
command=/home/user/osm3s/fcgi-bin/interpreter
numprocs=6
priority=999
process_name=%(program_name)s_%(process_num)02d
user=www-data
autorestart=true
autostart=true
startsecs=1
startretries=3
stopsignal=QUIT
stopwaitsecs=10
redirect_stderr=true
stdout_logfile=/var/log/interpreter.log
stdout_logfile_maxbytes=10MB

Apache config

Forwarding calls to /api/interpreter to local socket managed by supervisord. Requires mod_proxy_fcgi.

   ProxyPass /api/interpreter unix:///var/run/interpreter.socket|fcgi://localhost/api/interpreter

Replacing /api/map shell script: use Apache rewrite engine to use /api/interpreter endpoint instead:

   <LocationMatch "^/api/map$">
      RewriteEngine On
      RewriteCond %{QUERY_STRING} ^bbox=([\-0-9\.]+),([\-0-9\.]+),([\-0-9\.]+),([\-0-9\.]+)$
      RewriteRule ".*" "/api/interpreter?data=[timeout:300][maxsize:2000000000][bbox:%2,%1,%4,%3];(node(%2,%1,%4,%3);way(bn);node(w););(._;(rel(bn)->.a;rel(bw)->.a;);rel(br););out meta;" [PT]
   </LocationMatch>

   <LocationMatch "^/api/map\.pbf$">
      RewriteEngine On
      RewriteCond %{QUERY_STRING} ^bbox=([\-0-9\.]+),([\-0-9\.]+),([\-0-9\.]+),([\-0-9\.]+)$
      RewriteRule ".*" "/api/interpreter?data=[out:pbf][timeout:300][maxsize:2000000000][bbox:%2,%1,%4,%3];(node(%2,%1,%4,%3);way(bn);node(w););(._;(rel(bn)->.a;rel(bw)->.a;);rel(br););out meta;" [PT]
   </LocationMatch>
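
The rewrite rules above swap the bbox components because an API 0.6 /map call passes bbox=left,bottom,right,top, whereas Overpass QL expects south,west,north,east. The following Python sketch mirrors that transformation outside of Apache, for instance to test query strings. The function name `map_call_to_overpass` is hypothetical. The regular expression and the query text are copied from the rules above.

```python
import re

# Same pattern as the RewriteCond: four comma-separated numbers.
BBOX_RE = re.compile(
    r"^bbox=([\-0-9.]+),([\-0-9.]+),([\-0-9.]+),([\-0-9.]+)$"
)


def map_call_to_overpass(query_string, pbf=False):
    """Translate an /api/map query string into an Overpass QL query.

    Returns None when the query string does not match, which is the
    case where the RewriteRule would not fire.
    """
    match = BBOX_RE.match(query_string)
    if match is None:
        return None
    left, bottom, right, top = match.groups()
    # Overpass order is south,west,north,east (%2,%1,%4,%3 in the rule).
    bbox = f"{bottom},{left},{top},{right}"
    settings = "[out:pbf]" if pbf else ""
    settings += f"[timeout:300][maxsize:2000000000][bbox:{bbox}]"
    return (
        f"{settings};(node({bbox});way(bn);node(w););"
        "(._;(rel(bn)->.a;rel(bw)->.a;);rel(br););out meta;"
    )


if __name__ == "__main__":
    print(map_call_to_overpass("bbox=13.3,52.4,13.5,52.6"))
```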
