Asynchronous NBE geocoding and region coding service
geocoding-secondary is a feedback secondary. What is a feedback secondary? Like other secondaries, it runs as an instance of secondary-watcher, but instead of writing to its own store, it feeds its computations back to truth by posting mutation scripts to data-coordinator.
geocoding-secondary operates on computed columns with computation strategies of the following types:
- geocoding
- georegion_match_on_point (or legacy georegion)
- georegion_match_on_string
It computes the value of the target computed column from the source columns described in the computationStrategy.
Each computation strategy type defines what should be found in parameters.
We have a strategy definition validation library for computation strategies.
geocoding-secondary geocodes the value of the target point column from an address constructed from the text source columns described in the computationStrategy.
For a geocoded computed column of type "geocoding":
- the column dataTypeName must be "point".
- the column computationStrategy.type must be "geocoding".
- source columns must be text columns.*
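The rules above can be checked client-side before submitting a column. This is a minimal sketch, not the validation library mentioned earlier: the function name geocoding_column_errors is hypothetical, and it assumes you already have a dict mapping each source column field name to its dataTypeName. It also applies the postal_code exception noted further below (postal_code sources may be number columns).

```python
# Hypothetical client-side check mirroring the "geocoding" column rules.
# `source_types` maps source column field name -> dataTypeName (assumed input).
def geocoding_column_errors(column, source_types):
    """Return a list of rule violations for a "geocoding" computed column."""
    errors = []
    if column.get("dataTypeName") != "point":
        errors.append('dataTypeName must be "point"')
    strategy = column.get("computationStrategy", {})
    if strategy.get("type") != "geocoding":
        errors.append('computationStrategy.type must be "geocoding"')
    postal = strategy.get("parameters", {}).get("sources", {}).get("postal_code")
    for name in strategy.get("source_columns", []):
        kind = source_types.get(name)
        # postal_code sources may also be number columns; all others must be text
        allowed = {"text", "number"} if name == postal else {"text"}
        if kind not in allowed:
            errors.append(f"source column {name!r} has type {kind!r}")
    return errors
```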
The computation strategy has the following required** parameter:
- "defaults.country": country address default value
** A superadmin can set a domain-wide default for defaults.country, which will be used unless it is overridden at the computation strategy level. If there is no domain default and the parameter is not provided in the computation strategy, it will be inserted with the value "US".
And the following optional parameters:
Source columns:
- "sources.address": field name of the address source column
- "sources.locality": field name of the locality source column
- "sources.subregion": field name of the subregion source column
- "sources.region": field name of the region source column
- "sources.postal_code": field name of the postal code source column
- "sources.country": field name of the country source column
Default values:
- "defaults.address": address default value
- "defaults.locality": locality default value
- "defaults.subregion": subregion default value
- "defaults.region": region default value
- "defaults.postal_code": postal code default value
The source column field names in computationStrategy.source_columns and computationStrategy.parameters must match. Furthermore, those columns must exist and cannot be deleted until the computed column has been deleted.
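A sketch of the matching and existence rules, assuming the strategy is the computationStrategy dict and existing_columns is a list of the dataset's column field names (how you fetch that list is not covered here). The helper name source_column_problems is hypothetical.

```python
# Hypothetical check: source_columns and parameters.sources must name the same
# columns, and every named column must exist in the dataset.
def source_column_problems(strategy, existing_columns):
    listed = set(strategy.get("source_columns", []))
    referenced = set(strategy.get("parameters", {}).get("sources", {}).values())
    problems = []
    mismatched = sorted(listed ^ referenced)  # named in one place but not the other
    if mismatched:
        problems.append(f"not in both source_columns and parameters.sources: {mismatched}")
    missing = sorted((listed | referenced) - set(existing_columns))
    if missing:
        problems.append(f"columns do not exist: {missing}")
    return problems
```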
If a default value is specified for an address part, it is applied as follows: if no source column is specified for that address part, the default is always used; otherwise, the default is used only for rows where that source column's value is null.
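The per-row rule can be sketched as follows. This is an illustration, not the secondary's actual Scala code: resolve_address is a hypothetical name, a row is assumed to be a dict keyed by source column field name, and only None counts as null (empty strings are passed through unchanged).

```python
# Address parts, taken from the "sources.*" / "defaults.*" parameter names above.
ADDRESS_PARTS = ("address", "locality", "subregion", "region", "postal_code", "country")

def resolve_address(row, parameters):
    """Combine a row's source values with the strategy defaults (sketch)."""
    sources = parameters.get("sources", {})
    defaults = parameters.get("defaults", {})
    address = {}
    for part in ADDRESS_PARTS:
        column = sources.get(part)
        # No source column for this part -> value stays None -> default always used.
        value = row.get(column) if column is not None else None
        if value is None:
            # Null at the row level (or no source column): fall back to the default.
            value = defaults.get(part)
        address[part] = value
    return address
```

With the parameters from the GET example further below, a row whose state is null gets region "WA", and every row gets country "US" because no country source column is configured.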
Version:
- "version": API version; currently this is "v1".
If the API version is not provided, it will default to the current version and be inserted into the computation strategy.
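The insertion of missing values can be sketched like this, assuming "v1" is the current version and that the domain-wide country default (if any) is passed in as domain_country. normalize_geocoding_strategy is a hypothetical name; the sketch only inserts "US" when there is neither a strategy value nor a domain default, as described above, and leaves the input dict untouched.

```python
import copy

CURRENT_VERSION = "v1"  # the only API version documented here

def normalize_geocoding_strategy(strategy, domain_country=None):
    """Return a copy of the strategy with backend-style defaults filled in (sketch)."""
    result = copy.deepcopy(strategy)
    params = result.setdefault("parameters", {})
    # Missing version -> current version is inserted.
    params.setdefault("version", CURRENT_VERSION)
    defaults = params.setdefault("defaults", {})
    # Missing country and no domain default -> "US" is inserted.
    if "country" not in defaults and domain_country is None:
        defaults["country"] = "US"
    return result
```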
An example of creating a new geocoded column with a POST to /views/<lensId>/columns.json (note that not all parameters.sources options are present, nor are they required, and that no "fieldName" is supplied since it will be assigned by the backend):
{
"name": "Location",
"dataTypeName": "point",
"computationStrategy": {
"type": "geocoding",
"source_columns": ["street_address", "zip_code"],
"parameters": {
"sources": {
"address": "street_address",
"postal_code": "zip_code"
},
"defaults": {},
"version": "v1"
}
}
}
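A request body like the one above can be assembled programmatically. In this sketch (geocoding_column_body is a hypothetical helper), source_columns is derived from parameters.sources so the two always name the same columns; sending the body to /views/<lensId>/columns.json with your HTTP client and credentials is left out.

```python
import json

def geocoding_column_body(name, sources, defaults=None, version="v1"):
    """Build a geocoded-column POST body; fieldName is left for the backend."""
    return {
        "name": name,
        "dataTypeName": "point",
        "computationStrategy": {
            "type": "geocoding",
            # Each referenced source column once, in the order the parts were given.
            "source_columns": list(dict.fromkeys(sources.values())),
            "parameters": {
                "sources": dict(sources),
                "defaults": dict(defaults or {}),
                "version": version,
            },
        },
    }

body = geocoding_column_body("Location", {"address": "street_address", "postal_code": "zip_code"})
print(json.dumps(body, indent=2))
```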
An example of a geocoded column from a GET request to /api/views/<lensId>.json (note the presence of the "fieldName" property):
{
"name": "Location",
"dataTypeName": "point",
"fieldName": "location",
"computationStrategy": {
"type": "geocoding",
"source_columns": [ "street_address", "city", "state", "zip_code"],
"parameters": {
"sources": {
"address": "street_address",
"locality": "city",
"region": "state",
"postal_code": "zip_code"
},
"defaults": {
"region": "WA",
"country": "US"
},
"version": "v1"
}
}
}
* postal_code source columns may be number columns, but postal codes really should be of type text.
A superadmin can set a domain-wide default for any of the default values. These domain-wide defaults will be used unless overridden in a computation strategy.
They can be set through the admin panel by adding the following property to the "geocoding" configuration:
- key: "defaults"
- value: a JSON object with any of the optional fields "address", "locality", "subregion", "region", "postal_code", and "country"
Setting a domain-wide default for "country" is encouraged, as otherwise "computationStrategy.parameters.defaults.country" will default to "US".
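The precedence described above can be sketched as a field-by-field merge, assuming both the domain "defaults" object and the strategy's parameters.defaults are plain dicts keyed by the field names listed above. effective_defaults is a hypothetical helper, not part of the service.

```python
def effective_defaults(domain_defaults, strategy_defaults):
    """Merge domain-wide and computation-strategy defaults (sketch)."""
    merged = dict(domain_defaults)      # start from the domain-wide "defaults" object
    merged.update(strategy_defaults)    # computation-strategy values take precedence
    merged.setdefault("country", "US")  # documented fallback when neither sets country
    return merged
```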
geocoding-secondary region codes the value of the target number column from either the lat/lon of a point source column or the string value of a text source column, as described in the computationStrategy, using the curated region dataset specified in parameters.
For a region coded computed column of type "georegion_match_on_point" (legacy "georegion"):
- the column dataTypeName must be "number".
- the column computationStrategy.type must be "georegion_match_on_point" (or legacy "georegion").
- a single source column must be of type "point".
The computation strategy has the following required parameters:
- "region": the resource name of the curated region
- "primary_key": the primary key of the curated region
For example:
{ "type": "georegion_match_on_point",
"source_columns": ["location_point"],
"parameters": {
"region": "_nmuc-gpu5",
"primary_key": "_feature_id"
}
}
For a region coded computed column of type "georegion_match_on_string":
- the column dataTypeName must be "number".
- the column computationStrategy.type must be "georegion_match_on_string".
- a single source column must be of type "text".
The computation strategy has the following required parameters:
- "region": the resource name of the curated region
- "column": TODO: what is this?
- "primary_key": the primary key of the curated region
For example:
{ "type": "georegion_match_on_string",
"source_columns": ["location_string"],
"parameters": {
"region": "_nmuc-gpu5",
"column": "column_1",
"primary_key": "_feature_id"
}
}
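Both region-coding types follow the same pattern, so their rules can be checked with one table-driven sketch. region_column_errors and REGION_RULES are hypothetical names, and source_types is assumed to map source column field names to their dataTypeName.

```python
# strategy type -> (required source column type, required parameter names)
REGION_RULES = {
    "georegion_match_on_point": ("point", ("region", "primary_key")),
    "georegion": ("point", ("region", "primary_key")),  # legacy alias
    "georegion_match_on_string": ("text", ("region", "column", "primary_key")),
}

def region_column_errors(column, source_types):
    """Return rule violations for a region-coded computed column (sketch)."""
    errors = []
    if column.get("dataTypeName") != "number":
        errors.append('dataTypeName must be "number"')
    strategy = column.get("computationStrategy", {})
    rule = REGION_RULES.get(strategy.get("type"))
    if rule is None:
        return errors + [f"unsupported strategy type {strategy.get('type')!r}"]
    source_type, required = rule
    sources = strategy.get("source_columns", [])
    if len(sources) != 1:
        errors.append("exactly one source column is required")
    elif source_types.get(sources[0]) != source_type:
        errors.append(f'source column must be of type "{source_type}"')
    params = strategy.get("parameters", {})
    errors.extend(f"missing parameter {p!r}" for p in required if p not in params)
    return errors
```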
To use the geocoding secondary, you need to add a MapQuest app token to the config and register the secondary in the secondary_stores_config table in data-coordinator (truth):
INSERT INTO secondary_stores_config (store_id, next_run_time, interval_in_seconds, is_feedback_secondary) VALUES( 'geocoding', now(), 5, true);
For geocoding-secondary to use MapQuest, create the .gitignored local override file configs/local.conf and override the value of com.socrata.geocoding-secondary.geocoder.mapquest.app-token with a real MapQuest app token.
> cat configs/local.conf
com.socrata.geocoding-secondary {
geocoder.mapquest.app-token = "SOME REAL MAPQUEST APP TOKEN"
}
With sbt:
> sbt -Dconfig.file=configs/application.conf run
Running the assembled jarfile:
> sbt assembly
> java -Djava.net.preferIPv4Stack=true -Dconfig.file=configs/application.conf -jar target/scala-2.10/secondary-watcher-geocoding-assembly-<version>-SNAPSHOT.jar
To run the tests:
> sbt test