Servlet configuration format

JSON format describing a servlet configuration

    [
        {
        "name":"node name/alias, alphanumeric only", required
        "exec":{    required
            "path":"executable path, URL (see Url.md)", required
            "name":"executable name", optional
            "args":"executable command line", optional
            "env":{    executable environment, optional
                "key1":"value1",
                "key2":"value2"
                }
        },
        "file_list":[ list of devices/files, optional deprecated
        "devices":[ list of devices/files, optional
            {
            "device":"device name", required deprecated
            "name":"device name", required
            "path":"device file path, URL (see Url.md)", optional
            "content_type": "MIME type of the content", optional, ignored for read-only devices
            "meta": {   metadata for the object, optional, ignored for read-only devices
                "metakey1":"value1",
                "metakey2":"value2"
                }
            "mode": "changes stat() type of device, can be 'file', 'block', 'char' or 'pipe'", optional
            "min_size":0, minimal data size in bytes, optional, ignored for read-only devices
            },
        ],
        "count":1, number of nodes, optional
        "connect":[ destination nodes to connect to, optional
            "nodename1",
            "nodename2"
        ],
        "replicate":1, how many replicas of this node should run, optional
        "attach": "default", where zerovm session will be attached, optional, see below
        "location": "", location hint, optional, see below
        }
    ,
    ....
    ]

Deprecated keywords are still supported, but can disappear without a trace in any next version

Rules

Allowed device names are:

stdout stdin stderr input output image debug script

All other device names are invalid
There are also 'system images' which are devices as well.
If zerovm_sysimage_devices is set in proxy and object server config files
you can use the device names you've set there in servlet config
and they are added to the list of allowed device names.

Path property devices[i].path is optional
Only the following devices can be supplied without path property set

stdout stderr output

All other devices must have a path attribute. There can be exactly one output device without a path.
If the device is a 'system image' it also does not have a path, its path is predefined in zerovm_sysimage_devices configuration directive

Devices stdin, input and script can be supplied without path in case where the X-Zerovm-Source header is used in request.

If X-Zerovm-Source header is used the POST payload will be used as a channel payload for these devices. You cannot have multiple readable devices without a path. But the same POST payload can be sent to different job nodes if each one of then has one readable device with no path.

Path property devices[i].path can contain wild-card(s)
Asterisk * character must be used as wild-card
Any number of wild-cards is allowed in one path
Wild-cards are expanded following the /.*/ regex rule
If path contains wild-card the count is ignored
debug device cannot have a wild-card in path

If all devices in devices do not contain path or there are 0 devices in devices a count should be supplied
If count is not supplied count: 1 is assumed

If the following devices have wild-cards in path

stdin input image

Then the following devices (connected to the same mode) also must have wild-cards in path:

stdout stderr output

Or they can have no path attribute (if it's allowed for this device)

Connection list "connect": [] represents one-way directed connection from this node to the nodes in connect
The current node is a source of the connection and each node in connect is a destination
Node can be connected to itself if and only if its count > 1 implicitly (by having path with wild-card) or explicitly by having count property set

Argument string args represents a command line as a string
Command line should be supplied by user

Env dictionary env represents an execution environment as key->value hash
Keys must be unique
Keys and values should be supplied by user
Keys must be alphanumeric, cannot contain whitespace characters

There are the following device types

READABLE WRITABLE RANDOM SEQUENTIAL CDR NETWORK

Each device has the following type rules

stdin: SEQUENTIAL + READABLE stdout: SEQUENTIAL + WRITABLE stderr: SEQUENTIAL + WRITABLE input: RANDOM + READABLE output: RANDOM + WRITABLE image: CDR debug: NETWORK script: RANDOM + READABLE

All system image devices are RANDOM + READABLE

Device debug is a special network device
It must have path property set
Its path has the following semantics:
proto://hostname:port
Where proto must be one of:

tcp udp

Port must be numeric
hostname must be a valid hostname, allowed characters: alphanumeric, ., -

Each path must start with / character
The following devices must have existing, readable path:

stdin input image script

exec property must have existing, readable path
exec property can have a relative path (not starting with /).
If the path is relative it is assumed that executable file is archived inside image device or any of the system image devices.
First the image device is checked for the relative path, then all the system image devices are checked in the order they appear in config file, the process stops when the file is found.

Device can have content_type property set. Its value will be posted as a Content-Type header for this object.
content_type makes sense only for WRITABLE devices as it won't change existing content types just write new ones.
The Content-Type for a device without a path property will be set for the HTTP response.
content_type property set for read-only devices will be ignored.

Device with content_type: message/http or content_type: message/cgi has a special meaning.
The output object of such device will be parsed as an HTTP response.
Headers taken from the parsed response will be supplied either to PUT if this object to be saved in object store, or in the response if it is an immediate response object (no path set)
Only the following headers will be parsed from this object: Content-Type X-Object-Meta-*
message/http is for CGI NPH applications and message/cgi is for regular CGI applications.
CGI/1.1 is supported on server, CGI environment variables will be supplied to the application.

Device can have meta property set. meta contains meta-tag dictionary for this object.
The meta-tags will be written alongside the object when the object is saved.
Meta-tags will be sent within HTTP response if it is an immediate response object (no path set).
meta will be ignored for read-only objects.

Each node can have replicate property set. Default replication value is 1 (no replication).
This property supports replicate: 2 and replicate: 3 for double and triple replication respectively.
If it was set to > 1 additional copies of the node will run.
Zerovm will replicate channels data for these nodes and compare them at runtime.
If it encounters errors the data from these nodes will be temporarily ignored.
This feature allows cluster to be fault-tolerant and improves the cluster processing speeds slightly.
Zerocloud may decide to run specific nodes in replicated way even when it was not specified in the servlet config file.
It does so to add redundancy or set replicas for the resulting Swift objects directly.

Some applications may expect devices/channels to produce specific output when stat() is run on the descriptor.
You can change the type of the file with the mode property of the device.
The types are:
file - regular file
char - character device
pipe - FIFO pipe (default for standard channels stdin stdout stderr)
block - block device (default for all other channels)
The most useful combinations are: file instead of block and char instead of pipe.
Other combinations are supported but make little sense.

Some users may want to control where the session will be started.
To colocate the session with the largest object, for example. Or to faster write the output file/object.
It can be controlled with the attach keyword.
Default value for attach is default which means that the following strategy will be employed by middleware: code will be sent to the storage node that holds the first (topmost) swift:// path encountered in the job config. But before that the paths will be sorted by placing read only objects on top, write only after the former ones, and all other objects at the bottom. Sort will be a stable one. If such path does not exist the node for a session will be chosen randomly.
If another location is desired, user needs to specify a device in attach, ex.: "attach": "stdout".
If the device is a swift:// path the session will be started in co-location with the desired object. If such device is not a swift:// path the node for a session will be chosen randomly.

Executable can have exec.name property. It is an optional argument that sets executable name to the supplied string.
Executable name is visible to the application under ARGV[0], as usual.
If exec.name is not supplied, the name would be set to the cluster node name property

There is a special device called script. It has all the functionality of input device, with some key difference: it is never co-located with the executable (unless this is the only READABLE device). Its primarily purpose is to be used as a script source input for apps written in a scripting language (python, for example).

Each WRITABLE device can have min_size property set.
It defines what will be the minimal data size produced by the device.
If device produced less than the amount it won't be processed, saved or otherwise used.
Default is min_size: 0 which means that even totally empty data files will be written to disk or serialized to user.

Users can further fine tune the location of specific sessions, by providing location hint. This hint is an arbitrary string.
The sessions with the same location string will be co-located on the same physical computation node if possible. The co-location is just a "best effort", if for some reason the sessions can not be started on the same node they will still be started albeit on different nodes.
The information about successful co-location will be returned in specific headers in response. See Response.md docs.

Examples

Sort files in "/data/binary*.data"

[ { "name":"sort", "exec":{"path":"swift://my_account/exec/sort.nexe"}, "devices":[ {"name":"stdin","path":"swift://my_account/data/binary*.data"}, {"name":"stdout","path":"swift://my_account/data/sorted*.data"}, {"name":"stderr"} ], "args":"1048576" } ]

This job will look into swift://my_account/data/binary*.data path and match objects by wild-card.

For each object it will create zerovm instance running swift://my_account/exec/sort.nexe object as nexe. It will create swift://my_account/data/sorted*.data output objects, exactly the same count as input objects.

Each output object will contain same characters instead of * that were matched in the original object.
.../data/binary\_log_345.data -> .../data/sorted\_log_345.data

Errors will be written to /dev/stderr, it does not have a path, therefore errors will be sent in HTTP response.

Each nexe will be run as sort.nexe 1048576 because args property was set, in case of sort it is a sort chunk size (specific to nexe, as it should be).

Create a map-reduce job

[ { "name":"mapper", "exec":{"path":"swift://my_account/exec/maper.nexe"}, "devices":[ {"name":"stdin","path":"swift://my_account/data/binary*.data"}, {"name":"stderr"} ], "connect":["mapper","reducer"] }, { "name":"reducer", "exec":{"path":"swift://my_account/exec/reducer.nexe"}, "connect":["manager"], "count":5 }, { "name":"manager", "exec":{"path":"swift://my_account/exec/manager.nexe"}, "devices":[ {"name":"stdout","path":"swift://my_account/data/mapred_result.data"}, {"name":"stderr"} ] } ]

This one uses networking, connect property is set on some nodes.

Mappers are connected to mappers (between themselves) and to reducers.

Each connect means unidirectional connect, i.e. each mapper can send to each other mapper and to each reducer.

Each reducer can send only to manager node.

Manager node has no count and its devices has no wild-cards, means there is exactly one manager node.

There are 5 reducers: count: 5.

There are N mappers, it depends how many objects match the swift://my_account/data/binary*.data wildcard. If there are 10 matching objects, there will be 10 mappers, each mapper will have 9 connections to 9 other mappers and 5 connections to each reducer.

Each mapper will see something like this in its /dev/in/ directory:
mapper-1 mapper-2 mapper-3 mapper-4 ........
In /dev/out/ directory:
mapper-1 mapper-2 mapper-3 mapper-4 ........ reducer-1 reducer-2 ........

The own node name can be found by reading argv[0] parameter (i.e. executable name). Using its own name from argv[0] and names of all other connected nodes from traversing /dev/in and /dev/out, each node can have full network map of all nodes it needs to communicate with.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Servlets.md

Servlets.md

Servlet configuration format

Rules

Examples

Sort files in "/data/binary*.data"

Create a map-reduce job

Files

Servlets.md

Latest commit

History

Servlets.md

File metadata and controls

Servlet configuration format

Rules

Examples

Sort files in "/data/binary*.data"

Create a map-reduce job