Skip to content

Latest commit

 

History

History
147 lines (97 loc) · 6.47 KB

File metadata and controls

147 lines (97 loc) · 6.47 KB

Amazon S3 Source

This source app supports transfer of files using the Amazon S3 protocol. Files are transferred from the remote directory (S3 bucket) to the local directory where the application is deployed.

Messages emitted by the source are provided as a byte array by default. However, this can be customized using the --mode option:

  • ref Provides a java.io.File reference

  • lines Will split files line-by-line and emit a new message for each line

  • contents The default. Provides the contents of a file as a byte array

When using --mode=lines, you can also provide the additional option --withMarkers=true. If set to true, the underlying FileSplitter will emit additional start-of-file and end-of-file marker messages before and after the actual data. The payload of these 2 additional marker messages is of type FileSplitter.FileMarker. The option withMarkers defaults to false if not explicitly set.

See also MetadataStore options for possible shared persistent store configuration used to prevent duplicate messages on restart.

mode = lines

Headers:

  • Content-Type: text/plain

  • file_orginalFile: <java.io.File>

  • file_name: <file name>

  • correlationId: <UUID> (same for each line)

  • sequenceNumber: <n>

  • sequenceSize: 0 (number of lines is not know until the file is read)

Payload:

A String for each line.

The first line is optionally preceded by a message with a START marker payload. The last line is optionally followed by a message with an END marker payload.

Marker presence and format are determined by the with-markers and markers-json properties.

mode = ref

Headers:

None.

Payload:

A java.io.File object.

Options

The s3 source has the following options:

Properties grouped by prefix:

file.consumer

markers-json

When 'fileMarkers == true', specify if they should be produced as FileSplitter.FileMarker objects or JSON. (Boolean, default: true)

mode

The FileReadingMode to use for file reading sources. Values are 'ref' - The File object, 'lines' - a message per line, or 'contents' - the contents as bytes. (FileReadingMode, default: <none>, possible values: ref,lines,contents)

with-markers

Set to true to emit start of file/end of file marker messages before/after the data. Only valid with FileReadingMode 'lines'. (Boolean, default: <none>)

metadata.store.dynamo-db

create-delay

Delay between create table retries. (Integer, default: 1)

create-retries

Retry number for create table request. (Integer, default: 25)

read-capacity

Read capacity on the table. (Long, default: 1)

table

Table name for metadata. (String, default: <none>)

time-to-live

TTL for table entries. (Integer, default: <none>)

write-capacity

Write capacity on the table. (Long, default: 1)

metadata.store.jdbc

region

Unique grouping identifier for messages persisted with this store. (String, default: DEFAULT)

table-prefix

Prefix for the custom table name. (String, default: <none>)

metadata.store.mongo-db

collection

MongoDB collection name for metadata. (String, default: metadataStore)

metadata.store.redis

key

Redis key for metadata. (String, default: <none>)

metadata.store

type

Indicates the type of metadata store to configure (default is 'memory'). You must include the corresponding Spring Integration dependency to use a persistent store. (StoreType, default: <none>, possible values: mongodb,redis,dynamodb,jdbc,zookeeper,hazelcast,memory)

metadata.store.zookeeper

connect-string

Zookeeper connect string in form HOST:PORT. (String, default: 127.0.0.1:2181)

encoding

Encoding to use when storing data in Zookeeper. (Charset, default: UTF-8)

retry-interval

Retry interval for Zookeeper operations in milliseconds. (Integer, default: 1000)

root

Root node - store entries are children of this node. (String, default: /SpringIntegration-MetadataStore)

s3.common

endpoint-url

Optional endpoint url to connect to s3 compatible storage. (String, default: <none>)

path-style-access

Use path style access. (Boolean, default: false)

s3.supplier

auto-create-local-dir

Create or not the local directory. (Boolean, default: true)

delete-remote-files

Delete or not remote files after processing. (Boolean, default: false)

filename-pattern

The pattern to filter remote files. (String, default: <none>)

filename-regex

The regexp to filter remote files. (Pattern, default: <none>)

list-only

Set to true to return s3 object metadata without copying file to a local directory. (Boolean, default: false)

local-dir

The local directory to store files. (File, default: <none>)

preserve-timestamp

To transfer or not the timestamp of the remote file to the local one. (Boolean, default: true)

remote-dir

AWS S3 bucket resource. (String, default: bucket)

remote-file-separator

Remote File separator. (String, default: /)

tmp-file-suffix

Temporary file suffix. (String, default: .tmp)

Amazon AWS common options

The Amazon S3 Source (as all other Amazon AWS applications) is based on the Spring Cloud AWS project as a foundation, and its auto-configuration classes are used automatically by Spring Boot. Consult their documentation regarding required and useful auto-configuration properties.

Some of them are about AWS credentials:

  • cloud.aws.credentials.accessKey

  • cloud.aws.credentials.secretKey

  • cloud.aws.credentials.instanceProfile

  • cloud.aws.credentials.profileName

  • cloud.aws.credentials.profilePath

Other are for AWS Region definition:

  • cloud.aws.region.auto

  • cloud.aws.region.static

And for AWS Stack:

  • cloud.aws.stack.auto

  • cloud.aws.stack.name

Examples

java -jar s3-source.jar --s3.remoteDir=/tmp/foo --file.consumer.mode=lines