Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ADR-0009 proxy message factory. #544

Open
wants to merge 5 commits into
base: develop
Choose a base branch
from

Conversation

quotidian-ennui
Copy link
Member

@quotidian-ennui quotidian-ennui commented Oct 8, 2020

Motivation

There is a specific use-case where Interlok may have possibly been the wrong choice. It's essentially behaving as a passthrough stream of data into Elasticsearch. It's arguable that this is the wrong approach but Interlok Hammer

What can we do to mitigate the IOPS cost that happens when we download an S3 blob and then essentially call some other relatively expensive operation with the contents of the blob.

Pre Merge : https://github.com/adaptris/interlok/blob/ADR-0009-mutable-messages/docs/adr/0009-mutable-messages.md
Post Merge : https://github.com/adaptris/interlok/blob/develop/docs/adr/0009-mutable-messages.md

@codecov
Copy link

codecov bot commented Oct 8, 2020

Codecov Report

Merging #544 into develop will decrease coverage by 0.04%.
The diff coverage is n/a.

Impacted file tree graph

@@              Coverage Diff              @@
##             develop     #544      +/-   ##
=============================================
- Coverage      91.24%   91.20%   -0.05%     
+ Complexity     12023    12014       -9     
=============================================
  Files           1186     1186              
  Lines          33146    33135      -11     
  Branches        2320     2320              
=============================================
- Hits           30244    30220      -24     
- Misses          2405     2416      +11     
- Partials         497      499       +2     
Impacted Files Coverage Δ Complexity Δ
...rvices/splitter/PoolingMessageSplitterService.java 87.03% <0.00%> (-12.97%) 15.00% <0.00%> (-1.00%)
...daptris/core/jms/JmsAsyncProducerEventHandler.java 87.75% <0.00%> (-4.09%) 9.00% <0.00%> (ø%)
...ava/com/adaptris/core/AdaptrisPollingConsumer.java 96.15% <0.00%> (-3.85%) 24.00% <0.00%> (ø%)
...main/java/com/adaptris/core/fs/FsConsumerImpl.java 78.12% <0.00%> (-3.13%) 27.00% <0.00%> (ø%)
...in/java/com/adaptris/core/lms/LargeFsConsumer.java 89.47% <0.00%> (-2.64%) 12.00% <0.00%> (-1.00%)
...tris/core/jms/ActiveJmsConnectionErrorHandler.java 77.77% <0.00%> (-1.12%) 13.00% <0.00%> (ø%)
...src/main/java/com/adaptris/core/SharedService.java 94.28% <0.00%> (ø) 17.00% <0.00%> (ø%)
...main/java/com/adaptris/core/SharedServiceImpl.java 100.00% <0.00%> (ø) 20.00% <0.00%> (-2.00%)
...n/java/com/adaptris/core/DynamicSharedService.java 100.00% <0.00%> (ø) 17.00% <0.00%> (-6.00%)
...tris/security/keystore/InlineKeystoreLocation.java 77.77% <0.00%> (ø) 13.00% <0.00%> (ø%)
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 701b9cf...2eb6968. Read the comment docs.

@quotidian-ennui
Copy link
Member Author

The more I think about this, the more I think that while a mutable message factory might be useful generically, it may not be useful at all in this particular use-case since effectively there's 4 IOPS costs

  • Download the Blob -> network IOPS
  • Write the blob to disk (since it is large) -> disk IOPS
  • Read the blob from disk -> disk IOPS
  • batched index to elastic search -> network IOPS

By having a s3-readonly-factory we only semi-avoid the disk IOPS which may not be the "high cost", unless we have repeated operations.

Equally if we look to the javadocs for S3 SDK, then we will probably ultimately end up calling AmazonS3Client#getObject() since TransferManager only has download() to a file. This then leads us down to S3Object#getObjectContent has javadoc notes :

    /**
     * Gets the input stream containing the contents of this object.
     *
     * <p>
     * <b>Note</b>: The method is a simple getter and does not actually create a
     * stream. If you retrieve an S3Object, you should close this input stream
     * as soon as possible, because the object contents aren't buffered in
     * memory and stream directly from Amazon S3. Further, failure to close this
     * stream can cause the request pool to become blocked.
     * </p>
     *
     * @return An input stream containing the contents of this object.
     *
     * @see S3Object#getObjectMetadata()
     * @see S3Object#setObjectContent(InputStream)
     */

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants