Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support regex pattern in SFTPHOOK #15409

Closed
wants to merge 1 commit into from
Closed

Conversation

blcksrx
Copy link
Contributor

@blcksrx blcksrx commented Apr 16, 2021

In many cases, users are interested in some cases of files or types, not the whole thing. for example, users want to retrieve only CSV files in the /tmp directory and etc. Unfortunately, Paramiko does not support wi

related: #15332

^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

@blcksrx
Copy link
Contributor Author

blcksrx commented Apr 19, 2021

@mik-laj The author of this issue prefers to support fnmacth, I mean wildcards. whats you opinion?

@mik-laj
Copy link
Member

mik-laj commented Apr 19, 2021

I think, we should use fnmatchr because other operators also uses this function. This allows you to standardize the experience of using different operators so that you don't have to think about the parameter format.

@mik-laj
Copy link
Member

mik-laj commented Apr 19, 2021

It would be great if your methods were similar to the SFTP for GCS operator.
https://github.com/apache/airflow/blob/master/airflow/providers/google/cloud/transfers/sftp_to_gcs.py#L138

def retrieve_file(self, remote_full_path: str, local_full_path: str) -> None:
def retrieve_file(
self, remote_full_path: str, local_full_path: str, regex_pattern: Optional[str] = None
) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I really dislike about this implementation is it subtly changes semantics depending on the arguments. In this function, if regex_pattern is None, remote_full_path should point to a file, but when pattern matching is performed, it should point to a directory instead (and the file name matched with the pattern). This will be very unintuitive in practice and likely cause headaches to both users and maintainers.

Pattern matching is definitely a useful feature here, but I really don’t think this is how it should be done.

@blcksrx
Copy link
Contributor Author

blcksrx commented Apr 21, 2021

@saveriogzz Do you want to implement it?

@saveriogzz
Copy link
Contributor

@saveriogzz Do you want to implement it?

yes, I will work on it!

@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Jul 24, 2021
@github-actions github-actions bot closed this Jul 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:providers stale Stale PRs per the .github/workflows/stale.yml policy file
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants