Allow for custom datasources as plugins #74

Open
universalmind303 opened this issue Mar 27, 2024 · 1 comment

Comments

@universalmind303

The current system makes it pretty easy to add new transformations (expressions) as plugins, but there is currently no good way for users to provide custom datasources.


Ideally, adding a custom datasource should be as easy as implementing a trait or using a macro. The existing `AnonymousScan` trait mostly covers this use case, but it doesn't work through pyo3-polars because of (de)serialization issues (see #67). Maybe we could have an FFI equivalent instead of the in-memory `AnonymousScan`?
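One possible reading of the "FFI equivalent" idea: instead of handing Polars a Rust trait object (which cannot cross the pyo3-polars boundary without serialization), a plugin could export a `#[repr(C)]` table of `extern "C"` function pointers plus an opaque state pointer. The sketch below is purely illustrative and assumes everything it names: `FfiDatasource`, the `dummy_*` functions, `read_all` and the single `i64` column are hypothetical, and a real design would more likely exchange Arrow arrays through the Arrow C Data Interface.

```rust
use std::ffi::c_void;

/// Hypothetical C-ABI vtable a datasource plugin could export.
/// Only function pointers and an opaque pointer, so the layout is stable
/// across separately compiled libraries. (A production ABI would likely
/// mark these `unsafe extern "C" fn`.)
#[repr(C)]
pub struct FfiDatasource {
    /// Opaque pointer to plugin-owned state.
    pub state: *mut c_void,
    /// Number of columns the source produces.
    pub num_columns: extern "C" fn(state: *const c_void) -> usize,
    /// Writes up to `cap` values of the next batch into `out` and returns
    /// how many were written; 0 means the source is exhausted.
    pub next_batch: extern "C" fn(state: *mut c_void, out: *mut i64, cap: usize) -> usize,
    /// Frees the plugin-owned state.
    pub release: extern "C" fn(state: *mut c_void),
}

/// Plugin-side state for a dummy source yielding 0..total.
struct Counter {
    next: i64,
    total: i64,
}

extern "C" fn dummy_num_columns(_state: *const c_void) -> usize {
    1
}

extern "C" fn dummy_next_batch(state: *mut c_void, out: *mut i64, cap: usize) -> usize {
    // SAFETY: `state` comes from `dummy_datasource`; `out` points to `cap` writable i64s.
    let counter = unsafe { &mut *(state as *mut Counter) };
    let out = unsafe { std::slice::from_raw_parts_mut(out, cap) };
    let mut n = 0;
    while n < cap && counter.next < counter.total {
        out[n] = counter.next;
        counter.next += 1;
        n += 1;
    }
    n
}

extern "C" fn dummy_release(state: *mut c_void) {
    // SAFETY: reclaims the Box leaked in `dummy_datasource`.
    drop(unsafe { Box::from_raw(state as *mut Counter) });
}

/// What a plugin would hand to the host.
pub fn dummy_datasource(total: i64) -> FfiDatasource {
    let state = Box::into_raw(Box::new(Counter { next: 0, total })) as *mut c_void;
    FfiDatasource {
        state,
        num_columns: dummy_num_columns,
        next_batch: dummy_next_batch,
        release: dummy_release,
    }
}

/// Host side: drain the source in fixed-size batches.
pub fn read_all(src: &FfiDatasource, batch_size: usize) -> Vec<i64> {
    let mut values = Vec::new();
    let mut buf = vec![0i64; batch_size];
    loop {
        let n = (src.next_batch)(src.state, buf.as_mut_ptr(), buf.len());
        if n == 0 {
            break;
        }
        values.extend_from_slice(&buf[..n]);
    }
    values
}
```

With `let src = dummy_datasource(5);`, `read_all(&src, 2)` pulls `[0, 1, 2, 3, 4]` across the boundary in batches of two, after which the host calls `(src.release)(src.state)`. Nothing here needs to be serialized, which is the property `AnonymousScan` lacks in the pyo3-polars setting.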

If we loosely base it on DataFusion's `TableProvider`, it might look something like this:

```rust
use std::sync::Arc;

// `PolarsDatasource` is the proposed trait; it does not exist in Polars today.
pub struct DummyDatasource {}

impl PolarsDatasource for DummyDatasource {
    fn schema(&self) -> SchemaRef {
        Arc::new(Schema::empty())
    }

    // Receives the pushed-down projection, filters and row limit.
    fn scan(
        &self,
        projection: &Option<Vec<usize>>,
        filters: &[Expr],
        limit: Option<usize>,
    ) -> PolarsResult<Box<dyn Executor>> {
        Ok(Box::new(DummyExec::new()))
    }
}

pub struct DummyExec {}

impl DummyExec {
    pub fn new() -> Self {
        DummyExec {}
    }
}

impl Executor for DummyExec {
    fn execute(&mut self, cache: &mut ExecutionState) -> PolarsResult<DataFrame> {
        Ok(DataFrame::empty())
    }
}
```
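To illustrate what the `projection`, `filters` and `limit` arguments of the proposed `scan` give a datasource author, here is a dependency-free sketch in which an in-memory source applies them itself. `InMemorySource`, `Predicate` and the columns-of-`i64` layout are hypothetical stand-ins for `SchemaRef`, `Expr` and `DataFrame`; they are not Polars types.

```rust
/// Hypothetical stand-in for a pushed-down `Expr`: `column >= min`.
#[derive(Clone, Copy, Debug)]
pub struct Predicate {
    pub column: usize,
    pub min: i64,
}

/// A toy columnar source: `columns[i]` holds every value of column `i`.
pub struct InMemorySource {
    pub names: Vec<String>,
    pub columns: Vec<Vec<i64>>,
}

impl InMemorySource {
    /// Mirrors the proposed `scan(projection, filters, limit)` signature and
    /// returns the selected columns as (name, values) pairs.
    pub fn scan(
        &self,
        projection: &Option<Vec<usize>>,
        filters: &[Predicate],
        limit: Option<usize>,
    ) -> Vec<(String, Vec<i64>)> {
        let num_rows = self.columns.first().map_or(0, |c| c.len());
        // Row indices passing every filter, truncated to `limit`.
        let rows: Vec<usize> = (0..num_rows)
            .filter(|&r| filters.iter().all(|p| self.columns[p.column][r] >= p.min))
            .take(limit.unwrap_or(usize::MAX))
            .collect();
        // `None` means "all columns", as in DataFusion's `TableProvider::scan`.
        let selected: Vec<usize> = match projection {
            Some(cols) => cols.clone(),
            None => (0..self.columns.len()).collect(),
        };
        selected
            .into_iter()
            .map(|c| {
                let values = rows.iter().map(|&r| self.columns[c][r]).collect();
                (self.names[c].clone(), values)
            })
            .collect()
    }
}
```

For example, with columns `a = [1, 5, 3, 7]` and `b = [10, 20, 30, 40]`, `scan(&Some(vec![1]), &[Predicate { column: 0, min: 3 }], Some(2))` returns only `b` with values `[20, 30]`. A filter may reference a column the projection drops, so the source evaluates filters before projecting.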

Related issues

#67

@NielsPraet

For my thesis I am currently looking at how to hook an existing backend query service into Polars so that it can be used through the Lazy DataFrame API. The datasource would have to be implemented on the Rust side and then exposed to Python, since the use case is aimed at data scientists and ML engineers working in Python. From what I gathered, this unfortunately seems to be impossible right now, so I want to +1 this issue, as it would open up a lot of possibilities for the Polars ecosystem in general!
