Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ConvertToOnnx should also accept DataViewSchema #6448

Open
FranklinWhale opened this issue Nov 12, 2022 · 3 comments · May be fixed by #6449
Open

ConvertToOnnx should also accept DataViewSchema #6448

FranklinWhale opened this issue Nov 12, 2022 · 3 comments · May be fixed by #6449
Labels

Comments

@FranklinWhale
Copy link
Contributor

FranklinWhale commented Nov 12, 2022

Is your feature request related to a problem? Please describe.
Currently, saving a model to zip file only requires a DataViewSchema, but saving a model to ONNX requires IDataView.

Inside ConvertToOnnxProtobufCore, a prediction was performed (transform.Transform(inputData)), which may be expensive if the training data set is large.

Describe the solution you'd like
ConvertToOnnx should have overloads that accept DataViewSchema, then convert the DataViewSchema to an empty IDataView, and pass the empty IDataView to the methods accepting IDataView.

The performance of methods accepting IDataView may be improved if EmptyDataView is created from the Schema of the IDataView and passed to ConvertToOnnxProtobuf, instead of full data.

Describe alternatives you've considered
Nil

Additional context
I have implemented the proposed solution and that seems working well. It is unfortunate that EmptyDataView is an internal class, so I have to implement my own EmptyDataView.

sealed record EmptyDataView(DataViewSchema Schema) : IDataView {
	public bool CanShuffle => true;

	public long? GetRowCount() => 0L;

	public DataViewRowCursor GetRowCursor(IEnumerable<DataViewSchema.Column> columnsNeeded, Random? rand = null)
		=> new EmptyDataViewRowCursor(Schema);

	public DataViewRowCursor[] GetRowCursorSet(IEnumerable<DataViewSchema.Column> columnsNeeded, int n, Random? rand = null)
		=> Array.Empty<DataViewRowCursor>();
}

sealed class EmptyDataViewRowCursor : DataViewRowCursor {
	private readonly DataViewSchema schema;

	public EmptyDataViewRowCursor(DataViewSchema Schema) {
		schema = Schema;
	}

	public override DataViewSchema Schema => schema;

	public override long Position => -1L;

	public override bool IsColumnActive(DataViewSchema.Column column) => false;

	public override bool MoveNext() => false;

	public override long Batch => 0L;

	public override ValueGetter<TValue> GetGetter<TValue>(DataViewSchema.Column column)
		=> throw new InvalidOperationException();

	public override ValueGetter<DataViewRowId> GetIdGetter()
		=> throw new InvalidOperationException();
}
@FranklinWhale FranklinWhale added the enhancement New feature or request label Nov 12, 2022
@ghost ghost added the untriaged New issue has not been triaged label Nov 12, 2022
@FranklinWhale FranklinWhale linked a pull request Nov 12, 2022 that will close this issue
@ghost ghost added the in-pr label Nov 12, 2022
@michaelgsharp
Copy link
Member

So the call to Transform itself is a lazy call. Unless you get a cursor and start itterating over the data itself this call should basically be a NO-OP (not quite true as it does some validations and still sets up the pipeline, but no actual "processing" should take place).

Are you seeing something that is showing its actually doing data processing there?

@ghost
Copy link

ghost commented Nov 28, 2022

This issue has been marked needs-author-action and may be missing some important information.

@ghost ghost removed the untriaged New issue has not been triaged label Nov 28, 2022
@FranklinWhale
Copy link
Contributor Author

Replied at #6449 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants