I have a MetaDataset which contains two HDFDatasets and I want to apply a sequence list filter file. The MetaDataset has an option seq_list_file, but the docstring says
You only need it if the tag name is not the same for all datasets.
It will currently not act as filter,
as the subdataset controls the sequence order (and thus what seqs to use).
Since the tag names are identical in my case, this does not seem to help. Therefore, I use seq_list_filter_file for each HDFDataset, something like
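A minimal sketch of such a config, for illustration only. The file names, the data keys in `data_map`, and the choice of `features` as the order-controlling dataset are assumptions; only the class names and the `seq_list_filter_file` option come from the description above.

```python
# Sketch of a RETURNN dataset config. All paths below are hypothetical placeholders.
seq_filter = "dev_segments.txt"  # hypothetical filter file, one seq tag per line

dev = {
    "class": "MetaDataset",
    "datasets": {
        "features": {
            "class": "HDFDataset",
            "files": ["features.hdf"],  # hypothetical path
            "seq_list_filter_file": seq_filter,
        },
        "alignment": {
            "class": "HDFDataset",
            "files": ["alignment.hdf"],  # hypothetical path
            "seq_list_filter_file": seq_filter,
        },
    },
    # Maps each output key to (sub-dataset name, data key inside that sub-dataset).
    # The data keys are assumptions and depend on how the HDF files were written.
    "data_map": {
        "data": ("features", "data"),
        "classes": ("alignment", "data"),
    },
    # Assumed here: the feature dataset controls the sequence order.
    "seq_order_control_dataset": "features",
}
```

Both sub-datasets point at the same filter file, so after filtering they would be expected to expose the same set of sequences.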
When running this config, RETURNN complains
Reading sequence list for MetaDataset 'dev' from sub-dataset 'dev_features'
Dataset 'alignment' has less sequences (252366) than in sequence list (252377) read from 'features', this cannot work out!
Seq tag 'switchboard-1/sw02663A/sw2663A-ms98-a-0022' in dataset 'features' but not in dataset 'alignment'.
although the sequence list file only contains 300 lines and the stated seq tag is not contained in them.
If no seq_list_file is provided for the MetaDataset, it calls get_all_tags() of the default dataset. HDFDataset.get_all_tags() then returns all tags that are included in the hdf files and does not apply the seq list. This seems unexpected to me and results in the error above. Modifying HDFDataset.get_all_tags() to apply the filter, however, leads to issues in Dataset.get_seq_order_for_epoch().
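The mismatch can be reproduced with a toy model. This is not RETURNN code: the function names and tags below are hypothetical, and the sketch only mirrors the behaviour described above, where the tag list is built without the filter and then checked against the other sub-dataset. It does not touch the `Dataset.get_seq_order_for_epoch()` problem.

```python
def load_filter(lines):
    """Parse a filter file (one seq tag per line) into a set of tags."""
    return {line.strip() for line in lines if line.strip()}


def all_tags(stored_tags, seq_filter=None):
    """Return the stored tags, restricted to seq_filter if one is given."""
    if seq_filter is None:
        return list(stored_tags)
    return [tag for tag in stored_tags if tag in seq_filter]


def missing_tags(seq_list, other_tags):
    """Tags that are in seq_list but absent from the other dataset."""
    other = set(other_tags)
    return [tag for tag in seq_list if tag not in other]


# Toy data: the alignment file lacks one sequence that the feature file has.
features = ["seq-a", "seq-b", "seq-c", "seq-d"]
alignment = ["seq-a", "seq-b", "seq-c"]
seq_filter = load_filter(["seq-a\n", "seq-b\n"])

# Unfiltered tag list: the check trips on a tag the filter would have excluded.
unfiltered_missing = missing_tags(all_tags(features), alignment)

# Filter applied on both sides before the check: nothing is missing.
filtered_missing = missing_tags(
    all_tags(features, seq_filter), all_tags(alignment, seq_filter)
)
```

In this toy setup the unfiltered check reports `seq-d`, which is not in the filter at all, analogous to the seq tag named in the error message.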
What is a good way to fix the described issues?