ingest2parquet Readme file needs to be cleaned up #47

Closed
shahrokhDaijavad opened this issue May 1, 2024 · 1 comment
Labels: bug (Something isn't working), documentation (Improvements or additions to documentation), fixed (Marks an issue as fixed in the dev branch)

Comments

@shahrokhDaijavad (Member)

  1. There are still references to IBM bluepile in the Readme (command-line options):

    AST string containing input/output paths.
    input_folder: Path to input folder of files to be processed
    output_folder: Path to output folder of processed files
    Example: { 'input_folder': '/cos-optimal-llm-pile/bluepile-processing/rel0_8/cc15_30_preproc_ededup', 'output_folder': '/cos-optimal-llm-pile/bluepile-processing/rel0_8/cc15_30_preproc_ededup/processed' }

  2. In the section "Run the script via command-line", shouldn't the command be
    python ingest2parquet_local.py instead of python ingest2parque.py?
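The help text quoted in item 1 describes the option value as a string holding input_folder and output_folder. A minimal Python sketch of how such a value could be parsed and checked, assuming it is a Python dict literal as the example suggests; the helper name parse_folder_config and the placeholder paths are hypothetical and not taken from ingest2parquet itself:

```python
import ast


def parse_folder_config(arg: str) -> dict:
    """Parse a dict-literal string like the one in the quoted help text.

    Hypothetical helper: the tool's real argument handling may differ.
    """
    # literal_eval only evaluates Python literals, so arbitrary code is rejected
    config = ast.literal_eval(arg)
    if not isinstance(config, dict):
        raise ValueError("expected a dict literal")
    # Both keys named in the help text must be present
    missing = {"input_folder", "output_folder"} - set(config)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return config


# Placeholder paths (hypothetical), free of any bluepile/COS references
example = "{'input_folder': 'test-data/input', 'output_folder': 'output'}"
print(parse_folder_config(example))
```

Using neutral placeholder paths like these in the Readme example would also remove the internal bucket names mentioned in item 1.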

shahrokhDaijavad added the bug and documentation labels, then removed the bug label, on May 1, 2024
shahrokhDaijavad added the bug label on May 6, 2024
@shahrokhDaijavad (Member, Author) commented May 6, 2024

This is more than a Readme file issue. For S3 runs, this tool is set up differently from the other transforms, which use "make minio-start" (it starts the MinIO server, creates the input directory, and puts a sample input file there). Because S3 runs of this tool are inconsistent with S3 runs of all the other transforms, I am re-classifying this as a bug.

shahrokhDaijavad added the fixed label on May 14, 2024