Writing single file ended with ~9 GCS calls #975

Open
kobi-lemberg opened this issue Mar 8, 2023 · 0 comments

Hi,

We are using the HDFS GCS connector to read and write files to GCS.
As part of our RDD flow, we have code that writes to GCS using custom logic.
Below is a code snippet that creates 3 simple files in GCS:

import java.io.OutputStream
import java.nio.charset.StandardCharsets
import scala.util.{Failure, Success, Try}
import org.apache.hadoop.fs.Path

// logger, category and getConfiguration are defined elsewhere in our project.

val path = "gs://collection-dev/kobi6/test11"
writeToFile(s"$path/text1.csv", lines = Seq("1", "2", "3"))
writeToFile(s"$path/text2.csv", lines = Seq("1", "2", "3"))
writeToFile(s"$path/text3.csv", lines = Seq("1", "2", "3"))

def writeToFile(fileName: String, lines: Seq[String], withSeparator: Boolean = false): Option[Throwable] = {
  writeContext(fileName) { outputStream =>
    // Write each line followed by a newline; writeContext closes the stream.
    for (currentLine <- lines) {
      outputStream.write((currentLine + '\n').getBytes(StandardCharsets.UTF_8))
    }
  } match {
    case Success(_) => None
    case Failure(err) =>
      logger.errorP(s"Failed to write to $fileName", err, category)
      Some(err)
  }
}

def writeContext(filename: String)(f: OutputStream => Unit): Try[Unit] = {
  val path = new Path(filename)
  val fileSystem = path.getFileSystem(getConfiguration)
  // overwrite = true replaces any existing object at this path
  val out = fileSystem.create(path, true)
  Try {
    f(out)
    out.close()
  }
}

When running this code, we noticed that 9 GCS calls are made for each file (listing the directory structure, fetching its metadata, and so on).
These calls translate directly into cost.
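
To make the scale concrete, here is a rough sketch of how the request count adds up. It assumes the 9 calls per file seen above hold for every file; `GcsCallEstimate` and `totalCalls` are names made up for this illustration and are not part of the connector.

```scala
object GcsCallEstimate {
  // Total requests for a batch of writes, assuming every file
  // triggers the same number of GCS calls.
  def totalCalls(files: Int, callsPerFile: Int): Long =
    files.toLong * callsPerFile.toLong

  def main(args: Array[String]): Unit = {
    // 3 files and 9 calls per file are the figures from this report.
    println(totalCalls(files = 3, callsPerFile = 9)) // prints 27
  }
}
```

So the three small writes in the snippet already produce 27 requests, and the count grows linearly with the number of files.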

I also tried setting the following configuration:

"fs.gs.status.parallel.enable", "false"
"fs.gs.create.items.conflict.check.enable", "false"

but I still see those calls in the Stackdriver audit logs.
Below is a screenshot of the audit logs:
[image: audit log screenshot]
Is there any option to eliminate those calls?
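
For reference, a minimal sketch of how the two properties listed above might be applied in one place. `GcsOverridesSketch` and `applyOverrides` are hypothetical names, and a mutable map stands in for the Hadoop `Configuration` so the sketch runs without Hadoop on the classpath. How the settings are actually applied in our job is not shown here.

```scala
import scala.collection.mutable

object GcsOverridesSketch {
  // Property names and values copied from the configuration listed above.
  val overrides: Seq[(String, String)] = Seq(
    "fs.gs.status.parallel.enable" -> "false",
    "fs.gs.create.items.conflict.check.enable" -> "false"
  )

  // Hypothetical helper: pushes each override through a caller-supplied setter.
  // With Hadoop on the classpath the setter could be
  //   (k, v) => conf.set(k, v)
  // for an org.apache.hadoop.conf.Configuration named conf.
  def applyOverrides(set: (String, String) => Unit): Unit =
    overrides.foreach { case (key, value) => set(key, value) }

  def main(args: Array[String]): Unit = {
    // A mutable map stands in for the Hadoop Configuration in this sketch.
    val applied = mutable.LinkedHashMap.empty[String, String]
    applyOverrides((k, v) => applied.update(k, v))
    applied.foreach { case (k, v) => println(s"$k=$v") }
  }
}
```

One thing worth checking, as an assumption rather than a confirmed cause: Hadoop normally caches `FileSystem` instances, so the overrides probably need to be in the configuration before `getFileSystem` is first called for the bucket.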
