Bring back GCS ops. #1229

michaelbanfield · 2020-12-14T23:00:20Z

This was originally deleted because

https://pypi.org/project/tensorflow-gcs-config/

exists. However, the source code of that package is actually built from tensorflow/io. The dedicated package exists to provide a smaller .so for disk-space-constrained environments.

We would like to keep these ops in tensorflow/io as well.

google-cla · 2020-12-14T23:00:29Z

We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.
In order to pass this check, please resolve this problem and then comment @googlebot I fixed it.. If the bot doesn't comment, it means it doesn't think anything has changed.

ℹ️ Googlers: Go here for more info.

vnghia · 2020-12-15T01:45:35Z

I would like to ask whether this overlaps with the gcs filesystem. In addition, we now use google-cloud-cpp as a dependency, so I think you could use that library instead of writing your own functions 😄

kvignesh1420

@michaelbanfield Currently, the file system plugins are being migrated from tensorflow to tensorflow-io, and the progress is being tracked in #1183. The goal is for the file system extensions to use a common modular approach and to expose a similar python API for all the schemes.

yongtang · 2020-12-15T15:42:28Z

My understanding is that the GCS ops make it possible to adjust settings (e.g., max_cache_size, block_size, service account, refresh token, etc.) in real time within the same graph session. This is useful in situations where tensorflow has already been initialized by the time the user gets access.

vnghia · 2020-12-15T17:13:29Z

I didn't expect there to be so many dependencies related to gcs. We will have to think about the design of gcs right now. There are some problems:

As I am using google-cloud-cpp, we lose control over some parameters (auth, token, ...).
Inheritance (like RetryingGcsFileSystem*) will not be possible. I am thinking about introducing a new env to control it (see add test for filesystem plugins #1221 (comment)).

Apologies for all the back and forth.

yongtang · 2020-12-15T17:22:51Z

@vnvo2409 I think the issue with an env variable is that you have to pass the env at initialization time. This can pose some limitations where tensorflow might have already been imported/initialized.

An alternative approach is to hold a global variable in C/C++ for all the gcs-related configurations. This would allow an extra op to modify the gcs configuration at runtime within the same graph session. This is more or less how the gcs ops work in this PR.
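
The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration of the idea, not the tensorflow-io implementation: `GcsConfig`, `EnvConfiguredClient`, `configure_gcs`, `current_config`, and the `GCS_BLOCK_SIZE` variable are hypothetical names, and the real ops would keep their state in C/C++.

```python
import os
import threading
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class GcsConfig:
    """Hypothetical settings; field names are illustrative only."""
    block_size: int = 64 * 1024 * 1024
    max_cache_size: int = 0
    refresh_token: str = ""


class EnvConfiguredClient:
    """Reads its settings from the environment once, at construction."""

    def __init__(self):
        # Hypothetical variable name; the value is frozen at init time.
        self.block_size = int(os.environ.get("GCS_BLOCK_SIZE", 64 * 1024 * 1024))


# Process-wide configuration guarded by a lock (the "global variable" approach).
_config = GcsConfig()
_config_lock = threading.Lock()


def configure_gcs(**changes):
    """Stand-in for a config op: atomically swap in updated settings."""
    global _config
    with _config_lock:
        _config = replace(_config, **changes)
        return _config


def current_config():
    """Readers always see the most recently configured settings."""
    with _config_lock:
        return _config


if __name__ == "__main__":
    client = EnvConfiguredClient()         # "initialization" happens here
    os.environ["GCS_BLOCK_SIZE"] = "1024"  # too late: the client is already built
    print(client.block_size)               # unchanged by the later env update

    configure_gcs(block_size=1024, refresh_token="new-token")
    print(current_config().block_size)     # reflects the runtime update
```

Under these assumptions, the env-based client only ever sees the value present when it was constructed, while the shared configuration can be changed at any point after initialization.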

michaelbanfield · 2020-12-21T19:13:31Z

Thanks for the comments; moving the gcs filesystem to google-cloud-cpp makes a lot of sense.

As discussed above, these ops allow clients to modify gcs auth after tensorflow has already been initialized. There are a few use cases that depend on this, so we would want to retain this functionality when splitting out the modular filesystem.

Are there any concerns about merging these ops for the current filesystem? They used to live within tensorflow but were moved out to IO. Once the modular GCS filesystem supports configuration after initialization, we can move to that.

yongtang · 2020-12-22T22:57:04Z

@michaelbanfield That should be fine. Though the test is failing now. Can you apply the following patch:

diff --git a/tensorflow_io/core/BUILD b/tensorflow_io/core/BUILD
index 3563b04..e661149 100644
--- a/tensorflow_io/core/BUILD
+++ b/tensorflow_io/core/BUILD
@@ -724,6 +724,7 @@ cc_binary(
         "//tensorflow_io/core:text_ops",
         "//tensorflow_io/core:ignite_ops",
         "//tensorflow_io/core:mongodb_ops",
+        "//tensorflow_io/gcs:gcs_config_ops",
         "@local_config_tf//:libtensorflow_framework",
         "@local_config_tf//:tf_header_lib",
     ] + select({
diff --git a/tests/test_gcs_config_ops.py b/tests/test_gcs_config_ops.py
index 7d01409..291986c 100644
--- a/tests/test_gcs_config_ops.py
+++ b/tests/test_gcs_config_ops.py
@@ -29,7 +29,7 @@ tf_v1 = tf.version.VERSION.startswith("1")
 class GcsConfigOpsTest(test.TestCase):
     """GCS Config OPS test"""
 
-    @pytest.mark.skipif(sys.platform == "darwin", reason=None)
+    @pytest.mark.skipif(sys.platform == "win32", reason="Windows not working yet")
     def test_set_block_cache(self):
         """test_set_block_cache"""
         cfg = gcs.BlockCacheParams(max_bytes=1024 * 1024 * 1024)

Once the patch is applied I think all tests will pass.

kvignesh1420 · 2021-03-30T14:17:46Z

tensorflow_io/gcs/kernels/gcs_config_op_kernels.cc

+  *fs = dynamic_cast<RetryingGcsFileSystem*>(filesystem);
+  if (*fs == nullptr) {
+    return errors::Internal(
+        "The filesystem registered under the 'gs://' scheme was not a "
+        "tensorflow::RetryingGcsFileSystem*.");
+  }


@michaelbanfield I think the dynamic_cast<RetryingGcsFileSystem*>(filesystem) is returning a null pointer, which in turn raises the error. I think this is because the gs file system plugin has already been registered, so there is a type mismatch. Can we remove this cast? The tests are also failing on Linux and macOS: https://github.com/tensorflow/io/pull/1229/checks?check_run_id=2221952977
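
The suspected failure mode can be illustrated with a small Python analogy. This is a sketch only: the class name RetryingGcsFileSystem mirrors the discussion, but the registries, `PluginGcsFileSystem`, and `get_retrying_gcs` are hypothetical, and `isinstance` merely stands in for the C++ `dynamic_cast`.

```python
class FileSystem:
    """Base type stored in the scheme registry."""


class RetryingGcsFileSystem(FileSystem):
    """Legacy implementation that the config ops expect to find."""


class PluginGcsFileSystem(FileSystem):
    """Hypothetical stand-in for a modular plugin registered for gs://."""


def get_retrying_gcs(registry, scheme="gs"):
    """Return the legacy filesystem, or None on a type mismatch.

    Mirrors dynamic_cast semantics: a failed downcast yields a null value
    rather than raising, so the caller must check the result.
    """
    fs = registry.get(scheme)
    return fs if isinstance(fs, RetryingGcsFileSystem) else None


# Two hypothetical registries: one with the legacy type, one with a plugin.
legacy_registry = {"gs": RetryingGcsFileSystem()}
plugin_registry = {"gs": PluginGcsFileSystem()}
```

With `legacy_registry` the lookup returns the registered object; with `plugin_registry` it returns None, which corresponds to the null pointer that triggers the errors::Internal branch in the kernel above.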

google-cla · 2021-03-30T17:17:26Z

All (the pull request submitter and all commit authors) CLAs are signed, but one or more commits were authored or co-authored by someone other than the pull request submitter.

We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that by leaving a comment that contains only @googlebot I consent. in this pull request.

Note to project maintainer: There may be cases where the author cannot leave a comment, or the comment is not properly detected as consent. In those cases, you can manually confirm consent of the commit author(s), and set the cla label to yes (if enabled on your project).

ℹ️ Googlers: Go here for more info.

michaelbanfield added 5 commits December 15, 2020 00:15

Add a standalone binary build for GCS ops

59ddbc4

Revert "Deprecate gcs-config (tensorflow#1024)"

ca8e327

This reverts commit 9702a15.

Rebase change

cb8ffe7

Clean up merge

3536338

Fix lint errors

11c8a51

michaelbanfield force-pushed the master branch from c9ab4dc to 11c8a51 on December 15, 2020 00:17

kvignesh1420 reviewed Dec 15, 2020


michaelbanfield added 2 commits March 29, 2021 09:27

Skip tests for windows

3560ac6

Move build target to exclude windows

42c2867

michaelbanfield requested a review from kvignesh1420 March 29, 2021 20:34

kvignesh1420 reviewed Mar 30, 2021


yongtang and others added 11 commits March 30, 2021 10:16

Bump Avro to 1.10.1 (tensorflow#1235)

9a3663c

This PR bumps Avro to 1.10.1. Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

Add emulator for gcs (tensorflow#1234)

2e6936f

* Bump com_github_googleapis_google_cloud_cpp to `1.21.0` * Add gcs testbench * Bump `libcurl` to `7.69.1`

fix nightly build because of missing google-cloud-storage (tensorfl…

04d6913

…ow#1238)

[MongoDB] update API docstrings (tensorflow#1243)

6c29813

* [mongoDB] update API docs * lint fixes * rename wrong API * lint fixes

Remove redundant output of dataset.element_spec in PostgreSQL tutorial (

371877e

tensorflow#1242)

add tf-c-header rule (tensorflow#1244)

88b9d8d

Skip tf-nightly:tensorflow-io==0.17.0 on API compatibility test (tens…

b0ffa2e

…orflow#1247) Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

yongtang and others added 28 commits March 30, 2021 10:16

Fix wrong benchmark tests names (tensorflow#1301)

79ccf5e

Fixes wrong benchmark tests names caused by last commit Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

Fix docstring. (tensorflow#1305)

cc93afa

This is breaking everything below it. https://www.tensorflow.org/io/api_docs/python/tfio/experimental/IODataset

Switch to use github to download libgeotiff (tensorflow#1307)

f34d193

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

Add @com_google_absl//absl/strings:cord (tensorflow#1308)

801569f

Fix read/STDIN_FILENO Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

Modify --plat-name for macosx wheels (tensorflow#1311)

53c9a71

* modify --plat-name for macosx wheels * switch to 10.14

Switch to modular file system for s3 (tensorflow#1312)

3f7f292

This PR is part of the effort to switch to modular file system for s3. When TF_ENABLE_LEGACY_FILESYSTEM=1 is provided, old behavior will be preserved. Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

Update to enable python 3.9 building on Linux (tensorflow#1314)

33fca56

* Update to enable python 3.9 building on Linux Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Switch to always use ubuntu:20.04 Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

Add python 3.9 on Windows (tensorflow#1316)

314f406

Use -p 9000:9000 (and hide 8088) when launch hadoop (tensorflow#1317)

fb5cab8

update protobuf version to 3.11.4 to match tensorflow-nightly (te…

3b81b85

…nsorflow#1320)

Revert "update protobuf version to 3.11.4 to match tensorflow-nig…

1c85b77

…htly (tensorflow#1320)" (tensorflow#1323) This reverts commit 07d833f.

Enable python 3.9 build on macOS (tensorflow#1324)

ef46f8c

This PR enables python 3.9 build on macOS, as tf-nightly is available with macOS now. Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

switch mnist dataset mirror to a more reliable one (tensorflow#1327)

3121308

remove flaky centos 7 based build action (tensorflow#1328)

57d840b

Fix link in avro reader notebook (tensorflow#1333)

9644be3

Correct the link to Avro Reader tests in notebook

Release nightly even if test fails (tensorflow#1339)

3de431d

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

remove unused/stale azure_ops (tensorflow#1338)

ef8a5d5

gcs switch to env (tensorflow#1319)

4154a2c

* switch to env * switch to gcs on tensorflow-io according to tensorflow/tensorflow#47247

improvements for s3 environements variables (tensorflow#1343)

64eb761

* lazy loading for `s3` environements variables * `S3_ENDPOINT` supports http/https * remove `S3_USE_HTTPS` and `S3_VERIFY_SSL`

vnghia mentioned this pull request May 5, 2021

Finalize testing for gcs filesystem #1400

Merged
