feat(bigquery): Add support for mutable clustering configuration #11225

merged 2 commits into from Apr 27, 2021
25 changes: 18 additions & 7 deletions google-cloud-bigquery/acceptance/bigquery/table_test.rb
@@ -482,13 +482,24 @@
end
end

it "allows tables to be created with time_partitioning and clustering" do
table = time_partitioned_table
_(table.time_partitioning?).must_equal true
_(table.time_partitioning_type).must_equal "DAY"
_(table.time_partitioning_field).must_equal "dob"
_(table.time_partitioning_expiration).must_equal seven_days
_(table.clustering_fields).must_equal clustering_fields
it "allows tables to be created and updated with time_partitioning and clustering" do
begin
table = time_partitioned_table
_(table.time_partitioning?).must_equal true
_(table.time_partitioning_type).must_equal "DAY"
_(table.time_partitioning_field).must_equal "dob"
_(table.time_partitioning_expiration).must_equal seven_days
_(table.clustering_fields).must_equal clustering_fields

new_clustering_fields = ["last_name"]
table.clustering_fields = new_clustering_fields
_(table.clustering_fields).must_equal new_clustering_fields

table.clustering_fields = nil
_(table.clustering_fields).must_be :nil?
ensure
time_partitioned_table.delete
end
end

it "allows tables to be created with range_partitioning" do
39 changes: 23 additions & 16 deletions google-cloud-bigquery/lib/google/cloud/bigquery/load_job.rb
@@ -484,7 +484,7 @@ def range_partitioning_end
# Checks if the destination table will be time partitioned. See
# [Partitioned Tables](https://cloud.google.com/bigquery/docs/partitioned-tables).
#
# @return [Boolean, nil] `true` when the table will be time-partitioned,
# @return [Boolean] `true` when the table will be time-partitioned,
# or `false` otherwise.
#
# @!group Attributes
@@ -560,10 +560,15 @@ def time_partitioning_require_filter?
###
# Checks if the destination table will be clustered.
#
# See {LoadJob::Updater#clustering_fields=}, {Table#clustering_fields} and
# {Table#clustering_fields=}.
#
# @see https://cloud.google.com/bigquery/docs/clustered-tables
# Introduction to Clustered Tables
# Introduction to clustered tables
# @see https://cloud.google.com/bigquery/docs/creating-clustered-tables
# Creating and using clustered tables
#
# @return [Boolean, nil] `true` when the table will be clustered,
# @return [Boolean] `true` when the table will be clustered,
# or `false` otherwise.
#
# @!group Attributes
@@ -578,14 +583,16 @@ def clustering?
# be first partitioned and subsequently clustered. The order of the
# returned fields determines the sort order of the data.
#
# See {LoadJob::Updater#clustering_fields=}.
# BigQuery supports clustering for both partitioned and non-partitioned
# tables.
#
# See {LoadJob::Updater#clustering_fields=}, {Table#clustering_fields} and
# {Table#clustering_fields=}.
#
# @see https://cloud.google.com/bigquery/docs/partitioned-tables
# Partitioned Tables
# @see https://cloud.google.com/bigquery/docs/clustered-tables
# Introduction to Clustered Tables
# Introduction to clustered tables
# @see https://cloud.google.com/bigquery/docs/creating-clustered-tables
# Creating and Using Clustered Tables
# Creating and using clustered tables
#
# @return [Array<String>, nil] The clustering fields, or `nil` if the
# destination table will not be clustered.
@@ -1819,23 +1826,23 @@ def time_partitioning_require_filter= val
end

##
# Sets one or more fields on which the destination table should be
# clustered. Must be specified with time-based partitioning, data in
# the table will be first partitioned and subsequently clustered.
# Sets the list of fields on which data should be clustered.
#
# Only top-level, non-repeated, simple-type fields are supported. When
# you cluster a table using multiple columns, the order of columns you
# specify is important. The order of the specified columns determines
# the sort order of the data.
#
# See {LoadJob#clustering_fields}.
# BigQuery supports clustering for both partitioned and non-partitioned
# tables.
#
# See {LoadJob#clustering_fields}, {Table#clustering_fields} and
# {Table#clustering_fields=}.
#
# @see https://cloud.google.com/bigquery/docs/partitioned-tables
# Partitioned Tables
# @see https://cloud.google.com/bigquery/docs/clustered-tables
# Introduction to Clustered Tables
# Introduction to clustered tables
# @see https://cloud.google.com/bigquery/docs/creating-clustered-tables
# Creating and Using Clustered Tables
# Creating and using clustered tables
#
# @param [Array<String>] fields The clustering fields. Only top-level,
# non-repeated, simple-type fields are supported.
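The doc comments in this file state that the order of the clustering columns determines the sort order of the data. As a rough analogy only, the plain-Ruby sketch below sorts some made-up rows by a list of fields, compared left to right. The rows and the `order_by_clustering` helper are hypothetical and not part of this PR or of the BigQuery client; BigQuery's actual storage layout is more involved than an in-memory sort.

```ruby
# Hypothetical rows, invented for illustration.
rows = [
  { "last_name" => "Smith", "first_name" => "Zoe" },
  { "last_name" => "Adams", "first_name" => "Max" },
  { "last_name" => "Smith", "first_name" => "Al" }
]

# Hypothetical helper: order rows by each field in the given order.
# Arrays compare element by element, so earlier fields take precedence.
def order_by_clustering(rows, fields)
  rows.sort_by { |row| fields.map { |field| row[field] } }
end

by_last_first = order_by_clustering(rows, ["last_name", "first_name"])
by_first_last = order_by_clustering(rows, ["first_name", "last_name"])

# The same fields in a different order yield a different row ordering.
puts by_last_first.map { |r| "#{r['last_name']}, #{r['first_name']}" }.inspect
puts by_first_last.map { |r| "#{r['last_name']}, #{r['first_name']}" }.inspect
```

Swapping the two field names changes which rows end up adjacent, which is why the comments stress that the column order matters.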
39 changes: 23 additions & 16 deletions google-cloud-bigquery/lib/google/cloud/bigquery/query_job.rb
@@ -514,7 +514,7 @@ def range_partitioning_end
# Checks if the destination table will be time-partitioned. See
# [Partitioned Tables](https://cloud.google.com/bigquery/docs/partitioned-tables).
#
# @return [Boolean, nil] `true` when the table will be time-partitioned,
# @return [Boolean] `true` when the table will be time-partitioned,
# or `false` otherwise.
#
# @!group Attributes
@@ -589,10 +589,15 @@ def time_partitioning_require_filter?
###
# Checks if the destination table will be clustered.
#
# See {QueryJob::Updater#clustering_fields=}, {Table#clustering_fields} and
# {Table#clustering_fields=}.
#
# @see https://cloud.google.com/bigquery/docs/clustered-tables
# Introduction to Clustered Tables
# Introduction to clustered tables
# @see https://cloud.google.com/bigquery/docs/creating-clustered-tables
# Creating and using clustered tables
#
# @return [Boolean, nil] `true` when the table will be clustered,
# @return [Boolean] `true` when the table will be clustered,
# or `false` otherwise.
#
# @!group Attributes
@@ -607,14 +612,16 @@ def clustering?
# be first partitioned and subsequently clustered. The order of the
# returned fields determines the sort order of the data.
#
# See {QueryJob::Updater#clustering_fields=}.
# BigQuery supports clustering for both partitioned and non-partitioned
# tables.
#
# See {QueryJob::Updater#clustering_fields=}, {Table#clustering_fields} and
# {Table#clustering_fields=}.
#
# @see https://cloud.google.com/bigquery/docs/partitioned-tables
# Partitioned Tables
# @see https://cloud.google.com/bigquery/docs/clustered-tables
# Introduction to Clustered Tables
# Introduction to clustered tables
# @see https://cloud.google.com/bigquery/docs/creating-clustered-tables
# Creating and Using Clustered Tables
# Creating and using clustered tables
#
# @return [Array<String>, nil] The clustering fields, or `nil` if the
# destination table will not be clustered.
@@ -1445,23 +1452,23 @@ def time_partitioning_require_filter= val
end

##
# Sets one or more fields on which the destination table should be
# clustered. Must be specified with time-based partitioning, data in
# the table will be first partitioned and subsequently clustered.
# Sets the list of fields on which data should be clustered.
#
# Only top-level, non-repeated, simple-type fields are supported. When
# you cluster a table using multiple columns, the order of columns you
# specify is important. The order of the specified columns determines
# the sort order of the data.
#
# See {QueryJob#clustering_fields}.
# BigQuery supports clustering for both partitioned and non-partitioned
# tables.
#
# See {QueryJob#clustering_fields}, {Table#clustering_fields} and
# {Table#clustering_fields=}.
#
# @see https://cloud.google.com/bigquery/docs/partitioned-tables
# Partitioned Tables
# @see https://cloud.google.com/bigquery/docs/clustered-tables
# Introduction to Clustered Tables
# Introduction to clustered tables
# @see https://cloud.google.com/bigquery/docs/creating-clustered-tables
# Creating and Using Clustered Tables
# Creating and using clustered tables
#
# @param [Array<String>] fields The clustering fields. Only top-level,
# non-repeated, simple-type fields are supported.
83 changes: 66 additions & 17 deletions google-cloud-bigquery/lib/google/cloud/bigquery/table.rb
@@ -471,8 +471,13 @@ def require_partition_filter= new_require
###
# Checks if the table is clustered.
#
# See {Table::Updater#clustering_fields=}, {Table#clustering_fields} and
# {Table#clustering_fields=}.
#
# @see https://cloud.google.com/bigquery/docs/clustered-tables
# Introduction to Clustered Tables
# Introduction to clustered tables
# @see https://cloud.google.com/bigquery/docs/creating-clustered-tables
# Creating and using clustered tables
#
# @return [Boolean, nil] `true` when the table is clustered, or
# `false` otherwise, if the object is a resource (see {#resource?});
@@ -491,14 +496,16 @@ def clustering?
# first partitioned and subsequently clustered. The order of the
# returned fields determines the sort order of the data.
#
# See {Table::Updater#clustering_fields=}.
# BigQuery supports clustering for both partitioned and non-partitioned
# tables.
#
# See {Table::Updater#clustering_fields=}, {Table#clustering_fields=} and
# {Table#clustering?}.
#
# @see https://cloud.google.com/bigquery/docs/partitioned-tables
# Partitioned Tables
# @see https://cloud.google.com/bigquery/docs/clustered-tables
# Introduction to Clustered Tables
# Introduction to clustered tables
# @see https://cloud.google.com/bigquery/docs/creating-clustered-tables
# Creating and Using Clustered Tables
# Creating and using clustered tables
#
# @return [Array<String>, nil] The clustering fields, or `nil` if the
# table is not clustered or if the table is a reference (see
@@ -512,6 +519,53 @@ def clustering_fields
@gapi.clustering.fields if clustering?
end

##
# Updates the list of fields on which data should be clustered.
#
# Only top-level, non-repeated, simple-type fields are supported. When
# you cluster a table using multiple columns, the order of columns you
# specify is important. The order of the specified columns determines
# the sort order of the data.
#
# BigQuery supports clustering for both partitioned and non-partitioned
# tables.
#
# See {Table::Updater#clustering_fields=}, {Table#clustering_fields} and
# {Table#clustering?}.
#
# @see https://cloud.google.com/bigquery/docs/clustered-tables
# Introduction to clustered tables
# @see https://cloud.google.com/bigquery/docs/creating-clustered-tables
# Creating and using clustered tables
# @see https://cloud.google.com/bigquery/docs/creating-clustered-tables#modifying-cluster-spec
# Modifying clustering specification
#
# @param [Array<String>, nil] fields The clustering fields, or `nil` to
# remove the clustering configuration. Only top-level, non-repeated,
# simple-type fields are supported.
#
# @example
# require "google/cloud/bigquery"
#
# bigquery = Google::Cloud::Bigquery.new
# dataset = bigquery.dataset "my_dataset"
# table = dataset.table "my_table"
#
# table.clustering_fields = ["last_name", "first_name"]
#
# @!group Attributes
#
def clustering_fields= fields
reload! unless resource_full?
if fields
@gapi.clustering ||= Google::Apis::BigqueryV2::Clustering.new
@gapi.clustering.fields = fields
else
@gapi.clustering = nil
end
patch_gapi! :clustering
end

##
# The combined Project ID, Dataset ID, and Table ID for this table, in
# the format specified by the [Legacy SQL Query
@@ -3062,27 +3116,22 @@ def range_partitioning_end= range_end
end

##
# Sets one or more fields on which data should be clustered. Must be
# specified with time-based partitioning, data in the table will be
# first partitioned and subsequently clustered.
# Sets the list of fields on which data should be clustered.
#
# Only top-level, non-repeated, simple-type fields are supported. When
# you cluster a table using multiple columns, the order of columns you
# specify is important. The order of the specified columns determines
# the sort order of the data.
#
# You can only set the clustering fields while creating a table as in
# the example below. BigQuery does not allow you to change clustering
# on an existing table.
# BigQuery supports clustering for both partitioned and non-partitioned
# tables.
#
# See {Table#clustering_fields}.
# See {Table#clustering_fields} and {Table#clustering_fields=}.
#
# @see https://cloud.google.com/bigquery/docs/partitioned-tables
# Partitioned Tables
# @see https://cloud.google.com/bigquery/docs/clustered-tables
# Introduction to Clustered Tables
# Introduction to clustered tables
# @see https://cloud.google.com/bigquery/docs/creating-clustered-tables
# Creating and Using Clustered Tables
# Creating and using clustered tables
#
# @param [Array<String>] fields The clustering fields. Only top-level,
# non-repeated, simple-type fields are supported.
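To make the behavior of the new `Table#clustering_fields=` setter easier to follow, here is a minimal sketch that mirrors its branching with stand-in objects. `SketchClustering`, `SketchResource` and `SketchTable` are hypothetical names introduced for this illustration; they replace `Google::Apis::BigqueryV2::Clustering`, the table resource and `Table`, and the `patched` list only records where the real method calls `patch_gapi! :clustering`. The `clustering?` check shown here is an assumption about how the predicate works, not a copy of the library code.

```ruby
# Hypothetical stand-ins; none of these classes exist in the library.
SketchClustering = Struct.new(:fields)
SketchResource   = Struct.new(:clustering)

class SketchTable
  attr_reader :patched

  def initialize(resource)
    @gapi = resource
    @patched = []
  end

  # Assumed predicate: clustered when a clustering config is present.
  def clustering?
    !@gapi.clustering.nil?
  end

  def clustering_fields
    @gapi.clustering.fields if clustering?
  end

  # Mirrors the branching in the diff: an array replaces the fields,
  # nil drops the clustering configuration entirely.
  def clustering_fields=(fields)
    if fields
      @gapi.clustering ||= SketchClustering.new
      @gapi.clustering.fields = fields
    else
      @gapi.clustering = nil
    end
    @patched << :clustering # the real method calls patch_gapi! :clustering
  end
end

table = SketchTable.new(SketchResource.new(SketchClustering.new(["last_name", "first_name"])))

table.clustering_fields = ["last_name"]
after_update = table.clustering_fields

table.clustering_fields = nil
after_clear = table.clustering_fields
```

This follows the same sequence as the updated acceptance test: replace the fields, read them back, then pass `nil` and expect `clustering_fields` to return `nil`.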
9 changes: 9 additions & 0 deletions google-cloud-bigquery/support/doctest_helper.rb
@@ -879,6 +879,15 @@ def mock_storage
end
end

doctest.before "Google::Cloud::Bigquery::Table#clustering_fields=" do
mock_bigquery do |mock|
mock.expect :get_dataset, dataset_full_gapi, ["my-project", "my_dataset"]
mock.expect :get_table, table_full_gapi, ["my-project", "my_dataset", "my_table"]
mock.expect :patch_table, table_full_gapi, ["my-project", "my_dataset", "my_table", Google::Apis::BigqueryV2::Table, Hash]
mock.expect :get_table, table_full_gapi, ["my-project", "my_dataset", "my_table"]
end
end

# Google::Cloud::Bigquery::Table#data@Paginate rows of data: (See {Data#next})
# Google::Cloud::Bigquery::Table#data@Retrieve all rows of data: (See {Data#all})
doctest.before "Google::Cloud::Bigquery::Table#data" do