[DOC-307] [2024.1] Replace/decommision a node #22207

Open
wants to merge 472 commits into
base: master
from

Conversation

aishwarya24
Collaborator

@aishwarya24 aishwarya24 commented Apr 30, 2024

@aishwarya24 aishwarya24 added the area/documentation Documentation needed label Apr 30, 2024
@aishwarya24 aishwarya24 self-assigned this Apr 30, 2024
@aishwarya24 aishwarya24 added this to In progress in Documentation via automation Apr 30, 2024
@aishwarya24 aishwarya24 changed the title [DOC-307] Replace/decommision a node [DOC-307] [2024.1]Replace/decommision a node Apr 30, 2024

netlify bot commented Apr 30, 2024

Deploy Preview for infallible-bardeen-164bc9 ready!

Built without sensitive environment variables

Name Link
🔨 Latest commit 833ab10
🔍 Latest deploy log https://app.netlify.com/sites/infallible-bardeen-164bc9/deploys/664cb54ee5027500085bccda
😎 Deploy Preview https://deploy-preview-22207--infallible-bardeen-164bc9.netlify.app

@aishwarya24 aishwarya24 changed the title [DOC-307] [2024.1]Replace/decommision a node [DOC-307] [2024.1] Replace/decommision a node May 1, 2024
nchandrappa and others added 21 commits May 2, 2024 12:41
…event getting SyntaxWarning.

Summary:
- Changed all the string literals to have an r prefix where SyntaxWarning was being thrown.
- Add a check for has_errors when setting ui_port_available temp variable. Without this check, when an incorrect advertise_address is given, it results in exceptions.
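
The first bullet can be illustrated with a minimal sketch. The helper `count_escape_warnings` and the sample pattern are hypothetical and are not taken from the yugabyted script; the sketch only shows why an `r` prefix silences the warning. Python 3.12 and later report an invalid escape sequence as `SyntaxWarning`, while older versions use `DeprecationWarning`, so both categories are counted.

```python
import warnings


def count_escape_warnings(source: str) -> int:
    """Compile `source` and count the invalid-escape warnings it triggers."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, "<example>", "exec")
    return sum(
        issubclass(w.category, (SyntaxWarning, DeprecationWarning))
        for w in caught
    )


# Source text of two one-line programs (hypothetical pattern).
PLAIN_SOURCE = 'pattern = "\\d+\\.\\d+"'   # i.e.  pattern = "\d+\.\d+"
RAW_SOURCE = 'pattern = r"\\d+\\.\\d+"'    # i.e.  pattern = r"\d+\.\d+"

if __name__ == "__main__":
    print("plain literal warnings:", count_escape_warnings(PLAIN_SOURCE))
    print("raw literal warnings:", count_escape_warnings(RAW_SOURCE))
```

The raw literal keeps the backslashes as written, so the compiler has no escape sequence to complain about.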

Test Plan: manual tests

Reviewers: sgarg-yb

Reviewed By: sgarg-yb

Subscribers: yugabyted-dev

Differential Revision: https://phorge.dev.yugabyte.com/D34647
Summary:
Implement following validations for Kubernetes provider:
- Zone labels
- Storage class
- Namespace
- cert-manager
- MCS pod address template

We create a merged CloudInfo per zone and then run these validations
on it. Merged CloudInfo has values from provider, region, and
zone. kubeconfigs are written to a temporary directory which is
deleted after the validations.

In case of RBAC errors we log those as warnings and skip that
particular validation if those permissions are not required for normal
functioning of YBA. Unknown exceptions are thrown and shown as an error
to the user.

See [[ https://docs.google.com/document/d/11s3yh0mD0BbTQXhLZBsiWqg1odfR0JDiePDNYbVcbUw/edit | design document ]] for more details.
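
The per-zone merge described above could look roughly like the following sketch. It is not the YBA implementation: the function name `merge_cloud_info` is hypothetical, and the precedence (zone over region over provider, with unset values not masking a parent value) is an assumption. The field names are the ones that appear in the validation output.

```python
from typing import Any, Dict


def merge_cloud_info(
    provider: Dict[str, Any],
    region: Dict[str, Any],
    zone: Dict[str, Any],
) -> Dict[str, Any]:
    """Merge three levels of settings; more specific levels win (assumed)."""
    merged: Dict[str, Any] = {}
    for level in (provider, region, zone):
        for key, value in level.items():
            if value is not None:  # an unset value does not mask a parent value
                merged[key] = value
    return merged


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(
        merge_cloud_info(
            {"kubernetesStorageClass": "standard", "kubeNamespace": "yb"},
            {"kubernetesStorageClass": None},
            {"kubeNamespace": "yb-zone-a"},
        )
    )
```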

Test Plan:
Enabled the flag:

```
curl --location --request PUT 'http://localhost:9000/api/v1/customers/<customer UUID>/runtime_config/00000000-0000-0000-0000-000000000000/key/yb.provider.kubernetes_provider_validation' \
  --header 'X-AUTH-TOKEN: <token>' \
  --header 'Content-Type: text/plain' \
  --data-raw 'true'
```

Tested different combinations for all the fields. Example output:

```
{
  "success": false,
  "error": {
    "$.regions[0].zones[2].details.cloudInfo.kubernetes.kubernetesStorageClass": [
      "Storage class doesn't have allowVolumeExpansion set to true",
      "Storage class volumeBindingMode is not set to 'WaitForFirstConsumer'"
    ],
    "$.regions[0].zones[2].details.cloudInfo.kubernetes.certManagerIssuer": [
      "Namespace must be provided when using Issuer"
    ],
    "$.regions[0].zones[1].details.cloudInfo.kubernetes.kubernetesStorageClass": [
      "Storage class volumeBindingMode is not set to 'WaitForFirstConsumer'"
    ],
    "errorSource": [
      "providerValidation"
    ],
    "$.regions[0].zones[0].details.cloudInfo.kubernetes.kubernetesStorageClass": [
      "Storage class doesn't exist in the cluster"
    ],
    "$.regions[0].zones[1].code": [
      "Cluster doesn't have any nodes in the asia-south1-z zone"
    ],
    "$.regions[0].zones[0].details.cloudInfo.kubernetes.kubePodAddressTemplate": [
      "Pod address template generated an invalid DNS, allowed placeholders are: {pod_name}, {service_name}, {namespace}, and {cluster_domain}"
    ],
    "$.regions[0].zones[1].details.cloudInfo.kubernetes.certManagerIssuer": [
      "Issuer doesn't exist in the bhavin-test namespace"
    ],
    "$.regions[0].zones[0].details.cloudInfo.kubernetes.kubeNamespace": [
      "Namespace doesn't exist in the cluster"
    ]
  },
  "requestJson": {
    <original request body here>
  }
}
```
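
The storage class messages in the output above suggest checks along these lines. This is a sketch only, assuming the storage class is available as a plain dictionary shaped like the Kubernetes `StorageClass` object; `validate_storage_class` is a hypothetical name and the real validation lives in YBA.

```python
from typing import Any, Dict, List, Optional


def validate_storage_class(storage_class: Optional[Dict[str, Any]]) -> List[str]:
    """Return the validation errors for one storage class (None = not found)."""
    if storage_class is None:
        return ["Storage class doesn't exist in the cluster"]
    errors: List[str] = []
    if storage_class.get("allowVolumeExpansion") is not True:
        errors.append("Storage class doesn't have allowVolumeExpansion set to true")
    if storage_class.get("volumeBindingMode") != "WaitForFirstConsumer":
        errors.append(
            "Storage class volumeBindingMode is not set to 'WaitForFirstConsumer'"
        )
    return errors


if __name__ == "__main__":
    print(validate_storage_class({"volumeBindingMode": "Immediate"}))
```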

Reviewers: #yba-api-review, svarshney, anijhawan, dshubin

Reviewed By: #yba-api-review, svarshney, anijhawan, dshubin

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D33944
* moved triggers to advanced features

* comments from review
…undaries handling

Summary:
When range size is quite low and we have a lot of data belonging to the same DocKey, Tablet::GetTabletKeyRanges may return duplicated keys and keys containing either lower bound or upper bound.
This can happen if we have a lot of updates to the same row that will be recorded as different RocksDB records having the same DocKey and as a result without additional checks, simply skipping N blocks might be positioned to the same DocKey.

This leads to `ERROR:  Operation list must not be empty` user error for parallel query execution in such cases.

1) YSQL layer is using returned keys as bounds for scan requests and the latest returned key as a starting point for next GetTabletKeyRanges request. To avoid returning the same key to the upper layer on subsequent request, updated `Tablet::GetTabletKeyRanges` contract to always exclude lower/upper bound keys from the output. YSQL is also using these lower/upper bounds as bounds for scan requests, so there is no value in returning them as range keys since it will result in having ranges like `(lower_bound, lower_bound)` and `(upper_bound, upper_bound)` at YSQL layer.
2) Fixed `Tablet::GetTabletKeyRanges` to avoid returning duplicate keys.
3) Added tests.
Jira: DB-11002
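
The updated contract from points 1 and 2 can be sketched as follows. This is a simplified model, not the `Tablet::GetTabletKeyRanges` code: it assumes the candidate keys arrive as a sorted list of byte strings, and `filter_range_keys` is a hypothetical helper name.

```python
from typing import Iterable, List


def filter_range_keys(keys: Iterable[bytes], lower: bytes, upper: bytes) -> List[bytes]:
    """Drop duplicates and any key equal to a bound from sorted `keys`."""
    result: List[bytes] = []
    for key in keys:
        if key == lower or key == upper:
            continue  # bounds are never returned as range keys
        if result and result[-1] == key:
            continue  # same DocKey seen again, for example after many updates
        result.append(key)
    return result


if __name__ == "__main__":
    print(filter_range_keys([b"a", b"b", b"b", b"c", b"d"], b"a", b"d"))
```

With this filtering the caller never sees an empty range such as `(lower_bound, lower_bound)`.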

Test Plan:
```
ybd --cxx-test ql-tablet-test --gtest_filter GetTabletKeyRangesTest.Distribution* -n 50 -- -p 1
ybd --cxx-test ql-tablet-test --gtest_filter GetTabletKeyRangesTest.Boundaries* -n 50 -- -p 1
```

Reviewers: arybochkin, amartsinchyk

Reviewed By: arybochkin, amartsinchyk

Subscribers: ybase

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D34341
Summary:
Our newer modals all use confirmation textboxes instead of confirmation checkboxes.
This diff aligns the HA promotion modal to do the same.

Test Plan:
Verify form submit is disabled when `PROMOTE` is not entered.
Verify form submit is not disabled when `PROMOTE` is entered.
{F173446}

Reviewers: rmadhavan

Reviewed By: rmadhavan

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D34502
Summary:
`create_bitmap_subplan` takes several lists as input, and each list should be set to a real value or
NIL.

D32651 / 773869c added another list to this function for pushdown
quals, but did not set the value in all cases. Callers of the function would pass in an uninitialized
value for `indexpushdownquals`, and expect it to be a valid list. Instead, it might remain
uninitialized and potentially cause a crash.

Fix the issue by handling the one remaining branch where indexpushdownquals is not set.
Jira: DB-11003

Test Plan:
```
./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressGin'
```

Reviewers: amartsinchyk, tnayak

Reviewed By: tnayak

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D34367
…nsaction + Changes subtxn id semantics at pg/pg_client_service

Summary:
As we allow buffering of operations in YSQL, multiple batches belonging to the same subtxn id are launched in parallel in an async manner. There is only one outstanding batch executing `PgClientSession::SetupSession`, while there can be many outstanding batches executing `YBTransaction::Impl::Prepare` as part of callback from `LookupByKeyRpc`.

In the existing implementation, it leads to data race issues with `YBSubTransaction`. A previously launched batch is trying to access `highest_subtransaction_id_` during `Prepare` to populate in-flight ops metadata, while a subsequent batch is trying to set the same field `highest_subtransaction_id_`. Though the writer thread tries to overwrite `highest_subtransaction_id_` with the same old value, it leads to a data race. Also, for RC transactions, we seem to hit a similar race while executing `RollbackToSubTransaction`. Seems like the pg backend issues a rollback on error before all inflight operations complete.
```
Status PgApiImpl::RollbackToSubTransaction(SubTransactionId id) {
  pg_session_->DropBufferedOperations();
  return pg_session_->RollbackToSubTransaction(id);
}
```

**Change 1**
To address the data race, we now set subtxn metadata for the batch (batch of ops) by setting it during `Batcher::FlushAsync`. Batcher then launches `YBTransaction::Impl::Prepare` for the underlying transaction, which sets only the transaction metadata. If the batch fails, the subtxn metadata is copied to the retry batcher.

**Change 2**
This diff also addresses an anomaly with `active_sub_transaction_id_` passed from `pg_session`. Postgres assigns subtransaction id(s) starting from 1. But in the existing implementation, we see that `active_sub_transaction_id_` starts from 0 and then bumps up to 2 on savepoint creation (value as seen in the requests at `pg_client_session.cc`). In `client/transaction.cc`, we leave the subtxn metadata unpopulated if no savepoint has been created yet. The downstream code assumes that the subtxn belonged to id 1 since the subtxn metadata was unpopulated. To avoid this confusion, we change the default value of `active_sub_transaction_id_` and populate the subtxn metadata pb only when subtxn is not in its default state.

Enabling test `PgTableSizeTest.PartitionedTableSize` to run in tsan mode, as most of the race issues with pggate are resolved now.

Note: The earlier version of the [[ https://phorge.dev.yugabyte.com/D23412 | fix ]] led to a data loss [[ yugabyte#16988 | issue ]] as the subtxn metadata was not being populated for retry batchers (which are quite probable in case of tablet splits). Hence, these ops were wrongly being treated as belonging to `kMinSubtransactionId` by downstream code, which might have led to the issue. Added a test that validates the reasoning.

**Additional note**
We discussed offline that this change shouldn't have upgrade/downgrade consequences. Since pg process is a child process of the tserver process, changes at pg_client_session/service and pg_txn_manager would have taken effect in combination. So the subtxn metadata in the actual rpc to docdb would remain unchanged.
Jira: DB-5343, DB-10441

Test Plan:
Jenkins
```
./yb_build.sh tsan --gtest_filter PgTableSizeTest.PartitionedTableSize -n 100
./yb_build.sh --gtest_filter PgTabletSplitTest.SplitAmidstRunningTransaction
./yb_builds tsan --cxx-test='TEST_F(GeoPartitionedReadCommiittedTest, TestPromotionAmidstConflicts) {'
```

Reviewers: pjain, esheng, rsami, rthallam

Reviewed By: pjain

Subscribers: yql, ybase, bogdan

Differential Revision: https://phorge.dev.yugabyte.com/D25071
…e#22225)

* Add docs for configure_read_replica command.

* Updated docs for backup, restore, configure_read_replica, finalize_upgrade commands.

* Fixing links.

* Apply suggestions from code review

---------

Co-authored-by: Dwight Hodge <79169168+ddhodge@users.noreply.github.com>
Summary:
Do not display TEST flags in the human readable `/varz` endpoint unless the user overrides them. TEST flags have tag `kHidden` and their name starts with `TEST_`. This is ensured by the `DEFINE_test_flag` macro.

Fixes yugabyte#14755
Jira: DB-4087
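
A minimal sketch of the filtering rule described in the summary. The `Flag` type and `visible_flags` helper are hypothetical, and detecting an override by comparing the current value with the default is an assumption, not the actual `/varz` handler logic.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Flag:
    name: str
    default: str
    current: str
    hidden: bool = False  # stands in for the kHidden tag


def visible_flags(flags: List[Flag]) -> List[Flag]:
    """Hide TEST flags unless their value was overridden."""
    shown: List[Flag] = []
    for flag in flags:
        is_test = flag.hidden and flag.name.startswith("TEST_")
        overridden = flag.current != flag.default  # assumed override check
        if is_test and not overridden:
            continue
        shown.append(flag)
    return shown


if __name__ == "__main__":
    # Hypothetical flag names for illustration only.
    flags = [
        Flag("some_regular_flag", "1", "1"),
        Flag("TEST_untouched", "false", "false", hidden=True),
        Flag("TEST_overridden", "false", "true", hidden=True),
    ]
    print([f.name for f in visible_flags(flags)])
```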

Test Plan: MasterPathHandlersItest.TestTestFlag

Reviewers: esheng

Reviewed By: esheng

Subscribers: ybase

Differential Revision: https://phorge.dev.yugabyte.com/D34701
* get started guide

* misc changes

* misc edits

* Edits

* format

* Add suggested links

* minor edit

* reorganization

* Triggering rebuild

* Triggering preview rebuild

modified the title

* review comments

* comments

---------

Co-authored-by: Premkumar <contactprem+premkumr@gmail.com>
Co-authored-by: Premkumar <pthangamani@yugabyte.com>
…pped successfully

Summary:
To reproduce the bug:

```
yugabyte=# create table foo(id int);
CREATE TABLE
yugabyte=# alter table foo drop column id;
ALTER TABLE
```

Note that the ALTER TABLE statement completed successfully. Look at yb-master.INFO, saw:

```
I0502 19:26:37.713428 27752 ysql_ddl_handler.cc:396] Sending Alter Table request as part of rollback for table foo
```

The log says `rollback` which suggests the DDL operation failed. But in this
case `roll forward` is more appropriate. We first mark the column id to be dropped
in DocDB metadata for table `foo`, then drop the column from PG catalog metadata
for table `foo`. Once the PG side has successfully committed, we then roll forward
the DocDB metadata change by dropping the column. This completes the entire DDL
operation.

I made a fix to log `roll forward` if the PG side commits successfully, and
`rollback` if the PG side aborts.
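
The wording choice can be sketched as below. The function names are hypothetical and the real change is in `ysql_ddl_handler.cc`; only the message text is taken from the log lines quoted in this summary.

```python
def ddl_verification_action(pg_txn_committed: bool) -> str:
    """Pick the wording that matches the outcome of the PG transaction."""
    return "roll forward" if pg_txn_committed else "rollback"


def alter_table_log_line(table_name: str, pg_txn_committed: bool) -> str:
    action = ddl_verification_action(pg_txn_committed)
    return f"Sending Alter Table request as part of {action} for table {table_name}"


if __name__ == "__main__":
    print(alter_table_log_line("foo", pg_txn_committed=True))
    print(alter_table_log_line("foo", pg_txn_committed=False))
```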

Also made some small test code cleanup for DDL atomicity because --ysql_yb_ddl_rollback_enabled
is true by default now.
Jira: DB-11161

Test Plan:
(1) Manually do the test described in the summary, saw the new log:

```
I0502 20:12:33.628252  4412 ysql_ddl_handler.cc:399] Sending Alter Table request as part of roll forward for table foo
```
(2) ./yb_build.sh release --java-test 'org.yb.pgsql.TestPgUniqueConstraint'
(3) YB_ENABLE_YSQL_CONN_MGR_IN_TESTS=true ./yb_build.sh release --java-test 'org.yb.pgsql.TestPgUniqueConstraint'

Reviewers: fizaa

Reviewed By: fizaa

Subscribers: yql, ybase

Differential Revision: https://phorge.dev.yugabyte.com/D34704
…gn 2) (yugabyte#21308)

* Enable Drift & move kapa.ai to header right after search

* Change icon and text

* Fix button issue while page is loading

* Enable/Disable Drift option and move drift in head

* Update search banner message

* Update search banner message
Summary:
The diff adds the YSQL syntax to support DB cloning: creating a database as a lightweight copy of another database. User can now create a database `db2` as a clone of database `db1` as of a point in time `t1`, by writing the following sql query:
`CREATE DATABASE db2 TEMPLATE db1 AS OF t1;`
where t1 is a Unix timestamp in microseconds.
Using the template syntax to support DB cloning is in line with postgres [[ https://www.postgresql.org/docs/current/manage-ag-templatedbs.html#MANAGE-AG-TEMPLATEDBS | intention ]] to support copying databases.
Before this diff, it was not supported to use the TEMPLATE keyword with any database other than `template0` and `template1`. Now we removed this limitation.
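
Because the point in time is expressed in microseconds since the Unix epoch, a client would typically compute it before issuing the statement. The sketch below shows one way to do that; `to_unix_micros` and `clone_statement` are hypothetical helpers, and the database names are interpolated without quoting, which is only acceptable for trusted identifiers.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_unix_micros(moment: datetime) -> int:
    """Convert a timezone-aware datetime to microseconds since the epoch."""
    if moment.tzinfo is None:
        raise ValueError("use a timezone-aware datetime")
    # Integer division of timedeltas avoids floating-point rounding.
    return (moment - EPOCH) // timedelta(microseconds=1)


def clone_statement(new_db: str, source_db: str, moment: datetime) -> str:
    """Build the clone statement in the syntax described in this summary."""
    return (
        f"CREATE DATABASE {new_db} TEMPLATE {source_db} "
        f"AS OF {to_unix_micros(moment)};"
    )


if __name__ == "__main__":
    five_minutes_ago = datetime.now(timezone.utc) - timedelta(minutes=5)
    print(clone_statement("db2", "db1", five_minutes_ago))
```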

Yugabyte can also perform the cloning without the strict condition inherited from postgres: `no other sessions can be connected to the source database while it is being copied`. This diff removes that limitation, so the database can be copied even if there are other connections to the source database.

**Upgrade/Rollback safety:**
The clone feature is guarded by the preview flag `enable_db_clone`.

Jira: DB-8620

Test Plan:
ybd --cxx-test integration-tests_minicluster-snapshot-test --gtest_filter PgCloneTest.CloneYsqlSyntax

Removed `CREATE DATABASE test TEMPLATE = some_template;` from `CREATE DATABASE` unsupported options in regress tests as now we support copying databases from templates other than `template0` or `template1`. The same statement cannot be added to the regress test as supported because copying a database requires a snapshot schedule on the template database. However, the unit test `PgCloneTest.CloneYsqlSyntax` covers this case.

Reviewers: asrivastava, yguan

Reviewed By: asrivastava, yguan

Subscribers: pjain, yql, ybase

Differential Revision: https://phorge.dev.yugabyte.com/D33408
…ection Manager

Summary:
This diff adds the support for login limit connections per user in ysql connection manager.

The only known limitation of this implementation is that the client has to wait a few seconds (10 seconds by default) for the connection manager stats to update; these stats are used to calculate the current number of connections made by a particular user. Track here: [[ yugabyte#21645 | GH yugabyte#21645 ]]

One of the existing java unit tests (`org.yb.pgsql.TestPgAuthorization#testAttributes`) which tests connection limit per user is not fixed as part of this diff, as it still fails while dropping the user which is not supported in ysql connection manager at the time of creating this diff. (Tracked by [[ yugabyte#21862 | GH yugabyte#21862 ]]  ). Added a TODO comment to fix conn per limit user part of the test.

Jira: DB-7747

Test Plan:
 Added java unit test:
```./yb_build.sh --enable-ysql-conn-mgr-test --java-test org.yb.ysqlconnmgr.TestUserLoginLimit```

Reviewers: rbarigidad

Reviewed By: rbarigidad

Subscribers: rbarigidad, nkumar, yql

Differential Revision: https://phorge.dev.yugabyte.com/D33469
Summary: The ssh port is moved out of region metadata. This diff implements the fix to show the correct port number.

Test Plan:
Tested manually.

{F175066}

Reviewers: lsangappa

Reviewed By: lsangappa

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D34691
* yba prepare section

* misc edits

* more edits

* overview page

* add more requirements

* more requirements

* more updates

* misc updates

* more updates

* misc edits

* minor fixes

* Update with latest changes

* Misc edits

* prepare landing page

* Update docs/content/preview/yugabyte-platform/prepare/_index.md

* menu order

* ear changes

* links and redirects

* review comments

* redirects and links

* links

* header links

* tidyups

* minor edit

* review comments

* review comments

* minor edit

* review comments

* format

* review comments

* review comments

* typo

* review comments

* review, format

* more fixes

* image

* comments

* format, edit

* minor edits

* intro format

* fix for Replication migration

* review comments

* minor edit

* fix links and delete unused files

* minor edits

* minor edit
Summary:
[PLAT-13089] SystemdUpgrade is now retriable
Adjusted the state check in verify params to allow SystemdUpgrade.
to go through on a retry.

Test Plan:
{F175509}

{F175508}phabricator
localtest

Reviewers: #yba-api-review!, nsingh

Reviewed By: nsingh

Subscribers: nsingh, yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D34710
…ender

Summary:
We have debug logs in place in walsender to help identify issues in case of errors. Some of these logs are printed for every row being streamed. As a result, they can have a substantial overhead.

This revision updates them with the following strategy:
1. Batch level: DEBUG1. These logs are printed once per batch of changes received from the VWAL. The current batch size is 500, so these are summary logs.
2. Row level: DEBUG2. These logs are printed for every row being streamed.

Also did misc. improvements in the logs.
Jira: DB-11152

Test Plan:
Jenkins: test regex: .*ReplicationSlot.*

Existing tests

Reviewers: asrinivasan

Reviewed By: asrinivasan

Subscribers: ybase, ycdcxcluster, yql

Differential Revision: https://phorge.dev.yugabyte.com/D34682
rajmaddy89 and others added 27 commits May 17, 2024 12:36
…s in decommissioned state.

Summary: Node Perform check looks to be disabled when the node is in decommissioned state, but the button is actually clickable. When it's disabled, it should not be clickable.

Test Plan: Tested locally

Reviewers: cwang

Reviewed By: cwang

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D35107
Summary:
Please refer to the screenshots for all the issues fixed
1. Main release list page
{F178030}

2. Release Details Page side panel
{F178031}

3. Add Release Modal
{F178033}

4. Edit release modal
{F178034}

Test Plan:
Please refer to the screenshots

{F178045}

{F178046}

{F178047}

Got APPROVAL from Jin on a call.

Reviewers: jmak

Reviewed By: jmak

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D35051
Summary:
The YSQL webserver has occasionally produced coredumps of the following form upon receiving a termination signal from postmaster.
```
                #0  0x00007fbac35a9ae3 _ZNKSt3__112__hash_tableINS_17__hash_value_typeINS_12basic_string <snip>
                #1  0x00007fbac005485d _ZNKSt3__113unordered_mapINS_12basic_string <snip> (libyb_util.so)
                #2  0x00007fbac0053180 _ZN2yb16PrometheusWriter16WriteSingleEntryERKNSt3__113unordered_mapINS1_12basic_string <snip>
                #3  0x00007fbab21ff1eb _ZN2yb6pggateL26PgPrometheusMetricsHandlerERKNS_19WebCallbackRegistry10WebRequestEPNS1_11WebResponseE (libyb_pggate_webserver.so)
                ....
                ....
```

The coredump indicates corruption of a namespace-scoped variable of type unordered_map while attempting to serve a request after a termination signal has been received.
The current code causes the webserver (postgres background worker) to call postgres' `proc_exit()` which consequently calls `exit()`.

According to the [[ https://en.cppreference.com/w/cpp/utility/program/exit | C++ standard ]], a limited amount of cleanup is performed on exit():
 - Notably destructors of variables with automatic storage duration are not invoked. This implies that the webserver's destructor is not called, and therefore the server is not stopped.
 - Namespace-scoped variables have [[ https://en.cppreference.com/w/cpp/language/storage_duration | static storage duration ]].
 - Objects with static storage duration are destroyed.
 - This leads to a possibility of undefined behavior where the webserver may continue running for a short duration of time, while the static variables used to serve requests may have been GC'ed.

This revision explicitly stops the webserver upon receiving a termination signal, by calling its destructor.
It also adds logic to the handlers to return a `503 SERVICE_UNAVAILABLE` once termination has been initiated.
Jira: DB-7796
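
The handler-side guard can be modelled with a small sketch. It does not reproduce the C++ webserver or its destructor; it only shows the idea of refusing requests once termination has started. `MetricsServer` and its methods are hypothetical names.

```python
import threading
from typing import Callable, Tuple

Response = Tuple[int, str]


class MetricsServer:
    """Toy model of a handler that stops serving once termination begins."""

    def __init__(self) -> None:
        self._terminating = threading.Event()

    def begin_termination(self) -> None:
        # Called from the termination path before any shared state is torn down.
        self._terminating.set()

    def handle(self, render: Callable[[], str]) -> Response:
        if self._terminating.is_set():
            return 503, "SERVICE_UNAVAILABLE"
        return 200, render()


if __name__ == "__main__":
    server = MetricsServer()
    print(server.handle(lambda: "metrics"))
    server.begin_termination()
    print(server.handle(lambda: "metrics"))
```

Checking the flag first means the renderer, which would touch the shared state, is never invoked after termination has been requested.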

Test Plan:
To test this manually, use a HTTP load generation tool like locust to bombard the YSQL Webserver with requests to an endpoint like `<address>:13000/prometheus-metrics`.
On a standard devserver, I configured locust to use 30 simultaneous users (30 requests per second) to reproduce the issue.

The following bash script can be used to detect the coredumps:
```
#!/bin/bash
ITERATIONS=50
YBDB_PATH=/path/to/code/yugabyte-db

# Count the number of dump files to avoid having to use `sudo coredumpctl`
idumps=$(ls /var/lib/systemd/coredump/ | wc -l)
for ((i = 0 ; i < $ITERATIONS ; i++ ))
do
        echo "Iteration: $(($i + 1))";
        $YBDB_PATH/bin/yb-ctl restart > /dev/null

        nservers=$(netstat -nlpt 2> /dev/null | grep 13000 | wc -l)
        if (( nservers != 1)); then
                echo "Web server has not come up. Exiting"
                exit 1;
        fi

        sleep 5s

        # Kill the webserver
        pkill -TERM -f 'YSQL webserver'

        # Count the number of coredumps
        # Please validate that the coredump produced is that of postgres/webserver
        ndumps=$(ls /var/lib/systemd/coredump/ | wc -l)
        if (( ndumps > idumps  )); then
                echo "Core dumps: $(($ndumps - $idumps))"
        else
                echo "No new core dumps found"
        fi
done
```

Run the script with the load generation tool running against the webserver in the background.
 - Without the fix in this revision, the above script produced 8 postgres/webserver core dumps in 50 iterations.
 - With the fix, no coredumps were observed.

Reviewers: telgersma, fizaa

Reviewed By: telgersma

Subscribers: ybase, smishra, yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D35116
* added release notes for 2.21

* added link
… and xcluster

Summary: Add UTs for commit: D32379 / 7cc40d7

Test Plan: UTs passed

Reviewers: sanketh, jmak

Reviewed By: jmak

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D32494
* renew hashicorp vault tokens

* format

* backport

* format
Summary:
According to the postgres protocol (https://www.postgresql.org/docs/current/protocol-flow.html),
the server will always respond with a ReadyForQuery message when it finishes processing a request
to let the client know that it's ready to process a new query.

We set the metadata after the first input message after a ReadyForQuery message was sent.
We unset this metadata when we are just about to send a ReadyForQuery message to the client.
This way we don't have to worry about all the individual cases, and whether this is a simple query
or an extended query.

This means that for every incoming request from the client, we set the root request id once.
If there is a multi-statement query request, all the queries will have the same root request id
because all of them came in a single request from the client. Note that the query id will not
be the same in this case; it will keep changing as the server keeps processing each individual
query.

Only in case the client sends a termination request, or the connection is terminated
unexpectedly, we set the metadata but never unset it. But it should be fine, since the
backend process is killed just after that, that means we cannot fetch samples for this
backend.
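
The set/unset rule described above can be modelled as a tiny state machine. This is a sketch under assumptions: `BackendSession` is a hypothetical class, and generating the id with `uuid.uuid4()` is an illustration rather than the way the backend actually produces root request ids.

```python
import uuid
from typing import Optional


class BackendSession:
    """Tracks the root request id between ReadyForQuery messages."""

    def __init__(self) -> None:
        self.root_request_id: Optional[str] = None

    def on_client_message(self) -> str:
        """Set the id on the first message after a ReadyForQuery."""
        if self.root_request_id is None:
            self.root_request_id = str(uuid.uuid4())
        return self.root_request_id

    def send_ready_for_query(self) -> None:
        """Unset the id just before ReadyForQuery goes to the client."""
        self.root_request_id = None


if __name__ == "__main__":
    session = BackendSession()
    first = session.on_client_message()
    # A second statement in the same request keeps the same id.
    print(first == session.on_client_message())
    session.send_ready_for_query()
    # The next request gets a new id.
    print(first == session.on_client_message())
```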

```
bin/ysqlsh -c "create table if not exists a(id int); insert into a select i from generate_series(1, 1000) as i;"

yugabyte=# SELECT
yugabyte-#     SUBSTRING(query, 1, 50) AS query,
yugabyte-#     root_request_id,
yugabyte-#     wait_event_component,
yugabyte-#     wait_event,
yugabyte-#     wait_event_type,
yugabyte-#     COUNT(*)
yugabyte-# FROM
yugabyte-#     yb_active_session_history
yugabyte-# JOIN
yugabyte-#     pg_stat_statements
yugabyte-# ON
yugabyte-#     query_id = queryid
yugabyte-# WHERE
yugabyte-#     sample_time >= current_timestamp - interval '20 minutes'
yugabyte-# GROUP BY
yugabyte-#     query,
yugabyte-#     root_request_id,
yugabyte-#     wait_event_component,
yugabyte-#     wait_event,
yugabyte-#     wait_event_type
yugabyte-# ORDER BY
yugabyte-#     root_request_id,
yugabyte-#     query,
yugabyte-#     wait_event_component,
yugabyte-#     wait_event_type;
                       query                        |           root_request_id            | wait_event_component |        wait_event        | wait_event_type | count
----------------------------------------------------+--------------------------------------+----------------------+--------------------------+-----------------+-------
 create table if not exists a(id int)               | 0e5eed5a-a324-0ffb-d53d-58b6d01c48a7 | TServer              | OnCpu_Active             | Cpu             |     2
 create table if not exists a(id int)               | 0e5eed5a-a324-0ffb-d53d-58b6d01c48a7 | TServer              | YBClient_WaitingOnDocDB  | Network         |    13
 create table if not exists a(id int)               | 0e5eed5a-a324-0ffb-d53d-58b6d01c48a7 | YSQL                 | QueryProcessing          | Cpu             |    11
 create table if not exists a(id int)               | 0e5eed5a-a324-0ffb-d53d-58b6d01c48a7 | YSQL                 | CatalogRead              | Network         |    13
 create table if not exists a(id int)               | 0e5eed5a-a324-0ffb-d53d-58b6d01c48a7 | YSQL                 | StorageFlush             | Network         |     1
 insert into a select i from generate_series($1, $2 | 0e5eed5a-a324-0ffb-d53d-58b6d01c48a7 | TServer              | Raft_ApplyingEdits       | Cpu             |     2
 insert into a select i from generate_series($1, $2 | 0e5eed5a-a324-0ffb-d53d-58b6d01c48a7 | TServer              | OnCpu_Passive            | Cpu             |     2
 insert into a select i from generate_series($1, $2 | 0e5eed5a-a324-0ffb-d53d-58b6d01c48a7 | TServer              | YBClient_LookingUpTablet | Network         |     1
 insert into a select i from generate_series($1, $2 | 0e5eed5a-a324-0ffb-d53d-58b6d01c48a7 | TServer              | YBClient_WaitingOnDocDB  | Network         |     2
 insert into a select i from generate_series($1, $2 | 0e5eed5a-a324-0ffb-d53d-58b6d01c48a7 | YSQL                 | QueryProcessing          | Cpu             |     2
 insert into a select i from generate_series($1, $2 | 0e5eed5a-a324-0ffb-d53d-58b6d01c48a7 | YSQL                 | StorageFlush             | Network         |     3
```
Jira: DB-11338

Test Plan: Jenkins

Reviewers: jason

Reviewed By: jason

Subscribers: hbhanawat, amitanand, yql

Differential Revision: https://phorge.dev.yugabyte.com/D35126
…L Connection Manager enabled

Summary:
When attempting to use startup parameters with connection manager, any parameter values that have spaces in them, for example `transaction_isolation = read committed`, would get parsed incorrectly. Odyssey's code uses any space found in the "value" as the cue to end parsing the value for a certain startup parameter; this patch resolves this by instead searching for the next option as per the syntax of passing multiple pg options.

This patch also modifies TestSessionParameters.java to allow testing of startup parameters with spaces.

Jira: DB-11168

Test Plan:
Jenkins: test regex: .*TestSessionParameters.*

./yb_build.sh --java-test org.yb.ysqlconnmgr.TestSessionParameters

Reviewers: mkumar, nkumar

Reviewed By: mkumar

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D34814
…es (yugabyte#22367)

* [DOC-310] Explained the meaning of upgrade delay between nodes

* Update docs/content/preview/yugabyte-platform/manage-deployments/upgrade-software-install.md

---------

Co-authored-by: Aishwarya Chakravarthy <ashchakravarthy@gmail.com>
… and pre-aggregate data inside the DB.

Summary:
Right now we load all the raw data into memory to build TS storage based graphs and aggregate it in TS code.
This change makes PG storage perform the aggregation, so we only load the results into memory.
We still perform topK filtering and the average-across-nodes line calculation in TS.

Test Plan: Unit tested

Reviewers: rmadhavan

Reviewed By: rmadhavan

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D34873
* Update docs

* Update yb-tserver.md

* Update yb-master.md
…tion info

Summary:
Adding `GetXClusterOutboundReplicationGroups` and `GetXClusterOutboundReplicationGroupInfo` RPCs and corresponding `list_xcluster_outbound_replication_groups` and `get_xcluster_outbound_replication_group_info` yb-admin commands.

**Upgrade/Rollback safety:**
Only adding new APIs. All changes are guarded under `enable_xcluster_api_v2`

Fixes yugabyte#22278
Jira: DB-11194

Test Plan:
XClusterDBScopedTest.TestYbAdmin
XClusterOutboundReplicationGroupTest.TestListAPIs

Reviewers: jhe, xCluster, mlillibridge

Reviewed By: mlillibridge

Subscribers: cwang, mlillibridge, ybase

Differential Revision: https://phorge.dev.yugabyte.com/D35011
Summary:
NamespaceReplication is not in use and the tests are flaky. Disabling them for now.

List of tests:
SetupNamespaceReplicationWithBootstrap
SetupNamespaceReplicationWithBootstrapFailToSendSnapshot
SetupNamespaceReplicationWithBootstrapFailCreateProducerSnapshot
SetupNamespaceReplicationWithBootstrapFailRestoreSnapshot
SetupNamespaceReplicationWithBootstrapRequestFailures

Fixes yugabyte#22457
Jira: DB-11376

Test Plan: Jenkins

Reviewers: jhe, slingam, xCluster, yyan

Reviewed By: yyan

Subscribers: ybase

Differential Revision: https://phorge.dev.yugabyte.com/D35205
…nfiguration section for K8s universes. (yugabyte#22285)

* first cut

* added k8s operator page

* added image

* added upgrade instructions

* updated instance configuration for 2024.1

* added screenshot

* minor edits

---------

Co-authored-by: Dwight Hodge <ghodge@yugabyte.com>
Summary:
Allow users to create DR with db scoped replication on YBA.

Users can enable this feature by enabling the `yb.xcluster.db_scoped.enabled` runtime flag. If the `dbScoped` parameter is passed into the `Create` endpoint and the runtime flag is enabled, a db scoped DR config will be created.

Currently only supports the most basic creation. That is, source and target universe do not have TLS enabled, and bootstrapping is not needed.

Example of create API:
```
curl --location '<platform-url>/api/v1/customers/f33e3c9b-75ab-4c30-80ad-cba85646ea39/dr_configs' \
--header 'X-AUTH-YW-API-TOKEN: <api-token>' \
--header 'Content-Type: application/json' \
--data '{
    "name": "dr-config-test",
    "sourceUniverseUUID": "53297075-ec69-4235-aeae-eceb7f3e7208",
    "targetUniverseUUID": "bdb574d2-ae9e-4834-ad93-2f0b5e63229a",
    "dbs": [
        "000033c3000030008000000000000000"
    ],
    "dryRun": false,
    "bootstrapParams": {
        "backupRequestParams": {
            "storageConfigUUID": "<storage_config_uuid>"
        }
    },
    "dbScoped": true
}'
```
A new table called xcluster_namespace_config is added to keep track of the dbs/namespaces that are stored per xcluster config, instead of the old `xcluster_table_config` table. Schema below:
```
                   Table "public.xcluster_namespace_config"
       Column        |         Type          | Collation | Nullable | Default
---------------------+-----------------------+-----------+----------+---------
 config_uuid         | uuid                  |           | not null |
 source_namespace_id | character varying(64) |           | not null |
Indexes:
    "pk_xcluster_namespace_config" PRIMARY KEY, btree (config_uuid, source_namespace_id)
Foreign-key constraints:
    "fk_xcluster_namespace_config_config_uuid" FOREIGN KEY (config_uuid) REFERENCES xcluster_config(uuid) ON UPDATE CASCADE ON DELETE CASCADE
```

Test Plan:
Added UTs

Create a source and target universe with db version >= `2.23.0.0-b247`.

Add some tables to the `yugabyte` db on both source and target universe without any data in them.
```
CREATE TABLE house(
   ID INT PRIMARY KEY     NOT NULL,
   LOCATION           TEXT    NOT NULL
);

CREATE TABLE ROOM(
   ID INT PRIMARY KEY     NOT NULL,
   size           TEXT    NOT NULL
);

create index on house(id);
```

Make a call to the create DR config endpoint with the `dbScoped` boolean set to true.
Make sure that the task succeeds.

On the source universe master UI xCluster page, we should see (`<master_url>:7000/xcluster`):
{F179208}

On the target universe's master UI xCluster page, we should see (`<master_url>:7000/xcluster`):
{F179209}

Reviewers: #yba-api-review, sanketh, hzare, jmak

Reviewed By: #yba-api-review, hzare

Subscribers: dkumar, sanketh, yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D34802
Summary:
This diff adds a tserver flag `ysql_yb_default_replica_identity` which decides the default replica identity for a table at the time of creation. This flag has a default value of "CHANGE". The flag is case sensitive and can take one of these four values: "FULL", "DEFAULT", "NOTHING", and "CHANGE". If the flag is assigned any value other than these four, it is treated as invalid input and the table creation fails.
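
The validation rule described above can be sketched in Python as follows. This is only an illustration of the case-sensitive check: the function name and the use of an exception to stand in for a failed table creation are assumptions, and the real logic lives in the tserver code rather than in Python.

```python
# The four accepted values, as listed in the summary.
VALID_REPLICA_IDENTITIES = ("FULL", "DEFAULT", "NOTHING", "CHANGE")


def resolve_default_replica_identity(flag_value="CHANGE"):
    """Return the flag value if it is valid; raise otherwise.

    Raising here is a stand-in for table creation failing (hypothetical helper).
    """
    # The comparison is case sensitive, so "full" is rejected.
    if flag_value not in VALID_REPLICA_IDENTITIES:
        raise ValueError(f"invalid replica identity: {flag_value!r}")
    return flag_value
```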
Jira: DB-11234

Test Plan:
./yb_build.sh --java-test 'org.yb.pgsql.TestPgAlterTable#testReplicaIdentity'
./yb_build.sh --cxx-test pg_mini-test --gtest_filter PgMiniTest.AlterTableWithReplicaIdentity

Reviewers: stiwary, asrinivasan, skumar

Reviewed By: stiwary

Subscribers: yql, ybase, ycdcxcluster

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D34942
Summary:
In a customer environment, we noticed that when an incremental backup time expired, YBA generated a new time but returned the old one, because at the moment of that check the old time had not yet expired. We suspect this is due to clock skew, so we made the following 2 improvements:
 - Use the same date variable while comparing and getting new incremental task times.
 - Always add the incremental frequency time if old values exist.
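
One possible reading of these two improvements, sketched in Python: the caller captures the current time once and passes it in, so the expiry check and the new value use the same instant, and an expired time is advanced from the old value by the frequency. The function and parameter names are hypothetical, and the actual YBA logic may differ.

```python
from datetime import datetime, timedelta


def next_incremental_time(previous, frequency, now):
    """Return the next incremental backup time (illustrative only).

    `now` is captured once by the caller and reused for both the expiry
    check and the new value, so the two cannot disagree.
    """
    if previous is None:
        # No old value: schedule one frequency interval from now.
        return now + frequency
    if previous > now:
        # Not expired yet: keep the existing time.
        return previous
    # Expired: advance from the old value by the frequency.
    return previous + frequency


now = datetime(2024, 5, 1, 12, 0)
freq = timedelta(hours=1)
expired = next_incremental_time(datetime(2024, 5, 1, 11, 30), freq, now)
pending = next_incremental_time(datetime(2024, 5, 1, 12, 15), freq, now)
first = next_incremental_time(None, freq, now)
```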

Test Plan: Tested manually to see that the incremental backup schedule works as expected.

Reviewers: vkumar, vpatibandla

Reviewed By: vkumar

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D35193
Summary: Modified several upgrade tests to fit into new test schema.

Test Plan: sbt test

Reviewers: nsingh, cwang

Reviewed By: cwang

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D34827
… records are added to response

Summary:
At the end of PopulateCDCSDKWriteRecord(), if we have not added any DML records (scenario described below) to the GetChanges response, we want to remove the BEGIN record that was added at the start of this method. To remove this BEGIN record, we were working under the assumption that the last added record in the response would be the BEGIN record and hence simply removing the last record without verifying its op. But, during the 1st GetChanges call, right after the BEGIN record, we add a DDL record. Therefore, the assumption becomes incorrect. Hence, the logic has been fixed now to find out the last added BEGIN record by iterating over the response and then removing it.

One scenario where we do not add a DML record to the response: When a user has updated a table row that does not exist.
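
The corrected removal logic can be sketched as a reverse scan for the last BEGIN record. The Python below models records as plain dicts with an `op` key, which is an assumption made for illustration; the actual change operates on the GetChanges response in the C++ code.

```python
def remove_last_begin(records):
    """Remove the most recently added BEGIN record, wherever it sits in the list.

    Returns True if a BEGIN record was found and removed.
    """
    # Scan from the end so the last BEGIN is found even when other records
    # (for example a DDL record) were appended after it.
    for i in range(len(records) - 1, -1, -1):
        if records[i]["op"] == "BEGIN":
            del records[i]
            return True
    return False


# First GetChanges call: a DDL record follows the BEGIN record.
response = [{"op": "BEGIN"}, {"op": "DDL"}]
removed = remove_last_begin(response)
```

Simply dropping the final element here would have removed the DDL record and left the BEGIN record behind, which is the bug the summary describes.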
Jira: DB-10540

Test Plan:
Jenkins

./yb_build.sh --cxx-test integration-tests_cdcsdk_ysql-test --gtest_filter CDCSDKYsqlTest.TestUpdateOnNonExistingEntry
./yb_build.sh --cxx-test cdcsdk_consumption_consistent_changes-test --gtest_filter CDCSDKConsumptionConsistentChangesTest.TestVWALConsumptionWhileUpdatingNonExistingRow

Reviewers: asrinivasan, stiwary, skumar, vkushwaha

Reviewed By: asrinivasan

Subscribers: ycdcxcluster

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D35188
…tel config

Summary: Use exporter UUID instead of export config name in the otel config.

Test Plan: Manually tested and checked the node's otel collector config file.

Reviewers: amalyshev

Reviewed By: amalyshev

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D35196
Summary:
Currently we show the local calls section in /rpcz by default. But this only makes sense for the tserver endpoint, not the cql endpoint.

With this change, we fetch the local calls only for RpcServers that have ShouldExportLocalCalls set to true.
This is set to true only for TServer.
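
The conditional export can be pictured with the small Python sketch below. The function name, the `inbound_connections` key, and the dict-based output are assumptions for illustration; only the `local_calls` key and the ShouldExportLocalCalls condition come from the description above.

```python
def build_rpcz(inbound_connections, local_calls, should_export_local_calls):
    """Assemble an /rpcz-style payload (hypothetical shape)."""
    out = {"inbound_connections": inbound_connections}
    # Only servers that opt in (the TServer, per the summary) export local calls.
    if should_export_local_calls:
        out["local_calls"] = local_calls
    return out


local = {"remote_ip": "local calls", "state": "OPEN"}
tserver_view = build_rpcz([], local, should_export_local_calls=True)
cql_view = build_rpcz([], local, should_export_local_calls=False)
```

When the key is omitted, a query such as `jq '.local_calls'` prints `null`, which matches the output shown in the test plan for the non-tserver ports.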
Jira: DB-11367

Test Plan:
Test on a local cluster
```
dev-server-amitanand2:~/code/yugabyte-db [:1f37b8438c|✚ 5]
04:19 $ curl 127.0.0.1:9000/rpcz 2>/dev/null | jq '.local_calls'
{
  "remote_ip": "local calls",
  "state": "OPEN"
}
dev-server-amitanand2:~/code/yugabyte-db [:1f37b8438c|✚ 5]
04:19 $ curl 127.0.0.1:12000/rpcz 2>/dev/null | jq '.local_calls'
null
dev-server-amitanand2:~/code/yugabyte-db [:1f37b8438c|✚ 5]
04:19 $ curl 127.0.0.1:7000/rpcz 2>/dev/null | jq '.local_calls'
null
dev-server-amitanand2:~/code/yugabyte-db [:1f37b8438c|✚ 5]
04:19 $ curl 127.0.0.1:13000/rpcz 2>/dev/null | jq '.local_calls'
null
```

Reviewers: asaha

Reviewed By: asaha

Subscribers: esheng, ybase

Differential Revision: https://phorge.dev.yugabyte.com/D35220
Co-authored-by: Sanketh I <sanketh@yugabyte.com>