[H2-1975] Add a TCP_CONNECTIONS table to the INFORMATION_SCHEMA #1990

Open
wants to merge 1 commit into
base: master

mattrpav

Provides client socket connection information

@mattrpav
Author

Preview of data:
[screenshot: H2-1975-TCP_CONNECTIONS]

@mattrpav
Author

This table can be joined with the SESSIONS table via the TCP_CONNECTIONS.session_id field:

select * from sessions s, tcp_connections t where t.session_id = s.id

@katzyn
Contributor

katzyn commented Jun 13, 2019

I wrote another implementation that handles HTTP, TCP, and PG/ODBC connections. Your implementation is TCP-only and it exposes a lot of low-level information. Do you really need it?

@mattrpav
Author

mattrpav commented Jun 13, 2019

@katzyn Yes, in my experience, having the details is helpful when troubleshooting problematic connections.

I agree that a single view for HTTP, TCP, and PG/ODBC connections would be better, with a type column to indicate the differences.

If the other connection types do not include socket details, perhaps keep the TCP table for just the TCP connections?

@katzyn
Contributor

katzyn commented Jun 13, 2019

All of them use TCP sockets. But the fields that you introduce here aren't necessary from my point of view. The server IP address is not currently configurable; H2 accepts connections from all addresses. TCP options aren't configurable either.

People who know the details of the TCP protocol usually also know that a network analyzer can be used in case of some unusual problem with their networks. For a DBA this information is not relevant.

In the current master branch you can see the server protocol and port and the client address and port for each remote connection in the SESSIONS table. What else do you really need that you can't get from other sources (network utilities)?

@mattrpav
Author

The benefit is that monitoring tools can collect that information and display it to admins across a list of hosts without having to tie into network tools, which are OS-specific and not always available (for example, in containers).

When managing a large number of databases and connections, it is helpful to have H2 report the connection as the H2 server sees it, regardless of OS type or anything else that may vary from environment to environment (developer laptop, containers, VMs, bare metal, etc.).

This seems minor to include. Are there concerns about including the socket info?

Related: I was about to update my patch to include the protocol version that the client is connecting with. I think that is really valuable, but it would make more sense in the SESSIONS table changes you made. Thoughts?

@katzyn
Contributor

katzyn commented Jun 13, 2019

A lot of issues with H2 have been reported on the mailing list, on GitHub, and on StackOverflow, but I never saw an issue with the TCP layer. How exactly can this information help you in the real world? All connections will have similar default flags.

> I was about to update my patch to include the protocol version that client is connecting with.

Are you talking about TcpServerThread.clientVersion or about something else? If the client is not compatible, the connection will not be accepted. If the client is compatible, who cares about the actual protocol version? Different versions of H2 can use the same protocol version, and changes between versions of H2 are usually much more significant than changes in the protocol. The server doesn't know the version of the JDBC client, BTW.

@mattrpav
Author

mattrpav commented Jun 13, 2019

@katzyn I agree none of this should be needed, but users do strange things and systems get out of whack unintentionally. Troubleshooting the distributed aspects is always really complicated and requires a different set of skills to find the root cause. I believe it brings a lot of value to H2 if the distributed aspects are easier to troubleshoot via a DBA playbook vs. having to be an expert in TCP. My $0.02.

My point about TcpServerThread.clientVersion is that it can be used to help pinpoint an older client jar file that is in use... a version can be supported and the client jar buggy at the same time.

@katzyn
Contributor

katzyn commented Jun 13, 2019

You should not have any issues with TCP in H2. You may have some issues with IP and the underlying layers or with a firewall, but H2 has nothing to do with them. You can already see hosts and ports. That is enough to check the sources of connections, firewall rules, etc.

You can't get the version of the client jar on the server side of H2. Unfortunately, this information is not provided by the client part of H2. Also, the whole client application can be incompatible (while using the same version of the H2 library). Distributed applications need their own compatibility checks.

It looks like you're simply trying to gather all possible information from all sources. But you need to provide a real-world use case and describe how it can help you.

@mattrpav
Author

mattrpav commented Jun 13, 2019

This isn't H2-specific, and no coding on the H2 side can ensure bulletproof TCP handling in all environments. Virtually every mature distributed client-server app that uses stateful connections makes these settings available for tuning: databases, messaging systems, ssh, etc.

Use case: keeping connections from being disconnected due to inactivity (ability to enable TCP keepAlive and verify the value on the H2 server side). This frequently occurs when the client connects to the server through a firewall. A long-running query plus no keep-alive results in the firewall closing the connection.

Use case: setting a socket timeout for inactivity. A client connected with auto-reconnect should be able to self-heal and connect to a different node in the cluster. SO_TIMEOUT should be configurable (with the ability to verify the value on the H2 server side) so that the client side can unblock itself if the socket does not respond. Note: SO_TIMEOUT should be greater than MAX_QUERY_TIMEOUT.

Use case: increasing or decreasing send/recv buffers (and having the ability to verify the values on the H2 server side) to fine-tune for really small or really large queries to improve throughput.

EDITS:

  1. Server-side IP address is especially useful for troubleshooting from within containers where the internal IP is not the same as the external IP used to connect.
  2. I would contend that the client version is very useful information.
  3. I'd also recommend keepAlive, soTimeout, etc. for cluster connections to keep those healthy and able to re-cycle.
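For illustration, the socket options named in the use cases above can be set and read back with plain `java.net` calls. The sketch below is not H2 code and does not reflect the PR's implementation: the class name, the `describe` and `demo` helpers, and the column-style keys are all hypothetical. It opens a loopback connection, tunes the accepted (server-side) socket, and collects its options into one row-like map.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: none of these names come from H2 or from the PR.
class SocketOptionsSketch {

    /** Reads the options of one connected socket into an ordered map (hypothetical keys). */
    static Map<String, Object> describe(Socket s) {
        try {
            Map<String, Object> row = new LinkedHashMap<>();
            row.put("LOCAL_ADDRESS", s.getLocalAddress().getHostAddress());
            row.put("LOCAL_PORT", s.getLocalPort());
            row.put("REMOTE_ADDRESS", s.getInetAddress().getHostAddress());
            row.put("REMOTE_PORT", s.getPort());
            row.put("KEEP_ALIVE", s.getKeepAlive());
            row.put("SO_TIMEOUT", s.getSoTimeout());
            row.put("TCP_NO_DELAY", s.getTcpNoDelay());
            row.put("SEND_BUFFER_SIZE", s.getSendBufferSize());
            row.put("RECEIVE_BUFFER_SIZE", s.getReceiveBufferSize());
            return row;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Opens a loopback connection, tunes the accepted socket, and describes it. */
    static Map<String, Object> demo() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket accepted = server.accept()) {
            accepted.setKeepAlive(true);    // ask the OS to send keep-alive probes
            accepted.setSoTimeout(30_000);  // blocking reads give up after 30 s
            return describe(accepted);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        demo().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Each endpoint has its own options: enabling keep-alive on the client socket does not change what the server reads from its accepted socket, so a server-side table can only report the server's half of the connection. Buffer sizes passed to `setSendBufferSize` or `setReceiveBufferSize` are hints to the OS, which is why the sketch only reads them back.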

@katzyn
Contributor

katzyn commented Jun 14, 2019

H2 uses an infinite timeout; that's why setQueryTimeout() does not change it.
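As background for the timeout point: in Java, an SO_TIMEOUT of 0 means a blocking read waits indefinitely, and a non-zero value makes the read fail with `SocketTimeoutException` once the peer has been silent for that long. The following is a minimal sketch using only `java.net`; it is not H2 code, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical sketch: plain java.net behavior, unrelated to H2's own classes.
class ReadTimeoutSketch {

    /** Default SO_TIMEOUT of a fresh socket; 0 means reads block indefinitely. */
    static int defaultTimeout() {
        try (Socket s = new Socket()) {
            return s.getSoTimeout();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns true if a read from a silent peer gives up after timeoutMillis. */
    static boolean readTimesOut(int timeoutMillis) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket silentPeer = server.accept()) {
            client.setSoTimeout(timeoutMillis);
            try {
                client.getInputStream().read(); // the peer never writes anything
                return false;
            } catch (SocketTimeoutException expected) {
                return true; // the socket is still usable after a timeout
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("default SO_TIMEOUT = " + defaultTimeout());
        System.out.println("read timed out = " + readTimesOut(200));
    }
}
```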

Your firewall is not configurable, or what? Please also note that databases should not usually be accessed over the Internet or another complex network.

Please do things in the right order. If you really need to touch some network options, add all the necessary configuration options for them, with test cases and documentation. Once they are configurable, add some way to check their values.

Maybe you want to see a table with defaults in each row right now, but I don't think that we should abuse other people with this garbage. Such a table will create a false impression that this information has a meaning and can be modified in some way. But currently it is meaningless and not configurable.

@katzyn
Contributor

katzyn commented Jun 14, 2019

Server IP and TCP protocol versions are now visible in the INFORMATION_SCHEMA.SESSIONS table.

If you need the real client version, please file a separate issue for it. The server doesn't have such information. Hypothetically we can pass this information from new clients to new servers, but it should be done very carefully. Compatibility with older servers and older clients must be preserved.

Of course, you can work on additional network settings. But please discuss it first in a new issue or on the mailing list. If you need it but don't have the time or the knowledge of H2 for it, feel free to create an issue with a detailed description; maybe it will be implemented by somebody else.

@mattrpav
Author

Hi @katzyn - I'm back from vacation. Thank you for adding the server IP and TCP protocol versions to the INFORMATION_SCHEMA.SESSIONS table.

I will create a ticket for the enhancement request about being able to specify connect and receive timeouts, and post a message to the mailing list for general discussion.

Regarding the changes to the TcpServer and TcpServerThread classes included in this PR: would you accept the modifications to those classes? That would allow us to create user-defined functions to obtain the information without H2-wide impacts.

Thanks!
