get_job_status fails if it connects to backup master #665

Open
kranskydog opened this issue Nov 1, 2023 · 10 comments

Comments

@kranskydog

Summary

The get_job_status API fails if the request happens to hit the backup master in a multi-server config. This does not happen with other API calls.

Steps to reproduce the problem

  1. Create a multi-server setup with primary and backup masters.
  2. Call the get_job_status API on the primary master: works.
  3. Call the get_job_status API on the backup master: fails with the error in the screenshot.

[screenshot of the error; image not preserved]

Your Setup

Virtualbox
Cronicle 0.9.38
2 master servers (primary, backup), accessed via round-robin DNS (Oracle SCAN IPs) to a virtual hostname
conf/config.json has "web_direct_connect": true,
2 other worker servers
I can connect to the web console via the virtual hostname and everything works as expected. I can use other APIs against both master nodes and they work correctly, e.g.:
[two screenshots; images not preserved]

Operating system and version?

[root@orcl01 ~]# cat /etc/oracle-release
Oracle Linux Server release 7.9
[root@orcl01 ~]# uname -a
Linux orcl01.example.com 5.4.17-2136.324.5.3.el7uek.x86_64 #2 SMP Tue Oct 10 12:44:19 PDT 2023 x86_64 x86_64 x86_64 GNU/Linux

Node.js version?

v16.20.2

Cronicle software version?

0.9.38

Are you using a multi-server setup, or just a single server?

Multi

Are you using the filesystem as back-end storage, or S3/Couchbase?

filesystem (cluster)

Can you reproduce the crash consistently?

yes

Log Excerpts

Can't see anything specific

@kranskydog changed the title from "get_job_status fails if it connects to backup primary" to "get_job_status fails if it connects to backup master" on Nov 1, 2023
@jhuckaby
Owner

jhuckaby commented Nov 1, 2023

Okay, so, here is the thing. The get_job_status API is actually working as designed. This API only works on the master node. If you hit a backup node, it returns an HTTP 302 redirect over to the master. This is explained in the docs here:

https://github.com/jhuckaby/Cronicle/blob/master/docs/APIReference.md#redirects

I cannot explain why you are seeing that weird "protocol violation" error, or where that is even coming from. Some kind of proxy server you have in the middle, which isn't expecting an HTTP 302? Dunno.

Anyway, here is the thing. The get_history API, which you cite as an example of something working correctly, is actually not 😝. That API is failing to check whether the current server is the master before running, which is a bug.

I will fix that.
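
A client that may land on a backup can follow the 302 to the master itself. Below is a minimal Python sketch of that idea, not Cronicle code: `call_api` and `fake_fetch` are hypothetical names, the job ID is a placeholder, and the fake transport stands in for real HTTP so the example runs without a cluster.

```python
from typing import Callable, Dict, Tuple

# (status code, response headers, body)
Response = Tuple[int, Dict[str, str], bytes]


def call_api(url: str, fetch: Callable[[str], Response], max_hops: int = 3) -> Response:
    """Call fetch(url); if the server answers 302, retry at the Location URL.

    Hypothetical helper for illustration. `fetch` is any function that
    performs one HTTP GET without following redirects on its own.
    """
    for _ in range(max_hops + 1):
        status, headers, body = fetch(url)
        if status != 302:
            return status, headers, body
        location = headers.get("Location")
        if not location:
            raise RuntimeError("302 response without a Location header")
        url = location
    raise RuntimeError("too many redirects")


def fake_fetch(url: str) -> Response:
    """Stand-in transport: orcl02 plays the backup, anything else the master."""
    if url.startswith("http://orcl02.example.com:3012/"):
        return 302, {"Location": url.replace("orcl02", "orcl01", 1)}, b""
    return 200, {}, b'{"code": 0}'


status, _, body = call_api(
    "http://orcl02.example.com:3012/api/app/get_job_status/v1?id=JOB_ID",
    fake_fetch,
)
print(status, body)  # 200 b'{"code": 0}'
```

Keeping the transport injectable means the redirect logic can be exercised without a network; a real `fetch` would have to be written against whichever HTTP library the client uses.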

@kranskydog
Author

Hmmmm
[apache@apchop01 ~]$ wget "http://orcl02.example.com:3012/api/app/get_event/v1/?api_key=a44e89551e0232b8e7aab002147c357e&id=elo3mvp8h02"
--2023-11-02 10:49:07--  http://orcl02.example.com:3012/api/app/get_event/v1/?api_key=a44e89551e0232b8e7aab002147c357e&id=elo3mvp8h02
Resolving orcl02.example.com (orcl02.example.com)... 192.168.56.55
Connecting to orcl02.example.com (orcl02.example.com)|192.168.56.55|:3012... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://::ffff:192.168.56.50:3012/api/app/get_event/v1/?api_key=a44e89551e0232b8e7aab002147c357e&id=elo3mvp8h02 [following]
http://::ffff:192.168.56.50:3012/api/app/get_event/v1/?api_key=a44e89551e0232b8e7aab002147c357e&id=elo3mvp8h02: Invalid host name.

IPV6?

@kranskydog
Author

[apache@apchop01 ~]$ curl -v -L "http://orcl02.example.com:3012/api/app/get_event/v1/?api_key=a44e89551e0232b8e7aab002147c357e&id=elo3mvp8h02"
* Uses proxy env variable no_proxy == 'example.com'
*   Trying 192.168.56.55:3012...
* Connected to orcl02.example.com (192.168.56.55) port 3012 (#0)
> GET /api/app/get_event/v1/?api_key=a44e89551e0232b8e7aab002147c357e&id=elo3mvp8h02 HTTP/1.1
> Host: orcl02.example.com:3012
> User-Agent: curl/7.76.1
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 302 Found
< Location: http://::ffff:192.168.56.50:3012/api/app/get_event/v1/?api_key=a44e89551e0232b8e7aab002147c357e&id=elo3mvp8h02
< Content-Type: application/json
< Access-Control-Allow-Origin: *
< Server: Cronicle 1.0
< Content-Length: 90
< Date: Thu, 02 Nov 2023 00:52:09 GMT
< Connection: keep-alive
< Keep-Alive: timeout=5
<
* Ignoring the response-body
* Connection #0 to host orcl02.example.com left intact
curl: (3) URL using bad/illegal format or missing URL
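
Both wget and curl reject the redirect target because the Location header carries an IPv6 literal without square brackets. URL syntax requires IPv6 literals in the host part to be bracketed, since their colons are otherwise ambiguous with the port separator. Below is a minimal Python sketch of building the host:port part safely. `url_authority` is a hypothetical helper for illustration and does not show how Cronicle builds its redirects.

```python
import ipaddress


def url_authority(host: str, port: int) -> str:
    """Return the 'host:port' part of a URL, bracketing IPv6 literals.

    Hypothetical helper for illustration; not part of Cronicle.
    """
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal, so treat it as a hostname and use it unchanged.
        return f"{host}:{port}"
    if addr.version == 6:
        # IPv6 literals must be wrapped in [] inside a URL.
        return f"[{host}]:{port}"
    return f"{host}:{port}"


print(url_authority("::ffff:192.168.56.50", 3012))  # [::ffff:192.168.56.50]:3012
print(url_authority("192.168.56.50", 3012))         # 192.168.56.50:3012
print(url_authority("orcl01.example.com", 3012))    # orcl01.example.com:3012
```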

@jhuckaby
Owner

jhuckaby commented Nov 2, 2023

Okay, that is really bizarre. Your backup server thinks that the master server's IP address is ::ffff:192.168.56.50. I've never seen that before.

What does your server data look like? Try:

/opt/cronicle/bin/storage-cli.js list_get global/servers

Are the IPs munged in there as well? I'm still trying to fathom how this could possibly have happened.

@kranskydog
Author

[root@orcl02 cronicle]# /opt/cronicle/bin/storage-cli.js list_get global/servers
Got 4 items.
Items from list: global/servers: [
  {
    "hostname": "orcl02.example.com",
    "ip": "192.168.56.55"
  },
  {
    "hostname": "orcl01.example.com",
    "ip": "192.168.56.50"
  },
  {
    "hostname": "orclxe.example.com",
    "ip": "192.168.56.25"
  },
  {
    "hostname": "apchop01.example.com",
    "ip": "192.168.56.30"
  }
]

@jhuckaby
Owner

jhuckaby commented Nov 2, 2023

Okay thanks, all normal there. I'll have to dig into this when I have some time. That is really a weird bug.

@kranskydog
Author

OTOH

[root@orcl02 cronicle]# netstat -anp | grep Cronicle
tcp6 0 0 :::3012 :::* LISTEN 772/Cronicle Server
tcp6 0 0 192.168.56.55:3012 192.168.56.50:27976 ESTABLISHED 772/Cronicle Server
udp 0 0 0.0.0.0:3014 0.0.0.0:* 772/Cronicle Server

So it seems that because Cronicle is bound to an IPv6 address (:::3012), every connection it receives appears to come from an IPv6 address, so it thinks everything needs to be an IPv6 address.
https://nodejs.org/dist/latest-v4.x/docs/api/http.html#http_server_listen_port_hostname_backlog_callback
[image not preserved]
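
The `::ffff:` prefix marks an IPv4-mapped IPv6 address, which is how a dual-stack socket listening on `::` reports IPv4 peers. One way software could undo the mapping before using such an address is sketched below in Python. `normalize_peer_ip` is a hypothetical name, and this is an illustration under that assumption, not Cronicle's actual handling or fix.

```python
import ipaddress


def normalize_peer_ip(ip: str) -> str:
    """Collapse an IPv4-mapped IPv6 address (::ffff:a.b.c.d) to plain IPv4.

    Hypothetical helper for illustration. Plain IPv4 and genuine IPv6
    addresses are returned unchanged.
    """
    addr = ipaddress.ip_address(ip)
    if addr.version == 6 and addr.ipv4_mapped is not None:
        return str(addr.ipv4_mapped)
    return str(addr)


print(normalize_peer_ip("::ffff:192.168.56.50"))  # 192.168.56.50
print(normalize_peer_ip("192.168.56.55"))         # 192.168.56.55
```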

@kranskydog
Author

Setting the following helps:
"server_comm_use_hostnames": true,
"web_socket_use_hostnames": true,
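
For anyone applying the same workaround by script, here is a minimal Python sketch that switches both flags on in a config dictionary. `enable_hostname_comm` is a hypothetical helper, and the sample dictionary contains only the one setting quoted earlier in this issue. A real run would load and save conf/config.json, and the assumption that the change only takes effect after a Cronicle restart should be verified.

```python
import json


def enable_hostname_comm(config: dict) -> dict:
    """Return a copy of a config dict with both hostname flags enabled.

    Hypothetical helper for illustration; the input dict is left unchanged.
    """
    updated = dict(config)
    updated["server_comm_use_hostnames"] = True
    updated["web_socket_use_hostnames"] = True
    return updated


# In-memory example; a real script would json.load() conf/config.json instead.
sample = {"web_direct_connect": True}
print(json.dumps(enable_hostname_comm(sample), indent=2))
```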

@jhuckaby
Owner

jhuckaby commented Nov 2, 2023

Okay, thank you for all this info. I'll dig in as soon as I have time.
