
feat(utxo): prioritize electrum connections #1966

Open · wants to merge 114 commits into base: dev

Conversation


@rozhkovdmitrii rozhkovdmitrii commented Sep 11, 2023

This PR introduces a new electrum connection management system built around two policies: multiple and selective. Other types of connections are expected to be managed in the same way in the future.

Setting up the conn mng policy

To set the connection management policy, the conn_mng_policy setting should be provided in the mm2 configuration file (MM2.json). The komodefi-cli (adex-cli) init command also generates this setting.

Example:
{
  "gui": "adex-cli",
  "netid": 7777,
  "rpc_password": "#asdfASD2",
  "passphrase": "please excite exit liquid amused light canal cattle depart execute code leaf",
  "allow_weak_password": false,
  "conn_mng_policy": "multiple"
}

Selective mode and electrum activation scheme

Selective mode extends the electrum activation method with the priority and timeout_sec settings, which determine, respectively, in which queue the associated addresses are connected and how long establishing a connection may take. Primary nodes are connected first, then secondary ones. Both settings are optional; if they are not set, default values apply, with secondary being the default priority.

Example:
    {
      "platform": "UTXO",
      "coin": "DOC",
      "command": {
        "coin": "DOC",
        "method": "electrum",
        "servers": [
          {
            "url": "electrum1.cipig.net:10020",
            "priority": "primary",
            "timeout_sec": 10,
            "contact": [ { "email": "cipi@komodoplatform.com" }, { "discord": "cipi#4502" } ]
          },
          {
            "url": "electrum2.cipig.net:10020",
            "priority": "secondary",
            "timeout_sec": 20,
            "contact": [ { "email": "cipi@komodoplatform.com" }, { "discord": "cipi#4502" } ]
          },
          {
            "url": "electrum3.cipig.net:10020",
            "priority": "primary",
            "timeout_sec": 10,
            "contact": [ { "email": "cipi@komodoplatform.com" }, { "discord": "cipi#4502" } ]
          },
          {
            "url": "electrum1.cipig.net:20020",
            ...
}

For the given activation scheme, mm2 builds the following address queues:

29 11:23:38, coins::utxo::rpc_clients::conn_mng_selective:57] DEBUG Primary electrum nodes to connect: ["electrum3.cipig.net:10020", "electrum1.cipig.net:10020"]
29 11:23:38, coins::utxo::rpc_clients::conn_mng_selective:58] DEBUG Backup electrum nodes to connect: ["electrum2.cipig.net:20020", "electrum1.cipig.net:20020", "electrum2.cipig.net:10020", "electrum3.cipig.net:20020"]
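
As a rough illustration of how such queues could be built, here is a minimal sketch with hypothetical names (Server, build_queues), assuming the rand crate; this is not the PR's actual code:

// Hypothetical sketch, not the PR's code: shuffle the server list,
// then split it into primary and backup queues.
use rand::seq::SliceRandom;

#[derive(PartialEq)]
enum Priority {
    Primary,
    Secondary,
}

struct Server {
    url: String,
    priority: Priority,
}

fn build_queues(mut servers: Vec<Server>) -> (Vec<String>, Vec<String>) {
    // Shuffle first so no single server is always tried before the others.
    servers.shuffle(&mut rand::thread_rng());
    let (primary, backup): (Vec<Server>, Vec<Server>) = servers
        .into_iter()
        .partition(|s| s.priority == Priority::Primary);
    (
        primary.into_iter().map(|s| s.url).collect(),
        backup.into_iter().map(|s| s.url).collect(),
    )
}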

It is important to note that addresses are shuffled before being queued.
If a connection cannot be established with an address, or if it breaks for any reason, the address is suspended for 30, 60, 120, etc. seconds, after which connection attempts are resumed. On resumption, a primary address pushes out the secondary one if the latter is currently connected.
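
A minimal sketch of that doubling suspension delay (a hypothetical helper; assumes a plain left shift with a cap to avoid overflow):

// Hypothetical sketch of the suspend schedule described above:
// 30, 60, 120, ... seconds, doubling on each consecutive failure.
use std::time::Duration;

fn suspend_timeout(consecutive_failures: u32) -> Duration {
    const BASE_SECS: u64 = 30;
    // Cap the shift so the timeout cannot overflow during long outages.
    Duration::from_secs(BASE_SECS << consecutive_failures.min(20))
}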

Testing the selective and multiple conn mng policies

Several practices have been used to test the expected behavior. They are shown in the examples below for reference:

watch sudo lsof -iTCP -cmm2 -a -nP
COMMAND    PID    USER   FD   TYPE  DEVICE SIZE/OFF NODE NAME
mm2     104686 rozhkov    6u  IPv4 1237845      0t0  TCP 192.168.1.3:60364->168.119.236.249:38890 (ESTABLISHED)
mm2     104686 rozhkov    7u  IPv4 1242331      0t0  TCP 192.168.1.3:56464->168.119.236.241:38890 (ESTABLISHED)
mm2     104686 rozhkov    8u  IPv4 1233507      0t0  TCP 192.168.1.3:36352->168.119.236.243:38890 (ESTABLISHED)
mm2     104686 rozhkov   13u  IPv4 1243229      0t0  TCP 127.0.0.1:7783 (LISTEN)
mm2     104686 rozhkov   16u  IPv4 1245469      0t0  TCP 192.168.1.3:56868->168.119.236.246:38890 (ESTABLISHED)
mm2     104686 rozhkov   18u  IPv4 1267754      0t0  TCP 192.168.1.3:54878->168.119.236.233:38890 (ESTABLISHED)
mm2     104686 rozhkov   19u  IPv4 1285700      0t0  TCP 192.168.1.3:42452->209.222.101.247:38890 (ESTABLISHED)
mm2     104686 rozhkov   20u  IPv4 1303294      0t0  TCP 192.168.1.3:57460->49.12.127.111:10020 (ESTABLISHED)

In a separate terminal:

release/adex-cli enable DOC
Enabling asset: DOC
coin: DOC
address: RPFGrvJWjSYN4qYvcXsECW1HoHbvQjowZM
balance: 949828.3746221
unspendable_balance: 0
required_confirmations: 1
requires_notarization: No
mature_confirmations: 100

Using the table of established connections, it is always possible to see which addresses mm2 is connected to and to check whether they are the expected ones. It is also possible to break a certain connection to check whether it is suspended and then resumed.

sudo gdb -p 104686 --batch --ex 'call (int) shutdown(20, 0)'
...
$1 = 0
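
Here 20 is the file descriptor of one of the established electrum connections in the lsof listing above (fd 20u, the 49.12.127.111:10020 socket), and 0 is SHUT_RD, so the injected shutdown(2) call closes the read side of that socket inside the running process; mm2 then observes the connection as broken without being restarted.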

@rozhkovdmitrii rozhkovdmitrii added the in progress Changes will be made from the author label Sep 11, 2023
@rozhkovdmitrii rozhkovdmitrii self-assigned this Sep 11, 2023
@onur-ozkan onur-ozkan changed the title from "feat(utxo): prioritize electerum nodes connection" to "feat(utxo): prioritize electrum connections" Sep 12, 2023
@shamardy shamardy removed the request for review from onur-ozkan September 12, 2023 08:26
@onur-ozkan onur-ozkan (Member) left a comment

Thank you for your effort!

The PR seems to be in its early stages right now. This is not a full review; I just did a quick pass to point out some areas that should not be forgotten.

Resolved (outdated) review comments on:
README.md
mm2src/adex_cli/src/tests/mod.rs
mm2src/coins/lp_coins.rs
mm2src/gossipsub/src/behaviour.rs
mm2src/coins/utxo/utxo_builder/utxo_coin_builder.rs
@shamardy shamardy (Collaborator) left a comment

Since this is still in progress, I left some comments that can help you improve the code.
I will do a more thorough review when this is ready for review!

Please also merge with the latest dev and start adding doc comments.

pub struct ElectrumClientImpl {
    weak_self: Mutex<Option<Weak<ElectrumClientImpl>>>,
Collaborator

This is a very bad design choice; the struct should never reference itself, so as not to create any unstable behavior.

Author

I'm sorry, what unstable behavior can there be if we use a Weak reference? The only purpose of this weak_self is to be passed as an argument into self-hosted futures, e.g. server_ping.

In the previous implementation, the corresponding code was where it shouldn't be [1, 2].

Member

Why not just introduce a new struct? E.g.:

struct ElectrumClientImplWeak(Mutex<Option<Weak<ElectrumClientImpl>>>);
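
A minimal sketch of that suggestion (the upgrade helper and the omitted fields are hypothetical; it assumes spawned futures only need to upgrade the weak reference on demand):

// Hypothetical sketch of the suggested newtype: the weak back-reference
// lives in its own type instead of inside ElectrumClientImpl itself.
use std::sync::{Arc, Mutex, Weak};

struct ElectrumClientImpl { /* fields omitted */ }

struct ElectrumClientImplWeak(Mutex<Option<Weak<ElectrumClientImpl>>>);

impl ElectrumClientImplWeak {
    /// Try to obtain a strong reference for use inside a spawned future.
    fn upgrade(&self) -> Option<Arc<ElectrumClientImpl>> {
        self.0.lock().unwrap().as_ref().and_then(Weak::upgrade)
    }
}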

Resolved (outdated) review comments on:
mm2src/adex_cli/src/rpc_data.rs
mm2src/coins/utxo/rpc_clients.rs
mm2src/coins/lp_coins.rs
struct ConnMng(Arc<ConnMngImpl>);

impl ConnMng {
    async fn suspend_server(self, address: String) -> Result<(), String> {
Collaborator

Since this is a new implementation, please start using specific error types and MmError.
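
For instance, a specific error type for the connection manager might look like the following (variant names are hypothetical; in the codebase it would be wrapped in MmError rather than returned bare):

// Hypothetical sketch of a specific error type replacing Result<(), String>.
use std::fmt;

#[derive(Debug)]
enum ConnMngError {
    UnknownAddress(String),
    NoActiveConnection,
}

impl fmt::Display for ConnMngError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConnMngError::UnknownAddress(addr) => write!(f, "unknown electrum address: {addr}"),
            ConnMngError::NoActiveConnection => write!(f, "no active electrum connection"),
        }
    }
}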

Author

Comment on lines 2751 to 2754
spawner.spawn(async move {
    let state = conn_mng.0.guarded.lock().await;
    state.connecting.store(false, AtomicOrdering::Relaxed);
})
Collaborator

Can you please elaborate on this more? ConnectingStateCtx will be considered dropped before connecting is set to false if the lock is held by another operation for some time.

Author

It certainly should have been more detailed, thanks for pointing this out! This place is the heart of the selective connectivity.

done
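
Judging from the thread, the snippet above presumably lives in ConnectingStateCtx's Drop implementation, so the connecting flag is reset whenever the context goes out of scope. A self-contained sketch of that drop-guard idea (hypothetical reconstruction, simplified to a synchronous store instead of a spawned task):

// Hypothetical sketch: a guard that resets the shared `connecting` flag
// when it is dropped. The PR does this inside a spawned async task.
use std::sync::atomic::{AtomicBool, Ordering as AtomicOrdering};
use std::sync::Arc;

struct ConnectingStateCtx {
    connecting: Arc<AtomicBool>,
}

impl Drop for ConnectingStateCtx {
    fn drop(&mut self) {
        self.connecting.store(false, AtomicOrdering::Relaxed);
    }
}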

Comment on lines 2732 to 2738
let address: String = {
    if address.is_some() {
        address.map(|internal| internal.to_string())
    } else {
        guard.active.as_ref().cloned()
    }
}?;
Collaborator

Suggested change: replace

let address: String = {
    if address.is_some() {
        address.map(|internal| internal.to_string())
    } else {
        guard.active.as_ref().cloned()
    }
}?;

with

let address = address.map_or_else(|| guard.active.as_ref().cloned(), |internal| Some(internal.to_string()))?;

Resolved (outdated) review comments on mm2src/coins/utxo/rpc_clients.rs (3 threads)
still needs refactoring for the tons of warnings there

also the selective connection manager is missing
the total time this method might take is at most the timeout decided in the connection settings, which is the time needed to establish the connection and query for the version
the abortable system that created the child abortable systems for the connections was dropped, and that resulted in non-functioning connection abortable systems
electrum servers don't like it when the same connection queries for the version again; they expect us to store the version and never ask about it again (see the sketch below)
the middle subsystem wasn't used anywhere, thus was dropped and caused all the connection subsystems to get aborted.
the new design doesn't break the coin initialization if the electrum servers fail for whatever reason.
Create the coin and make sure that each server fails the version check when queried
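
A sketch of the version-caching point above (hypothetical types, not the PR's actual code):

// Hypothetical sketch: a connection remembers the protocol version
// negotiated on the first `server.version` request, so the server is
// never queried for it again.
use std::sync::OnceLock;

struct ElectrumConnection {
    addr: String,
    protocol_version: OnceLock<String>,
}

impl ElectrumConnection {
    fn version(&self, negotiate: impl FnOnce() -> String) -> &str {
        self.protocol_version.get_or_init(negotiate)
    }
}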