
crimson/common: re-implement do_for_each #48932

Merged 3 commits into ceph:main on Nov 24, 2022

Conversation

@aisakaki (Contributor) commented on Nov 17, 2022:

When a future is available but seastar::need_preempt is true, crimson::do_for_each turns into one big recursive call chain, which might cause a stack overflow.
The unit test added in the first commit reproduces this problem.
crimson::repeat does not seem to hit this situation.
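
For readers unfamiliar with the failure mode, here is a minimal sketch of the recursive shape, with hypothetical names and plain seastar futures rather than the errorator-typed code in errorator.h:

    #include <seastar/core/future.hh>
    #include <iterator>
    #include <utility>

    // Illustrative only: each element chains a continuation that recurses
    // over the rest of the range.
    template <typename Iterator, typename AsyncAction>
    seastar::future<> recursive_do_for_each(Iterator begin, Iterator end,
                                            AsyncAction action) {
      if (begin == end) {
        return seastar::make_ready_future<>();
      }
      return seastar::futurize_invoke(action, *begin).then(
        [begin, end, action = std::move(action)]() mutable {
          // Whenever this continuation ends up running on the current stack
          // (rather than as a freshly scheduled task), the call below nests,
          // so a long range of immediately-ready futures grows the stack in
          // proportion to the number of elements.
          return recursive_do_for_each(std::next(begin), end,
                                       std::move(action));
        });
    }

Driving such a loop over, say, a million-element container with an action that returns seastar::now() is the kind of reproducer the unit test mentioned above presumably relies on.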

Contribution Guidelines

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)

@xxhdx1985126 (Contributor) commented:

/home/jenkins-build/build/workspace/ceph-pull-requests/src/crimson/common/errorator.h:92:10: error: no matching function for call to 'do_for_each_impl'
return ::crimson::do_for_each_impl(begin, end, std::move(action));
^~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/jenkins-build/build/workspace/ceph-pull-requests/src/crimson/os/seastore/async_cleaner.cc:1241:19: note: in instantiation of function template specialization 'crimson::do_for_each<crimson::os::seastore::segment_map_t<crimson::os::seastore::segment_info_t>::iterator, (lambda at /home/jenkins-build/build/workspace/ceph-pull-requests/src/crimson/os/seastore/async_cleaner.cc:1244:5)>' requested here
return crimson::do_for_each(
^
/home/jenkins-build/build/workspace/ceph-pull-requests/src/crimson/common/errorator.h:72:16: note: candidate template ignored: substitution failure [with Iterator = crimson::os::seastore::segment_map_t<crimson::os::seastore::segment_info_t>::iterator, AsyncAction = (lambda at /home/jenkins-build/build/workspace/ceph-pull-requests/src/crimson/os/seastore/async_cleaner.cc:1244:5)]: no type named 'reference' in 'crimson::os::seastore::segment_map_t<crimson::os::seastore::segment_info_t>::iterator'
inline FutureT do_for_each_impl(Iterator begin, Iterator end, AsyncAction action) {
^
1 error generated.

Seems there's a compilation issue

@aisakaki (Contributor, Author) commented:

Seems there's a compilation issue

Yes, I found that segment_map_t's iterator has no reference type defined; I am fixing this.
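
For context on the error above: do_for_each_impl evidently asks the iterator for its reference type (directly or via std::iterator_traits), and a hand-written iterator only provides that if it declares the usual member typedefs. A generic sketch with hypothetical names, not the actual seastore_types.h code:

    #include <cstddef>
    #include <iterator>

    // Without the member typedefs below, code that names Iterator::reference
    // (or std::iterator_traits<Iterator>::reference) fails to substitute,
    // which is exactly the error reported above.
    template <typename T>
    struct simple_iterator {
      using iterator_category = std::forward_iterator_tag;
      using value_type        = T;
      using difference_type   = std::ptrdiff_t;
      using pointer           = T*;
      using reference         = T&;   // the typedef the compiler said was missing

      T* ptr = nullptr;

      reference operator*() const { return *ptr; }
      pointer operator->() const { return ptr; }
      simple_iterator& operator++() { ++ptr; return *this; }
      simple_iterator operator++(int) { auto tmp = *this; ++ptr; return tmp; }
      friend bool operator==(const simple_iterator& a, const simple_iterator& b) {
        return a.ptr == b.ptr;
      }
      friend bool operator!=(const simple_iterator& a, const simple_iterator& b) {
        return a.ptr != b.ptr;
      }
    };

Whether reference should be T&, const T&, or a proxy type depends on what the container actually hands out, which is presumably what the later note about "a more accurate reference type" refers to.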

@cyx1231st (Member) left a comment:

Probably need to schedule CI as the fix also impacts OSD? @athanatos

Review threads:
  • src/test/crimson/test_interruptible_future.cc (resolved)
  • src/crimson/os/seastore/seastore_types.h (outdated, resolved)
  • src/crimson/common/errorator.h (resolved)
Signed-off-by: Xinyu Huang <xinyu.huang@intel.com>
@aisakaki force-pushed the wip-crimson-loop branch 2 times, most recently from 779ee68 to 59805ef on November 18, 2022 at 09:42
@aisakaki (Contributor, Author) commented:

I changed the reference in the second commit to ensure a more accurate reference type.

Signed-off-by: Xinyu Huang <xinyu.huang@intel.com>
The current implementation of crimson::do_for_each might hit a stack
overflow when the future is available but seastar::need_preempt
is true. This new implementation, which mirrors seastar::do_for_each
while keeping the crimson errorator mechanism, solves this problem.

Fixes: https://tracker.ceph.com/issues/58005.

Signed-off-by: Xinyu Huang <xinyu.huang@intel.com>
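
Roughly, the seastar::do_for_each shape referred to in this commit message iterates in an ordinary loop while each future is already resolved and no preemption is pending, and only suspends otherwise, so the stack depth stays constant no matter how many elements complete immediately. A simplified sketch with plain seastar futures and hypothetical names (the actual commit threads the errorator types through and, like seastar, uses a heap-allocated continuation state rather than the tail call shown here):

    #include <seastar/core/future.hh>
    #include <seastar/core/preempt.hh>
    #include <utility>

    template <typename Iterator, typename AsyncAction>
    seastar::future<> looping_do_for_each(Iterator begin, Iterator end,
                                          AsyncAction action) {
      while (begin != end) {
        auto f = seastar::futurize_invoke(action, *begin);
        ++begin;
        if (f.failed() || !f.available() || seastar::need_preempt()) {
          // Suspend: finish this element asynchronously and resume with the
          // rest of the range from a scheduled continuation instead of
          // recursing on the current stack; a failed future skips the
          // continuation and propagates the error.
          return std::move(f).then(
            [begin, end, action = std::move(action)]() mutable {
              return looping_do_for_each(std::move(begin), std::move(end),
                                         std::move(action));
            });
        }
        // Ready, succeeded, and no preemption requested: keep looping without
        // consuming any additional stack.
      }
      return seastar::make_ready_future<>();
    }
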
@aisakaki (Contributor, Author) commented:

jenkins retest this please

@cyx1231st (Member) commented:

make check failed, but the error seems unrelated to this PR; looking into it...

INFO  2022-11-23 04:53:25,307 [shard 0] ms - test_preemptive_shutdown():
INFO  2022-11-23 04:53:25,308 [shard 0] ms - [osd.6(server4) v2:127.0.0.1:9034/7] try_bind: done
INFO  2022-11-23 04:53:25,308 [shard 0] ms - [osd.6(server4) v2:127.0.0.1:9034/7] try_bind: done
INFO  2022-11-23 04:53:25,308 [shard 0] ms - [0x61400000a240 osd.7(client4) - >> osd.? v2:127.0.0.1:9034/7] ProtocolV2::start_connect(): peer_addr=v2:127.0.0.1:9034/7, peer_name=osd.?, cc=16474622824966519135 policy(lossy=true, server=false, standby=false, resetcheck=false)
INFO  2022-11-23 04:53:25,308 [shard 0] ms - [0x61400000a240 osd.7(client4) - >> osd.? v2:127.0.0.1:9034/7] write_event: delay ...
INFO  2022-11-23 04:53:25,309 [shard 0] ms - [0x61400000a440 osd.6(server4) v2:127.0.0.1:9034/7 >> unknown.? -@59481] ProtocolV2::start_accept(): target_addr=v2:127.0.0.1:59481/0
INFO  2022-11-23 04:53:25,350 [shard 0] ms - [0x61400000a240 osd.7(client4) -@59481 >> osd.? v2:127.0.0.1:9034/7] peer v2:127.0.0.1:9034/7 says I am v2:127.0.0.1:59481/0 (socket says 127.0.0.1:59481)
INFO  2022-11-23 04:53:25,350 [shard 0] ms - [0x61400000a240 osd.7(client4) 127.0.0.1:0/8@59481 >> osd.? v2:127.0.0.1:9034/7] learned myaddr=127.0.0.1:0/8 (unbound)
INFO  2022-11-23 04:53:25,351 [shard 0] ms - [0x61400000a440 osd.6(server4) v2:127.0.0.1:9034/7 >> osd.? -@59481] UPDATE: peer_type=osd, policy(lossy=true server=true standby=false resetcheck=false)
INFO  2022-11-23 04:53:25,424 [shard 0] ms - client shutdown...
INFO  2022-11-23 04:53:25,424 [shard 0] ms - [0x61400000a240 osd.7(client4) 127.0.0.1:0/8@59481 >> osd.? v2:127.0.0.1:9034/7] closing: reset no, replace no
INFO  2022-11-23 04:53:25,432 [shard 0] ms - [0x61400000a240 osd.7(client4) 127.0.0.1:0/8@59481 >> osd.? v2:127.0.0.1:9034/7] write_event: dropped
INFO  2022-11-23 04:53:25,433 [shard 0] ms - [0x61400000a240 osd.7(client4) 127.0.0.1:0/8@59481 >> osd.? v2:127.0.0.1:9034/7] execute_connecting(): protocol aborted at CLOSING -- std::system_error (error crimson::net:4, read eof)
INFO  2022-11-23 04:53:25,460 [shard 0] ms - [0x61400000a440 osd.6(server4) v2:127.0.0.1:9034/7 >> osd.7 127.0.0.1:0/8@59481] established: gs=1, pgs=1, cs=0, client_cookie=16474622824966519135, server_cookie=0, in_seq=0, out_seq=0, out_q=0
INFO  2022-11-23 04:53:25,461 [shard 0] ms - server shutdown...
INFO  2022-11-23 04:53:25,462 [shard 0] ms - [0x61400000a440 osd.6(server4) v2:127.0.0.1:9034/7 >> osd.7 127.0.0.1:0/8@59481] execute_ready(): fault at READY on lossy channel, going to CLOSING -- std::system_error (error crimson::net:4, read eof)
INFO  2022-11-23 04:53:25,462 [shard 0] ms - [0x61400000a440 osd.6(server4) v2:127.0.0.1:9034/7 >> osd.7 127.0.0.1:0/8@59481] closing: reset yes, replace no
=================================================================
==2857501==ERROR: AddressSanitizer: heap-use-after-free on address 0x618000006ca8 at pc 0x564d178ddead bp 0x7fff38cbf250 sp 0x7fff38cbf248
READ of size 1 at 0x618000006ca8 thread T0
    #0 0x564d178ddeac in crimson::net::Protocol::close(bool, std::optional<std::function<void ()> >) /home/jenkins-build/build/workspace/ceph-pull-requests/src/crimson/net/Protocol.cc:38:7
    #1 0x564d178c7b8a in crimson::net::Protocol::close_clean(bool)::'lambda'()::operator()() const /home/jenkins-build/build/workspace/ceph-pull-requests/src/crimson/net/Protocol.h:38:7
......

0x618000006ca8 is located 40 bytes inside of 776-byte region [0x618000006c80,0x618000006f88)
freed by thread T0 here:
    #0 0x564d170cd09d in operator delete(void*) (/home/jenkins-build/build/workspace/ceph-pull-requests/build/bin/unittest-seastar-messenger+0x554e09d) (BuildId: 2c3e5b361a8a7b1e966a93a12b03eebc3c9ba5f3)
    #1 0x564d17951ea1 in crimson::net::ProtocolV2::~ProtocolV2() /home/jenkins-build/build/workspace/ceph-pull-requests/src/crimson/net/ProtocolV2.cc:169:27
    #2 0x564d178dafd8 in std::default_delete<crimson::net::Protocol>::operator()(crimson::net::Protocol*) const /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:85:2
    #3 0x564d178c1ec4 in std::unique_ptr<crimson::net::Protocol, std::default_delete<crimson::net::Protocol> >::~unique_ptr() /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:361:4
    #4 0x564d178ac8a0 in crimson::net::SocketConnection::~SocketConnection() /home/jenkins-build/build/workspace/ceph-pull-requests/src/crimson/net/SocketConnection.cc:42:40
    #5 0x564d178ac8f8 in crimson::net::SocketConnection::~SocketConnection() /home/jenkins-build/build/workspace/ceph-pull-requests/src/crimson/net/SocketConnection.cc:42:39
    #6 0x564d17558475 in seastar::shared_ptr<crimson::net::Connection>::~shared_ptr() /home/jenkins-build/build/workspace/ceph-pull-requests/src/seastar/include/seastar/core/shared_ptr.hh:537:13
    #7 0x564d178df988 in crimson::net::Protocol::close(bool, std::optional<std::function<void ()> >)::$_4::~$_4() /home/jenkins-build/build/workspace/ceph-pull-requests/src/crimson/net/Protocol.cc:84:23
......

@cyx1231st reopened this on Nov 24, 2022
@aisakaki (Contributor, Author) commented:

jenkins test make check

@aisakaki (Contributor, Author) commented:

jenkins retest this please

@tchaikov (Contributor) commented:

@aisakaki The make check run is now broken; the failures should be fixed by #49040.

@tchaikov (Contributor) commented:

jenkins test make check

@tchaikov merged commit 40955f6 into ceph:main on Nov 24, 2022