
Add rank table for locality-aware streaming #1505

Merged: 28 commits from topic-chunk-table into openPMD:dev on May 31, 2024

Conversation

@franzpoeschel franzpoeschel commented Aug 16, 2023

This is the first logical part of #824, which I am now splitting into two separate PRs:

  1. Writing and reading a rank table that a reader code can use to determine which MPI rank was running on which compute node.
  2. Chunk distribution algorithms based on that information.

The idea is that a writer code can either explicitly use:

series.setMpiRanksMetaInfo(/* myRankInfo = */ "host123_numa_domain_2");

... or alternatively initialize the Series with the JSON/TOML parameter rank_table:

Series write(..., R"(rank_table = "hostname")");

(The second option is useful as it requires no rewriting of existing code.)

A 2D char dataset is then created, encoding the per-rank information line by line, e.g.:

$ bpls ranktable.bp/ -de '/rankTable'
  char     /rankTable                                {4, 8}
    (0,0)    h a l 8 9 9
    (0,6)    9  h a l 8
    (1,4)    9 9 9  h a
    (2,2)    l 8 9 9 9 
    (3,0)    h a l 8 9 9
    (3,6)    9  

A reader code can then access this information via:

>>> import openpmd_api as io
>>> s = io.Series("ranktable.bp", io.Access.read_only)
>>> s.get_mpi_ranks_meta_info(collective=False)
{0: 'hal8999', 1: 'hal8999', 2: 'hal8999', 3: 'hal8999'}

And compare that information against unsigned int WrittenChunkInfo::sourceID as returned by availableChunks():

struct WrittenChunkInfo : ChunkInfo
{
    unsigned int sourceID = 0; //!< ID of the data source containing the chunk
    ...
};
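How a reader might use the two pieces together can be sketched in plain Python (the data structures below are illustrative stand-ins, not the openPMD API; a real reader would obtain both the rank table and the chunk list from the Series):

```python
# Sketch: group available chunks by the host that wrote them, by joining
# each chunk's sourceID (the writing MPI rank) against the rank table.
from collections import defaultdict

rank_table = {0: "hal8999", 1: "hal8999", 2: "node2", 3: "node2"}
chunks = [  # (source_id, offset) pairs standing in for WrittenChunkInfo
    (0, (0,)), (1, (100,)), (2, (200,)), (3, (300,)),
]

by_host = defaultdict(list)
for source_id, offset in chunks:
    by_host[rank_table[source_id]].append(offset)

assert by_host["hal8999"] == [(0,), (100,)]
assert by_host["node2"] == [(200,), (300,)]
```

A reader rank can then prefer chunks whose writing host matches its own, which is exactly what the distribution algorithms in #824 build on.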
  • Documentation
  • Maybe for a follow-up: Send the rank table not only in first step

@franzpoeschel franzpoeschel added the api: new additions to the API label Aug 16, 2023
@franzpoeschel franzpoeschel force-pushed the topic-chunk-table branch 2 times, most recently from f1c65ff to 14d5bf9 on August 16, 2023 15:17
@ax3l ax3l self-requested a review August 17, 2023 03:16
@ax3l ax3l added the MPI label Aug 17, 2023
@franzpoeschel franzpoeschel force-pushed the topic-chunk-table branch 7 times, most recently from d728f35 to 4918ff0 on August 18, 2023 10:29
Comment on lines 1998 to 2014
Access::READ_WRITE,
R"({"rank_table": "hostname"})");

Check notice (Code scanning / CodeQL): Equality test on floating-point values. Equality checks on floating-point values can yield unexpected results.
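The CodeQL note refers to a general pitfall rather than anything specific to this PR; a minimal illustration:

```python
# Why exact equality on floats is fragile: rounding error in binary
# floating point makes mathematically equal expressions compare unequal.
import math

x = 0.1 + 0.2
assert x != 0.3                             # exact comparison fails
assert math.isclose(x, 0.3, rel_tol=1e-9)   # tolerance-based check succeeds
```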
CMakeLists.txt Outdated
# called surrounding the gethostname() function on Windows
# and it needs to be done at the client side since the Winsock API is
# initialized statically per process....
target_link_libraries(openPMD PUBLIC ws2_32)
Member:
Would it be more portable (MSYS/MinGW, MSVC, LLVM on Windows) if this was searched for via find_library?

Contributor Author:

Probably yes, will add this tomorrow when I get my hands on a Windows computer again.

@ax3l ax3l (Member) left a comment:

This is great!

Regarding the ws2_32 lib for gethostname: there is a more portable way to achieve this with standard MPI calls, namely MPI_Get_processor_name, described in chapter 9 of the MPI standard:

https://www.mpi-forum.org/docs/mpi-4.0/mpi40-report.pdf

Note that this sometimes appends and sometimes does not append the CPU id as well.

Resolved review threads:
  • include/openPMD/ChunkInfo.hpp
  • include/openPMD/Series.hpp
  • include/openPMD/auxiliary/Mpi.hpp
  • include/openPMD/binding/python/Mpi.hpp
@@ -19,6 +19,7 @@
* If not, see <http://www.gnu.org/licenses/>.
*/
#include "openPMD/ChunkInfo.hpp"
#include "openPMD/binding/python/Mpi.hpp"
Member:
Does this need an openPMD_HAVE_MPI guard?
Please include after Common.hpp

Contributor Author:
.../python/Mpi.hpp has that guard itself, so it's safe to include unguarded.

Comment on lines 1805 to 1844
#ifdef _WIN32
WSADATA wsaData;
WSAStartup(MAKEWORD(2, 0), &wsaData);
#endif
Member:
This looks very scary and would need extensive user documentation (and inline documentation), no? It essentially changes the user API contract.

Contributor Author:
This would probably be solved (in part) by the above suggestion of splitting enum class Method{HOSTNAME} into enum class Method{POSIX_HOSTNAME, WINSOCKS_HOSTNAME, MPI_PROCESSOR_NAME}, thereby making the use of Winsock explicit. Documentation is still needed, obviously.

Contributor Author:
I've removed support for Winsock hostnames entirely, and further renamed the identifiers to posix_hostname and mpi_processor_name so that people know to initialize MPI before using the latter.

franzpoeschel commented Oct 16, 2023

Regarding the ws2_32 lib for gethostname: there is a more portable way to achieve this with standard MPI calls, namely MPI_Get_processor_name, described in chapter 9 of the MPI standard:

https://www.mpi-forum.org/docs/mpi-4.0/mpi40-report.pdf

Note that this sometimes appends and sometimes does not append the CPU id as well.

Using the MPI call is not ideal either, since MPI is not always available and the call suffers from exactly the same trouble:

*** The MPI_Get_processor_name() function was called before MPI_INIT was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.

In practice this is less of a problem, since most applications that use MPI also have it initialized, but I think the better solution is to make the chosen implementation explicit. I was intending for this call to have multiple implementations anyway:

// include/openPMD/ChunkInfo.hpp
namespace host_info
{
    enum class Method
    {
        HOSTNAME
    };

    std::string byMethod(Method);

#if openPMD_HAVE_MPI
    chunk_assignment::RankMeta byMethodCollective(MPI_Comm, Method);
#endif

    std::string hostname();
} // namespace host_info
// src/Series.cpp
                if (viaJson.value == "hostname")
                {
                    return host_info::hostname();
                }
                else
                {
                    throw error::WrongAPIUsage(
                        "[Series] Wrong value for JSON option 'rank_table': '" +
                        viaJson.value + "'.");
                }

The thought behind this is: In some setups, you want writer and reader to match by hostname, in other cases by NUMA node, in other cases by CPU id, you might even want to group multiple hosts into one group. Since the chunk distribution algorithms in #824 use this rank table as a basis for distributing chunks, this keeps that choice flexible for the user.

Hence my suggestion: split this into POSIX_HOSTNAME, WINSOCKS_HOSTNAME, MPI_HOSTNAME. Users would then need to query which options are available on their system. The enum class Method would always contain all options without any #ifdefs, and we could add a call host_info::method_available(Method) -> bool.

This way, users would explicitly need to state that they want to use the Winsocks implementation, along with all implications that that has.
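The proposed pattern (every method always present in the enum, availability queried at runtime rather than hidden behind #ifdefs) can be sketched as follows; this is a plain-Python illustration of the idea, not the actual C++ API, and the availability set is a stand-in for compile-time feature detection:

```python
# Sketch: an enum that always lists every host-identification method,
# plus a runtime query for which ones this build/platform supports.
import enum

class Method(enum.Enum):
    POSIX_HOSTNAME = "posix_hostname"
    WINSOCKS_HOSTNAME = "winsocks_hostname"
    MPI_PROCESSOR_NAME = "mpi_processor_name"

# Stand-in for what would be decided at compile time in C++
_AVAILABLE = {Method.POSIX_HOSTNAME}

def method_available(method: Method) -> bool:
    return method in _AVAILABLE

assert method_available(Method.POSIX_HOSTNAME)
assert not method_available(Method.WINSOCKS_HOSTNAME)
```

The benefit is that callers can enumerate all methods and fall back gracefully, instead of hitting a compile error on platforms where a method does not exist.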

@ax3l ax3l (Member) left a comment:

Looks great, thank you!

@ax3l ax3l merged commit 0baf09f into openPMD:dev May 31, 2024
31 checks passed
Labels
api: new additions to the API MPI

2 participants