Add high-level C++17/C++20 conversion functions #144

Open
lemire opened this issue Jul 11, 2022 · 2 comments

@lemire
Member

lemire commented Jul 11, 2022

Modern C++ has a full range of specialized string classes: std::u8string, std::u16string, std::u32string, and the matching views (std::u8string_view, and so forth). std::u16string and std::u32string were introduced with C++11 and the std::*string_view types with C++17, but std::u8string and std::u8string_view only became available with C++20.

We could also accept std::string and assume that it holds UTF-8, even though a std::string may hold other encodings. As long as we are explicit that we are assuming UTF-8, that is OK.

What we could do is to provide conversion functions. That might be helpful to some...?

The objective would be to improve quality of life for users who prefer not to mess with pointers.

#include <cstdio>      // printf
#include <string>
#include <string_view> // std::*string_view (C++17)

#ifndef SIMDUTF_CPLUSPLUS
#if defined(_MSVC_LANG) && !defined(__clang__)
#define SIMDUTF_CPLUSPLUS (_MSC_VER == 1900 ? 201103L : _MSVC_LANG)
#else
#define SIMDUTF_CPLUSPLUS __cplusplus
#endif
#endif

#if (SIMDUTF_CPLUSPLUS >= 202002L)
#define SIMDJSON_CPLUSPLUS20 1
#endif

#if (SIMDUTF_CPLUSPLUS >= 201703L)
#define SIMDJSON_CPLUSPLUS17 1
#endif


#if SIMDJSON_CPLUSPLUS17

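// Assumes that the input is UTF-8; needed for the std::string call in main().
inline std::u32string to_u32string(const std::string_view in) {
  return U"bogus code";
}

inline std::u32string to_u32string(const std::u16string_view in) {
  return U"bogus code";
}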

#if SIMDJSON_CPLUSPLUS20
inline std::u32string to_u32string(const std::u8string_view in) {
  return U"bogus code";
}
#endif 


inline std::u16string to_u16string(const std::u16string_view in) {
  return u"bogus code";
}

inline std::u16string to_u16string(const std::u32string_view in) {
  return u"bogus code";
}

int main() {
  printf("Support for C++17.\n");
  std::string mystring("hello"); // could be any encoding?
#if SIMDJSON_CPLUSPLUS20
  std::u8string mystringu8(u8"hello");
#endif
  std::u16string mystringu16(u"hello");
  std::u32string mystringu32(U"hello");
#if SIMDJSON_CPLUSPLUS20
  std::u32string mystringu8_as32 = to_u32string(mystringu8);
#endif
  std::u32string mystring_as32 = to_u32string(mystring);

}

#else
int main() { printf("No support for C++17.\n"); }
#endif
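
To give an idea of what could replace one of the "bogus code" bodies, here is a minimal scalar sketch of the UTF-16 to UTF-32 case. It is not simdutf's implementation: the name to_u32string_scalar is hypothetical, and replacing unpaired surrogates with U+FFFD is an assumption (a real API might prefer to report an error). It needs C++17 for std::u16string_view.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical scalar stand-in for a to_u32string body (not simdutf code).
// Assumption: unpaired surrogates are replaced with U+FFFD instead of
// being reported as an error.
inline std::u32string to_u32string_scalar(std::u16string_view in) {
  std::u32string out;
  out.reserve(in.size()); // UTF-32 never needs more code units than UTF-16
  for (std::size_t i = 0; i < in.size(); ++i) {
    const char32_t w1 = in[i];
    if (w1 < 0xD800 || w1 > 0xDFFF) {
      // Not a surrogate: the code unit is the code point.
      out.push_back(w1);
    } else if (w1 <= 0xDBFF && i + 1 < in.size() &&
               in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
      // High surrogate followed by a low surrogate: combine the pair.
      const char32_t w2 = in[++i];
      out.push_back(0x10000 + ((w1 - 0xD800) << 10) + (w2 - 0xDC00));
    } else {
      // Lone surrogate: emit the replacement character.
      out.push_back(0xFFFD);
    }
  }
  return out;
}
```

Since std::u16string and string literals convert implicitly to std::u16string_view, a call such as to_u32string_scalar(mystringu16) works without touching pointers, which is the whole point of the proposal.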

References:

https://en.cppreference.com/w/cpp/string/basic_string_view
https://en.cppreference.com/w/cpp/string/basic_string

@lemire
Member Author

lemire commented Jul 11, 2022

cc @NicolasJiaxin

@amosnier

amosnier commented May 12, 2024

I'm guessing we also want to provide a std::ranges-based API with lazy evaluation. For instance, assuming a compiler that encodes string literals as UTF-8, we want the following to work:

static_assert(std::ranges::equal("$£Иह€한𐍈" | utf8::views::decode, std::array{
    0x00000024, 0x000000a3, 0x00000418, 0x00000939, 0x000020ac, 0x0000d55c, 0x00010348, 0x00000000}));

The previous static_assert also assumes that the whole implementation is constexpr, which would be nice too, I guess.
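
As a rough illustration of the lazy and constexpr parts, here is a sketch of a decoding iterator that produces one code point per dereference. It is deliberately C++17 only, so it does not use std::ranges; a views::decode adaptor could presumably wrap something similar. All names (utf8_decode_iterator, decodes_to, expected_code_points) are hypothetical, and the code assumes well-formed UTF-8, with no validation. The input is spelled with hex escapes so that it does not depend on how the compiler encodes string literals.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical lazy decoder: nothing is decoded until operator* is called.
// Assumption: the input is well-formed UTF-8 (no validation is performed).
class utf8_decode_iterator {
  std::string_view rest_; // bytes that have not been consumed yet
public:
  constexpr explicit utf8_decode_iterator(std::string_view s) : rest_(s) {}
  constexpr bool done() const { return rest_.empty(); }

  // Number of bytes in the sequence that starts with this lead byte.
  static constexpr std::size_t length(unsigned char lead) {
    return lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
  }

  // Decode the code point at the current position.
  constexpr char32_t operator*() const {
    const unsigned char lead = static_cast<unsigned char>(rest_[0]);
    const std::size_t n = length(lead);
    // Keep the payload bits of the lead byte (5, 4 or 3 bits for n = 2, 3, 4).
    char32_t cp = n == 1 ? lead : lead & (0x7F >> n);
    for (std::size_t i = 1; i < n; ++i)
      cp = (cp << 6) | (static_cast<unsigned char>(rest_[i]) & 0x3F);
    return cp;
  }

  // Skip over the current sequence.
  constexpr utf8_decode_iterator& operator++() {
    rest_.remove_prefix(length(static_cast<unsigned char>(rest_[0])));
    return *this;
  }
};

// Compare the lazily decoded code points with an expected array.
template <std::size_t N>
constexpr bool decodes_to(std::string_view utf8, const char32_t (&expected)[N]) {
  utf8_decode_iterator it(utf8);
  for (std::size_t i = 0; i < N; ++i, ++it) {
    if (it.done() || *it != expected[i]) return false;
  }
  return it.done();
}

// "$£€𐍈" as UTF-8 bytes, checked entirely at compile time.
constexpr char32_t expected_code_points[] = {0x24, 0xA3, 0x20AC, 0x10348};
static_assert(decodes_to("\x24\xC2\xA3\xE2\x82\xAC\xF0\x90\x8D\x88",
                         expected_code_points));
```

Unlike the std::ranges::equal example above, this goes through std::string_view, so the terminating null of the literal is not part of the decoded sequence.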
