Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add mappings supporting shared memory #1202

Closed
wants to merge 13 commits into from

Conversation

dbwiddis
Copy link
Contributor

Resolves #1113. This unifies these types across all *nix APIs, retaining the native type casing per @twall comments in #191.

One consideration is the duplication of the current public definition in the Xattr class. Xattr could be written to extend LibCAPI and those mappings removed, if desired; however, I thought it best to leave them (and the many private declarations) alone for now.

@dbwiddis
Copy link
Contributor Author

The test failure is not related to this change. My test assumes no change in available space on a filesystem but looks like it's failing in a race condition. I'll handle that in a separate PR.
assertEquals(fstore.getUsableSpace(), vfs.f_bavail.longValue() * vfs.f_bsize.longValue());

@dbwiddis
Copy link
Contributor Author

I've completed the mappings and test case for the shm_open and mmap and associated functions. If it's ok I'll add them here and retitle this PR to include all the changes. Not sure if you want different topics segregated into separate PRs for ease of review.

@matthiasblaesing
Copy link
Member

So I finally managed to complete my set of native builds. I found, that the assumption sizeof(off_t) == sizeof(long) mostly holds true. Mostly in this case means:

  • mac OS
  • linux
  • solaris
  • aix

Windows also looks pretty solid, but it could not be that easy: on cygwin32 I found size_of(off_t) == 8 and size_of(long) == 4. But I think we can safely ignore that.

The only thing I don't agree with is the explanation around _FILE_OFFSET_BITS. When you compile a 32 bit programm with _FILE_OFFSET_BITS=64 header magic happens. The function names are mapped to their 64bit counter parts and the datastructures are similarly mapped. So my take on this is, that for example if you resolve mmap on 32 bit, you will always get the function that takes a 32bit off_t. If you compile _FILE_OFFSET_BITS=64, you will get mmap64 under the hood.

See this sample:

#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>

int main() {
    printf("size_of(off_t): %d\n", sizeof(off_t));
    FILE* fd = fopen("test2.c", "r");
    mmap(NULL, 1, PROT_READ, MAP_SHARED, (int) fd, 0);
    fclose(fd);
}
matthias@athena:~/tmp/datatype-size-check$ nm test2_1  | egrep "fopen|mmap|fclose"
         U fclose@@GLIBC_2.1
         U fopen@@GLIBC_2.1
         U mmap@@GLIBC_2.0
matthias@athena:~/tmp/datatype-size-check$ gcc -D_FILE_OFFSET_BITS=64 -m32 -o test2_1 test2_1.c 
matthias@athena:~/tmp/datatype-size-check$ nm test2_1  | egrep "fopen|mmap|fclose"
         U fclose@@GLIBC_2.1
         U fopen64@@GLIBC_2.1
         U mmap64@@GLIBC_2.1
matthias@athena:~/tmp/datatype-size-check$ 

You can see, the different imports used.

@dbwiddis
Copy link
Contributor Author

dbwiddis commented Jun 3, 2020

I don't disagree with anything you've stated, but in most cases both mmap and mmap64 are just wrappers that eventually call mmap2 (as long as mmap2 exists) after bit-shifting by page size. The mmap call does this:
MMAP_CALL (mmap2, addr, len, prot, flags, fd, offset / (uint32_t) MMAP2_PAGE_UNIT)

And mmap64 does this:
MMAP_CALL (mmap2, addr, len, prot, flags, fd, (off_t) (offset / MMAP2_PAGE_UNIT))

In the case where mmap2 doesn't exist, mmap64 just does a simple cast, checks for overflow which would happen if the 64-bit offset doesn't match the 32-bit cast, and calls mmap with the 32-bit cast value.

So the mmap call results only depend on the bit-width of off_t, and that bit-width is (usually) either a long or explicitly (header magic) mapped to the long long off64_t`.

For fopen64:

The fopen64() function is identical to the fopen() function except that the underlying file descriptor is created with the O_LARGEFILE flag set.

I'm open to improving the javadoc to be more explicit on when a 64-bit value is needed (in addition to the _FILE_OFFSET_BITS define, such as with cygwin), but ultimately it's a type definition that requires the user to detect and handle the use cases. A third-party library may be compiled with that flag set and behave differently than the kernel, so we have to permit the user to define off_t differently for whichever dll they are working with. And by differently, I primarily mean "use 64-bit width when appropriate".

@matthiasblaesing
Copy link
Member

A third-party library may be compiled with that flag set and behave differently than the kernel, so we have to permit the user to define off_t differently for whichever dll they are working with.

I don't get this. From my experiment above there is:

  • mmap - taking an 32bit offset
  • mmap64 - taking an 64bit offset

These two are the names of the functions at runtime. At compile time it is always mmap. The important part is, that for us the name at runtime is relevant.

So why is a third party library relevant here? Is really someone stupid enough to expose off_t outside the libc definitions? I'd like to see a sample.

@dbwiddis
Copy link
Contributor Author

dbwiddis commented Jun 3, 2020

From my experiment above there is:

  • mmap - taking an 32bit offset
  • mmap64 - taking an 64bit offset

Yes, that is correct. And both functions are wrappers that divide offset by page size and call the same function mmap2, which takes an off_t offset (in pages).

So we can map LibC.INSTANCE.mmap() using a correctly sized off_t and it will transparently call the correct function with identical results; we don't need to know the actual implementation. The only concern is having an incorrect bit width for an argument.

Is really someone stupid enough to expose off_t outside the libc definitions? I'd like to see a sample.

One example is zlib. This is less a problem because they expose a zlibCompileFlags() function with a bitmask identifying the off_t size, among other information.

This forum thread indicates libcdio also uses off_t.

@dbwiddis
Copy link
Contributor Author

dbwiddis commented Jun 4, 2020

my set of native builds. ... aix

As a complete aside, I've been searching for a long time for an AIX machine to develop on. Is this a VM/cloud environment?

@dbwiddis
Copy link
Contributor Author

dbwiddis commented Jun 4, 2020

After thinking about it, I believe I now understand why you are pointing out the mmap vs. mmap64 difference: to point out that the presence of the two functions can determine which size off_t to use.

This doesn't work for mmap() itself because even when mmap64() is the correct function to use, mmap() redirects to it, so userspace code only needs to call one function, and change the size of the argument appropriately.

I think, however, that it may be true that mmap64() never exists when the 32-bit call is appropriate. So it would in theory be "safe" in LibC to:

  1. map both mmap and mmap64 functions in JNA, using a NativeLong in the former and long in the latter. Do the same for fopen/fopen64, fseeko/fseeko64, ftello/ftello64, and any other code taking an off_t argument. .
  2. instruct users (or write a wrapper function) to always attempt the 64-bit (e.g., mmap64) first. If it fails with an UnsatisfiedLinkError (the function won't exist) then reassign the offset argument to a 32-bit value and call mmap.

@matthiasblaesing
Copy link
Member

I think we are pretty aligned - I suggest this approach:

  1. I would keep off_t as an integer with a width of Native#LONG_SIZE (I would not document this - we found, that it works on all known sensible platforms, but who knows, but keep it an implementation detail)
  2. Map the non *64 variants with the defined off_t and the *64 variants with java longs.
  3. Introduce Util mappings, that check the size of Native#LONG_SIZE (again implementation detail), if it is 64 bit, wrap the long into the off_t and we are done. If it is 32 bit, check if the *64 bit variant function can be called, if it works, we are done again (whether this works should be cached), else check the value if it exceeds 32 bit. If it fits into 32 bit, cast and call, if not, throw an IllegalArgument exception.

I tried a different approach (attached as maven project):
TestBindingIO.zip

This does not mapp sizte_t or ssize_t or off_t, but it illustrates the approach.

@matthiasblaesing
Copy link
Member

aix

my set of native builds. ... aix

As a complete aside, I've been searching for a long time for an AIX machine to develop on. Is this a VM/cloud environment?

http://www.polarhome.com/

I use the AIX offering to build the JNA native library, but not much else. I found the founder/owner/supporter to be very friendly and the required setup "donation" is totally fair. But it is not some commercial cloud provider, so this might not be what you are looking for.

@dbwiddis
Copy link
Contributor Author

dbwiddis commented Jun 5, 2020

I would keep off_t as an integer with a width of Native#LONG_SIZE

If we are going this route, I would prefer to not map off_t at all, and just use a NativeLong in the expressions that would require it.

if it is 64 bit, wrap the long into the off_t

I'm not sure I understand what that means.

http://www.polarhome.com/

OK, I tried going that route and my PayPal was returned and no response to my email. It may be that I needed to get the initial free account first and then pay for the upgrade so I'll try again. I was skeptical whether that site was still active!

@matthiasblaesing
Copy link
Member

If we expose the the native functions directly (without a java adaption layer), we should not use NativeLong for the type. We don't gain anything, but loose the option to later "fix" wrong assumptions about the size of off_t. If off_t is a type separate from NativeLong we are free to for example catch to large value in the constructor if we need to for example.

@dbwiddis
Copy link
Contributor Author

dbwiddis commented Jun 5, 2020

OK. I'm fine removing details of the assumption but I would like to keep the boolean to force the 64-bit, and the @see references to the posix standards documenting off_t.

I'll see what I can work up with your function example.

@dbwiddis
Copy link
Contributor Author

dbwiddis commented Jun 8, 2020

Added some more mappings.

Not controversial:

Still open question on handling off_t. I did carefully consider your suggestion and took parts of it but went a slightly different direction. My reasoning:

  • We should provide an off_t with a sensible default that works on C library and offers the ability to override it if the user needs to (e.g., libcdio and zlib functions). I think my mapping meets this goal.
  • We only need to test one of the xxxx64 methods to determine whether off_t should be 64.
  • We don't need alternate versions of most functions. Internal C implementation really only has one function with an alias from the non-64 to the 64 version, which is really only "interesting" in the case of mmap. The fseeko and ftello functions provide the non-o versions that are always 64-bit and the arguments are the same in other cases, requiring no user knowledge of the internals.
  • I did use your wrapper idea in a LibCUtil class for mmap, allowing long arguments for length and offset, although that method seems more of convenience than otherwise.

@dbwiddis dbwiddis changed the title Add size_t, ssize_t, and off_t to LibCAPI Add mappings supporting shared memory Jun 8, 2020
@dbwiddis
Copy link
Contributor Author

dbwiddis commented Jun 8, 2020

OK, ftruncate also uses off_t and I could add a wrapper for it as well, but will await your feedback on the latest submission first.

contrib/platform/src/com/sun/jna/platform/linux/LibC.java Outdated Show resolved Hide resolved
try {
LIBC.getFunction("mmap64", Function.THROW_LAST_ERROR);
// on 32-bit, mmap64 only exists when off_t is 64-bit
size = 8;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks dangerous and wrong.

Please show evidence, that the runtime off_t is 64bit, if mmap64 is present. From my perspective that assumptions makes no sense. A sensible implementation of a libc must be able to run binaries compiled with and without _FILE_OFFSET_BITS=64. At lease I hope, that no one made this really part of the ABI. So if LFS is not part of the ABI, all the magic must happen at the preprocessor level. And that fits the observations.

If you redefine off_t it (at that is what happens when you change the define at compile time), it does not change the runtime type. The whole header magic relies on two things:

  • remap the types involved (off_t is mapped to off64_t)
  • remap the functions called (64bit file access on 32bit means open is mapped to open64, mmap is mapped to mmap64)

But this applies at compile time, the runtime function remains on 32bit to be mmap taking a 32bit off_t and mmap64 taking a 64bit wide value. At least that is my conclusion from reviewing the imported symbols of the binaries created with -D_FILE_OFFSET_BITS=64 and without it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I agree with you. Documenting my deep dive of the code here for future reference:

Consistently across most *nix, __off_t is mapped to long and __off64_t to long long. These are mapped to off_t in mman.h:

#ifndef __off_t_defined
# ifndef __USE_FILE_OFFSET64
typedef __off_t off_t;
# else
typedef __off64_t off_t;
# endif
# define __off_t_defined
#endif

So we only really care about the 32-bit OS case detecting 64-bit off_t, which happens only in the case of __USE_FILE_OFFSET64.

Later, the mmap mapping also uses __USE_FILE_OFFSET64, which actually uses the base type mapping (__off_t or __off64_t) without reference to the userspace off_t:

#ifndef __USE_FILE_OFFSET64
extern void *mmap (void *__addr, size_t __len, int __prot,
                   int __flags, int __fd, __off_t __offset) __THROW;
#else
# ifdef __REDIRECT_NTH
extern void * __REDIRECT_NTH (mmap,
                              (void *__addr, size_t __len, int __prot,
                               int __flags, int __fd, __off64_t __offset),
                              mmap64);
# else
#  define mmap mmap64
# endif
#endif
#ifdef __USE_LARGEFILE64
extern void *mmap64 (void *__addr, size_t __len, int __prot,
                     int __flags, int __fd, __off64_t __offset) __THROW;
#endif

From this latter mapping we see mmap64 only exists if either __USE_FILE_OFFSET64 (corresponding directly to the off_t bit width we care about) or when __USE_LARGEFILE64 is defined.

When __USE_FILE_OFFSET64 is defined we know from above that off_t is 64-bit. The only case of concern would be if __USE_LARGEFILE64 is defined (giving us mmap64) but __USE_FILE_OFFSET64 is not (giving us 32-bit off_t).

Two things can cause __USE_LARGEFILE64 to be defined. See features.h.

One way is the user-controled -D_FILE_OFFSET_BITS = 64 which causes __USE_FILE_OFFSET64 to be defined at line 363:

#if defined _FILE_OFFSET_BITS && _FILE_OFFSET_BITS == 64
# define __USE_FILE_OFFSET64	1
#endif

and later at line 431 we see this define causes __USE_LARGEFILE_64 to be defined.

/* If we don't have __REDIRECT, prototypes will be missing if
   __USE_FILE_OFFSET64 but not __USE_LARGEFILE[64]. */
# if defined __USE_FILE_OFFSET64 && !defined __REDIRECT
#  define __USE_LARGEFILE	1
#  define __USE_LARGEFILE64	1
# endif

So we're safe there. The other way to get __USE_LARGEFILE64 defined is at line 359:

#ifdef _LARGEFILE64_SOURCE
# define __USE_LARGEFILE64	1
#endif

Here, the top of the file documents the LFS functionality:
_LARGEFILE64_SOURCE Additional functionality from LFS for large files.
and it is defined only if the user specifies _GNU_SOURCE (see line 190):

/* If _GNU_SOURCE was defined by the user, turn on all the other features.  */
#ifdef _GNU_SOURCE
... other defines ...
# undef	 _LARGEFILE64_SOURCE
# define _LARGEFILE64_SOURCE	1
... other defines ...
#endif

All the above is documented here, noting:

Macro: _LARGEFILE64_SOURCE
If you define this macro an additional set of functions is made available

and

Macro: _FILE_OFFSET_BITS
This macro determines which file system interface shall be used, one replacing the other. Whereas _LARGEFILE64_SOURCE makes the 64 bit interface available as an additional interface, _FILE_OFFSET_BITS allows the 64 bit interface to replace the old interface.

So, this does indicate a case where my code would fail:

  1. a user did NOT specify -D_FILE_OFFSET_BITS=64 at C library compile time
  2. a user did specify either -D_LARGEFILE64_SOURCE or -D_GNU_SOURCE in C library compilation
  3. that user then chose to call mmap and not mmap64.

And here, we see that -D_GNU_SOURCE is a relatively common compile-time define (it's on by default in g++).

So, I guess we are back to just using Native.LONG_SIZE and providing the user the option to specify 64 bit, and providing the "safe" wrapper functions for mmap and ftruncate that will attempt the 64-bit version with a 64-bit arg, to maximize portability.

@dbwiddis dbwiddis force-pushed the sizet branch 2 times, most recently from 79c1242 to ecb4d66 Compare June 14, 2020 22:36
@dbwiddis
Copy link
Contributor Author

OK, I believe this is ready for final review.

@matthiasblaesing
Copy link
Member

Thank you for taking the deep dive - nice summary! One last nitpick: I would remove off_t. As it is unused and has some ******* (not sure what to place here) semantics I think it is save not to map it.

public void testMmapToShm() throws IOException {
// Get a suitably random filename of the form "/somename"
File f = File.createTempFile("mmapToShm", "test");
String share = "/" + f.getName();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a file, but a virtual object (at least that is my reading from the manpage). It is possible to map a file, but that is open with mmap, not shm_open.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. I was just leveraging the createTempFile() code to generate a somewhat random filename (e.g., mapToShm192837461234test). The file f would actually briefly exist just long enough to get its name, and then be deleted. What I really need is just a name "/something" for the virtual file. I'm dual-purposing this name as also the "text"/bytes to write/read from the share as part of the test, and I thought it would be more useful for that text to be somewhat randomly generated.

I could just use Random to generate some numbers (or dig into createTempFile() source code to see what it uses) but since this is just test code, I took the quick-and-dirty route.

@dbwiddis
Copy link
Contributor Author

I would remove off_t. As it is unused ...

I think I agree here. Bouncing around to see how everyone else handled it, I got the general impression that everyone writing portable apps avoids using it, and I thought of adding that to the javadoc for it ("hey, here's a mapping but don't use it"). I'll add some more detail to the Util javadoc to explain that its purpose is to work around off_t, in case someone wants to add more mappings in the future they'll know why it exists.

@matthiasblaesing
Copy link
Member

Merged via d471b15.

Thank you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

size_t support?
2 participants