complyue/jdfs

JDFS

Just Data FileSystem - JDFS is a networked userspace filesystem that offloads every responsibility beyond plain data availability and consistency (access control, for example). This purpose has a few implications, including:

  • It's highly vulnerable if exposed to untrusted environments. When access must cross trust boundaries, some other means, e.g. SSH tunneling or a VPN, should be used to guard the exposed mountpoints.
  • Files and directories on the jdfs host's local filesystem are exposed to jdfc with owner identity mapped: files owned by the uid/gid running the jdfs process appear at jdfc as if owned by the uid/gid that mounted the JDFS mountpoint, and file creation, reading, writing, and deletion all follow this proxy relationship.
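
The owner mapping described above can be pictured with a small sketch. This is not JDFS code: `map_owner` and its parameters are hypothetical names, and passing unmatched ids through unchanged is an assumption, since the text only specifies what happens to files owned by the uid/gid running jdfs.

```python
from typing import Tuple


def map_owner(
    file_uid: int, file_gid: int,
    server_uid: int, server_gid: int,
    mount_uid: int, mount_gid: int,
) -> Tuple[int, int]:
    """Return the (uid, gid) a client would see for a server-side file.

    Ids matching the identity running the server process are presented
    as the identity that mounted the filesystem on the client.
    ASSUMPTION: any other id is passed through unchanged.
    """
    uid = mount_uid if file_uid == server_uid else file_uid
    gid = mount_gid if file_gid == server_gid else file_gid
    return uid, gid


# A file owned by the server identity (1001:1001) shows up as owned by
# the mounting identity (501:20) on the client side.
print(map_owner(1001, 1001, 1001, 1001, 501, 20))
```

The mapping is per field, so a file whose uid matches the server identity but whose gid does not would have only its uid rewritten in this sketch.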

Deployed on its own (1 jdfs <=> n jdfc), JDFS seeks to replace NFS in the many HPC scenarios where NFS sucks.

But the main purpose of JDFS is to contribute data-focused, performance-critical parts (i.e. components at various granularities, the coarsest being jdfs, the service/server, and jdfc, the consumer/client) to analytical solutions (e.g. a homegrown array database), with ease.

In my opinion, what’s going to happen over the next five years is that everyone is going to move from business intelligence to data science, and this data will be a sea change from what I’ll call stupid analytics, to what I’ll call smart analytics, which is correlations, data clustering, predictive modeling, data mining, Bayes classification.

All of these words mean complex analytics. All that stuff is defined on arrays, and none of it is in SQL. So the world will move to smart analytics from stupid analytics, and that’s where we are.

—— Michael Stonebraker 2014

In contrast to NFS, the JDFS server is stateful. A jdfs process basically proxies all file operations on behalf of the jdfc:

  • fsync
    • always mapped 1 to 1
  • open/close
    • mapped 1 to 1 from jdfc on Linux
    • forged by osxfuse from jdfc on macOS
  • read/write/mmap
    • forged by all FUSE kernels with writeback cache enabled

Any new connection is treated by the jdfs as a fresh new mount: a fresh server process is started to proxy all operations from the connecting jdfc.

All server-side state, including resources held from the OS's perspective, is naturally released because the jdfs process simply exits once the underlying JDFS connection is disconnected.
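
The process-per-connection lifecycle can be illustrated with a minimal, POSIX-only sketch. It is not the JDFS implementation or wire protocol: `serve_connection`, `demo`, the one-word "read" request, and the use of `socket.socketpair()` in place of a real network connection are all assumptions made for illustration. The point is only that the worker does no explicit cleanup; the open file descriptor is released when the process exits.

```python
import os
import socket
import tempfile
from typing import Tuple


def serve_connection(conn: socket.socket, path: str) -> None:
    """Hypothetical per-connection worker; never returns."""
    fd = os.open(path, os.O_RDONLY)  # server-side state tied to this connection
    while True:
        request = conn.recv(1024)
        if not request:  # empty read: the peer disconnected
            break
        conn.sendall(os.pread(fd, 16, 0))  # answer with the first bytes of the file
    # No explicit close of fd: exiting the process releases everything.
    os._exit(0)


def demo() -> Tuple[bytes, int]:
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"hello jdfs")
        path = f.name
    client, server = socket.socketpair()  # stand-in for a network connection
    pid = os.fork()
    if pid == 0:  # child: the fresh server process for this connection
        client.close()
        serve_connection(server, path)
    server.close()  # parent keeps only the client end
    client.sendall(b"read")
    data = client.recv(1024)
    client.close()  # disconnect; the worker sees EOF and exits
    _, status = os.waitpid(pid, 0)
    os.unlink(path)
    return data, os.WEXITSTATUS(status)


if __name__ == "__main__":
    print(demo())
```

Tying state to the lifetime of a process means a crashed or vanished client cannot leak server-side handles: once the connection drops, the operating system reclaims them along with the process.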