marcelmay/hfsa

Hadoop FSImage Analyzer (HFSA)

Intro

Hadoop FSImage Analyzer (HFSA) complements the Apache Hadoop 'hadoop-hdfs' tool with the following HDFS fsimage support:

  • tooling that summarizes the HDFS files and directories per user and group (answering 'who has how many/big/small files...')
  • a library for fast, partly multithreaded fsimage processing, with a file-, directory- and symlink-aware visitor API derived from the Apache HDFS FSImageLoader
  • a helper FSImage file generator for creating synthetic test data

Example usage for library

See FSImageLoaderTest.java for example usage.

The following lines visit all directory-, file- and symlink inodes:

RandomAccessFile file = new RandomAccessFile("src/test/resources/fsi_small.img", "r");

// Load file into memory
FsImageData fsimageData = new FsImageLoader.Builder()
    .parallel().build()
    .load(file);

// Traverse file hierarchy
new FsVisitor.Builder()
    .parallel()
    .visit(fsimageData, new FsVisitor() {
        @Override
        public void onFile(FsImageProto.INodeSection.INode inode, String path) {
            // Do something
            String fileName = ("/".equals(path) ? path : path + '/') + inode.getName().toStringUtf8();
            System.out.println(fileName);
            FsImageProto.INodeSection.INodeFile f = inode.getFile();
            // Resolve owner, group and mode from the loaded image data
            PermissionStatus p = fsimageData.getPermissionStatus(f.getPermission());
            ...
        }
             
        @Override
        public void onDirectory(FsImageProto.INodeSection.INode inode, String path) {
            // Do something
            final String dirName = ("/".equals(path) ? path : path + '/') + inode.getName().toStringUtf8();
            System.out.println("Directory : " + dirName);
            
            FsImageProto.INodeSection.INodeDirectory d = inode.getDirectory();
            // Resolve owner, group and mode from the loaded image data
            PermissionStatus p = fsimageData.getPermissionStatus(d.getPermission());
            ...
        }
             
        @Override
        public void onSymLink(FsImageProto.INodeSection.INode inode, String path) {
            // Do something
        }
    }
);
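
Because the visitor above is built with .parallel(), its callbacks may run on several threads at once, so any state they update should be thread-safe. The following is a minimal sketch of a per-owner summary (the 'who has how many/big files' question) that could be fed from onFile. It does not use the HFSA API: OwnerSummary, record, fileCount, totalBytes and join are hypothetical names, and the parallel stream in main only stands in for a parallel fsimage traversal. In real use, the owner would presumably come from the resolved PermissionStatus (Hadoop's getUserName()) and the size from the file inode.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.IntStream;

/** Hypothetical helper, not part of HFSA: thread-safe per-owner file totals. */
final class OwnerSummary {
    private final Map<String, LongAdder> fileCounts = new ConcurrentHashMap<>();
    private final Map<String, LongAdder> byteTotals = new ConcurrentHashMap<>();

    /** Safe to call concurrently, for example from a parallel visitor callback. */
    void record(String owner, long sizeInBytes) {
        fileCounts.computeIfAbsent(owner, k -> new LongAdder()).increment();
        byteTotals.computeIfAbsent(owner, k -> new LongAdder()).add(sizeInBytes);
    }

    /** Number of files recorded for the owner, 0 if the owner is unknown. */
    long fileCount(String owner) {
        LongAdder adder = fileCounts.get(owner);
        return adder == null ? 0L : adder.sum();
    }

    /** Sum of recorded file sizes for the owner, 0 if the owner is unknown. */
    long totalBytes(String owner) {
        LongAdder adder = byteTotals.get(owner);
        return adder == null ? 0L : adder.sum();
    }

    /** Same path-joining rule as the visitor example: the root needs no extra separator. */
    static String join(String parentPath, String name) {
        return ("/".equals(parentPath) ? parentPath : parentPath + '/') + name;
    }

    public static void main(String[] args) {
        OwnerSummary summary = new OwnerSummary();

        // Stand-in for a parallel traversal: 1000 synthetic files of 10 bytes each,
        // alternating between two made-up owners.
        IntStream.range(0, 1000).parallel()
                .forEach(i -> summary.record(i % 2 == 0 ? "alice" : "bob", 10L));

        for (String owner : new String[] {"alice", "bob"}) {
            System.out.println(owner + ": " + summary.fileCount(owner) + " files, "
                    + summary.totalBytes(owner) + " bytes");
        }
        System.out.println(join("/", "tmp") + " " + join("/tmp", "data.csv"));
    }
}
```

ConcurrentHashMap with LongAdder values is used here because it keeps updates cheap under contention; a plain HashMap with long values would risk lost updates once callbacks run in parallel.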

Requirements

  • JDK 1.8 (11 recommended for running)
  • Hadoop 2.x or 3.x fsimage
    Note: HFSA library version 1.2+ depends on Hadoop 3.x but still works with Hadoop 2.x fsimages
  • Maven 3.9.x (for building from source)

Building

mvn clean install

Roadmap

  • Configurable strategy for fast-but-memory-intensive or slow-but-memory-friendly fsimage loading
  • Report and config options for topk/sorting/selection/...

License

HFSA is released under the Apache 2.0 license.

Copyright 2017-2023 Marcel May and project contributors

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Contains work derived from Apache Hadoop.