HDFS-17528. FsImageValidation: set txid when saving a new image #6828

Open
wants to merge 5 commits into
base: trunk

szetszwo
Contributor

@szetszwo szetszwo commented May 14, 2024

Description of PR

HDFS-17528

  • When the fsimage is specified as a file and the FsImageValidation tool saves a new image (to remove inaccessible inodes), the txid is not set. As a result, the saved image has 0 as its txid.
  • When the fsimage is specified as a directory, the txid is set. However, saving then fails with an NPE because the NameNode metrics are uninitialized (although FsImageValidation does not use the metrics).
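
To make the first bullet concrete, here is a minimal sketch of how the txid ends up in the output file name. It assumes the name is the prefix `fsimage.ckpt_` followed by the txid zero-padded to 19 digits, which is inferred from the file names in the logs in this PR rather than copied from the Hadoop source. The class `CheckpointNameSketch` and the method `checkpointFileName` are hypothetical names used only for illustration.

```java
public class CheckpointNameSketch {
    // Hypothetical helper (not the Hadoop API): builds a checkpoint file name
    // from a txid, assuming a 19-digit zero-padded suffix as seen in the logs.
    static String checkpointFileName(long txid) {
        return String.format("fsimage.ckpt_%019d", txid);
    }

    public static void main(String[] args) {
        // An unset txid defaults to 0, so the suffix is all zeros.
        System.out.println(checkpointFileName(0L));
        // With the txid set, the suffix carries the real transaction id.
        System.out.println(checkpointFileName(23945925442L));
    }
}
```

Under this assumption, an image saved without a txid is indistinguishable by name from a genuine txid-0 image, which is why the txid needs to be set before saving.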

How was this patch tested?

Tested manually

  • before: the output file is fsimage.ckpt_0000000000000000000 (i.e. txid is 0)

2024-05-14 13:37:27,531 [main] INFO namenode.FSImageFormatProtobuf (FSImageFormatProtobuf.java:save(732)) - Saving image file .../fsimage/current/newFsImage5968764763996132609/current/fsimage.ckpt_0000000000000000000 using no compression
2024-05-14 13:37:30,522 [main] INFO namenode.FSImageFormatProtobuf (FSImageFormatProtobuf.java:save(736)) - Image file .../fsimage/current/newFsImage5968764763996132609/current/fsimage.ckpt_0000000000000000000 of size 200392059 bytes saved in 2 seconds .

  • after: the output file is fsimage.ckpt_0000000023945925442 with correct txid

2024-05-14 13:38:32,414 [main] INFO namenode.FSImage (FSImage.java:save(1223)) - save fsimage with txid=23945925442 to .../fsimage/current/newFsImage4409944859316006440
2024-05-14 13:38:32,436 [main] INFO namenode.FSImageFormatProtobuf (FSImageFormatProtobuf.java:save(732)) - Saving image file .../fsimage/current/newFsImage4409944859316006440/current/fsimage.ckpt_0000000023945925442 using no compression
2024-05-14 13:38:35,437 [main] INFO namenode.FSImageFormatProtobuf (FSImageFormatProtobuf.java:save(736)) - Image file .../fsimage/current/newFsImage4409944859316006440/current/fsimage.ckpt_0000000023945925442 of size 200392062 bytes saved in 3 seconds .

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • [NA] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • [NA] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • [NA] If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

@szetszwo
Contributor Author

@vinayakumarb , thanks a lot for reviewing this!

@szetszwo
Contributor Author

The Jenkins builds keep getting stuck for a day and then fail. Is this a known problem?

@szetszwo
Contributor Author

szetszwo commented Jun 6, 2024

The last few lines of the Jenkins build before failure:

[2024-06-03T20:14:42.551Z] cd /home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-6828/ubuntu-focal/src/hadoop-hdfs-project/hadoop-hdfs
[2024-06-03T20:14:42.551Z] /usr/bin/mvn --batch-mode -Dmaven.repo.local=/home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-6828/yetus-m2/hadoop-trunk-patch-0 -Dsurefire.rerunFailingTestsCount=2 -Pparallel-tests -P!shelltest -Pnative -Drequire.fuse -Drequire.openssl -Drequire.snappy -Drequire.valgrind -Drequire.zstd -Drequire.test.libhadoop -Pyarn-ui clean test -fae > /home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-6828/ubuntu-focal/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt 2>&1
[2024-06-05T05:43:40.181Z] wrapper script does not seem to be touching the log file in /home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-6828@tmp/durable-db261340
[2024-06-05T05:43:40.181Z] (JENKINS-48300: if on an extremely laggy filesystem, consider -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=86400)
script returned exit code -1
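
For reference, the gap between the last `mvn` line and the heartbeat warning can be computed from the two timestamps in the log above. This is only a sketch of the arithmetic; `BuildGapSketch` and `gap` are hypothetical names, not part of Jenkins or Yetus.

```java
import java.time.Duration;
import java.time.Instant;

public class BuildGapSketch {
    // Hypothetical helper: elapsed time between two ISO-8601 UTC timestamps.
    static Duration gap(String start, String end) {
        return Duration.between(Instant.parse(start), Instant.parse(end));
    }

    public static void main(String[] args) {
        // Timestamps copied from the Jenkins log lines above.
        Duration d = gap("2024-06-03T20:14:42.551Z", "2024-06-05T05:43:40.181Z");
        // prints "33h 28m"
        System.out.println(d.toHours() + "h " + d.toMinutesPart() + "m");
    }
}
```

So the unit-test step produced no further log output for roughly 33.5 hours before the wrapper gave up.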
