Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store extra files in their own directory #1289

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

bfad
Copy link

@bfad bfad commented Nov 13, 2019

Description

Currently when creating HTML documentation, extra files are stored at the web root with "file." prefixed to their name. This PR changes this practice to instead store them in their own directory (named "_file" to prevent conflicts with Ruby objects).

This brings the output more inline with the server which hosts the files under /file/ path. In fact, the server keeps the relative path to the extra file as part of the URL path. (For example, while yard's changelog is found at /file/CHANGELOG.md the getting started guide is at /file/docs/GettingStarted.md.) If this PR is accepted, I would like to work on doing something similar for HTML creation. (This might fix #1277 as all the READMEs would be listed there, and it's easy enough to set custom title attributes to help distinguish.) If all of that is successful, I would like to have a setting that allows for showing the extra file hierarchically similar to how nested Ruby objects are presented with disclosure triangles.

Completed Tasks

  • I have read the Contributing Guide.
  • The pull request is complete (implemented / written).
  • Git commits have been cleaned up (squash WIP / revert commits).
  • I wrote tests and ran bundle exec rake locally (if code is attached to PR).

@coveralls
Copy link

coveralls commented Nov 13, 2019

Coverage Status

Coverage increased (+0.0007%) to 93.439% when pulling 8fa6608 on bfad:extra-files-directory into 89ea314 on lsegal:main.

@bfad
Copy link
Author

bfad commented Nov 14, 2019

I'm not sure why GitHub is showing 5 different commits when it should just be the one.

Sorry, was missing the merge commit for some reason.

When generating HTML, store extra files in their own directory ("_file")
instead of prefixing their names with "file." and writing them to the
web root.
@lsegal
Copy link
Owner

lsegal commented Nov 26, 2019

Hi @bfad,

I'm unclear on the purpose of this change. As is, this is a pretty large breaking change, since this would change URL structure of any statically deployed documentation, which is pretty significant and would need a good justification. I get that there's an attempt to bring static generation more in line with yard server, but the URL structure would never be 100% compatible since yard server removes the suffix .html extensions, so without rewrites this doesn't fully achieve the result (and if URL rewrites are okay, there are rewrite rules that can achieve server parity without code changes).

If the intent is to fix #1277, there are probably other ways to do that without changing the naming scheme on disk. Presumably this could be done with similar custom file metadata attributes to @title, like @filename to rename a file when serializing.

@bfad
Copy link
Author

bfad commented Nov 26, 2019

@lsegal thanks for looking this over.

The ultimate end goal is the ability to have a hierarchical structure for these extra documents. The idea being that I would use globing and the file hierarchy would be replicated, not just at the file-system level, but also with disclosure triangles in the HTML sidebar (just like namespace hierarchies).

This could still be done with something like a @docpath attribute, but if I have already done the work to organize the documents into folders, then it seems onerous to have me spell out that hierarchy for each file in a metadata attribute. At the same time, I really respect your hesitation to break existing URLs. If you don't think this is the right path to go down, I can investigate alternatives.

@lsegal
Copy link
Owner

lsegal commented Nov 26, 2019

Backing up a bit, it seems like we're solving for two separate problems.

The location that YARD uses to serialize files on disk, as well as the expected URL structure by yard server, are actually quite a bit disconnected from the way extra files are rendered in templates, in other words, whether they are displayed flat or in hierarchy, what title is used to display the links, etc.

YARD actually stores the full location to each extra file alongside the file object inside the filename attribute; .name is sanitized and stripped but the filename is untouched. Even though those aren't used in serialization, they could certainly be used in the default (or custom) HTML generation template, if there was value to copying the hierarchy when displaying extra files. I think there is still a question of whether this is useful, or what heuristic YARD should use when doing so (for ex: YARD's use of doc/X shouldn't display as a > doc/ expandable item, there's probably a basic nesting heuristic that could be applied to avoid this), but this is certainly solvable with the information already available in YARD's cache database and only requires template changes (also here).

Now, duplicate/conflicting filenames from subdirs is a separate problem that still exists even if you solve the above. That said, the location that YARD chooses to serialize files to should generally not be a user concern and should typically be hidden from the user, so realistically, there are a few options to disambiguate conflicting filenames that could work without a breaking change or even introducing an attribute:

  1. Perform a duplicate check against extra files prior to serializing and rewrite the .name attribute to use .filename on the dupes (and only the dupes) if any are found. This is backward compatible since the old behavior will be to continue using flattened file paths. This also doesn't change anything in the serializer itself and can be done fully inside of the template. Alternatively you could do this dupe checking inside of YARD::CLI::Yardoc. The serializer should be able to handle this case out of the box.
  2. Optionally you could add an extra attribute to ExtraFileObject (like .alt_path) that is used by .path if present in order to override the serialization location. This is less nice because of all the hoops you'd need to jump through and extra complexity, but it would retain all original state information (.name won't be rewritten out from under you).

Hope this helps / makes sense.

@lsegal
Copy link
Owner

lsegal commented Nov 26, 2019

Sidenote: these can and should be two separable and independent PRs. Specifically, the latter issue could be turned into a PR and accepted to fix the underlying issue of dupes without needing to change how files render in the template.

@lsegal
Copy link
Owner

lsegal commented Nov 26, 2019

Here's a first-attempt patch for supporting dupe files:

diff --git a/lib/yard/cli/yardoc.rb b/lib/yard/cli/yardoc.rb
index 5332196f..8a6804e1 100644
--- a/lib/yard/cli/yardoc.rb
+++ b/lib/yard/cli/yardoc.rb
@@ -222,6 +222,7 @@ module YARD
         @save_yardoc = true
         @has_markup = false
         @fail_on_warning = false
+        @extra_file_dupes = {}

         if defined?(::Encoding) && ::Encoding.respond_to?(:default_external=)
           utf8 = ::Encoding.find('utf-8')
@@ -413,7 +414,17 @@ module YARD
         files.map! {|f| f.include?("*") ? Dir.glob(f) : f }.flatten!
         files.each do |file|
           if extra_file_valid?(file)
-            options.files << CodeObjects::ExtraFileObject.new(file)
+            extra_file = CodeObjects::ExtraFileObject.new(file)
+            options.files << extra_file
+
+            if @extra_file_dupes[extra_file.name]
+              extra_file.attributes[:title] ||= extra_file.filename
+              extra_file.name = extra_file.filename.
+                gsub(/\.[^.]+$/, '').
+                gsub(/\//, '_')
+            else
+              @extra_file_dupes[extra_file.name] = true
+            end
           end
         end
       end

The gsub stuff could be refactored into ExtraFileObject's name= setter to remove the extension, as well as a change to FileSystemSerializer to correctly escape / characters for ExtraFileObjects-- the latter would remove the need for the .attributes[:title] override.

@bfad
Copy link
Author

bfad commented Dec 4, 2019

What if, instead of messing with ExtraFileObject#name, we used ExtraFileObject#path for this? Patch:

diff --git a/lib/yard/code_objects/extra_file_object.rb b/lib/yard/code_objects/extra_file_object.rb
index c0f53a3e..eb9bad6f 100644
--- a/lib/yard/code_objects/extra_file_object.rb
+++ b/lib/yard/code_objects/extra_file_object.rb
@@ -8,6 +8,7 @@ module YARD::CodeObjects
     attr_accessor :filename
     attr_writer :attributes
     attr_accessor :name
+    attr_accessor :path
     # @since 0.8.3
     attr_reader :locale
 
@@ -18,6 +19,7 @@ module YARD::CodeObjects
     def initialize(filename, contents = nil)
       self.filename = filename
       self.name = File.basename(filename).gsub(/\.[^.]+$/, '')
+      self.path = filename.gsub(/\.[^.]+$/, '')
       self.attributes = SymbolHash.new(false)
       @original_contents = contents
       @parsed = false
@@ -25,8 +27,6 @@ module YARD::CodeObjects
       ensure_parsed
     end
 
-    alias path name
-
     def attributes
       ensure_parsed
       @attributes
diff --git a/templates/default/fulldoc/html/setup.rb b/templates/default/fulldoc/html/setup.rb
index 201ba63c..bd0e1954 100644
--- a/templates/default/fulldoc/html/setup.rb
+++ b/templates/default/fulldoc/html/setup.rb
@@ -7,8 +7,11 @@ def init
   return serialize_onefile if options.onefile
   generate_assets
   serialize('_index.html')
+
+  dup_file_names = {}
   options.files.each_with_index do |file, _i|
-    serialize_file(file, file.title)
+    serialize_file(file, dup_file_names[file.name] ? file.path : file.name)
+    dup_file_names[file.name] = true
   end
 
   options.delete(:objects)
@@ -60,10 +63,10 @@ end
 # @param [String] title currently unused
 #
 # @see layout#diskfile
-def serialize_file(file, title = nil) # rubocop:disable Lint/UnusedMethodArgument
+def serialize_file(file, file_name)
   options.object = Registry.root
   options.file = file
-  outfile = 'file.' + file.name + '.html'
+  outfile = 'file.' + file_name.gsub(/\//, '_') + '.html'
 
   serialize_index(options) if file == options.readme
   Templates::Engine.with_serializer(outfile, options.serializer) do

@lsegal lsegal force-pushed the master branch 2 times, most recently from 44c07a8 to 112a890 Compare January 7, 2020 04:04
@lsegal lsegal closed this Jun 20, 2020
@lsegal lsegal reopened this Jun 20, 2020
@lsegal lsegal changed the base branch from master to main June 20, 2020 18:11
@ghost
Copy link

ghost commented May 14, 2023

@lsegal @bfad Hello. If I may ask, when will this PR get merged? I have some documentation for a pet project I'm working on, and the docs have some markdown files that link to each other, but those links are broken; one of the reasons they are broken is because YARD's flat file structure is incompatible with the file/folder structure the browser expects. I'm hoping this gets merged ASAP because I released V1 of my project with broken documentation because I knew that a fix would not come soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Extra files with same file name
3 participants