Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for reporters #129

Merged
merged 32 commits into from Jul 19, 2021
Merged

Add support for reporters #129

merged 32 commits into from Jul 19, 2021

Conversation

dwightjack
Copy link
Contributor

@dwightjack dwightjack commented Mar 18, 2021

This PR adds support for Pa11y compatible reporters into Pa11y CI.

Why

Pa11y CI reports omit the violations for URLs below the error threshold.
In my project, I want to log those violations as well. Instead of modifying the core behavior, I think it's a good idea to add reporters support so that users can use their own strategies to transform, store and analyze the results.

Reporters use the same interface as pa11y reporters with some additions:

  • Every reporter method receives an additional report argument. This object is an instance of a JavaScript Map that can be used to initialize and collect data across each tested URL.
  • You can define a beforeAll(urls, report) and afterAll(urls, report) optional methods called respectively at the beginning and at the very end of the process with the following arguments:
    • urls: the URLs array defined in your config
    • report: the report object
  • The results() method receives a third option argument with the following properties:
    • config: the current URL configuration object
    • url: the current URL under test
    • urls: the URLs array defined in your config

Changes

  • added a --reporter CLI option to define a single reporter
  • added a defaults.reporters configuration array to define multiple reporters
  • when using a reporter, console output is redirected to stderr so that we can still use output redirection (pa11y-ci ... > my-file.json)

@dwightjack
Copy link
Contributor Author

After some inputs from my colleagues, I updated the PR and the description to reflect the changes (c5a67c5)

README.md Outdated Show resolved Hide resolved
Copy link
Member

@joeyciechanowicz joeyciechanowicz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fantastic! Great idea to add IMO.

A slight change to how promises are returned and this is 💯

README.md Outdated Show resolved Hide resolved
lib/helpers/resolver.js Outdated Show resolved Hide resolved
lib/helpers/reporter.js Outdated Show resolved Hide resolved
lib/pa11y-ci.js Outdated Show resolved Hide resolved
lib/helpers/reporter.js Outdated Show resolved Hide resolved
dwightjack and others added 3 commits March 22, 2021 11:47
Co-authored-by: Joey Ciechanowicz <1148637+joeyciechanowicz@users.noreply.github.com>
@aarongoldenthal
Copy link
Contributor

This is a great proposal, so after taking a look at the details I've been thinking about how I'd adapt pa11y-ci-reporter-html to leverage this interface (instead of just post-processing the JSON as it currently does), and in the process had a few thoughts and suggestions:

  • The answer to this one may depend on whether this is intended to leverage pa11y compatible reporters or represent a new unique interface for pa11y-ci reporters. From my experience, I think the differences dealing with potentially many URLs being scanned make a unique interface valuable (e.g. I know some people use pa11y-ci to scan thousands of pages in individual runs). To that end, I'd propose that the responsibility for output be left up to reporters and pa11y-ci not output to the console.
    • If people want the console output, they could simply add the applicable CLI reporter (a new pa11y-ci CLI reporter). From the work I've done for Replace chalk with kleur #121, and pa11y-ci-reporter-html, it's pretty straighforward to leverage existing pa11y reporters for a lot of the work (in these cases for CLI and HTML). Otherwise reporter could save their own data as required.
    • This is definitely different than the pa11y-reporter-* philosophy, but may make things cleaner for some of the console output questions already asked, and managing output for multiple reports. For example, in the proposed design, and admittedly I spent limited time going through the code and haven't specifically tested, it wasn't immediately obvious how multiple reporters outputting to the console would be managed (I may have missed it).
  • In the interest of separation of concerns, I'd consider leaving state management to the individual reporters, and remove the report object passed with each reporter call.
  • If the complete array of URLs is passed to beforeAll the reporter has access to it initially, so does it need to be passed with each results call? It doesn't hurt, but I couldn't see an obvious use if it was already passed at the start.
  • Could afterAll be updated to include the full JSON report from pa11y-ci? That would allow reporters to simply post-process the data if that was easier.

@joeyciechanowicz I thought I had read somewhere here or on Slack that the master plan was to integrate the reporters into pa11y, and integrate the pa11y-ci functionality into pa11y, so just wondering what that means for this PR (and maybe others)?

@dwightjack
Copy link
Contributor Author

@aarongoldenthal thanks for your feedback!

  • The answer to this one may depend on whether this is intended to leverage pa11y compatible reporters or represent a new unique interface for pa11y-ci reporters. From my experience, I think the differences dealing with potentially many URLs being scanned make a unique interface valuable (e.g. I know some people use pa11y-ci to scan thousands of pages in individual runs). To that end, I'd propose that the responsibility for output be left up to reporters and pa11y-ci not output to the console.
    • If people want the console output, they could simply add the applicable CLI reporter (a new pa11y-ci CLI reporter). From the work I've done for Replace chalk with kleur #121, and pa11y-ci-reporter-html, it's pretty straighforward to leverage existing pa11y reporters for a lot of the work (in these cases for CLI and HTML). Otherwise reporter could save their own data as required.
    • This is definitely different than the pa11y-reporter-* philosophy, but may make things cleaner for some of the console output questions already asked, and managing output for multiple reports. For example, in the proposed design, and admittedly I spent limited time going through the code and haven't specifically tested, it wasn't immediately obvious how multiple reporters outputting to the console would be managed (I may have missed it).

My initial idea was to re-use pa11y reporters interface as much as possible for consistency and to introduce fewer changes, but I understand your point.

To preserve the current behavior we can have a default reporter which outputs to the CLI in the same way pa11y-ci does now (if I am correct is what you already proposed in #121). The user can then change the reporters to say, JSON to just output to a file or JSON + HTML + CLI to output both JSON and HTML as file while still have the CLI output. I believe this is how it works in mocha (or maybe jest).

  • In the interest of separation of concerns, I'd consider leaving state management to the individual reporters, and remove the report object passed with each reporter call.

I added the reporter map so that it can be used like a sort of accumulator that is unique for every reporter instance. Since reporters are basically static modules, and supposing that in the future we introduce reporters options, this looked as a good solution to avoid sharing the same accumulator in case the same reporter is used multiple times.

Another solution, if we remove the current behavior (ie: if reporters return something log it to the console), might be to use beforeAll as a factory function returning the accumulator. In this way each reporter can set its initial state independently:

function beforeAll() {
  // do something...

 // this is then passed to results, afterAll, ... 
 return new Map()
}
  • If the complete array of URLs is passed to beforeAll the reporter has access to it initially, so does it need to be passed with each results call? It doesn't hurt, but I couldn't see an obvious use if it was already passed at the start.

In some of my experiments I used those URLs as unique key on a weakMap used as accumulator. But I get they might not be so useful.

  • Could afterAll be updated to include the full JSON report from pa11y-ci? That would allow reporters to simply post-process the data if that was easier.

You're right. It makes sense.

@aarongoldenthal
Copy link
Contributor

I had some time to throw together a quick example to test and had a few additional comments:

  • The original pa11y reporters implement an error method that passes the execution error if one occurs. I'd look at adding this so action can be taken if required on error. I suggested the places I added it, but didn't have a chance to look at test updates.
  • The reporter begin function is missing from the "Write a custom reporter" section of the README.
  • The pa11y results argument passed to the results function (I know that came from pa11y and the pa11y reporters, but would love to eliminate that duplication) contains the URL being analyzed in results.pageUrl, so wonder if that could be eliminated as a separate property of the third argument.
  • After seeing it in action I'll reiterate my thoughts on letting the reporters control their own output rather than writing their output to the console. Running will multiple reporters and multiple URLs does tend to mix everything in the console output and it's really not valid for any format except text. I think the proposal to keep the CLI output as it is now is probably the right answer, then a non-breaking change, and let any defined reporters handle their own output.

In looking at this I put together a quick example using pa11y-reporter-html to write an HTML report file for each URL (obviously not production-ready, but a quick proof of concept). In working with it I quickly wanted to be able to pass reporter-specific configuration, which I think should be considered (ideally I think of something like the jest reporter configuration). It could also be a growth item as pa11y-ci currently doesn't limit "extra" configuration parameters and passes them as shown in the example below.

Configuration file:

{
    "defaults": {
        "reporters": [
            "reporters/reporter-test.js"
        ],
        "reporterConfig": {
            "reporter-test": {
                "savePageReport": true,
                "saveSummaryReport": true
            }
        }
    },
    "urls": [
        "https://weather.com/",
        "https://domain.does.not.exist/"
    ]
}

reporters/reporter-test.js

'use strict';

const filenamifyUrl = require('filenamify-url');
const htmlReporter = require('pa11y-reporter-html');

const fs = require('fs');
const path = require('path');

const report = module.exports = {};

report.results = async (results, _, {config}) => {
	// If configuration defined, save report based on setting (default: true)
	if (config.reporterConfig &&
		config.reporterConfig['reporter-test'] &&
		!config.reporterConfig['reporter-test'].savePageReport) {
		return;
	}
	const fileName = path.join('reports', `${filenamifyUrl(results.pageUrl)}.html`);
	const htmlReport = await htmlReporter.results(results);
	fs.writeFileSync(fileName, htmlReport);
};

report.error = (error, _, {url}) => {
	// Process error, but probably not like this, instead log error to summary results
	console.error(`Error processing '${url}': ${error.message}`);
};

@dwightjack
Copy link
Contributor Author

@aarongoldenthal thanks for your feedback. Sorry I wasn't able to reply earlier.

I agree with your suggestions and will update the PR in the next few days.

I have one question about a topic you shared earlier:

In the interest of separation of concerns, I'd consider leaving state management to the individual reporters, and remove the report object passed with each reporter call.

A possible scenario could be that one reporter is used multiple times with different options. Since reporters are module functions how could we manage different instances of the reporter object?

Maybe we could add a createReport optional method returning the report initial state and attaching it to the reporter instance we generate in the buildReporter factory function?

@aarongoldenthal
Copy link
Contributor

@dwightjack That's a good question, I hadn't considered the case where one reporter was used multiple times. I assume that would only be in the case where there were specific reporter options specific to each instance. My initial thought is that the limitation that we hit is that require caches the results, so when calling a second time the cached result, i.e. the first instance, is returned. But, that cached value can be invalidated, causing require to run again and create a new instance.

I ran a quick test with the changes below against a local file and that appeared to work, although I'll admit it probably deservers a little more thought.

module.exports = function resolveReporters(reporters) {
	return [].concat(reporters).map(reporter => {
		if (typeof reporter !== 'string') {
			return undefined;
		}
		try {
			delete require.cache[require.resolve(reporter)];
			return require(reporter);
		} catch (_) {
			const localModule = path.isAbsolute(reporter) ?
				reporter : path.resolve(process.cwd(), reporter);
			if (fs.existsSync(localModule)) {
				delete require.cache[require.resolve(localModule)];
				return require(localModule);
			}
			console.error(`Unable to load reporter "${reporter}"`);
			return undefined;
		}
	}).filter(Boolean).map(reporterModule => {
		return buildReporter(reporterModule);
	});
};

@dwightjack
Copy link
Contributor Author

@aarongoldenthal I didn't think about that solution. We could use import-fresh for the task, which should be a little bit more optimized.

dwightjack and others added 3 commits April 15, 2021 16:13
Co-authored-by: Aaron Goldenthal <aarongoldenthal@users.noreply.github.com>
Co-authored-by: Aaron Goldenthal <aarongoldenthal@users.noreply.github.com>
@masi
Copy link

masi commented Apr 15, 2021

A reporter option would be awseome!

@dwightjack
Copy link
Contributor Author

@aarongoldenthal @joeyciechanowicz Sorry for the delay in updating this PR, I've pushed the changes we discussed earlier (hope I didn't miss any point):

  • extract all logs to a dedicated reporter under ./libs/reporters/cli which is loaded by default.
  • updated the reporters' methods signature (refer to the updated "Write a custom reporter" section in the README for details)
  • removed the buildReporters logic. If a reporter wants to log something to the console it should implement its own logic (this will make pa11y-ci reporters behave differently from pa11y reporters).

For the scenario with the same reporter used multiple times (and with custom options), I thought that maybe exporting as default export a factory function would have been the more intuitive option (instead of using import-fresh). In this way, give the following 'file-reporter' reporter module:

module.exports = (options) => {

  const filename = options.filename

  return {
    afterAll(report) {
      fs.writeFile(filename, ...);
    }
  }
}

we could configure it to write two files:

// configuration file
{
    "defaults": {
        "reporters": [
            "pa11y-ci/libs/reporters/cli", // preserve the CLI output
            ["file-reporter", { "fileName": "./my-report.json" }],
            ["file-reporter", { "fileName": "./my-other-report.json" }]
        ]
    },
    "urls": [
        ...
    ]
}

The array syntax is the same used by Jest, so it should be enough familiar to developers.

@joeyciechanowicz
Copy link
Member

I thought I had read somewhere here or on Slack that the master plan was to integrate the reporters into pa11y, and integrate the pa11y-ci functionality into pa11y, so just wondering what that means for this PR (and maybe others)?

@aarongoldenthal Apologies for the slow reply. Yes, the aim is to merge the reporters into pa11y. This is so we we can reduce the number or reporters and thus the overhead in releasing future versions. I'm not sure if pa11y-ci will be merged as well. pa11y will continue to support reporters in the same way it does currently, just we'll have some special paths for the internal reporters. This work is 100% useful, as provides a way of handling specialised reporter for CI (I'm dreaming of a Grafana one 😁 ).

exporting as default export a factory function would have been the more intuitive option

That gets my vote!

Regarding the issue of console output, I think that pa11y-ci should not log anything and it's up to a console-reporter. Though adding a debug flag to it which logged runtime information would useful (which would then align with pa11y).

@aarongoldenthal
Copy link
Contributor

@dwightjack This is great. After using it for a week and throwing together a couple of example reporters it's really straightforward, and I agree the reporter configuration option is very familiar and easy to work with. I did have some other observations (which really are intended not to be critical but to show how useful this is). I hoped to have some code for some of these, but got sidetracked working #132 for GitHub Actions.

JSON

With the reporter interface it seems like the JSON option could/should be converted to a reporter. At the moment you can get yourself into some messy output with reporters and --json specified since they're treated somewhat separately, but the JSON is written to the console. If everyone agrees that's the right answer I'm happy to help with that (or not). One of the example reporters I created was for JSON, and with no configuration outputs to the console like the current implementation, but there's a fileName option to save directly to a file.

The other questions with that is what to do with the -j/--json CLI option. If it were me I'd remove it and only allow it to be specified as a reporter, but I'm not sure if there are concerns about a breaking change (at the moment with the defaults it doesn't seem like it's breaking, but maybe I missed something). The option could always be left and just set json as the reporter.

Built-In Modules

Should there be an abbreviated name syntax for the included reporters (e.g. cli, json)? Unless I missed it, it looks like the only way to reference them is to point to the file path directly. @joeyciechanowicz, I know you were looking at integrating the reporters into pa11y, so I wasn't sure if this scenario had already been considered and there was a plan (it seems like that approach should be consistent).

Tests

I didn't see an issue running multiple reporters (and at one point I was running 3, two of which were instances of the same reporter with different configuration), but I didn't see a test to check that all reporters are called if multiple are specified, which seemed valuable.

Reporter Interface

The pa11y reporter interface include a semver formatted supports property to validate compatibility, which got me thinking about it here. I can see pros and cons, especially if pa11y reporters are used here, but thought I'd throw the idea out.

Default Configuration

In digging through a lot of the pa11y-ci code earlier for the other issue, it seems like there's some duplication in setting defaults between bin/pa11y-ci.js and lib/pa11y-ci.js/lib/helpers/defaults.js that could be consolidated (nothing broken).

@dwightjack
Copy link
Contributor Author

I add this comment for reference. @aarongoldenthal implemented most of the ideas from the previous comment ( #129 (comment)) in dwightjack#1.

To reply to some of the remaining points:

Tests

I didn't see an issue running multiple reporters (and at one point I was running 3, two of which were instances of the same reporter with different configuration), but I didn't see a test to check that all reporters are called if multiple are specified, which seemed valuable.

I'm going to implement those tests in the next few days. I am not sure if they would fit better as integration of unit tests. I will investigate that.

Reporter Interface

The pa11y reporter interface include a semver formatted supports property to validate compatibility, which got me thinking about it here. I can see pros and cons, especially if pa11y reporters are used here, but thought I'd throw the idea out.

Adding a new property is not a problem. I am wondering if it is really useful since we can use peerDependency in package.json for that. 🤔

in Default Configuration

In digging through a lot of the pa11y-ci code earlier for the other issue, it seems like there's some duplication in setting defaults between bin/pa11y-ci.js and lib/pa11y-ci.js/lib/helpers/defaults.js that could be consolidated (nothing broken).

Agree. Navigating the defaults and how they are merged and used got me confused at first. I will review that now that I am a bit more confident with the codebase.

@dwightjack
Copy link
Contributor Author

dwightjack commented May 25, 2021

@aarongoldenthal I added both unit and integration tests for reporters. (1d7010b 460c008) I also updated the json reporter to ensure that the folder where the file get stored exists and to resolve the fileName parameter to cwd (d7a2632). For this change I used fs.mkdirSync(dirPath, {recursive: true}); with the recursive option not being supported in Node 8. Anyway I am not sure we should add more code or libraries to support a version of Node that reached End of Life more than 1.5 years ago.

In the end, I didn't refactor the configuration resolution. Maybe it's not so readable but I cannot see a better pattern to make it work with the two usage scenarios (CLI and programmatic) of the library 🤔 .

@aarongoldenthal
Copy link
Contributor

@dwightjack I made a couple of comments, but overall it looks really good to me. I looked again at the configuration and agree it seems like the best way to handle both scenarios (although it did lead to me comment in the README). I'm also a supporter of deprecating old Node versions, so I wouldn't be worried about Node 8 support (and assuming pally-ci follows the just-released pa11y v6 compatibility, it will be Node 12+).

The only question that still lingers in my mind is potentially dropping the --json CLI argument. I can see both sides, but my vote would be to remove because it now seems inconsistent with specifying other output/reporting. If this PR makes the next pa11y-ci release, and assuming it will update to pa11y v6, then it'll be a major release and a good point to do it.

dwightjack and others added 2 commits June 2, 2021 16:12
Co-authored-by: Aaron Goldenthal <aarongoldenthal@users.noreply.github.com>
Co-authored-by: Aaron Goldenthal <aarongoldenthal@users.noreply.github.com>
@dwightjack
Copy link
Contributor Author

@aarongoldenthal thanks! I've merged your suggestions.

About the --json flag, if we plan to release this feature in a major release, then I agree we can remove it. If so:

  1. Since this PR is getting long and the flag is only related to the CLI module, what about working on that on a different PR?
  2. Is it still fine to merge this on master or should I use a different branch as base?

@masi
Copy link

masi commented Jun 2, 2021

@aarongoldenthal thanks! I've merged your suggestions.

About the --json flag, if we plan to release this feature in a major release, then I agree we can remove it.

If it is not too much hassle it would be nice to deprecate the flag at first and remove it only in a major version number change (ie 3.0) as it is a breaking change.

@aarongoldenthal
Copy link
Contributor

@dwightjack It looks good to me. And I think it makes sense to look at deprecating --json as a separate PR, which leaves open either option. @joeyciechanowicz?

Copy link
Member

@joeyciechanowicz joeyciechanowicz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great addition and brings it line with the reporter merging that's recently been completed on the pa11y repo.
Really lovely documentation as well ❤️

@dwightjack
Copy link
Contributor Author

@joeyciechanowicz thanks to you and @aarongoldenthal for the help and great feedback.

@joeyciechanowicz joeyciechanowicz merged commit 5c842cf into pa11y:master Jul 19, 2021
@josebolos
Copy link
Member

This has now been released as part of v3.0.0. Thanks everyone for all the work in this PR!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants