Document and improve experience related to fingerprint regular expression flags #75

Open
jhart-r7 opened this issue Jun 30, 2015 · 0 comments

As we saw in #74, there is room for improvement in how fingerprint regular expression flags are handled.

Today, the flags attribute is parsed so that fingerprint authors can write regular expression flags in several different formats. When the Recog tests run, those variants are normalized into something that Recog, which is written in Ruby, can understand. The problem, as we saw in #74, comes from supporting multiple ways of specifying the same flag. In that case, REG_MULTILINE is the Java/GNU/Perl-friendly spelling and MULTILINE is the Ruby-friendly one (per http://ruby-doc.org/core-2.1.1/Regexp.html#method-i-options). Any product that consumes Recog content must then support the same set of spellings, or Recog itself needs to provide that mechanism.

Further proof of this is still lurking in IGNORECASE, which current Recog still allows but which would break any Java/GNU/Perl implementation, because REG_ICASE is the preferred spelling there.
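To make the aliasing problem concrete, here is a minimal sketch of the kind of normalization described above. The `FLAG_ALIASES` table and the `normalize_flags` helper are hypothetical names for illustration, not Recog's actual code, and the comma-separated format of the flags attribute is an assumption.

```ruby
# Hypothetical alias table (NOT Recog's real map): two spellings per flag,
# both resolving to the same Ruby Regexp option bit.
FLAG_ALIASES = {
  'REG_ICASE'     => Regexp::IGNORECASE,
  'IGNORECASE'    => Regexp::IGNORECASE,
  'REG_MULTILINE' => Regexp::MULTILINE,
  'MULTILINE'     => Regexp::MULTILINE
}.freeze

# Turn a flags attribute such as "REG_ICASE,REG_MULTILINE" into a Ruby
# options integer. Assumes flags are comma-separated. Unknown names raise,
# so a test suite would catch a fingerprint with bad flags.
def normalize_flags(flags_attr)
  names = flags_attr.to_s.split(',').map(&:strip).reject(&:empty?)
  names.reduce(0) do |bits, name|
    bits | FLAG_ALIASES.fetch(name) do
      raise ArgumentError, "unknown regexp flag: #{name}"
    end
  end
end
```

Every consumer of the fingerprints would need an equivalent of this table, which is the duplication the two options below try to remove.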

I am thinking we should do one of the following:

  1. Pick one set of regular expression flags, per the TODO from http://www.rubydoc.info/gems/recog/2.0.7/Recog/Fingerprint/RegexpFactory, so that products consuming Recog are responsible for translating Recog's options. Ensure that the Recog tests properly catch a fingerprint with bad flags.
  2. Ditch flags altogether and require that any regular expression "options" be specified in the regular expression itself. For example, rather than pattern="foo" flags="REG_ICASE", use pattern="(?i:foo)". We can automatically convert all of the existing fingerprints with some simple Ruby Regexp code that computes the new pattern with Regexp.new(old_pattern, old_flags).to_s.
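The conversion in option 2 could be sketched as follows. The `inline_flags` helper is a hypothetical name, and it assumes the old flags have already been resolved to a Ruby options integer.

```ruby
# Hypothetical helper: fold a pattern's flags into the pattern itself.
# old_flags is assumed to be a Ruby options integer such as
# Regexp::IGNORECASE, or 0 when the fingerprint has no flags.
def inline_flags(old_pattern, old_flags)
  Regexp.new(old_pattern, old_flags).to_s
end

inline_flags('foo', Regexp::IGNORECASE) # => "(?i-mx:foo)"
```

Note that Ruby spells out the disabled options as well, so the result is `(?i-mx:foo)` rather than the shorter `(?i:foo)`. One caveat to check before converting in bulk: Ruby's `m` option makes `.` match newlines, whereas in Perl and Java an inline `m` changes how `^` and `$` behave, so patterns using Regexp::MULTILINE may need separate handling for non-Ruby consumers.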