Skip to content

Nakilon/nakischema

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

26 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Nakischema

Why?

I often process complex and undocumented data such as game assets or web responses. Previously I used to add asserts everywhere to check things on the fly but now I think the much better practice is to split the job into two steps and preprocess the whole data, i.e. validate it, before starting the actual work with it, because:

  1. Having asserts in random places may cause running them more than once for no purpose.
  2. Having asserts at all is just slowing your program.
  3. It is much simpler to figure things out after you've processed all the data. Valid schema is a valid documentation.

The whole Nakischema API is just one method.
The whole schema is just one Ruby object.
Say no to needless DSLs.

Also exceptions are informative -- they tell you where and how things went wrong.

How?

Install

gem install nakischema
require "nakischema"

Validate Ruby objects

Schema can be as simple as just checking basic scalar objects for equality or class:

Nakischema.validate "John", String  # => nil
Nakischema.validate "John", "Joe"   # Nakischema::Error: expected "Joe" != "John"
Nakischema.validate "John", Numeric # Nakischema::Error: expected Numeric != String

And schema can be nested to validate nested objects.
Array and Hash objects have to be validated by Hash object schema with special keys:

  • :each for validating every Array item with nested schema object
  • :hash for validating every value via exact Hash keys match
Nakischema.validate( [
  {name: "John", age: 20},
  {name: "Bill", age: 15},
], {
  each: {
    hash: {
      name: /\A[A-Z][a-z]+\z/,
      age: 18..100,
    },
  },
} )
expected 18..100 != 15 (at [:"#1", :age]) (Nakischema::Error)
  • :size to specify allowed Array size range
  • :hash_req to specify required Hash items
  • :hash_opt to specify optional Hash items -- subtracted when validating :hash if present

Your schema object can be recursive to validate objects that are recursive or just look like that:

human_schema = {}
human_schema.replace( {
  hash_req: {name: /\A[A-Z][a-z]+\z/, age: 0..100},
  hash_opt: {parents: {size: 2..2, each: human_schema}},
} )

father = {name: "John", age: 40}
mother = {name: "Anna", age: 40}
Nakischema.validate( [
  father,
  mother,
  {name: "Bill", age: 18, parents: [father, mother]},
],
  {each: human_schema}
)

The [[ ]] syntax validates an Array in order (it also can be nested).

pets = [
  ["cat", "Thomas"],
]
Nakischema.validate( pets, {
  each: [[/\A[a-z]+\z/, /\A[A-Z]/]]
} )

The "or-group" [ ] tries to match the object with any of a given list of "rules" (schemas).

  • :assertions allows passing a list of lambdas to do arbitrary checks and return booleans.
humans = [
  {name: "John", gender: :male},
  {name: "Anna", gender: :female, pets: %w{ Thomas }},
  {name: "Bill", gender: :attack_helicopter},
]
Nakischema.validate( humans, {
  each: {
    hash_opt: {
      pets: {
        each: {
          assertions: [
            -> pet_id, _ { pets.map(&:last).include? pet_id },
          ],
        }
      },
    },
    hash_req: {
      name: /\A[A-Z]/,
      gender: [:male, :female],
    },
  },
} )
expected at least one of 2 rules to match the :attack_helicopter, errors: (Nakischema::Error)
  expected :male != :attack_helicopter (at [:"#2", :gender, :"variant#0"])
  expected :female != :attack_helicopter (at [:"#2", :gender, :"variant#1"])

Here you can see that nested schema validation errors produce nested exception messages with indentation so you can easily see the whole validation object tree path that was made.

And if Anna had a pet with unfamiliar name that custom assertion would throw:

custom assertion failed (at [:"#1", :pets, :"#0"]) (Nakischema::Error)

Known issue and workaround: the [[[]]] is being interpereted as [[ [] ]], not [ [[]] ], but you can force it by doing: [:something, [[]]].

There are a few other special keys. You'll find them in source code easily.

Validate Oga objects

gem oga is a Nokogiri analogue that I recommend you to try out because it's written in pure Ruby. Nakischema.validate_oga_xml is a method more or less similar to validate. See basic Wolfram Alpha API response validation as an example:

xml = Oga.parse_xml open link, &:read
Nakischema.validate_oga_xml xml, {
  exact: {
    "queryresult" => [[ {
      attr_req: {"success": "true", "error": "false", "inputstring": query},
      assertions: [
        ->n,_{ n.at_xpath("./pod")["id"] == "Input" },
        ->n,_{ n.xpath("/pod").each{ |_| _["id"] == _["title"].delete(" ") } },
      ],
      exact: {"pod" => {size: 8..8}, "assumptions" => [[{}]]},
      children: {
        "/*[@error='true']" => [[]],
        "/pod" => {each: {attr_req: {"id": /\A([A-Z][a-z]+)+(:([A-Z][a-z]+)+)?\z/, "scanner": /\A([A-Z][a-z]+)+\z/}}},
        "./pod[@title='Input']" => [[{children: {"subpod" => [[{exact: {"plaintext" => [[{text: query}]]}}]]}}]],
        "./pod[@primary='true']" => [[{children: {"subpod" => [[{exact: {"plaintext" => [[{}]]}}]]}}]],
        "/pod[@scanner='Numeric']" => {each: {children: {"subpod" => [[{exact: {"plaintext" => [[{}]]}}]]}}},
      },
    } ]],
  },
}
  • exact -- similar to hash when using validate
  • attr_req -- required attributes
  • children -- children nodes
  • text -- ...text

exact and req accept either node name or XPath. Each results in an array of nodes of any length, even empty -- this is why you see so many [[]]. Also note that results of these selectors can overlap so you are able to apply multiple schemas to the same node.

Another real life example -- validating non-json object while using gem minitest

Imagine you want to validate the args that were passed to Monitoring.log when calling the log_ruby_processes. First you write such test to see the actual args passed:

it :log_ruby_processes do
  mock = MiniTest::Mock.new.expect :call, nil do |*args|

    require "pp"
    pp args

  end
  Monitoring.stub :log, mock do

    log_ruby_processes

  end
  mock.verify
end
$ ruby test.rb
...
[2021-09-06 00:59:58 UTC,
 nil,
 {:total=>3,
  :each=>
   [["/Users/nakilon/.../ruby test.rb", 35.53125, 43],
    ["/Users/nakilon/.../ruby take.rb", 3.20703125, 41],
    ["/Users/nakilon/.../bin/paster", 1.89453125, 14]]}]
...

Now just replace the require "pp"; pp args with something like this:

    Nakischema.validate args, [[
      Time,
      NilClass,
      {
        hash: {
          total: 3..3,
          each: {
            size: 3..3,
            each: [[/\A\/Users\//, Float, Fixnum]],
          },
        },
        assertions: [-> h,_ { h[:total] == h[:each].size }],
      },
    ]]

Custom mismatch message

Imagine you don't want a number higher than 5 (unless it's 10):

Nakischema.validate [1, 10, 7], {
  each: [
    { assertions: [-> x, _ {
      x <= 5
    } ] },
    10..10,
  ],
}
expected at least one of 2 rules to match the 7, errors: (Nakischema::Error)
  custom assertion failed (at [:"#2", :"variant#0", :"assertion#0"])
  expected 10..10 != 7 (at [:"#2", :"variant#1"])

To replace the generic "custom assertion failed" message with something more useful you can raise the Nakischema::Error manually (don't forget to return true otherwise):

Nakischema.validate [1, 10, 7], {
  each: [
    { assertions: [-> x, _ {
      raise Nakischema::Error.new "#{x} is too much" unless x <= 5
      true
    } ] },
    10..10,
  ],
}
expected at least one of 2 rules to match the 7, errors: (Nakischema::Error)
  7 is too much (at [:"#2", :"variant#0", :"assertion#0"])
  expected 10..10 != 7 (at [:"#2", :"variant#1"])

Why such stupid name?

Initially I wanted to call it something like "SchemaValidator" but:

$ gem search schema | grep valid
...
schema-validator (0.0.1)
schema_validations (2.3.0)
schema_validator (0.1.1)
validates_by_schema (0.4.0)
validates_schema (1.1.3)
...
$ gem search schema | wc -l
288

TODO

  • make some tests and GitHub Action for them
  • consider replacing .is_a? with .respond_to?
  • ability to assert at the same time the range and class (like is it's float or integer)

About

The most compact yet powerful arbitrary nested Ruby objects and XML (Oga) validator. Handy to validate data coming from someone else.

Topics

Resources

License

Stars

Watchers

Forks

Languages