Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integer key upsets Hashie::Mash with SymbolizeKeys extension #516

Closed
deemytch opened this issue Jan 29, 2020 · 16 comments
Closed

Integer key upsets Hashie::Mash with SymbolizeKeys extension #516

deemytch opened this issue Jan 29, 2020 · 16 comments

Comments

@deemytch
Copy link

I made a Hash like that:

yaml = {"app"=>{"id"=>"deployer", "progname"=>"deployer"}, "servicelist"=>{"clerk"=>{"publish"=>{3000=>3000, 3500=>5675}}}}

Then wrote a class:

class SettingsHash < ::Hashie::Mash  
  include Hashie::Extensions::Mash::SymbolizeKeys    
end

And that code was crashed with NoMethodError

cfg = SettingsHash.new
cfg.merge! yaml

NoMethodError: undefined method `to_sym' for 3000:Integer
Did you mean?  to_s
from .gem/ruby/2.7.0/gems/hashie-4.0.0/lib/hashie/extensions/mash/symbolize_keys.rb:33:in `convert_key'
@michaelherold
Copy link
Member

Yep, that's a bug. We'd love to accept a patch that makes it so the symbolize extension doesn't cause errors with other types of keys. Any interest in taking on that challenge?

@deemytch
Copy link
Author

Just trying to repair this, made simple checks. But has been stuck in, don't know what to do there:

TypeError: 1024 is not a symbol nor a string
from hashie/lib/hashie/mash.rb:410:in `respond_to?'

Diff

diff --git a/lib/hashie/extensions/mash/symbolize_keys.rb b/lib/hashie/extensions/mash/symbolize_keys.rb
index 91a6b14..634afed 100644
--- a/lib/hashie/extensions/mash/symbolize_keys.rb
+++ b/lib/hashie/extensions/mash/symbolize_keys.rb
@@ -30,7 +30,7 @@ module Hashie
         # @param [String, Symbol] key the key to convert to a symbol
         # @return [void]
         def convert_key(key)
-          key.to_sym
+          key.is_a?( String ) ? key.to_sym : key
         end
       end
     end
diff --git a/lib/hashie/mash.rb b/lib/hashie/mash.rb
index 96b24c9..1d0255e 100644
--- a/lib/hashie/mash.rb
+++ b/lib/hashie/mash.rb
@@ -134,7 +134,8 @@ module Hashie
     # a string before it is set, and Hashes will be converted
     # into Mashes for nesting purposes.
     def custom_writer(key, value, convert = true) #:nodoc:
-      key_as_symbol = (key = convert_key(key)).to_sym
+      key = convert_key(key)
+      key_as_symbol = key.is_a?(String) ? key.to_sym : key
 
       log_built_in_message(key_as_symbol) if log_collision?(key_as_symbol)
       regular_writer(key, convert ? convert_value(value) : value)

@deemytch
Copy link
Author

And found also another bug

class SettingsHash < ::Hashie::Mash
end
cfg = SettingsHash.new
yaml = {"app"=>{"id"=>"deployer", "progname"=>"deployer"}, "servicelist"=>{"clerk"=>{"publish"=>{3000=>3000, 3500=>5675}}}}
cfg.merge! yaml

Here ok.

cfg.services.clerk  = { publish: { 80 => 1080 }}
cfg.services.clerk

=> {"publish"=>{"80"=>1080}}

Integer key 80 was converted into String "80".

@michaelherold
Copy link
Member

Mashes are weird, to be honest. Much of the logic assumes that you're dealing with String/Symbol keys because otherwise you do not get the benefits of a Mash. This might be a bigger deal than it originally seemed.

@carolineartz
Copy link
Contributor

I'm interested in this. Curious about the desired behavior at the base mash level for non string/symbol keys? As @michaelherold says, the point of a mash is "pseudo-object functionality". Only strings and symbols can define a method call.

In @deemytch's example, what would the expectation be given a hash with numeric string and Number keys?

"servicelist"=>{"clerk"=>{"publish"=>{3000=>3000, "3000"=>5675}}}}

So what is the goal of not throwing an error when mixing in the Mash::SymbolizeKeys extension? If the desire is to have Mash preserve the type of non string/symbol keys and ignore them in key conversion, one could do:

diff --git a/lib/hashie/extensions/symbolize_keys.rb b/lib/hashie/extensions/symbolize_keys.rb
index e4315bd..543e0f9 100644
--- a/lib/hashie/extensions/symbolize_keys.rb
+++ b/lib/hashie/extensions/symbolize_keys.rb
@@ -46,7 +46,7 @@ module Hashie
           hash.extend(Hashie::Extensions::SymbolizeKeys) unless hash.respond_to?(:symbolize_keys!)
           hash.keys.each do |k| # rubocop:disable Performance/HashEachMethods
             symbolize_keys_recursively!(hash[k])
-            hash[k.to_sym] = hash.delete(k)
+            hash[convert_key(k)] = hash.delete(k)
           end
           hash
         end
@@ -61,6 +61,12 @@ module Hashie
             symbolize_keys!(new_hash)
           end
         end
+
+        private
+
+        def convert_key(key)
+          key.to_sym rescue key
+        end
       end

diff --git a/lib/hashie/mash.rb b/lib/hashie/mash.rb
index 96b24c9..7894a2a 100644
--- a/lib/hashie/mash.rb
+++ b/lib/hashie/mash.rb
@@ -122,7 +122,7 @@ module Hashie
     alias regular_writer []=

     # Retrieves an attribute set in the Mash. Will convert
-    # any key passed in to a string before retrieving.
+    # any Symbol key passed in to a string before retrieving.
     def custom_reader(key)
       default_proc.call(self, key) if default_proc && !key?(key)
       value = regular_reader(convert_key(key))
@@ -130,14 +130,14 @@ module Hashie
       value
     end

-    # Sets an attribute in the Mash. Key will be converted to
+    # Sets an attribute in the Mash. Symbol keys will be converted to
     # a string before it is set, and Hashes will be converted
     # into Mashes for nesting purposes.
     def custom_writer(key, value, convert = true) #:nodoc:
-      key_as_symbol = (key = convert_key(key)).to_sym
+      key_as_symbol = key.to_s.to_sym

       log_built_in_message(key_as_symbol) if log_collision?(key_as_symbol)
-      regular_writer(key, convert ? convert_value(value) : value)
+      regular_writer(convert_key(key), convert ? convert_value(value) : value)
     end

     alias [] custom_reader
@@ -371,7 +371,7 @@ module Hashie
     end

     def convert_key(key) #:nodoc:
-      key.to_s
+      key.respond_to?(:to_sym) ? key.to_s : key
     end

and now your example "works"

[1] pry(main)> yaml = {"app"=>{"id"=>"deployer", "progname"=>"deployer"}, "servicelist"=>{"clerk"=>{"publish"=>{3000=>3000, 3500=>5675}}}}
=> {"app"=>{"id"=>"deployer", "progname"=>"deployer"}, "servicelist"=>{"clerk"=>{"publish"=>{3000=>3000, 3500=>5675}}}}
[2] pry(main)> class SettingsHash < ::Hashie::Mash
[2] pry(main)*   include Hashie::Extensions::Mash::SymbolizeKeys
[2] pry(main)* end
=> SettingsHash
[3] pry(main)> cfg = SettingsHash.new
=> {}
[4] pry(main)> cfg.merge! yaml
=> {:app=>{:id=>"deployer", :progname=>"deployer"}, :servicelist=>{:clerk=>{:publish=>{3000=>3000, 3500=>5675}}}}

HOWEVER this is a breaking change now that mash keys are not blanket converted to strings.

current behavior

[16] pry(main)> hash = {animals: {"1": "cat", "1" => "dog", 1 => "also cat"}}
=> {:animals=>{:"1"=>"cat", "1"=>"dog", 1=>"also cat"}}
[17] pry(main)> mash = Hashie::Mash.new(hash)
=> {"animals"=>{"1"=>"also cat"}}

new behavior

[34] pry(main)> hash =  {animals: {"1": "cat", "1" => "dog", 1 => "also cat"}}
=> {:animals=>{:"1"=>"cat", "1"=>"dog", 1=>"also cat"}}
[35] pry(main)> mash = Hashie::Mash.new(hash)
=> {"animals"=>{"1"=>"dog", 1=>"also cat"}}

Since symbols and strings define the same method, indifferent access is implied, but complicating this scenario, Mashes also allow dig access by Numeric types to values keyed off numeric strings...

context 'with numeric key' do
subject { described_class.new('1' => { b: 1 }) }
it 'accepts a numeric value as key' do
expect(subject.dig(1, :b)).to eq(1)
expect(subject.dig('1', :b)).to eq(1)
end
end

With the animals hash example above

current behavior

[18] pry(main)> mash.dig("animals", 1)
=> "also cat"
[19] pry(main)> mash.dig("animals", "1")
=> "also cat"

new behavior?

[36] pry(main)> mash.dig(:animals, 1)
=> "also cat"
[37] pry(main)> mash.dig(:animals, "1")
=> "dog"

The same type of "fix" applied to the base Hashie symbolize keys mixin

[1] pry(main)> MyHash = Class.new(Hash); MyHash.public_send(:include, Hashie::Extensions::SymbolizeKeys);
[2] pry(main)> hash =  {"animals" => {"1": "cat", "1" => "dog", 1 => "also cat"}}
=> {"animals"=>{:"1"=>"cat", "1"=>"dog", 1=>"also cat"}}
[3] pry(main)> myhash = MyHash.new.merge(hash)
=> {"animals"=>{:"1"=>"cat", "1"=>"dog", 1=>"also cat"}}
[4] pry(main)> myhash.symbolize_keys
=> {:animals=>{:"1"=>"dog", 1=>"also cat"}}

@deemytch
Copy link
Author

deemytch commented Mar 3, 2020

IMHO when I use numbers for keys, sure I don't hope to use it like that hash.ports.300. hash.ports[300] will be enough. Also because usually it is hash.ports[ internal_port ] = external_port or even like that hash.ports.each{|i,o| Nat.new_forwarding(i, o) }

@carolineartz
Copy link
Contributor

carolineartz commented Mar 3, 2020

That doesn't really narrow the questions needing answers in order to patch this. It isn't possible to "fix" this and preserve the existing behavior of a Mash. Allowing integers to remain integers as keys in a Mash will break dig as described above.

hash.ports.300 (i.e., foo.<Integer>) isn't an option in any scenario; it's a SyntaxErrorso no need to worry about supporting or not supporting it.

@carolineartz
Copy link
Contributor

@deemytch given how Mash implements dig, you could just use that with your integer keys to access the values....

[2] pry(main)> x = Hashie::Mash.new(ports: { 300 => "foo" })
=> {"ports"=>{"300"=>"foo"}}
[3] pry(main)> x.dig(:ports, 300)
=> "foo"

@deemytch
Copy link
Author

deemytch commented Apr 2, 2020

Such use x.dig(:ports, 300) is unusable. hashie/mashie/sashie/gashie are for usability, isn't it?
I can write cfg[:ports][300] and that's ok without any hashies.
I proposed just that hashie do not break my program when it encounters any non-string key.

@dblock
Copy link
Member

dblock commented Apr 3, 2020

Such use x.dig(:ports, 300) is unusable. hashie/mashie/sashie/gashie are for usability, isn't it?
I can write cfg[:ports][300] and that's ok without any hashies.
I proposed just that hashie do not break my program when it encounters any non-string key.

I agree with this angle. Try a PR?

@carolineartz
Copy link
Contributor

carolineartz commented Apr 9, 2020

I agree with this angle. Try a PR?

I don't really understand how this is an angle... my comments were trying to outline that making the required code changes to avoid throwing an error open some questions on expected behavior:

the bottom line as I see it here is that in order to support including the SymbolizeKeys extension , the keys of the structure that is extended must be strings or symbols to begin with. So, if you're considering this outcome a "bug":

Integer key 80 was converted into String "80".

then this is impossible.

If converting the strings is acceptable --then sure a PR is possible. This means that if you were relying on cfg.ports[300] in your code, where cfg is a Hashie with symbolized keys, you're out of luck.

Further, accepting the conversion to strings as a means of "fixing" this by not throwing an error means that if your incoming Hash happens to have both a string key of 80 and a numeric key of 80 to begin with, you'll potentially have some unexpected results.

@dblock
Copy link
Member

dblock commented Apr 10, 2020

Apologies, I'll clarify what I meant @carolineartz.

First, it's an interesting problem and thank you for stepping up to solve it. I was agreeing with you on the issue and goals (we don't want an error) and I was agreeing on the solution (convert integers, and possibly other types, to strings then to symbols). I think it's the best we can do, and you're absolutely right about the side effects. I generally also value strong opinions from contributors on what we should be doing, cause what do I know? :) All I'm saying is "let's see a PR for that!" and "thank you".

For the actual PR, my first question will be "which existing tests fail"? If the answer is none, then we're good to go with a paragraph in README and UPGRADING. If we are introducing regressions we should highlight those and @michaelherold can make a call on which way to go.

Finally, if we made a mistake, it will be fun to write another chapter of the demonic possession of Hashie::Mash :)

@carolineartz
Copy link
Contributor

carolineartz commented Apr 23, 2020

@dblock opinion-wise, I think the symbolization of keys should behave like the rails Hash extensions [deep]_symbolize_keys[!]. In that case, symbolization only occurs if the keys can in fact be symbolized (responds to to_sym); non-string keys are left as is.

Here is a branch to the above effect showing the diffs.

https://github.com/hashie/hashie/compare/master...carolineartz:fix-symbolize-keys?expand=1

These changes result in two expected spec failures.

Mash#invert

subject(:mash) { described_class.new(a: 'apple', b: 4) }

it 'returns a mash with the keys and values inverted' do
expect(mash.invert).to eq('apple' => 'a', '4' => 'b')
end

and

Mash#dig

context 'with numeric key' do
subject { described_class.new('1' => { b: 1 }) }
it 'accepts a numeric value as key' do
expect(subject.dig(1, :b)).to eq(1)
expect(subject.dig('1', :b)).to eq(1)
end
end

IMO, the first re: Mash#invert is straightforward and would result in a change to the expected behavior and thus the spec itself.

As for the second...I don't really understand why it was ever desired to have dig behave as specced in the first place 🤷‍♀️

thoughts?

@dblock
Copy link
Member

dblock commented Apr 24, 2020

Looks good to me:

  1. Make a PR so we can discuss lines of code.
  2. Use .respond_to?, do not rely on NoMethodError for flow.

I presume the first test leaves a numeric 4 as a key? That seems like a fix to a bug :)

The second one is trying to make a Mash behave invariably between an Integer key and a String key. Because you're saying this should only work when the two types can be converted back and forth, this test is also moot. Furthermore the subject in the example above isn't actually a numeric key despite what the description says, it's a String. Write some tests for the permutations of symbol, string and integer, and let's document clear expectations of what happens.

We could introduce an invariable extension for non bi-directional values such as Integer, and make Mash treat everything invariably even if it can't go both ways, but we have to ask ourselves, what else could possibly go wrong?

README changes and regressions in UPGRADING plus a major version bump will be important, please.

@carolineartz
Copy link
Contributor

@dblock on it. Will open a PR tonight or tomorrow. Thanks for the help and we'll move further discussion there. I'll link between this issue and the forthcoming PR.

@dblock
Copy link
Member

dblock commented May 5, 2020

This was resolved in #521.

@dblock dblock closed this as completed May 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants