At Ruby Hack Night in February, we completed a series of Koans. A workshop participant had the following comments about the Koans dealing with Hash default values.
“From what I understand, when a default value (such as [] or Array.new) is passed as argument to the Hash#new method, the hash is created with the empty array as default value. But when we reference the hash with a key (:a, :b, etc) and we try to push an element, all we ended up modifying is the default value, the array itself…. Am I right about this?”
– Vincent L.
Consider this simple use of a Hash default value, and how it behaves in an expected way:
% irb
irb(main):001:0> h = Hash.new(2)
=> {}
irb(main):002:0> h[:a]
=> 2
irb(main):003:0> h[:a] += 1
=> 3
irb(main):004:0> h[:b]
=> 2
Now here is some similar code that demonstrates unexpected behaviour:
% irb
irb(main):001:0> h = Hash.new([])
=> {}
irb(main):002:0> h[:a]
=> []
irb(main):003:0> h[:a] << "x"
=> ["x"]
irb(main):004:0> h[:b]
=> ["x"]
From that example, it appears that there is some problem with the Hash class. Perhaps it’s getting confused about the values? It’s easy to blame the Hash class for the unexpected behaviour, but once you consider the following working example, you might see what’s going on:
% irb
irb(main):001:0> h = Hash.new([])
=> {}
irb(main):002:0> h[:a]
=> []
irb(main):003:0> h[:a] += ["x"]
=> ["x"]
irb(main):004:0> h[:b]
=> []
It works! Why was the default value unchanged in this last example, but changed in the earlier one? It seems random and bizarre!
Do you see what’s going on?
This behaviour is quite predictable, but to understand it you need to know that it’s caused by a collaboration of two things:
1. The hash default, whether set by the #new or #default= methods, is shared between each of the keys that use it. When I say “shared between each of the keys”, I mean to say that there is only one default value that is referenced or “pointed to” by any keys that use that default value.
2. Operations generally fall into two types: those that replace values with new instances, and those that modify a value “in-place”. We see this bizarre Hash behaviour only when operations of the latter type are used to modify values. Put another way, line “003” of each of these irb examples exercises a slightly different piece of code. In the first and the third examples, the contents of h[:a] are replaced with the result of a #+ operation through assignment. In the second example, the contents of h[:a] are modified in-place.
The collaboration between these two details causes the seemingly unpredictable outcome. The strange behaviour, where the default value is changed, only happens if both: the hash uses a shared default value, and the operation modifies in-place (rather than replacing) the value of a key that is referring to the default.
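To make the two conditions concrete, here is a minimal sketch. The helper name default_for? is hypothetical (it is not part of Ruby); it relies only on Hash#key?, Hash#default and Object#equal?, which compares object identity rather than contents.

```ruby
# Hypothetical helper: true when looking up +key+ hands back the hash's
# shared default object itself, i.e. the key has never been stored.
def default_for?(hash, key)
  !hash.key?(key) && hash[key].equal?(hash.default)
end

h = Hash.new([])
default_for?(h, :a)   # => true

# Replacing: h[:a] += ["x"] is h[:a] = h[:a] + ["x"].
# Array#+ builds a new array and []= stores it under :a.
h[:a] += ["x"]
default_for?(h, :a)   # => false, :a now holds its own array
h.default             # => []

# Modifying in place: << changes the array it is given, which here is the
# shared default. No key is stored for :b.
h[:b] << "y"
default_for?(h, :b)   # => true
h.default             # => ["y"]
h[:c]                 # => ["y"]
```

Only the second operation touches the default, because only there do both conditions hold at once.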
Note that it is possible to force the creation of a distinct instance of the default value for each new key. Instead of a default object, the Hash constructor also accepts a block. The block is called with the hash and the key each time a missing key is looked up. When the block assigns its result to that key, as in the example that follows, each key gets its own separate instance, which avoids the surprising effects of in-place modifications.
An example:
% irb
irb(main):001:0> h = Hash.new { |h,k| h[k] = [] }
=> {}
irb(main):002:0> h[:a]
=> []
irb(main):003:0> h[:a] << "x"
=> ["x"]
irb(main):004:0> h[:b]
=> []
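It is the assignment inside the block that makes this work. As a rough comparison, the sketch below contrasts a block that only returns a fresh array with one that also stores it. The method names returning_only and assigning are made up for illustration.

```ruby
# The block only returns a fresh array. Nothing is stored under the key,
# so the pushed element is lost as soon as the expression finishes.
def returning_only
  h = Hash.new { [] }
  h[:a] << "x"
  h
end

# The block stores a fresh array under the missing key and returns it,
# so the push lands in an array that the hash keeps.
def assigning
  h = Hash.new { |hash, key| hash[key] = [] }
  h[:a] << "x"
  h
end

returning_only.empty?   # => true
assigning[:a]           # => ["x"]
```

One side effect of the assigning form is that merely reading a missing key adds it to the hash, which is why h[:b] shows up as a key after a lookup.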
How should we deal with this? Some developers will err on the side of caution and always use the block constructor, and I believe this is a reasonable compromise but only if it is probable that an unaware developer will get tripped up by this in later edits to the same file. On the flip-side, with a shared default value there is a small run-time benefit in size and speed. This benefit will be largest if the Hash has a large number of keys with default values. Plus, since there are a limited number of operations that modify in-place, you may decide to optimize your code by taking advantage of the single instance of the default value!
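If you keep the shared default but worry about accidental in-place changes, one possible safeguard is to freeze the default object. This is a sketch of an option, not something the examples above rely on, and the method name frozen_default_hash is hypothetical. On Ruby 2.5 and later the error raised is FrozenError, a subclass of RuntimeError; older versions raise RuntimeError directly, so the sketch rescues RuntimeError.

```ruby
# Hypothetical constructor: a hash whose shared default array is frozen.
def frozen_default_hash
  Hash.new([].freeze)
end

h = frozen_default_hash

# Replacing still works: Array#+ returns a new, unfrozen array,
# which is then stored under :a.
h[:a] += ["x"]
h[:a]                 # => ["x"]

begin
  h[:b] << "y"        # in-place change to the shared default
rescue RuntimeError => e
  warn "refused in-place change: #{e.class}"
end

h.default             # => []
```

The shared instance is kept, and an in-place change fails loudly instead of silently altering every key that falls back to the default.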
In summary, Vincent is right: an in-place operation such as Array#push (or Array#<<, as used in the second example) modifies the default value and, since the default value is a reference, the value seen by any key that uses the default value. Many devs at all levels have been burned by this. It is one of the few times in Ruby when we are acutely aware of the use of references to objects, and it’s not something we are particularly prepared to deal with because Ruby works intuitively in most contexts.
I would advocate that this is not a bug, but a feature to be exploited when the time is right – perhaps one of the rougher edges of the default Ruby classes, but for those of us who consider ourselves “Ruby gurus”, our clarity of understanding of this is another little trophy on our mental bookshelf of geekdom. Assess the risk of this within your team and choose appropriately. Better yet, get your team to read this and thrive on the way the Hash default values are intended to work.
Ah, I find that this explanation is digestible now. From what I got out of this post, it seems to go like this:
By using a block, we are guaranteed that whenever we add a new key, the value in question is computed and assigned for that key, rather than the key pointing to the shared default value. Creating a new key therefore creates a new instance of the default value (an array here). This way of generating distinct key-value pairs seems to apply to any default value, array or not.
Here’s what I learned after trying out the console myself:
2.1.0 :001 > h = Hash.new { |h,k| h[k] = [] }
=> {}
2.1.0 :002 > h[:a]
=> []
2.1.0 :003 > h[:a] << "x"
=> ["x"]
2.1.0 :004 > h[:b]
=> []
2.1.0 :005 > h
=> {:a=>["x"], :b=>[]}
This is indeed a great explanation for this seemingly odd behavior. I also got tripped up by this during the workshop. Later on, when I was reviewing this exercise, I came across this post. I found it interesting since it is often hard to find applicability in these sorts of subtleties.
Also, I think it is worth pointing out that when you initialize a hash with a default value and then modify it, you only get to modify the default value and not the hash itself. What I mean is this:
% irb
irb(main):001:0> h = Hash.new([])
=> {}
irb(main):002:0> h[:a]
=> []
irb(main):003:0> h[:a] = []
=> []
irb(main):004:0> h
=> {:a=>[]}
irb(main):005:0> h[:b]
=> []
irb(main):006:0> h[:b] << "x"
=> ["x"]
irb(main):007:0> h
=> {:a=>[]}
irb(main):008:0> h[:b]
=> ["x"]
irb(main):009:0> h[:c]
=> ["x"]
irb(main):010:0> h
=> {:a=>[]}
Initializing a hash with a block does not follow this behavior as we can see from Vincent’s comment. I still do not know why they would design it to behave like this. Maybe David can shed some light.