You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Scala worksheet, so assertions are verified by eye.
Using mergeByKey for convenience, knowing it delegates to mergeByKeyWith.
importscala.collection.decorators.mapDecoratorvalarthur="arthur.txt"valtyson="tyson.txt"valsandra="sandra.txt"valallKeys=Set(arthur, tyson, sandra)
valsharedValue=1valourChanges=Map(
(
arthur,
sharedValue
),
(
tyson,
2
)
)
valtheirChanges=Map(
(
arthur,
sharedValue
),
(
sandra,
3
)
)
ourChanges -> theirChanges
ourChanges.mergeByKey(theirChanges)
// Expect all the keys to appear in an outer join, and they do, good...
ourChanges.mergeByKey(theirChanges).keys == allKeys
theirChanges.mergeByKey(ourChanges)
// Expect all the keys to appear in an outer join, and they do, good...
theirChanges.mergeByKey(ourChanges).keys == allKeys
// Expect the same associated values to appear in the join taken either way around, albeit swapped around and not necessarily in the same key order. They are, good...
ourChanges
.mergeByKey(theirChanges)
.values
.map(_.swap)
.toList
.sorted
.sameElements(theirChanges.mergeByKey(ourChanges).values.toList.sorted)
// Expect these to be equal, and they are, good...
ourChanges.mergeByKey(theirChanges).keySet == theirChanges
.mergeByKey(ourChanges)
.keys
valtheirChangesRedux=Map(
(
arthur,
sharedValue
),
(
sandra,
sharedValue // <<<<------- Ahem!
)
)
ourChanges -> theirChangesRedux
ourChanges.mergeByKey(theirChangesRedux)
// Expect all the keys to appear in an outer join, but they don't...
ourChanges.mergeByKey(theirChangesRedux).keys == allKeys
theirChangesRedux.mergeByKey(ourChanges)
// Expect all the keys to appear in an outer join, and they do, good...
theirChangesRedux.mergeByKey(ourChanges).keys == allKeys
// Expect the same associated values to appear in the join taken either way around, albeit swapped around and not necessarily in the same key order. They aren't...
ourChanges
.mergeByKey(theirChangesRedux)
.values
.map(_.swap)
.toList
.sorted
.sameElements(theirChangesRedux.mergeByKey(ourChanges).values.toList.sorted)
// Expect these to be equal, but they aren't...
ourChanges.mergeByKey(theirChangesRedux).keySet == theirChangesRedux
.mergeByKey(ourChanges)
.keys
Expectation
The Scaladoc for mergeByKey states:
Perform a full outer join of this and that.
Equivalent to mergeByKeyWith(that) { case any => any }.
So all the keys from ourChanges and theirChanges should appear in the resulting map, albeit not necessarily in the same order if the specialised map implementations are created (which they are in the example above). The corresponding values from one or both of ourChanges and theirChanges should appear in the associated pairs, wrapped in Some.
Observed
ourChanges.mergeByKey(theirChangesRedux) drops key sandra, whereas theirChangesRedux.mergeByKey(ourChanges) preserves all keys.
Discussion
There is a set, traversed that is used to prevent duplicate key processing when switching from iterating over from coll to other. It is however typed as a set over the value type W of other, and is populated with values, not keys.
In the example above, the use of a shared value triggers the problem, causing the entry for key sandra to be dropped.
I am assuming this was a typo and not the desired behaviour.
If my diagnosis is correct, I can submit a PR with a proper test and the change in implementation to make it pass.
I'll wait until others chime in in case I've misinterpreted the original intent...
The text was updated successfully, but these errors were encountered:
Seen in
0.3.0
Reproduction
Scala worksheet, so assertions are verified by eye.
Using
mergeByKey
for convenience, knowing it delegates tomergeByKeyWith
.Expectation
The Scaladoc for
mergeByKey
states:So all the keys from
ourChanges
andtheirChanges
should appear in the resulting map, albeit not necessarily in the same order if the specialised map implementations are created (which they are in the example above). The corresponding values from one or both ofourChanges
andtheirChanges
should appear in the associated pairs, wrapped inSome
.Observed
ourChanges.mergeByKey(theirChangesRedux)
drops keysandra
, whereastheirChangesRedux.mergeByKey(ourChanges)
preserves all keys.Discussion
There is a set,
traversed
that is used to prevent duplicate key processing when switching from iterating over fromcoll
toother
. It is however typed as a set over the value typeW
ofother
, and is populated with values, not keys.In the example above, the use of a shared value triggers the problem, causing the entry for key
sandra
to be dropped.I am assuming this was a typo and not the desired behaviour.
If my diagnosis is correct, I can submit a PR with a proper test and the change in implementation to make it pass.
I'll wait until others chime in in case I've misinterpreted the original intent...
The text was updated successfully, but these errors were encountered: