Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify throws NameError, and, as implemented, doesn't work #92

Closed
GoogleCodeExporter opened this issue Apr 13, 2015 · 2 comments · Fixed by #248 · May be fixed by #249
Closed

Simplify throws NameError, and, as implemented, doesn't work #92

GoogleCodeExporter opened this issue Apr 13, 2015 · 2 comments · Fixed by #248 · May be fixed by #249

Comments

@GoogleCodeExporter
Copy link

GoogleCodeExporter commented Apr 13, 2015

What steps will reproduce the problem?

  1. invoke set_simplify(true) on a graph
  2. invoke method to_string()
  3. profit

What is the expected output? What do you see instead?
Extraneous edges removed from dot output

What version of the product are you using? On what operating system?
1.0.2

Please provide any additional information below.

This is a patch which both resolves the NameError and enables the desired behavior:

--- /usr/lib/python2.6/site-packages/pydot.py   2014-06-12 06:22:57.000000000 
+0400
+++ virtenv/lib/python2.6/site-packages/pydot.py    2014-06-12 06:28:42.515864719 
+0400
@@ -1455,11 +1455,11 @@

                 edge = Edge(obj_dict=obj)

-                if self.obj_dict.get('simplify', False) and elm in edges_done:
+                if self.obj_dict.get('simplify', False) and 
edge.obj_dict['points'] in edges_done:
                     continue
-
+
                 graph.append( edge.to_string() + '\n' )
-                edges_done.add(edge)
+                edges_done.add(edge.obj_dict['points'])

             else:

Original issue reported on code.google.com by TheSoup...@gmail.com on 12 Jun 2014 at 2:30

Attachments:

@GoogleCodeExporter
Copy link
Author

GoogleCodeExporter commented Apr 13, 2015

Or if you prefer to retain encapsulation:

--- /usr/lib/python2.6/site-packages/pydot.py   2014-06-12 06:22:57.000000000 
+0400
+++ virtenv/lib/python2.6/site-packages/pydot.py    2014-06-12 06:58:20.478835432 
+0400
@@ -834,6 +834,11 @@

         return False

+    def __hash__(self):
+        if self.get_parent_graph().get_top_graph_type() == 'graph':
+            return hash(min(self.get_source(), self.get_destination())
+                    + max(self.get_source(), self.get_destination()))
+        return hash(self.get_source() + self.get_destination())


     def parse_node_ref(self, node_str):
@@ -1455,7 +1460,7 @@

                 edge = Edge(obj_dict=obj)

-                if self.obj_dict.get('simplify', False) and elm in edges_done:
+                if self.obj_dict.get('simplify', False) and edge in edges_done:
                     continue

                 graph.append( edge.to_string() + '\n' )

Original comment by TheSoup...@gmail.com on 12 Jun 2014 at 3:00

Attachments:

@johnyf johnyf self-assigned this Dec 26, 2017
peternowee added a commit to peternowee/pydot that referenced this issue Nov 27, 2020
This test tests the output of `Graph.to_string()` for graphs with
multiple (parallel) edges and combinations of `simplify` set to `True`
or `False` and the graph being directed or undirected.

This test was originally created for issue [pydot#92][1]
("Simplify throws NameError, and, as implemented, doesn't work"). It
can also help catch regressions when doing future work on
`Graph.to_string()`, `Edge.__eq__()` or `Edge.__hash__()`.

[1]: pydot#92
peternowee added a commit to peternowee/pydot that referenced this issue Nov 27, 2020
These tests test `Edge` equality comparison and hashing for regressions
and point out some current issues.

For equality comparison (`Edge.__eq__()`):

- Our current definition of edge equality: Same endpoints and, if the
  parent graph is directed, the same order of endpoints. (Changes to
  that definition may require changing some of these tests as well.)
- [Identical objects should compare equal][1] (reflexivity).
- [Comparisons should be symmetric][1].
- Rich comparison methods should [return `NotImplemented` if they do
  not implement the operation for the operands provided][2] (see also
  [here][3]).
- Availability of equality comparison before the edge has a parent
  graph.

For hashing (`Edge.__hash__()`):

- [Hashable objects which compare equal must have the same hash
  value][4] (see also [here][5]).
- [The hash value of an hashable object should never change during its
  lifetime][4] (see also [here][5]).
- Try to prevent hash collisions, because they may degrade performance.
- Availability of the hash method before the edge has a parent graph.

Some tests are based on problems seen with the bug and patches reported
in issue [pydot#92][6]. Some other tests are based on examples
from articles by [Aaron Meurer][7] and [Roy Williams][8] and adapted to
our `Edge` class.

The tests that are currently failing have [`expectedFailure`][9]
decorators that will be removed as their underlying issues are
addressed in future commits.

The PR for this commit is [pydot#249][10]. It should get links to
spin-off issues and PRs that discuss the various issues brought up by
these tests in more detail.

[1]: https://docs.python.org/3.9/reference/expressions.html#value-comparisons
[2]: https://docs.python.org/3.9/reference/datamodel.html#the-standard-type-hierarchy
[3]: https://docs.python.org/3.9/reference/datamodel.html#object.__eq__
[4]: https://docs.python.org/3.9/glossary.html#term-hashable
[5]: https://docs.python.org/3.9/reference/datamodel.html#object.__hash__
[6]: pydot#92
[7]: https://www.asmeurer.com/blog/posts/what-happens-when-you-mess-with-hashing-in-python/
[8]: https://eng.lyft.com/hashing-and-equality-in-python-2ea8c738fb9d
[9]: https://docs.python.org/3.9/library/unittest.html#unittest.expectedFailure
[10]: pydot#249
@peternowee
Copy link
Member

Table of contents

The bug ^

Minimal reproducible example (MRE)

The following test case for the test suite contains a minimal reproducible example for the bug reported at the top of this post:

    def test_graph_simplify(self):
        # Fail example: pydot 1.0.2. GH pydot/pydot#92 OP patch 1.
        g = pydot.Graph()
        g.add_edge(pydot.Edge('a', 'b'))
        g.add_edge(pydot.Edge('a', 'b'))
        g.add_edge(pydot.Edge('b', 'a'))
        g.add_edge(pydot.Edge('b', 'a'))
        test_combinations = [
            ('graph', False,
             'graph G { a -- b; a -- b; b -- a; b -- a; }'),
            ('graph', True,
             'graph G { a -- b; }'),
            ('digraph', False,
             'digraph G { a -> b; a -> b; b -> a; b -> a; }'),
            ('digraph', True,
             'digraph G { a -> b; b -> a; }')]
        expected_concat = observed_concat = ''
        for (graph_type, simplify, expected) in test_combinations:
            expected_concat += 'graph_type %s, simplify %s: %s\n' % (
                               graph_type, simplify, expected)
            g.set_type(graph_type)
            g.set_simplify(simplify)
            try:
                observed = ' '.join(g.to_string().split())
            except (NameError, TypeError) as e:
                observed = '%s: %s' % (type(e).__name__, e)
            observed_concat += 'graph_type %s, simplify %s: %s\n' % (
                               graph_type, simplify, observed)
        self.maxDiff = None
        self.assertMultiLineEqual(expected_concat, observed_concat)

PR #247 was created to add this test in the test suite.

MRE results

  • pydot 1.0.2 (eb00301) on Python 2.7.16: FAIL on graphs that have simplify set to True.

    ======================================================================
    FAIL: test_graph_simplify (__main__.TestGraphAPI)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "mre.py", line 160, in test_graph_simplify
        self.assertMultiLineEqual(expected_concat, observed_concat)
    AssertionError: 'graph_type graph, simplify False: graph G { a -- b; a -- b; b -- a; b -- a; }\n [truncated]... != "graph_type graph, simplify False: graph G { a -- b; a -- b; b -- a; b -- a; }\n [truncated]...
      graph_type graph, simplify False: graph G { a -- b; a -- b; b -- a; b -- a; }
    - graph_type graph, simplify True: graph G { a -- b; }
    + graph_type graph, simplify True: NameError: global name 'elm' is not defined
      graph_type digraph, simplify False: digraph G { a -> b; a -> b; b -> a; b -> a; }
    - graph_type digraph, simplify True: digraph G { a -> b; b -> a; }
    + graph_type digraph, simplify True: NameError: global name 'elm' is not defined
    
  • pydot 1.0.3 (b125c5d) on Python 2.7.16: Ok.

  • pydot 1.0.25 (ab17f0b) on Python 2.7.16: Ok.

  • pydot 1.4.1 (e47837e) on Python 2.7.16 and 3.7.3: Ok.

Bug already solved in pydot 1.0.3

The NameError bug was only present in pydot 1.0.2 (2008), and was resolved with pydot 1.0.3 (2010). I am not sure if pydot 1.0.3 had a public release, but at least pydot 1.0.25 (2011) is listed on PyPI. It is unclear to me why the bug report was made in 2014 only, providing patches even, without considering the later releases available at that time. Perhaps the reason why this issue was still not closed until today, is that it pulls the reader into this large subject area of membership testing, hashes and equality.

Cause

The NameError bug got into pydot 1.0.2 because of some major changes to Graph.to_string(). This included replacing some occurrences of the variable elm by a new variable edge. However, one occurrence, behind a conditional checking for pydot attribute simplify, did not get renamed.

Solution

The solution in pydot 1.0.3 was to rename that variable...

-                if self.obj_dict.get('simplify', False) and elm in edges_done:
+                if self.obj_dict.get('simplify', False) and edge in edges_done:

..., and at the same time implement an Edge.__hash__() method:

def __hash__(self):
     return hash( hash(self.get_source()) + hash(self.get_destination()) )

That hash method is still unchanged in pydot today (1.4.2.dev0).

Note that it would not have been possible to keep track of Edge objects in a set without defining some sort of Edge.__hash__(). Without it, edges_done.add(edge) and edge in edges_done would certainly not have worked as expected:

  • In Python 2, a missing Edge.__hash__() would have been inherited from object.__hash__(), which bases its return value on the object's id(). This would have resulted in different hashes for all Edge instances, even those that were equal according to Edge.__eq__(). This would have meant the requirement that "objects which compare equal have the same hash value" was not fulfilled and would have resulted in edge in edges_done returning False all the time, so simplify seemingly just being ignored and all parallel edges getting printed.

    MRE results on pydot 1.0.2 with elm renamed to edge, but no Edge.__hash__() (Python 2.7.16):

    ======================================================================
    FAIL: test_graph_simplify (__main__.TestGraphAPI)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "mre.py", line 160, in test_graph_simplify
        self.assertMultiLineEqual(expected_concat, observed_concat)
    AssertionError: 'graph_type graph, simplify False: graph G { a -- b; a -- b; b -- a; b -- a; }\n [truncated]... != 'graph_type graph, simplify False: graph G { a -- b; a -- b; b -- a; b -- a; }\n [truncated]...
      graph_type graph, simplify False: graph G { a -- b; a -- b; b -- a; b -- a; }
    - graph_type graph, simplify True: graph G { a -- b; }
    + graph_type graph, simplify True: graph G { a -- b; a -- b; b -- a; b -- a; }
    ?                                                    ++++++++++++++++++++++++
      graph_type digraph, simplify False: digraph G { a -> b; a -> b; b -> a; b -> a; }
    - graph_type digraph, simplify True: digraph G { a -> b; b -> a; }
    ?                                                        ^
    + graph_type digraph, simplify True: digraph G { a -> b; a -> b; b -> a; b -> a; }
    ?                                                        ^    ++++++++   ++++++++
    
  • In Python 3, with Edge.__eq__() present, the missing definition of Edge.__hash__() would have been interpreted as None and edge in edges_done would have resulted in a TypeError. Pydot 1.0.2 did not support Python 3, but this outcome can still be experienced by removing Edge.__hash__() from pydot 1.4.1.

    MRE results on pydot 1.4.1 with Edge.__hash__() removed (Python 3.7.3):

    ======================================================================
    FAIL: test_graph_simplify (__main__.TestGraphAPI)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "mre.py", line 160, in test_graph_simplify
        self.assertMultiLineEqual(expected_concat, observed_concat)
    AssertionError: 'graph_type graph, simplify False: graph [238 chars] }\n' != "graph_type graph, simplify False: TypeEr[238 chars]e'\n"
    - graph_type graph, simplify False: graph G { a -- b; a -- b; b -- a; b -- a; }
    - graph_type graph, simplify True: graph G { a -- b; }
    - graph_type digraph, simplify False: digraph G { a -> b; a -> b; b -> a; b -> a; }
    - graph_type digraph, simplify True: digraph G { a -> b; b -> a; }
    + graph_type graph, simplify False: TypeError: unhashable type: 'Edge'
    + graph_type graph, simplify True: TypeError: unhashable type: 'Edge'
    + graph_type digraph, simplify False: TypeError: unhashable type: 'Edge'
    + graph_type digraph, simplify True: TypeError: unhashable type: 'Edge'
    

Patch 1 by the OP ^

The OP's first patch proposes not to use the complete Edge objects, but only the tuple of edge points edge.obj_dict['points'] for keeping track of which edges were already printed. With this, implementing a Edge.__hash__() method is avoided, because only the tuples need to be hashed for storage in edges_done.

OP patch 1 diff for pydot 1.0.2 (including a small fix for`Edge.__eq__()`)
diff --git a/pydot.py b/pydot.py
index c2c81bcb..ba527fa6 100644
--- a/pydot.py
+++ b/pydot.py
@@ -826,3 +826,3 @@ class Edge(object,  Common ):
             if ( ( self.get_source()==edge.get_source() and self.get_destination()==edge.get_destination() ) or
-                ( edge.get_source() == get_destination() and self.get_destination() == edge.get_source() ) ):
+                ( edge.get_source() == self.get_destination() and self.get_destination() == edge.get_source() ) ):
                 return True
@@ -1457,7 +1457,7 @@ class Graph(object, Common):
                 
-                if self.obj_dict.get('simplify', False) and elm in edges_done:
+                if self.obj_dict.get('simplify', False) and edge.obj_dict['points'] in edges_done:
                     continue
                 
                 graph.append( edge.to_string() + '\n' )
-                edges_done.add(edge)
+                edges_done.add(edge.obj_dict['points'])
 
OP patch 1 diff for pydot 1.4.1
diff --git a/pydot.py b/pydot.py
index 9c085863..7419d855 100644
--- a/pydot.py
+++ b/pydot.py
@@ -754,7 +754,2 @@ class Edge(Common):
 
-    def __hash__(self):
-
-         return hash( hash(self.get_source()) +
-                     hash(self.get_destination()) )
-
 
@@ -1559,7 +1554,7 @@ class Graph(Common):
                 if (self.obj_dict.get('simplify', False) and
-                        edge in edges_done):
+                        edge.obj_dict['points'] in edges_done):
                     continue
 
                 graph.append( edge.to_string() + '\n' )
-                edges_done.add(edge)
+                edges_done.add(edge.obj_dict['points'])
 

With OP patch 1, the MRE fails for an undirected simple graph, as can been seen in the diff-like lines in the MRE results below. In addition to a -- b, the reverse edge b -- a gets printed, even though the graph is undirected.

MRE results of OP patch 1 on both pydot 1.0.2 on Python 2.7.16 and pydot 1.4.1 on Python 2.7.16 or 3.7.3:

======================================================================
FAIL: test_graph_simplify (__main__.TestGraphAPI)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "mre.py", line 160, in test_graph_simplify
    self.assertMultiLineEqual(expected_concat, observed_concat)
AssertionError: 'grap[121 chars]- b; }\ngraph_type digraph, simplify False: di[107 chars] }\n' != 'grap[121 chars]- b; b -- a; }\ngraph_type digraph, simplify F[115 chars] }\n'
  graph_type graph, simplify False: graph G { a -- b; a -- b; b -- a; b -- a; }
- graph_type graph, simplify True: graph G { a -- b; }
+ graph_type graph, simplify True: graph G { a -- b; b -- a; }
?                                                    ++++++++
  graph_type digraph, simplify False: digraph G { a -> b; a -> b; b -> a; b -> a; }
  graph_type digraph, simplify True: digraph G { a -> b; b -> a; }

Review of OP patch 1:

  • The reason the MRE fails in the undirected, simplified graph is because tuples from Edge.obj_dict['points'] store their two endpoints in a fixed order: First source, then destination. Point tuples from a forward and a reverse edge therefore get different hashes, even if they are between the same points. As a result, the points tuple for the reverse edge is not considered to be in edges_done already. At no point the graph's directedness (whether the graph is directed or undirected; undirected in this case) is considered. The result is something that looks very similar to a violation of the important Python requirement that "objects which compare equal have the same hash value", except in this case the hash does not come from the object itself. If the Edge objects were compared using Edge.__eq__(), the forward and reverse edge would be considered equal, but now that only the hashes of tuples of points are compared, this equality gets overlooked. Possible solutions for this problem:

    • In Graph.to_string(), if the parent graph is undirected, keep track of undirected edges in edges_done by re-sorting their end points tuple(sorted(edge.obj_dict['points'])) (not using frozenset because source and destination may be duplicates).
    • In Graph.to_string(), keep track of directed edges in edges_done. Then, when checking if a new edge is done already, first check if the parent graph is undirected and if so, check the new edge twice for membership of edges_done: Once forward and once reverse.

    I would prefer the second solution, because it is more resistant to intermediate mutations of the parent graph directedness. Although such mutations are not immediately likely with the local, temporary edges_done we are talking about here, I prefer a solution that is also portable to situations where a set of edges is kept around for a longer time.

  • Different behavior between Python 2 and Python 3 when hashing an Edge object. When an __eq__() method is defined but a __hash__() method not, in Python 2 __hash__() is inherited from the parent class, whereas in Python 3 it is implicitly set to None. Examples below. This problem can be resolved in four ways:

    • Inheriting the parent class __hash__() (the Python 2 default) would mean inheriting Python's default object.__hash__(), which bases its hash values on id(). However, our current (pydot 1.4.1) Edge.__eq__() method uses a much 'looser', value-based definition of equality between edges. Together, this means a violation of the Python requirement that "objects which compare equal must have the same hash value" and results in major problems with sets and dictionaries.

      Example using OP patch 1 without Edge.__hash__() on Python 2.7.16:

      >>> g = pydot.Graph()
      >>> e1 = pydot.Edge('a', 'b')
      >>> e2 = pydot.Edge('a', 'b')
      >>> g.add_edge(e1)
      >>> g.add_edge(e2)
      >>> e1 == e2
      True
      >>> hash(e1) == hash(e2)
      False  # Violates requirement that equal objects have equal hashes.
      >>> s = {e1}
      >>> e2 in s
      False  # Result should be True, because e1 and e2 compare equal.
    • With Edge.__hash__ = None (the Python 3 default), Edge objects will not be hashable, meaning they cannot be used as in dictionary key or in a set.

      Example using OP patch 1 without Edge.__hash__() on Python 3.7.3:

      >>> g = pydot.Graph()
      >>> e1 = pydot.Edge('a', 'b')
      >>> e2 = pydot.Edge('a', 'b')
      >>> g.add_edge(e1)
      >>> g.add_edge(e2)
      >>> e1 == e2
      True
      >>> hash(e1) == hash(e2)
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
      TypeError: unhashable type: 'Edge'
      >>> s = {e1}
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
      TypeError: unhashable type: 'Edge'

      In Graph.to_string(), OP patch 1 uses tuples instead of Edge objects for set edges_done, so Edge not being hashable will not be a problem there, but there may be other places where our code or user's code will break when we now still remove that hashability. The only argument in favor would be that it solves the problem that Edge should not even been hashable given how mutable it currently is, but that problem could also be solved by addressing that mutability (to be further discussed in Add tests for Edge equality and hashing #249 or a specific issue/PR spun off from there).

    • We could remove both Edge.__eq__() and Edge.__hash__(), but that would mean losing the value-based equality comparison between edges that pydot offers now. This would be a breaking API change for other code in pydot or user's code.

      Example of pydot 1.4.1 without Edge.__eq__() and Edge.__hash__() on Python 3.7.3:

      >>> g = pydot.Graph()
      >>> e1 = pydot.Edge('a', 'b')
      >>> e2 = pydot.Edge('a', 'b')
      >>> g.add_edge(e1)
      >>> g.add_edge(e2)
      >>> e1 == e2
      False  # Lost the ability to do value-based equality comparison.
      >>> hash(e1) == hash(e2)
      False  # Hashes based on identity.
    • We could keep Edge.__eq__() and our current (pydot 1.4.1) Edge.__hash__(), but then the question becomes why we would not just keep using Edge for edges_done then as well instead of using point tuples as this patch proposes. The Edge instances are already there for string conversion anyway.

  • If there are benefits in hashing a tuple (e.g. because it is advised by Python, or because of better performance and more evenly distributed hash values), we can also reap those benefits by switching to hashing tuples from inside Edge.__hash__(), so that others can profit from it as well.

Conclusion: OP patch 1 is not an improvement over the current implementation. It does invite a critical look at our current implementation, to be discussed in PR #249 and issues/PRs linked to from there.

Patch 2 by the OP ^

The OP's second patch looks a bit more like the current (pydot 1.4.1) implementation. It renames elm in edges_done to edge in edges_done and also introduces an Edge.__hash__() method. However, a big difference with both the current implementation and OP patch 1 is that OP patch 2 also considers the graph directedness when generating a hash value.

def __hash__(self):
    if self.get_parent_graph().get_top_graph_type() == 'graph':
        return hash(min(self.get_source(), self.get_destination())
                + max(self.get_source(), self.get_destination()))
    return hash(self.get_source() + self.get_destination())
OP patch 2 diff for pydot 1.0.2 (including a small fix for `Edge.__eq__()`)
diff --git a/pydot.py b/pydot.py
index c2c81bcb..2093f8e8 100644
--- a/pydot.py
+++ b/pydot.py
@@ -826,3 +826,3 @@ class Edge(object,  Common ):
             if	( ( self.get_source()==edge.get_source() and self.get_destination()==edge.get_destination() ) or
-                ( edge.get_source() == get_destination() and self.get_destination() == edge.get_source() ) ):
+                ( edge.get_source() == self.get_destination() and self.get_destination() == edge.get_source() ) ):
                 return True
@@ -836,2 +836,7 @@ class Edge(object,  Common ):

+    def __hash__(self):
+        if self.get_parent_graph().get_top_graph_type() == 'graph':
+            return hash(min(self.get_source(), self.get_destination())
+                    + max(self.get_source(), self.get_destination()))
+        return hash(self.get_source() + self.get_destination())

@@ -1457,3 +1462,3 @@ class Graph(object, Common):
                 
-                if self.obj_dict.get('simplify', False) and elm in edges_done:
+                if self.obj_dict.get('simplify', False) and edge in edges_done:
                     continue
OP patch 2 diff for pydot 1.4.1
diff --git a/pydot.py b/pydot.py
index 9c085863..8301c408 100644
--- a/pydot.py
+++ b/pydot.py
@@ -755,6 +755,8 @@ class Edge(Common):
     def __hash__(self):
 
-         return hash( hash(self.get_source()) +
-                     hash(self.get_destination()) )
+        if self.get_parent_graph().get_top_graph_type() == 'graph':
+            return hash(min(self.get_source(), self.get_destination())
+                    + max(self.get_source(), self.get_destination()))
+        return hash(self.get_source() + self.get_destination())
 
 

With OP patch 2, the MRE tests all pass (Ok) on both pydot 1.0.2 on Python 2.7.16 and pydot 1.4.1 on Python 2.7.16 or 3.7.3:

  graph_type graph, simplify False: graph G { a -- b; a -- b; b -- a; b -- a; }
  graph_type graph, simplify True: graph G { a -- b; }
  graph_type digraph, simplify False: digraph G { a -> b; a -> b; b -> a; b -> a; }
  graph_type digraph, simplify True: digraph G { a -> b; b -> a; }

Review of OP patch 2:

  • Contrary to the Edge.__hash__() method we have now, the one suggested in this patch does consider graph directedness.

    • An advantage of this it that it prevents the hash collisions of reverse edges in case of undirected graphs that we have in the current implementation. This should increase the performance of set lookups, although the performance gain is probably minimal because each collision is only over two Edge objects (the forward and the reverse edge).

    • The main disadvantage of making the hash function depend on the parent graph directedness is that it (further) increases the mutability of the Edge object. In our current Edge.__hash__() implementation, we already have the problem of the hash depending on edge points that are mutable, but at least pydot does not provide any setter methods for that. But for the dependence on the parent graph, things are even worse, because pydot offers two setter methods for that: Edge.set_parent_graph() and Graph.set_type(). If that results in the edge mutating from having a directed parent graph to an undirected one, or vice versa, and the hash function depends on that directedness as this OP patch 2 proposes, the hash value will also change. This is a violation of the Python requirement that an object's hash value should never change and results in major problems with sets and dictionaries.

      Example using OP patch 2 on Python 3.7.3:

      >>> g = pydot.Graph(graph_type='graph')
      >>> e1 = pydot.Edge('b', 'a')
      >>> g.add_edge(e1)
      >>> hash(e1)
      8331574168740811722
      >>> str(e1)
      'b -- a;'
      >>> s = {e1}
      >>> [str(e) for e in s]
      ['b -- a;']
      >>> e1 in s
      True                  # So far, so good.
      >>> g.set_type('digraph')
      >>> hash(e1)
      8394800466976714966   # Object hash should never change.
      >>> str(e1)
      'b -> a;'             # e1 mutated, ...
      >>> [str(e) for e in s]
      ['b -> a;']           # even inside the set, ...
      >>> e1 in s
      False                 # but is not found because of the changed hash.

      This can be solved by:

      • Reducing the mutability of Edge and Graph classes, or making them completely immutable.

      • Removing the dependence on the parent graph directedness from Edge.__hash__() again, either by:

        • Edge.__hash__() always returning an "undirected" hash, which is actually the solution that was chosen for pydot 1.0.3 and is still in pydot today (1.4.2.dev0).
        • Edge.__hash__() always returning a "directed" hash. In that case, Edge.__eq__() would also need to drop the dependency on the parent graph and always compare edges as directed. Otherwise two reverse edges with an undirected parent graph would compare equal but not have the same hash, which would be a violation of the requirement that "objects which compare equal have the same hash value", resulting in other major problems with sets and dictionaries.

      Problems with mutability and dependence on the parent graph in the current implementation are further discussed in PR Add tests for Edge equality and hashing #249 or a specific issue/PR spun off from there.

    • Another disadvantage of making the hash function depend on the parent graph directedness is that edges will not have a hash value until they are appointed to a parent graph. For example:

      >>> g = pydot.Graph()
      >>> e1 = pydot.Edge('a', 'b')
      >>> hash(e1)
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
        File "pydot.py", line 757, in __hash__
          if self.get_parent_graph().get_top_graph_type() == 'graph':
      AttributeError: 'NoneType' object has no attribute 'get_top_graph_type'
      >>> g.add_edge(e1)
      >>> hash(e1)
      8331574168740811722

      Besides dropping the dependence on the parent graph directedness, a possible solution would be to assume a default edge directedness for edges without a parent graph.

  • Compared to the current implementation, OP patch 2 adds a source of hash collisions by using the + (addition) operator to concatenate the edge endpoints before hashing. Two examples:

    >>> g = pydot.Graph()
    >>> e1 = pydot.Edge(1, 2)
    >>> e2 = pydot.Edge(0, 3)
    >>> g.add_edge(e1)
    >>> g.add_edge(e2)
    >>> e1 == e2
    False
    >>> hash(e1) == hash(e2)
    True                       # Hash collision
    >>> g = pydot.Graph()
    >>> e1 = pydot.Edge('aa', 'bb')
    >>> e2 = pydot.Edge('a', 'abb')
    >>> g.add_edge(e1)
    >>> g.add_edge(e2)
    >>> e1 == e2
    False
    >>> hash(e1) == hash(e2)
    True                       # Hash collision

    These collisions can probably be prevented by replacing the + concatenation of endpoints by a tuple of endpoints, either sorted or unsorted depending on if directedness needs to be taken into account.

Conclusion: OP patch 2 is problematic in that it makes hashes depend on parent graph directedness without addressing the problem of mutability. It also adds some hash collisions. Still, it provides some inspiration for rethinking the current implementation.

Follow-up ^

With the bug already solved and the patches provided by the OP not immediately having clear advantages over the current implementation, I will be closing this issue.

The following PRs are spin-offs of this issue:

  • PR Add test for simple/simplified graphs #247: Adds the test containing the MRE to the test suite. It tests the output of Graph.to_string() for graphs with multiple (parallel) edges and combinations of simplify set to True or False and the graph being directed or undirected. Planned for pydot 1.4.2.
  • PR Make Edge not-equal comparison on PY2 same as PY3 #248: Though not directly related to this issue, I happened to come across a difference in edge non-equality comparisons (!= and Edge.__ne__()) between Python 2 and 3. Solution planned for pydot 1.4.2.
  • PR Add tests for Edge equality and hashing #249: Adds many new tests for Edge equality and hashing. Many of them are based on the experiences with pydot 1.0.2, OP patch 1 and 2 described here. Some tests point out issues with the current implementation and may be further discussed during the development of pydot 2.0.0.

Versions used for examples in this post, unless otherwise noted: pydot 1.4.1, Python 3.7.3, Debian 10 buster, Linux 4.19 amd64.

peternowee added a commit that referenced this issue Dec 16, 2020
This test tests the output of `Graph.to_string()` for graphs with
multiple (parallel) edges and combinations of `simplify` set to `True`
or `False` and the graph being directed or undirected.

This test was originally created for issue [#92][1]
("Simplify throws NameError, and, as implemented, doesn't work"). It
can also help catch regressions when doing future work on
`Graph.to_string()`, `Edge.__eq__()` or `Edge.__hash__()`.

[1]: #92
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants