Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add namespace support #710

Closed
wants to merge 0 commits into from
Closed

Add namespace support #710

wants to merge 0 commits into from

Conversation

exit99
Copy link

@exit99 exit99 commented Jan 25, 2016

Allows namespacing based on proposed syntax in this issue.

r = redis.Redis(namespace='foobar')

Changes:

  • redis.Redis uses redis.namespace.NamespaceWrapper class to format requests to and responses from redis.
  • Contains passing tests for all commands available on the redis.Redis class.

@andymccurdy
Copy link
Contributor

Hi @kazanz,

Thanks for taking the time to put this together. I have a number of comments:

  1. Storing any state on the client instance breaks parallel code execution. redis-py is designed so that a client instance can be shared across threads or coroutines. Storing the ns instance on the client between the request/response cycle will break when multiple threads or coroutines attempt to use the same client. (e.g., Thread1 executes command GET, Thread2 executes command SMEMBERS, Thread1 parses the response and encounters the namespace instance that Thread2 created.) Instead, the communication from execute_command to parse_response should likely be a formalized argument to parse_response. (Note: I haven't looked closely at how this would work with pipelines and pubsub clients.)

  2. Speaking of pipeline and pubsub clients, they both override execute_command and parse_response. From quickly glancing at your patch, it doesn't look like any of the namespace code will work with when using pipelines or pubsub. As a user, I'd expect that if I specified a namespace on my client that keys that I use within a pipeline created from that client would get namespaced. Similarly, I think (but I'm not 100% convinced) that channels or patterns I choose to listen to with a pubsub connection should be namespaced as well.

  3. I'd also expect keys passed to Lua scripts to get namespaced. Of course this gets tricky when you think about literal key names specified in the Lua script itself rather than being passed in at runtime. My current thought is that if you're hardcoding a key name in the Lua script then we can't really do anything about that. But we most certainly can namespace any value passed into the keys arg of a callable created via register_script.

  4. The namespace argument and the modifications to the execute_command and parse_response functions should be on the StrictRedis class, not the Redis class. Currently only users of the old Redis class will be able to enjoy this feature.

  5. Copying and pasting all the command tests makes maintenance a nightmare. There's a couple options here:

    I generally like the second option because the test suite already takes 5+ seconds to run.

  6. Keeping a separate representation of how commands are structured in the namespace.py file also seems like a maintenance problem. Someone adding a new command or modifying an existing command needs to remember to also make sure CMDS is in sync. Perhaps we could use a decorator on each command function to describe its arguments? At least then the code and the spec would be in the same location in the source code.

  7. How should the KEYS command work with iteration from a user experience? Or any data structure that currently stores key names. For example, I bet there's lots of code out there that does this:

    r = redis.StrictRedis(...)
    for key in r.blpop():
       data = r.get(key)
       # do something with data
    
    ### OR
    
    for key in r.keys('myapp:*'):
       data = r.get(key)

    We need to be very clear to users that there may be application code changes required if they wish to enable namespace support when they have code like this in an existing application.

  8. I'm not really clear on the purpose of the Namespace.format_response method and the methods it calls (remove_response and clean_response). Is this intended to be a solution for the previous point? If so, what happens if I happen to be using namespace support with namespace='foo:' and I have a key bar with a value of foo:something. What if foo: in this case wan't a namespace and the value of the key just happened to have foo: in front? It seems like my app would just lose data.

    Further, no attempt is made to respect the encoding option where a user can specify what character set set values are stored in. remove_namespace blindly calls response.decode() and then hides errors should they be raised.

  9. There are several calls to print() in the namespace.py module. I'm sure you didn't mean to check those in. :)

  10. I was curious about the performance aspects of your approach. I ran a very rudimentary benchmark using the following code:

    In [1]: import redis, time
    In [2]: r = redis.Redis()
    In [3]: nsr = redis.Redis(namespace='zzz')
    In [4]: def test(client, num):
       start = time.time()
       for i in range(num):
           _ = client.get('foo')
       return time.time() - start
    
    In [5]: test(r, 10000)
    Out[5]: 3.865340995788574
    In [6]: test(nsr, 10000)
    Out[6]: 4.4781811237335205
    
    In [7]: def test(client, num):
       start = time.time()
       for i in range(num):
           _ = client.zadd('zfoo', a=1, b=2, c=3)
       return time.time() - start
    
    In [8]: test(r, 10000)
    Out[8]: 4.354082822799683
    In [9]: test(nsr, 10000)
    Out[9]: 5.616786956787109

    This suggests that using a namespace adds 15% overhead in the best case scenario (a simple GET command) and 28% overhead in one of the most complex examples (the ZADD command). That's quite a hefty performance penalty to pay IMO.

@exit99
Copy link
Author

exit99 commented Jan 27, 2016

Hey @andymccurdy,

Thank you for the feedback, all valid points. I have a few business related projects I have to work on now, but I will chip away at this when I can and update you on the progress.

@exit99 exit99 force-pushed the master branch 4 times, most recently from 0b0bc22 to adec138 Compare May 26, 2016 17:58
@exit99
Copy link
Author

exit99 commented May 26, 2016

Hello @andymccurdy

Went through and overhauled the PR based on your suggestions. All functions on StrictRedis, Redis, Pubsub, Script, and Pipeline have namespacing support. Tested manually majority of the functions and wrote string based tests for all the namespacing functionality as suggested in concern 5.

Took care of concerns 1, 2, 4, 6 by initializing the redis/strictredis/pubsub/pipeline classes with a namespace argument and wrapping all the functions that call out to redis with a custom namespace_format decorator. (also simplified a lot of the logic from the last commit). The namespace_format decorator has a few kwargs to specifiy which args being passed to execute_command need the namespace appended. There is also a cmd_execution_wrapper for when we need to decorate execute_command after a function manipulates kargs before using them in execute_command (see the mset function).

I have yet to come up with a elegant solution to 7, when we blpop there is no guarentee what we are popping is keys or values, as it is appliaction dependent. Perhaps it should default to removing the namespace and a kwarg can be added to those functions to disable namespace removal. Either way have a comment in the readme warning users. Thoughts?

In answer to 8. Methods like SCAN return a list of keys. From a user experience standpoint, it should only return keys relevant to the namespace. For these functions that can return only keys, we make sure only namespaced keys are returned and then call remove_namespace on them to ensure that they are reusable with the current namespaced Redis/StrictRedis instance.

9: Removed the prints :)

10: Here is simple benchmarking:

In [5]: import time

In [6]: params = {"host": "localhost", "port": 6379, "db": 9}

In [7]: import redis

In [8]: sr, sr2 = (redis.StrictRedis(namespace="namespace:", **params), redis.StrictRedis(**params))

In [9]: def test(client, num):
   ...:     start = time.time()
   ...:     for i in range(num):
   ...:         _ = client.get('foo')
   ...:     return time.time() - start
   ...:

In [10]: test(sr, 10000)
Out[10]: 11.770411014556885

In [11]: test(sr2, 10000)
Out[11]: 14.914319038391113

In [12]: def test(client, num):
   ....:     start = time.time()
   ....:     for i in range(num):
   ....:         _ = client.zadd('zfoo', a=1, b=2, c=3)
   ....:     return time.time() - start
   ....:

In [13]: test(sr, 10000)
Out[13]: 3.7007460594177246

In [14]: test(sr2, 10000)
Out[14]: 6.573690891265869

In [15]: test(sr, 10000)
Out[15]: 5.087170839309692

In [16]: test(sr2, 10000)
Out[16]: 7.857127904891968

A little confused on comment 3? the register_script func takes on arg the script. Either I am misunderstanding or perhaps you meant the eval func as it gets called via the Script class? If that is the case, we are formatting any keys passed to the script.

Let me know what you think and any potential improvements you think can be made.

Thanks

@andymccurdy
Copy link
Contributor

Thanks @Kazanz I'll find some time today or tomorrow to review.

@andymccurdy
Copy link
Contributor

@Kazanz Thanks for the updated patch. I think we're getting closer. A couple questions:

  1. Why do we care about the lex argument in namespace_format (https://github.com/andymccurdy/redis-py/pull/710/files#diff-b5290622cf3c90f20731acb922fb40f4R14)? It seems like the values in the Z_LEX_ commands you've specified lex=True on aren't key names, they're values within the sorted set. I don't see why those should be namespaced. Am I missing something?
  2. I see you're passing self.namespace to the pipeline objects, but the decorators are still relying on the pipeline_instance.<redis_command_name> function to return a response value (https://github.com/andymccurdy/redis-py/pull/710/files#diff-b5290622cf3c90f20731acb922fb40f4R41). response in this case will be an instance of the pipeline class, not the result of the command as the command hasn't been executed yet. Again, unless I'm missing something :)

I'm going to take a stab at cleaning a few things up and hopefully get back to you tonight or tomorrow.

Thanks!

@exit99
Copy link
Author

exit99 commented Jun 14, 2016

@andymccurdy Thanks for getting back, To answer your questions

#1 I could be missing something as well, so please correct me if I am wrong. Take ZRANGEBYLEX command for example:

redis> ZADD myzset 0 a 0 b 0 c 0 d 0 e 0 f 0 g
(integer) 7
redis> ZRANGEBYLEX myzset - [c
1) "a"
2) "b"
3) "c"

When I do a ZADD the keys a, b, c etc are getting namespaced to namespace:a etc. So If i try to do a ZRANGEBYLEX myzset - [a it will try to do a lookup of a which will fail. The way the ZRANGEBYLEX function is structured is we namespace the a with the regular decorator it will look up the key namespace:[a which will also fail. The lex arg lets the decorator know to put any namespace after the [ so we get the key [namespace:a which will succeed.

#2 And I may be misunderstanding the question, so correct me if I am wrong or not addressing the appropriate issue. When the pipeline object calls set, for example, it still returns a decorated function. The only difference is the decorated function is immediately executed on the StrictRedis and Redis classes, whereas the pipeline waits for the execute command to be run. The results are still same Code example of working pipeline:

In [1]: from redis import Redis
In [2]: params = {"host": "localhost", "port": 6379, "db": 9}
In [3]: r = Redis(namespace="namespace:", **params)
In [4]: pipeline = r.pipeline()
In [5]: r2 = Redis(**params)
In [6]: pipeline.set("foo", "bar").sadd("faz", "baz").incr("auto_number").execute()
In [7]: r2.get('foo')  # no namespace

In [8]: r.get('foo')  # namespace
Out[8]: 'bar' 
In [9]: p = pipeline.set('jar', 'jelly')
In [10]: p.execute()
In [11]: r2.get('jar')  # no namespace

In [12]: r.get('jar')  # namespace
Out[12]: 'jelly'

@andymccurdy
Copy link
Contributor

  1. ZADD only takes one key. In your example, myzset should be namespaced. But the individual elements (a, b, c, d, e, f) should not be namespaced.
  2. I should have been more specific about the pipelines. Using the KEYS or SCAN commands within a pipeline will currently fail. The decorator tries to un-namespace the responses of commands that have resp_format=True. But when these commands are being used within a pipeline, the response value is the pipeline instance, not the result of the command.

@exit99
Copy link
Author

exit99 commented Jun 16, 2016

@andymccurdy

Made changes as per the last suggetion. Lex was removed as it is was not needed.

Now, when a response is returned in the namespace_format decorator, if the response needs namespacing removed and is a pipeline object, we add a remove_namespace option to the command in the command_stack. BasePipeline's _execute_pipeline and _execute_transaction functions are decorated with the new remove_namespace_pipeline_wrapper that removes the namespace of the results of the commands with the remove_namspace option.

Working code example:

In [1]: from redis import Redis

In [2]: p1 = Redis(**{
                "namespace": "namespace:", "host": "localhost", "port": 6379, "db": 9
        }).pipeline()

In [3]: p1.set('a', '1').set('b', '2').keys().execute()
Out[3]: [True, True, ['b', 'a']]

In [4]: p2 = Redis(**{"host": "localhost", "port": 6379, "db": 9}).pipeline()

In [5]: p2.set('c', '3').set('d', '4').keys().execute()
Out[5]: [True, True, ['d', 'c', 'namespace:b', 'namespace:a']]

Let me know if there is other improvements we need.

@exit99
Copy link
Author

exit99 commented Jun 27, 2016

@andymccurdy Just following up with you to see if there is you got a chance to view latest commit, and if more modifications are needed. Thanks.

@fritzstauff
Copy link

Is this something that will be implemented soon? We are very interested in using it.

@Nick011
Copy link

Nick011 commented Jan 12, 2017

@andymccurdy we're using this in production. works great.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants