ECL Redis Plugin

This is the ECL plugin for the persistent key-value cache Redis. It uses the C API hiredis.

Installation and Dependencies

To build the redis plugin with the HPCC-Platform, libhiredis-dev is required.

sudo apt-get install libhiredis-dev

The redis server and client software can be obtained as binaries, built from source, or installed via the preferred method:

sudo apt-get install redis-server

Note: redis-server 2.6.12 or greater is required to use this plugin as intended. For efficiency, this version requirement is not checked by the plugin, as it could only be verified at runtime. Using an older version will result in an exception, normally indicating that either a given command does not exist or that the wrong number of arguments was passed to it. The Set plugin functions will not work when setting with an expiration on a version older than 2.6.12. In addition, whilst it is possible to use Expire with a version older than 2.1.3, it is not advised due to the change in its semantics.

Note: The minimum version requirement for the API hiredis to allow for redis connections to be cached is 0.13.0.

Getting started

The server can be started by typing redis-server within a terminal. To run with a non-default configuration run as redis-server redis.conf, where redis.conf is the configuration file supplied with the redis-server package.

For example, to make the server require password authentication, locate redis.conf and copy it to a directory of your choice, then alter the 'requirepass' variable within the file. The server port can be altered here in the same way. Note: the default port is 6379, and if multiple independent caches are required then they are, by definition, redis-servers on different ports.

The redis-server package comes with the redis client redis-cli. This can be used to send commands to the server and receive its replies. It is invoked with redis-cli or, for example, redis-cli -p 6380 to connect to the redis cache on port 6380 (assuming one has been started).

Perhaps one of the handiest uses of redis-cli is the ability to monitor all commands issued to the server via the redis command MONITOR. INFO ALL is also a useful command for listing the server and cache settings and statistics. Note: if requirepass is activated, redis-cli will require you to authenticate via AUTH <passcode>.

Further documentation is available with a full list of redis commands.

The Actual Plugin

The bulk of this redis plugin for ECL is made up of the various SET and GET commands, e.g. GetString or SetReal. They are accessible via the module redis from the redis plugin ECL library lib_redis, e.g.

IMPORT redis FROM lib_redis;

Here is a list of the core plugin functions.

###Set

SetUnicode( CONST VARSTRING key, CONST UNICODE value, CONST VARSTRING options, INTEGER4 database = 0, UNSIGNED4 expire = 0, CONST VARSTRING password = '', UNSIGNED4 timeout = 1000, BOOLEAN cacheConnections = TRUE)
SetString(  CONST VARSTRING key, CONST STRING value,  CONST VARSTRING options, INTEGER4 database = 0, UNSIGNED4 expire = 0, CONST VARSTRING password = '', UNSIGNED4 timeout = 1000, BOOLEAN cacheConnections = TRUE)
SetUtf8(    CONST VARSTRING key, CONST UTF8 value,    CONST VARSTRING options, INTEGER4 database = 0, UNSIGNED4 expire = 0, CONST VARSTRING password = '', UNSIGNED4 timeout = 1000, BOOLEAN cacheConnections = TRUE)
SetBoolean( CONST VARSTRING key, BOOLEAN value,       CONST VARSTRING options, INTEGER4 database = 0, UNSIGNED4 expire = 0, CONST VARSTRING password = '', UNSIGNED4 timeout = 1000, BOOLEAN cacheConnections = TRUE)
SetReal(    CONST VARSTRING key, REAL value,          CONST VARSTRING options, INTEGER4 database = 0, UNSIGNED4 expire = 0, CONST VARSTRING password = '', UNSIGNED4 timeout = 1000, BOOLEAN cacheConnections = TRUE)
SetInteger( CONST VARSTRING key, INTEGER value,       CONST VARSTRING options, INTEGER4 database = 0, UNSIGNED4 expire = 0, CONST VARSTRING password = '', UNSIGNED4 timeout = 1000, BOOLEAN cacheConnections = TRUE)
SetUnsigned(CONST VARSTRING key, UNSIGNED value,      CONST VARSTRING options, INTEGER4 database = 0, UNSIGNED4 expire = 0, CONST VARSTRING password = '', UNSIGNED4 timeout = 1000, BOOLEAN cacheConnections = TRUE)
SetData(    CONST VARSTRING key, CONST DATA value,    CONST VARSTRING options, INTEGER4 database = 0, UNSIGNED4 expire = 0, CONST VARSTRING password = '', UNSIGNED4 timeout = 1000, BOOLEAN cacheConnections = TRUE)

###Get

INTEGER8   GetInteger(CONST VARSTRING key, CONST VARSTRING options, INTEGER4 database = 0, CONST VARSTRING password = '', UNSIGNED4 timeout = 1000, BOOLEAN cacheConnections = TRUE)
UNSIGNED8 GetUnsigned(CONST VARSTRING key, CONST VARSTRING options, INTEGER4 database = 0, CONST VARSTRING password = '', UNSIGNED4 timeout = 1000, BOOLEAN cacheConnections = TRUE)
STRING      GetString(CONST VARSTRING key, CONST VARSTRING options, INTEGER4 database = 0, CONST VARSTRING password = '', UNSIGNED4 timeout = 1000, BOOLEAN cacheConnections = TRUE)
UNICODE    GetUnicode(CONST VARSTRING key, CONST VARSTRING options, INTEGER4 database = 0, CONST VARSTRING password = '', UNSIGNED4 timeout = 1000, BOOLEAN cacheConnections = TRUE)
UTF8          GetUtf8(CONST VARSTRING key, CONST VARSTRING options, INTEGER4 database = 0, CONST VARSTRING password = '', UNSIGNED4 timeout = 1000, BOOLEAN cacheConnections = TRUE)
BOOLEAN    GetBoolean(CONST VARSTRING key, CONST VARSTRING options, INTEGER4 database = 0, CONST VARSTRING password = '', UNSIGNED4 timeout = 1000, BOOLEAN cacheConnections = TRUE)
REAL          GetReal(CONST VARSTRING key, CONST VARSTRING options, INTEGER4 database = 0, CONST VARSTRING password = '', UNSIGNED4 timeout = 1000, BOOLEAN cacheConnections = TRUE)
DATA          GetData(CONST VARSTRING key, CONST VARSTRING options, INTEGER4 database = 0, CONST VARSTRING password = '', UNSIGNED4 timeout = 1000, BOOLEAN cacheConnections = TRUE)

###Numeric

INTEGER8 INCRBY(CONST VARSTRING key, INTEGER8 value = 1, CONST VARSTRING options, INTEGER4 database = 0, CONST VARSTRING password = '', UNSIGNED4 timeout = 1000, BOOLEAN cacheConnections = TRUE)

###Utility

BOOLEAN Exists(CONST VARSTRING key, CONST VARSTRING options, INTEGER4 database = 0, CONST VARSTRING password = '', UNSIGNED4 timeout = 1000)
FlushDB(CONST VARSTRING options, INTEGER4 database = 0, CONST VARSTRING password = '', UNSIGNED4 timeout = 1000, BOOLEAN cacheConnections = TRUE)
Delete(CONST VARSTRING key, CONST VARSTRING options, INTEGER4 database = 0, CONST VARSTRING password = '', UNSIGNED4 timeout = 1000, BOOLEAN cacheConnections = TRUE)
Persist(CONST VARSTRING key, CONST VARSTRING options, INTEGER4 database = 0, CONST VARSTRING password = '', UNSIGNED4 timeout = 1000, BOOLEAN cacheConnections = TRUE)
Expire(CONST VARSTRING key, CONST VARSTRING options, INTEGER4 database = 0, UNSIGNED4 expire, CONST VARSTRING password = '', UNSIGNED4 timeout = 1000, BOOLEAN cacheConnections = TRUE)
INTEGER DBSize(CONST VARSTRING options, INTEGER4 database = 0, CONST VARSTRING password = '', UNSIGNED4 timeout = 1000, BOOLEAN cacheConnections = TRUE)

###PUB-SUB

UNSIGNED Publish(CONST VARSTRING keyOrChannel, CONST STRING message, CONST VARSTRING options, INTEGER4 database = 0, CONST VARSTRING password = '', UNSIGNED4 timeout = 1000, BOOLEAN lockedKey = FALSE, BOOLEAN cacheConnections = TRUE)
STRING Subscribe(CONST VARSTRING keyOrChannel, CONST VARSTRING options, INTEGER4 database = 0, CONST VARSTRING password = '', UNSIGNED4 timeout = 1000, BOOLEAN lockedKey = FALSE, BOOLEAN cacheConnections = TRUE)

The core points to note here are:

  • There is a SET and GET function associated with each fundamental ECL type. These must be used with their correct value types! Misuse should result in a runtime exception; however, an exception is only raised when the value retrieved from the server does not fit into the memory of the requested type. E.g. it is possible for a STRING of length 8, set with SetString, to be successfully retrieved from the cache via GetInteger without an ECL exception being thrown.
  • CONST VARSTRING options passes the server IP and port to the plugin in the strict format --SERVER=<ip>:<port>. If options is empty, the default 127.0.0.1:6379 is used. Note: 6379 is the default port for redis-server.
  • UNSIGNED4 timeout has units ms and a default value of 1 second (0 := infinity). See 'Timeout Values' below for advice on choosing appropriate values.
  • UNSIGNED4 expire has units ms and a default of 0, i.e. forever.
  • Both Publish and Subscribe have a flag BOOLEAN lockedKey = FALSE which, when TRUE, encodes CONST VARSTRING keyOrChannel as if it were a key, giving key-channel encoding compatibility with the GetOrLockString and SetAndPublishString functions. For this reason, they both also take a database value, as this is used in the encoding of the lock and channel. Please note, however, that the redis pub-sub paradigm is actually irrespective of database.
  • See the redis documentation for the following: Exists, FlushDB, Delete, Persist, Expire, DBSize, Publish, and Subscribe.
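
The options format described above can be illustrated outside ECL. The following Python sketch is not part of the plugin: build_options and parse_options are hypothetical helpers that only mirror the documented --SERVER=<ip>:<port> format and the documented default of 127.0.0.1:6379. How the plugin itself parses the string is not shown in this README, so the error handling here is an assumption.

```python
DEFAULT_IP = "127.0.0.1"
DEFAULT_PORT = 6379          # default redis-server port
PREFIX = "--SERVER="


def build_options(ip=DEFAULT_IP, port=DEFAULT_PORT):
    """Return an options string in the strict --SERVER=<ip>:<port> format."""
    return f"{PREFIX}{ip}:{port}"


def parse_options(options):
    """Split an options string into (ip, port); an empty string yields the defaults."""
    if not options:
        return DEFAULT_IP, DEFAULT_PORT
    if not options.startswith(PREFIX):
        raise ValueError(f"expected {PREFIX}<ip>:<port>, got {options!r}")
    # Split on the last ':' so the port is whatever follows it.
    ip, sep, port = options[len(PREFIX):].rpartition(":")
    if not sep or not ip or not port.isdigit():
        raise ValueError(f"expected {PREFIX}<ip>:<port>, got {options!r}")
    return ip, int(port)
```

A string built this way could then be passed as the options argument of any of the functions listed above.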

###Connection Caching

To prevent unnecessary opening and closing of connections between subsequent function calls, connections are cached and reused. A connection is only reused if it is free of errors; otherwise it is closed and a new one opened. There are three cached instances per thread: one connection for subscriptions, one for publishes, and one for everything else. The caching of connections is only possible for versions of hiredis that meet the minimum version noted in the section Installation and Dependencies. Caching can be turned ON and OFF on a per-function basis using the cacheConnections boolean passed as a function parameter. In addition, the system environment setting HPCC_REDIS_PLUGIN_CONNECTION_CACHING_LEVEL can be set to:

  • 0 - force any & all caching OFF
  • 1 - allow caching of connections (default).
  • 2 - force any & all caching ON.
  • < 0 or > 2 - undefined.

This environment variable must be set for the user group hpcc and can be done by editing /etc/profile (service restart required).
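
One way to read the interaction between the environment level and the per-call cacheConnections flag is sketched below in Python. This is an interpretation of the list above, not the plugin's code: effective_caching is a hypothetical name, and raising an error for out-of-range levels is an assumption, since the README only calls that case undefined.

```python
import os

ENV_NAME = "HPCC_REDIS_PLUGIN_CONNECTION_CACHING_LEVEL"


def effective_caching(cache_connections, environ=os.environ):
    """Decide whether a call would use a cached connection.

    cache_connections is the per-call flag; environ is injectable for testing.
    """
    level = int(environ.get(ENV_NAME, "1"))   # 1 is the documented default
    if level == 0:
        return False                 # all caching forced OFF
    if level == 1:
        return cache_connections     # caching allowed; the per-call flag decides
    if level == 2:
        return True                  # all caching forced ON
    # Documented as undefined; rejecting it is this sketch's own choice.
    raise ValueError(f"{ENV_NAME}={level} is undefined")
```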

Regarding the caching of connections used purely for subscriptions, it is essential that they are successfully unsubscribed before being reused. The attempt to unsubscribe and confirm this is deliberately limited in effort; if it does not succeed, the plugin gives up, closes the connection, and opens a new one. To aid in tuning this for both payload and system/network constraints, the following three system environment settings exist:

  • HPCC_REDIS_PLUGIN_CACHE_SUB_CONNECTIONS - turn the caching of subscription connections ON or OFF. Default = ON
  • HPCC_REDIS_PLUGIN_UNSUBSCRIBE_READ_ATTEMPTS - the maximum number of socket reads to attempt to receive the required unsubscribe confirmation before giving up. Default = 2.
  • HPCC_REDIS_PLUGIN_UNSUBSCRIBE_TIMEOUT - the timeout value (ms) used for such socket reads. Default = 100.

Note: For further implementation details refer to the comment associated with the definition of SubConnection::unsubscribe(...) in HPCC-Platform/plugins/redis/redis.cpp.

###The redisServer MODULE

To avoid the cumbersome and unnecessary need to constantly pass options, password, timeout, and cacheConnections with each function call, the module redisServer can be imported to effectively wrap the above functions.

IMPORT redisServer FROM lib_redis;
myRedis := redisServer('--SERVER=127.0.0.1:6379', 'foobared', 2000, FALSE);
myRedis.SetString('myKey', 'supercalifragilisticexpialidocious');
myRedis.GetString('myKey');

###A Redis 'Database'

The notion of a database within a redis cache is that of a partition, such that it may contain an identical key per database e.g.

myRedis.SetString('myKey', 'foo', 0);
myRedis.SetString('myKey', 'bar', 1);

myRedis.GetString('myKey', 0);//returns 'foo'
myRedis.GetString('myKey', 1);//returns 'bar'

Note: the default database is 0. The maximum number of databases allowed by Redis is 2147483647 (int32).

Race Retrieval and Locking Keys

A common use of external caching systems such as redis is for temporarily storing data that may be expensive, computationally or otherwise, to obtain, and thus obtaining it only once is paramount. In such a scenario it is possible (and in some cases usual) for multiple clients/requests to hit the cache simultaneously and find that the requested data has not yet been stored. It is then desired that only one of these requests obtains the new value and stores it for the others to retrieve from the cache. This plugin offers a solution to this problem via the GetOrLock and SetAndPublish functions within the redisServer and redis modules of lib_redis. These comprise only three function categories: the SET and GET functions for STRING, UTF8, and UNICODE (i.e. only those types that can return empty strings) and an auxiliary function Unlock, used to manually unlock locked keys, as will be discussed.

The principle is based around a cache miss: when a requested key does not exist, the first requester (race winner) 'locks' the key in an atomic fashion. Any other simultaneous requester (race loser) finds that the key exists but has been locked, and thus SUBSCRIBES to the key, awaiting a PUBLICATION message from the race winner that the value has been set. Such a paradigm is well suited to redis due to its efficiently implemented PUB-SUB infrastructure.
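
The winner/loser principle can be tried out without a redis server. The Python sketch below is a toy, in-memory stand-in and not the plugin's implementation: ToyCache, get_or_lock, set_and_publish, and run_race are hypothetical names, a mutex stands in for the server's atomicity, and a threading.Event stands in for a PUB-SUB channel. The channel name follows the lock id scheme described under 'Behaviour and Implementation Details'.

```python
import threading


class ToyCache:
    """In-memory imitation of the lock-and-publish pattern (illustration only)."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._data = {}      # (database, key) -> stored value or lock channel
        self._waiters = {}   # channel -> (Event, list that receives the value)

    @staticmethod
    def _channel(key, database):
        return f"redis_ecl_lock_{key}_{database}"

    def get_or_lock(self, key, database=0, timeout_s=1.0):
        channel = self._channel(key, database)
        with self._mutex:
            current = self._data.get((database, key))
            if current is None:
                # Cache miss: the race winner stores the channel as the lock.
                self._data[(database, key)] = channel
                self._waiters[channel] = (threading.Event(), [])
                return ""
            if current != channel:
                return current   # ordinary cache hit
            event, box = self._waiters[channel]
        # Race loser: wait for the winner's publication, which carries the value.
        if not event.wait(timeout_s):
            raise TimeoutError(f"no publication on {channel}")
        return box[0]

    def set_and_publish(self, key, value, database=0):
        channel = self._channel(key, database)
        with self._mutex:
            self._data[(database, key)] = value
            waiter = self._waiters.pop(channel, None)
        if waiter is not None:
            event, box = waiter
            box.append(value)   # store the value before waking the subscribers
            event.set()
        return value


def run_race(n_requests=4):
    """Fire n_requests simultaneous requests; return (times computed, results)."""
    cache, computed, results = ToyCache(), [], []

    def request():
        value = cache.get_or_lock("supercali- what?")
        if value == "":
            computed.append(1)   # only the race winner reaches this branch
            value = cache.set_and_publish(
                "supercali- what?", "supercalifragilisticexpialidocious")
        results.append(value)

    threads = [threading.Thread(target=request) for _ in range(n_requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(computed), results
```

Calling run_race() should show the expensive value being computed once while every request still receives it, which is the behaviour the GetOrLock and SetAndPublish functions aim for.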

###An ECL Example

IMPORT redisServer FROM lib_redis;

myRedis := redisServer('--SERVER=127.0.0.1:6379');

STRING poppins := 'supercalifragilisticexpialidocious'; //Value to externally compute/retrieve from 3rd party vendor.

myFunc(STRING key, INTEGER4 database) := FUNCTION  //Function for computing/retrieving a value.
  RETURN myRedis.GetString(key, database);
END;

//If the key does not exist it will 'lock' the key and return an empty STRING.
STRING value := myRedis.GetOrLockString('supercali- what?');

SEQUENTIAL(
    myRedis.SetString('poppins', poppins, 3),
    //All SetAndPublish<type>() return the value passed in as the 2nd parameter.
    OUTPUT(IF(LENGTH(value) = 0, myRedis.SetAndPublishString('supercali- what?', myFunc('poppins', 3)), value))
    );

Note: further ECL examples can be found in the following files regarding the locking and non-locking functions.

###Timeout Values

The timeout durations effectively apply to the entire duration of a call to each of the functions exported by this plugin library. By 'effectively', it is meant that a timer is started at the beginning of each call and, upon each internal communication with the redis server, whatever time remains at that point is the timeout value passed to the redis API (hiredis) for that communication call. Since some plugin functions make more calls to the server than others (see 'Behaviour and Implementation Details' below), functions with more server calls may time out more often than those with fewer. To avoid this, it is advised to set the timeout to a multiple of the anticipated latency of the client-server IO, where the multiple should be at least the maximum expected number of internal redis calls made by these plugin functions, e.g. 12.
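
The 'remaining time' behaviour described above can be pictured with a small Python sketch. It is not taken from the plugin source: CallTimer and remaining_ms are hypothetical names, the clock is injectable purely so the example is deterministic, and returning None for 'no limit' is this sketch's own convention for the documented 0 := infinity case.

```python
import time


class CallTimer:
    """Tracks one overall timeout across several server round trips."""

    def __init__(self, timeout_ms, clock=time.monotonic):
        self._clock = clock
        self._timeout_ms = timeout_ms
        self._start = clock()          # timer starts when the plugin call starts

    def remaining_ms(self):
        """Time left for the next server communication, in milliseconds."""
        if self._timeout_ms == 0:
            return None                # 0 means wait indefinitely
        elapsed_ms = (self._clock() - self._start) * 1000.0
        left = self._timeout_ms - elapsed_ms
        if left <= 0:
            raise TimeoutError("plugin call exceeded its overall timeout")
        return left
```

With a 1000 ms timeout, a function whose first round trip took 250 ms would hand at most 750 ms to its second round trip, which is why functions with many internal calls are the first to run out of time.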

When using the ECL pattern described in the above section An ECL Example, the timeout and lock expiration must be set equal to the timeout (if any) of myFunc plus that passed to SetAndPublish<type>, so that both the lock and the waiting subscribers live long enough for a value to be set/published.

It should also be noted that, whilst it is possible to pass different values for timeout and expire to the function GetOrLock<type>, it is advisable not to. This ensures that the lock does not outlive the waiting subscribers, which all time out together, and therefore does not block any subsequent retries of the locking pattern.

Behaviour and Implementation Details

A few notes to point out here:

  • PUB-SUB channels are not disconnected from the keyspace as they are in their native redis usage. The key itself is used as the lock with its value being set as the channel to later PUBLISH on or SUBSCRIBE to. This channel is a string, unique by only the key and database, prefixed with 'redis_ecl_lock'. E.g. the key 'myKey' designated for database 1, will have the following lock id - 'redis_ecl_lock_myKey_1'.
  • It is possible to manually 'unlock' this lock (DELETE the key) via the Unlock(<key>) function. Note: this function will fail on any communication or reply error. However, if the server observes any change to the key during the function call, it fails silently, leaving the lock to expire.
  • When the race winner publishes, it publishes the value itself, and any subscriber obtains the key-value in this fashion. This avoids an additional GET and any further race conditions in doing so. Note: this does, however, mean that it is possible for the actual redis SET to fail on one client/process and the key-value to be received on another, and yet for the key-value still not to exist on the cache.
  • At present the 'lock' is not an actual lock as such, since only the locking functions acknowledge it. In the current implementation it is better thought of as a flag for GET to wait and subscribe, i.e. the locked key can be deleted and re-set just as any other key can be.
  • Below is a table of the number of calls to the redis server for each of the plugin functions (or categories of) including the maximum possible and nominal expected, where the latter is due to using a cached connection, i.e. neither the server IP, port, nor password have changed from the function called prior to the one in question.

| Operation/Function | Nominal | Maximum | Diff due to... |
|---|---|---|---|
| A new connection | 3 | 4 | database |
| Cached connection | 0 | 2 | database, timeout |
| Get | 1 | 5 | new connection |
| Set | 1 | 5 | new connection |
| FlushDB | 1 | 5 | new connection |
| Delete | 1 | 5 | new connection |
| Persist | 1 | 5 | new connection |
| Exists | 1 | 5 | new connection |
| DBSize | 1 | 5 | new connection |
| Expire | 1 | 5 | new connection |
| GetOrLock | 7 | 11 | new connection |
| GetOrLock (locked) | 8 | 12 | new connection |
| SetAndPublish (value length > 29) | 1 | 5 | new connection |
| SetAndPublish (value length < 29) | 4 | 8 | new connection |
| Unlock | 5 | 9 | new connection |
| Publish | 1 | 4 | new connection |
| Subscribe | 5 | 5 | new connection always needed |
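
The call counts in the table above can be combined with the advice in 'Timeout Values' to estimate a lower bound for a timeout. The Python sketch below is only a worked illustration of that arithmetic: SERVER_CALLS copies a subset of the table rows, while worst_case_timeout_ms and the per-call latency figure are assumptions the reader must supply from their own measurements.

```python
# (nominal, maximum) server calls per plugin function, copied from the table.
SERVER_CALLS = {
    "Get": (1, 5),
    "Set": (1, 5),
    "GetOrLock": (7, 11),
    "GetOrLock (locked)": (8, 12),
    "Unlock": (5, 9),
    "Publish": (1, 4),
    "Subscribe": (5, 5),
}


def worst_case_timeout_ms(function_name, latency_ms):
    """Smallest timeout covering the maximum number of server calls.

    latency_ms is an estimated round-trip time for one server call.
    """
    _nominal, maximum = SERVER_CALLS[function_name]
    return maximum * latency_ms


# The largest maximum in the table is 12, matching the multiple suggested
# in 'Timeout Values'.
LARGEST_MULTIPLE = max(maximum for _nominal, maximum in SERVER_CALLS.values())
```

For instance, with an assumed 5 ms per server call, a locked GetOrLock would need at least 12 × 5 = 60 ms, so the default timeout of 1000 ms leaves ample headroom at that latency.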