
HPCC-15767 Start merging changes made for volatile variables

Signed-off-by: Gavin Halliday <gavin.halliday@lexisnexis.com>
Gavin Halliday 9 years ago
Parent
Commit
6f4ee9d133

+ 197 - 0
ecl/hql/hqlexpr.cpp

@@ -125,6 +125,203 @@ static int checkSeqId(unsigned __int64 seqid, unsigned why)
 
 #define STDIO_BUFFSIZE 0x10000     // 64K
 
+//---------------------------------------------------------------------------------------------------------------------
+
+/*
+
+There is a general issue with ECL (and other functional/declarative languages) about what to do with impure functions.
+Generally it is assumed that expressions can be evaluated on demand, evaluated more than once, not evaluated,
+evaluated in a different place, and that it will not affect the result of the query.  There are some expressions
+that don't follow those rules and cause problems.
+The following aims to describe the issues, and formalize the behaviour.
+
+Different impure modifiers
+- VOLATILE indicates that an expression may return a different value each time it is called.
+  E.g.,  RANDOM(), msTick()
+  Because volatile expressions return a different value each time, by default they are tagged as context
+  sensitive - to try and ensure they are evaluated in the same place as they were used in the source code.  So
+  volatile is expanded as two separate modifiers - NODUPLICATE and CONTEXT
+- CONTEXT indicates the value returned depends on the context (but is non-volatile within that context)
+  E.g., std.system.thorlib.node(), XMLTEXT
+- THROWS indicates an expression might throw an exception.
+  IF (cond, value, FAIL)
+- SKIPS indicates an expression may cause a transform to skip.
+- COSTLY The operation is expensive, so should not be duplicated.
+  E.g., Some PIPE/SOAPCALLs, external function calls.
+  A first step towards introducing a cost() function - where costly = cost(+inf)
+- EFFECT indicates the expression may have a side-effect.  The side-effect is tied to the expression that it is
+  associated with.  This only really has implications for ordering, which we currently make no guarantees about.
+
+Pseudo modifier:
+- once [ Implies pure,fold(false) ]
+- action indicates the expression performs a specific (costly?) action.  Equivalent to COSTLY+EFFECT
+- volatile.  Really a combination of NODUPLICATE and CONTEXT
+
+What decisions do the flags affect?
+
+canRemoveEvaluation()   - Is it ok to not evaluate an expression?
+canReduceEvaluations()  - Is it possible to reduce the number of times something is evaluated?
+canDuplicateExpr()      - Whether an expression can be duplicated.
+canChangeContext()      - Whether an expression can be moved to a different context.
+canRemoveGuard()        - Is it ok to evaluate this expression without any surrounding conditions?
+isVolatile()            - Whether an expression may generate a different value each time. (E.g., for matching distributions)
+canBeCommonedUp()       - Is it ok to evaluate two instances of the same expression only once?
+canBeReordered()        - Is it possible to reorder evaluation?
+
+How do these decisions relate to the modifiers?
+
+canRemoveEvaluation()
+- the whole system is based around lazy evaluation.  Nothing prevents an expression from not being evaluated.
+
+canReduceEvaluations()
+- noduplicate... yes
+  Say you have a counter which is assigned to rows in a dataset, and one row is then selected.  If only that single row
+  is calculated you will get a different result.  However lazy evaluation should ensure that is ok, just unexpected.
+  The context may also require checking for duplication if the dataset is shared...
+- otherwise - yes.
+  i.e. *all* expressions are lazy - there are no guarantees that an expression will be evaluated.
+
+canDuplicateExpr()
+- noduplicate - no since that will introduce an inconsistency.  This means volatile rows can only be selected
+  from a dataset if it is the only use of the dataset.
+- context - yes if same context.
+- throws - yes
+- skips - yes
+- costly - no
+- effect - yes
+
+canChangeContext/canHoist
+- noduplicate - yes.  (volatile would also set context, implying no since that may change the number of times something is executed).
+- context - no
+- throws - safer to say no.  What if it causes something to fail because of early evaluation?
+           Better would be to allow it, but only report the error if it is actually used.  This has implications for
+           the way results are stored in the workunit, and the implementation of the engines.
+- skips - no (but skips doesn't percolate outside a transform)
+- costly - yes if unconditional. no if conditional - we don't want it evaluated unnecessarily.
+- effect - yes - it is the expression that is important.
+
+canRemoveGuard (make something unconditional that was conditional)
+- noduplicate - possibly/yes.  It would be better to always evaluate than to evaluate multiple times.  The context is handled separately.
+- context - yes.
+- throws - no since it causes failures that wouldn't otherwise occur
+- skips - no, it could cause records to be lost.
+- costly - no by definition.
+- effect - yes.
+
+isVolatile()
+- Only set if the expression is volatile.  Equivalent to !canDuplicateExpr()
+
+canBeCommonedUpBetweenContexts()
+- noduplicate - This is explicitly managed by ensuring each volatile expression has a unique attribute associated with it.
+  It means that different instances of a volatile expression in different transforms must have different ids
+  so that combining transforms doesn't cause them to be combined.
+- context - ?no.  The same value evaluated in a different context will give a different value.
+- throws - yes
+- skips - yes
+- costly - yes.
+- effect - yes
+
+canCombineTransforms(a,b)
+- all - yes
+- provided volatile expressions are unique there shouldn't be any problems combining them.
+- still need to be careful about SKIPs having a different meaning in the combined transform.
+
+canBeReordered()
+ - we currently make no guarantees about the order that expressions are evaluated in, other than with
+   the SEQUENTIAL keyword, and implicit ordering of rows supplied to APPLY/OUTPUT.  Restricting the order would
+   cause significant issues with optimization (e.g., executing on multiple nodes, or strands within a channel). It
+   would require something similar to Haskell monads to impose some global ordering.
+
+Reducing the context dependency of expressions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The operator no_within(expression, context) has the effect of removing all the context-dependent attributes, and adds
+any dependencies from the context instead.  (Any explicit dependencies of the expression are also kept.)
+Note:  WITHIN can only be used to *reduce* the context-dependency.
+
+RANDOM() WITHIN {LEFT} - indicate the context for calling random.
+
+What should be the scope/extent of their effects?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Theoretically, each of these "impure" attributes is inherited by any expression that uses them.  However that can
+be too conservative, so the following limits are placed on their scope:
+
+noduplicate - always
+context     - always, except for within().  ?possibly not outside a transform, or dataset/action excluding its inputs.
+costly      - always
+action
+throwscalar - not outside transform/filter
+throwds     - not outside transform.  Not outside the action (e.g., output) that consumes it.
+skip        - not outside transform
+
+
+**************** THIS NEEDS MORE THOUGHT/WORK - probably inspired from the examples *************************
+
+- A sink (e.g., OUTPUT), row selector ([]), or scalar aggregate (e.g., count(ds)) that is applied to a noduplicate dataset isn't itself noduplicate.
+- A sink (e.g., output) applied to a volatile expression isn't itself volatile.
+- An aggregate is not volatile if the scalar argument is volatile
+- Attributes are not volatile if their arguments are
+- ??? An activity that contains a volatile scalar item isn't itself volatile?  E.g., ds(id != RANDOM()).  I'm not convinced.
+
+For example this means IF(cond, ds, FAIL) will be context dependent.  But the activity (e.g, OUTPUT) that is based on it is not.  The entire OUTPUT could be evaluated elsewhere (e.g., in a parent context) if there are no other dependencies on the context.
+
+I would be inclined to use the same rule for context sensitive expressions and exceptions.
+
+Essentially the rule is:
+- the impure flags are not inherited from a transform
+- actions and attributes inherit no impure flags.  (They could possibly have them set explicitly.)
+
+What makes a unique volatile instance?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+- Each instance in the original ECL source code creates a unique instance.
+- Each expansion of a macro counts as a new source instance.
+- A call to a function containing a volatile should not create new instances.
+- It is possible to mark functions as volatile, so that each call creates a new unique instance of
+  any volatiles within it.
+
+So RANDOM() - RANDOM() should evaluate two random numbers,
+and x:= RANDOM(); x - x; should always evaluate to 0.
+
+So unique volatile identifiers are added to
+- volatile builtin operators (e.g, RANDOM())
+- volatile c++ functions
+- volatile external functions
+and contained volatile modifiers are made unique if a functional definition is specified as volatile.
+
+Modifiers on external functions and beginc++
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+- pure
+- action
+- costly
+- once = pure, runtime only
+- volatile = nodup, context
+- nomove = context dependent
+- context(xxx)?
+- fail
+
+Context Dependent:
+~~~~~~~~~~~~~~~~~~
+There are several different flags to indicate context dependent:
+
+HEFgraphDependent - loop counter (?) graph result, (parameter!) - should probably use a pseudo table
+HEFcontainsNlpText - should use a pseudo table
+HEFcontainsXmlText - should use a pseudo table
+HEFcontainsSkip
+HEFcontainsCounter  - should use a pseudo table
+HEFtransformDependent - SELF, count(group)
+HEFtranslated
+HEFonFailDependent - FAILCODE/FAILMESSAGE
+HEFcontextDependentException - fields, pure virtual  [nohoist?]
+HEFoldthrows - legacy and should be killed
+
+Other related syntax
+~~~~~~~~~~~~~~~~~~~~
+PURE(expression) - treat an expression as pure - probably superseded with WITHIN {}
+*/
+
+//---------------------------------------------------------------------------------------------------------------------
+
 class HqlExprCache : public JavaHashTableOf<IHqlExpression>
 {
 public:
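
Review note: the decision table in the comment above can be read as a set of bit tests. The following is a minimal sketch under stated assumptions, not the code in hqlexpr.cpp: the flag names (`IF_NoDuplicate` and so on) are hypothetical, only three of the decisions are modelled, and the "yes if same context" condition on canDuplicateExpr() is left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag names for the impure modifiers described in the comment.
// They are illustrative only and are not the identifiers used by the compiler.
enum ImpureFlag : std::uint8_t
{
    IF_NoDuplicate = 0x01,
    IF_Context     = 0x02,
    IF_Throws      = 0x04,
    IF_Skips       = 0x08,
    IF_Costly      = 0x10,
    IF_Effect      = 0x20,
    // VOLATILE is expanded as NODUPLICATE plus CONTEXT.
    IF_Volatile    = IF_NoDuplicate | IF_Context,
};

// canDuplicateExpr: noduplicate - no, costly - no, all others - yes.
// (The "context - yes if same context" condition is not modelled here.)
inline bool canDuplicateExpr(unsigned flags)
{
    return (flags & (IF_NoDuplicate | IF_Costly)) == 0;
}

// canChangeContext: context, throws and skips - no.
// costly - yes if unconditional, no if conditional.
inline bool canChangeContext(unsigned flags, bool isConditional)
{
    if (flags & (IF_Context | IF_Throws | IF_Skips))
        return false;
    if ((flags & IF_Costly) && isConditional)
        return false;
    return true;
}

// canRemoveGuard: throws, skips and costly - no; all others - yes.
inline bool canRemoveGuard(unsigned flags)
{
    return (flags & (IF_Throws | IF_Skips | IF_Costly)) == 0;
}
```

With this encoding a volatile expression cannot be duplicated or hoisted, but its guard can be removed, which matches the rows for noduplicate and context in the table.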

+ 1 - 1
ecl/regress/volatile.ecl

@@ -36,7 +36,7 @@ now := Debug.msTick();
 output(startTime*startTime-now*now);
 
 
-nowTime() := define Debug.msTick();
+nowTime() volatile := define Debug.msTick();
 
 //Evaluate nowTime twice
 output(startTime*startTime-nowTime()*nowTime());

+ 50 - 0
ecl/regress/volatile1.ecl

@@ -0,0 +1,50 @@
+/*##############################################################################
+
+    HPCC SYSTEMS software Copyright (C) 2016 HPCC Systems®.
+
+    All rights reserved. This program is free software: you can redistribute it and/or modify
+    it under the terms of the GNU Affero General Public License as
+    published by the Free Software Foundation, either version 3 of the
+    License, or (at your option) any later version.
+
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+    GNU Affero General Public License for more details.
+
+    You should have received a copy of the GNU Affero General Public License
+    along with this program.  If not, see <http://www.gnu.org/licenses/>.
+############################################################################## */
+
+
+// The following tests aren't strictly correct, but assume it is unlikely for RANDOM() to return the
+// same number twice in succession.
+
+//Random should be re-evaluated for each unique instance.
+output(IF(random() != random(), 'Pass', 'Fail'));
+
+//There is only a single instance of this variable - it should not be re-evaluated
+volatile1 := random();
+output(IF(volatile1 = volatile1, 'Pass', 'Fail'));
+
+//Again, re-evaluating the function should not create a new value
+volatile2() := random();
+output(IF(volatile2() = volatile2(), 'Pass', 'Fail'));
+
+//Again, no reason for the random in the function to be re-evaluated.
+volatile3(integer x) := random() % x;
+output(IF(volatile3(100) % 50 = volatile3(50), 'Pass', 'Fail'));
+
+//Explicitly create a unique volatile instance for each call instance - even if the same parameters
+volatile4(integer n) volatile := random();
+output(IF(volatile4(100) != volatile4(100), 'Pass', 'Fail'));
+output(IF(volatile4(100) != volatile4(99), 'Pass', 'Fail'));
+
+//Create a unique instance for each value of n
+volatile5(integer n) := volatile4(n);
+output(IF(volatile5(1) != volatile5(2), 'Pass', 'Fail'));
+output(IF(volatile5(5) = volatile5(5), 'Pass', 'Fail'));
+
+//Create a unique volatile instance each time the function is called.
+volatile6() volatile := random() % 100;
+output(IF(volatile6() != volatile6(), 'Pass', 'Fail'));
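
Review note: the behaviour these tests expect can be modelled as "one value per unique volatile instance". The sketch below is an assumption-laden illustration, not the compiler's implementation: `VolatileModel`, `newInstance` and `evaluate` are hypothetical names, and a counter stands in for RANDOM() so the outcome is deterministic.

```cpp
#include <cassert>
#include <map>

// Hypothetical model: every source-level occurrence of a volatile operator is
// given a unique instance id, and evaluation is remembered per instance id.
class VolatileModel
{
public:
    // Called once for each textual occurrence of a volatile operator, and once
    // per call of a function that is itself marked volatile.
    unsigned newInstance() { return nextId++; }

    // The first evaluation of an instance draws a fresh value; later
    // evaluations of the same instance reuse that value.
    unsigned evaluate(unsigned instanceId)
    {
        auto match = values.find(instanceId);
        if (match != values.end())
            return match->second;
        unsigned value = ++sequence;   // deterministic stand-in for RANDOM()
        values[instanceId] = value;
        return value;
    }

private:
    unsigned nextId = 0;
    unsigned sequence = 0;
    std::map<unsigned, unsigned> values;
};
```

In this model `random() != random()` corresponds to two calls of `newInstance()`, so the values differ, while `volatile1 = volatile1` evaluates a single instance twice and always compares equal.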

+ 53 - 0
ecl/regress/volatile2.ecl

@@ -0,0 +1,53 @@
+/*##############################################################################
+
+    HPCC SYSTEMS software Copyright (C) 2016 HPCC Systems®.
+
+    All rights reserved. This program is free software: you can redistribute it and/or modify
+    it under the terms of the GNU Affero General Public License as
+    published by the Free Software Foundation, either version 3 of the
+    License, or (at your option) any later version.
+
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+    GNU Affero General Public License for more details.
+
+    You should have received a copy of the GNU Affero General Public License
+    along with this program.  If not, see <http://www.gnu.org/licenses/>.
+############################################################################## */
+
+rtl := SERVICE
+ unsigned4 rtlRandom() : eclrtl,volatile,library='eclrtl',entrypoint='rtlRandom';
+END;
+
+// The following tests aren't strictly correct, but assume it is unlikely for RANDOM() to return the
+// same number twice in succession.
+
+//Random should be re-evaluated for each unique instance.
+output(IF(rtl.rtlRandom() != rtl.rtlRandom(), 'Pass', 'Fail'));
+
+//There is only a single instance of this variable - it should not be re-evaluated
+volatile1 := rtl.rtlRandom();
+output(IF(volatile1 = volatile1, 'Pass', 'Fail'));
+
+//Again, re-evaluating the function should not create a new value
+volatile2() := rtl.rtlRandom();
+output(IF(volatile2() = volatile2(), 'Pass', 'Fail'));
+
+//Again, no reason for the random in the function to be re-evaluated.
+volatile3(integer x) := rtl.rtlRandom() % x;
+output(IF(volatile3(100) % 50 = volatile3(50), 'Pass', 'Fail'));
+
+//Explicitly create a unique volatile instance for each call instance - even if the same parameters
+volatile4(integer n) volatile := rtl.rtlRandom();
+output(IF(volatile4(100) != volatile4(100), 'Pass', 'Fail'));
+output(IF(volatile4(100) != volatile4(99), 'Pass', 'Fail'));
+
+//Create a unique instance for each value of n
+volatile5(integer n) := volatile4(n);
+output(IF(volatile5(1) != volatile5(2), 'Pass', 'Fail'));
+output(IF(volatile5(5) = volatile5(5), 'Pass', 'Fail'));
+
+//Create a unique volatile instance each time the function is called.
+volatile6() volatile := rtl.rtlRandom() % 100;
+output(IF(volatile6() != volatile6(), 'Pass', 'Fail'));

+ 52 - 0
ecl/regress/volatile3.ecl

@@ -0,0 +1,52 @@
+/*##############################################################################
+
+    HPCC SYSTEMS software Copyright (C) 2016 HPCC Systems®.
+
+    All rights reserved. This program is free software: you can redistribute it and/or modify
+    it under the terms of the GNU Affero General Public License as
+    published by the Free Software Foundation, either version 3 of the
+    License, or (at your option) any later version.
+
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+    GNU Affero General Public License for more details.
+
+    You should have received a copy of the GNU Affero General Public License
+    along with this program.  If not, see <http://www.gnu.org/licenses/>.
+############################################################################## */
+
+unsigned4 nextSequence() := BEGINC++
+#option volatile
+static unsigned mySequence = 0;
+return ++mySequence;
+ENDC++;
+
+//Should be re-evaluated for each unique call.
+output(IF(nextSequence() != nextSequence(), 'Pass', 'Fail'));
+
+//There is only a single instance of this variable - it should not be re-evaluated
+volatile1 := nextSequence();
+output(IF(volatile1 = volatile1, 'Pass', 'Fail'));
+
+//Again, re-evaluating the function should not create a new value
+volatile2() := nextSequence();
+output(IF(volatile2() = volatile2(), 'Pass', 'Fail'));
+
+//Again, no reason for the random in the function to be re-evaluated.
+volatile3(integer x) := nextSequence() % x;
+output(IF(volatile3(100) % 50 = volatile3(50), 'Pass', 'Fail'));
+
+//Explicitly create a unique volatile instance for each call instance - even if the same parameters
+volatile4(integer n) volatile := nextSequence();
+output(IF(volatile4(100) != volatile4(100), 'Pass', 'Fail'));
+output(IF(volatile4(100) != volatile4(99), 'Pass', 'Fail'));
+
+//Create a unique instance for each value of n
+volatile5(integer n) := volatile4(n);
+output(IF(volatile5(1) != volatile5(2), 'Pass', 'Fail'));
+output(IF(volatile5(5) = volatile5(5), 'Pass', 'Fail'));
+
+//Create a unique volatile instance each time the function is called.
+volatile6() volatile := nextSequence() % 100;
+output(IF(volatile6() != volatile6(), 'Pass', 'Fail'));

+ 41 - 0
ecl/regress/volatile4.ecl

@@ -0,0 +1,41 @@
+/*##############################################################################
+
+    HPCC SYSTEMS software Copyright (C) 2016 HPCC Systems®.
+
+    All rights reserved. This program is free software: you can redistribute it and/or modify
+    it under the terms of the GNU Affero General Public License as
+    published by the Free Software Foundation, either version 3 of the
+    License, or (at your option) any later version.
+
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+    GNU Affero General Public License for more details.
+
+    You should have received a copy of the GNU Affero General Public License
+    along with this program.  If not, see <http://www.gnu.org/licenses/>.
+############################################################################## */
+
+#option ('targetClusterType','hthor');
+
+unsigned4 nextSequence() := BEGINC++
+#option volatile
+static unsigned mySequence = 0;
+return ++mySequence;
+ENDC++;
+
+ds := dataset(10, transform({unsigned id}, SELF.id := nextSequence()));
+
+//Check the call to nextSequence isn't evaluated outside the dataset
+min1 := min(ds, id);
+output(IF(min1 = 1, 'Pass', 'Fail'));
+
+//Check the dataset isn't re-evaluated
+min2 := min(ds, id-1);
+output(IF(min2 = 0, 'Pass', 'Fail'));
+
+max1 := max(ds, id);
+output(IF(max1 = 10, 'Pass', 'Fail'));
+
+//Check selecting a value from the dataset also doesn't re-evaluate the value.
+output(IF(ds[5].id = 5, 'Pass', 'Fail'));

+ 54 - 0
ecl/regress/volatile5.ecl

@@ -0,0 +1,54 @@
+/*##############################################################################
+
+    HPCC SYSTEMS software Copyright (C) 2016 HPCC Systems®.
+
+    All rights reserved. This program is free software: you can redistribute it and/or modify
+    it under the terms of the GNU Affero General Public License as
+    published by the Free Software Foundation, either version 3 of the
+    License, or (at your option) any later version.
+
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+    GNU Affero General Public License for more details.
+
+    You should have received a copy of the GNU Affero General Public License
+    along with this program.  If not, see <http://www.gnu.org/licenses/>.
+############################################################################## */
+
+ds1 := dataset(10, transform({unsigned id}, SELF.id := random()));
+ds2 := dataset(random(), transform({unsigned id}, SELF.id := random()));
+
+//A call to random() should be volatile
+output(IF(__IS__(random(), volatile), 'Pass', 'Fail'));
+
+//A call to a function containing a volatile is volatile - even if it doesn't create a new instance
+myValue() := random();
+output(IF(__IS__(myValue(), volatile), 'Pass', 'Fail'));
+
+//An output of a volatile value is not volatile
+o1 := output(random());
+output(IF(NOT __IS__(o1, volatile), 'Pass', 'Fail'));
+
+//A dataset with a volatile inside a transform is not itself volatile
+output(IF(NOT __IS__(ds1, volatile), 'Pass', 'Fail'));
+
+//But a volatile used in another context is - a volatile count
+output(IF(__IS__(ds2, volatile), 'Pass', 'Fail'));
+
+//or a volatile filter is also volatile.
+output(IF(__IS__(ds1(id = RANDOM()), volatile), 'Pass', 'Fail'));
+
+//Does this make sense - moving a filter over a project could possibly make the whole dataset volatile when it wasn't before.
+ds3 := DATASET(10, TRANSFORM({unsigned id}, SELF.id := COUNTER));
+ds4 := PROJECT(NOFOLD(ds3), TRANSFORM({unsigned id}, SELF.id := RANDOM()));
+ds5 := ds4(id != 10);
+output(IF(NOT __IS__(ds5, volatile), 'Pass', 'Fail'));
+
+//An aggregate of a volatile value isn't volatile
+v1 := max(ds1, id * RANDOM());
+output(IF(NOT __IS__(v1, volatile), 'Pass', 'Fail'));
+
+//But of a dataset is.
+v2 := max(ds2, id);
+output(IF(__IS__(v2, volatile), 'Pass', 'Fail'));

+ 42 - 0
ecl/regress/volatile6a1.ecl

@@ -0,0 +1,42 @@
+/*##############################################################################
+
+    HPCC SYSTEMS software Copyright (C) 2016 HPCC Systems®.
+
+    All rights reserved. This program is free software: you can redistribute it and/or modify
+    it under the terms of the GNU Affero General Public License as
+    published by the Free Software Foundation, either version 3 of the
+    License, or (at your option) any later version.
+
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+    GNU Affero General Public License for more details.
+
+    You should have received a copy of the GNU Affero General Public License
+    along with this program.  If not, see <http://www.gnu.org/licenses/>.
+############################################################################## */
+
+unsigned4 nextSequence() := BEGINC++
+#option volatile
+static unsigned mySequence = 0;
+return ++mySequence;
+ENDC++;
+
+r := {unsigned id};
+ds := dataset(10, transform(r, SELF.id := nextSequence()));
+
+//instance 1
+// used within a child query, and not globally, therefore, each transform call should
+// re-evaluate the child dataset, since the default is not to move the dataset outside.
+
+ds2 := DATASET(100, transform({ unsigned id }, SELF.id := COUNTER));
+outRec := { unsigned id, dataset(r) child };
+outRec t(ds2 l) := TRANSFORM
+    SELF.child := ds;
+    SELF.id := l.id;
+END;
+
+p := PROJECT(NOFOLD(ds2), t(LEFT));
+
+result1 := TABLE(nofold(p), { minId := MIN(child, id) });
+output(count(result1(minId = 1)) = 1);

+ 42 - 0
ecl/regress/volatile6a2.ecl

@@ -0,0 +1,42 @@
+/*##############################################################################
+
+    HPCC SYSTEMS software Copyright (C) 2016 HPCC Systems®.
+
+    All rights reserved. This program is free software: you can redistribute it and/or modify
+    it under the terms of the GNU Affero General Public License as
+    published by the Free Software Foundation, either version 3 of the
+    License, or (at your option) any later version.
+
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+    GNU Affero General Public License for more details.
+
+    You should have received a copy of the GNU Affero General Public License
+    along with this program.  If not, see <http://www.gnu.org/licenses/>.
+############################################################################## */
+
+unsigned4 nextSequence() := BEGINC++
+#option volatile
+static unsigned mySequence = 0;
+return ++mySequence;
+ENDC++;
+
+r := {unsigned id};
+ds := dataset(10, transform(r, SELF.id := nextSequence()));
+
+//instance 1
+// used within a child query, and not globally, therefore, each transform call should
+// re-evaluate the child dataset, since the default is not to move the dataset outside.
+
+ds2 := DATASET(100, transform({ unsigned id }, SELF.id := COUNTER));
+outRec := { unsigned id, dataset(r) child };
+outRec t(ds2 l) := TRANSFORM
+    SELF.child := ds;
+    SELF.id := l.id;
+END;
+
+p := PROJECT(NOFOLD(ds2), t(LEFT));
+
+result2 := TABLE(p, { minId := MIN(child, id) });
+output(count(result2(minId = 1)) = 1);

+ 42 - 0
ecl/regress/volatile6b1.ecl

@@ -0,0 +1,42 @@
+/*##############################################################################
+
+    HPCC SYSTEMS software Copyright (C) 2016 HPCC Systems®.
+
+    All rights reserved. This program is free software: you can redistribute it and/or modify
+    it under the terms of the GNU Affero General Public License as
+    published by the Free Software Foundation, either version 3 of the
+    License, or (at your option) any later version.
+
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+    GNU Affero General Public License for more details.
+
+    You should have received a copy of the GNU Affero General Public License
+    along with this program.  If not, see <http://www.gnu.org/licenses/>.
+############################################################################## */
+
+unsigned4 nextSequence() := BEGINC++
+#option volatile
+static unsigned mySequence = 0;
+return ++mySequence;
+ENDC++;
+
+r := {unsigned id};
+ds := dataset(10, transform(r, SELF.id := nextSequence()));
+
+//instance 2
+// used within a child query, but the dataset is marked as global.
+// Therefore hoist the child dataset and use the common value each time.
+
+ds2 := DATASET(100, transform({ unsigned id }, SELF.id := COUNTER));
+outRec := { unsigned id, dataset(r) child };
+outRec t(ds2 l) := TRANSFORM
+    SELF.child := ds WITHIN {};
+    SELF.id := l.id;
+END;
+
+p := PROJECT(NOFOLD(ds2), t(LEFT));
+
+result1 := TABLE(nofold(p), { minId := MIN(child, id) });
+output(count(result1(minId = 1)) = 100);

+ 42 - 0
ecl/regress/volatile6b2.ecl

@@ -0,0 +1,42 @@
+/*##############################################################################
+
+    HPCC SYSTEMS software Copyright (C) 2016 HPCC Systems®.
+
+    All rights reserved. This program is free software: you can redistribute it and/or modify
+    it under the terms of the GNU Affero General Public License as
+    published by the Free Software Foundation, either version 3 of the
+    License, or (at your option) any later version.
+
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+    GNU Affero General Public License for more details.
+
+    You should have received a copy of the GNU Affero General Public License
+    along with this program.  If not, see <http://www.gnu.org/licenses/>.
+############################################################################## */
+
+unsigned4 nextSequence() := BEGINC++
+#option volatile
+static unsigned mySequence = 0;
+return ++mySequence;
+ENDC++;
+
+r := {unsigned id};
+ds := dataset(10, transform(r, SELF.id := nextSequence()));
+
+//instance 2
+// used within a child query, but the dataset is marked as global.
+// Therefore hoist the child dataset and use the common value each time.
+
+ds2 := DATASET(100, transform({ unsigned id }, SELF.id := COUNTER));
+outRec := { unsigned id, dataset(r) child };
+outRec t(ds2 l) := TRANSFORM
+    SELF.child := ds WITHIN {};
+    SELF.id := l.id;
+END;
+
+p := PROJECT(NOFOLD(ds2), t(LEFT));
+
+result2 := TABLE(p, { minId := MIN(child, id) });
+output(count(result2(minId = 1)) = 100);

+ 43 - 0
ecl/regress/volatile6c1.ecl

@@ -0,0 +1,43 @@
+/*##############################################################################
+
+    HPCC SYSTEMS software Copyright (C) 2016 HPCC Systems®.
+
+    All rights reserved. This program is free software: you can redistribute it and/or modify
+    it under the terms of the GNU Affero General Public License as
+    published by the Free Software Foundation, either version 3 of the
+    License, or (at your option) any later version.
+
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+    GNU Affero General Public License for more details.
+
+    You should have received a copy of the GNU Affero General Public License
+    along with this program.  If not, see <http://www.gnu.org/licenses/>.
+############################################################################## */
+
+unsigned4 nextSequence() := BEGINC++
+#option volatile
+static unsigned mySequence = 0;
+return ++mySequence;
+ENDC++;
+
+r := {unsigned id};
+ds := dataset(10, transform(r, SELF.id := nextSequence()));
+
+//instance 3
+// used within a child query, but also globally.
+// each transform call should re-evaluate the child dataset, since the default is not to move the dataset outside.
+
+ds2 := DATASET(100, transform({ unsigned id }, SELF.id := COUNTER));
+outRec := { unsigned id, dataset(r) child };
+outRec t(ds2 l) := TRANSFORM
+    SELF.child := ds;
+    SELF.id := l.id;
+END;
+
+p := PROJECT(NOFOLD(ds2), t(LEFT));
+
+result1 := TABLE(nofold(p), { minId := MIN(child, id) });
+output(count(nofold(ds)) = 10);
+output(count(result1(minId = 1)) = 1);

+ 43 - 0
ecl/regress/volatile6c2.ecl

@@ -0,0 +1,43 @@
+/*##############################################################################
+
+    HPCC SYSTEMS software Copyright (C) 2016 HPCC Systems®.
+
+    All rights reserved. This program is free software: you can redistribute it and/or modify
+    it under the terms of the GNU Affero General Public License as
+    published by the Free Software Foundation, either version 3 of the
+    License, or (at your option) any later version.
+
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+    GNU Affero General Public License for more details.
+
+    You should have received a copy of the GNU Affero General Public License
+    along with this program.  If not, see <http://www.gnu.org/licenses/>.
+############################################################################## */
+
+unsigned4 nextSequence() := BEGINC++
+#option volatile
+static unsigned mySequence = 0;
+return ++mySequence;
+ENDC++;
+
+r := {unsigned id};
+ds := dataset(10, transform(r, SELF.id := nextSequence()));
+
+//instance 3
+// used within a child query, but also globally.
+// each transform call should re-evaluate the child dataset, since the default is not to move the dataset outside.
+
+ds2 := DATASET(100, transform({ unsigned id }, SELF.id := COUNTER));
+outRec := { unsigned id, dataset(r) child };
+outRec t(ds2 l) := TRANSFORM
+    SELF.child := ds;
+    SELF.id := l.id;
+END;
+
+p := PROJECT(NOFOLD(ds2), t(LEFT));
+
+result2 := TABLE(p, { minId := MIN(child, id) });
+output(count(nofold(ds)) = 10);
+output(count(result2(minId = 1)) = 1);

+ 41 - 0
ecl/regress/volatile7a.ecl

@@ -0,0 +1,41 @@
+/*##############################################################################
+
+    HPCC SYSTEMS software Copyright (C) 2016 HPCC Systems®.
+
+    All rights reserved. This program is free software: you can redistribute it and/or modify
+    it under the terms of the GNU Affero General Public License as
+    published by the Free Software Foundation, either version 3 of the
+    License, or (at your option) any later version.
+
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+    GNU Affero General Public License for more details.
+
+    You should have received a copy of the GNU Affero General Public License
+    along with this program.  If not, see <http://www.gnu.org/licenses/>.
+############################################################################## */
+
+unsigned4 nextSequence() := BEGINC++
+#option volatile
+static unsigned mySequence = 0;
+return ++mySequence;
+ENDC++;
+
+r := {unsigned id};
+r2 := {DATASET(R) children};
+
+ds := dataset(10, transform(r, SELF.id := nextSequence()));
+
+ds2(unsigned base) := DATASET(100, transform({ unsigned id }, SELF.id := COUNTER + base));
+
+r2 t1(r l) := TRANSFORM
+    value := RANDOM();
+    self.children := ds2(value + l.id);
+END;
+
+p := PROJECT(ds, t1(LEFT));
+
+summary := TABLE(NOFOLD(p), { unsigned delta := MAX(children, id) - MIN(children, id); });
+
+output(count(summary(delta = 99)) != 10);

+ 41 - 0
ecl/regress/volatile7b.ecl

@@ -0,0 +1,41 @@
+/*##############################################################################
+
+    HPCC SYSTEMS software Copyright (C) 2016 HPCC Systems®.
+
+    All rights reserved. This program is free software: you can redistribute it and/or modify
+    it under the terms of the GNU Affero General Public License as
+    published by the Free Software Foundation, either version 3 of the
+    License, or (at your option) any later version.
+
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+    GNU Affero General Public License for more details.
+
+    You should have received a copy of the GNU Affero General Public License
+    along with this program.  If not, see <http://www.gnu.org/licenses/>.
+############################################################################## */
+
+unsigned4 nextSequence() := BEGINC++
+#option volatile
+static unsigned mySequence = 0;
+return ++mySequence;
+ENDC++;
+
+r := {unsigned id};
+r2 := {DATASET(R) children};
+
+ds := dataset(10, transform(r, SELF.id := nextSequence()));
+
+ds2(unsigned base) := DATASET(100, transform({ unsigned id }, SELF.id := COUNTER + base));
+
+r2 t1(r l) := TRANSFORM
+    value := RANDOM() WITHIN {l};
+    self.children := ds2(value + l.id);
+END;
+
+p := PROJECT(ds, t1(LEFT));
+
+summary := TABLE(NOFOLD(p), { unsigned delta := MAX(children, id) - MIN(children, id); });
+
+output(count(summary(delta = 99)) = 10);

+ 1 - 1
ecl/regress/volatileds.ecl

@@ -1,6 +1,6 @@
 /*##############################################################################
 
-    Copyright (C) 2011 HPCC Systems®.
+    HPCC SYSTEMS software Copyright (C) 2016 HPCC Systems®.
 
     All rights reserved. This program is free software: you can redistribute it and/or modify
     it under the terms of the GNU Affero General Public License as