
Merge branch 'candidate-5.2.0'

Signed-off-by: Richard Chapman <rchapman@hpccsystems.com>
Richard Chapman 10 years ago
parent
commit
7179255754

+ 5 - 1
docs/ECLLanguageReference/ECLR_mods/BltInFunc-SOAPCALL.xml

@@ -60,7 +60,11 @@
             conduct a Federated search where the request is sent to each of
             the target systems in the list. These URLs may contain standard
             form usernames and passwords, if required. The default
-            username/password are those contained in the workunit.</entry>
+            username/password are those contained in the workunit. If calling
+            an ESP Web service, you can append the ver_=n.nn parameter to
+            specify the version of the service. For example: <programlisting>SOAPCALL('http://127.0.0.1:8010/Wsdfu/?ver_=1.22', 
+         'DFUSearchData', 
+         instructure,DATASET(outstructure));</programlisting></entry>
           </row>
 
           <row>

+ 1 - 1
ecl/hqlcpp/hqlcpp.cpp

@@ -1607,7 +1607,7 @@ void HqlCppTranslator::cacheOptions()
         DebugOption(options.finalizeAllRows, "finalizeAllRows", false),
         DebugOption(options.maxLocalRowSize , "maxLocalRowSize", MAX_LOCAL_ROW_SIZE),
         DebugOption(options.optimizeGraph,"optimizeGraph", true),
-        DebugOption(options.orderDiskFunnel,"orderDiskFunnel", true),
+        DebugOption(options.orderDiskFunnel,"orderDiskFunnel", false),
         DebugOption(options.alwaysAllowAllNodes,"alwaysAllowAllNodes", false),
         DebugOption(options.slidingJoins,"slidingJoins", false),
         DebugOption(options.foldOptimized,"foldOptimized", false),

+ 1 - 0
esp/src/eclwatch/package.js

@@ -9,6 +9,7 @@ var profile = (function(){
             "hpcc/viz/map/us.json": true,
             "hpcc/viz/map/us_counties.json": true,
             "hpcc/viz/DojoD3": true,
+            "hpcc/viz/DojoD32DChart": true,
             "hpcc/viz/DojoD3BarChart": true,
             "hpcc/viz/DojoD3Choropleth": true,
             "hpcc/viz/DojoD3Choropleth": true,

+ 101 - 35
roxie/roxiemem/DOCUMENTATION.rst

@@ -9,40 +9,83 @@ Introduction
 This memory manager started life as the memory manager which was only used for the Roxie engine.  It had several
 original design goals:
 
-* Support link counted rows.
+* Support link counted rows.  (When the last reference is released the row is freed.)
 * Be as fast as possible on allocate and deallocate of small rows.
 * Allow rows serialized from slaves to be used directly without being cloned first.
 * Allow the memory used by a single query, or by all queries combined, to be limited, with graceful recovery.
 * Isolate roxie queries from one another, so that one query can't bring
   down all the rest by allocating too much memory.
-* Allow all the memory used by a query to be guaranteed to get freed when the query finishes, thus reducing the
-  possibility of memory leaks.
+* Guarantee all the memory used by a query is freed when the query finishes, reducing the possibility of memory leaks.
 * Predictable behaviour with no pathogenic cases.
 
 (Note that efficient usage of memory does not appear on that list - the expectation when the memory
 manager was first designed was that Roxie queries would use minimal amounts of memory and speed was
-more important.  Some subsequent changes e.g., Packed heaps help mitigate that.)
+more important.  Some subsequent changes, e.g., packed heaps and configurable bucket sizes, help mitigate that.)
+
+**************
+Main Structure
+**************
 
 The basic design is to reserve (but not commit) a single large block of memory in the virtual address space.  This
 memory is subdivided into "pages".  (These are not the same as the os virtual memory pages.  The memory manager pages
 are currently defined as 1Mb in size.)
 
-Memory is allocated from a set of "heaps.  Each heap owns a set of pages, and sub allocates memory of a
-single size from those pages.  All allocations from a particular page belong to the same heap.  Rounding the requested
-memory size up to the next heap-size means that memory
-is not lost due to fragmentation.
+The page bitmap
+===============
+The system uses a bitmap to indicate whether each page from the global memory has been allocated. All active
+IRowManager instances allocate pages from the same global memory space.
+To help reduce fragmentation, allocations for single pages are fulfilled from one end of the address space, while
+allocations for multiple pages are fulfilled from the other.
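
As a rough illustration of the scheme above (a minimal sketch only - the class and method names are invented, and the real bitmap code works on offsets and is protected by a critical section), single-page requests can scan the bitmap from the low end while multi-page requests scan from the high end:

    #include <cstddef>
    #include <vector>

    // Illustrative page bitmap: true => page is free.  Single pages are taken from
    // the low end and multi-page runs from the high end, to reduce fragmentation.
    class PageBitmap
    {
        std::vector<bool> freePages;
    public:
        explicit PageBitmap(size_t numPages) : freePages(numPages, true) {}

        static const size_t NotFound = (size_t)-1;

        // Allocate a single page, scanning from the low end of the address space.
        size_t allocSingle()
        {
            for (size_t i = 0; i < freePages.size(); i++)
                if (freePages[i]) { freePages[i] = false; return i; }
            return NotFound;
        }

        // Allocate 'count' contiguous pages, scanning from the high end.
        size_t allocRun(size_t count)
        {
            size_t run = 0;
            for (size_t i = freePages.size(); i-- != 0; )
            {
                run = freePages[i] ? run + 1 : 0;
                if (run == count)
                {
                    for (size_t j = i; j < i + count; j++)
                        freePages[j] = false;
                    return i;
                }
            }
            return NotFound;
        }

        // Return a run of pages to the free pool.
        void release(size_t first, size_t count)
        {
            for (size_t j = first; j < first + count; j++)
                freePages[j] = true;
        }
    };
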
+
+IRowManager
+===========
+This provides the primary interface for allocating memory.  The size of a requested allocation is rounded up to the
+next "bucket" size, and the allocation is then satisfied by the heap associated with that bucket size.  Different
+engines can specify different bucket sizes - an optional list is provided to setTotalMemoryLimit.  Roxie tends to use
+fewer buckets to help reduce the number of active heaps.  Thor uses a larger number of buckets since it is more
+important to minimize the amount of memory that is wasted.
+
+Roxie uses a separate instance of IRowManager for each query.  This provides the mechanism for limiting how much
+memory a query uses.  Thor uses a single instance of an IRowManager for each slave/master.
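
A minimal sketch of the bucket rounding described above; the bucket sizes listed here are purely illustrative - the real values are whatever list the engine passes to setTotalMemoryLimit:

    #include <cstddef>

    // Illustrative bucket sizes in bytes - not the actual defaults.
    static const size_t bucketSizes[] = { 32, 64, 128, 256, 512, 1024, 4096, 16384 };
    static const size_t numBuckets = sizeof(bucketSizes) / sizeof(bucketSizes[0]);

    // Round a requested allocation up to the bucket size whose heap will satisfy it.
    inline size_t roundToBucket(size_t requested)
    {
        for (size_t i = 0; i < numBuckets; i++)
            if (requested <= bucketSizes[i])
                return bucketSizes[i];
        return requested;   // anything larger is handled by the huge heap (whole pages)
    }
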
+
+Heaps
+=====
+Memory is allocated from a set of "heaps" - where each heap allocates blocks of memory of a single size.  The heap
+exclusively owns a set of heaplets (each 1 page in size), which are held in a doubly linked list, and sub allocates
+memory from those heaplets.
 
-Information about each heap is stored in the base of the page (using a class with virtual functions) and the
+Information about each heaplet is stored in the base of the page (using a class with virtual functions) and the
 address of an allocated row is masked to determine which heap object it belongs to, and how it should be linked/released
 etc.  Any pointer not in the allocated virtual address (e.g., constant data) can be linked/released with no effect.
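
The address masking mentioned above can be sketched as follows; the page size and helper name here are illustrative (the real code uses HEAP_ALIGNMENT_SIZE and HEAP_ALIGNMENT_MASK):

    #include <cstdint>

    // Every heaplet starts on a page-aligned boundary, so masking any row pointer
    // recovers the heaplet header stored at the base of its page.
    const uintptr_t pageSize = 0x40000;            // illustrative page size
    const uintptr_t pageMask = ~(pageSize - 1);

    struct HeapletHeader;                          // lives at the start of each page

    inline HeapletHeader * findHeaplet(const void * row)
    {
        return (HeapletHeader *)((uintptr_t)row & pageMask);
    }
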
 
-Each allocation has a link count and an allocator id associated with it.  The allocator id represents the type of
-the row, and is used to determine what destructor needs to be called when the row is destroyed.  (The row also
-contains a flag to indicate if it is fully constructed so it is valid for the destructor to be called.)
+Each heaplet contains a high water mark of the address within the page that has already been allocated (freeBase),
+and a lockless singly-linked list of rows which have been released (r_blocks).  Releasing a row is non-blocking and
+does not involve any spin locks or critical sections.  However, this means that empty pages need to be returned to
+the global memory pool at another time.  (This is done in releaseEmptyPages()).
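
A much simplified sketch of that lockless release path.  The real code stores page-relative offsets plus a CAS tag in r_blocks (the tag guards against the ABA problem, because chunks are also popped from the chain for reuse); a plain pointer push is shown here to keep the example short:

    #include <atomic>

    // Simplified free chain for one heaplet: released rows are pushed onto a
    // lockless singly-linked list with a CAS loop - no spin locks or critical
    // sections on the release path.
    struct FreeChunk { FreeChunk * next; };

    struct MiniHeaplet
    {
        std::atomic<FreeChunk *> freeChain { nullptr };

        void releaseChunk(FreeChunk * chunk)
        {
            FreeChunk * head = freeChain.load(std::memory_order_relaxed);
            do
            {
                chunk->next = head;   // head is refreshed by compare_exchange on failure
            } while (!freeChain.compare_exchange_weak(head, chunk,
                                                      std::memory_order_release,
                                                      std::memory_order_relaxed));
        }
    };
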
 
-An implementation of IRowManager processes all allocations for a particular roxie query.  This provides the
-mechanism for limiting how much memory a query uses.
+When the last row in a page is released, a flag (possibleEmptyPages) is set in its associated heap.
+* This flag is checked before trying to free pages from a particular heap, which avoids waiting on a lock and
+  traversing a candidate list when there is nothing to free.
 
+Any page which *might* contain some spare memory is added to a lockless spare memory linked list.
+* Items are popped from this list when a heap fails to allocate memory from the current heaplet.  Each item is
+  checked in turn to see whether it has space before a new heaplet is allocated.
+* The list is also walked when checking to see which pages can be returned to the global memory.  The doubly linked
+  heaplet list allows efficient freeing.
+
+Each allocation has a link count and an allocator id associated with it.  The allocator id represents the type of
+the row, and is used to determine what destructor needs to be called when the row is destroyed.  (The count for a
+row also contains a flag in the top bit to indicate if it is fully constructed, and therefore valid for the
+destructor to be called.)
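
A sketch of the per-row metadata just described; the bit layout and the lookup helper are invented for illustration and are not the actual roxiemem encoding:

    #include <atomic>
    #include <cstdint>
    #include <cstdio>

    const uint32_t ROW_CONSTRUCTED_FLAG = 0x80000000u;   // top bit of the count word

    struct RowHeader
    {
        std::atomic<uint32_t> countAndFlag { 1 };        // link count + constructed flag
        uint32_t allocatorId { 0 };                      // identifies the row type
    };

    // Stand-in for the allocator cache: map an allocator id to its destructor.
    typedef void (*RowDestructor)(void *);
    static void exampleDestructor(void * row) { std::printf("destroying %p\n", row); }
    static RowDestructor lookupDestructor(uint32_t) { return exampleDestructor; }

    // Called when the link count drops to zero: the destructor is only invoked
    // if the row was flagged as fully constructed.
    static void destroyRow(RowHeader & hdr, void * row)
    {
        if (hdr.countAndFlag.load() & ROW_CONSTRUCTED_FLAG)
            lookupDestructor(hdr.allocatorId)(row);
    }
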
+
+Huge Heap
+=========
+A specialized heap is used to manage all allocations that require more than one page of memory.  These allocations
+are not held on a free list when they are released, but each is returned directly to the global memory pool.
+Allocations in the huge heap can be expanded and shrunk using the resizeRow() functions - see below.
+
+Specialised Heaps
+==================
 For fixed size allocations it is possible to get a more efficient interface for allocating rows.  There are options
 to create unique fixed size heaps (to reduce thread contention) and packed heaps - where all rows share the same
 allocator id.
@@ -53,17 +96,16 @@ allocator id.
 Dynamic Spilling
 ****************
 
-Thor has different requirements to roxie.  In roxie, if a query exceeds its memory requirements then it is terminated.  Thor
-needs to be able to spill rows and other memory to disk and continue.  This is achieved by allowing any process that
-stores buffered rows to register a callback with the memory manager.  When more memory is required these are called
-to free up memory, and allow the job to continues.
+Thor has additional requirements beyond those of roxie.  In roxie, if a query exceeds its memory requirements then it is terminated.
+Thor needs to be able to spill rows and other memory to disk and continue.  This is achieved by allowing any process
+that stores buffered rows to register a callback with the memory manager.  When more memory is required these callbacks
+are called to free up memory, and allow the job to continue.
 
 Each callback can specify a priority - lower priority callbacks are called first since they are assumed to have a
 lower cost associated with spilling.  When more memory is required the callbacks are called in priority order until
 one of them succeeds.  They can also be passed a flag to indicate it is critical to force them to free up as much memory
 as possible.
 
-
 Complications
 =============
 
@@ -76,6 +118,20 @@ There are several different complications involved with the memory spilling:
 * Filtered record streams can cause significant wasted space in the memory blocks.
 * Resizing a multi-page allocation is non trivial.
 
+Callback Rules
+==============
+Some rules to follow when implementing callbacks:
+
+* A callback cannot allocate any memory from the memory manager.  If it does it is likely to deadlock.
+
+* You cannot allocate memory while holding a lock if that lock is also required by a callback.
+
+  Again this will cause deadlock.  If it proves impossible you can use a try-lock primitive in the callback
+  (see the sketch after these rules), but it means you won't be able to spill those rows.
+
+* If the heaps are fragmented it may be more efficient to repack the heaps than spill to disk.
+
+* If you're resizing a potentially big block of memory use the resize function with the callback.
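
The try-lock rule can be sketched like this; SpillableBuffer and freeBufferedRows are hypothetical names, not the engine's actual callback interface:

    #include <mutex>
    #include <vector>

    class SpillableBuffer
    {
        std::mutex rowLock;                 // also held while rows are being added
        std::vector<void *> bufferedRows;

    public:
        // Callback invoked by the memory manager when memory is needed.
        // Returns true if any memory was freed.
        bool freeBufferedRows(bool /*critical*/)
        {
            // Never block here: the thread that triggered the callback may already
            // hold rowLock, so blocking would deadlock.
            std::unique_lock<std::mutex> lock(rowLock, std::try_to_lock);
            if (!lock.owns_lock())
                return false;               // give up - these rows cannot be spilled this time

            if (bufferedRows.empty())
                return false;

            // ... write bufferedRows to disk, then release the memory ...
            bufferedRows.clear();
            return true;
        }
    };
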
 
 Resizing Large memory blocks
 ============================
@@ -92,7 +148,7 @@ one of these memory blocks needs to be expanded you need to be careful:
   (The callback may be from another thread...)
 
 * Therefore the memory manager contains a call which allows you to resize a block, but with a callback
-  which is used to atomically update the pointer so it always remains thead safe.
+  which is used to atomically update the pointer so it always remains thread safe.
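
A sketch of the idea behind that resize call; resizeWithCallback is a hypothetical stand-in for the real API, and error handling is omitted:

    #include <atomic>
    #include <cstdlib>
    #include <cstring>

    // Hypothetical resize helper: allocate the new block, let the caller's callback
    // publish the new pointer atomically, and only then free the old block, so a
    // concurrent reader (e.g. a spill callback) never sees a dangling pointer.
    template <class UpdateCallback>
    void * resizeWithCallback(void * oldRow, size_t oldSize, size_t newSize, UpdateCallback publish)
    {
        void * newRow = std::malloc(newSize);
        std::memcpy(newRow, oldRow, oldSize);
        publish(newRow);   // e.g. sharedRow.store(newRow, std::memory_order_release)
        std::free(oldRow);
        return newRow;
    }

The essential point is the ordering: the new pointer is published atomically before the old block is released.
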
 
 
 Compacting heaps
@@ -108,21 +164,6 @@ This works by iterating through each of the rows in a list.  If the row belongs
 and isn't part of a full heaplet, then the row is moved.  Since subsequent rows tend to be allocated from the same
 heaplet this has the effect of compacting the rows.
 
-Rules
-=====
-Some rules to follow when implementing callbacks:
-
-* A callback cannot allocate any memory from the memory manager.  If it does it is likely to deadlock.
-
-* You cannot allocate memory while holding a lock if that lock is also required by a callback.
-
-  Again this will cause deadlock.  If it proves impossible you can use a try-lock primitive in the callback,
-  but it means you won't be able to spill those rows.
-
-* If the heaps are fragmented it may be more efficient to repack the heaps than spill to disk.
-
-* If you're resizing a potentially big block of memory use the resize function with the callback.
-
 *************
 Shared Memory
 *************
@@ -135,3 +176,28 @@ The ILargeMemCallback provides a mechanism to dynamically allocate more memory t
 This could potentially be done in stages rather than all or nothing.
 
 (Currently unused as far as I know...)
+
+**********
+Huge pages
+**********
+
+When OS processes use a large amount of memory, mapping virtual addresses to physical addresses can begin to
+take a significant proportion of the execution time.  This is especially true once the TLB is not large enough to
+store all the mappings.  Huge pages can significantly help with this problem by reducing the number of TLB entries
+needed to cover the virtual address space.  The memory manager supports huge pages in two different ways:
+
+Huge pages can be preallocated (e.g., with hugeadm) for exclusive use as huge pages.  If huge pages are enabled
+for a particular engine, and sufficient huge pages are available to supply the memory for the memory manager, then
+they will be used.
+
+Linux kernels from 2.6.38 onward have support for transparent huge pages.  These do not need to be preallocated,
+instead the operating system tries to use them behind the scenes.  HPCC versions 5.2 and later take advantage
+of this feature to significantly speed up memory access when large amounts of memory are used by each process.
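
On Linux a large anonymous reservation can be marked as a candidate for transparent huge pages with madvise(MADV_HUGEPAGE).  A minimal sketch, not the exact roxiemem code, with error handling omitted:

    #include <sys/mman.h>
    #include <cstddef>

    // Reserve 'size' bytes and hint that the kernel should back the range with
    // transparent huge pages where possible (Linux 2.6.38+).
    void * reserveWithTransparentHugePages(size_t size)
    {
        void * base = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
        if (base != MAP_FAILED)
            madvise(base, size, MADV_HUGEPAGE);
        return base;
    }
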
+
+Preallocated huge pages tend to be more efficient, but they have the disadvantage that the operating system currently
+does not reuse unused huge pages for other purposes e.g., disk cache.
+
+There is also a memory manager option to not return the memory to the operating system when it is no longer
+required.  This avoids the cost of clearing the memory when it is required again, but shares the disadvantage of
+preallocated huge pages that the unused memory cannot be used for disk cache.  We recommend selecting this option
+when preallocated huge pages are in use - at least until the kernel allows them to be reused.

+ 513 - 144
roxie/roxiemem/roxiemem.cpp

@@ -112,6 +112,20 @@ const unsigned UNSIGNED_ALLBITS = (unsigned) -1;
 const unsigned TOPBITMASK = 1<<(UNSIGNED_BITS-1);
 const memsize_t heapBlockSize = UNSIGNED_BITS*HEAP_ALIGNMENT_SIZE;
 
+//Constants used when maintaining a list of blocks.  The blocks are stored as unsigned numbers, null has an unusual number so
+//that block 0 can be the heaplet at address heapBase.  The top bits are used as a mask to prevent the ABA problem in
+//a lockless list.  I suspect the mask could be increased.
+const unsigned BLOCKLIST_MASK = 0xFFFFFF;
+const unsigned BLOCKLIST_ABA_INC = (BLOCKLIST_MASK+1);
+const unsigned BLOCKLIST_ABA_MASK = ~BLOCKLIST_MASK;
+const unsigned BLOCKLIST_NULL = BLOCKLIST_MASK; // Used to represent a null entry
+const unsigned BLOCKLIST_LIMIT = BLOCKLIST_MASK; // Values above this are not valid
+
+inline bool isNullBlock(unsigned block)
+{
+    return (block & BLOCKLIST_MASK) == BLOCKLIST_NULL;
+}
+
 template <typename VALUE_TYPE, typename ALIGN_TYPE>
 inline VALUE_TYPE align_pow2(VALUE_TYPE value, ALIGN_TYPE alignment)
 {
@@ -138,19 +152,33 @@ typedef MapBetween<unsigned, unsigned, memsize_t, memsize_t> MapActivityToMemsiz
 
 static CriticalSection heapBitCrit;
 
-static void initializeHeap(bool allowHugePages, bool allowTransparentHugePages, bool retainMemory, unsigned pages, unsigned largeBlockGranularity, ILargeMemCallback * largeBlockCallback)
+static void initializeHeap(bool allowHugePages, bool allowTransparentHugePages, bool retainMemory, memsize_t pages, memsize_t largeBlockGranularity, ILargeMemCallback * largeBlockCallback)
 {
     if (heapBase) return;
 
     // CriticalBlock b(heapBitCrit); // unnecessary - must call this exactly once before any allocations anyway!
-    heapBitmapSize = (pages + UNSIGNED_BITS - 1) / UNSIGNED_BITS;
-    heapTotalPages = heapBitmapSize * UNSIGNED_BITS;
+    memsize_t bitmapSize = (pages + UNSIGNED_BITS - 1) / UNSIGNED_BITS;
+    memsize_t totalPages = bitmapSize * UNSIGNED_BITS;
+    memsize_t memsize = totalPages * HEAP_ALIGNMENT_SIZE;
+
+    if (totalPages > (unsigned)-1)
+        throw makeStringExceptionV(ROXIEMM_TOO_MUCH_MEMORY,
+                    "Heap cannot support memory of size %" I64F "u - decrease memory or increase HEAP_ALIGNMENT_SIZE",
+                    (__uint64)memsize);
+
+    if (totalPages >= BLOCKLIST_LIMIT)
+        throw makeStringExceptionV(ROXIEMM_TOO_MUCH_MEMORY,
+                    "Heap cannot support memory of size %" I64F "u - decrease memory or increase HEAP_ALIGNMENT_SIZE or BLOCKLIST_MASK",
+                    (__uint64)memsize);
+
+    heapBitmapSize = (unsigned)bitmapSize;
+    heapTotalPages = (unsigned)totalPages;
     heapLargeBlockGranularity = largeBlockGranularity;
     heapLargeBlockCallback = largeBlockCallback;
-    memsize_t memsize = memsize_t(heapTotalPages) * HEAP_ALIGNMENT_SIZE;
 
     heapNotifyUnusedEachFree = !retainMemory;
     heapNotifyUnusedEachBlock = false;
+
 #ifdef _WIN32
     // Not the world's best code but will do 
     char *next = (char *) HEAP_ALIGNMENT_SIZE;
@@ -1028,9 +1056,11 @@ class Heaplet : public HeapletBase
     friend class CHeap;
 protected:
     Heaplet *next;
+    Heaplet *prev;
     const IRowAllocatorCache *allocatorCache;
     CHeap * const heap;
     memsize_t chunkCapacity;
+    atomic_t nextSpace;
     
     inline unsigned getActivityId(unsigned allocatorId) const
     {
@@ -1040,8 +1070,10 @@ protected:
 public:
     Heaplet(CHeap * _heap, const IRowAllocatorCache *_allocatorCache, memsize_t _chunkCapacity) : heap(_heap), chunkCapacity(_chunkCapacity)
     {
+        atomic_set(&nextSpace, 0);
         assertex(heap);
         next = NULL;
+        prev = NULL;
         allocatorCache = _allocatorCache;
     }
 
@@ -1051,6 +1083,7 @@ public:
     virtual void reportLeaks(unsigned &leaked, const IContextLogger &logctx) const = 0;
     virtual void checkHeap() const = 0;
     virtual void getPeakActivityUsage(IActivityMemoryUsageMap *map) const = 0;
+    virtual bool isFull() const = 0;
 
 #ifdef _WIN32
 #ifdef new
@@ -1063,6 +1096,9 @@ public:
     {
         subfree_aligned(p, 1);
     }
+
+    inline void addToSpaceList();
+    virtual void verifySpaceList();
 };
 
 
@@ -1108,6 +1144,22 @@ public:
 
     inline unsigned numChunks() const { return queryCount()-1; }
 
+    //Is there any space within the heaplet that hasn't ever been allocated.
+    inline bool hasAnyUnallocatedSpace() const
+    {
+        //This could use a special value of freeBase to indicate it was full, but that would add complication to
+        //the allocation code which is more time critical.
+        unsigned curFreeBase = atomic_read(&freeBase);
+        size32_t bytesFree = dataAreaSize() - curFreeBase;
+        return (bytesFree >= chunkSize);
+    }
+
+    virtual bool isFull() const
+    {
+        //Has all the space been allocated at least once, and is the free chain empty.
+        return !hasAnyUnallocatedSpace() && (atomic_read(&r_blocks) & RBLOCKS_OFFSET_MASK) == 0;
+    }
+
     inline static unsigned dataOffset() { return HEAPLET_DATA_AREA_OFFSET(ChunkedHeaplet); }
 
     inline static size32_t dataAreaSize() { return  (size32_t)(HEAP_ALIGNMENT_SIZE - dataOffset()); }
@@ -1158,6 +1210,7 @@ public:
     }
 
     char * allocateChunk();
+    virtual void verifySpaceList();
 
 protected:
     inline unsigned makeRelative(const char *ptr)
@@ -1202,7 +1255,14 @@ protected:
             unsigned new_tag = ((old_blocks & RBLOCKS_CAS_TAG_MASK) + RBLOCKS_CAS_TAG);
             unsigned new_blocks = new_tag | r_ptr;
             if (atomic_cas(&r_blocks, new_blocks, old_blocks))
+            {
+                //If this is the first block being added to the free chain then add it to the space list
+                //It is impossible to make it more restrictive - e.g., only when freeing and full - because of
+                //various race conditions.
+                if (atomic_read(&nextSpace) == 0)
+                    addToSpaceList();
                 break;
+            }
         }
 
         CHeap * savedHeap = heap;
@@ -1681,6 +1741,7 @@ public:
                 allocatorCache->onDestroy(allocatorId & MAX_ACTIVITY_ID, (void *)ptr);
 
             CHeap * savedHeap = heap;
+            addToSpaceList();
             // after the following dec(count) it is possible that the page could be freed, so cannot access any members of this
             compiler_memory_barrier();
             unsigned cnt = atomic_dec_and_read(&count);
@@ -1747,9 +1808,30 @@ public:
     {
         throwUnexpected();
     }
+
+    virtual bool isFull() const
+    {
+        return (atomic_read(&count) > 1);
+    }
 };
 
 //================================================================================
+
+inline unsigned heapletToBlock(Heaplet * heaplet)
+{
+    dbgassertex(heaplet);
+    return ((char *)heaplet - heapBase) / HEAP_ALIGNMENT_SIZE;
+}
+
+inline Heaplet * blockToHeaplet(unsigned block)
+{
+    unsigned maskedBlock = block & BLOCKLIST_MASK;
+    dbgassertex(maskedBlock != BLOCKLIST_NULL);
+    return (Heaplet *)(heapBase + maskedBlock * HEAP_ALIGNMENT_SIZE);
+}
+
+
+//================================================================================
 //
 struct ActivityEntry 
 {
@@ -2029,48 +2111,182 @@ protected:
 //Responsible for allocating memory for a chain of chunked blocks
 class CHeap : public CInterface
 {
+    friend class HeapCompactState;
 public:
     CHeap(CChunkingRowManager * _rowManager, const IContextLogger &_logctx, const IRowAllocatorCache *_allocatorCache, unsigned _flags)
-        : logctx(_logctx), rowManager(_rowManager), allocatorCache(_allocatorCache), active(NULL), flags(_flags)
+        : logctx(_logctx), rowManager(_rowManager), allocatorCache(_allocatorCache), activeHeaplet(NULL), heaplets(NULL), flags(_flags)
     {
         atomic_set(&possibleEmptyPages, 0);
+        atomic_set(&headMaybeSpace, BLOCKLIST_NULL);
     }
 
     ~CHeap()
     {
-        Heaplet *finger = active;
-        while (finger)
+        if (memTraceLevel >= 3)
         {
-            if (memTraceLevel >= 3)
-                logctx.CTXLOG("RoxieMemMgr: CChunkingRowManager d-tor freeing heaplet linked in active list - addr=%p rowMgr=%p",
-                        finger, this);
-            Heaplet *next = getNext(finger);
-            delete finger;
-            finger = next;
+            //ensure verifySpaceListConsistency isn't triggered by leaked allocations that are never freed.
+            if (activeHeaplet && atomic_read(&activeHeaplet->nextSpace) == 0)
+                atomic_set(&activeHeaplet->nextSpace, BLOCKLIST_NULL);
+
+            verifySpaceListConsistency();
+        }
+
+        if (heaplets)
+        {
+            Heaplet *finger = heaplets;
+
+            //Note: This loop doesn't unlink the list because the list and all blocks are going to be disposed.
+            do
+            {
+                if (memTraceLevel >= 3)
+                    logctx.CTXLOG("RoxieMemMgr: CChunkingRowManager d-tor freeing heaplet linked in active list - addr=%p rowMgr=%p",
+                            finger, this);
+                Heaplet *next = getNext(finger);
+                delete finger;
+                finger = next;
+            } while (finger != heaplets);
+        }
+        heaplets = NULL;
+        activeHeaplet = NULL;
+    }
+
+    void addToSpaceList(Heaplet * heaplet)
+    {
+        //Careful: Two threads might be calling this at exactly the same time: ensure only one goes any further
+        if (!atomic_cas(&heaplet->nextSpace, BLOCKLIST_NULL, 0))
+            return;
+
+        unsigned block = heapletToBlock(heaplet);
+        loop
+        {
+            unsigned head = atomic_read(&headMaybeSpace);
+
+            //Update the next pointer.  BLOCKLIST_ABA_INC is ORed with the value to ensure it is non-zero.
+            atomic_set(&heaplet->nextSpace, head | BLOCKLIST_ABA_INC);
+
+            //Ensure any items added onto the list have a new aba tag
+            unsigned newHead = block + (head & BLOCKLIST_ABA_MASK) + BLOCKLIST_ABA_INC;
+            if (atomic_cas(&headMaybeSpace, newHead, head))
+                break;
+        }
+    }
+
+    Heaplet * popFromSpaceList()
+    {
+        //This must only be called within a critical section since some functions assume only one active thread is
+        //allowed to remove elements from the list
+        loop
+        {
+            unsigned head = atomic_read(&headMaybeSpace);
+            if (isNullBlock(head))
+                return NULL;
+
+            Heaplet * heaplet = blockToHeaplet(head);
+            //Always valid to access a heaplet on a list, because we must remove from all lists before disposing.
+            unsigned next = atomic_read(&heaplet->nextSpace);
+
+            //No need to update the aba mask on removal since removal cannot create false positives.
+            if (atomic_cas(&headMaybeSpace, next, head))
+            {
+                //Indicate that this item is no longer on the list.
+                atomic_set(&heaplet->nextSpace, 0);
+                //NOTE: If another thread tries to add it before this set succeeds that doesn't cause a problem since on return this heaplet will be processed
+                return heaplet;
+            }
+        }
+    }
+
+    void removeFromSpaceList(Heaplet * toRemove)
+    {
+        //This must only be called within a critical section so only one thread can access at once.
+        //NOTE: We don't care about items being added while this loop is iterating - since the item
+        //being removed cannot be being added.
+        //And nothing else can be being removed - since we are protected by the critical section
+
+        //NextSpace can't change while this function is executing
+        unsigned nextSpace = atomic_read(&toRemove->nextSpace);
+        //If not on the list then return immediately
+        if (nextSpace == 0)
+            return;
+
+        //Special case head because that can change while this is being executed...
+        unsigned searchBlock = heapletToBlock(toRemove);
+        unsigned head = atomic_read(&headMaybeSpace);
+        if (isNullBlock(head))
+        {
+            //The block wasn't found on the space list even though it should have been
+            throwUnexpected();
+            return;
+        }
+
+        if ((head & BLOCKLIST_MASK) == searchBlock)
+        {
+            //Currently head of the list, try and remove it
+            if (atomic_cas(&headMaybeSpace, nextSpace, head))
+            {
+                atomic_set(&toRemove->nextSpace, 0);
+                return;
+            }
+
+            //head changed - reread head and fall through since it must now be a child of that new head
+            head = atomic_read(&headMaybeSpace);
+        }
+
+        //Not at the head of the list, and head is not NULL
+        Heaplet * prevHeaplet = blockToHeaplet(head);
+        loop
+        {
+            unsigned next = atomic_read(&prevHeaplet->nextSpace);
+            if (isNullBlock(next))
+            {
+                //The block wasn't found on the space list even though it should have been
+                throwUnexpected();
+                return;
+            }
+
+            Heaplet * heaplet = blockToHeaplet(next);
+            if (heaplet == toRemove)
+            {
+                //Remove the item from the list, and indicate it is no longer on the list
+                //Can use atomic_set() because no other thread can be removing (and therefore modifying nextSpace)
+                atomic_set(&prevHeaplet->nextSpace, nextSpace);
+                atomic_set(&toRemove->nextSpace, 0);
+                return;
+            }
+            prevHeaplet = heaplet;
         }
-        active = NULL;
     }
 
     void reportLeaks(unsigned &leaked) const
     {
         SpinBlock c1(crit);
-        Heaplet *finger = active;
-        while (leaked && finger)
+        Heaplet * start = heaplets;
+        if (start)
         {
-            if (leaked && memTraceLevel >= 1)
-                finger->reportLeaks(leaked, logctx);
-            finger = getNext(finger);
+            Heaplet * finger = start;
+            while (leaked)
+            {
+                if (leaked && memTraceLevel >= 1)
+                    finger->reportLeaks(leaked, logctx);
+                finger = getNext(finger);
+                if (finger == start)
+                    break;
+            }
         }
     }
 
     void checkHeap()
     {
         SpinBlock c1(crit);
-        Heaplet *finger = active;
-        while (finger)
+        Heaplet * start = heaplets;
+        if (start)
         {
-            finger->checkHeap();
-            finger = getNext(finger);
+            Heaplet * finger = start;
+            do
+            {
+                finger->checkHeap();
+                finger = getNext(finger);
+            } while (finger != start);
         }
     }
 
@@ -2078,59 +2294,144 @@ public:
     {
         unsigned total = 0;
         SpinBlock c1(crit);
-        Heaplet *finger = active;
-        while (finger)
+        Heaplet * start = heaplets;
+        if (start)
         {
-            total += finger->queryCount() - 1; // There is one refcount for the page itself on the active q
-            finger = getNext(finger);
+            Heaplet * finger = start;
+            do
+            {
+                total += finger->queryCount() - 1; // There is one refcount for the page itself on the active q
+                finger = getNext(finger);
+            } while (finger != start);
         }
         return total;
     }
 
+    void verifySpaceListConsistency()
+    {
+        //Check that all blocks are either full, on the with-possible-space list or active
+        Heaplet * start = heaplets;
+        Heaplet * finger = start;
+        if (start)
+        {
+            do
+            {
+                if (!finger->isFull())
+                    finger->verifySpaceList();
+                finger = getNext(finger);
+            } while (finger != start);
+        }
+    }
+
+    unsigned releasePage(Heaplet * finger)
+    {
+        if (memTraceLevel >= 3)
+            logctx.CTXLOG("RoxieMemMgr: CChunkingRowManager::pages() freeing Heaplet linked in active list - addr=%p pages=%u capacity=%"I64F"u rowMgr=%p",
+                    finger, finger->sizeInPages(), (unsigned __int64) finger->_capacity(), this);
+
+        removeHeaplet(finger);
+        if (finger == activeHeaplet)
+            activeHeaplet = NULL;
+
+        unsigned size = finger->sizeInPages();
+        //It is possible (but very unlikely) for another thread to have added this block to the space list.
+        //Ensure it is not on the list.
+        removeFromSpaceList(finger);
+        delete finger;
+        return size;
+    }
+
     unsigned releaseEmptyPages(bool forceFreeAll)
     {
+        //If releaseEmptyPages() is called between the last release on a page (setting count to 1), and this flag
+        //getting set, it won't release the page *this time*.  But that is the same as the release happening
+        //slightly later.
+        if (atomic_read(&possibleEmptyPages) == 0)
+            return 0;
+
+        unsigned total = 0;
+        SpinBlock c1(crit);
+        //Check again in case other thread has also called this function and no other pages have been released.
         if (atomic_read(&possibleEmptyPages) == 0)
             return 0;
 
         //You will get a false positive if possibleEmptyPages is set while walking the active page list, but that
         //only means the list is walked more than it needs to be.
-        //If releaseEmptyPages() is called between the last release on a page (setting count to 1), and this flag
-        //getting set, it won't release the page *this time*.  But that is the same as the release happening
-        //slightly later.
         atomic_set(&possibleEmptyPages, 0);
-        unsigned total = 0;
-        Heaplet *prev = NULL;
-        SpinBlock c1(crit);
-        Heaplet *finger = active;
-        while (finger)
+
+        //Any blocks that could be freed must be the active block and/or on the maybe space list.
+        Heaplet * headHeaplet;
+        Heaplet * preserved = NULL;
+        //First free any empty blocks at the head of the maybe space list
+        loop
         {
-            Heaplet *next = getNext(finger);
-            if (finger->queryCount()==1)
+            unsigned head = atomic_read(&headMaybeSpace);
+            if (isNullBlock(head))
             {
-                //If this is the only page then only free it if forced to.
-                if (!prev && !next && !forceFreeAll)
-                {
-                    //There is still a potential page to release so reset the flag.
-                    atomic_set(&possibleEmptyPages, 1);
-                    break;
-                }
+                headHeaplet = NULL;
+                break;
+            }
 
-                if (memTraceLevel >= 3)
-                    logctx.CTXLOG("RoxieMemMgr: CChunkingRowManager::pages() freeing Heaplet linked in active list - addr=%p pages=%u capacity=%"I64F"u rowMgr=%p",
-                            finger, finger->sizeInPages(), (unsigned __int64) finger->_capacity(), this);
-                if (prev)
-                    setNext(prev, next);
-                else
-                    active = next;
+            headHeaplet = blockToHeaplet(head);
+            //Is it possible to free this heaplet?
+            if (headHeaplet->queryCount() != 1)
+                break;
 
-                total += finger->sizeInPages();
-                delete finger;
+            //If this is the only page then only free it if forced to.
+            if ((headHeaplet->next == headHeaplet) && !forceFreeAll)
+            {
+                preserved = headHeaplet;
+                break;
             }
-            else
+
+            //Always valid to access a heaplet on a list, because we must remove from all lists before disposing.
+            unsigned next = atomic_read(&headHeaplet->nextSpace);
+
+            //No need to update the aba mask on removal since removal cannot create false positives.
+            if (atomic_cas(&headMaybeSpace, next, head))
             {
-                prev = finger;
+                atomic_set(&headHeaplet->nextSpace, 0);
+                total += releasePage(headHeaplet);
             }
-            finger = next;
+        }
+
+        //Not going to modify head, so can now walk the rest of the items, other threads can only add to head.
+        if (headHeaplet)
+        {
+            Heaplet * prevHeaplet = headHeaplet;
+            loop
+            {
+                unsigned curSpace = atomic_read(&prevHeaplet->nextSpace);
+                if (isNullBlock(curSpace))
+                    break;
+
+                Heaplet * heaplet = blockToHeaplet(curSpace);
+                if (heaplet->queryCount() == 1)
+                {
+                    //Remove it directly rather than walking the list to remove it.
+                    unsigned nextSpace = atomic_read(&heaplet->nextSpace);
+                    atomic_set(&prevHeaplet->nextSpace, nextSpace);
+                    atomic_set(&heaplet->nextSpace, 0);
+                    total += releasePage(heaplet);
+                }
+                else
+                    prevHeaplet = heaplet;
+            }
+        }
+
+        if (activeHeaplet && forceFreeAll)
+        {
+            //I am not convinced this can ever lead to an extra page being released - except when it is released
+            //after the space list has been walked.  Keep for the moment just to be sure.
+            assertex(!preserved);
+            if (activeHeaplet->queryCount() == 1)
+                total += releasePage(activeHeaplet);
+        }
+        else if (preserved)
+        {
+            //Add this page back onto the potential-space list
+            atomic_set(&possibleEmptyPages, 1);
+            addToSpaceList(preserved);
         }
 
         return total;
@@ -2142,22 +2443,26 @@ public:
         memsize_t numAllocs = 0;
         {
             SpinBlock c1(crit);
-            Heaplet *finger = active;
-            while (finger)
+            Heaplet * start = heaplets;
+            if (start)
             {
-                unsigned thisCount = finger->queryCount()-1;
-                if (thisCount != 0)
-                    finger->getPeakActivityUsage(usageMap);
-                numAllocs += thisCount;
-                numPages++;
-                finger = getNext(finger);
+                Heaplet * finger = start;
+                do
+                {
+                    unsigned thisCount = finger->queryCount()-1;
+                    if (thisCount != 0)
+                        finger->getPeakActivityUsage(usageMap);
+                    numAllocs += thisCount;
+                    numPages++;
+                    finger = getNext(finger);
+                } while (finger != start);
             }
         }
         if (numPages)
             reportHeapUsage(usageMap, numPages, numAllocs);
     }
 
-    inline bool isEmpty() const { return !active; }
+    inline bool isEmpty() const { return !heaplets; }
 
     virtual bool matches(size32_t searchSize, unsigned searchActivity, unsigned searchFlags) const
     {
@@ -2166,31 +2471,62 @@ public:
 
     void noteEmptyPage() { atomic_set(&possibleEmptyPages, 1); }
 
-    inline void internalLock() { crit.enter(); }
-    inline void internalUnlock() { crit.leave(); }
-
-
 protected:
     virtual void reportHeapUsage(IActivityMemoryUsageMap * usageMap, unsigned numPages, memsize_t numAllocs) const = 0;
 
     inline Heaplet * getNext(const Heaplet * ptr) const { return ptr->next; }
     inline void setNext(Heaplet * ptr, Heaplet * next) const { ptr->next = next; }
+    inline Heaplet * getPrev(const Heaplet * ptr) const { return ptr->prev; }
+    inline void setPrev(Heaplet * ptr, Heaplet * prev) const { ptr->prev = prev; }
+
+    //Must be called within a critical section
+    void insertHeaplet(Heaplet * ptr)
+    {
+        if (!heaplets)
+        {
+            ptr->prev = ptr;
+            ptr->next = ptr;
+            heaplets = ptr;
+        }
+        else
+        {
+            Heaplet * next = heaplets;
+            Heaplet * prev = next->prev;
+            ptr->next = next;
+            ptr->prev = prev;
+            prev->next = ptr;
+            next->prev = ptr;
+        }
+    }
 
-    inline void moveHeapletToHead(Heaplet * prev, Heaplet * newHead)
+    void removeHeaplet(Heaplet * ptr)
     {
-        Heaplet * next = getNext(newHead);
-        setNext(prev, next);
-        setNext(newHead, active);
-        active = newHead;
+        Heaplet * next = ptr->next;
+        if (next != ptr)
+        {
+            Heaplet * prev = ptr->prev;
+            next->prev = prev;
+            prev->next = next;
+            //Ensure that heaplets isn't invalid
+            heaplets = next;
+        }
+        else
+            heaplets = NULL;
+        //NOTE: We do not clear next/prev in the heaplet being removed.
     }
 
+    inline void internalLock() { crit.enter(); }
+    inline void internalUnlock() { crit.leave(); }
+
 protected:
     unsigned flags; // before the pointer so it packs better in 64bit.
-    Heaplet *active;
+    Heaplet * activeHeaplet; // which block is the current candidate for adding rows.
+    Heaplet * heaplets; // the linked list of heaplets for this heap
     CChunkingRowManager * rowManager;
     const IRowAllocatorCache *allocatorCache;
     const IContextLogger & logctx;
     mutable SpinLock crit;      // MORE: Can probably be a NonReentrantSpinLock if we're careful
+    atomic_t headMaybeSpace;  // The head of the list of heaplets which potentially have some space.
     atomic_t possibleEmptyPages;  // Are there any pages with 0 records.  Primarily here to avoid walking long page chains.
 };
 
@@ -2218,8 +2554,12 @@ public:
                 heap->internalUnlock();
             heap = _heap;
             if (_heap)
+            {
                 _heap->internalLock();
-            next = NULL;
+                next = _heap->heaplets;
+            }
+            else
+                next = NULL;
         }
     }
 public:
@@ -2270,6 +2610,7 @@ public:
 
     inline unsigned maxChunksPerPage() const { return chunksPerPage; }
 
+
 protected:
     inline void * inlineDoAllocate(unsigned allocatorId, unsigned maxSpillCost);
     virtual ChunkedHeaplet * allocateHeaplet() = 0;
@@ -2336,6 +2677,28 @@ void noteEmptyPage(CHeap * const heap)
     heap->noteEmptyPage();
 }
 
+void Heaplet::addToSpaceList()
+{
+    if (atomic_read(&nextSpace) != 0)
+        return;
+    heap->addToSpaceList(this);
+}
+
+void Heaplet::verifySpaceList()
+{
+    if (atomic_read(&nextSpace) == 0)
+    {
+        ERRLOG("%p@%u: Verify failed: %p %u", heap, (unsigned)GetCurrentThreadId(), this, isFull());
+    }
+}
+
+void ChunkedHeaplet::verifySpaceList()
+{
+    if (atomic_read(&nextSpace) == 0)
+    {
+        ERRLOG("%p@%u: Verify failed: %p %u %x %x", heap, (unsigned)GetCurrentThreadId(), this, isFull(), atomic_read(&freeBase), atomic_read(&r_blocks));
+    }
+}
 
 char * ChunkedHeaplet::allocateChunk()
 {
@@ -3770,8 +4133,7 @@ void * CHugeHeap::doAllocate(memsize_t _size, unsigned allocatorId, unsigned max
     }
 
     SpinBlock b(crit);
-    setNext(head, active);
-    active = head;
+    insertHeaplet(head);
     return head->allocateHuge(_size);
 }
 
@@ -3816,32 +4178,14 @@ void CHugeHeap::expandHeap(void * original, memsize_t copysize, memsize_t oldcap
         {
             HugeHeaplet *oldhead = (HugeHeaplet *) oldbase;
             HugeHeaplet *head = (HugeHeaplet *) realloced;
+            //NOTE: Huge pages are only added to the space list when they are freed => no need to check
+            //if it needs removing and re-adding to that list.
             if (realloced != oldbase)
             {
                 // Remove the old block from the chain
                 {
                     SpinBlock b(crit);
-                    if (active==oldhead)
-                    {
-                        active = getNext(oldhead);
-                    }
-                    else
-                    {
-                        Heaplet *finger = active;
-                        // Remove old pointer from the chain
-                        while (finger)
-                        {
-                            Heaplet *next = getNext(finger);
-                            if (next == oldhead)
-                            {
-                                setNext(finger, getNext(oldhead));
-                                break;
-                            }
-                            else
-                                finger = next;
-                        }
-                        assert(finger != NULL); // Should always have found it
-                    }
+                    removeHeaplet(oldhead);
                 }
 
                 //Copying data within the block => must lock for the duration
@@ -3850,10 +4194,9 @@ void CHugeHeap::expandHeap(void * original, memsize_t copysize, memsize_t oldcap
 
                 // MORE - If we were really clever, we could manipulate the page table to avoid moving ANY data here...
                 memmove(realloced, oldbase, copysize + HugeHeaplet::dataOffset());  // NOTE - assumes no trailing data (e.g. end markers)
+
                 SpinBlock b(crit);
-                // Add at front of chain
-                setNext(head, active);
-                active = head;
+                insertHeaplet(head);
             }
             void * ret = (char *) realloced + HugeHeaplet::dataOffset();
             memsize_t newCapacity = head->setCapacity(newsize);
@@ -3904,28 +4247,42 @@ void * CChunkedHeap::inlineDoAllocate(unsigned allocatorId, unsigned maxSpillCos
     ChunkedHeaplet * donorHeaplet;
     char * chunk;
     {
-        Heaplet *prev = NULL;
         SpinBlock b(crit);
-        Heaplet *finger = active;
-        while (finger)
+        if (activeHeaplet)
         {
             //This cast is safe because we are within a member of CChunkedHeap
-            donorHeaplet = static_cast<ChunkedHeaplet *>(finger);
+            donorHeaplet = static_cast<ChunkedHeaplet *>(activeHeaplet);
             chunk = donorHeaplet->allocateChunk();
             if (chunk)
             {
-                if (prev)
-                    moveHeapletToHead(prev, finger);
+                //The code at the end of this function needs to be executed outside of the spinblock.
+                //Just occasionally gotos are the best way of expressing something
+                goto gotChunk;
+            }
+            activeHeaplet = NULL;
+        }
+
+        //Now walk the list of blocks which may potentially have some free space:
+        loop
+        {
+            Heaplet * next = popFromSpaceList();
+            if (!next)
+                break;
 
+            //This cast is safe because we are within a member of CChunkedHeap
+            donorHeaplet = static_cast<ChunkedHeaplet *>(next);
+            chunk = donorHeaplet->allocateChunk();
+            if (chunk)
+            {
+                activeHeaplet = donorHeaplet;
                 //The code at the end of this function needs to be executed outside of the spinblock.
                 //Just occasionally gotos are the best way of expressing something
                 goto gotChunk;
             }
-            prev = finger;
-            finger = getNext(finger);
         }
     }
 
+    //NB: At this point activeHeaplet = NULL;
     loop
     {
         rowManager->checkLimit(1, maxSpillCost);
@@ -3934,6 +4291,9 @@ void * CChunkedHeap::inlineDoAllocate(unsigned allocatorId, unsigned maxSpillCos
         if (donorHeaplet)
             break;
         rowManager->restoreLimit(1);
+
+        //Could check if activeHeaplet was now set (and therefore allocated by another thread), and if so restart
+        //the function, but grabbing the spin lock would be inefficient.
         if (!rowManager->releaseCallbackMemory(maxSpillCost, true))
             throwHeapExhausted(1);
     }
@@ -3944,8 +4304,14 @@ void * CChunkedHeap::inlineDoAllocate(unsigned allocatorId, unsigned maxSpillCos
 
     {
         SpinBlock b(crit);
-        setNext(donorHeaplet, active);
-        active = donorHeaplet;
+        insertHeaplet(donorHeaplet);
+
+        //While this thread was allocating a block, another thread may have done the same.  Ensure that other block is
+        //placed on the list of those with potentially free space.
+        if (activeHeaplet)
+            addToSpaceList(activeHeaplet);
+        activeHeaplet = donorHeaplet;
+
         //If no protecting spinblock there would be a race e.g., if another thread allocates all the rows!
         chunk = donorHeaplet->allocateChunk();
         dbgassertex(chunk);
@@ -3958,38 +4324,41 @@ gotChunk:
 
 const void * CChunkedHeap::compactRow(const void * ptr, HeapCompactState & state)
 {
-    Heaplet *prev = NULL;
-
     //Use protect heap instead of a lock, so that multiple compacts on the same heap (very likely) avoid
     //re-entering the critical sections.
     state.protectHeap(this);
     Heaplet *finger = state.next;
-    if (!finger)
-        finger = active;
-    while (finger)
+
+    //This loop currently walks through the heaplet list.  It *might* be more efficient to walk the list of
+    //heaplets with potential space
+    if (finger)
     {
-       //This cast is safe because we are within a member of CChunkedHeap
-        ChunkedHeaplet * chunkedFinger = static_cast<ChunkedHeaplet *>(finger);
-        const void *ret = chunkedFinger->moveRow(ptr);
-        if (ret)
+        loop
         {
-            //Instead of moving this block to the head of the list, save away the next block to try to put a block into
-            //since we know what all blocks before this must be filled.
-            state.next = finger;
-            //if (prev)
-            //    moveHeapletToHead(prev, finger);
-
-            HeapletBase *srcBase = HeapletBase::findBase(ptr);
-            if (srcBase->isEmpty())
+           //This cast is safe because we are within a member of CChunkedHeap
+            ChunkedHeaplet * chunkedFinger = static_cast<ChunkedHeaplet *>(finger);
+            const void *ret = chunkedFinger->moveRow(ptr);
+            if (ret)
             {
-                state.numPagesEmptied++;
-                //could call releaseEmptyPages(false) at this point since already in the crit section.
+                //Instead of moving this block to the head of the list, save away the next block to try to put a block into
+                //since we know what all blocks before this must be filled.
+                state.next = finger;
+
+                HeapletBase *srcBase = HeapletBase::findBase(ptr);
+                if (srcBase->isEmpty())
+                {
+                    state.numPagesEmptied++;
+                    //could call releaseEmptyPages(false) at this point since already in the crit section.
+                }
+                return ret;
             }
-            return ret;
+            dbgassertex((chunkedFinger->numChunks() == maxChunksPerPage()) || (chunkedFinger->numChunks() == 0));
+            finger = getNext(finger);
+
+            //Check if we have looped all the way around
+            if (finger == heaplets)
+                break;
         }
-        prev = finger;
-        dbgassertex((chunkedFinger->numChunks() == maxChunksPerPage()) || (chunkedFinger->numChunks() == 0));
-        finger = getNext(finger);
     }
     return ptr;
 }
@@ -4366,12 +4735,12 @@ extern void setMemoryStatsInterval(unsigned secs)
 extern void setTotalMemoryLimit(bool allowHugePages, bool allowTransparentHugePages, bool retainMemory, memsize_t max, memsize_t largeBlockSize, const unsigned * allocSizes, ILargeMemCallback * largeBlockCallback)
 {
     assertex(largeBlockSize == align_pow2(largeBlockSize, HEAP_ALIGNMENT_SIZE));
-    unsigned totalMemoryLimit = (unsigned) (max / HEAP_ALIGNMENT_SIZE);
-    unsigned largeBlockGranularity = (unsigned)(largeBlockSize / HEAP_ALIGNMENT_SIZE);
+    memsize_t totalMemoryLimit = (unsigned) (max / HEAP_ALIGNMENT_SIZE);
+    memsize_t largeBlockGranularity = (unsigned)(largeBlockSize / HEAP_ALIGNMENT_SIZE);
     if ((max != 0) && (totalMemoryLimit == 0))
         totalMemoryLimit = 1;
     if (memTraceLevel)
-        DBGLOG("RoxieMemMgr: Setting memory limit to %"I64F"d bytes (%u pages)", (unsigned __int64) max, totalMemoryLimit);
+        DBGLOG("RoxieMemMgr: Setting memory limit to %" I64F "d bytes (%" I64F "u pages)", (unsigned __int64) max, (unsigned __int64)totalMemoryLimit);
     initializeHeap(allowHugePages, allowTransparentHugePages, retainMemory, totalMemoryLimit, largeBlockGranularity, largeBlockCallback);
     initAllocSizeMappings(allocSizes ? allocSizes : defaultAllocSizes);
 }
@@ -5050,8 +5419,8 @@ protected:
 
     void testSizes()
     {
-        ASSERT(ChunkedHeaplet::dataOffset() == CACHE_LINE_SIZE);
-        ASSERT(HugeHeaplet::dataOffset() == CACHE_LINE_SIZE);
+        ASSERT(ChunkedHeaplet::dataOffset() % CACHE_LINE_SIZE == 0);
+        ASSERT(HugeHeaplet::dataOffset() % CACHE_LINE_SIZE == 0);
         ASSERT(FixedSizeHeaplet::chunkHeaderSize == 8);
         ASSERT(PackedFixedSizeHeaplet::chunkHeaderSize == 4);  // NOTE - this is NOT 8 byte aligned, so can't safely be used to allocate ptr arrays
 
@@ -5441,7 +5810,7 @@ protected:
     };
     void testHeapletCas()
     {
-        memsize_t maxMemory = numCasThreads * numCasIter * numCasAlloc * 32;
+        memsize_t maxMemory = (((memsize_t)numCasThreads * numCasIter) * numCasAlloc) * 32;
         //Because this is allocating from a single heaplet check if it can overflow the memory
         if (maxMemory > FixedSizeHeaplet::dataAreaSize())
             return;
@@ -5751,10 +6120,10 @@ protected:
 
         //NOTE: The efficiency of the packing does depend on the row order, so ideally this would test multiple orderings
         //of the array
-        unsigned rowsPerPage = (HEAP_ALIGNMENT_SIZE - sizeof(FixedSizeHeaplet)) / compactingAllocSize;
+        unsigned rowsPerPage = (HEAP_ALIGNMENT_SIZE - FixedSizeHeaplet::dataOffset()) / compactingAllocSize;
         unsigned numPagesBefore = rowManager->numPagesAfterCleanup(false);
         unsigned expectedPages = (numRowsLeft + rowsPerPage-1)/rowsPerPage;
-        ASSERT(numPagesFull == numPagesBefore);
+        CPPUNIT_ASSERT_EQUAL(numPagesFull, numPagesBefore);
         unsigned startTime = msTick();
         memsize_t compacted = rowManager->compactRows(numRows, rows);
         unsigned endTime = msTick();
@@ -5781,7 +6150,7 @@ protected:
         Owned<IFixedRowHeap> rowHeap1 = rowManager->createFixedRowHeap(compactingAllocSize-FixedSizeHeaplet::chunkHeaderSize, 0, 0);
         Owned<IFixedRowHeap> rowHeap2 = rowManager->createFixedRowHeap(compactingAllocSize-PackedFixedSizeHeaplet::chunkHeaderSize, 0, RHFpacked);
 
-        unsigned rowsPerPage = (HEAP_ALIGNMENT_SIZE - sizeof(FixedSizeHeaplet)) / compactingAllocSize;
+        unsigned rowsPerPage = (HEAP_ALIGNMENT_SIZE - FixedSizeHeaplet::dataOffset()) / compactingAllocSize;
         memsize_t maxRows = (useLargeMemory ? largeMemory : smallMemory) * rowsPerPage;
         testCompacting(rowManager, rowHeap1, maxRows, 50);
         testCompacting(rowManager, rowHeap1, maxRows, 800);

+ 3 - 2
roxie/roxiemem/roxiemem.hpp

@@ -37,12 +37,13 @@
 #define ROXIEMM_MEMORY_POOL_EXHAUSTED     ROXIEMM_ERROR_START+1
 #define ROXIEMM_INVALID_MEMORY_ALIGNMENT  ROXIEMM_ERROR_START+2
 #define ROXIEMM_HEAP_ERROR                ROXIEMM_ERROR_START+3
+#define ROXIEMM_TOO_MUCH_MEMORY           ROXIEMM_ERROR_START+4
 
 
 #ifdef __64BIT__
-#define HEAP_ALIGNMENT_SIZE I64C(0x100000u)                     // 1 mb heaplets - may be too big?
+#define HEAP_ALIGNMENT_SIZE I64C(0x40000u)                      // 256kb heaplets
 #else
-#define HEAP_ALIGNMENT_SIZE 0x100000                            // 1 mb heaplets - may be too big?
+#define HEAP_ALIGNMENT_SIZE 0x40000                             // 256kb heaplets
 #endif
 #define HEAP_ALIGNMENT_MASK ((~(HEAP_ALIGNMENT_SIZE)) + 1)
 #define ACTIVITY_MASK   0x00ffffff                              // must be > HEAP_ALIGNMENT_SIZE

+ 1 - 1
thorlcr/graph/thgraph.cpp

@@ -2506,7 +2506,7 @@ void CJobBase::init()
     resumed = false;
 
     bool crcChecking = 0 != getWorkUnitValueInt("THOR_ROWCRC", globals->getPropBool("@THOR_ROWCRC", false));
-    bool usePackedAllocator = 0 != getWorkUnitValueInt("THOR_PACKEDALLOCATOR", globals->getPropBool("@THOR_PACKEDALLOCATOR", false));
+    bool usePackedAllocator = 0 != getWorkUnitValueInt("THOR_PACKEDALLOCATOR", globals->getPropBool("@THOR_PACKEDALLOCATOR", true));
     unsigned memorySpillAt = (unsigned)getWorkUnitValueInt("memorySpillAt", globals->getPropInt("@memorySpillAt", 80));
     thorAllocator.setown(createThorAllocator(((memsize_t)globalMemorySize)*0x100000, memorySpillAt, *logctx, crcChecking, usePackedAllocator));