12 years ago · 29e65fba71
--- a/roxie/roxiemem/DOCUMENTATION.rst
+++ b/roxie/roxiemem/DOCUMENTATION.rst
@@ -0,0 +1,137 @@
 
				+========================
			
 
				+The Roxie Memory Manager
			
 
				+========================
			
 
				+
			
 
				+************
			
 
				+Introduction
			
 
				+************
			
 
				+
			
 
				+This memory manager started life as the memory manager used only by the Roxie engine.  It had several
			
 
				+original design goals:
			
 
				+
			
 
				+* Support link counted rows.
			
 
				+* Be as fast as possible on allocate and deallocate of small rows.
			
 
				+* Allow rows serialized from slaves to be used directly without being cloned first.
			
 
				+* Allow the memory used by a single query, or by all queries combined, to be limited, with graceful recovery.
			
 
				+* Isolate Roxie queries from one another, so that one query can't bring
			
 
				+  down all the rest by allocating too much memory.
			
 
				+* Allow all the memory used by a query to be guaranteed to get freed when the query finishes, thus reducing the
			
 
				+  possibility of memory leaks.
			
 
				+* Predictable behaviour with no pathological cases.
			
 
				+
			
 
				+(Note that efficient usage of memory does not appear on that list - the expectation when the memory
			
 
				+manager was first designed was that Roxie queries would use minimal amounts of memory and speed was
			
 
				+more important.  Some subsequent changes, e.g. packed heaps, help mitigate that.)
			
 
				+
			
 
				+The basic design is to reserve (but not commit) a single large block of memory in the virtual address space.  This
			
 
				+memory is subdivided into "pages".  (These are not the same as the OS virtual memory pages.  The memory manager pages
			
 
				+are currently defined as 1MB in size.)
			
 
				+
			
 
				+Memory is allocated from a set of "heaps".  Each heap owns a set of pages, and sub-allocates memory of a
			
 
				+single size from those pages.  All allocations from a particular page belong to the same heap.  Rounding the requested
			
 
				+memory size up to the next heap-size means that memory
			
 
				+is not lost due to fragmentation.
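The rounding step can be illustrated with a minimal C++ sketch. The size classes used here (powers of two from 32 bytes up to an assumed 512KB ceiling) are chosen purely for illustration and are not taken from the roxiemem sources; `roundUpToHeapSize`, `kMinHeapSize` and `kMaxHeapSize` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical size classes: powers of two from 32 bytes to 512KB.
// The heap sizes actually used by roxiemem may differ.
constexpr std::size_t kMinHeapSize = 32;
constexpr std::size_t kMaxHeapSize = 512 * 1024;

// Round a requested size up to the next (assumed) heap size, so that every
// allocation in a given heap has the same size.
std::size_t roundUpToHeapSize(std::size_t requested)
{
    // Larger requests would be served by a different mechanism in a real allocator.
    assert(requested <= kMaxHeapSize);
    std::size_t size = kMinHeapSize;
    while (size < requested)
        size *= 2;
    return size;
}
```

Because every block in a page has the same size, any freed block can satisfy any later request routed to that heap, which is why fragmentation does not accumulate within a page.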
			
 
				+
			
 
				+Information about each heap is stored in the base of the page (using a class with virtual functions) and the
			
 
				+address of an allocated row is masked to determine which heap object it belongs to, and how it should be linked/released
			
 
				+etc.  Any pointer outside the reserved virtual address range (e.g., constant data) can be linked/released with no effect.
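The address masking described above might look roughly like the following C++ sketch. It assumes 1MB, page-aligned memory manager pages as stated earlier; `ReservedRange`, `pageHeaderFor` and the constants are hypothetical names and not the roxiemem API.

```cpp
#include <cassert>
#include <cstdint>

// Assumed for illustration: 1MB memory manager pages, aligned on 1MB boundaries.
constexpr std::uintptr_t kPageSize = 0x100000;
constexpr std::uintptr_t kPageMask = ~(kPageSize - 1);

// Hypothetical description of the single reserved block of address space.
struct ReservedRange
{
    std::uintptr_t base;   // start of the reserved block (page aligned)
    std::uintptr_t size;   // total number of bytes reserved

    bool contains(std::uintptr_t addr) const
    {
        return addr >= base && addr - base < size;
    }
};

// Return the address of the page header that owns addr, or 0 if addr lies
// outside the reserved range (e.g. constant data).  A caller would treat 0
// as "link/release is a no-op".
std::uintptr_t pageHeaderFor(const ReservedRange &range, std::uintptr_t addr)
{
    assert((range.base & ~kPageMask) == 0); // base must be page aligned
    if (!range.contains(addr))
        return 0;
    return addr & kPageMask;                // clear the offset within the page
}
```

In a real implementation the returned address would be cast to the heap's header class and its virtual functions used to link or release the row.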
			
 
				+
			
 
				+Each allocation has a link count and an allocator id associated with it.  The allocator id represents the type of
			
 
				+the row, and is used to determine what destructor needs to be called when the row is destroyed.  (The row also
			
 
				+contains a flag to indicate if it is fully constructed so it is valid for the destructor to be called.)
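As a rough illustration of how a link count, allocator id and "fully constructed" flag can work together, consider the C++ sketch below. The `RowHeader` layout, the `destructors` registry and the function names are all assumptions made for this example; the real roxiemem structures may be organised quite differently.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <functional>
#include <unordered_map>

// Hypothetical per-row bookkeeping.
struct RowHeader
{
    std::atomic<std::uint32_t> linkCount{1};
    std::uint32_t allocatorId = 0;      // identifies the type of the row
    bool fullyConstructed = false;      // only then is it valid to destroy it
};

// Hypothetical registry mapping allocator id to the destructor for that row type.
using RowDestructor = std::function<void(RowHeader &)>;
std::unordered_map<std::uint32_t, RowDestructor> destructors;

void linkRow(RowHeader &row)
{
    row.linkCount.fetch_add(1, std::memory_order_relaxed);
}

// Drop one link.  Returns true if this call released the last link.
bool releaseRow(RowHeader &row)
{
    assert(row.linkCount.load(std::memory_order_relaxed) > 0);
    if (row.linkCount.fetch_sub(1, std::memory_order_acq_rel) != 1)
        return false;                   // other links remain
    if (row.fullyConstructed)
    {
        auto it = destructors.find(row.allocatorId);
        if (it != destructors.end())
            it->second(row);            // run the type-specific destructor
    }
    // A real allocator would now return the row's memory to its heap.
    return true;
}
```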
			
 
				+
			
 
				+An implementation of IRowManager processes all allocations for a particular Roxie query.  This provides the
			
 
				+mechanism for limiting how much memory a query uses.
			
 
				+
			
 
				+For fixed size allocations it is possible to get a more efficient interface for allocating rows.  There are options
			
 
				+to create unique fixed size heaps (to reduce thread contention) and packed heaps - where all rows share the same
			
 
				+allocator id.
			
 
				+
			
 
				+(Note to self: Is there ever any advantage having a heap that is unique but not packed??)
			
 
				+
			
 
				+****************
			
 
				+Dynamic Spilling
			
 
				+****************
			
 
				+
			
 
				+Thor has different requirements from Roxie.  In Roxie, if a query exceeds its memory limit then it is terminated.  Thor
			
 
				+needs to be able to spill rows and other memory to disk and continue.  This is achieved by allowing any process that
			
 
				+stores buffered rows to register a callback with the memory manager.  When more memory is required these are called
			
 
				+to free up memory, and allow the job to continue.
			
 
				+
			
 
				+Each callback can specify a priority - lower priority callbacks are called first since they are assumed to have a
			
 
				+lower cost associated with spilling.  When more memory is required the callbacks are called in priority order until
			
 
				+one of them succeeds.  They can also be passed a flag to indicate that the situation is critical, forcing them to free up as much memory
			
 
				+as possible.
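The priority ordering described above can be sketched in C++ as follows. `SpillCallback` and `SpillRegistry` are hypothetical names invented for this example and do not reflect the actual callback interface in roxiemem.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical callback record.
struct SpillCallback
{
    unsigned priority;                              // lower values are tried first
    std::function<bool(bool critical)> freeMemory;  // returns true if memory was freed
};

class SpillRegistry
{
public:
    void add(SpillCallback cb)
    {
        assert(cb.freeMemory != nullptr);
        callbacks.push_back(std::move(cb));
        // Keep the list ordered so the cheapest spills are attempted first.
        std::stable_sort(callbacks.begin(), callbacks.end(),
            [](const SpillCallback &a, const SpillCallback &b) { return a.priority < b.priority; });
    }

    // Call the callbacks in priority order until one of them succeeds.
    bool releaseMemory(bool critical) const
    {
        for (const SpillCallback &cb : callbacks)
            if (cb.freeMemory(critical))
                return true;
        return false;
    }

private:
    std::vector<SpillCallback> callbacks;
};
```

Stopping at the first success keeps the cost of a spill proportional to the memory actually needed; the `critical` flag lets a callback decide to free everything it holds rather than the minimum.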
			
 
				+
			
 
				+
			
 
				+Complications
			
 
				+=============
			
 
				+
			
 
				+There are several different complications involved with the memory spilling:
			
 
				+
			
 
				+* There will be many different threads allocating rows.
			
 
				+* Callbacks could be triggered at any time.
			
 
				+* There is considerable scope for deadlock between the callbacks and allocations.
			
 
				+* It may be better to not resize a large array if rows had to be evicted to resize it.
			
 
				+* Filtered record streams can cause significant wasted space in the memory blocks.
			
 
				+* Resizing a multi-page allocation is non-trivial.
			
 
				+
			
 
				+
			
 
				+Resizing Large memory blocks
			
 
				+============================
			
 
				+Some of the memory allocations cover more than one "page" - e.g., arrays used to store blocks of rows.  (These
			
 
				+are called huge pages internally, not to be confused with operating system support for huge pages...)  When
			
 
				+one of these memory blocks needs to be expanded you need to be careful:
			
 
				+
			
 
				+* Allocating a new page, copying, updating the pointer (within a critical section) and then freeing is safe.  Unfortunately
			
 
				+  it may involve copying a large chunk of memory.  It may also fail if there isn't memory for the new and old
			
 
				+  block, even if the existing block could have been expanded into an adjacent block.
			
 
				+
			
 
				+* You can't lock, call a resize routine and update the pointer because the resize routine may need to allocate
			
 
				+  a new memory block.  That may trigger a callback, which could in turn deadlock trying to gain the lock.
			
 
				+  (The callback may be from another thread...)
			
 
				+
			
 
				+* Therefore the memory manager contains a call which allows you to resize a block, but with a callback
			
 
				+  which is used to atomically update the pointer so it always remains thread safe.
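The idea of resizing with a pointer-update callback can be sketched in C++ as below. This is an illustration under stated assumptions rather than the memory manager's real interface: `resizeBlock` and `RowArray` are hypothetical, `std::malloc` stands in for the memory manager, and only one thread is assumed to resize a given array at a time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <functional>
#include <mutex>

// Hypothetical resize helper.  The caller must NOT hold its own lock while
// calling this, because the allocation may trigger spill callbacks that need
// that lock.  publish(newPtr, newSize) is invoked once the copy is complete;
// the owner takes its lock inside publish, so other threads see either the
// old block or the new one, never a half-updated pointer.
bool resizeBlock(void *oldPtr, std::size_t oldSize, std::size_t newSize,
                 const std::function<void(void *, std::size_t)> &publish)
{
    assert(newSize > 0);
    void *newPtr = std::malloc(newSize);      // stand-in for the memory manager
    if (!newPtr)
        return false;
    if (oldPtr && oldSize)
        std::memcpy(newPtr, oldPtr, oldSize < newSize ? oldSize : newSize);
    publish(newPtr, newSize);                 // owner swaps its pointer under its lock
    std::free(oldPtr);                        // old block is no longer reachable
    return true;
}

// Example owner: an array of rows whose pointer is guarded by a mutex.
struct RowArray
{
    std::mutex lock;
    void *data = nullptr;
    std::size_t capacity = 0;

    bool grow(std::size_t newCapacity)
    {
        return resizeBlock(data, capacity, newCapacity,
            [this](void *p, std::size_t n) {
                std::lock_guard<std::mutex> guard(lock);
                data = p;
                capacity = n;
            });
    }
};
```

The lock is held only for the pointer swap, never across the allocation, which is what avoids the deadlock described in the second bullet.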
			
 
				+
			
 
				+
			
 
				+Compacting heaps
			
 
				+================
			
 
				+Occasionally you have processes which read a large number of rows and then filter them so only a few are still
			
 
				+held in memory.  Rows tend to be allocated in sequence through the heap pages, which can mean those few remaining
			
 
				+rows are scattered over many pages.  If they could all be moved to a single page it would free up a significant
			
 
				+amount of memory.
			
 
				+
			
 
				+The memory manager contains a function to pack a set of rows into a smaller number of pages: IRowManager->compactRows().
			
 
				+
			
 
				+This works by iterating through each of the rows in a list.  If the row belongs to a heap that could be compacted,
			
 
				+and isn't part of a full heaplet, then the row is moved.  Since subsequent rows tend to be allocated from the same
			
 
				+heaplet this has the effect of compacting the rows.
			
 
				+
			
 
				+Rules
			
 
				+=====
			
 
				+Some rules to follow when implementing callbacks:
			
 
				+
			
 
				+* A callback cannot allocate any memory from the memory manager.  If it does it is likely to deadlock.
			
 
				+
			
 
				+* You cannot allocate memory while holding a lock if that lock is also required by a callback.
			
 
				+
			
 
				+  Again this will cause deadlock.  If that proves impossible to avoid you can use a try-lock primitive in the callback,
			
 
				+  but it means you won't be able to spill those rows.
			
 
				+
			
 
				+* If the heaps are fragmented it may be more efficient to repack the heaps than spill to disk.
			
 
				+
			
 
				+* If you're resizing a potentially big block of memory use the resize function with the callback.
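The try-lock rule above can be illustrated with a short C++ sketch. `SpillableBuffer` is a hypothetical class written for this example, with a `std::vector<int>` standing in for buffered rows and `clear()` standing in for writing them to disk.

```cpp
#include <cassert>
#include <mutex>
#include <vector>

// Hypothetical buffer of rows that registers a spill callback.
class SpillableBuffer
{
public:
    // Allocates while holding the lock, which is exactly the situation in
    // which the callback must not block on the same lock.
    void append(int row)
    {
        std::lock_guard<std::mutex> guard(lock);
        rows.push_back(row);
    }

    // Called when the memory manager needs memory back.  Uses a try-lock so
    // it can never deadlock against a thread that is allocating inside append().
    bool spill()
    {
        std::unique_lock<std::mutex> guard(lock, std::try_to_lock);
        if (!guard.owns_lock())
            return false;       // owner is busy: these rows cannot be spilled now
        if (rows.empty())
            return false;       // nothing to free
        rows.clear();           // stand-in for writing the rows to disk
        rows.shrink_to_fit();
        return true;
    }

    std::mutex lock;
    std::vector<int> rows;
};
```

The cost of this approach is the one noted in the rule: whenever the lock is held elsewhere, the callback reports failure and those rows stay in memory.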
			
 
				+
			
 
				+*************
			
 
				+Shared Memory
			
 
				+*************
			
 
				+
			
 
				+Much of the time Thor doesn't use the full memory available to it.  If you are running multiple Thor processes
			
 
				+on the same machine you may want to configure the system so that each Thor has a private block of memory,
			
 
				+but there is also a shared block of memory which can be used by whichever process needs it.
			
 
				+
			
 
				+The ILargeMemCallback provides a mechanism to dynamically allocate more memory to a process as it requires it.
			
 
				+This could potentially be done in stages rather than all or nothing.
			
 
				+
			
 
				+(Currently unused as far as I know...)
			
--- a/roxie/roxiemem/README.rst
+++ b/roxie/roxiemem/README.rst
@@ -0,0 +1,5 @@
 
				+This directory contains the memory manager which is used by all of the engines.
			
 
				+
			
 
				+More details of how to use it and its implementation are found in the `Memory Manager Documentation`_.
			
 
				+
			
 
				+.. _Memory Manager Documentation: DOCUMENTATION.rst
			
--- a/roxie/roxiemem/sourcedoc.xml
+++ b/roxie/roxiemem/sourcedoc.xml
@@ -1,77 +0,0 @@
 
				-<?xml version="1.0" encoding="utf-8"?>
			
 
				-<!--
			
 
				-################################################################################
			
 
				-#    HPCC SYSTEMS software Copyright (C) 2012 HPCC Systems.
			
 
				-#
			
 
				-#    Licensed under the Apache License, Version 2.0 (the "License");
			
 
				-#    you may not use this file except in compliance with the License.
			
 
				-#    You may obtain a copy of the License at
			
 
				-#
			
 
				-#       http://www.apache.org/licenses/LICENSE-2.0
			
 
				-#
			
 
				-#    Unless required by applicable law or agreed to in writing, software
			
 
				-#    distributed under the License is distributed on an "AS IS" BASIS,
			
 
				-#    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
			
 
				-#    See the License for the specific language governing permissions and
			
 
				-#    limitations under the License.
			
 
				-################################################################################
			
 
				--->
			
 
				-<!DOCTYPE section PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN" "http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd">
			
 
				-<section>
			
 
				-    <title>roxie/roxiemem</title>
			
 
				-
			
 
				-    <para>
			
 
				-        The roxie/roxiemem directory contains the sources for the roxie/roxiemem library. This is used by roxie
			
 
				-        and eclagent (and possibly in the future Thor) to allocate memory for ECL data rows. Note that it is NOT used
			
 
				-        for allocation of other objects used in queries.
			
 
				-    </para>
			
 
				-
			
 
				-    <para>
			
 
				-        The roxiemem memory manager's design goals were:
			
 
				-        <orderedlist>
			
 
				-            <listitem><para>
			
 
				-                Allow the memory used by a single query, or by all queries combined, to be limited, with graceful recovery.
			
 
				-            </para></listitem>
			
 
				-            <listitem><para>
			
 
				-                Allow all the memory used by a query to be guaranteed to get freed when the query finishes, thus reducing the
			
 
				-                possibility of memory leaks.
			
 
				-            </para></listitem>
			
 
				-            <listitem><para>
			
 
				-                Support link-counted rows without having to copy serialized data from slaves.
			
 
				-            </para></listitem>
			
 
				-            <listitem><para>
			
 
				-                Predictable behaviour with no pathogenic cases.
			
 
				-            </para></listitem>
			
 
				-            <listitem><para>
			
 
				-                Be as fast as possible on allocate and deallocate of small rows.
			
 
				-            </para></listitem>
			
 
				-        </orderedlist>
			
 
				-    </para>
			
 
				-    <para>
			
 
				-        Note that efficient usage of memory does not appear on that list - the expectation when the memory
			
 
				-        manager was first designed was that Roxie queries would use minimal amounts of memory and speed was
			
 
				-        more important.
			
 
				-    </para>
			
 
				-    <para>
			
 
				-        In order to extend roxiemem to support Thor usage too, some changes are needed to these design goals:
			
 
				-        <orderedlist>
			
 
				-            <listitem><para>
			
 
				-                As thor is not executing multiple queries in the same process space, the per-query limiting is
			
 
				-                not useful.
			
 
				-            </para></listitem>
			
 
				-            <listitem><para>
			
 
				-                Thor cares a lot more than Roxie about efficient use of memory as well as pure speed of allocation.
			
 
				-            </para></listitem>
			
 
				-            <listitem><para>
			
 
				-                Thor typically needs to spill and continue when row memory exceeds available, rather than failing the query.
			
 
				-                For this to work it needs to be able to tell when memory is running out.
			
 
				-            </para></listitem>
			
 
				-        </orderedlist>
			
 
				-    </para>
			
 
				-    <para>
			
 
				-       RoxieMem uses a chunking allocation paradigm to avoid any possibility of fragmentation (which helps keep the behaviour
			
 
				-       predictable). The patterns of allocation are reasonably well known (and we can use information from the ECL compiler
			
 
				-       to tell us more - for example - the sizes of record that are going to be allocated by a query). Where variable-size
			
 
				-       rows are in use this is less predictable though we still know the pattern of sizes used for expanding.
			
 
				-    </para>
			
 
				-</section>
			
--- a/roxie/sourcedoc.xml
+++ b/roxie/sourcedoc.xml
@@ -27,7 +27,6 @@
 
				     <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="roxie/sourcedoc.xml" />
			
 
				     <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="roxieclient/sourcedoc.xml" />
			
 
				     <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="roxieclient/sourcedoc.xml" />
			
 
				-    <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="roxiemem/sourcedoc.xml" />
			
 
				     <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="roxiepipe/sourcedoc.xml" />
			
 
				     <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="udplib/sourcedoc.xml" />
			
 
				 </section>