Add the docs about hibernation

Signed-off-by: Vadim Markovtsev <vadim@sourced.tech>
Vadim Markovtsev 6 years ago
parent
commit
4924d51f9b
5 changed files with 100 additions and 4 deletions
  1. 13 4
      README.md
  2. 32 0
      doc/HIBERNATION.md
  3. 18 0
      doc/PIPELINE_ITEMS.md
  4. 37 0
      doc/hibernateable_pipeline_item.dot
  5. BIN
      doc/hibernateable_pipeline_item.png

+ 13 - 4
README.md

@@ -388,14 +388,12 @@ contain `"type"` which reflects the plot kind.
 
 1. Processing all the commits may fail in some rare cases. If you get an error similar to https://github.com/src-d/hercules/issues/106
 please report there and specify `--first-parent` as a workaround.
-1. Currently, go-git's file system storage backend is considerably slower than the in-memory one,
-so you should clone repos instead of reading them from disk whenever possible. Please note that the
-in-memory storage may require much RAM, for example, the Linux kernel takes over 200GB in 2017.
+1. Burndown collection may fail with an Out-Of-Memory error. See the next section for the workarounds.
 1. Parsing YAML in Python is slow when the number of internal objects is big. `hercules`' output
 for the Linux kernel in "couples" mode is 1.5 GB and takes more than an hour / 180GB RAM to be
 parsed. However, most of the repositories are parsed within a minute. Try using Protocol Buffers
 instead (`hercules --pb` and `labours.py -f pb`).
-1. To speed-up yaml parsing
+1. To speed up yaml parsing
    ```
    # Debian, Ubuntu
    apt install libyaml-dev
@@ -406,3 +404,14 @@ instead (`hercules --pb` and `labours.py -f pb`).
    pip uninstall pyyaml
    pip --no-cache-dir install pyyaml
    ```
+   
+### Burndown Out-Of-Memory
+
+If the analyzed repository is big and extensively uses branching, the burndown stats collection may
+fail with an OOM. You should try the following:
+
+1. Read the repo from disk instead of cloning into memory.
+2. Use `--skip-blacklist` to avoid analyzing the unwanted files. It is also possible to constrain the `--language`.
+3. Use the [hibernation](doc/HIBERNATION.md) feature: `--hibernation-distance 10 --burndown-hibernation-threshold=1000`. Play with those two numbers to start hibernating right before the OOM.
+4. Hibernate on disk: `--burndown-hibernation-disk --burndown-hibernation-dir /path`.
+5. As a last resort, use `--first-parent`.

+ 32 - 0
doc/HIBERNATION.md

@@ -0,0 +1,32 @@
+# Hibernation
+
+Hercules supports signalling pipeline items when they are not going to be needed for some period
+in the future and when they are going to be used again after that period.
+Pipeline items which support hibernation are expected to compress and decompress their data
+in response to those signals.
+This mechanism is called *hibernation*. It is useful when there are many parallel
+branches and free memory runs low.
+Hibernation is a special analysis mode and is disabled by default. It can be enabled with
+
+```
+hercules --hibernation-distance N
+```
+
+where N is the minimum distance between two sequential usages of a branch to hibernate it.
+The distance is measured in the number of commits, forks, merges, etc. in the linear execution plan.
+Usually 10 is a good default; the bigger N, the fewer hibernation operations and
+the faster the analysis, but the higher the memory pressure.
+
+There is also `--hibernate-disk` flag which maintains 
+
+## Burndown
+
+The burndown analysis' hibernation compresses the blame information about files with the LZ4 algorithm.
+According to the tests, it works very effectively and performs better than zlib.
+There are some additional flags:
+
+`--burndown-hibernation-threshold N` is the minimum number of files registered in a branch to start hibernating.
+
+`--burndown-hibernation-disk` dumps the compressed blame info to disk instead of keeping it in memory.
+
+`--burndown-hibernation-dir` sets the directory for the dumps written by `--burndown-hibernation-disk`.
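
Note on the distance rule in doc/HIBERNATION.md: the following is a minimal sketch of how a minimum distance N could select which branch gaps get hibernated. It is not hercules' actual scheduler. It assumes the linear execution plan can be reduced to a slice of branch indices, one per step; `planHibernation`, `event` and that plan encoding are hypothetical.

```go
package main

import "fmt"

// event is a hypothetical record: hibernate a branch after a plan step,
// or boot it before a plan step.
type event struct {
	Step   int
	Branch int
	Kind   string // "hibernate" or "boot"
}

// planHibernation scans a linear plan (plan[i] is the branch used at step i)
// and emits a hibernate/boot pair for every gap between two sequential usages
// of the same branch that is at least `distance` steps long.
func planHibernation(plan []int, distance int) []event {
	lastUse := map[int]int{} // branch -> step of its most recent usage
	var events []event
	for step, branch := range plan {
		if prev, seen := lastUse[branch]; seen && step-prev >= distance {
			events = append(events, event{prev, branch, "hibernate"})
			events = append(events, event{step, branch, "boot"})
		}
		lastUse[branch] = step
	}
	return events
}

func main() {
	// Branch 0 is idle between steps 0 and 4; branch 1 is used densely.
	plan := []int{0, 1, 1, 1, 0, 1}
	for _, e := range planHibernation(plan, 3) {
		fmt.Printf("step %d: %s branch %d\n", e.Step, e.Kind, e.Branch)
	}
}
```

With a larger distance fewer gaps qualify, which matches the trade-off described in the doc: fewer hibernation operations, but more branches kept uncompressed in memory.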

+ 18 - 0
doc/PIPELINE_ITEMS.md

@@ -71,3 +71,21 @@ type ResultMergeablePipelineItem interface {
 ```
 
 ![ResultMergeablePipelineItem](result_mergeable_pipeline_item.png)
+
+### HibernateablePipelineItem (optional)
+
+See [what is hibernation](HIBERNATION.md).
+
+```go
+// HibernateablePipelineItem is the interface to allow pipeline items to be frozen (compacted, unloaded)
+// while they are not needed in the hosting branch.
+type HibernateablePipelineItem interface {
+	PipelineItem
+	// Hibernate signals that the item is temporarily not needed and its memory can be optimized.
+	Hibernate() error
+	// Boot signals that the item is needed again and must be de-hibernated.
+
+	Boot() error
+}
+```
+
+![HibernateablePipelineItem](hibernateable_pipeline_item.png)
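
To illustrate the `Hibernate()` / `Boot()` contract, here is a minimal sketch of a type that compresses its state when hibernated and restores it when booted. The `blameCache` type is hypothetical and implements only these two methods, not the rest of `PipelineItem`. To stay self-contained it uses DEFLATE from the Go standard library as a stand-in for LZ4.

```go
package main

import (
	"bytes"
	"compress/flate"
	"fmt"
	"io"
)

// blameCache is a hypothetical item state: a raw buffer that is
// DEFLATE-compressed on Hibernate() and restored on Boot().
type blameCache struct {
	data       []byte // live data, nil while hibernated
	compressed []byte // compressed data, nil while awake
}

// Hibernate compresses the live data and drops the uncompressed copy.
func (c *blameCache) Hibernate() error {
	if c.compressed != nil {
		return fmt.Errorf("already hibernated")
	}
	var buf bytes.Buffer
	w, err := flate.NewWriter(&buf, flate.BestSpeed)
	if err != nil {
		return err
	}
	if _, err := w.Write(c.data); err != nil {
		return err
	}
	if err := w.Close(); err != nil {
		return err
	}
	c.compressed, c.data = buf.Bytes(), nil
	return nil
}

// Boot decompresses the data and drops the compressed copy.
func (c *blameCache) Boot() error {
	if c.compressed == nil {
		return fmt.Errorf("not hibernated")
	}
	r := flate.NewReader(bytes.NewReader(c.compressed))
	defer r.Close()
	data, err := io.ReadAll(r)
	if err != nil {
		return err
	}
	c.data, c.compressed = data, nil
	return nil
}

func main() {
	c := &blameCache{data: bytes.Repeat([]byte("line 42 -> commit abc\n"), 1000)}
	before := len(c.data)
	if err := c.Hibernate(); err != nil {
		panic(err)
	}
	fmt.Printf("hibernated: %d -> %d bytes\n", before, len(c.compressed))
	if err := c.Boot(); err != nil {
		panic(err)
	}
	fmt.Println("restored:", len(c.data) == before)
}
```

Returning an error on a double `Hibernate()` or a `Boot()` without a prior `Hibernate()` is a design choice of this sketch; the interface itself only specifies that both methods return `error`.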

+ 37 - 0
doc/hibernateable_pipeline_item.dot

@@ -0,0 +1,37 @@
+digraph PipelineItem {
+  Name -> Registration
+  Provides -> Registration
+  Registration -> Resolution
+  Requires -> Resolution
+  Resolution -> Configure
+  Flag -> "Command Line"
+  ListConfigurationOptions -> "Command Line"
+  "Command Line" -> Configure
+  Configure -> Initialize
+  Repository -> Initialize
+  Initialize -> Consume
+  Commits -> Consume
+  Consume -> Consume
+  Consume -> Fork
+  Fork -> Consume
+  Consume -> Merge
+  Merge -> Consume
+  Merge -> "<disposal>"
+  Merge -> Finalize
+  Consume -> Hibernate
+  Hibernate -> Boot
+  Boot -> Consume
+  Consume -> Finalize
+  Finalize -> Result
+  Result -> Serialize
+  Serialize -> YAML
+  Serialize -> "Protocol Buffers"
+  Registration [style=filled, fillcolor=dimgray, fontcolor=white]
+  Resolution [style=filled, fillcolor=dimgray, fontcolor=white]
+  "Command Line" [style=filled, fillcolor=dimgray, fontcolor=white]
+  Repository [style=filled, fillcolor=gray]
+  Commits [style=filled, fillcolor=gray]
+  Result [style=filled, fillcolor=gray]
+  YAML [style=filled, fillcolor=gray]
+  "Protocol Buffers" [style=filled, fillcolor=gray]
+}
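
The runtime part of the graph above can be read as a transition table. The following sketch encodes the edges between `Initialize` and `Finalize` (omitting `"<disposal>"`) and checks whether a sequence of calls follows them. The `transitions` map and `validSequence` helper are hypothetical and only illustrate the diagram; they are not part of hercules.

```go
package main

import "fmt"

// transitions mirrors the runtime edges of the DOT graph:
// each key may be followed by any of the listed calls.
var transitions = map[string][]string{
	"Initialize": {"Consume"},
	"Consume":    {"Consume", "Fork", "Merge", "Hibernate", "Finalize"},
	"Fork":       {"Consume"},
	"Merge":      {"Consume", "Finalize"},
	"Hibernate":  {"Boot"},
	"Boot":       {"Consume"},
}

// validSequence reports whether every consecutive pair of calls is an edge.
func validSequence(calls []string) bool {
	for i := 1; i < len(calls); i++ {
		ok := false
		for _, next := range transitions[calls[i-1]] {
			if next == calls[i] {
				ok = true
				break
			}
		}
		if !ok {
			return false
		}
	}
	return true
}

func main() {
	good := []string{"Initialize", "Consume", "Hibernate", "Boot", "Consume", "Finalize"}
	bad := []string{"Consume", "Hibernate", "Consume"} // Boot is skipped
	fmt.Println(validSequence(good), validSequence(bad))
}
```

In this reading, `Hibernate` can only be followed by `Boot`, and `Boot` only by `Consume`, so a hibernated item is never consumed or finalized without being booted first.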

BIN
doc/hibernateable_pipeline_item.png