9 years ago · c03bf606eb
--- a/README.md
+++ b/README.md
@@ -1,16 +1,62 @@
 
				 Hercules
			
 
				 --------
			
 
				 
			
 
				-This tool calculates the weekly lines burnout in a Git repository.
			
 
				+This tool calculates the lines burnout stats in a Git repository.
			
 
				+It does exactly what [git-of-theseus](https://github.com/erikbern/git-of-theseus)
			
 
				+does, but uses [go-git](https://github.com/src-d/go-git).
			
 
				+Why? [source{d}](http://sourced.tech) builds its own data pipeline to
			
 
				+process every git repository in the world and the calculation of the
			
 
				+annual burnout ratio will be embedded into it. This project is the
			
 
				+open source implementation of the specific `git blame` flavour on top
			
 
				+of go-git. It is done incrementally using the custom RB tree tracking
			
 
				+algorithm; only the last modification date is recorded.
			
 
				 
			
 
				-###Usage
			
 
				+There are two tools: `hercules` and `labours.py`. The first is a program
			
 
				+written in Go which collects the burnout stats from a Git repository.
			
 
				+The second is a Python script which draws the stacked area plot. They
			
 
				+are normally used together through a pipe. `hercules` prints
			
 
				+text results. The first line contains three numbers: the UNIX timestamp which
			
 
				+corresponds to the time the repository was created, *granularity* and *sampling*.
			
 
				+Granularity is the number of days each band in the stack consists of. For example,
			
 
				+to get the annual burnout plot, set granularity to 365. Sampling is the
			
 
				+frequency with which the burnout is snapshotted. The smaller the value,
			
 
				+the smoother the plot, but the more work is done.
			
 
				+
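The header line described above can be read with a few lines of Python 3. This is a minimal sketch, not the code in `labours.py`: the function name `parse_header` and the sample values are made up for illustration, and it assumes the three numbers are whitespace-separated integers, which the text above does not spell out.

```python
from datetime import datetime, timezone


def parse_header(line):
    """Split the first output line into (start, granularity, sampling).

    Assumption: the three numbers are whitespace-separated integers.
    """
    start, granularity, sampling = (int(field) for field in line.split())
    # The first number is a UNIX timestamp; convert it to an aware UTC datetime.
    return datetime.fromtimestamp(start, tz=timezone.utc), granularity, sampling


# Made-up header: repository created at 1420070400, 365-day bands,
# one snapshot every 30 days.
start, granularity, sampling = parse_header("1420070400 365 30")
print(start.date(), granularity, sampling)
```

With these values each band in the plot would cover one year of code, and the burnout would be snapshotted every 30 days.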
			
 
				+###Installation
			
 
				+You are going to need Go and Python 2 or 3.
			
 
				+```
			
 
				+go get gopkg.in/src-d/hercules.v1/cmd/hercules
			
 
				+pip install pandas seaborn
			
 
				+wget https://github.com/src-d/hercules/raw/master/labours.py
			
 
				+```
			
 
				 
			
 
				+###Usage
			
 
				 ```
			
 
				+# Use "memory" go-git backend and display the plot. This is the fastest but the repository data must fit into RAM.
			
 
				 hercules https://github.com/src-d/go-git | python3 labours.py
			
 
				-hercules /path/to/cloned/go-git | python3 labours.py
			
 
				-hercules https://github.com/torvalds/linux /tmp/linux_cache | python3 labours.py
			
 
				-git rev-list HEAD | tac | hercules -commits -sampling 7 - https://github.com/src-d/go-git | python3 labours.py
			
 
				+# Use "file system" go-git backend and print the raw data.
			
 
				+hercules /path/to/cloned/go-git
			
 
				+# Use "file system" go-git backend, cache the cloned repository to /tmp/repo-cache and display the plot.
			
 
				+hercules https://github.com/git/git /tmp/repo-cache | python3 labours.py
			
 
				+
			
 
				+# Now something fun
			
 
				+# Get the linear history from git rev-list, reverse it
			
 
				+# Pipe to hercules, produce the snapshot every 30 days with 1 year grouping
			
 
				+# Save the raw data to cache.txt, so that later you can simply run: cat cache.txt | python3 labours.py
			
 
				+# Pipe the raw data to labours.py, set text font size to 16pt, use Agg matplotlib backend and save the plot to git.png
			
 
				+git rev-list HEAD | tac | hercules -commits - -sampling 30 -granularity 365 https://github.com/git/git | tee cache.txt | python3 labours.py --font-size 16 --backend Agg --output git.png
			
 
				 ```
			
 
				 
			
 
				+###Caveats
			
 
				+1. Currently, the complexity of go-git's "diff tree" algorithm is O(n log n), where
			
 
				+n is the number of files in the tree. Git's and libgit2's complexity
			
 
				+is sublinear, almost constant, because they compare the hashes of subtrees. go-git
			
 
				+will have the same complexity in the very near future.
			
 
				+
			
 
				+2. Currently, go-git's "file system" backend does not cache anything in memory.
			
 
				+Every object retrieval operation decompresses the packfiles, parses them, etc.
			
 
				+Effectively, the performance **slowdown** is **100x**. This will be fixed
			
 
				+in the near future too.
			
 
				+
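The subtree-hash comparison mentioned in the first caveat can be sketched in Python 3 as follows. This is an illustrative toy, not the code of Git, libgit2 or go-git: the `Node` class, the `diff_trees` function and the hashing scheme are all assumptions made for the example. The point is only that two directories with equal stored hashes are skipped without visiting their files.

```python
import hashlib


class Node:
    """A file (data is bytes) or a directory (children maps name -> Node)."""

    def __init__(self, data=None, children=None):
        self.is_dir = children is not None
        self.children = children or {}
        if self.is_dir:
            # A directory hash covers the names and hashes of its entries.
            payload = b"tree:" + "".join(
                "{}={};".format(name, child.hash)
                for name, child in sorted(self.children.items())
            ).encode()
        else:
            payload = b"blob:" + data
        # Computed once and stored, so comparing two nodes is a string compare.
        self.hash = hashlib.sha1(payload).hexdigest()


def diff_trees(old, new, prefix=""):
    """Yield the paths of entries that differ between two directory nodes."""
    for name in sorted(set(old.children) | set(new.children)):
        a, b = old.children.get(name), new.children.get(name)
        path = prefix + name
        if a is not None and b is not None and a.hash == b.hash:
            continue  # identical file or whole subtree: skip without descending
        if a is not None and b is not None and a.is_dir and b.is_dir:
            for changed_path in diff_trees(a, b, path + "/"):
                yield changed_path
        else:
            yield path  # added, removed or modified entry


old = Node(children={
    "README.md": Node(b"v1"),
    "src": Node(children={"main.go": Node(b"package main")}),
})
new = Node(children={
    "README.md": Node(b"v2"),
    "src": Node(children={"main.go": Node(b"package main")}),
})
changed = list(diff_trees(old, new))
print(changed)  # ['README.md']
```

Here the `src` directory is never descended into, because its hash is the same in both trees. The work therefore depends on how much changed rather than on the total number of files.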
			
 
				 ###License
			
 
				 MIT.
			
--- a/labours.py
+++ b/labours.py
@@ -5,6 +5,11 @@ import sys
 
				 import numpy
			
 
				 
			
 
				 
			
 
				+if sys.version_info[0] < 3:
			
 
				+    # OK, ancients, I will support Python 2, but you owe me a beer
			
 
				+    input = raw_input
			
 
				+
			
 
				+
			
 
				 def parse_args():
			
 
				     parser = argparse.ArgumentParser()
			
 
				     parser.add_argument("--output", default="",