Vadim Markovtsev, 7 years ago
parent
commit
b13d49411e
3 changed files with 77 additions and 37 deletions
  1. +50 -37 README.md
  2. +27 -0 doc/dag.dot
  3. BIN doc/dag.png

+ 50 - 37
README.md

@@ -1,29 +1,19 @@
 Hercules [![Build Status](https://travis-ci.org/src-d/hercules.svg?branch=master)](https://travis-ci.org/src-d/hercules) [![codecov](https://codecov.io/github/src-d/hercules/coverage.svg)](https://codecov.io/gh/src-d/hercules)
 --------
 
-This project calculates and plots the lines burndown and other fun stats in Git repositories.
-Exactly the same what [git-of-theseus](https://github.com/erikbern/git-of-theseus)
-does actually, but using [go-git](https://github.com/src-d/go-git).
-Why? [source{d}](http://sourced.tech) builds it's own data pipeline to
-process every git repository in the world and the calculation of the
-annual burnout ratio will be embedded into it. `hercules` contains an
-open source implementation of the specific `git blame` flavour on top
-of go-git. Blaming is performed incrementally using the custom RB tree tracking
-algorithm, only the last modification date is recorded.
+Amazingly fast and highly customizable Git repository analysis engine written in Go. Batteries included.
+Powered by [go-git](https://github.com/src-d/go-git) and [Babelfish](https://doc.bblf.sh).
 
 There are two tools: `hercules` and `labours.py`. The first is the program
-written in Go which collects the burndown and other stats from a Git repository.
-The second is the Python script which draws the stack area plots and optionally
-resamples the time series. These two tools are normally used together through
-the pipe. `hercules` prints results in plain text. The first line is four numbers:
-UNIX timestamp which corresponds to the time the repository was created,
-UNIX timestamp of the last commit, *granularity* and *sampling*.
-Granularity is the number of days each band in the stack consists of. Sampling
-is the frequency with which the burnout state is snapshotted. The smaller the
-value, the more smooth is the plot but the more work is done.
+written in Go which takes a Git repository and runs a Directed Acyclic Graph (DAG) of analysis tasks.
+The second is the Python script which draws some predefined plots. These two tools are normally used together through
+a pipe. It is possible to write custom analyses using the plugin system.
+
+![Hercules DAG](doc/dag.png)
+<p align="center">The DAG of burndown and couples analyses with UAST diff refining. Generated with <code>hercules -burndown -burndown-people -couples -feature=uast -dry-run -dump-dag doc/dag.dot https://github.com/src-d/hercules</code></p>
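If you want to regenerate the picture, the dumped `doc/dag.dot` can be rendered with Graphviz (a minimal sketch, assuming the `dot` tool is installed):

```
dot -Tpng doc/dag.dot -o doc/dag.png
```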
 
 ![git/git image](doc/linux.png)
-<p align="center">torvalds/linux burndown (granularity 30, sampling 30, resampled by year)</p>
+<p align="center">torvalds/linux line burndown (granularity 30, sampling 30, resampled by year)</p>
 
 There is an option to resample the bands inside `labours.py`, so that you can
 define a very precise distribution and visualize it different ways. Besides,
@@ -40,26 +30,28 @@ cd $GOPATH/src/gopkg.in/hercules.v3/cmd/hercules
 make
 ```
 
-The first command is going to fail - this is intended.
+The first command fails with `libuast.h` not found - this is expected. Pretend that nothing has
+happened and carry on.
 
 #### Windows
 Numpy and SciPy are requirements. Install the correct version by downloading the wheel from http://www.lfd.uci.edu/~gohlke/pythonlibs/#scipy.
+Couples analysis also needs Tensorflow.
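A minimal install sketch for the Python dependencies (the wheel file names below are placeholders — use the ones matching your Python version from the page above):

```
python -m pip install numpy-<version>-win_amd64.whl scipy-<version>-win_amd64.whl
python -m pip install tensorflow
```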
 
 ### Usage
 ```
-# Use "memory" go-git backend and display the plot. This is the fastest but the repository data must fit into RAM.
-hercules https://github.com/src-d/go-git | python3 labours.py --resample month
-# Use "file system" go-git backend and print the raw data.
+# Use "memory" go-git backend and display the burndown plot. "memory" is the fastest but the repository's git data must fit into RAM.
+hercules -burndown https://github.com/src-d/go-git | python3 labours.py -m project --resample month
+# Use "file system" go-git backend and print some basic information about the repository.
 hercules /path/to/cloned/go-git
-# Use "file system" go-git backend, cache the cloned repository to /tmp/repo-cache, use Protocol Buffers and display the unresampled plot.
-hercules -pb https://github.com/git/git /tmp/repo-cache | python3 labours.py -f pb --resample raw
+# Use "file system" go-git backend, cache the cloned repository to /tmp/repo-cache, use Protocol Buffers and display the burndown plot without resampling.
+hercules -burndown -pb https://github.com/git/git /tmp/repo-cache | python3 labours.py -m project -f pb --resample raw
 
 # Now something fun
 # Get the linear history from git rev-list, reverse it
-# Pipe to hercules, produce the snapshots for every 30 days grouped by 30 days
+# Pipe to hercules, produce burndown snapshots for every 30 days grouped by 30 days
 # Save the raw data to cache.yaml, so that later is possible to python3 labours.py -i cache.yaml
 # Pipe the raw data to labours.py, set text font size to 16pt, use Agg matplotlib backend and save the plot to output.png
-git rev-list HEAD | tac | hercules -commits - https://github.com/git/git | tee cache.yaml | python3 labours.py --font-size 16 --backend Agg --output git.png
+git rev-list HEAD | tac | hercules -commits - -burndown https://github.com/git/git | tee cache.yaml | python3 labours.py -m project --font-size 16 --backend Agg --output git.png
 ```
 
 `labours.py -i /path/to/yaml` allows to read the output from `hercules` which was saved on disk.
@@ -74,21 +66,38 @@ corresponding directory instead of cloning from scratch:
 hercules https://github.com/git/git /tmp/repo-cache
 
 # Second time - use the cache
-hercules /tmp/repo-cache
+hercules -some-analysis /tmp/repo-cache
 ```
 
 #### Docker image
 
 ```
-docker run --rm srcd/hercules hercules -pb https://github.com/git/git | docker run --rm -i -v $(pwd):/io srcd/hercules labours.py -f pb -o /io/git_git.png
+docker run --rm srcd/hercules hercules -burndown -pb https://github.com/git/git | docker run --rm -i -v $(pwd):/io srcd/hercules labours.py -f pb -m project -o /io/git_git.png
 ```
 
-### Extensions
+### Built-in analyses
+
+#### Project burndown
+
+```
+hercules -burndown
+python3 labours.py -m project
+```
+
+Line burndown statistics for the whole repository.
+Exactly the same as what [git-of-theseus](https://github.com/erikbern/git-of-theseus)
+does, but much faster. Blaming is performed efficiently and incrementally using a custom RB tree tracking
+algorithm, and only the last modification date is recorded while running the analysis.
+
+All burndown analyses depend on the values of *granularity* and *sampling*.
+Granularity is the number of days each band in the stack consists of. Sampling
+is the frequency with which the burnout state is snapshotted. The smaller the
+value, the smoother the plot, but the more work is done.
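For illustration, a monthly-band burndown with weekly snapshots might be requested like this (assuming the burndown item exposes `-granularity` and `-sampling` options, as in earlier hercules releases — run `hercules -help` to confirm the exact flag names):

```
hercules -burndown -granularity 30 -sampling 7 https://github.com/src-d/go-git | python3 labours.py -m project --resample month
```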
 
 #### Files
 
 ```
-hercules -files
+hercules -burndown -burndown-files
 python3 labours.py -m files
 ```
 
@@ -97,11 +106,11 @@ Burndown statistics for every file in the repository which is alive in the lates
 #### People
 
 ```
-hercules -people [-people-dict=/path/to/identities]
+hercules -burndown -burndown-people [-people-dict=/path/to/identities]
 python3 labours.py -m person
 ```
 
-Burndown statistics for developers. If `-people-dict` is not specified, the identities are
+Burndown statistics for the repository's contributors. If `-people-dict` is not specified, the identities are
 discovered by the following algorithm:
 
 0. We start from the root commit towards the HEAD. Emails and names are converted to lower case.
@@ -121,7 +130,7 @@ by `|`. The case is ignored.
 <p align="center">Wireshark top 20 devs - churn matrix</p>
 
 ```
-hercules -people [-people-dict=/path/to/identities]
+hercules -burndown -burndown-people [-people-dict=/path/to/identities]
 python3 labours.py -m churn_matrix
 ```
 
@@ -143,7 +152,7 @@ The sequence of developers is stored in `people_sequence` YAML node.
 <p align="center">Ember.js top 20 devs - code ownership</p>
 
 ```
-hercules -people [-people-dict=/path/to/identities]
+hercules -burndown -burndown-people [-people-dict=/path/to/identities]
 python3 labours.py -m ownership
 ```
 
@@ -176,10 +185,14 @@ can be visualized with t-SNE implemented in TF Projector.
 #### Everything in a single pass
 
 ```
-hercules -files -people -couples [-people-dict=/path/to/identities]
+hercules -burndown -burndown-files -burndown-people -couples [-people-dict=/path/to/identities]
 python3 labours.py -m all
 ```
 
+### Plugins
+
+Hercules has a plugin system and allows running custom analyses. See [PLUGINS.md](PLUGINS.md).
+
 ### Bad unicode errors
 
 YAML does not support the whole range of Unicode characters and the parser on `labours.py` side
@@ -187,7 +200,7 @@ may raise exceptions. Filter the output from `hercules` through `fix_yaml_unicod
 such offending characters.
 
 ```
-hercules -people https://github.com/... | python3 fix_yaml_unicode.py | python3 labours.py -m people
+hercules -burndown -burndown-people https://github.com/... | python3 fix_yaml_unicode.py | python3 labours.py -m people
 ```
 
 ### Plotting

+ 27 - 0
doc/dag.dot

@@ -0,0 +1,27 @@
+digraph Hercules {
+  "6 BlobCache" -> "7 [blob_cache]"
+  "0 DaysSinceStart" -> "3 [day]"
+  "10 FileDiff" -> "12 [file_diff]"
+  "16 FileDiffRefiner" -> "17 Burndown"
+  "1 IdentityDetector" -> "4 [author]"
+  "8 RenameAnalysis" -> "17 Burndown"
+  "8 RenameAnalysis" -> "9 Couples"
+  "8 RenameAnalysis" -> "10 FileDiff"
+  "8 RenameAnalysis" -> "11 UAST"
+  "8 RenameAnalysis" -> "14 UASTChanges"
+  "2 TreeDiff" -> "5 [changes]"
+  "11 UAST" -> "13 [uasts]"
+  "14 UASTChanges" -> "15 [changed_uasts]"
+  "4 [author]" -> "17 Burndown"
+  "4 [author]" -> "9 Couples"
+  "7 [blob_cache]" -> "17 Burndown"
+  "7 [blob_cache]" -> "10 FileDiff"
+  "7 [blob_cache]" -> "8 RenameAnalysis"
+  "7 [blob_cache]" -> "11 UAST"
+  "15 [changed_uasts]" -> "16 FileDiffRefiner"
+  "5 [changes]" -> "6 BlobCache"
+  "5 [changes]" -> "8 RenameAnalysis"
+  "3 [day]" -> "17 Burndown"
+  "12 [file_diff]" -> "16 FileDiffRefiner"
+  "13 [uasts]" -> "14 UASTChanges"
+}