Vadim Markovtsev, 7 years ago
parent
commit
b13d49411e
3 changed files with 77 additions and 37 deletions
  1. +50 -37 README.md
  2. +27 -0 doc/dag.dot
  3. BIN doc/dag.png

+ 50 - 37
README.md

@@ -1,29 +1,19 @@
 Hercules [![Build Status](https://travis-ci.org/src-d/hercules.svg?branch=master)](https://travis-ci.org/src-d/hercules) [![codecov](https://codecov.io/github/src-d/hercules/coverage.svg)](https://codecov.io/gh/src-d/hercules)
 --------
 
-This project calculates and plots the lines burndown and other fun stats in Git repositories.
-Exactly the same what [git-of-theseus](https://github.com/erikbern/git-of-theseus)
-does actually, but using [go-git](https://github.com/src-d/go-git).
-Why? [source{d}](http://sourced.tech) builds it's own data pipeline to
-process every git repository in the world and the calculation of the
-annual burnout ratio will be embedded into it. `hercules` contains an
-open source implementation of the specific `git blame` flavour on top
-of go-git. Blaming is performed incrementally using the custom RB tree tracking
-algorithm, only the last modification date is recorded.
+Amazingly fast and highly customizable Git repository analysis engine written in Go. Batteries included.
+Powered by [go-git](https://github.com/src-d/go-git) and [Babelfish](https://doc.bblf.sh).
 
 There are two tools: `hercules` and `labours.py`. The first is the program
-written in Go which collects the burndown and other stats from a Git repository.
-The second is the Python script which draws the stack area plots and optionally
-resamples the time series. These two tools are normally used together through
-the pipe. `hercules` prints results in plain text. The first line is four numbers:
-UNIX timestamp which corresponds to the time the repository was created,
-UNIX timestamp of the last commit, *granularity* and *sampling*.
-Granularity is the number of days each band in the stack consists of. Sampling
-is the frequency with which the burnout state is snapshotted. The smaller the
-value, the more smooth is the plot but the more work is done.
+written in Go which takes a Git repository and runs a Directed Acyclic Graph (DAG) of analysis tasks.
+The second is the Python script which draws some predefined plots. These two tools are normally used together through
+a pipe. It is possible to write custom analyses using the plugin system.
+
+![Hercules DAG](doc/dag.png)
+<p align="center">The DAG of burndown and couples analyses with UAST diff refining. Generated with <code>hercules -burndown -burndown-people -couples -feature=uast -dry-run -dump-dag doc/dag.dot https://github.com/src-d/hercules</code></p>
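If you want to regenerate the picture, the dumped `doc/dag.dot` can be rendered with Graphviz (a minimal sketch, assuming the `dot` tool is installed):

```
dot -Tpng doc/dag.dot -o doc/dag.png
```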
 
 ![git/git image](doc/linux.png)
-<p align="center">torvalds/linux burndown (granularity 30, sampling 30, resampled by year)</p>
+<p align="center">torvalds/linux line burndown (granularity 30, sampling 30, resampled by year)</p>
 
 There is an option to resample the bands inside `labours.py`, so that you can
 define a very precise distribution and visualize it different ways. Besides,
@@ -40,26 +30,28 @@ cd $GOPATH/src/gopkg.in/hercules.v3/cmd/hercules
 make
 ```
 
-The first command is going to fail - this is intended.
+The first command fails with `libuast.h` not found - this is expected. Pretend that nothing has
+happened and carry on.
 
 #### Windows
 Numpy and SciPy are requirements. Install the correct version by downloading the wheel from http://www.lfd.uci.edu/~gohlke/pythonlibs/#scipy.
+Couples analysis also needs Tensorflow.
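A minimal install sketch for the Python dependencies (the wheel file names below are placeholders — use the ones matching your Python version from the page above):

```
python -m pip install numpy-<version>-win_amd64.whl scipy-<version>-win_amd64.whl
python -m pip install tensorflow
```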
 
 ### Usage
 ```
-# Use "memory" go-git backend and display the plot. This is the fastest but the repository data must fit into RAM.
-hercules https://github.com/src-d/go-git | python3 labours.py --resample month
-# Use "file system" go-git backend and print the raw data.
+# Use "memory" go-git backend and display the burndown plot. "memory" is the fastest but the repository's git data must fit into RAM.
+hercules -burndown https://github.com/src-d/go-git | python3 labours.py -m project --resample month
+# Use "file system" go-git backend and print some basic information about the repository.
 hercules /path/to/cloned/go-git
-# Use "file system" go-git backend, cache the cloned repository to /tmp/repo-cache, use Protocol Buffers and display the unresampled plot.
-hercules -pb https://github.com/git/git /tmp/repo-cache | python3 labours.py -f pb --resample raw
+# Use "file system" go-git backend, cache the cloned repository to /tmp/repo-cache, use Protocol Buffers and display the burndown plot without resampling.
+hercules -burndown -pb https://github.com/git/git /tmp/repo-cache | python3 labours.py -m project -f pb --resample raw
 
 # Now something fun
 # Get the linear history from git rev-list, reverse it
-# Pipe to hercules, produce the snapshots for every 30 days grouped by 30 days
+# Pipe to hercules, produce burndown snapshots for every 30 days grouped by 30 days
 # Save the raw data to cache.yaml, so that later is possible to python3 labours.py -i cache.yaml
 # Pipe the raw data to labours.py, set text font size to 16pt, use Agg matplotlib backend and save the plot to output.png
-git rev-list HEAD | tac | hercules -commits - https://github.com/git/git | tee cache.yaml | python3 labours.py --font-size 16 --backend Agg --output git.png
+git rev-list HEAD | tac | hercules -commits - -burndown https://github.com/git/git | tee cache.yaml | python3 labours.py -m project --font-size 16 --backend Agg --output git.png
 ```
 
 `labours.py -i /path/to/yaml` allows to read the output from `hercules` which was saved on disk.
@@ -74,21 +66,38 @@ corresponding directory instead of cloning from scratch:
 hercules https://github.com/git/git /tmp/repo-cache
 
 # Second time - use the cache
-hercules /tmp/repo-cache
+hercules -some-analysis /tmp/repo-cache
 ```
 
 #### Docker image
 
 ```
-docker run --rm srcd/hercules hercules -pb https://github.com/git/git | docker run --rm -i -v $(pwd):/io srcd/hercules labours.py -f pb -o /io/git_git.png
+docker run --rm srcd/hercules hercules -burndown -pb https://github.com/git/git | docker run --rm -i -v $(pwd):/io srcd/hercules labours.py -f pb -m project -o /io/git_git.png
 ```
 
-### Extensions
+### Built-in analyses
+
+#### Project burndown
+
+```
+hercules -burndown
+python3 labours.py -m project
+```
+
+Line burndown statistics for the whole repository.
+Exactly the same as what [git-of-theseus](https://github.com/erikbern/git-of-theseus)
+does, but much faster. Blaming is performed efficiently and incrementally using a custom RB tree tracking
+algorithm, and only the last modification date is recorded while running the analysis.
+
+All burndown analyses depend on the values of *granularity* and *sampling*.
+Granularity is the number of days each band in the stack consists of. Sampling
+is the frequency with which the burnout state is snapshotted. The smaller the
+value, the smoother the plot, but the more work is done.
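For illustration, a monthly-band burndown with weekly snapshots might be requested like this (assuming the burndown item exposes `-granularity` and `-sampling` options, as in earlier hercules releases — run `hercules -help` to confirm the exact flag names):

```
hercules -burndown -granularity 30 -sampling 7 https://github.com/src-d/go-git | python3 labours.py -m project --resample month
```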
 
 #### Files
 
 ```
-hercules -files
+hercules -burndown -burndown-files
 python3 labours.py -m files
 ```
 
@@ -97,11 +106,11 @@ Burndown statistics for every file in the repository which is alive in the lates
 #### People
 
 ```
-hercules -people [-people-dict=/path/to/identities]
+hercules -burndown -burndown-people [-people-dict=/path/to/identities]
 python3 labours.py -m person
 ```
 
-Burndown statistics for developers. If `-people-dict` is not specified, the identities are
+Burndown statistics for the repository's contributors. If `-people-dict` is not specified, the identities are
 discovered by the following algorithm:
 
 0. We start from the root commit towards the HEAD. Emails and names are converted to lower case.
@@ -121,7 +130,7 @@ by `|`. The case is ignored.
 <p align="center">Wireshark top 20 devs - churn matrix</p>
 
 ```
-hercules -people [-people-dict=/path/to/identities]
+hercules -burndown -burndown-people [-people-dict=/path/to/identities]
 python3 labours.py -m churn_matrix
 ```
 
@@ -143,7 +152,7 @@ The sequence of developers is stored in `people_sequence` YAML node.
 <p align="center">Ember.js top 20 devs - code ownership</p>
 
 ```
-hercules -people [-people-dict=/path/to/identities]
+hercules -burndown -burndown-people [-people-dict=/path/to/identities]
 python3 labours.py -m ownership
 ```
 
@@ -176,10 +185,14 @@ can be visualized with t-SNE implemented in TF Projector.
 #### Everything in a single pass
 
 ```
-hercules -files -people -couples [-people-dict=/path/to/identities]
+hercules -burndown -burndown-files -burndown-people -couples [-people-dict=/path/to/identities]
 python3 labours.py -m all
 ```
 
+### Plugins
+
+Hercules has a plugin system and allows running custom analyses. See [PLUGINS.md](PLUGINS.md).
+
 ### Bad unicode errors
 
 YAML does not support the whole range of Unicode characters and the parser on `labours.py` side
@@ -187,7 +200,7 @@ may raise exceptions. Filter the output from `hercules` through `fix_yaml_unicod
 such offending characters.
 
 ```
-hercules -people https://github.com/... | python3 fix_yaml_unicode.py | python3 labours.py -m people
+hercules -burndown -burndown-people https://github.com/... | python3 fix_yaml_unicode.py | python3 labours.py -m people
 ```
 
 ### Plotting

+ 27 - 0
doc/dag.dot

@@ -0,0 +1,27 @@
+digraph Hercules {
+  "6 BlobCache" -> "7 [blob_cache]"
+  "0 DaysSinceStart" -> "3 [day]"
+  "10 FileDiff" -> "12 [file_diff]"
+  "16 FileDiffRefiner" -> "17 Burndown"
+  "1 IdentityDetector" -> "4 [author]"
+  "8 RenameAnalysis" -> "17 Burndown"
+  "8 RenameAnalysis" -> "9 Couples"
+  "8 RenameAnalysis" -> "10 FileDiff"
+  "8 RenameAnalysis" -> "11 UAST"
+  "8 RenameAnalysis" -> "14 UASTChanges"
+  "2 TreeDiff" -> "5 [changes]"
+  "11 UAST" -> "13 [uasts]"
+  "14 UASTChanges" -> "15 [changed_uasts]"
+  "4 [author]" -> "17 Burndown"
+  "4 [author]" -> "9 Couples"
+  "7 [blob_cache]" -> "17 Burndown"
+  "7 [blob_cache]" -> "10 FileDiff"
+  "7 [blob_cache]" -> "8 RenameAnalysis"
+  "7 [blob_cache]" -> "11 UAST"
+  "15 [changed_uasts]" -> "16 FileDiffRefiner"
+  "5 [changes]" -> "6 BlobCache"
+  "5 [changes]" -> "8 RenameAnalysis"
+  "3 [day]" -> "17 Burndown"
+  "12 [file_diff]" -> "16 FileDiffRefiner"
+  "13 [uasts]" -> "14 UASTChanges"
+}