radu/hercules: Fast, insightful and highly customizable Git history analysis. @ 6772bb8b4b2a31f6ff8df3d905dcd449c967d38d


Vadim Markovtsev 6772bb8b4b Implement fused insert+delete		9 years ago
cmd	4d3f9500c2 Add TravisCI config	9 years ago
.gitignore	cce947b98a Initial commit	9 years ago
.travis.yml	4d3f9500c2 Add TravisCI config	9 years ago
LICENSE	cce947b98a Initial commit	9 years ago
README.md	828ccc45d4 Add Travis link	9 years ago
analyser.go	6772bb8b4b Implement fused insert+delete	9 years ago
file.go	6772bb8b4b Implement fused insert+delete	9 years ago
file_test.go	6772bb8b4b Implement fused insert+delete	9 years ago
git-git.png	a8b665a65d Add readme swag	9 years ago
labours.py	c03bf606eb Update readme	9 years ago
rbtree.go	a3ee37f91f Initial files	9 years ago

Hercules

This tool calculates line burnout stats in a Git repository. It does exactly what git-of-theseus does, but using go-git. Why? source{d} builds its own data pipeline to process every Git repository in the world, and the calculation of the annual burnout ratio will be embedded into it. This project is an open source implementation of that specific git blame flavour on top of go-git. The blame is computed incrementally using a custom RB tree tracking algorithm, and only the last modification date is recorded.
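To make the incremental idea concrete, here is a minimal Python sketch of per-line bookkeeping where only the last modification day survives. It is not the RB tree algorithm from `rbtree.go`: it swaps in a plain list with one entry per line, and the class and method names (`LineAges`, `insert`, `delete`, `histogram`) are hypothetical.

```python
from collections import Counter


class LineAges:
    """Tracks, for every line of one file, the day it was last modified.

    Simplification: a plain list with one entry per line. The tool itself
    is described as using a red-black tree, which this sketch does not
    attempt to reproduce.
    """

    def __init__(self):
        self.days = []  # days[i] = day on which line i was last written

    def insert(self, pos, count, day):
        # New lines carry the day of the commit that introduced them.
        self.days[pos:pos] = [day] * count

    def delete(self, pos, count):
        # Removed lines simply disappear from the bookkeeping.
        del self.days[pos:pos + count]

    def histogram(self):
        # Number of surviving lines per last-modification day.
        return Counter(self.days)


ages = LineAges()
ages.insert(0, 5, day=0)    # day 0: file created with 5 lines
ages.delete(1, 2)           # day 10: two lines are rewritten ...
ages.insert(1, 3, day=10)   # ... as three new lines
print(dict(ages.histogram()))  # {0: 3, 10: 3}
```

Each commit is applied on top of the previous state, so nothing is recomputed from scratch. A list makes every edit linear in the file length, which is presumably the cost a balanced tree is meant to avoid.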

There are two tools: hercules and labours.py. The first is a program written in Go which collects the burnout stats from a Git repository. The second is a Python script which draws the stacked area plot. They are normally used together through a pipe.

hercules prints its results as text. The first line contains three numbers: the UNIX timestamp of the time the repository was created, the granularity and the sampling. Granularity is the number of days each band in the stack covers. For example, to get the annual burnout plot, set granularity to 365. Sampling is the interval, in days, between burnout snapshots. The smaller the value, the smoother the plot but the more work is done.
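As an illustration of how the three header numbers relate, here is a small Python sketch. It assumes the numbers are whitespace-separated and that snapshots start at day 0 of the repository; neither detail is confirmed here, the header value is made up, and the helper names are hypothetical. The format of the remaining output lines is not covered.

```python
def parse_header(line):
    # Assumption: the three numbers are separated by whitespace.
    start, granularity, sampling = (int(x) for x in line.split())
    return start, granularity, sampling


def band_index(day, granularity):
    # Lines last modified on `day` (days since the repository start)
    # fall into this band of the stack.
    return day // granularity


def snapshot_days(total_days, sampling):
    # Assumption: the first snapshot is taken on day 0, then every
    # `sampling` days after that.
    return list(range(0, total_days + 1, sampling))


# Made-up header line: start timestamp, granularity 365, sampling 30.
start, granularity, sampling = parse_header("1112911993 365 30")
print(band_index(400, granularity))       # 1: second yearly band
print(len(snapshot_days(365, sampling)))  # 13 snapshots in the first year
```

With granularity 365 and sampling 30, each band covers one year of code and the plot gets roughly twelve points per year. Halving the sampling doubles the number of snapshots and the work.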

git/git burndown (granularity 365, sampling 30)

###Installation

You are going to need Go and Python 2 or 3.

go get gopkg.in/src-d/hercules.v1/cmd/hercules
pip install pandas seaborn
wget https://github.com/src-d/hercules/raw/master/labours.py

###Usage

# Use "memory" go-git backend and display the plot. This is the fastest but the repository data must fit into RAM.
hercules https://github.com/src-d/go-git | python3 labours.py
# Use "file system" go-git backend and print the raw data.
hercules /path/to/cloned/go-git
# Use "file system" go-git backend, cache the cloned repository to /tmp/repo-cache and display the plot.
hercules https://github.com/git/git /tmp/repo-cache | python3 labours.py

# Now something fun
# Get the linear history from git rev-list, reverse it
# Pipe to hercules, produce the snapshot every 30 days with 1 year grouping
# Save the raw data to cache.txt, so that later you can simply run: cat cache.txt | python3 labours.py
# Pipe the raw data to labours.py, set text font size to 16pt, use Agg matplotlib backend and save the plot to git.png
git rev-list HEAD | tac | hercules -commits - -sampling 30 -granularity 365 https://github.com/git/git | tee cache.txt | python3 labours.py --font-size 16 --backend Agg --output git.png

###Caveats

Currently, the complexity of go-git's "diff tree" algorithm is O(n log n), where n is the number of files in the tree. Git's and libgit2's complexity is sublinear, almost constant, because they compare the hashes of subtrees. go-git will have the same complexity in the very near future.
Currently, go-git's "file system" backend does not cache anything in memory. Every object retrieval operation decompresses the packfiles, parses them, etc. Effectively, the performance slowdown is 100x. This will be fixed in the near future too.
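
The first caveat rests on comparing subtree hashes. The Python sketch below shows why that avoids visiting most files: when two subtrees have the same hash, the diff never descends into them. It is an illustration only, not the code of Git, libgit2 or go-git. The `Blob` and `Tree` classes and the hashing scheme are hypothetical and only loosely modelled on Git objects.

```python
import hashlib


class Blob:
    def __init__(self, data):
        self.hash = hashlib.sha1(data).hexdigest()


class Tree:
    def __init__(self, entries):
        self.entries = entries  # name -> Blob or Tree
        # The tree hash depends on the names and hashes of its children,
        # so any change below bubbles up to the root.
        h = hashlib.sha1()
        for name in sorted(entries):
            h.update(f"{name}:{entries[name].hash};".encode())
        self.hash = h.hexdigest()


def diff(old, new, path=""):
    """Returns the paths of files that differ between two trees."""
    if old is not None and new is not None and old.hash == new.hash:
        return []  # identical subtree or file: no need to descend
    changed = []
    if isinstance(old, Blob) or isinstance(new, Blob):
        changed.append(path)  # file added, removed, modified or replaced
    old_entries = old.entries if isinstance(old, Tree) else {}
    new_entries = new.entries if isinstance(new, Tree) else {}
    for name in sorted(old_entries.keys() | new_entries.keys()):
        child = f"{path}/{name}" if path else name
        changed += diff(old_entries.get(name), new_entries.get(name), child)
    return changed


v1 = Tree({"src": Tree({"a.go": Blob(b"1"), "b.go": Blob(b"2")}),
           "docs": Tree({"readme": Blob(b"hi")})})
v2 = Tree({"src": Tree({"a.go": Blob(b"1"), "b.go": Blob(b"3")}),
           "docs": Tree({"readme": Blob(b"hi")})})
print(diff(v1, v2))  # ['src/b.go']
```

Here `docs` is skipped after a single hash comparison. In a large repository where a commit touches a handful of files, almost every directory is skipped the same way.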

###License

MIT.
