Fast, insightful and highly customizable Git history analysis.

Vadim Markovtsev f52cd6e01a Fix the sharding and the convergence 8 years ago


Hercules

This tool calculates the line burndown stats in a Git repository. It does exactly what git-of-theseus does, but using go-git. Why? source{d} builds its own data pipeline to process every Git repository in the world, and the calculation of the annual burndown ratio will be embedded into it. This project is an open source implementation of the specific git blame flavour on top of go-git. Blaming is done incrementally using a custom RB tree tracking algorithm, and only the last modification date is recorded.
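
To make the incremental idea concrete, here is a minimal sketch. It is not the project's algorithm: it swaps the RB tree for a plain Python list with one entry per line, and the class and method names are hypothetical. It only illustrates that each surviving line carries the day it was last modified, and that a burndown snapshot is a count of lines per age band.

```python
from collections import Counter


class LineAges:
    """Naive stand-in for the tree: one entry per line of a file."""

    def __init__(self):
        self.days = []  # days[i] is the day line i was last modified

    def insert(self, pos, count, day):
        # New lines are stamped with the day of the commit that added them.
        self.days[pos:pos] = [day] * count

    def delete(self, pos, count):
        # Removed lines disappear; the surviving lines keep their stamps.
        del self.days[pos:pos + count]

    def bands(self, granularity):
        # Group the surviving lines into bands of `granularity` days.
        return Counter(day // granularity for day in self.days)


f = LineAges()
f.insert(0, 10, day=0)    # initial commit: 10 lines
f.delete(2, 3)            # a commit on day 45 removes 3 old lines...
f.insert(2, 5, day=45)    # ...and adds 5 new ones in their place
print(dict(f.bands(30)))  # {0: 7, 1: 5}
```

A list makes every insertion and deletion cost time proportional to the file length, which is presumably why a tree of line intervals is preferable for long histories.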

There are two tools: hercules and labours.py. The first is a program written in Go which collects the burndown stats from a Git repository. The second is a Python script which draws the stacked area plot and optionally resamples the time series. The two tools are normally used together through a pipe. hercules prints its results in plain text. The first line contains four numbers: the UNIX timestamp of the time the repository was created, the UNIX timestamp of the last commit, the granularity and the sampling. Granularity is the number of days each band in the stack consists of. Sampling is how often the burndown state is snapshotted, also in days. The smaller the value, the smoother the plot, but the more work is done.
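
As an illustration of the header line described above, the following sketch reads the four numbers. It assumes they are whitespace-separated integers in the stated order; the `Header` tuple, the `parse_header` helper and the sample values are hypothetical, and the rest of the output is not parsed here.

```python
from collections import namedtuple
from datetime import datetime, timezone

Header = namedtuple("Header", "start last granularity sampling")


def parse_header(line):
    # Assumption: four whitespace-separated integers, in the documented order.
    start, last, granularity, sampling = (int(x) for x in line.split())
    return Header(start, last, granularity, sampling)


header = parse_header("1112911993 1490000000 30 30")  # made-up sample values
span_days = (header.last - header.start) // 86400     # whole days of history

# Human-readable creation date and the number of full sampling periods covered.
print(datetime.fromtimestamp(header.start, tz=timezone.utc).date())
print(span_days, span_days // header.sampling)
```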

git/git image

torvalds/linux burndown (granularity 30, sampling 30, resampled by year)

There is an option to resample the bands inside labours.py, so that you can define a very precise distribution and visualize it in different ways. Besides, resampling aligns the bands across periodic boundaries, e.g. months or years. Unresampled bands are not aligned and start from the project's birth date.
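
The difference in alignment can be sketched as follows. This is only an illustration of where band boundaries fall, not the resampling code in labours.py; the function names and dates are made up, and treating the first resampled band as starting on the birth date is an assumption.

```python
from datetime import date, timedelta


def raw_band_starts(birth, last, granularity):
    # Unresampled bands: fixed-width steps counted from the project's first day.
    starts, current = [], birth
    while current <= last:
        starts.append(current)
        current += timedelta(days=granularity)
    return starts


def yearly_band_starts(birth, last):
    # Resampled by year: every band after the first begins on 1 January.
    return [birth] + [date(y, 1, 1) for y in range(birth.year + 1, last.year + 1)]


birth, last = date(2015, 3, 20), date(2017, 6, 1)  # made-up dates
raw = raw_band_starts(birth, last, 365)
yearly = yearly_band_starts(birth, last)
print([d.isoformat() for d in raw])     # ['2015-03-20', '2016-03-19', '2017-03-19']
print([d.isoformat() for d in yearly])  # ['2015-03-20', '2016-01-01', '2017-01-01']
```

The raw boundaries drift relative to the calendar (the leap year shifts them by a day), while the yearly ones land on calendar boundaries regardless of when the project started.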

There is a presentation available.

Installation

You are going to need Go and Python 2 or 3.

go get gopkg.in/src-d/hercules.v1/cmd/hercules
pip install pandas seaborn
wget https://github.com/src-d/hercules/raw/master/labours.py

Usage

# Use "memory" go-git backend and display the plot. This is the fastest but the repository data must fit into RAM.
hercules https://github.com/src-d/go-git | python3 labours.py --resample month
# Use "file system" go-git backend and print the raw data.
hercules /path/to/cloned/go-git
# Use "file system" go-git backend, cache the cloned repository to /tmp/repo-cache and display the unresampled plot.
hercules https://github.com/git/git /tmp/repo-cache | python3 labours.py --resample raw

# Now something fun
# Get the linear history from git rev-list, reverse it
# Pipe to hercules, produce the snapshots for every 30 days grouped by 30 days
# Save the raw data to cache.txt, so that later you can simply run cat cache.txt | python3 labours.py
# Pipe the raw data to labours.py, set text font size to 16pt, use Agg matplotlib backend and save the plot to git.png
git rev-list HEAD | tac | hercules -commits - https://github.com/git/git | tee cache.txt | python3 labours.py --font-size 16 --backend Agg --output git.png

Extensions

The -files option additionally prints the corresponding burndown table for every file in the repository. -people does the same for the developers; -people-dict allows specifying a custom identity matching.

Correspondingly, labours.py has --mode, which allows plotting the burndowns for files and people as well as the overwrite matrix. The latter shows how much code written by a developer is removed by other developers; the rows are normalized to the number of individual insertions.
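
The row normalization can be sketched like this. The matrix layout (rows are authors, columns are the developers who removed their lines), the function name and the counts are assumptions for illustration, not the actual data format produced by hercules.

```python
def normalize_rows(removed, insertions):
    # removed[i][j]: lines written by developer i that developer j later removed.
    # insertions[i]: total number of lines developer i ever inserted.
    return [
        [cell / total if total else 0.0 for cell in row]
        for row, total in zip(removed, insertions)
    ]


removed = [[0, 30], [10, 0]]  # made-up counts for two developers
insertions = [100, 40]
print(normalize_rows(removed, insertions))  # [[0.0, 0.3], [0.25, 0.0]]
```

After normalization each cell reads as a fraction of the author's own insertions, so developers with very different output volumes become comparable.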

Caveats

  1. Currently, go-git's "file system" backend is considerably slower than the in-memory one, so you should clone repos instead of reading them from disk whenever possible.

License

MIT.