radu/hercules: Fast, insightful and highly customizable Git history analysis. @ d7be9b0e6e490926b45d519914b8e26ea8e9f81c



Vadim Markovtsev d7be9b0e6e Print the timestamp of the last commit		8 years ago
cmd	d7be9b0e6e Print the timestamp of the last commit	8 years ago
.gitignore	cce947b98a Initial commit	9 years ago
.travis.yml	23865aad23 Fix Travis build	8 years ago
LICENSE	cce947b98a Initial commit	9 years ago
README.md	c769fdc9be Update README.md	8 years ago
analyser.go	37a9da0c54 Handle negative time	8 years ago
file.go	0bc941f975 Fix the bug with deleting at 0 time	8 years ago
file_test.go	d7be9b0e6e Print the timestamp of the last commit	8 years ago
git-git.png	a8b665a65d Add readme swag	9 years ago
labours.py	f14a6bb68b Fix some corner cases in the plotting	8 years ago
rbtree.go	a3ee37f91f Initial files	9 years ago

Hercules

This tool calculates line burnout stats in a Git repository. It does exactly what git-of-theseus does, but uses go-git. Why? source{d} is building its own data pipeline to process every Git repository in the world, and the calculation of the annual burnout ratio will be embedded into it. This project is an open source implementation of that specific git blame flavour on top of go-git. Blaming is done incrementally using a custom RB tree tracking algorithm; only the last modification date is recorded.
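To make the idea of incremental blaming concrete, here is a minimal sketch in Python. It is not the Hercules implementation: it swaps the custom RB tree for a plain list, which is much slower on large files but easier to read. The class name `FileAges` and its methods are hypothetical and exist only for this illustration.

```python
from typing import List


class FileAges:
    """Tracks, for every line of one file, the time it was last modified.

    Illustrative only: a plain list costs O(n) per edit, whereas Hercules
    uses a custom RB tree. Names here are hypothetical.
    """

    def __init__(self) -> None:
        self.ages: List[int] = []  # ages[i] = time of the last change of line i

    def insert(self, pos: int, length: int, time: int) -> None:
        """Insert `length` new lines at index `pos`, stamped with `time`."""
        self.ages[pos:pos] = [time] * length

    def delete(self, pos: int, length: int) -> None:
        """Remove `length` lines starting at index `pos`."""
        del self.ages[pos:pos + length]

    def surviving(self, time: int) -> int:
        """Count the lines whose last modification happened at `time`."""
        return self.ages.count(time)


f = FileAges()
f.insert(0, 100, time=0)   # a commit at day 0 creates a 100-line file
f.delete(10, 20)           # a commit at day 30 removes lines 10..29
f.insert(10, 25, time=30)  # and puts 25 new lines in their place
print(f.surviving(0), f.surviving(30))  # prints "80 25"
```

Applying each commit's insertions and deletions in order keeps the structure up to date without re-blaming the whole file, and only one timestamp per line is ever stored.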

There are two tools: hercules and labours.py. The first is a program written in Go which collects the burnout stats from a Git repository. The second is a Python script which draws the stacked area plot and optionally resamples the time series. The two tools are normally used together through a pipe.

hercules prints its results in plain text. The first line contains three numbers: the UNIX timestamp of the time the repository was created, the granularity, and the sampling. Granularity is the number of days each band in the stack spans. For example, to generate an annual burnout plot, set granularity to 365. Sampling is how often, in days, the burnout is snapshotted. The smaller the value, the smoother the plot, but the more work is done.
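The header line can be read with a few lines of Python. This is only a sketch: it assumes the three numbers are separated by whitespace, the helper names `parse_header` and `expected_counts` are hypothetical, and the sample header values are illustrative rather than real hercules output. The counts it derives are rough estimates that follow from the definitions of granularity and sampling.

```python
import math
from datetime import datetime, timezone
from typing import NamedTuple, Tuple


class Header(NamedTuple):
    start: datetime   # time the repository was created
    granularity: int  # days per band
    sampling: int     # days between snapshots


def parse_header(line: str) -> Header:
    """Parse the first output line; whitespace separation is an assumption."""
    timestamp, granularity, sampling = (int(x) for x in line.split())
    start = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return Header(start, granularity, sampling)


def expected_counts(header: Header, history_days: int) -> Tuple[int, int]:
    """Estimate the number of bands and snapshots for a history of this length."""
    bands = math.ceil(history_days / header.granularity)
    snapshots = math.ceil(history_days / header.sampling)
    return bands, snapshots


header = parse_header("1112911993 365 30")  # illustrative values only
print(header.start.date(), expected_counts(header, history_days=3650))
```

With granularity 365 and sampling 30, a ten-year history gives about 10 bands and about 122 snapshots, which shows why a smaller sampling value means more work.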

git/git burndown (granularity 365, sampling 30, no resampling)

labours.py can optionally resample the bands, so you can compute a very fine-grained distribution once and visualize it in different ways. Resampling also aligns the bands to year (month, week) boundaries.
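The alignment effect can be illustrated with plain date arithmetic. The sketch below does not reproduce what labours.py does internally. It only compares band boundaries that are a fixed number of days apart with boundaries placed on calendar years, assuming the first aligned band starts on January 1 of the creation year. The function names and the start date are hypothetical.

```python
from datetime import date, timedelta
from typing import List


def fixed_band_starts(start: date, granularity: int, count: int) -> List[date]:
    """Band boundaries when each band is exactly `granularity` days long."""
    return [start + timedelta(days=i * granularity) for i in range(count)]


def yearly_band_starts(start: date, count: int) -> List[date]:
    """Band boundaries aligned to calendar years (assumed alignment rule)."""
    return [date(start.year + i, 1, 1) for i in range(count)]


start = date(2005, 4, 7)  # hypothetical repository creation date
for fixed, aligned in zip(fixed_band_starts(start, 365, 4),
                          yearly_band_starts(start, 4)):
    print(fixed.isoformat(), aligned.isoformat())
```

Bands of exactly 365 days start in the middle of a year and drift by one day after each leap year, while the aligned boundaries always fall on January 1.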

Installation

You are going to need Go and Python 2 or 3.

go get gopkg.in/src-d/hercules.v1/cmd/hercules
pip install pandas seaborn
wget https://github.com/src-d/hercules/raw/master/labours.py

Usage

# Use "memory" go-git backend and display the plot. This is the fastest option, but the repository data must fit into RAM.
hercules https://github.com/src-d/go-git | python3 labours.py --resample month
# Use "file system" go-git backend and print the raw data.
hercules /path/to/cloned/go-git
# Use "file system" go-git backend, cache the cloned repository to /tmp/repo-cache and display the plot.
hercules https://github.com/git/git /tmp/repo-cache | python3 labours.py --resample month

# Now something fun
# Get the linear history from git rev-list, reverse it
# Pipe to hercules, produce the snapshots for every 30 days grouped by 30 days
# Save the raw data to cache.txt, so that later you can simply run cat cache.txt | python3 labours.py
# Pipe the raw data to labours.py, set text font size to 16pt, use Agg matplotlib backend and save the plot to git.png
git rev-list HEAD | tac | hercules -commits - https://github.com/git/git | tee cache.txt | python3 labours.py --font-size 16 --backend Agg --output git.png

Caveats

Currently, go-git's "file system" backend is much slower than the in-memory one, so you should clone repos instead of reading them from disk whenever possible.

License

MIT.
