Automate the Annoying: Issue numbers in git commits

Johannes Pieger

Posted on June 2, 2025

Git

Photo by Johannes Pieger

You can find the tool developed in this post here.

In one project we use BDD-style tests with Cucumber to test our software. For regulatory reasons, we track all tests in Azure DevOps.

(Ab)using the example from https://cucumber.io, our setup looks a bit like this:

Feature: Withdrawing cash

  @tc:1001
  Scenario: Successful withdrawal within balance
    Given Alice has 234.56 in their account
    When Alice tries to withdraw 200.00
    Then the withdrawal is successful

  @tc:1002
  Scenario: Declined withdrawal in excess of balance
    Given Hamza has 198.76 in their account
    When Hamza tries to withdraw 200.00
    Then the withdrawal is declined

Every test has a tag that references the work item on Azure DevOps. To enable easier tracing, we mention all test scenarios we touch in the commit message:

test: update cash money tests

...

Tests: #1001, #1002

That’s fine and dandy, until you get to that PR that adds a new step to two dozen different tests…

We can automate this!

Look at your diff,

now back to the test,

now back to your diff,

now back to the test.

Collecting all those @tc:... annotations manually is annoying and error-prone. So let's have the computer do it.

Conceptually the implementation is easy:

Grab the git diff and filter for gherkin files
Parse the gherkin and cross-reference with the diff
Put everything in the commit message

Let's start at the beginning:

libgit2

For interacting with the repository, I’m using git2, a wrapper for the libgit2 C library.

The main hurdle was to figure out what I actually wanted. Since it’s about committing, we need everything green in the output of git status. We need the index.

So, open up the repo and diff the index with the latest committed state (aka HEAD):

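use git2::{DiffOptions, Repository};

// Finds the repository from the environment or the current directory
let repo = Repository::open_from_env().unwrap();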

let mut diff_opts = DiffOptions::default();
// We don't need no context
diff_opts.context_lines(0);

// The tree of the latest commit (HEAD) is the "old" side of the diff
let tree = repo
    .resolve_reference_from_short_name("HEAD")
    .unwrap()
    .peel_to_commit()
    .unwrap()
    .tree()
    .unwrap();

let diff = repo
    .diff_tree_to_index(Some(&tree), None, Some(&mut diff_opts))
    .unwrap();

Much simpler than shelling out, calling git diff --cached -U0 and parsing its output with all the possible edge cases.

Then we iterate all lines in the diff:

let mut changes = Vec::new();

diff.foreach(
    &mut |_, _| true,
    None,
    None,
    Some(&mut |file, _, line| {
        // Skip everything that is not a gherkin file
        if file
            .new_file()
            .path()
            .is_none_or(|p| p.extension().is_none_or(|e| e != "feature"))
        {
            return true;
        }

        let Some(path) = file.new_file().path().map(ToOwned::to_owned) else {
            return true;
        };

        // Deleted lines only have a line number in the old version
        if let Some(num) = line.old_lineno() {
            changes.push((path.clone(), num, Version::Old));
        }

        // Inserted lines only have a line number in the new version
        if let Some(num) = line.new_lineno() {
            changes.push((path, num, Version::New));
        }

        // Returning true continues the iteration
        true
    }),
)
.unwrap();

git2 tries to stick closely to the API of the libgit2 library and here it shows the difference between Rust and C nicely. Instead of an Iterator we get a foreach() function with a bunch of callbacks.

For our use case we’re only interested in the line callback. For every gherkin file (*.feature) we store which line was changed, and whether the change is in the old or new version of the file.

For deletions we check the old file, for insertions the new file.
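The snippet above uses a Version marker without showing its definition. A minimal sketch of what it could look like, together with a hypothetical group_changes helper (not part of the original tool) that collects the changed lines per file and version so that each file version only has to be loaded and parsed once:

```rust
use std::collections::{BTreeMap, BTreeSet};
use std::path::PathBuf;

/// Which side of the diff a line number refers to.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Version {
    /// Line number in the committed (HEAD) version; used for deletions.
    Old,
    /// Line number in the staged (index) version; used for insertions.
    New,
}

/// Hypothetical helper: group changed lines by (file, version).
/// Duplicate entries collapse because the lines are kept in a set.
fn group_changes(
    changes: &[(PathBuf, u32, Version)],
) -> BTreeMap<(PathBuf, Version), BTreeSet<u32>> {
    let mut grouped: BTreeMap<(PathBuf, Version), BTreeSet<u32>> = BTreeMap::new();
    for (path, line, version) in changes {
        grouped
            .entry((path.clone(), *version))
            .or_default()
            .insert(*line);
    }
    grouped
}

fn main() {
    let changes = vec![
        (PathBuf::from("cash.feature"), 6, Version::New),
        (PathBuf::from("cash.feature"), 5, Version::Old),
        (PathBuf::from("cash.feature"), 6, Version::New),
    ];
    for ((path, version), lines) in group_changes(&changes) {
        println!("{} ({:?}): {:?}", path.display(), version, lines);
    }
}
```

Grouping is optional; popping one change at a time works just as well, it only parses the same file more often.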

With that, the first step is done.

gherkin

With a set of paths and line numbers in hand, we can tackle the next step.

Parsing gherkin!

cargo add gherkin

That should take care of the heavy lifting.

// pop() returns an Option; here we assume at least one change was recorded
let (path, line, version) = changes.pop().unwrap();

let text = load_file(path, version).unwrap();

let feature = Feature::parse(&text, Default::default()).unwrap();

Taking a look at the feature struct, we can see a list of scenarios and every scenario comes with span information and a list of tags.

Profit!

We can cross-reference the scenario spans with the changed line numbers. If a line from the diff falls into a scenario span, the tags of the scenario tell us the issue number we need.

The only issue: Spans are byte offsets in the file and we have line numbers.

Nothing that a bit of preprocessing can’t fix:

let mut ptr = 0;
let line_ranges = text.split_inclusive('\n')
    .map(|line| {
        // split_inclusive keeps the '\n', so line.len() already counts it
        let end = ptr + line.len();
        let range = ptr..end;
        ptr = end;
        range
    })
    .collect::<Vec<_>>();

Run through all lines and convert each line to a range, indicating the byte positions in the file. Then we can index this vector with the line number to compare with the span information from the gherkin struct:

let scenario = feature
    .scenarios
    .iter()
    // git line numbers are 1-based, the vector is 0-based
    .find(|s| does_intersect(s.span, &line_ranges[line as usize - 1]));
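The does_intersect helper is not shown in the post. A minimal, self-contained sketch of the idea, assuming spans are half-open byte ranges; the Span struct below is a stand-in for the type from the gherkin crate, and the span in main is made up for illustration:

```rust
use std::ops::Range;

/// Stand-in for the gherkin span type (an assumption for this sketch):
/// a half-open byte range `start..end` into the file.
#[derive(Debug, Clone, Copy)]
struct Span {
    start: usize,
    end: usize,
}

/// Byte range of every line, including its trailing newline.
fn line_ranges(text: &str) -> Vec<Range<usize>> {
    let mut ptr = 0;
    text.split_inclusive('\n')
        .map(|line| {
            let end = ptr + line.len();
            let range = ptr..end;
            ptr = end;
            range
        })
        .collect()
}

/// Two half-open ranges overlap if each one starts before the other ends.
fn does_intersect(span: Span, line: &Range<usize>) -> bool {
    span.start < line.end && line.start < span.end
}

fn main() {
    let text = "Feature: Withdrawing cash\n\n  @tc:1001\n  Scenario: A\n    Given x\n";
    let ranges = line_ranges(text);

    // Hypothetical span: from the "Scenario" keyword to the end of the file.
    let span = Span {
        start: text.find("Scenario").unwrap(),
        end: text.len(),
    };

    // Line numbers from the diff are 1-based.
    let changed_line: u32 = 5;
    let hit = does_intersect(span, &ranges[changed_line as usize - 1]);
    println!("line {changed_line} touches the scenario: {hit}");
}
```

Whether a span also covers the tag line above the scenario depends on the parser, so it is worth checking before relying on changes to the tags themselves being detected.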

If we do find a matching scenario, then it’s as simple as parsing scenario.tags and grabbing the number from that.

And why do complicated parsing, when you can just do a simple prefix match:

fn parse_testcase_number(tag: &str) -> Option<u32> {
    tag.strip_prefix("tc:")?.parse().ok()
}
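To see the prefix match in action, here is a small sketch that runs it over a list of tags. It assumes, as the function above does, that tags arrive without the leading @; collect_testcases is a hypothetical helper that also sorts and deduplicates the numbers:

```rust
use std::collections::BTreeSet;

fn parse_testcase_number(tag: &str) -> Option<u32> {
    tag.strip_prefix("tc:")?.parse().ok()
}

/// Hypothetical helper: collect the test case numbers from a list of tags.
/// Tags that do not match `tc:<number>` are skipped; the set keeps the
/// result sorted and free of duplicates.
fn collect_testcases<'a>(tags: impl IntoIterator<Item = &'a str>) -> BTreeSet<u32> {
    tags.into_iter().filter_map(parse_testcase_number).collect()
}

fn main() {
    let tags = ["tc:1002", "slow", "tc:1001", "tc:abc", "tc:1002"];
    let numbers = collect_testcases(tags);
    println!("{numbers:?}"); // prints {1001, 1002}
}
```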

Now that we know the numbers we need to reference, it’s time to bring them into the commit message.

prepare-commit-msg

Git offers hooks to automatically run code as part of the normal workflow. One of those hooks is prepare-commit-msg, which runs after git has prepared the default commit message and before the editor opens, so it can adjust the message.

Just what we need!

The interface is simple:

If .git/hooks/prepare-commit-msg exists, it is run as the hook.

Git passes us one to three arguments:

prepare-commit-msg <message_file> [source] [hash]

The source and hash arguments can mostly be ignored, message_file is the interesting one.
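Reading these arguments needs nothing beyond the standard library. A minimal sketch; HookArgs and parse_hook_args are hypothetical names for illustration, not taken from the tool:

```rust
use std::path::PathBuf;

/// Arguments git passes to the prepare-commit-msg hook.
#[derive(Debug, PartialEq)]
struct HookArgs {
    /// Path to the file holding the initial commit message.
    message_file: PathBuf,
    /// Optional source of the message; ignored in this use case.
    source: Option<String>,
    /// Optional commit hash; ignored in this use case.
    hash: Option<String>,
}

/// Parse the arguments that follow the program name.
/// Returns None if the mandatory message file is missing.
fn parse_hook_args(mut args: impl Iterator<Item = String>) -> Option<HookArgs> {
    Some(HookArgs {
        message_file: PathBuf::from(args.next()?),
        source: args.next(),
        hash: args.next(),
    })
}

fn main() {
    // skip(1) drops the program name
    match parse_hook_args(std::env::args().skip(1)) {
        Some(args) => println!("{args:?}"),
        None => eprintln!("usage: prepare-commit-msg <message_file> [source] [hash]"),
    }
}
```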

It contains the path to a file with the “initial” commit message. This includes any templates (via git commit -t or commit.template), and the default git instructions:


# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
#
# On branch main
# Changes to be committed:

Since issue numbers belong in the trailers, we want to insert them at the end of the message, ideally just before the instructions for readability.

So the full integration looks like this:

Open the commit message file
Format the issue numbers as trailer (Tests: #1001, #1002)
Insert the trailer just before the instructions (a big block starting with #)
Write new contents back to commit message file
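The middle two steps can be sketched with plain string handling. This is a simplified illustration, not the code of the tool: format_trailer and insert_trailer are hypothetical names, and the sketch assumes that comment lines start with # and that the instructions form the trailing run of such lines.

```rust
/// Format the collected numbers as a trailer line, e.g. `Tests: #1001, #1002`.
fn format_trailer(numbers: &[u32]) -> String {
    let list: Vec<String> = numbers.iter().map(|n| format!("#{n}")).collect();
    format!("Tests: {}", list.join(", "))
}

/// Insert the trailer just before the trailing block of `#` comment lines.
fn insert_trailer(message: &str, trailer: &str) -> String {
    let lines: Vec<&str> = message.lines().collect();

    // Walk backwards over the comment lines to find where the
    // instruction block starts.
    let mut split = lines.len();
    while split > 0 && lines[split - 1].starts_with('#') {
        split -= 1;
    }

    // Ignore blank lines between the message body and the instructions.
    let mut body_end = split;
    while body_end > 0 && lines[body_end - 1].trim().is_empty() {
        body_end -= 1;
    }

    let mut out = String::new();
    for line in &lines[..body_end] {
        out.push_str(line);
        out.push('\n');
    }
    // One blank line separates the trailer from whatever comes before it.
    out.push('\n');
    out.push_str(trailer);
    out.push('\n');
    if split < lines.len() {
        out.push('\n');
        for line in &lines[split..] {
            out.push_str(line);
            out.push('\n');
        }
    }
    out
}

fn main() {
    let message = "test: update cash money tests\n\n# Please enter the commit message for your changes.\n# On branch main\n";
    let trailer = format_trailer(&[1001, 1002]);
    print!("{}", insert_trailer(message, &trailer));
}
```

In the hook itself, the message would come from std::fs::read_to_string on the message file and the result would go back with std::fs::write. A message that already ends in other trailers would need the new line appended to that block without the extra blank line, which this sketch does not handle.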

With all that done, we can hit compile, copy the binary to .git/hooks/prepare-commit-msg, and try it out:

Conclusion
