Abstract

Most growing software teams eventually face the same quiet problem: the repository becomes heavier than the work itself. Old branches, forgotten tags, and binary files buried in history can turn a normal Git project into something slow, risky, and hard to clean. This project built a safe cleanup workflow for a large Git repository where old binary history represented roughly 70% of the storage pressure.

Results

The cleanup was turned from a risky manual task into a repeatable process. Candidate branches were checked against open merge requests before deletion, outdated branches and tags could be removed through a dry-run-first workflow, and old binary history was prepared for controlled cleanup. The result was a reusable approach and a clear path to reduce repository weight by targeting the historical data that mattered most.

Introduction

Some repository problems do not appear suddenly. They grow quietly, one branch, one tag, and one committed binary file at a time.

In this case, the Git repository had reached about 39 GB. The project still worked, but normal cleanup was no longer enough. After many old branches and tags were removed, the storage problem remained almost unchanged. A binary report showed the reason: more than one million Git objects were processed, and 13,723 unique binary objects still occupied 27.48 GB of history.

The task became clear: separate what was still active from what was only historical, protect opened merge requests, and prepare a controlled cleanup process that could reduce the hidden weight of the repository without exposing private project data.

The cleanup problem

Why the repository became heavy, and why deleting refs was not enough

The repository became heavy because visible cleanup and storage cleanup were not the same problem. The first cleanup removed many old branches and tags. On the surface, that looked like progress: fewer refs, fewer names to review, less visible clutter. But the repository size stayed almost the same.

The reason was hidden in Git history. Large binary files were still reachable from old commits, so Git continued to preserve them even after many branches and tags were gone. The cleanup problem was no longer just “what can we delete?” It became “what is still active, what is only historical, and what can be removed without breaking the project?”
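
The effect can be illustrated with a toy model. This is not real Git, only a minimal sketch of reachability: each hypothetical commit lists the blobs it introduced, and a blob survives as long as any remaining ref can reach the commit that holds it. All names (`COMMITS`, `reachable_blobs`, the file names) are invented for illustration.

```python
# Toy model (not real Git): each commit lists the blobs it introduced.
COMMITS = {
    "c1": {"parent": None, "blobs": {"installer-v1.bin"}},
    "c2": {"parent": "c1", "blobs": {"app.py"}},  # installer removed from the tree here
    "c3": {"parent": "c2", "blobs": {"experiment.txt"}},
}


def reachable_blobs(refs, commits):
    """Return every blob reachable from any ref by walking parent links."""
    seen_commits, blobs = set(), set()
    for tip in refs.values():
        current = tip
        while current is not None and current not in seen_commits:
            seen_commits.add(current)
            blobs |= commits[current]["blobs"]
            current = commits[current]["parent"]
    return blobs


refs = {"main": "c2", "old-feature": "c3"}
before = reachable_blobs(refs, COMMITS)

del refs["old-feature"]  # the "visible" cleanup: one branch removed
after = reachable_blobs(refs, COMMITS)

freed = before - after  # only blobs unique to the deleted branch
```

Deleting `old-feature` frees only `experiment.txt`. The large `installer-v1.bin` is no longer in the current tree, but it stays reachable through the history of `main`, which is the situation described above.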

Figure: Visible cleanup does not remove hidden Git history

Safety rules

A repository cleanup looks simple only from a distance. In real work, deleting the wrong branch can close an active merge request, removing a protected ref can fail unexpectedly, and rewriting history can confuse every clone that still points to the old commits.

For this reason, the cleanup was treated as a controlled operation, not a one-click deletion. The public version also had to stay clean: no real project URLs, no tokens, no customer names, no internal branch names, and no private file paths.

  • Opened merge requests, including draft merge requests, had to be protected
  • Default and protected branches could not be treated as ordinary deletion candidates
  • Every destructive step needed a dry run and a log before execution
  • Recent and active history had to be preserved
  • The public explanation had to be reusable without exposing private data

Evidence used

The work started by collecting enough evidence to avoid guessing. The branch and tag lists showed what looked outdated. Merge request checks showed what was still active. The binary report explained why the repository stayed heavy even after ref cleanup.

| Input | What it showed | Why it mattered |
|---|---|---|
| Branch candidate list | Branches proposed for deletion | Starting point for cleanup review |
| Tag candidate list | Old tags proposed for deletion | Reduced visible historical clutter |
| Merge request API results | Opened and draft merge requests linked to branches | Prevented active work from being closed by cleanup |
| Binary report | 13,723 unique binary objects using 27.48 GB | Explained why deleting refs was not enough |
| Cutoff date | Boundary between recent work and older binary history | Allowed old-only objects to be targeted more carefully |
| Commit map | Old-to-new commit mapping after history rewrite | Required for GitLab repository cleanup |

The hard part

The difficult part was not writing a command to delete things. The difficult part was deciding what should not be touched. A branch can look old and still be part of someone’s unfinished merge request. A file can disappear from the current tree and still occupy space because Git keeps it in history. A repository can look cleaner after ref cleanup and still carry the same weight underneath.

The cleaner solution was to make the thinking visible: classify first, delete second, and rewrite history only after the evidence had been separated from guesswork. That made the work less dramatic in execution, but much safer in practice.

Figure: Safety gates before deleting anything from the repository

Method: from repository audit to controlled cleanup

The method was kept deliberately practical. First make the repository understandable, then protect active work, then run deletion as a reviewed operation, and only after that prepare history cleanup.

1. Preparation

The first step was to stop treating the repository as one big problem. Branches, tags, merge requests, binary objects, and recent history each needed a separate check. This made the work easier to review and safer to repeat.

  • Use prepared branch and tag lists instead of deleting manually from the interface
  • Normalize names and remove duplicates before running the cleanup workflow
  • Keep all actions dry-run by default
  • Write output logs for skipped, missing, failed, and successful operations
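
The normalization step could be sketched roughly as follows. The function name `normalize_ref_names` and the list of prefixes are assumptions for illustration, not the project's actual tooling. The sketch also assumes that no real branch name starts with `origin/`.

```python
def normalize_ref_names(lines):
    """Trim, drop blanks and comments, strip common prefixes, de-duplicate in order."""
    prefixes = ("refs/heads/", "refs/tags/", "origin/")  # assumed input variations
    seen, result = set(), []
    for raw in lines:
        name = raw.strip()
        if not name or name.startswith("#"):
            continue  # ignore empty lines and comment lines
        for prefix in prefixes:
            if name.startswith(prefix):
                name = name[len(prefix):]
                break
        if name not in seen:
            seen.add(name)
            result.append(name)
    return result


candidates = normalize_ref_names(
    ["  feature/a ", "origin/feature/a", "", "# reviewed list", "refs/heads/fix-1"]
)
```

Keeping the original order makes the cleaned list easy to compare with the prepared input during review.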

2. Branch and merge request safety check

Branches can look abandoned while still being connected to active work. The branch safety check reviewed opened merge requests first, including draft merge requests, and created a safer deletion path.

  • Branches linked to opened merge requests were excluded
  • Filtered output separated safe candidates from blocked branches
  • A details file recorded why a branch was excluded
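
A minimal sketch of this filter, assuming the opened merge requests have already been fetched (for example from GitLab's merge requests API with `state=opened`) and are available as dictionaries with `source_branch`, `iid`, and `draft` fields. The function name `split_candidates` and the log wording are hypothetical. Only source branches are checked here; a stricter variant could protect target branches too.

```python
def split_candidates(candidates, open_merge_requests):
    """Separate safe deletion candidates from branches blocked by opened merge requests."""
    blocking = {}
    for mr in open_merge_requests:
        blocking.setdefault(mr["source_branch"], []).append(mr)

    safe, blocked, details = [], [], []
    for branch in candidates:
        if branch in blocking:
            blocked.append(branch)
            for mr in blocking[branch]:
                kind = "draft" if mr.get("draft") else "opened"
                details.append(f"{branch}\tblocked by {kind} merge request !{mr['iid']}")
        else:
            safe.append(branch)
    return safe, blocked, details


merge_requests = [{"iid": 7, "source_branch": "feature/a", "draft": True}]
safe, blocked, details = split_candidates(["feature/a", "old/b"], merge_requests)
```

The three return values correspond to the filtered output, the blocked branches, and the details file described above.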

3. Dry-run cleanup test

The deletion step was designed to show what would happen before making changes. This made the cleanup reviewable: the team could inspect logs, confirm counts, and only then run the same command with execution enabled.

  • The dry run listed the delete URL for each branch and tag without sending any delete request
  • The default branch was skipped
  • Protected refs and permission issues were logged instead of hidden
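
The dry-run-first idea might look like the sketch below. The base URL is a placeholder, `run_cleanup` and `send_delete` are hypothetical names, and the endpoint paths follow the shape of GitLab's repository branches and tags API. Nothing is sent unless `execute=True` is passed explicitly.

```python
from urllib.parse import quote

API_BASE = "https://gitlab.example.com/api/v4"  # placeholder, not a real instance


def delete_url(project_id, kind, name):
    """Build the delete endpoint for a branch or tag, URL-encoding the ref name."""
    collection = {"branch": "branches", "tag": "tags"}[kind]
    return f"{API_BASE}/projects/{project_id}/repository/{collection}/{quote(name, safe='')}"


def run_cleanup(project_id, branches, tags, default_branch, execute=False, send_delete=None):
    """Log every planned delete; call send_delete(url) only when execute is True."""
    if execute and send_delete is None:
        raise ValueError("execute=True requires a send_delete callable")
    log = []
    for kind, names in (("branch", branches), ("tag", tags)):
        for name in names:
            if kind == "branch" and name == default_branch:
                log.append(f"SKIP default branch {name}")
                continue
            url = delete_url(project_id, kind, name)
            if not execute:
                log.append(f"DRY-RUN would DELETE {url}")
            else:
                status = send_delete(url)  # caller supplies the HTTP client
                log.append(f"DELETE {url} -> {status}")
    return log


dry_run_log = run_cleanup(42, ["main", "feature/a"], ["v0.1"], default_branch="main")
```

Passing the HTTP call in as `send_delete` keeps the planning logic testable without network access, and the same function serves both review and execution.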

4. Binary history analysis

After branch and tag cleanup, the binary report still showed tens of gigabytes in history. This changed the focus from visible refs to old objects. The cleanup needed to identify binary blobs that were reachable only from commits older than a chosen cutoff date.

  • Collect binary objects from all reachable history
  • Collect binary objects still reachable from recent commits
  • Build the rewrite list from old-only binary blobs, so that nothing still reachable from recent commits is stripped
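
In essence, the selection is a set difference. The sketch below assumes the two collections have already been produced by earlier analysis. The blob ids and sizes are made-up illustration data, and `old_only_blobs` is a hypothetical name.

```python
def old_only_blobs(all_blobs, recent_blobs):
    """Return blobs (id -> size) found in full history but not reachable from recent commits."""
    return {
        blob_id: size
        for blob_id, size in all_blobs.items()
        if blob_id not in recent_blobs
    }


# Illustration data only: blob id -> size in bytes.
all_blobs = {"aaa111": 500, "bbb222": 300, "ccc333": 200}
recent_blobs = {"bbb222"}  # still reachable from commits after the cutoff

strip_list = old_only_blobs(all_blobs, recent_blobs)
reclaimable_bytes = sum(strip_list.values())
```

Anything that appears in the recent set is kept, even if it also appears in old commits, which is what keeps recent and active history intact.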

5. History cleanup preparation

The final cleanup path used a history rewrite approach. After rewriting, the commit map becomes the bridge between the old and new history, and GitLab repository cleanup can use it to remove internal references and reclaim storage.

  • Run history rewrite only from a fresh mirror or disposable cleanup clone
  • Save the generated commit map for audit and GitLab cleanup
  • Force-push rewritten refs only after review
  • Run GitLab repository cleanup after the rewrite process
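
The text does not name the rewrite tool. As one possible illustration, the sketch below assumes `git filter-repo`, whose `--strip-blobs-with-ids` option reads blob ids from a file and which writes a `commit-map` file after a rewrite. The helper names and paths are hypothetical, and the code only builds the command without running it.

```python
from pathlib import Path


def build_rewrite_command(mirror_dir, blob_id_file):
    """Return the rewrite command as a list of arguments; nothing is executed here."""
    return [
        "git", "-C", str(mirror_dir),
        "filter-repo",  # assumption: git filter-repo is the rewrite tool
        "--strip-blobs-with-ids", str(blob_id_file),
    ]


def commit_map_path(mirror_dir):
    """Assumed location of the commit map inside a bare mirror clone after the rewrite."""
    return Path(mirror_dir) / "filter-repo" / "commit-map"


command = build_rewrite_command("cleanup-mirror.git", "old-only-blob-ids.txt")
commit_map = commit_map_path("cleanup-mirror.git")
```

Building the command separately from running it fits the review-before-execute rule: the exact arguments can be logged and checked before anything touches the mirror.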

Figure: From repository audit to controlled cleanup

Results

The results are grouped by what changed in the cleanup process: active work became protected, deletion became reviewable, repository weight became measurable, and the final workflow became reusable.

1. Active branch protection

The first practical result was a separation between branches that only looked old and branches that were still part of active work. Instead of publishing internal file names or branch lists, the workflow can be understood as three outcomes: branches ready for review, branches held back because they touched opened merge requests, and an evidence record explaining the decision.

  • A review list for branches that could move toward deletion
  • A hold-back list for branches still connected to opened merge requests
  • An evidence log showing why a branch was protected

Figure: Opened merge requests protect active branches from cleanup

2. Reviewed branch and tag deletion

The deletion step became much less fragile once it was dry-run first. The same workflow could be used for review and execution, with the difference controlled by an explicit flag.

  • Branch and tag candidates were read from files
  • Dry-run logs showed exactly what would be deleted
  • Execution logs recorded successful deletes, skipped refs, missing refs, and permission errors
  • The default branch was skipped automatically
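
One way to turn raw responses into the log categories above is a small classifier. The mapping is an assumption based on general HTTP semantics (2xx success, 404 not found, 401/403 permission problems), not a record of the project's actual log format, and `classify` and `summarize` are hypothetical names.

```python
from collections import Counter


def classify(status):
    """Map an HTTP status code to a cleanup log category (assumed mapping)."""
    if 200 <= status < 300:
        return "deleted"
    if status == 404:
        return "missing"
    if status in (401, 403):
        return "permission_error"
    return "failed"


def summarize(results):
    """Count outcomes for an iterable of (ref_name, status) pairs."""
    return Counter(classify(status) for _, status in results)


summary = summarize([("old/a", 204), ("old/b", 404), ("release/x", 403), ("old/c", 500)])
```

A per-category count like this gives reviewers a quick way to confirm that the execution run matched the dry-run plan.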

Figure: Dry-run logs turn deletion into a reviewable step

3. Repository storage pressure

The most important result was not a deleted branch count. It was understanding where the weight actually lived. The visible cleanup helped organization, but the binary report showed that the larger issue was old binary history.

  • More than one million Git objects were processed in analysis
  • 13,723 unique binary objects accounted for 27.48 GB
  • Old binary history represented roughly 70% of the repository's storage (27.48 GB of about 39 GB)
  • The cleanup target moved from refs alone to old-only binary blobs
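
The 70% figure follows directly from the two sizes reported earlier. As a quick check, treating the full 27.48 GB of binary objects as the historical weight:

```python
REPO_SIZE_GB = 39.0        # approximate total repository size from the text
BINARY_HISTORY_GB = 27.48  # size of the 13,723 unique binary objects

share = BINARY_HISTORY_GB / REPO_SIZE_GB
print(f"binary history share: {share:.1%}")  # prints: binary history share: 70.5%
```

Because the 39 GB total is itself approximate, the result is best read as "roughly 70%" rather than as an exact share.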

Figure: Old binary history carried most of the repository weight

4. Final result

The final deliverable was a reusable cleanup method and a public-safe explanation of the approach. The working version uses placeholders, prepared lists, dry-runs, and logs so the same idea can be repeated on another repository without exposing private customer data.

  • A branch-safety step separates active work from cleanup candidates
  • A deletion step removes only reviewed branches and tags
  • A history-cleanup step targets old-only binary blobs by cutoff date
  • The public version explains the workflow without exposing internal project details

Figure: A one-time cleanup became a reusable workflow

Summary

This project started as a storage problem, but the real value was in making the cleanup understandable. Old branches and tags could be reviewed and removed, active merge requests could be protected, and binary history could be measured before any rewrite was attempted.

The result is not just a smaller-repository plan. It is a safer way to reason about Git history: what is visible, what is still active, what is old, and what can be removed only after evidence says it is safe.