Pushing large files to GitHub

Pushing large files to GitHub

When pushing large files (file size > 100MB = GitHub file size limit) to GitHub, one might experience errors such as below:

error: RPC failed; HTTP 408 curl 22 The requested URL returned
... 
fatal: the remote end hung up unexpectedly

Here, RPC stands for “Remote Procedure Call” between your local repository and the GitHub repository, HTTP 408 stands for request timeout. Many other error messages may pop up, but the culprit is usually that you are trying to push a large file over the internet.

Increase postbuffer size

One potential solution to the above problem when the large file is still relatively small (< 100M) is to increase the postbuffer size with:

git config --global http.postbuffer 500M

where 500M or <some other number> is the new buffer size (the default size is 1M). However, this solution is not generally effective, as per the official documentation, “raising this limit is only effective for disabling chunked transfer encoding” (which happens over http/1.1 when buffer size is surpassed) and “therefore should be used only where the remote server or a proxy only supports HTTP/1.0 or is noncompliant with the HTTP standard”.

To check existing postbuffer size, we can use:

git config --get http.postbuffer

Git LFS

The better solution is to use Git Large File Storage (LFS), which is designed to handle large files such as audio/video samples, datasets, etc. and replace them with text pointers inside Git, while storing them separately elsewhere.

To use LFS, one should install it and then specifically track large files with:

git lfs track "*.<extension>"

Afterwards, all files ending with <extension> will be replaced by a text pointer, and a .gitattributes file specifying the tracked extensions will be created. The original large files will be moved to the lfs cache inside the .git repository, and only their pointers will be tracked by git. You can then commit and push as usual. To view the files tracked by LFS, type:

git lfs ls-files --all --long

One caveat of the above is that git lfs track creates a new commit. Say you have commit1 where you committed some large files which cannot be pushed to the remote repository, so you did commit2 using LFS, and you push again. This will still fail as git pushes the commits sequentially, and while commit2 adds the large files to LFS, commit1 didn’t and you get the same error. You can solve this error by squashing the last 2 commits as 1 using interactive rebase:

git rebase -i HEAD~2

edit rebase list in the interactive rebase editor (1st editor):

pick <commit1_hash> <Commit message for commit 1>
squash <commit2_hash> <Commit message for commit 2>

Optionally modify commit message in the commit message editor (2nd editor), and force push to remote with a new squashed history:

git push origin <branch_name> --force

To avoid the above, we can use the migrate feature of LFS with:

git lfs migrate import --include="*.<extension>* --everything

Here, LFS will track large <extension> files as well as rewrite history wherever they are involved, so no new commit is needed, but a forced push is required. To have an idea of what files are large, we can do:

git lfs migrate info

This command will print the extensions of the largest files in descending order of file size.

As a last note, one should be careful with the use of LFS and GitHub as GitHub only allows 1GB of free LFS quota, any more would need the purchase of additional data packs, where one pack costs $5 per month and provides 50GB of storage. If your data quota is exceeded, LFS will be disabled for your account.