Diff your Word documents

I don’t use Word, but other people do. No shade; just facts. I want my important records to be versioned so that I can more easily understand what has changed when they change. Word and its cousins make that annoying. I want to be able to summarize the differences between Draft 1 and Draft 2 of a document that someone else is writing. I want to do this without arrogantly insisting they use different tools.

Fortunately, this is possible.

The Solution

Use docx2txt to convert the Word document to plain text. Then teach git diff to compare *.docx files by first converting them to plain text.

Now, at least, I can see the diff in plain text, so that I know where to look in the Word document for the corresponding difference.

It’s not perfect, but it works plenty well enough for me.

  1. Change gitattributes so that it treats *.docx files as specialized “docx” files. Here, “docx” is a label that behaves like a file type.
  2. Change gitconfig so that diffing “docx” files uses a command that converts “docx” files to plain text.

Now, when you use git diff, you see a diff of the converted text, rather than of the original binary file. This at least shows you the context of the difference, so that you can search for it in Word or LibreOffice or whatever word processor you prefer.

# in $XDG_CONFIG_HOME/git/attributes, the global-for-the-user .gitattributes file
# https://stackoverflow.com/questions/22439517/view-docx-file-on-github-and-use-git-diff-on-docx-file-format
*.docx diff=docx

When I make a change such as this, I always cite my sources in a comment. Many of these are Stack Overflow pages, while they still exist!
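To check that git picks up the attribute, git check-attr reports which diff driver applies to a given path. Here is a minimal sketch, assuming git is installed. The throwaway repository and the file name report.docx are placeholders of mine, and a repository-local .gitattributes stands in for the global file so that the experiment leaves your real configuration alone.

```shell
# Create a throwaway repository (hypothetical; any repository works).
repo="$(mktemp -d)"
git init --quiet "$repo"

# Same rule as in the global attributes file, but local to this repository.
printf '*.docx diff=docx\n' > "$repo/.gitattributes"

# Ask git which diff driver it would use for a .docx path.
# The file itself does not need to exist for this check.
attr_output="$(git -C "$repo" check-attr diff -- report.docx)"
echo "$attr_output"
```

If the rule is in effect, the output names docx as the diff driver for that path. If it says "unspecified" instead, git is not reading the attributes file you edited.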

# in $HOME/.gitconfig, the global-for-the-user .gitconfig file
[diff "docx"]
textconv = docx2txt <

I use the < version of this command so that docx2txt prints output to stdout instead of to a file with the same basename and the extension .txt.
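The same setting can be written with git config instead of editing the file by hand. This is a sketch rather than a required step: it writes to a scratch file (the variable name cfg is mine) so that nothing in your real configuration changes, and it only shows that the trailing < is stored as part of the command. As I understand it, git appends the path of the file to convert to the textconv command and runs the result through the shell, which is why ending the command with < turns that path into docx2txt’s standard input.

```shell
# Write to a scratch file so the real ~/.gitconfig stays untouched.
# Replace --file "$cfg" with --global to apply the setting for your user.
cfg="$(mktemp)"
git config --file "$cfg" diff.docx.textconv 'docx2txt <'

# Read the value back to confirm the trailing "<" was stored with the command.
textconv_cmd="$(git config --file "$cfg" --get diff.docx.textconv)"
echo "$textconv_cmd"
```

Running the same --get query against --global is a quick way to confirm what git will actually execute when it diffs a “docx” file.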

References