THRIFT-2026: Eliminate some undefined behavior in C/C++
Clients: glib, C++
Patch: Jim Apple <jbapple-impala@apache.org>
This closes #1214
This patch fixes some undefined behavior that was found using Clang's
UndefinedBehaviorSanitizer (UBSan). To check for undefined behavior,
run build/docker/scripts/ubsan.sh from the repository root. This is
also run during CI builds.
Examples of the types of undefined behavior fixed in this commit
are:
1. Enumerations exhibit undefined behavior when they have values
outside of a range dependent on the values of their enumerators, as
specified in C++14's chapter 7.2 ("Enumeration declarations"),
paragraph 8.
2. Left shift of negative values, used in zigzag encoding, is
undefined behavior. See 5.8 ("Shift operators"), paragraph 2 for
C++ and 6.5.7 ("Bitwise shift operators"), paragraph 4 for C99 and
C11.
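
The two categories above can be illustrated with a minimal sketch. This
is not the code changed by this patch; the names Color, to_color,
zigzag_encode32 and zigzag_decode32 are hypothetical and chosen only
for illustration. The sketch assumes a 32-bit zigzag encoding and an
unscoped enumeration without a fixed underlying type.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical enumeration, not taken from the Thrift sources.
enum Color { RED = 0, GREEN = 1, BLUE = 2 };

// Converts a raw wire value to Color only if it names a declared enumerator,
// so the cast never produces a value outside the enumeration's range.
inline bool to_color(int32_t raw, Color* out) {
  if (raw < RED || raw > BLUE) {
    return false;
  }
  *out = static_cast<Color>(raw);
  return true;
}

// Zigzag-encodes n without left-shifting a negative value: the shift is
// applied to the unsigned representation, where it is well defined.
inline uint32_t zigzag_encode32(int32_t n) {
  const uint32_t u = static_cast<uint32_t>(n);
  // All ones for negative n, zero otherwise.
  const uint32_t sign_mask = n < 0 ? ~uint32_t{0} : uint32_t{0};
  return (u << 1) ^ sign_mask;
}

// Inverse of zigzag_encode32. All arithmetic is unsigned; the bits are
// copied into the signed result instead of being converted by a cast.
inline int32_t zigzag_decode32(uint32_t z) {
  // All ones when the low bit of z is set, zero otherwise.
  const uint32_t sign_mask = uint32_t{0} - (z & uint32_t{1});
  const uint32_t u = (z >> 1) ^ sign_mask;
  int32_t out;
  std::memcpy(&out, &u, sizeof(out));
  return out;
}
```

For example, assert(zigzag_encode32(-1) == 1u) and
assert(zigzag_decode32(zigzag_encode32(INT32_MIN)) == INT32_MIN) both
hold, and to_color(7, &c) returns false without touching c.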
diff --git a/build/docker/scripts/ubsan.sh b/build/docker/scripts/ubsan.sh
new file mode 100755
index 0000000..6db10f3
--- /dev/null
+++ b/build/docker/scripts/ubsan.sh
@@ -0,0 +1,35 @@
+#!/bin/sh
+
+set -ex
+
+# Wraps autotools.sh, but each binary crashes if it exhibits undefined behavior. See
+# http://releases.llvm.org/3.8.0/tools/clang/docs/UndefinedBehaviorSanitizer.html
+
+# Install a more recent clang than default:
+sudo apt-get update
+sudo apt-get install -y --no-install-recommends clang-3.8 llvm-3.8-dev
+export CC=clang-3.8
+export CXX=clang++-3.8
+
+# Set the undefined behavior flags. This crashes on all undefined behavior except for
+# undefined casting, aka "vptr".
+#
+# TODO: fix undefined vptr behavior and turn this option back on.
+export CFLAGS="-fsanitize=undefined -fno-sanitize-recover=undefined -fno-sanitize=vptr"
+# Builds without optimization and with debugging symbols for making crash reports more
+# readable.
+export CFLAGS="${CFLAGS} -O0 -ggdb3"
+export CXXFLAGS="${CFLAGS}"
+export UBSAN_OPTIONS=print_stacktrace=1
+
+# llvm-symbolizer must be on PATH, but the above installation installs a binary called
+# "llvm-symbolizer-3.8", not "llvm-symbolizer". This fixes that with a softlink in a new
+# directory.
+CLANG_PATH="$(mktemp -d)"
+trap "rm -rf ${CLANG_PATH}" EXIT
+ln -s "$(whereis llvm-symbolizer-3.8 | rev | cut -d ' ' -f 1 | rev)" \
+ "${CLANG_PATH}/llvm-symbolizer"
+export PATH="${CLANG_PATH}:${PATH}"
+llvm-symbolizer -version
+
+build/docker/scripts/autotools.sh "$@"