13 Nov

Cross compiling Sollya to Windows with Emscripten

One component of Codeplay’s ComputeAorta that I manage is our high precision maths library Abacus.

One major requirement of Abacus, and in fact of all maths libraries, is Remez reduced polynomial approximations of functions. In the past we’ve made use of Maple, Mathematica, lolremez, and our own fork of lolremez, and to be honest none of them have been satisfactory for our needs. We want a scriptable solution that we can use to bake the generated polynomials automatically into Abacus with minimal user intervention.

I was lucky enough to be involved in a Twitter thread with Marc B. Reynolds where he pointed me at Sollya. It’s Linux only which sucks (I’m primarily a Windows developer), but I fired up a VM and tried it out – and I’ve got to say, it’s pretty darn good! The lack of Windows support is a big issue though, so how to fix that?

Enter stage left – Emscripten!

So I’ve known about Emscripten for a while, but never had a really compelling reason to use it. I suddenly thought ‘I wonder if I could use Emscripten to compile Sollya to JavaScript, then use Node.js to run it on Windows?’.

Yep, you are right, I’m mad. This can’t be a good way to take software meant for Linux and cross compile it for Windows, right? That just made me all the more curious to see if it could work.

Sollya and all its dependencies

Sollya requires a bunch of different projects to work: libxml2, GMP, MPFR, MPFI, fplll, and lastly Sollya itself. So I first downloaded all of these, built them all from source, and built Sollya using gcc on Linux – just to test that I could build it.

Then we can try to build Sollya again, but for JavaScript, using Emscripten’s emconfigure. You place it before the typical Linux call to ./configure, and it replaces any compiler usage with the Emscripten compiler emcc.

So I started with libxml2, which worked! And then onto GMP – and explosions. Some Stack Overflow searching pointed me to the question ‘Compiling GMP/MPFR with Emscripten’, which states that for some reason (I didn’t dig into why) Emscripten couldn’t compile GMP if the host platform was not 32 bits. I looked at the answer where it suggests you chroot and thought ‘that seems like a lot of work to mess with my current 64-bit VM that I do other things on, I’ll fire up a new VM to mess with’. But since I was creating a new VM anyway, I decided to just create a 32-bit Ubuntu VM and use that instead (which meant less configuration work on my part).

So with my 32-bit VM, I started the whole process of compiling libxml2, gmp, mpfr, mpfi, fplll (wow I’m on a roll!) and finally I get to Sollya and… it failed.

Sollya and dlopen

Sollya makes use of dlopen, and thus the ./configure script in Sollya will check that dlopen is a function that works on the target platform. The problem is, ./configure doesn’t use the correct signature for the dlopen call – it just does:

 extern char dlopen();

and then ensures that the linker doesn’t complain when this is linked against -ldl. The signature of dlopen is:

void *dlopen(const char *filename, int flags);

and Emscripten looks for that exact signature, and complains if the function doesn’t have the correct number and type of arguments. This meant as far as ./configure was concerned, the system didn’t have dlopen (even though Emscripten can stub implement it), and it failed.

Ever the hacker, I just decided to patch the ./configure to not error out:

sed -i -e "s/as_fn_error .. \"libdl unusable\"/echo \"skipped\"\#/" ./sollya/configure

tried to build again, and Sollya built!
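
To make it clearer what that sed substitution does, here is a minimal Python sketch that applies the same pattern to a single line. The sample line is an assumption about what an autoconf-generated check typically looks like; I have not copied it from Sollya’s actual configure script.

```python
import re

# Hypothetical line of the kind autoconf generates; the real Sollya
# configure script may word this differently.
line = 'as_fn_error $? "libdl unusable" "$LINENO" 5'

# Same pattern as the sed call: ".." matches any two characters (here "$?").
# count=1 mirrors sed, which replaces only the first match on a line.
patched = re.sub(r'as_fn_error .. "libdl unusable"', 'echo "skipped"#', line, count=1)

print(patched)
```

The fatal as_fn_error call becomes a harmless echo, so ./configure carries on instead of aborting when its dlopen probe fails to link.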

Emscripten and .bc’s

Emscripten seems to output an LLVM bitcode (.bc) file by default – and I couldn’t work out how to tell emconfigure to output a JavaScript file instead.

So what I did was take the bitcode file that was in ‘sollya’ and used emcc directly to turn this into a JavaScript file.

emcc complained if the input bitcode file wasn’t named <something>.bc, so I first renamed it to sollya.bc:

cp sollya sollya.bc
emcc sollya.bc -o sollya.js

and I got a whopping 27MB JavaScript file out!

Next I used node to run this JavaScript against a simple test script I wrote:

print("Single precision:");
r=[1/sqrt(2)-1; sqrt(2)-1];
f=log2(1+x)/x;
p=fpminimax(f, 11, [|single...|], r, floating, relative);
p;
print("\nDouble precision:");
p=fpminimax(f, 21, [|double...|], r, floating, relative);
p;

and ran Node.js:

nodejs sollya.js < script.sollya

and it ran!

But it kept on running, like infinite-loop running – the thing just never stopped. I was getting a ton of ‘sigaction not implemented’ messages, so I wondered if Sollya was doing something really ugly with signals to handle exiting from a script. I thought about digging into it, then realised Sollya has an explicit ‘quit;’ command, so I added that to the bottom of the script:

print("Single precision:");
r=[1/sqrt(2)-1; sqrt(2)-1];
f=log2(1+x)/x;
p=fpminimax(f, 11, [|single...|], r, floating, relative);
p;
asd;
print("\nDouble precision:");
p=fpminimax(f, 21, [|double...|], r, floating, relative);
p;
quit;

and it ran and exited as expected.

> Single precision:
> Warning: at least one of the given expressions is not a constant but requires evaluation.
Evaluation is guaranteed to ensure the inclusion property. The approximate result is at least 165 bit accurate.
> > > 1.44269502162933349609375 + x * (-0.7213475704193115234375 + x * (0.4809020459651947021484375 + x * (-0.360668718814849853515625 + x * (0.2883343398571014404296875 + x * (-0.24055089056491851806640625 + x * (0.21089743077754974365234375 + x * (-0.1813324391841888427734375 + x * (0.10872711241245269775390625 + x * (-0.10412885248661041259765625 + x * (0.35098421573638916015625 + x * (-0.383228302001953125)))))))))))
> Warning: the identifier "asd" is neither assigned to, nor bound to a library function nor external procedure, nor equal to the current free variable.
Will interpret "asd" as "x".
x
> 
Double precision:
> > 1.44269504088896338700465094007086008787155151367187 + x * (-0.72134752044448169350232547003543004393577575683594 + x * (0.4808983469630028761976348050666274502873420715332 + x * (-0.36067376022224723053355432966782245784997940063477 + x * (0.288539008174513611493239295668900012969970703125 + x * (-0.24044917347913088989663776828820118680596351623535 + x * (0.20609929188248227172053361755388323217630386352539 + x * (-0.18033688048265933412395156665297690778970718383789 + x * (0.160299431107568057797152505372650921344757080078125 + x * (-0.144269475404082331282396012284152675420045852661133 + x * (0.13115467750388201673139576541871065273880958557129 + x * (-0.120225818807988840686284959247132064774632453918457 + x * (0.110964912764316969706612781010335311293601989746094 + x * (-0.103018221150312991318820365904684877023100852966309 + x * (9.6317404417675320238423353202961152419447898864746e-2 + x * (-9.0652910508716211257507211485062725841999053955078e-2 + x * (8.4035326134831819788750806310417829081416130065918e-2 + x * (-7.5783141066360651394440139938524225726723670959473e-2 + x * (7.650699022117241065998882731946650892496109008789e-2 + x * (-9.2331285631306825312236696845502592623233795166016e-2 + x * (8.7941823766079466051515112212655367329716682434082e-2 + x * (-3.8635539215562890447142052607887308113276958465576e-2)))))))))))))))))))))
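
As a quick sanity check on the output, the single precision polynomial can be evaluated with Horner’s method and compared against log2(1+x)/x. This is only a sketch: the coefficients are copied from the output above, the function names are my own, and the evaluation is done in Python’s double precision rather than in single precision as Abacus would do it.

```python
import math

# Coefficients copied from the single precision output, lowest degree first.
COEFFS = [
    1.44269502162933349609375,
    -0.7213475704193115234375,
    0.4809020459651947021484375,
    -0.360668718814849853515625,
    0.2883343398571014404296875,
    -0.24055089056491851806640625,
    0.21089743077754974365234375,
    -0.1813324391841888427734375,
    0.10872711241245269775390625,
    -0.10412885248661041259765625,
    0.35098421573638916015625,
    -0.383228302001953125,
]


def horner(coeffs, x):
    """Evaluate c0 + x * (c1 + x * (c2 + ...)), the form Sollya prints."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc


def reference(x):
    """The function being approximated; undefined at x == 0."""
    return math.log2(1.0 + x) / x


def relative_error(x):
    return abs(horner(COEFFS, x) - reference(x)) / abs(reference(x))


# Sample points inside the range [1/sqrt(2)-1; sqrt(2)-1] used in the script.
for x in (-0.25, 0.1, 0.25, 0.4):
    print(f"x={x:+.2f} poly={horner(COEFFS, x):.9f} ref={reference(x):.9f}")
```

The polynomial and the reference should agree to roughly single precision accuracy across the range.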

So now I have a JavaScript file that works when I run it through Node.js, but we’ve got a couple of issues:

  • The JavaScript is freaking huge!
  • We don’t want to require our developers to have Node.js installed either.

File size

Digging into Emscripten I found that there were a couple of options we could use:

  • -O3 – same as all compilers, we can specify that the compiler should optimize the code heavily.
  • --llvm-lto 2 – this enables all the optimizations to occur on the entire set of bitcode files once they are all linked together. This will allow for a ton more inlining to take place which should help our performance.

Adding both these options, the size of the produced sollya.js was 4.1MB! A whopping reduction of roughly 6.5x in file size – and it’s actually optimized properly now too.

Creating a standalone Windows binary?

So I’ve got sollya.js – and I can run this with Node.js on Windows and get actual valid polynomials. But I really want a standalone executable that has no dependencies, is this possible? Searching around, I found out about nexe – a way to bundle a Node.js application into a single executable. It basically puts Node.js and the JavaScript file into the same executable, and calls Node.js on the JavaScript at runtime. While this isn’t amazing – would it work?

First off – you have to use nexe on the platform you want to run the end executable on – so I copied the sollya.js from my VM to my Windows host, and then after installing nexe I ran:

nexe -i sollya.js -f -o sollya.exe

And what do you know – I can run sollya.exe and it works as expected. The downside is that because the executable is shipping an entire copy of Node.js with it – sollya.exe is a whopping 29MB to ship around.

Performance

I’ve compared the natively compiled sollya executable with the JavaScript variant. I ran them 50 times, and averaged out the results.

sollya (native)   sollya.js (Node.js)   JS vs native performance
1.37144s          4.93946s              3.6x slower

So as expected – given that we are running through JavaScript and Node.js, we are 3.6x slower than the natively compiled executable. I’m honestly surprised we are not slower (I’d heard horror stories of 90x slowdowns with Emscripten) so this seems not too bad to me.
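
For anyone wanting to repeat the comparison, a timing harness along these lines would do. It is a sketch and not the script I used for the numbers above: the function names are made up, and the demo at the bottom times the Python interpreter itself only so that the snippet runs anywhere.

```python
import subprocess
import sys
import time


def average_seconds(command, stdin_path=None, runs=50):
    """Run `command` `runs` times and return the mean wall-clock seconds."""
    total = 0.0
    for _ in range(runs):
        # Feed the script on stdin, as in `nodejs sollya.js < script.sollya`.
        stdin = open(stdin_path, "rb") if stdin_path else subprocess.DEVNULL
        try:
            start = time.perf_counter()
            subprocess.run(command, stdin=stdin, stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL, check=True)
            total += time.perf_counter() - start
        finally:
            if stdin_path:
                stdin.close()
    return total / runs


def slowdown(native_seconds, js_seconds):
    """How many times slower the JavaScript build is, to one decimal place."""
    return round(js_seconds / native_seconds, 1)


# Demo with a command that exists wherever Python does.
baseline = average_seconds([sys.executable, "-c", "pass"], runs=3)
print(f"{baseline:.5f}s per run")

# The averages from the table above.
print(f"{slowdown(1.37144, 4.93946)}x slower")
```

To reproduce the table you would call average_seconds(["sollya"], "script.sollya") and average_seconds(["nodejs", "sollya.js"], "script.sollya"), adjusting the executable names and script path to match your setup.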

Conclusion

It seems that with Emscripten, in combination with Node.js and nexe, we can compile a program on Linux to be run entirely on Windows – which is pretty freaking cool. There are probably many other more sane ways to do exactly this, but I find it pretty amazing that this is even possible. Now I can ‘natively’ run a Windows executable which will calculate all the polynomial approximations I need on Windows too – saving our team from needing a Linux VM whenever the polynomials have to be re-generated.

CMake script to build Sollya with Emscripten

In case anyone is interested, I use a CMake file to bring in all the dependencies and build Sollya using Emscripten.

cmake_minimum_required(VERSION 3.4)
project(emsollya)

include(ExternalProject)

ExternalProject_Add(libxml2
  PREFIX ${CMAKE_BINARY_DIR}/libxml2
  URL ftp://xmlsoft.org/libxml2/libxml2-git-snapshot.tar.gz
  PATCH_COMMAND NOCONFIGURE=1 sh ${CMAKE_BINARY_DIR}/libxml2/src/libxml2/autogen.sh
  CONFIGURE_COMMAND emconfigure ${CMAKE_BINARY_DIR}/libxml2/src/libxml2/configure
    --disable-shared --without-python --prefix=${CMAKE_BINARY_DIR}/libxml2
  BUILD_COMMAND make
  INSTALL_COMMAND make install
)

ExternalProject_Add(gmp
  PREFIX ${CMAKE_BINARY_DIR}/gmp
  URL https://gmplib.org/download/gmp/gmp-6.1.2.tar.bz2
  CONFIGURE_COMMAND emconfigure ${CMAKE_BINARY_DIR}/gmp/src/gmp/configure
    --disable-assembly --enable-cxx --disable-shared
    --prefix=${CMAKE_BINARY_DIR}/gmp
  BUILD_COMMAND make
  INSTALL_COMMAND make install
)

ExternalProject_Add(mpfr
  DEPENDS gmp
  PREFIX ${CMAKE_BINARY_DIR}/mpfr
  URL http://www.mpfr.org/mpfr-current/mpfr-3.1.6.tar.bz2
  CONFIGURE_COMMAND emconfigure ${CMAKE_BINARY_DIR}/mpfr/src/mpfr/configure
    --disable-shared --with-gmp=${CMAKE_BINARY_DIR}/gmp
    --prefix=${CMAKE_BINARY_DIR}/mpfr
  BUILD_COMMAND make
  INSTALL_COMMAND make install
)

ExternalProject_Add(mpfi
  DEPENDS gmp mpfr
  PREFIX ${CMAKE_BINARY_DIR}/mpfi
  URL https://gforge.inria.fr/frs/download.php/file/30129/mpfi-1.5.1.tar.bz2
  CONFIGURE_COMMAND emconfigure ${CMAKE_BINARY_DIR}/mpfi/src/mpfi/configure
    --disable-shared --with-gmp=${CMAKE_BINARY_DIR}/gmp
    --with-mpfr=${CMAKE_BINARY_DIR}/mpfr
    --prefix=${CMAKE_BINARY_DIR}/mpfi
  BUILD_COMMAND make
  INSTALL_COMMAND make install
)

ExternalProject_Add(fplll
  DEPENDS gmp mpfr
  PREFIX ${CMAKE_BINARY_DIR}/fplll
  GIT_REPOSITORY https://github.com/fplll/fplll.git
  GIT_TAG cd47f76b017762317245de7878c7b41eff9ab5d0
  PATCH_COMMAND sh ${CMAKE_BINARY_DIR}/fplll/src/fplll/autogen.sh
  CONFIGURE_COMMAND emconfigure ${CMAKE_BINARY_DIR}/fplll/src/fplll/configure
    --disable-shared --with-gmp=${CMAKE_BINARY_DIR}/gmp
    --with-mpfr=${CMAKE_BINARY_DIR}/mpfr
    --prefix=${CMAKE_BINARY_DIR}/fplll
  BUILD_COMMAND make
  INSTALL_COMMAND make install
)

ExternalProject_Add(sollya
  DEPENDS gmp mpfr mpfi fplll libxml2
  PREFIX ${CMAKE_BINARY_DIR}/sollya
  URL http://sollya.gforge.inria.fr/sollya-weekly-11-05-2017.tar.bz2
  PATCH_COMMAND sed -i -e "s/as_fn_error .. \"libdl unusable\"/echo \"skipped\"\#/"
    ${CMAKE_BINARY_DIR}/sollya/src/sollya/configure
  CONFIGURE_COMMAND EMCONFIGURE_JS=1 emconfigure
    ${CMAKE_BINARY_DIR}/sollya/src/sollya/configure
    --disable-shared --with-gmp=${CMAKE_BINARY_DIR}/gmp
    --with-fplll=${CMAKE_BINARY_DIR}/fplll
    --with-mpfi=${CMAKE_BINARY_DIR}/mpfi
    --with-mpfr=${CMAKE_BINARY_DIR}/mpfr
    --with-xml2=${CMAKE_BINARY_DIR}/libxml2
    --prefix=${CMAKE_BINARY_DIR}/sollya
  BUILD_COMMAND make
  INSTALL_COMMAND make install
)

ExternalProject_Get_Property(sollya BINARY_DIR)

add_custom_command(OUTPUT ${CMAKE_BINARY_DIR}/sollya.js
  COMMAND ${CMAKE_COMMAND} -E copy ${BINARY_DIR}/sollya ${CMAKE_BINARY_DIR}/sollya.bc
  COMMAND emcc --memory-init-file 0 -O3 --llvm-lto 2 ${CMAKE_BINARY_DIR}/sollya.bc -o ${CMAKE_BINARY_DIR}/sollya.js
  DEPENDS ${BINARY_DIR}/sollya
)

add_custom_target(sollya_js ALL DEPENDS ${CMAKE_BINARY_DIR}/sollya.js)

add_dependencies(sollya_js sollya)
06 Nov

LLVM & CRT – auto-magically selecting the correct CRT

LLVM comes with a really useful set of options, LLVM_USE_CRT_<config>, which allow you to specify a different C runtime (CRT) when compiling with Visual Studio. If you want to compile LLVM as a release build, but compile some code that uses it in debug (e.g. our ComputeAorta product that allows customers to implement OpenCL/Vulkan on their hardware insanely quickly), Visual Studio will complain about mixing differing versions of the CRT. By using the LLVM_USE_CRT_<config> options, we can specify that LLVM compiles in a release build, but using a debug CRT.

There is one annoying catch with this though – LLVM can be expensive to build. We’ll average 10 minutes for a full build of LLVM. We don’t want to recompile LLVM, and we don’t want to be constantly building different copies of LLVM for 2-4 different versions of the CRT every time we pull in the latest commits. We want to be able to change a ComputeAorta build from debug/release without having to rebuild LLVM, and we want all this to just work™ without any manual input from a developer.

Changing LLVM

So what we need to do is detect which CRT LLVM was built against. My first thought was to allow LLVM to export which CRT it was built against into an LLVM install. LLVM already outputs an LLVMConfig.cmake during its install process, so why not just record what CRT was used too? I contacted the LLVM mailing list asking... and got no response. I’ve found that, in general, this is a common occurrence if you are not a super active contributor located in the Bay Area. Not wanting to be that guy who nags on the mailing list about things no-one else clearly cares about, how else could I solve it?

Detecting the CRT

So I reasoned that since the Visual Studio linker could detect and give me a good error message when I was accidentally mixing CRT versions, there must be some information recorded in the library files produced from Visual Studio that said which CRT the library was linked against. Using dumpbin.exe (which is included with Visual Studio) I first called:

$ dumpbin /? 
Microsoft (R) COFF/PE Dumper Version 14.00.24215.1 
Copyright (C) Microsoft Corporation. All rights reserved. 
 
usage: DUMPBIN [options] [files] 
 
 options: 
 
 /ALL 
 /ARCHIVEMEMBERS 
 /CLRHEADER 
 /DEPENDENTS 
 /DIRECTIVES 
 /DISASM[:{BYTES|NOBYTES}] 
 /ERRORREPORT:{NONE|PROMPT|QUEUE|SEND} 
 /EXPORTS 
 /FPO 
 /HEADERS 
 /IMPORTS[:filename] 
 /LINENUMBERS 
 /LINKERMEMBER[:{1|2}] 
 /LOADCONFIG 
 /NOLOGO 
 /OUT:filename 
 /PDATA 
 /PDBPATH[:VERBOSE] 
 /RANGE:vaMin[,vaMax] 
 /RAWDATA[:{NONE|1|2|4|8}[,#]] 
 /RELOCATIONS 
 /SECTION:name 
 /SUMMARY 
 /SYMBOLS 
 /TLS 
 /UNWINDINFO

And through a process of elimination I ran the ‘/DIRECTIVES’ option against one of the .lib files in LLVM which gave:

$ dumpbin /DIRECTIVES LLVMCore.lib
Microsoft (R) COFF/PE Dumper Version 14.00.24215.1
Copyright (C) Microsoft Corporation.  All rights reserved.


Dump of file LLVMCore.lib

File Type: LIBRARY

   Linker Directives
   -----------------
   /FAILIFMISMATCH:_MSC_VER=1900
   /FAILIFMISMATCH:_ITERATOR_DEBUG_LEVEL=2
   /FAILIFMISMATCH:RuntimeLibrary=MDd_DynamicDebug
   /DEFAULTLIB:msvcprtd
   /FAILIFMISMATCH:_CRT_STDIO_ISO_WIDE_SPECIFIERS=0
   /FAILIFMISMATCH:LLVM_ENABLE_ABI_BREAKING_CHECKS=1
   /DEFAULTLIB:MSVCRTD
   /DEFAULTLIB:OLDNAMES

...

And what do you know ‘/FAILIFMISMATCH:RuntimeLibrary=MDd_DynamicDebug’ is telling the linker to output an error message if the CRT is not the dynamic debug variant! So now I have a method of detecting the CRT from one of LLVM’s libraries, how to incorporate that in our build?

CMake Integration

LLVM uses CMake for its builds, and thus we also use CMake for our builds. We already include LLVM by specifying the location of an LLVM install like:

$ cmake -DCA_LLVM_INSTALL_DIR=<directory> .
-- Overriding option 'CA_LLVM_INSTALL_DIR' to '<directory>' (default was '').

And then within our CMake we do:

# Setup LLVM/Clang search paths.
list(APPEND CMAKE_MODULE_PATH
  ${CA_LLVM_INSTALL_DIR}/lib/cmake/llvm
  ${CA_LLVM_INSTALL_DIR}/lib/cmake/clang)

# Include LLVM.
include(LLVMConfig)

# Include Clang.
include(ClangTargets)

So I added a new DetectLLVMMSVCCRT.cmake to our CMake modules and included it just after the ClangTargets include. This does the following:

  • Get the directory of CMAKE_C_COMPILER (always cl.exe in our case).
  • Look for dumpbin.exe in the same directory.
  • Get the location of LLVMCore.lib.
    • My reasoning is that most libraries in LLVM could change over time, but the core library of LLVM is unlikely to be moved (I hope!).
  • Run dumpbin /DIRECTIVES LLVMCore.lib
    • Find the first usage of ‘/FAILIFMISMATCH:RuntimeLibrary=’
    • Get the string that occurs between ‘/FAILIFMISMATCH:RuntimeLibrary=’ and the next ‘_’

And then we’ve got the CRT we need to use to build with. To actually set the CRT to use, we can just call LLVM’s ChooseMSVCCRT.cmake (that ships in an LLVM install), specifying the LLVM_USE_CRT_<config> variables and voila, we’ll be using the same CRT as LLVM, and get no linker errors!

The full CMake script is:

if(NOT CMAKE_SYSTEM_NAME STREQUAL Windows)
  return()
endif()

# Get the directory of cl.exe
get_filename_component(tools_dir "${CMAKE_C_COMPILER}" DIRECTORY)

# Find the dumpbin.exe executable in the directory of cl.exe
find_program(dumpbin "dumpbin.exe" PATHS "${tools_dir}" NO_DEFAULT_PATH)

if("${dumpbin}" STREQUAL "dumpbin-NOTFOUND")
  message(WARNING "Could not detect which CRT LLVM was built against - "
                  "could not find 'dumpbin.exe'.")
  return()
endif()

# Get the location in the file-system of LLVMCore.lib
get_target_property(llvmcore LLVMCore LOCATION)

if("${llvmcore}" STREQUAL "llvmcore-NOTFOUND")
  message(WARNING "Could not detect which CRT LLVM was built against - "
                  "could not find location of 'LLVMCore.lib'.")
  return()
endif()

# Get the directives that LLVMCore.lib contains
execute_process(COMMAND "${dumpbin}" "/DIRECTIVES" "${llvmcore}"
  OUTPUT_VARIABLE output)

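# Find the first directive specifying what CRT to use
string(FIND "${output}" "/FAILIFMISMATCH:RuntimeLibrary=" position)

# Bail out if the directive is missing (string(FIND) reports -1)
if(position EQUAL -1)
  message(WARNING "Could not detect which CRT LLVM was built against - "
                  "no RuntimeLibrary directive found in 'LLVMCore.lib'.")
  return()
endif()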

# Strip away everything but the directive we want to examine
string(SUBSTRING "${output}" ${position} 128 output)

# Remove the directive prefix which we don't need
string(REPLACE "/FAILIFMISMATCH:RuntimeLibrary=" "" output "${output}")

# Get the position of the '_' character that breaks the CRT from all else
string(FIND "${output}" "_" position)

# Substring output to be one of the four CRT values: MDd MD MTd MT
string(SUBSTRING "${output}" 0 ${position} output)

# Set all possible CMAKE_BUILD_TYPE's to the CRT that LLVM was linked against
set(LLVM_USE_CRT_DEBUG "${output}")
set(LLVM_USE_CRT_RELWITHDEBINFO "${output}")
set(LLVM_USE_CRT_MINSIZEREL "${output}")
set(LLVM_USE_CRT_RELEASE "${output}")

# Include the LLVM cmake module to choose the correct CRT
include(ChooseMSVCCRT)
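
The string handling at the heart of the script is easy to try outside of CMake. Here is a minimal Python sketch of the same find-and-slice logic, run against a trimmed copy of the dumpbin output shown earlier. The function name is my own invention, and unlike the CMake above it returns None when the directive is missing.

```python
PREFIX = "/FAILIFMISMATCH:RuntimeLibrary="


def detect_crt(dumpbin_output):
    """Return the CRT name (MDd, MD, MTd or MT) or None if it is not found."""
    start = dumpbin_output.find(PREFIX)
    if start == -1:
        return None
    # Everything after the directive prefix, e.g. "MDd_DynamicDebug\n ..."
    rest = dumpbin_output[start + len(PREFIX):]
    # The '_' separates the CRT name from its description.
    end = rest.find("_")
    if end == -1:
        return None
    return rest[:end]


# Trimmed copy of the `dumpbin /DIRECTIVES LLVMCore.lib` output from above.
SAMPLE = """\
   Linker Directives
   -----------------
   /FAILIFMISMATCH:_MSC_VER=1900
   /FAILIFMISMATCH:_ITERATOR_DEBUG_LEVEL=2
   /FAILIFMISMATCH:RuntimeLibrary=MDd_DynamicDebug
   /DEFAULTLIB:msvcprtd
"""

print(detect_crt(SAMPLE))  # prints MDd
```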

Conclusion

We’ve been able to do what we set out to do – auto-magically make our project that uses an LLVM install work reliably even with mixed Debug/Release builds. This has reduced the number of LLVM compiles I do daily by 2x (yay) and also allowed me to stop tracking (and caring) about CRT conflicts and how to avoid them.