In this blog post, I will first explain the WebAssembly binary format and its sections. Then, I’ll demonstrate how to create a valid polyglot wasm module that contains an HTML/JS payload, embedded using two different techniques. Finally, I’ll give you the link to the GitHub repository if you want to try it on your own and learn more about WebAssembly.
Let’s start with Wikipedia’s definition:
a polyglot is a computer program or script written in a valid form of multiple programming languages, which performs the same operations or output independent of the programming language used to compile or interpret it.
- “Funky File Formats” talk by @angealbertini – slides, video
- Compilation of polyglot resources – github
- 𝑻𝒓𝒖𝒆𝑷𝒐𝒍𝒚𝒈𝒍𝒐𝒕 project – website, github
So, the first bytes of a valid WebAssembly module are the magic bytes ‘\0asm’ (i.e. a null byte followed by asm). They are followed by a 4-byte version number, fixed to the value 0x1 since the release of MVP 1.0.
This two-field preamble is enough to create a valid WebAssembly module. Note that if you change the value of the version field, some WebAssembly parsers and VMs will reject your module.
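To make the preamble concrete, here is a minimal Python sketch that builds and checks those 8 bytes. The helper names (build_preamble, has_valid_preamble) are made up for this illustration and do not come from the repository.

```python
import struct

WASM_MAGIC = b"\0asm"   # null byte followed by "asm"
WASM_VERSION = 1        # fixed to 0x1 since MVP 1.0


def build_preamble() -> bytes:
    """Return the 8-byte preamble: magic bytes + little-endian uint32 version."""
    return WASM_MAGIC + struct.pack("<I", WASM_VERSION)


def has_valid_preamble(module: bytes) -> bool:
    """Check that a byte string starts with the expected magic and version."""
    if len(module) < 8 or module[:4] != WASM_MAGIC:
        return False
    (version,) = struct.unpack("<I", module[4:8])
    return version == WASM_VERSION


empty_module = build_preamble()
print(empty_module.hex(" "))  # 00 61 73 6d 01 00 00 00
```

Writing empty_module to a file gives the smallest module described above: a preamble with no sections.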
The module preamble is followed by a sequence of sections. Each section is identified by a 1-byte section code (0-11) that encodes either a known section or a custom section. Each known section is optional and may appear at most once.
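In the binary format, each section code is followed by the size of the section body, encoded as an unsigned LEB128 integer, which makes it easy to walk a module section by section. The sketch below is only an illustration of that layout: the function names are hypothetical and the demo bytes are a tiny hand-assembled module (preamble plus one memory section), not one of the modules from this post.

```python
def read_uleb128(buf: bytes, pos: int):
    """Decode an unsigned LEB128 integer at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def list_sections(module: bytes):
    """Return a list of (section_id, body_offset, body_size) tuples."""
    sections = []
    pos = 8  # skip the 8-byte preamble (magic + version)
    while pos < len(module):
        section_id = module[pos]
        size, body_start = read_uleb128(module, pos + 1)
        end = body_start + size
        if end > len(module):
            raise ValueError("section body runs past the end of the module")
        sections.append((section_id, body_start, size))
        pos = end
    return sections


# Preamble followed by a memory section (id 5) with a 3-byte body.
demo = bytes.fromhex("0061736d01000000" "0503010001")
print(list_sections(demo))  # [(5, 10, 3)]
```

A section id of 0 in the output would indicate a custom section; ids 1 to 11 are the known sections.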
I won't go into more detail on the content of each section, but if you would like to learn more, I suggest you check out the following resources:
- WebAssembly Binary Encoding – link
- Introduction to WebAssembly – link
- Understanding WebAssembly text format – link
Let’s start with a basic “HelloWorld”. WebAssembly modules can be compiled from C/C++/Rust/… source code or directly written using the WebAssembly text representation (wat/wast).
This module returns the offset of the string “hello from WebAssembly !” when the function hello is called by a VM. This offset is a pointer to the string stored in linear memory, i.e. the memory shared by the wasm module and the WebAssembly VM.
As you can see, the data section allows us to store completely arbitrary strings inside a module, which is exactly what we need to inject HTML/JS payloads into our wasm module.
The data section is not the only section that can be used to store arbitrary strings. The custom section has been designed exactly with this goal in mind. For example, if a developer wants to store DWARF debug information inside a module, the compiler embeds a bunch of custom sections in the wasm file itself, each with a different DWARF section name (e.g. .debug_info, .debug_line, etc.).
Injecting arbitrary strings and payloads using this technique only requires you to calculate the correct length fields to ensure that the custom section is valid.
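The length calculation can be sketched in a few lines of Python. A custom section is the section code 0, the body size as an unsigned LEB128 integer, and then a body made of a length-prefixed UTF-8 name followed by arbitrary bytes. The function names and the "poly" section name below are placeholders chosen for this example.

```python
def uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def custom_section(name: str, payload: bytes) -> bytes:
    """Build a custom section: id 0, body size, then name vector + payload."""
    encoded_name = name.encode("utf-8")
    body = uleb128(len(encoded_name)) + encoded_name + payload
    return b"\x00" + uleb128(len(body)) + body


section = custom_section("poly", b"<b>hi</b>")
print(section.hex(" "))
```

Note that the size field covers the name length, the name and the payload, so a payload of 128 bytes or more makes the size field grow to two bytes or more.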
Since I'm not a polyglot expert, I asked @angealbertini, who kindly provided me with this example HTML/JS payload. In short, the payload uses innerHTML to prevent the browser from parsing the entire file. More details about this payload and trick can be found here.
Finally, this payload needs to be embedded inside a WebAssembly module using either the data section or a custom section.
Let’s start with the simplest technique first. I chose to directly modify the WebAssembly text representation of the previous HelloWorld module by adding an extra line to it:
(data (i32.const 42) "PAYLOAD_HERE")
This line creates a new “segment” in the data section of the module, with the payload inside. I then translated this wasm text file into a wasm module (named wasmpolyglotdata.wasm) using wat2wasm.
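For readers curious about what such a data line turns into at the byte level, here is a rough Python sketch that hand-assembles a tiny module with one memory and one data segment at offset 42. It is not the HelloWorld module from this post and not what wat2wasm literally emits for it; the function names are hypothetical and the module deliberately contains nothing but a memory section and a data section.

```python
def uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def sleb128(value: int) -> bytes:
    """Encode a signed integer as signed LEB128 (used by i32.const)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        done = (value == 0 and not byte & 0x40) or (value == -1 and byte & 0x40)
        out.append(byte if done else byte | 0x80)
        if done:
            return bytes(out)


def section(section_id: int, body: bytes) -> bytes:
    """Prefix a section body with its 1-byte code and LEB128 size."""
    return bytes([section_id]) + uleb128(len(body)) + body


def data_segment(offset: int, payload: bytes) -> bytes:
    # memory index 0, offset expression (i32.const <offset>; end), byte vector
    return b"\x00" + b"\x41" + sleb128(offset) + b"\x0b" + uleb128(len(payload)) + payload


def module_with_data(offset: int, payload: bytes) -> bytes:
    preamble = b"\0asm\x01\x00\x00\x00"
    memory = section(5, b"\x01\x00\x01")  # 1 memory, no maximum, minimum 1 page
    data = section(11, b"\x01" + data_segment(offset, payload))  # 1 segment
    return preamble + memory + data


module = module_with_data(42, b"PAYLOAD_HERE")
print(module.hex(" "))
```

The payload bytes appear verbatim at the end of the output, which is the property the polyglot relies on.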
For this second technique, I wrote a simple Python script (available here) that takes my payload and my HelloWorld module and concatenates them into a single binary file (named wasmpolyglotcustom.wasm). Note that I injected the custom section (with the HTML/JS payload inside) at the beginning of the final module, i.e. just after the WebAssembly header.
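The idea behind such a script can be sketched as follows. This is not the script from the repository, only a minimal illustration under stated assumptions: the function name inject_custom_section, the section name "html", the placeholder payload and the tiny host module are all made up for the example.

```python
def uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def inject_custom_section(module: bytes, name: str, payload: bytes) -> bytes:
    """Insert a custom section right after the 8-byte preamble."""
    if module[:8] != b"\0asm\x01\x00\x00\x00":
        raise ValueError("not a WebAssembly 1.0 module")
    encoded_name = name.encode("utf-8")
    body = uleb128(len(encoded_name)) + encoded_name + payload
    custom = b"\x00" + uleb128(len(body)) + body
    # header + new custom section + all the original sections, unchanged
    return module[:8] + custom + module[8:]


# Stand-in host module: preamble + one memory section (hand-assembled).
host = bytes.fromhex("0061736d01000000" "0503010001")
payload = b"<!-- payload placeholder -->"
polyglot = inject_custom_section(host, "html", payload)
print(len(host), len(polyglot))
```

Because custom sections may appear anywhere between the known sections, placing one directly after the header leaves the rest of the module untouched.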
After you have done this, you can verify the internal structure of your module using wasmcodeexplorer - it should look somewhat similar to the following screenshot:
First, we need to check if our new WebAssembly modules are still valid:
For this you can use standalone tools like wasm-validate or you can directly try to instantiate those modules with a WebAssembly VM like
hello() and finally print the HelloWorld string.
Now try to change the polyglot file extension from .wasm to .html, open the file in your browser, and you should see the following alert:
If you try to directly fetch the wasm_polyglot.html file using
Failed to execute 'compile' on 'WebAssembly': Incorrect response MIME type. Expected 'application/wasm'.
You can bypass this MIME verification by fetching the module, storing the buffer content and finally compiling/instantiating the module using other WebAssembly APIs.
I hope that you learned and discovered some new WebAssembly tricks in this blog post! If you want to dive deeper into polyglots after reading this, you can try to apply these techniques to create PDFs, GIFs and other polyglot WebAssembly modules. Thanks again to Ange Albertini for the payload and advice regarding polyglots!
All the WebAssembly modules, JS files and scripts shown in this blog post are available in this GitHub repository.
About the Author
Patrick is a security researcher focused on fuzzing, reverse engineering and vulnerability research targeting WebAssembly and Rust security.