WebAssembly: How to create a polyglot HTML/JS/WebAssembly module
In this blogpost, I will first explain the WebAssembly binary format and its sections. Then, I’ll demonstrate how to create a valid polyglot wasm module that contains an embedded HTML/JS payload, using 2 different techniques. Finally, I’ll give you the link to the GitHub repository if you want to try it on your own and learn more about WebAssembly.
1. What's a polyglot file?
Let’s start with Wikipedia’s definition:
a polyglot is a computer program or script written in a valid form of multiple programming languages, which performs the same operations or output independent of the programming language used to compile or interpret it.
In my opinion, this definition is too restrictive and too specific to “Polyglot Programming”. A polyglot file, once executed or parsed by different programs, will rarely produce the same output. For example, JavaScript/BMP polyglot files have been used in the wild to hide malicious payloads that only get executed if the file is interpreted as JavaScript. If you want to discover more about polyglot files, you should take a look at the following resources:
- “Funky File Formats” talk by @angealbertini – slides, video
- Compilation of polyglot resources – github
- 𝑻𝒓𝒖𝒆𝑷𝒐𝒍𝒚𝒈𝒍𝒐𝒕 project – website, github
2. The WebAssembly binary format, in short.
The first bytes of a valid WebAssembly module are the magic bytes ‘\0asm’ (i.e. a null byte followed by asm). They are followed by a 4-byte version number, fixed to the value 0x1 since the release of MVP 1.0.
This preamble of 2 fields is enough to create a valid WebAssembly module. Note that if we try to change the value of the version field some WebAssembly parsers and VMs will reject your module.
The module preamble is followed by a sequence of sections. Each section is identified by a 1-byte section code (0-11) that encodes either a known section or a custom section. Each known section is optional and may appear at most once.
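To make this layout concrete, here is a minimal Python sketch that checks the preamble and walks the section headers. It assumes the standard encoding, where each 1-byte section code is followed by the section size as an unsigned LEB128 integer. The helper names (read_uleb128, list_sections) are mine and purely illustrative.

```python
import struct

# Section codes 0-11 as defined for the MVP binary format.
SECTION_NAMES = {
    0: "custom", 1: "type", 2: "import", 3: "function", 4: "table",
    5: "memory", 6: "global", 7: "export", 8: "start", 9: "element",
    10: "code", 11: "data",
}

def read_uleb128(buf, pos):
    """Decode an unsigned LEB128 integer; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7

def list_sections(module):
    """Check the 8-byte preamble, then return (section name, size) pairs."""
    if module[:4] != b"\0asm":
        raise ValueError("missing \\0asm magic bytes")
    (version,) = struct.unpack_from("<I", module, 4)  # little-endian u32
    if version != 1:
        raise ValueError(f"unexpected version {version}")
    sections = []
    pos = 8
    while pos < len(module):
        code = module[pos]
        size, pos = read_uleb128(module, pos + 1)
        sections.append((SECTION_NAMES.get(code, f"unknown({code})"), size))
        pos += size  # skip over the section content
    return sections

# The preamble alone is already a module with zero sections.
EMPTY_MODULE = b"\0asm" + struct.pack("<I", 1)
print(list_sections(EMPTY_MODULE))  # prints []
```

This only inspects the section framing; it does not validate what each section contains, so it is not a replacement for a real validator.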
I won't go into more detail on the content of each section, but if you would like to learn more, I suggest you check out the following resources:
- WebAssembly Binary Encoding – link
- Introduction to WebAssembly – link
- Understanding WebAssembly text format – link
3. HelloWorld & the WebAssembly data section
Let’s start with a basic “HelloWorld”. WebAssembly modules can be compiled from C/C++/Rust/… source code or directly written using the WebAssembly text representation (wat/wast).
This module returns the offset of the string “hello from WebAssembly !” when the function hello() is called by a VM. This offset is a pointer to the string stored in linear memory, i.e. the memory shared by the wasm module and the WebAssembly VM.
As you can see, the data section allows us to store completely arbitrary strings inside a module, which is exactly what we need to inject an HTML/JS payload into our wasm module.
4. WebAssembly custom section
The data section is not the only section that can be used to store arbitrary strings. The custom section has been designed exactly with this goal in mind. For example, if a developer wants to store DWARF debug information inside a module, several custom sections are embedded in the wasm file itself during compilation, each with a different DWARF section name (e.g. .debug_info, .debug_line, etc.).
Injecting arbitrary strings and payloads using this technique only requires you to calculate the correct length fields to ensure that the custom section is valid.
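As a rough illustration of that length calculation, the following Python sketch builds a custom section by hand. It relies on the standard layout: section code 0, then the section size, then the section name as a length-prefixed UTF-8 string, then the raw payload bytes, with both lengths encoded as unsigned LEB128. The section name "polyglot" and the payload are placeholders of mine, not values from the original module.

```python
def encode_uleb128(value):
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def build_custom_section(name, payload):
    """Return the bytes of a custom section (code 0) wrapping `payload`."""
    name_bytes = name.encode("utf-8")
    # The section size covers the name length field, the name and the payload.
    content = encode_uleb128(len(name_bytes)) + name_bytes + payload
    return b"\x00" + encode_uleb128(len(content)) + content

# Hypothetical name and payload, for illustration only.
section = build_custom_section("polyglot", b"<script>alert(1)</script>")
print(section.hex())
```

Note that the size field is variable-length: a payload longer than 127 bytes already needs a 2-byte size, which is why hardcoding a single length byte breaks as soon as the payload grows.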
5. Awesome HTML/JS payload needed!
Since I'm not a polyglot expert, I asked @angealbertini and he kindly provided me with this example HTML/JS payload. In short, the payload uses innerHTML to prevent the browser from parsing the entire file. More details about this payload and trick can be found here.
Finally, this payload needs to be embedded inside a WebAssembly module using either the data section or a custom section.
6. Data section injection (First technique)
Let’s start with the simplest technique first. I chose to directly modify the WebAssembly text representation of the previous HelloWorld module by adding an extra line to it: (data (i32.const 42) "PAYLOAD_HERE")
This line creates a new “segment” in the data section of the module, with the payload inside. Then, I translated this wasm text file to a wasm module (named wasm_polyglot_data.wasm) using wat2wasm.
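One practical detail when pasting an HTML/JS payload into a wat string literal is that double quotes, backslashes and non-printable bytes must be escaped. A small helper along the following lines can generate the data line. This is only a sketch of mine (the function names are hypothetical and it is not part of the original workflow), using the two-hex-digit escape form that the text format accepts inside strings.

```python
def wat_escape(payload):
    """Escape raw bytes for use inside a wat string literal."""
    out = []
    for b in payload:
        # Printable ASCII is kept as-is, except '"' (0x22) and '\' (0x5c).
        if 0x20 <= b < 0x7F and b not in (0x22, 0x5C):
            out.append(chr(b))
        else:
            out.append(f"\\{b:02x}")  # two-hex-digit byte escape
    return "".join(out)

def data_segment_line(payload, offset=42):
    """Build a (data ...) line placing `payload` at `offset` in linear memory."""
    return f'(data (i32.const {offset}) "{wat_escape(payload)}")'

# Placeholder payload, for illustration only.
print(data_segment_line(b'<script>alert("hi")</script>\n'))
```

The resulting line can then be pasted into the HelloWorld text file before running wat2wasm, as described above.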
7. Custom section injection (Second technique)
For this second technique, I created a simple Python script (available here) that takes my payload and my HelloWorld module and combines them into one single binary file (named wasm_polyglot_custom.wasm). Note that I injected the custom section (with the HTML/JS payload inside) at the beginning of the final module, i.e. just after the WebAssembly header.
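The splice itself can be sketched in a few lines of Python. This is not the script linked above, only a minimal approximation of the same idea under my own assumptions: the function names are hypothetical, and it relies on the fact that custom sections may appear anywhere between the other sections, including right after the 8-byte header.

```python
WASM_HEADER_SIZE = 8  # 4 magic bytes + 4 version bytes

def uleb128(value):
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte, value = value & 0x7F, value >> 7
        out.append(byte | 0x80 if value else byte)
        if not value:
            return bytes(out)

def inject_custom_section(module, name, payload):
    """Insert a custom section holding `payload` right after the header."""
    if module[:4] != b"\0asm":
        raise ValueError("not a WebAssembly module")
    name_bytes = name.encode("utf-8")
    content = uleb128(len(name_bytes)) + name_bytes + payload
    section = b"\x00" + uleb128(len(content)) + content
    return module[:WASM_HEADER_SIZE] + section + module[WASM_HEADER_SIZE:]

# Demo on a tiny module: header plus a memory section with one 1-page memory.
tiny = b"\0asm\x01\x00\x00\x00" + b"\x05\x03\x01\x00\x01"
polyglot = inject_custom_section(tiny, "payload", b"<b>hi</b>")
print(polyglot.hex())
```

To apply this to real files, read the HelloWorld module and the payload in binary mode, pass them to inject_custom_section(), and write the returned bytes to wasm_polyglot_custom.wasm.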
After you have done this, you can verify the internal structure of your module using wasmcodeexplorer - it should look somewhat similar to the following screenshot:
8. Is it working?
First, we need to check if our new WebAssembly modules are still valid:
- wasm_polyglot_data.wasm
- wasm_polyglot_custom.wasm
For this you can use standalone tools like wasm-validate, or you can directly try to instantiate those modules with a WebAssembly VM like wasmer, wasmtime or WAVM. From JavaScript, you can use the WebAssembly.validate() or WebAssembly.instantiate() APIs.
For verification, run a web server locally and open this script (picture on the right – source here). You should see some messages in the JavaScript console. In short, this script will fetch our polyglot wasm module, call the exported function hello() and finally print the HelloWorld string.
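If you don't have a local web server at hand, Python's standard library is enough. The sketch below is one possible setup, not the one used for this post: the class name and the port are arbitrary choices of mine. It serves the current directory and makes sure .wasm files are sent with the application/wasm MIME type.

```python
from http.server import HTTPServer, SimpleHTTPRequestHandler

class WasmHandler(SimpleHTTPRequestHandler):
    # Extend the default extension map so .wasm files are always served
    # with the MIME type that the streaming WebAssembly APIs expect.
    extensions_map = {
        **SimpleHTTPRequestHandler.extensions_map,
        ".wasm": "application/wasm",
    }

def serve(port=8000):
    """Serve the current directory on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), WasmHandler).serve_forever()

# Call serve() from the directory that contains the modules and the script.
```

After calling serve(), the files are reachable at http://127.0.0.1:8000/ in your browser.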
Now try to change the polyglot file extension from .wasm to .html, open the file in your browser, and you should see the following alert:
If you try to directly fetch the wasm_polyglot.html file using instantiateStreaming(), you will get a JavaScript error message: Failed to execute 'compile' on 'WebAssembly': Incorrect response MIME type. Expected 'application/wasm'.
You can bypass this MIME check by fetching the module yourself, storing the response body in a buffer, and finally compiling/instantiating the module from that buffer using the non-streaming WebAssembly APIs (e.g. WebAssembly.instantiate()).
9. Conclusion
I hope that you learned and discovered some new WebAssembly tricks in this blogpost! If you want to dive deeper into polyglots after reading this, you can try to apply these techniques to create PDFs, GIFs and other polyglot WebAssembly modules. Thanks again to Ange Albertini for the payload and advice regarding polyglots!
All the WebAssembly modules, JS files and scripts shown in this blogpost are available in this GitHub repository.
About the Author
Patrick Ventuzelo
Patrick is a security researcher focused on fuzzing, reverse engineering and vulnerability research targeting WebAssembly and Rust security.