WebAssembly: How to create a polyglot HTML/JS/WebAssembly module

Patrick Ventuzelo
January 03, 2020

In this blogpost, I will first explain the WebAssembly binary format and its sections. Then, I’ll demonstrate how to create a valid polyglot wasm module that contains an embedded HTML/JS payload, using two different techniques. Finally, I’ll give you the link to the GitHub repository so you can try it on your own and learn more about WebAssembly.

If you are interested in learning more about WebAssembly security, check out my February training in Berlin; we still have early-bird pricing!

1. What's a polyglot file?

Let’s start with Wikipedia’s definition:

a polyglot is a computer program or script written in a valid form of multiple programming languages, which performs the same operations or output independent of the programming language used to compile or interpret it.

In my opinion, this definition is too restrictive and too specific to “polyglot programming”. A polyglot file, when executed or parsed by different programs, will rarely produce the same output. For example, JavaScript/BMP polyglot files have been used in the wild to hide malicious payloads that only get executed if the file is interpreted as JavaScript. If you want to discover more about polyglot files, take a look at the following resources:

  • “Funky File Formats” talk by @angealbertini – slides, video
  • Compilation of polyglot resources – github
  • 𝑻𝒓𝒖𝒆𝑷𝒐𝒍𝒚𝒈𝒍𝒐𝒕 project – website, github

2. The WebAssembly binary format, in short

The first four bytes of a valid WebAssembly module are the magic bytes ‘\0asm’ (i.e. a null byte followed by the ASCII string asm). They are followed by a 4-byte version number, stored in little-endian order and fixed to the value 0x1 since the release of the MVP 1.0.

The WebAssembly module header

This preamble of two fields is enough to create a valid WebAssembly module. Note that if you change the value of the version field, spec-compliant WebAssembly parsers and VMs will reject the module.

A minimal wasm module in a single line
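
A minimal sketch of that preamble in Python (the constant and function names are our own, chosen for illustration): the magic bytes followed by the version 1 encoded as a 4-byte little-endian integer, which is all an empty but valid module contains.

```python
import struct

# Magic bytes: a null byte followed by the ASCII letters "asm".
WASM_MAGIC = b"\x00asm"
# Version 1 (MVP), stored as a 4-byte little-endian integer: 01 00 00 00.
WASM_VERSION = struct.pack("<I", 1)

# The smallest valid module: the 8-byte preamble and no sections at all.
MINIMAL_MODULE = WASM_MAGIC + WASM_VERSION


def has_wasm_preamble(data: bytes) -> bool:
    """Return True if data starts with the wasm magic bytes and version 1."""
    return data[:4] == WASM_MAGIC and data[4:8] == WASM_VERSION


print(MINIMAL_MODULE.hex(" "))  # prints: 00 61 73 6d 01 00 00 00
```

Writing MINIMAL_MODULE to a file gives an 8-byte module; changing WASM_VERSION to any other value is enough to make it invalid.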

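The module preamble is followed by a sequence of sections. Each section starts with a 1-byte section id (0-11 in the MVP), followed by the size of its content encoded as an unsigned LEB128 integer. Id 0 denotes a custom section, while ids 1 to 11 denote the known sections. Each known section is optional, may appear at most once, and must appear in a prescribed order, whereas custom sections can appear any number of times and anywhere in the sequence.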

The known module sections of WebAssembly
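
To make this layout concrete, here is a small Python sketch (the helper names are hypothetical) that walks the sections of a module by reading each 1-byte id and its LEB128-encoded size. It only checks the framing of the sections, not their order or contents, so it is no substitute for a real validator.

```python
from typing import Iterator, Tuple

# MVP 1.0 section ids; id 0 is reserved for custom sections.
SECTION_NAMES = {
    0: "custom", 1: "type", 2: "import", 3: "function", 4: "table",
    5: "memory", 6: "global", 7: "export", 8: "start", 9: "element",
    10: "code", 11: "data",
}


def read_uleb128(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode an unsigned LEB128 integer at pos; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:  # high bit clear: last byte of the number
            return result, pos
        shift += 7


def iter_sections(module: bytes) -> Iterator[Tuple[int, int, int]]:
    """Yield (section_id, payload_offset, payload_size) for every section."""
    if module[:4] != b"\x00asm":
        raise ValueError("missing \\0asm magic bytes")
    pos = 8  # skip the 8-byte preamble (magic + version)
    while pos < len(module):
        section_id = module[pos]
        size, payload_start = read_uleb128(module, pos + 1)
        if payload_start + size > len(module):
            raise ValueError(f"section {section_id} runs past the end of the module")
        yield section_id, payload_start, size
        pos = payload_start + size
```

Looping over iter_sections() on the bytes of a module and looking each id up in SECTION_NAMES gives a rough listing of its sections, similar to what wasmcodeexplorer displays.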

I won't go into more detail on the content of each section, but if you would like to learn more, I suggest you check out the following resources:

  • WebAssembly Binary Encoding – link
  • Introduction to WebAssembly – link
  • Understanding WebAssembly text format – link

3. HelloWorld & the WebAssembly data section

Let’s start with a basic “HelloWorld”. WebAssembly modules can be compiled from C/C++/Rust/… source code or directly written using the WebAssembly text representation (wat/wast).

Disassembly of a simple HelloWorld

When a VM calls the function hello, this module returns the offset of the string hello from WebAssembly !. This offset is a pointer to the string stored in linear memory, i.e. the memory shared between the wasm module and the WebAssembly VM.

As you can see, the data section allows us to store completely arbitrary strings inside a module, which is exactly what we need to inject HTML/JS payloads into our wasm module.
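
For illustration, the following Python sketch (helper names are our own) shows how a single active data segment is laid out in the binary data section (id 11): memory index 0, an i32.const offset expression terminated by end, then the raw bytes copied verbatim. Keep in mind that a module containing this section also needs a memory, otherwise it will not validate.

```python
def encode_uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value == 0:
            out.append(byte)
            return bytes(out)
        out.append(byte | 0x80)  # more bytes follow


def encode_sleb128(value: int) -> bytes:
    """Encode an integer as signed LEB128 (used by i32.const immediates)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7  # arithmetic shift preserves the sign
        sign_bit_set = (byte & 0x40) != 0
        if (value == 0 and not sign_bit_set) or (value == -1 and sign_bit_set):
            out.append(byte)
            return bytes(out)
        out.append(byte | 0x80)


def data_section(offset: int, payload: bytes) -> bytes:
    """Build a data section holding one active segment at the given offset."""
    segment = (
        b"\x00"                                       # memory index 0
        + b"\x41" + encode_sleb128(offset) + b"\x0b"  # i32.const offset, end
        + encode_uleb128(len(payload)) + payload      # payload stored verbatim
    )
    content = encode_uleb128(1) + segment             # vector of 1 segment
    return b"\x0b" + encode_uleb128(len(content)) + content
```

Whatever bytes the payload contains end up in the file unchanged, preceded only by a few length and offset bytes.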

4. WebAssembly custom section

The data section is not the only section that can store arbitrary strings. Custom sections were designed for exactly this purpose. For example, when a developer compiles a module with DWARF debug information, the compiler embeds that information in the wasm file as a set of custom sections, one per DWARF section name (e.g. .debug_info, .debug_line, etc.).

Injecting arbitrary strings and payloads with this technique only requires computing the length fields correctly (the section size and the name length) so that the custom section stays valid.

Custom sections need to have a name and a name length field.
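
A minimal Python sketch of that length bookkeeping (the section name "test" and the helper names are purely illustrative): the name is prefixed with its LEB128 length, the payload follows as raw bytes up to the end of the section, and the section size covers everything after the size field itself.

```python
def encode_uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value == 0:
            out.append(byte)
            return bytes(out)
        out.append(byte | 0x80)  # more bytes follow


def custom_section(name: str, payload: bytes) -> bytes:
    """Build a complete custom section (id 0) carrying an arbitrary payload."""
    name_bytes = name.encode("utf-8")
    # Content = name length + name + payload (payload runs to the section end).
    content = encode_uleb128(len(name_bytes)) + name_bytes + payload
    # Header = section id 0 followed by the content size.
    return b"\x00" + encode_uleb128(len(content)) + content


print(custom_section("test", b"hi").hex(" "))  # prints: 00 07 04 74 65 73 74 68 69
```

Because the size is LEB128-encoded, a payload longer than 127 bytes simply makes the size field grow to two or more bytes.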

5. Awesome HTML/JS payload needed!

An example HTML/JS payload to inject into the module

Since I'm not a polyglot expert, I asked @angealbertini, and he kindly provided me with this example HTML/JS payload. In short, the payload uses innerHTML to prevent the browser from parsing the entire file. More details about this payload and the trick can be found here.

Finally, this payload needs to be embedded inside a WebAssembly module using either the data section or a custom section.

6. Data section injection (First technique)

Let’s start with the simplest technique first. I chose to directly modify the WebAssembly text representation of the previous HelloWorld module by adding an extra line to it: (data (i32.const 42) "PAYLOAD_HERE")

This line creates a new segment in the data section of the module, containing the payload. Then, I compiled this wasm text file into a wasm module (named wasm_polyglot_data.wasm) using wat2wasm.

Payload injected into the data section using the wasm text format
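
Since an HTML/JS payload typically contains double quotes, backslashes or newlines, it has to be escaped before being pasted into a WAT string literal. A minimal Python sketch of that step, assuming the payload is available as raw bytes (the helper names are hypothetical, not part of any tool): printable ASCII is kept as-is, and every other byte becomes a two-digit hex escape such as \22, which the text format accepts.

```python
def wat_string(payload: bytes) -> str:
    """Escape raw bytes as a WebAssembly text-format string literal."""
    parts = []
    for b in payload:
        if 0x20 <= b < 0x7F and b not in (0x22, 0x5C):
            parts.append(chr(b))  # printable ASCII other than quote and backslash
        else:
            parts.append(f"\\{b:02x}")  # two-digit hex escape, e.g. 22 for a quote
    return '"' + "".join(parts) + '"'


def data_line(payload: bytes, offset: int = 42) -> str:
    """Return a (data ...) line placing payload at the given memory offset."""
    return f"(data (i32.const {offset}) {wat_string(payload)})"


print(data_line(b'<b id="x">'))  # prints: (data (i32.const 42) "<b id=\22x\22>")
```

The resulting line can then be added to the HelloWorld .wat file before running wat2wasm on it.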

7. Custom section injection (Second technique)

For this second technique, I wrote a simple Python script (available here) that takes my payload and my HelloWorld module and concatenates them into a single binary file (named wasm_polyglot_custom.wasm). Note that I injected the custom section (containing the HTML/JS payload) at the beginning of the final module, i.e. right after the WebAssembly header.

After you have done this, you can verify the internal structure of your module using wasmcodeexplorer - it should look somewhat similar to the following screenshot:

Our polyglot shown in wasmcodeexplorer
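
The concatenation step can be sketched in Python as follows. This is not the author's script (linked above); it is a minimal reconstruction assuming the payload is wrapped in a single custom section, with a placeholder section name, and spliced in right after the 8-byte header. Because custom sections may appear anywhere in the section sequence, the rest of the module does not need to change.

```python
from pathlib import Path

HEADER_SIZE = 8  # "\0asm" magic + 4-byte version


def encode_uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value == 0:
            out.append(byte)
            return bytes(out)
        out.append(byte | 0x80)


def inject_custom_section(module: bytes, name: str, payload: bytes) -> bytes:
    """Insert a custom section carrying payload right after the module header."""
    if module[:4] != b"\x00asm":
        raise ValueError("input is not a WebAssembly module")
    name_bytes = name.encode("utf-8")
    content = encode_uleb128(len(name_bytes)) + name_bytes + payload
    section = b"\x00" + encode_uleb128(len(content)) + content
    return module[:HEADER_SIZE] + section + module[HEADER_SIZE:]


def inject_file(module_path: str, payload_path: str, out_path: str,
                name: str = "polyglot") -> None:
    """File-based wrapper; the paths and default section name are placeholders."""
    module = Path(module_path).read_bytes()
    payload = Path(payload_path).read_bytes()
    Path(out_path).write_bytes(inject_custom_section(module, name, payload))
```

For example, inject_file("hello.wasm", "payload.html", "wasm_polyglot_custom.wasm") would write the polyglot, where hello.wasm and payload.html stand for your own module and payload files.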

8. Is it working?

First, we need to check if our new WebAssembly modules are still valid:

  • wasm_polyglot_data.wasm
  • wasm_polyglot_custom.wasm

For this, you can use standalone tools like wasm-validate, or directly try to instantiate those modules with a WebAssembly VM such as wasmer, wasmtime or WAVM. From JavaScript, you can use the WebAssembly.validate() or WebAssembly.instantiate() APIs.

A simple example script that will call the wasm function.

For verification, run a web server locally and open this script (pictured above; source here). You should see some messages in the JavaScript console. In short, this script fetches our polyglot wasm module, calls the exported function hello() and finally prints the HelloWorld string.

JS log of the polyglot getting instantiated and called using the WebAssembly API
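
Any static file server works for this step. As one option, here is a minimal Python sketch built on the standard library's http.server (the class and function names are our own). It explicitly maps .wasm to application/wasm, the MIME type that instantiateStreaming() expects, in case the local mimetypes configuration does not already do so.

```python
import http.server


class WasmHandler(http.server.SimpleHTTPRequestHandler):
    """Static file handler that always serves .wasm as application/wasm."""

    extensions_map = {
        **http.server.SimpleHTTPRequestHandler.extensions_map,
        ".wasm": "application/wasm",
    }


def serve(port: int = 8000) -> None:
    """Serve the current directory on localhost until interrupted."""
    address = ("127.0.0.1", port)
    with http.server.ThreadingHTTPServer(address, WasmHandler) as httpd:
        print(f"Serving on http://127.0.0.1:{port}/")
        httpd.serve_forever()
```

Call serve() from the directory containing the .wasm modules and the test page, then open http://127.0.0.1:8000/ in the browser. A polyglot renamed to .html will still be served as text/html, which matters for the MIME check described below.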

Now try to change the polyglot file extension from .wasm to .html, open the file in your browser, and you should see the following alert:

The alert when the polyglot gets interpreted as an HTML file by the browser.

If you try to fetch the wasm_polyglot.html file directly with WebAssembly.instantiateStreaming(), you will get a JavaScript error message: Failed to execute 'compile' on 'WebAssembly': Incorrect response MIME type. Expected 'application/wasm'.

You can bypass this MIME verification by fetching the file yourself, reading the response into an ArrayBuffer (e.g. with response.arrayBuffer()), and passing that buffer to WebAssembly.compile() or WebAssembly.instantiate(), which do not check the MIME type.

9. Conclusion

I hope that you learned and discovered some new WebAssembly tricks in this blogpost! If you want to dive deeper into polyglots, you can try to apply these techniques to create WebAssembly modules that are also valid PDFs, GIFs or other file formats. Thanks again to Ange Albertini for the payload and for his advice regarding polyglots!

All the WebAssembly modules, JS files and scripts shown in this blogpost are available in this GitHub repository.

*Also, if you want to learn & discover more about WebAssembly security, check out my 4-day training in Berlin from February 10 to February 13, 2020!*

About the Author

Patrick Ventuzelo

Patrick is a security researcher focused on fuzzing, reverse engineering and vulnerability research targeting WebAssembly and Rust security.