OWASP FSTM, stage 4: Extracting the filesystem
Many IoT devices run an embedded Linux operating system that is included in the firmware image, along with the corresponding file systems. This article discusses how to identify and extract the filesystem from a firmware image.
The file system contains the executables, configuration files, scripts and services run by the operating system, so accessing it allows an in-depth analysis of the operation and characteristics of an IoT device. The analysis can be divided into initial recognition phases, the identification of existing file systems in the firmware and their extraction or assembly.
The fourth stage of the OWASP Firmware Security Testing Methodology aims to identify the file systems that can be found in a firmware image, detect the format, and extract their contents for further analysis.
In the previous steps, the firmware of the IoT device under study has been obtained and analyzed. It is common to find embedded Linux systems in these firmware images, adapted to IoT devices, with specific software and file systems. Therefore, one of the most important phases of the analysis is the identification and extraction of the filesystem, which will contain the executables, configuration files, scripts, and services of the device.
Subsequent analysis of this file system provides detailed knowledge of the device’s boot process and operation, which can lead to the identification of vulnerable executables or services and delimit the attack surface.
The file systems contained in the firmware may be in clear text or may be compressed or encrypted. In the first two cases, it will only be necessary to identify the format and use the appropriate tool to extract or mount it in the analysis environment. For an encrypted file system, more research about the firmware and manufacturer will be needed.
The following sections of the article detail the general steps necessary to obtain the contents of the file system. Additionally, some good practices and a set of useful tools for file system analysis are also presented.
In the examples, both firmware images available in the IoTGoat project and images extracted from other IoT devices are used to illustrate some of the possible scenarios.
Firmware image format identification
Before trying to identify the sections that contain file systems and to understand their contents, it is useful to identify the format of the firmware image. The file utility, available on Linux systems, tries to determine the type of the file given as an argument.
$ file hola.txt
hola.txt: ASCII text
To do this, file runs three types of tests on the file: filesystem tests based on the stat system call, magic number tests and language tests. More information about this can be found in the previous article.
In cases where the file system appears at the beginning of the extracted image, file can help to identify it:
$ file squashfs
squashfs: Squashfs filesystem, little endian, version 4.0, xz compressed, 3946402 bytes, 1333 inodes, blocksize: 262144 bytes, created: Wed Jan 30 12:21:02 2019
In most cases, however, the firmware starts with a bootloader image or a blank section.
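When the image does begin with a SquashFS file system, the fields that file reports can be read directly from the superblock. The following is a minimal sketch, assuming the little-endian SquashFS 4.0 superblock layout; the function name parse_squashfs_superblock and the synthetic sample are illustrative and do not come from the tools used in this article.

```python
import struct

# Compression IDs defined by the SquashFS 4.0 on-disk format.
COMPRESSION = {1: "gzip", 2: "lzma", 3: "lzo", 4: "xz", 5: "lz4", 6: "zstd"}


def parse_squashfs_superblock(data: bytes) -> dict:
    """Decode the main fields of a little-endian SquashFS 4.0 superblock."""
    if len(data) < 48 or data[:4] != b"hsqs":
        raise ValueError("not a little-endian SquashFS image")
    # Offset 0: magic, inode count, creation time, block size, fragment count.
    _, inodes, _, block_size, _ = struct.unpack_from("<4sIIII", data, 0)
    # Offset 20: compression id, block log, flags, id count, major, minor.
    comp, _, _, _, major, minor = struct.unpack_from("<6H", data, 20)
    # Offset 32: root inode reference, bytes used by the file system.
    _, bytes_used = struct.unpack_from("<QQ", data, 32)
    return {
        "version": f"{major}.{minor}",
        "compression": COMPRESSION.get(comp, f"unknown ({comp})"),
        "inodes": inodes,
        "block_size": block_size,
        "bytes_used": bytes_used,
    }


# Synthetic superblock built from the values that file reports above.
# The creation time and root inode are set to 0 (they are not decoded here).
sample = struct.pack("<4sIIIIHHHHHHQQ", b"hsqs", 1333, 0, 262144, 0,
                     4, 18, 0, 1, 4, 0, 0, 3946402)
print(parse_squashfs_superblock(sample))
```

On a real image, the first 48 bytes read from the offset where the file system starts would be passed instead of the synthetic sample.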
Search for signatures and magic numbers
Searching for signatures and magic numbers that identify file types and formats is a very useful technique for delimiting sections of firmware and, in particular, for locating file systems, as discussed in the previous article in the series.
A useful tool for this is the well-known strings utility, which prints the sequences of printable characters found in a file:
$ strings IoTGoat-raspberry-pi2.img
OWRT
…
hsqs5
7zXZ
0~*}
9E+_X{
JwG#g
5`ds
…
For the IoTGoat-raspberry-pi2.img firmware, the following interesting strings are found for file system lookup:
- hsqs: magic number of little-endian squashfs file systems.
- 7zXZ: part of the magic number of files compressed in the xz format (LZMA2).
It can also be useful to search for magic numbers in hexadecimal, since, in some cases, magic numbers do not consist of printable characters. For this, you can use a hex editor, such as hexedit, which allows searching byte strings. Some magic numbers corresponding to common file systems in IoT devices are as follows:
- CramFS: 45 3D CD 28
- UBIFS: 31 18 10 06
- JFFS2: 85 19
- SquashFS: 73 71 73 68 (sqsh), 68 73 71 73 (hsqs)
In the IoTGoat-raspberry-pi2.img firmware itself there are also FAT16 and FAT32 tags, but these file systems do not contain files of interest. They are used to allow writing the image to a USB flash drive.
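The search for these magic numbers can also be scripted. The following is a minimal sketch that looks for the file system signatures listed above in a byte buffer and reports their offsets; the names FS_SIGNATURES and find_signatures, as well as the synthetic test buffer, are illustrative assumptions.

```python
# Magic numbers listed above, as raw byte strings.
FS_SIGNATURES = {
    "CramFS": bytes.fromhex("453DCD28"),
    "UBIFS": bytes.fromhex("31181006"),
    "JFFS2": bytes.fromhex("8519"),
    "SquashFS (sqsh)": b"sqsh",
    "SquashFS (hsqs)": b"hsqs",
}


def find_signatures(data: bytes, signatures=FS_SIGNATURES):
    """Return sorted (offset, name) pairs for every signature occurrence."""
    hits = []
    for name, magic in signatures.items():
        pos = data.find(magic)
        while pos != -1:
            hits.append((pos, name))
            pos = data.find(magic, pos + 1)
    return sorted(hits)


# Synthetic buffer: padding, a SquashFS magic, filler and a JFFS2 magic.
image = bytes(16) + b"hsqs" + b"\xff" * 8 + bytes.fromhex("8519")
for offset, name in find_signatures(image):
    print(f"0x{offset:08X}  {name}")
```

For a real image, the buffer would come from reading the firmware file in binary mode. Very short signatures, such as the two bytes of JFFS2, will match by chance in large images, so every hit should be treated as a candidate to be verified.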
Other magic numbers of interest may be those related to compressed files, such as the following:
- zip: 50 4B 03 04 (PK..)
- rar: 52 61 72 21 1A 07 01 00 (Rar!....)
- 7z: 37 7A BC AF 27 1C (7z¼¯'.)
- xz: FD 37 7A 58 5A 00 (ý7zXZ.)
When searching for a signature or magic number, keep in mind that firmware images may be in little endian or big endian, which affects the byte order within the signature.
In addition, for certain file systems and compression formats, non-standard signatures may be encountered. Many device manufacturers use modified signatures to indicate the format. For example, the open-source DD-WRT firmware for routers may use the tqsh signature to indicate a SquashFS (big endian) file system.
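The effect of endianness on a signature can be seen by writing the same 32-bit magic value in both byte orders. This is a small illustrative sketch; the helper name signature_variants is an assumption, and the values are the SquashFS and JFFS2 magic numbers already listed above.

```python
def signature_variants(magic: int, width: int) -> dict:
    """Return the byte sequences that a magic value produces in each byte order."""
    return {
        "big": magic.to_bytes(width, "big"),
        "little": magic.to_bytes(width, "little"),
    }


# SquashFS magic as a 32-bit integer: its bytes spell "sqsh" in big endian.
squashfs = signature_variants(0x73717368, 4)
print(squashfs["big"], squashfs["little"])  # b'sqsh' b'hsqs'

# JFFS2 magic as a 16-bit integer: stored as 85 19 on little-endian systems.
jffs2 = signature_variants(0x1985, 2)
print(jffs2["big"].hex(), jffs2["little"].hex())  # 1985 8519
```

When scanning an image of unknown byte order, both variants of each signature should be searched for.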
Entropy study
In some cases, sections within the firmware may be encrypted or compressed. If compressed, it is common to find some signature identifying the format, although it does not always exist. However, identifying an encrypted section requires another type of analysis.
In information theory, the entropy of a data source is a measure of the average amount of information carried by each symbol it produces. By the very design of encryption algorithms, a sample of encrypted information should have a normalized entropy very close to 1, the maximum value, while sections of code and unencrypted data typically have a variable entropy ranging from 0.3 to 0.8. Compression algorithms also produce results with high entropy. A study of the entropy across a firmware image can therefore reveal encrypted or compressed sections.
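As an illustration of the measure, the following sketch computes the Shannon entropy of a byte string, that is, the sum of p * log2(1/p) over the byte values that occur, and divides it by 8 bits so that the result lies between 0 and 1. The function name normalized_entropy is an assumption, and this is not necessarily the exact calculation performed by any particular tool.

```python
import math
from collections import Counter


def normalized_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, scaled to the range 0..1."""
    if not data:
        return 0.0
    total = len(data)
    bits = sum((count / total) * math.log2(total / count)
               for count in Counter(data).values())
    return bits / 8.0  # 8 bits per byte is the maximum possible entropy


constant = bytes(1024)            # a single repeated value: no information
uniform = bytes(range(256)) * 4   # every byte value equally frequent
text = b"root:x:0:0:root:/root:/bin/sh\n" * 32

print(normalized_entropy(constant))
print(normalized_entropy(uniform))
print(round(normalized_entropy(text), 3))
```

The constant buffer yields the minimum value and the uniform buffer the maximum, while ordinary text falls somewhere in between.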
The binwalk firmware analysis tool has an entropy study function, which produces a result like the following:
$ binwalk -E IoTGoat-raspberry-pi2.img
DECIMAL HEXADECIMAL ENTROPY
--------------------------------------------------------------------------------
0 0x0 Falling entropy edge (0.002664)
4718592 0x480000 Falling entropy edge (0.833424)
4997120 0x4C4000 Falling entropy edge (0.837713)
5095424 0x4DC000 Falling entropy edge (0.840429)
5341184 0x518000 Falling entropy edge (0.839935)
5570560 0x550000 Falling entropy edge (0.849444)
5636096 0x560000 Falling entropy edge (0.834985)
5799936 0x588000 Falling entropy edge (0.840472)
5849088 0x594000 Falling entropy edge (0.840706)
5996544 0x5B8000 Falling entropy edge (0.849569)
6275072 0x5FC000 Falling entropy edge (0.849042)
6373376 0x614000 Falling entropy edge (0.848267)
6553600 0x640000 Falling entropy edge (0.848343)
6701056 0x664000 Falling entropy edge (0.678427)
6914048 0x698000 Rising entropy edge (0.965015)
6930432 0x69C000 Falling entropy edge (0.619229)
7356416 0x704000 Falling entropy edge (0.831099)
7487488 0x724000 Falling entropy edge (0.842073)
7585792 0x73C000 Falling entropy edge (0.836944)
7667712 0x750000 Falling entropy edge (0.593631)
7798784 0x770000 Falling entropy edge (0.667160)
12058624 0xB80000 Rising entropy edge (0.950634)
12075008 0xB84000 Falling entropy edge (0.560117)
29360128 0x1C00000 Rising entropy edge (0.998248)
The terminal output lists the addresses at which the rising and falling edges of the entropy are located, which can be useful to delimit the sections. The accompanying graph shows several sections of unencrypted information at the beginning and a section of encrypted or compressed information at the end.
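The idea of rising and falling entropy edges can be sketched as a block-by-block scan that reports the offsets where the entropy crosses a high or a low threshold. This is only an approximation of the concept: the block size and the thresholds 0.95 and 0.85 are assumed values chosen for the example, and the function names are illustrative.

```python
import math
from collections import Counter


def normalized_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, scaled to the range 0..1."""
    total = len(data)
    if total == 0:
        return 0.0
    return sum((c / total) * math.log2(total / c)
               for c in Counter(data).values()) / 8.0


def entropy_edges(data: bytes, block_size=1024, high=0.95, low=0.85):
    """Return (offset, kind, entropy) each time the entropy level changes."""
    edges = []
    state = None  # "high" or "low" once the first block has been classified
    for offset in range(0, len(data), block_size):
        value = normalized_entropy(data[offset:offset + block_size])
        if value >= high and state != "high":
            edges.append((offset, "rising", value))
            state = "high"
        elif value <= low and state != "low":
            edges.append((offset, "falling", value))
            state = "low"
    return edges


# Synthetic image: empty area, a high-entropy stand-in section, empty area.
dense = bytes(range(256)) * 16  # uniform byte histogram, entropy of 1.0
image = bytes(2048) + dense + bytes(2048)
for offset, kind, value in entropy_edges(image):
    print(f"{offset:<10} 0x{offset:<8X} {kind.capitalize()} entropy edge ({value:.6f})")
```

The offsets between a rising edge and the next falling edge delimit the candidate compressed or encrypted section.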
If the high-entropy section, which according to the binwalk results starts at address 0x1C00000, is opened with the hexedit hex editor, the signature hsqs5 already detected in the string search appears at that address, indicating a squashfs file system. The signature 7zXZ, a few lines further on, indicates compressed information in xz format. It is therefore not an encrypted region, but a compressed one.
The following example shows an entropy study for an encrypted firmware:
$ binwalk -E firmware
DECIMAL HEXADECIMAL ENTROPY
--------------------------------------------------------------------------------
0 0x0 Rising entropy edge (0.971675)
4716544 0x47F800 Rising entropy edge (0.976452)
In this case, only regions of high entropy are found, barely separated from each other. In hexedit, address 0x0 shows some unencrypted information preceding a region of random-looking data, but no recognizable signature.
At address 0x47F800, the situation is similar.
These cases indicate an encrypted section in the firmware. To resolve this and access the information they contain, further investigation into the manufacturer, encryption formats it may use, leaked keys and previous versions of the firmware will be helpful. In some cases, these versions are unencrypted and can provide a lot of information about how the device works, including the encryption it uses.
In more complex cases, it may be necessary to wait for the dynamic and runtime analysis phases to obtain more information.
Extracting the filesystem
Depending on the type of file system found in the firmware, different tools will be required to extract it.
The binwalk tool attempts to automate the detection and extraction process for most file systems commonly found in firmware:
$ binwalk firmware
Without options, this command lists the code, files and file systems that the binwalk engine identifies in the firmware. To do this, the tool traverses the image looking for matches with magic numbers, signatures and strings that identify sections within the firmware. The following is the result for the IoTGoat sample firmware:
$ binwalk IoTGoat-raspberry-pi2.img
DECIMAL HEXADECIMAL DESCRIPTION
--------------------------------------------------------------------------------
4253711 0x40E80F Copyright string: “copyright does *not* cover…
…
4329472 0x421000 ELF, 32-bit LSB executable, version 1 (SYSV)
4762160 0x48AA30 AES Inverse S-Box
4763488 0x48AF60 AES S-Box
…
12061548 0xB80B6C gzip compressed data, maximum compression, from Unix, last modified: 1970-01-01 00:00:00 (null date)
12145600 0xB953C0 CRC32 polynomial table, little endian
12852694 0xC41DD6 xz compressed data…
29360128 0x1C00000 Squashfs filesystem, little endian, version 4.0, compression:xz, size: 3946402 bytes, 1333 inodes, blocksize: 262144 bytes, created: 2019-01-30 12:21:02
In this case, binwalk detects several compressed files and a SquashFS file system, which matches the previously detected signatures. Binwalk also has an automatic extraction function, which, while scanning the contents of the firmware, tries to extract them. This is achieved with the following command:
$ binwalk -e firmware
The -e option extracts the contents. In binwalk 2.x, the results are stored in a directory named _firmware.extracted, in which each extracted file system gets its own subdirectory (for example, squashfs-root).
binwalk can find and extract squashfs, ubifs, romfs, rootfs, jffs2, yaffs2, cramfs and initramfs systems. However, because its analysis is signature-based and it relies on a different tool for each file system, false positives are frequent. They are especially common with short signatures of 1 or 2 bytes, which can appear in a firmware image by chance without a section of that format actually being present. You should therefore always check the binwalk results by inspecting the area where the signature was detected with a hex editor such as hexedit, especially if the results do not match the information collected previously.
Also, binwalk can sometimes introduce errors when attempting to extract a section of the firmware. If decompressing or mounting the extracted file results in format errors, it is useful to perform a manual extraction with the dd tool and then decompress or mount the file system with the appropriate tool, as explained below.
For example, on a firmware extracted from another IoT device, binwalk yields the following result:
$ binwalk firmware.bin
DECIMAL HEXADECIMAL DESCRIPTION
--------------------------------------------------------------------------------
5107699 0x4DEFF3 MySQL MISAM compressed data file Version 8
8532033 0x823041 Intel x86 or x64 microcode, sig 0xfc208000, pf_mask 0xf0c100, 1C18-01-30, rev 0x22000000, size 1796
8951861 0x889835 bix header, header size: 64 bytes, header CRC: 0x79079084, created: 1970-01-01 04:59:12, image size: 33591409 bytes, Data Address: 0x10183013, Entry Point: 0x102001F, data CRC: 0xE0208000, image type: Binary Flat Device Tree Blob image name: “”
The first signature indicates a MySQL MISAM compressed data file, which is suspicious both because of its location and because its signature is only a few bytes long (0xFE 0xFE 0xFE 0x07). If you inspect that address with hexedit, you can see that the preceding and following bytes do not match the format of such a file.
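The same manual check can be done without an interactive editor by dumping the bytes around a reported offset. The following is a minimal sketch; the function hexdump_around and its parameters are illustrative, and the sample buffer is synthetic.

```python
def hexdump_around(data: bytes, offset: int, before=16, after=32, width=16) -> str:
    """Return a hex and ASCII dump of the bytes surrounding an offset."""
    start = max(0, offset - before)
    start -= start % width                 # align the first row
    end = min(len(data), offset + after)
    pad = width * 3 - 1                    # width of the hex column
    lines = []
    for pos in range(start, end, width):
        chunk = data[pos:min(pos + width, end)]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{pos:08X}  {hex_part:<{pad}}  {text}")
    return "\n".join(lines)


# Synthetic buffer with a SquashFS magic at offset 16.
sample = b"A" * 16 + b"hsqs" + bytes(12)
print(hexdump_around(sample, 16, before=16, after=16))
```

If the bytes before and after a reported signature do not look like the header of the claimed format, the hit is most likely a false positive.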
This type of error is very common. It is also possible that binwalk does not have a manufacturer's modified signature registered for a common file type. In these cases, looking for the device manufacturer's own signatures can be very useful. binwalk may also be unable to extract a section even though it detects its location in the firmware. In these cases, you can use the binwalk output or manufacturer-specific information about formats to manually extract the section containing the file system with dd:
$ binwalk firmware.img
DECIMAL HEXADECIMAL DESCRIPTION
--------------------------------------------------------------------------------
0 0x0 uImage header, header size: 64 bytes, header CRC: 0x4EA03918, created: 2017-07-20 02:34:00, image size: 6164416 bytes, Data Address: 0x80000000, Entry Point: 0x80294000, data CRC: 0x8D40BD44, OS: Linux, CPU: MIPS, image type: OS Kernel Image, compression type: lzma, image name: “Linux Kernel Image-al-2.32”
64 0x40 LZMA compressed data, properties: 0x5D, dictionary size: 33554432 bytes, uncompressed size: 2818364 bytes
851968 0xD0000 Squashfs filesystem, little endian, non-standard signature, version 3.0, size: 5309286 bytes, 781 inodes, blocksize: 65536 bytes, created: 2017-07-20 02:33:58
$ dd if=firmware.img of=squashroot bs=1 skip=851968 count=5309286
5309286+0 records in
5309286+0 records out
5309286 bytes (5,3 MB, 5,1 MiB) copied, 7,25504 s, 732 kB/s
dd if=firmware.img of=squashroot bs=1 skip=851968 0,78s user 6,40s system 98% cpu 7,259 total
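With bs=1, dd issues one read and one write per byte, which explains the seven seconds shown above. As an alternative, the same carving step can be sketched in a few lines that seek to the offset and copy the section in larger chunks. The function name carve and the temporary demonstration files are assumptions made for this example.

```python
import os
import tempfile


def carve(src: str, dst: str, offset: int, size: int, chunk: int = 1 << 20) -> int:
    """Copy `size` bytes starting at `offset` from src into dst."""
    copied = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        fin.seek(offset)
        while copied < size:
            block = fin.read(min(chunk, size - copied))
            if not block:  # the source ended before `size` bytes were read
                break
            fout.write(block)
            copied += len(block)
    return copied


# Demonstration on a small synthetic image written to a temporary directory.
workdir = tempfile.mkdtemp()
image_path = os.path.join(workdir, "firmware.img")
section_path = os.path.join(workdir, "squashroot")
with open(image_path, "wb") as handle:
    handle.write(bytes(64) + b"hsqs" + b"\x01" * 60 + bytes(32))

print(carve(image_path, section_path, offset=64, size=64))
```

For the firmware above, the call would use offset=851968 and size=5309286, the values reported by binwalk.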
With the file system section separated, the appropriate tool must be used to extract the files.
For the squashfs format, the unsquashfs or sasquatch tools, available on Linux systems, can be used to decompress the file system:
$ sasquatch squashroot
SquashFS version [3.0] / inode count [781] suggests a SquashFS image of the same endianess
Non-standard SquashFS Magic: shsq
Parallel unsquashfs: Using 1 processor
Trying to decompress using default gzip decompressor…
Trying to decompress with lzma…
Detected lzma compression
688 inodes (901 blocks) to write
[=================================================================/] 901/901 100%
created 533 files
created 93 directories
created 155 symlinks
created 0 devices
created 0 fifos
As a result, you get the file system in a directory such as squashfs-root.
Other tools for common formats are:
- cpio for cpio formats.
- jefferson for jffs2 formats.
- uncramfs or cramfsck for cramfs formats.
It is also possible to find firmware images that directly contain partition tables with embedded file systems. This can occur on devices that require the use of systems such as FAT, NTFS, or ext. To detect this case, the fdisk tool is useful:
$ fdisk -l IoTGoat-raspberry-pi2.img
Disk IoTGoat-raspberry-pi2.img: 31.76 MiB, 33306112 bytes, 65051 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x5452574f
Device Boot Start End Sectors Size Id Type
iotgoat/IoTGoat-raspberry-pi2.img1 * 8192 49151 40960 20M c W95 FAT32 (LBA)
iotgoat/IoTGoat-raspberry-pi2.img2 57344 581631 524288 256M 83 Linux
In the IoTGoat example image, you can see a partition table with two file systems directly contained in the firmware: a FAT32 partition and a partition with the Linux system image.
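The information that fdisk prints comes from the MBR partition table stored in the first sector of the image. The following is a minimal sketch of how those entries can be decoded, assuming a classic DOS/MBR layout with 512-byte sectors; the function names are illustrative, and the sample sector is built synthetically from the values shown in the fdisk output above.

```python
import struct

SECTOR_SIZE = 512


def parse_mbr(sector0: bytes) -> list:
    """Decode the four primary partition entries of a DOS/MBR partition table."""
    if len(sector0) < 512 or sector0[510:512] != b"\x55\xaa":
        raise ValueError("no MBR boot signature found")
    partitions = []
    for index in range(4):
        entry = sector0[446 + 16 * index:446 + 16 * (index + 1)]
        boot_flag, part_type = entry[0], entry[4]
        start, sectors = struct.unpack_from("<II", entry, 8)
        if part_type == 0:  # unused entry
            continue
        partitions.append({
            "bootable": boot_flag == 0x80,
            "type": part_type,
            "start_sector": start,
            "sectors": sectors,
            "byte_offset": start * SECTOR_SIZE,
        })
    return partitions


def make_entry(flag: int, part_type: int, start: int, sectors: int) -> bytes:
    """Build a 16-byte partition entry (CHS fields left as zero)."""
    return struct.pack("<B3sB3sII", flag, bytes(3), part_type, bytes(3), start, sectors)


# Synthetic first sector using the values reported by fdisk above.
table = make_entry(0x80, 0x0C, 8192, 40960) + make_entry(0x00, 0x83, 57344, 524288)
sector0 = bytes(446) + table + bytes(32) + b"\x55\xaa"
for part in parse_mbr(sector0):
    print(part)
```

The byte offset of the second partition, 57344 * 512 = 29360128 (0x1C00000), matches the address at which binwalk reported the SquashFS file system.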
The kpartx tool can be used to create virtual devices (loop devices) for the partitions contained in the table. To create devices with the partitions in the firmware, use the -a option:
$ sudo kpartx -a IoTGoat-raspberry-pi2.img
device-mapper: reload ioctl on loop0p2 (254:1) failed: Invalid argument
create/reload failed on loop0p2
$ lsblk -f
NAME FSTYPE FSVER LABEL UUID FSAVAIL FSUSE% MOUNTPOINTS
loop0
├─loop0p1 vfat FAT16 78D2-382B
├─loop0p1 vfat FAT16 78D2-382B
└─loop0p2 squashfs 4.0
Although an error is reported for partition p2, two loop devices are created: loop0p1 and loop0p2. These partitions can be mounted in the directory tree with the mount tool:
$ sudo mount /dev/mapper/loop0p1 /mnt/iotgoat/fat
$ ls -l /mnt/iotgoat/fat
total 9366
-rwxr-xr-x 1 root root 22493 Mar 29 2020 bcm2709-rpi-2-b.dtb
-rwxr-xr-x 1 root root 23588 Mar 29 2020 bcm2710-rpi-3-b.dtb
-rwxr-xr-x 1 root root 23707 Mar 29 2020 bcm2710-rpi-3-b-plus.dtb
-rwxr-xr-x 1 root root 22342 Mar 29 2020 bcm2710-rpi-cm3.dtb
-rwxr-xr-x 1 root root 52116 Mar 29 2020 bootcode.bin
-rwxr-xr-x 1 root root 133 Mar 29 2020 cmdline.txt
-rwxr-xr-x 1 root root 30725 Mar 29 2020 config.txt
-rwxr-xr-x 1 root root 18693 Mar 29 2020 COPYING.linux
-rwxr-xr-x 1 root root 2622 Mar 29 2020 fixup_cd.dat
-rwxr-xr-x 1 root root 6695 Mar 29 2020 fixup.dat
-rwxr-xr-x 1 root root 5817564 Mar 29 2020 kernel.img
-rwxr-xr-x 1 root root 1494 Mar 29 2020 LICENCE.broadcom
drwxr-xr-x 2 root root 10240 Mar 29 2020 overlays
-rwxr-xr-x 1 root root 678372 Mar 29 2020 start_cd.elf
-rwxr-xr-x 1 root root 2864164 Mar 29 2020 start.elf
When trying to mount the second partition, an error like the following occurs:
$ sudo mount /dev/loop0p2 /mnt/iotgoat/squashfs
mount: /mnt/iotgoat/squashfs: /dev/loop0p2 already mounted or mount point busy.
dmesg(1) may have more information after failed mount system call.
$ sudo dmesg | grep -v audit | tail
[ 7453.070938] loop2: detected capacity change from 0 to 7707
[ 8259.648960] /dev/loop0p2: Can’t open blockdev
[ 8281.520899] /dev/loop0p2: Can’t open blockdev
[ 8304.153145] loop0: detected capacity change from 0 to 65051
[ 8304.171992] loop0: p1 p2
[ 8304.172392] loop0: p2 size 524288 extends beyond EOD, truncated
[ 8304.240350] device-mapper: table: 254:1: loop0 too small for target: start=57344, len=524288, dev_size=65051
[ 8304.240355] device-mapper: core: Cannot calculate initial queue limits
[ 8304.240357] device-mapper: ioctl: unable to set up device queue for new table.
[ 8316.660386] /dev/loop0p2: Can’t open blockdev
In this case, the size of the squashfs partition declared in the partition table exceeds the size of the image, which prevents mounting it as a loop device. However, if this partition is extracted to a file, as described in previous sections, the file can be mounted with the squashfuse tool:
$ sudo squashfuse -d squashfs /mnt/iotgoat/squashfs
FUSE library version: 2.9.9
nullpath_ok: 0
nopath: 0
utime_omit_ok: 0
unique: 2, opcode: INIT (26), nodeid: 0, insize: 104, pid: 0
INIT: 7.36
flags=0x73fffffb
max_readahead=0x00020000
INIT: 7.19
flags=0x00000011
max_readahead=0x00020000
max_write=0x00020000
max_background=0
congestion_threshold=0
unique: 2, success, outsize: 40
In another terminal, you can see the result:
$ sudo ls -l /mnt/iotgoat/squashfs
total 1
drwxr-xr-x 2 root root 0 Jan 30 2019 bin
drwxr-xr-x 2 root root 0 Jan 30 2019 dev
-rwxrwxrwx 1 root root 797 Jan 30 2019 dnsmasq_setup.sh
drwxr-xr-x 18 root root 0 Jan 30 2019 etc
drwxr-xr-x 11 root root 0 Jan 30 2019 lib
drwxr-xr-x 2 root root 0 Jan 30 2019 mnt
drwxr-xr-x 2 root root 0 Jan 30 2019 overlay
drwxr-xr-x 2 root root 0 Jan 30 2019 proc
drwxr-xr-x 2 root root 0 Jan 30 2019 rom
drwxr-xr-x 2 root root 0 Jan 30 2019 root
drwxr-xr-x 2 root root 0 Jan 30 2019 sbin
drwxr-xr-x 2 root root 0 Jan 30 2019 sys
drwxrwxrwt 2 root root 0 Jan 30 2019 tmp
drwxr-xr-x 7 root root 0 Jan 30 2019 usr
lrwxrwxrwx 1 root root 3 Jan 30 2019 var -> tmp
drwxr-xr-x 4 root root 0 Jan 30 2019 www
This type of mount can also be performed for other formats by creating a loop device, either with kpartx or with other tools such as losetup (or directly with mount), and mounting the result at a point in the directory tree.
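The size problem that dmesg reported for the second IoTGoat partition can be checked with simple arithmetic on the values from the fdisk and dmesg output. The following is an illustrative sketch; the function name check_partition_fits is an assumption, and 512-byte sectors are assumed as reported by fdisk.

```python
SECTOR_SIZE = 512


def check_partition_fits(image_sectors: int, start: int, length: int) -> dict:
    """Compare a partition's declared extent with the real size of the image."""
    available = max(0, image_sectors - start)
    return {
        "declared_end": start + length - 1,
        "fits": start + length <= image_sectors,
        "available_sectors": available,
        "byte_offset": start * SECTOR_SIZE,
        "available_bytes": available * SECTOR_SIZE,
    }


# Values from the device-mapper message: start=57344, len=524288, dev_size=65051.
report = check_partition_fits(image_sectors=65051, start=57344, length=524288)
print(report)
```

The partition table declares 524288 sectors, but only 7707 sectors remain in the image after sector 57344, which is why the extraction has to start at byte offset 29360128 and stop at the end of the image rather than at the declared partition size.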
There are also certain cases where the manufacturer modifies the signatures and format of a file system to adapt it to their devices or to obfuscate it to make analysis more difficult. In these cases, automatic tools such as binwalk will probably not be able to obtain consistent results and a manual analysis of the file will be necessary.
The data obtained about the manufacturer during the previous phases can be of great help, as can the analysis of any code found in the firmware. In some cases, there are forums specialized in a particular type of IoT device where you can find information discovered by other researchers and even extraction tools, although this is not common.
After the work of analyzing and extracting the filesystem hosted in the firmware, it is possible to move on to the phase of analyzing its contents, where the operation and internal characteristics will be analyzed from a static point of view.
Conclusions
As we have seen, analyzing and extracting the filesystem is a fundamental phase in the analysis of a device's firmware, and one of the steps that can be carried out when conducting an IoT security audit.
There are different formats that can contain a file system in a firmware image. The most popular are squashfs and cramfs systems, but it is also common to find jffs2, ubifs, romfs, cpio or compressed files. It is also possible to find, in some cases, file system images directly embedded in the firmware.
To analyze and extract the filesystem, automatic tools such as binwalk are very useful, but they often fail, so the results must be checked manually with other tools such as file, strings, hexedit, dd and fdisk.
In cases where the firmware contains encrypted sections, it will be necessary to further investigate the manufacturer and the unencrypted sections, or to wait for the dynamic and runtime analysis phases. The results of this stage of the process will be of great help for the subsequent analysis, so it is always worthwhile to extract as much information as possible.
References
- https://github.com/scriptingxss/owasp-fstm
- https://www.kali.org/tools/firmware-mod-kit/
- https://github.com/OWASP/IoTGoat/
- https://www.pentestpartners.com/security-blog/how-to-do-firmware-analysis-tools-tips-and-tricks/
- https://blog.k3170makan.com/2018/06/reverse-engineering-primer-unpacking.html
This article is part of a series of articles about OWASP
- OWASP methodology, the beacon illuminating cyber risks
- OWASP: Top 10 Web Application Vulnerabilities
- IoT and embedded devices security analysis following OWASP
- OWASP FSTM, stage 1: Information gathering and reconnaissance
- OWASP FSTM, stage 2: Obtaining IOT device firmware
- OWASP FSTM, stage 3: Analyzing firmware
- OWASP FSTM, stage 4: Extracting the filesystem
- OWASP FSTM, stage 5: Analyzing filesystem contents
- OWASP FSTM step 6: firmware emulation
- OWASP FSTM, step 7: Dynamic analysis
- OWASP FSTM, step 8: Runtime analysis
- OWASP FSTM, Stage 9: Exploitation of executables
- IoT Security assessment
- OWASP API Security Top 10
- OWASP SAMM: Assessing and Improving Enterprise Software Security
- OWASP: Top 10 Mobile Application Risks