DriveFS Sleuth — Revealing The Hidden Intelligence
In the Medium article DriveFS Sleuth — Your Ultimate Google Drive File Stream Investigator!, we delved into the foundational research behind the DriveFS Sleuth tool. However, a set of ongoing research objectives remains, aimed at deepening our understanding of the disk forensic artifacts of the Google Drive File Stream application. One key area of focus is the binary blob data stored in its databases. In this article, we undertake a detailed examination of these blobs to determine the nature of the data they encapsulate and explore how this information can be leveraged in an investigation.
What is a protobuf and why is it important?
According to the official Google documentation, Protocol buffers are Google’s language-neutral, platform-neutral, extensible mechanism for serializing structured data — think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use specially generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages.
Protocol buffers are a combination of the definition language (created in .proto files), the code that the proto-compiler generates to interface with data, language-specific runtime libraries, and the serialization format for data written to a file (or sent across a network connection).
The Google Drive File Stream Application utilizes protocol buffers to store essential information about synchronized items where that information might not be available in plain text in the application’s databases. Through reverse engineering these binary blobs, it becomes possible to uncover critical details about the items.
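To make this concrete, the blobs discussed below can be pulled straight out of the application's SQLite databases. The following is a minimal sketch, assuming a collected copy of the metadata_sqlite_db database; the table and column names follow the descriptions given later in this article, so adjust the path to your own collection:

```python
import sqlite3

# Path to a collected copy of the metadata_sqlite_db database
# (assumed to have been exported from the DriveFS profile folder).
DB_PATH = "metadata_sqlite_db"

con = sqlite3.connect(DB_PATH)
cur = con.cursor()

# The 'items' table keeps a protobuf blob per synced item in its 'proto' column.
for i, (blob,) in enumerate(cur.execute("SELECT proto FROM items LIMIT 5")):
    size = len(blob) if blob else 0
    preview = blob[:16].hex() if blob else ""
    print(f"item #{i}: {size} bytes of protobuf, first bytes: {preview}")

con.close()
```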
Reversing Challenge
Deserializing a Protocol Buffers message without the corresponding .proto file presents several challenges, because protobuf is not a self-describing format. As we described earlier, unlike some other serialization formats, protobuf messages do not inherently carry information about their structure. Here are some of the challenges associated with deserializing a protobuf without the .proto file:
- Field Identification: Protobuf relies on numeric field identifiers rather than field names. Without the .proto file, it becomes difficult to associate these numeric identifiers with their corresponding fields, which can result in misinterpretation of the data and incorrect deserialization (see the sketch after this list).
- Data Types: Protobuf messages encode data types using a type-specific encoding. Without the .proto file, it may be challenging to determine the correct data type for each field, which can lead to errors in deserialization, as the wrong data type may be assumed.
- Message Structure: Protobuf messages are hierarchical, and the structure is defined in the .proto file. Without this file, reconstructing the hierarchical structure becomes challenging, including understanding nested messages and repeated fields.
- Enum Values: Enumerations in protobuf are defined in the .proto file, and their numeric values may not be known without it. Deserializing enums correctly requires knowledge of these numeric values in order to map them back to their corresponding symbolic names.
- Unknown Fields: Protobuf allows unknown fields to be included in messages, which can be safely ignored during deserialization. Without the .proto file, it may be difficult to differentiate between unknown fields and potential errors in deserialization.
- Default Values: Protobuf messages can have default values for fields. Without the .proto file, determining these default values becomes challenging, potentially leading to incorrect assumptions about the data.
- Custom Options and Extensions: Protobuf supports custom options and extensions, which are also defined in the .proto file. Without this information, it is challenging to handle custom extensions and options correctly during deserialization.
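As a quick illustration of the first two challenges, here is a minimal sketch of what a raw protobuf field header actually carries. Each field starts with a varint "tag" that encodes only a field number and a wire type; field names and precise types are simply not present on the wire:

```python
# Decode a single protobuf field header (tag) byte.
# A tag packs: field_number << 3 | wire_type
WIRE_TYPES = {
    0: "varint (int32/int64/bool/enum/...)",
    1: "64-bit (fixed64/double/...)",
    2: "length-delimited (string/bytes/nested message/packed repeated)",
    5: "32-bit (fixed32/float/...)",
}

def describe_tag(tag: int) -> str:
    field_number = tag >> 3
    wire_type = tag & 0x07
    return f"field #{field_number}, wire type {wire_type}: {WIRE_TYPES.get(wire_type, 'unknown')}"

# 0x0A = field 1, wire type 2 -- this could be a string, raw bytes, or a nested
# message; without the .proto file there is no way to tell which from the bytes alone.
print(describe_tag(0x0A))
```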
Revealing the Hidden Information
Fortunately, the ProtoDeep tool is available on GitHub at https://github.com/mxrch/ProtoDeep, and it is also published as a library on pypi.org, enabling smooth integration with DriveFS Sleuth. The tool's authors have adeptly reverse-engineered Google protobufs, even in the absence of the corresponding .proto file. However, as expected, the field names are not recoverable without the .proto file. This is where our research efforts were instrumental: we leveraged the knowledge we had recovered about the synced items and compared it with ProtoDeep's parsing output to deduce field names whenever feasible. Some values were inherently vague and required additional research to determine. The revealed information serves as an enrichment to the items' metadata; the following sections describe what has been revealed so far.
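For reference, a rough sketch of feeding such a blob to ProtoDeep as a library is shown below. The guess_schema import and pretty_print call reflect the usage shown in the project's README at the time of writing; treat the exact module path and names as assumptions and verify them against the ProtoDeep documentation:

```python
# Assumed ProtoDeep library usage -- verify against the project's README,
# as the exact module path and function names may differ between versions.
from protodeep.lib import guess_schema

with open("item_proto.bin", "rb") as f:  # a raw blob dumped from the 'proto' column
    blob = f.read()

schema = guess_schema(data=blob)  # best-effort decoding without a .proto file
schema.pretty_print()             # prints field numbers, guessed types, and values
```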
MD5 Hashes
The proto column of the items table in the metadata_sqlite_db database contains a protobuf blob.
By dissecting this binary blob, we can reveal important information about the synced item, most notably its MD5 hash, which is stored at index 48 of the protobuf.
The hash values of synced items add significant intelligence about synchronized files, particularly in scenarios involving the misuse of Google Drive for malware distribution. DriveFS Sleuth now parses this information to enrich the items' metadata.
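To illustrate where that hash sits on the wire, below is a minimal, hand-rolled sketch that walks the top-level fields of an items proto blob and pulls out field 48. It assumes the hash is stored as a length-delimited field (it may appear as a hex string or as raw digest bytes, so both are handled); this is an illustrative decoder, not DriveFS Sleuth's actual implementation:

```python
def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Read a base-128 varint starting at pos; return (value, new_pos)."""
    value, shift = 0, 0
    while True:
        b = buf[pos]
        value |= (b & 0x7F) << shift
        pos += 1
        if not b & 0x80:
            return value, pos
        shift += 7

def extract_field_48(blob: bytes) -> str | None:
    """Walk top-level protobuf fields and return field 48 (assumed MD5) if present."""
    pos = 0
    while pos < len(blob):
        tag, pos = read_varint(blob, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:                      # varint
            _, pos = read_varint(blob, pos)
        elif wire_type == 1:                    # 64-bit
            pos += 8
        elif wire_type == 2:                    # length-delimited
            length, pos = read_varint(blob, pos)
            payload = blob[pos:pos + length]
            pos += length
            if field_number == 48:
                try:
                    return payload.decode()     # hash stored as a hex string
                except UnicodeDecodeError:
                    return payload.hex()        # or as raw digest bytes
        elif wire_type == 5:                    # 32-bit
            pos += 4
        else:
            break                               # unexpected wire type; stop parsing
    return None
```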
Account Information
By parsing the value of the driveway_account property in the properties table of the metadata_sqlite_db database, DriveFS Sleuth enriches the account information with details such as the display name and the account photo at the time of triage.
The same information can be found in the account property of the same table, but with different indices. For backward compatibility, DriveFS Sleuth now relies on both the driveway_account and account properties, rather than the driveway_account property alone, to obtain the display name and the account photo.
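A minimal sketch of pulling those two properties out of the database is shown below. The table and property names come from the description above, but the key and value column names are assumptions about the properties table layout, so inspect the schema if your copy differs:

```python
import sqlite3

con = sqlite3.connect("metadata_sqlite_db")
cur = con.cursor()

# 'key' / 'value' column names are assumptions; check with
# "PRAGMA table_info(properties)" if they differ in your database.
query = "SELECT key, value FROM properties WHERE key IN ('driveway_account', 'account')"
for key, value in cur.execute(query):
    size = len(value) if value is not None else 0
    print(f"{key}: {size} bytes of protobuf (display name, account photo, ...)")

con.close()
```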
The enriched information has been integrated into the HTML report generated by DriveFS Sleuth.
Indices 11 and 13 in the items proto contain the modified_date and the viewed_by_me_date respectively. There are other timestamps as well; however, we still need to dig a little deeper to understand them.
Deleted Items Info
The deleted_items table of the metadata_sqlite_db database contains a protobuf that holds all the relevant information about the deleted item.
DriveFS Sleuth parses out the deleted items’ protobuf and extracts the following information:
- url_id: index 1
- parent_url_id: index 2
- local_title: index 3
- mime_type: index 4
- trashed: index 7
- modified_date: index 11
- viewed_by_me_date: index 13
- file_size: index 14
- file extension: index 45
- MD5: index 48
- folder feature: index 50
- item properties: index 55[-\d]*
  - trashed-locally: index 55-1 is the key and 55-2 is the value; 1 indicates the item was deleted from the same machine, 0 indicates otherwise
  - trashed-locally-metadata: index 55-1 is the key and 55-5 is the value; it contains the path from which the item was locally deleted
  - trashed-locally-name: index 55-1 is the key and 55-4 is the value; it stores the name of the file in the Recycle Bin
- is_owner: index 63
- stable_id: index 88
- parent_stable_id: index 89
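For convenience, the mapping above can be captured directly in code. The dictionary below simply restates the indices listed in this section and could be used to label the output of a generic field walker like the one sketched earlier:

```python
# Field-number-to-name mapping for the deleted_items protobuf,
# restating the indices documented above.
DELETED_ITEM_FIELDS = {
    1: "url_id",
    2: "parent_url_id",
    3: "local_title",
    4: "mime_type",
    7: "trashed",
    11: "modified_date",
    13: "viewed_by_me_date",
    14: "file_size",
    45: "file_extension",
    48: "md5",
    50: "folder_feature",
    55: "item_properties",   # nested key/value pairs (trashed-locally, ...)
    63: "is_owner",
    88: "stable_id",
    89: "parent_stable_id",
}
```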
DriveFS Sleuth
DriveFS Sleuth is a Python tool that leverages the knowledge we acquired from the conducted research to automate investigating Google Drive File Stream disk artifacts.
Your contribution is highly appreciated. I’m eager to hear your thoughts! Share your feedback and suggestions, or report issues on our GitHub repository. Your input is crucial in making DriveFS Sleuth even more robust. Consider starring the repo if you found it useful. 😉
A special shout-out to Ann Bransom for her outstanding contribution to the protobuf research. Ann, your efforts are truly appreciated! Thank you!👏