DriveFS Sleuth — Revealing The Hidden Intelligence

Amged Wageh
6 min read · Jan 1, 2024

In the "DriveFS Sleuth — Your Ultimate Google Drive File Stream Investigator!" Medium article, we delved into the foundational research behind the DriveFS Sleuth tool. However, a set of ongoing research objectives remains, aimed at deepening our understanding of the disk forensic artifacts of the Google Drive File Stream application. One key area of focus is the exploration of the binary blob data stored in its databases. In this article, we examine these blobs in detail to determine the nature of the data they encapsulate and explore potential ways to leverage this information in our investigations.

What is a protobuf and why is it important?

According to the official Google documentation, Protocol buffers are Google’s language-neutral, platform-neutral, extensible mechanism for serializing structured data — think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use specially generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages.

ref: https://protobuf.dev/overview/

Protocol buffers are a combination of the definition language (created in .proto files), the code that the proto-compiler generates to interface with data, language-specific runtime libraries, and the serialization format for data written to a file (or sent across a network connection).

The Google Drive File Stream application utilizes protocol buffers to store essential information about synchronized items, information that might not be available in plain text in the application's databases. By reverse engineering these binary blobs, it becomes possible to uncover critical details about those items.
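
Before dissecting any blobs, a quick pass over the database schema helps locate where they live. The sketch below lists every BLOB column in the database so the protobuf-bearing tables stand out; it assumes a forensic copy of metadata_sqlite_db has already been exported, and the evidence path is hypothetical.

```python
import sqlite3

# Hypothetical path to a forensic copy of the DriveFS metadata database.
# On a live Windows system it typically sits under
# %LocalAppData%\Google\DriveFS\<account_id>\metadata_sqlite_db.
DB_PATH = r"C:\evidence\metadata_sqlite_db"

with sqlite3.connect(DB_PATH) as conn:
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for table in tables:
        # PRAGMA table_info lists each column's declared type, which makes the
        # BLOB columns holding serialized protobuf messages easy to spot.
        for _, column, col_type, *_ in conn.execute(f"PRAGMA table_info('{table}')"):
            if col_type.upper() == "BLOB":
                print(f"{table}.{column}")
```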

Reversing Challenge

Deserializing a Protocol Buffers message without the corresponding .proto file presents several challenges because protobuf is not a self-describing format: unlike some other serialization formats, protobuf messages do not inherently carry information about their structure. Here are some of the challenges associated with deserializing a protobuf without the .proto file (a minimal wire-format sketch follows the list):

  1. Field Identification:
    Protobuf relies on numeric field identifiers rather than field names. Without the .proto file, it becomes difficult to associate these numeric identifiers with their corresponding fields. This can result in misinterpretation of the data and lead to incorrect deserialization.
  2. Data Types:
    Protobuf messages encode data types using a type-specific encoding. Without the .proto file, it may be challenging to determine the correct data types for each field. This can lead to errors in deserialization, as the wrong data type may be assumed.
  3. Message Structure:
    Protobuf messages are hierarchical, and the structure is defined in the .proto file. Without this file, reconstructing the hierarchical structure becomes challenging. This includes understanding nested messages and repeated fields.
  4. Enum Values:
    Enumerations in protobuf are defined in the .proto file, and their numeric values may not be meaningful without it. Deserializing enums correctly requires knowledge of these definitions to map the numeric values back to their corresponding symbolic names.
  5. Unknown Fields:
    Protobuf allows unknown fields to be included in messages, which can be safely ignored during deserialization. Without the .proto file, it may be difficult to differentiate between unknown fields and potential errors in deserialization.
  6. Default Values:
    Protobuf messages can have default values for fields. Without the .proto file, determining these default values becomes challenging, potentially leading to incorrect assumptions about the data.
  7. Custom Options and Extensions:
    Protobuf supports custom options and extensions, which are also defined in the .proto file. Without this information, it’s challenging to handle custom extensions and options correctly during deserialization.
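
To make the field-identification problem concrete, here is a minimal sketch of walking the protobuf wire format by hand. It is not how ProtoDeep or DriveFS Sleuth is implemented; it only shows that, without a .proto file, a decoder recovers bare field numbers, wire types, and raw values rather than names, enums, or nested semantics.

```python
# A minimal walk over the protobuf wire format, assuming an undamaged message.
# Without a .proto file, the most we can recover directly is each field's
# number, its wire type, and its raw value; names and semantics stay unknown.

def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a base-128 varint starting at pos; return (value, new_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        result |= (byte & 0x7F) << shift
        pos += 1
        if not byte & 0x80:
            return result, pos
        shift += 7

def walk_fields(buf: bytes):
    """Yield (field_number, wire_type, raw_value) for each top-level field."""
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:                       # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:                     # 64-bit
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:                     # length-delimited: bytes, string, or nested message
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:                     # 32-bit
            value, pos = buf[pos:pos + 4], pos + 4
        else:                                    # deprecated group types: stop rather than guess
            break
        yield field_number, wire_type, value

# Hand-crafted sample: field 1 = varint 150, field 2 = the string "hi".
sample = bytes([0x08, 0x96, 0x01, 0x12, 0x02]) + b"hi"
for number, wtype, value in walk_fields(sample):
    print(number, wtype, value)   # 1 0 150, then 2 2 b'hi'
```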

Revealing The Hidden Information

Fortunately, the ProtoDeep tool is available on GitHub at https://github.com/mxrch/ProtoDeep, and it is also published as a library on pypi.org, enabling smooth integration with DriveFS Sleuth. The tool's authors have adeptly reverse-engineered Google protobufs, even in the absence of the corresponding .proto file. However, as expected, the field names are not recoverable without the .proto file. This is where our research efforts were instrumental: we leveraged the knowledge we had recovered about the synced items and compared it with ProtoDeep's parsing output to deduce field names whenever feasible. Some values were inherently vague and required additional research to determine. The revealed information enriches the items' metadata; the following sections describe what has been revealed so far.

MD5 Hashes

The proto column of the items table in the metadata_sqlite_db database contains a protobuf blob.

By dissecting this binary blob, we can reveal important information about the synced item, notably its MD5 hash, which is stored at index 48 of the protobuf.
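
The following is a minimal sketch of pulling that blob and fishing out field 48, reusing the walk_fields helper from the wire-format sketch above (assumed to be saved as wirewalk.py). The local_title column and the storage format of the hash (raw bytes versus hex string) are assumptions for illustration.

```python
import sqlite3

# Assumes the walk_fields sketch above was saved as wirewalk.py next to this
# script; both the module name and the evidence path are hypothetical.
from wirewalk import walk_fields

DB_PATH = r"C:\evidence\metadata_sqlite_db"
MD5_FIELD = 48  # field number observed for the MD5 hash in the items proto

with sqlite3.connect(DB_PATH) as conn:
    # The proto column and items table come from the research above; the
    # local_title column is an assumption for labelling the output.
    for title, blob in conn.execute("SELECT local_title, proto FROM items"):
        for number, wire_type, value in walk_fields(blob):
            if number == MD5_FIELD and wire_type == 2:
                # Whether the hash is stored as 16 raw bytes or as a hex
                # string is an assumption; handle both defensively.
                md5 = value.hex() if len(value) == 16 else value.decode(errors="replace")
                print(f"{title}: {md5}")
```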

The hash value of a synced item adds significant intelligence about synchronized files, particularly in scenarios involving the misuse of Google Drive for malware distribution. DriveFS Sleuth now parses this information to enrich the items' metadata.

Account Information

By parsing the value of the driveway_account property in the properties table of the metadata_sqlite_db, DriveFS Sleuth enriches the account information with details such as the display name and the account photo at the time of the triage.

The same information can be found in the account property of the same table, but with different indices. For backward compatibility, DriveFS Sleuth now relies on both the driveway_account and account properties, rather than the driveway_account property alone, to get the display name and the account photo.
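
A rough sketch of pulling and dumping those two properties follows. The key/value column names of the properties table are assumptions and may need adjusting to the actual schema; the field numbers of the display name and photo are left to manual inspection, which mirrors how such fields are deduced in practice.

```python
import sqlite3

from wirewalk import walk_fields  # the wire-format sketch, saved as wirewalk.py

DB_PATH = r"C:\evidence\metadata_sqlite_db"  # hypothetical evidence path

with sqlite3.connect(DB_PATH) as conn:
    # The property names come from the research; the key/value column names of
    # the properties table are assumptions and may need adjusting.
    rows = conn.execute(
        "SELECT key, value FROM properties "
        "WHERE key IN ('driveway_account', 'account')"
    )
    for key, blob in rows:
        print(f"=== {key} ===")
        # Dump field numbers and raw values; string-looking fields such as the
        # display name and the photo URL can then be identified by inspection.
        for number, wire_type, value in walk_fields(blob):
            preview = value[:60] if wire_type == 2 else value
            print(number, wire_type, preview)
```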

The enriched information has been integrated into the HTML report generated by DriveFS Sleuth.

Indices 11 and 13 in the items proto contain the modified_date and the viewed_by_me_date, respectively. There are other timestamps as well; however, we still need to dig a little deeper to understand them.
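
If the raw values follow the same convention as the plaintext date columns, they can be converted as sketched below; the millisecond-epoch assumption should be verified against a file with a known history.

```python
from datetime import datetime, timezone

def drivefs_timestamp(raw: int) -> datetime:
    """Convert a raw timestamp from the items proto to an aware datetime.

    Assumes the value is a Unix epoch in milliseconds, matching the plaintext
    date columns; verify against a file with a known modification history.
    """
    return datetime.fromtimestamp(raw / 1000, tz=timezone.utc)

print(drivefs_timestamp(1704067200000))  # 2024-01-01 00:00:00+00:00
```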

Deleted Items info

The deleted_items table of the metadata_sqlite_db contains a protobuf that holds all the relevant information about the deleted item.

DriveFS Sleuth parses the deleted items' protobuf and extracts the following information (a decoding sketch follows the list):

  • url_id: index 1
  • parent_url_id: index 2
  • local_title: index 3
  • mime_type: index 4
  • trashed: index 7
  • modified_date: index 11
  • viewed_by_me_date: index 13
  • file_size: index 14
  • file extension: index 45
  • MD5: index 48
  • folder feature: index 50
  • item properties: index 55[-\d]*
  • trashed-locally: index 55-1 is the key, 55-2 is the value; 1 indicates the item was deleted from the same machine, 0 indicates otherwise
  • trashed-locally-metadata: index 55-1 is the key, 55-5 is the value; it contains the path from which the item has been locally deleted
  • trashed-locally-name: index 55-1 is the key, 55-4 is the value; it stores the name of the file in the Recycle Bin
  • is_owner: index 63
  • stable_id: index 88
  • parent_stable_id: index 89
DriveFS Sleuth

DriveFS Sleuth is a Python tool that leverages the knowledge acquired from this research to automate the investigation of Google Drive File Stream disk artifacts.

Your contribution is highly appreciated. I’m eager to hear your thoughts! Share your feedback and suggestions, or report issues on our GitHub repository. Your input is crucial in making DriveFS Sleuth even more robust. Consider starring the repo if you found it useful. 😉

A special shout-out to Ann Bransom for her outstanding contribution to the protobuf research. Ann, your efforts are truly appreciated! Thank you!👏

