June 15, 2011

Sleuth Kit & Open Source Forensics Conference

Yesterday I spoke on the analysis of Web browser artifacts at the Sleuth Kit and Open Source Forensics Conference. By my estimation attendance was up about 50% over the previous year, which is fantastic to see. I won't review my talk except to note that it was awesome, but I'd like to share my notes on the other talks that I found particularly interesting or compelling.

Brian Carrier - Sleuth Kit and Autopsy 3 Updates

Starting the day, Brian Carrier gave an overview of the current and future states of the Sleuth Kit framework. The Sleuth Kit is the backbone of the open source forensic examiner's toolkit so any changes are quite interesting. Brian is moving towards making the Sleuth Kit even more accessible to developers looking to automate and extend functionality, and is finally killing the awful Autopsy front-end.

The future for Sleuth Kit is to move more towards a plug-in architecture, so rather than extracting out the Windows registry and parsing it with RegRipper externally, you should be able to have a RegRipper module that will populate results back into the tool. This is similar to the Virtual File System found in PyFLAG and ArxSys Digital Forensics Framework. There is a lot to be said for this model as it enables you to perform broad search and analysis across disparate data sources while maintaining source context and relevance.

Autopsy 3.0 will be a Java/NetBeans GUI that is currently Windows only, which may somehow be worse than ugly framed HTML. ;) As long as I can avoid this and hit the backend directly I'll be okay. The first beta is planned for a July release so we'll get to see it then.

The final project Brian talked about was a Hadoop framework for leveraging cloud resources for media intake & analysis. This is still pretty raw but there are a lot of forensic tasks that can be solved or sped up with map-reduce. This should be available in some form later this summer, and I'm really looking forward to it.

Jon Stewart - Scripting with The Sleuth Kit

The Daily Show funnyman Jon Stewart took some time off from his busy schedule to come down to NoVA and talk about ways to script against the Sleuth Kit intelligently. He started off by apologizing for 6.5 years of EnScript (APOLOGY NOT ACCEPTED). Jon then showed some fairly simple C++ code that uses the new-ish TskAuto functionality in the Sleuth Kit to walk the file system and produce one JSON object per file. He also showed a small Python tool called 'fsrip' that he created using the Sleuth Kit to produce line-oriented JSON, which I am looking forward to experimenting with. Jon had a lot of good advice for aspiring scripters, so if you are interested in developing forensic utilities I would recommend viewing his talk when it becomes available.
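To make the "line-oriented JSON" idea concrete, here is a minimal Python sketch. It is not fsrip or TskAuto code: it walks an ordinary mounted directory with os.walk as a stand-in for a Sleuth Kit file system walk, and the function names and record fields (path, size, mtime) are my own assumptions for illustration.

```python
import json
import os
import sys


def file_records(root):
    """Yield one metadata dict per non-directory entry beneath root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: describe the entry itself, not a link target
            yield {
                "path": os.path.relpath(path, root),
                "size": st.st_size,
                "mtime": int(st.st_mtime),
            }


def write_json_lines(root, out=sys.stdout):
    """Write each record as a single JSON object on its own line."""
    count = 0
    for record in file_records(root):
        out.write(json.dumps(record, sort_keys=True) + "\n")
        count += 1
    return count
```

The appeal of one object per line is that the output can be piped straight into grep, sort, or any other line-based tool without loading the whole listing into memory.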

Harlan Carvey - Extending RegRipper

Harlan started out by introducing RegRipper for the folks in the audience that haven't used it (who I assume are purely theoretical). After that, he described a work-in-progress tool he's tentatively calling "Forensic Scanner," which aims to extend RegRipper's ideas beyond the Registry to file system indicators, Event Log entries, Scheduled Tasks, Prefetch files, and more. Forensic Scanner runs against a mounted file system and generates reports based on plugins in a similar manner to RegRipper. This sounds like it would provide a lot of the functionality that many examiners lean on EnScripts in EnCase for. It looks like it'll be a good way to formalize process and avoid missing items, and a great way to share knowledge and discoveries across distributed teams.
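The plugin-against-a-mounted-file-system model can be sketched in a few lines of Python. This is purely illustrative and is not how Forensic Scanner is written: the registry, the decorator, and both plugins are hypothetical, and the only facts borrowed from Windows are the usual Windows\Prefetch and Windows\Tasks locations.

```python
import os

PLUGINS = {}  # hypothetical registry: plugin name -> callable(root)


def plugin(name):
    """Register a function as a named scanner plugin."""
    def register(func):
        PLUGINS[name] = func
        return func
    return register


def _list_by_suffix(folder, suffix):
    """Return sorted file names in folder ending with suffix (case-insensitive)."""
    if not os.path.isdir(folder):
        return []
    return sorted(n for n in os.listdir(folder) if n.lower().endswith(suffix))


@plugin("prefetch")
def prefetch_files(root):
    """Report Prefetch (.pf) files under the mounted volume."""
    return _list_by_suffix(os.path.join(root, "Windows", "Prefetch"), ".pf")


@plugin("scheduled_tasks")
def scheduled_tasks(root):
    """Report legacy Scheduled Task (.job) files under the mounted volume."""
    return _list_by_suffix(os.path.join(root, "Windows", "Tasks"), ".job")


def run_scan(root):
    """Run every registered plugin and collect findings by plugin name."""
    return {name: func(root) for name, func in sorted(PLUGINS.items())}
```

Adding a new check means adding one decorated function, which is what makes this style a good fit for sharing discoveries across a team. On a case-sensitive mount the directory names would need case-insensitive matching, which this sketch skips.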

Simson Garfinkel - bulk_extractor: A Stream-Based Forensics Tool

Simson's presentations are always brain melting for mortals and this one was no different. He discussed the speed benefits of streaming the disk front to back and processing from the blocks up, rather than seeking randomly from a files-down perspective. Bulk_extractor is the tool developed based on this idea and is able to extract valuable evidence in "real time." It operates using "Named Entity Recognition" via highly parallel regular expression scanning. Bulk_extractor processes images, disks, or files, and extracts "features" into discrete text files.

One of the interesting design features is that "pages" read in by the tool overlap to avoid boundary false-negative problems commonly found in many carving utilities and other forensic tools. Another cool feature is that some scanners are recursive - this is especially useful in the case of scanners that deal with compressed data. With this architecture, the content of compressed files is available for subsequent processing by other text-focused scanners.
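Here is a small Python sketch of why overlapping pages matter. It is not bulk_extractor's code: the e-mail regex, the function name, and the parameters are my own assumptions, and it scans an in-memory byte string rather than a disk. The overlap needs to be at least as long as the longest feature you expect to find.

```python
import re

# Illustrative e-mail pattern over raw bytes; real scanners are far more careful.
EMAIL = re.compile(rb"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def scan_pages(data, page_size, overlap, pattern=EMAIL):
    """Scan data in fixed-size pages, each extended by `overlap` extra bytes.

    Returns (absolute_offset, matched_bytes) tuples. A match is reported by
    the page it starts in, so features in the overlap are not reported twice.
    """
    features = []
    covered_until = 0  # absolute end offset of the last reported match
    for start in range(0, len(data), page_size):
        window = data[start:start + page_size + overlap]
        for m in pattern.finditer(window):
            abs_start = start + m.start()
            if m.start() >= page_size:
                continue  # begins in the overlap; the next page owns it
            if abs_start < covered_until:
                continue  # truncated tail of a match already reported
            features.append((abs_start, m.group()))
            covered_until = start + m.end()
    return features
```

With overlap set to 0, an address that straddles a page boundary is split in two and neither half matches the pattern, which is exactly the boundary false negative the overlap is there to prevent.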

This sort of processing should work very well for a lot of investigation types, primarily those centered around hunts for specific types of data. This would encompass the bulk of law enforcement (and intelligence) examinations (i.e., "what's here, what's important"), but I don't think it will be of much use in most intrusion examinations (i.e., "what happened"). There is a RAR scanner in the works which I can see being useful for exfiltration analysis, though, and inline decompression of compressed blocks will be very helpful. Either way, it is very interesting work and will be very helpful for many members of the community.

Joshua James - Rapid Evidence Acquisition Project for Event Reconstruction (REAPER)

Joshua's presentation focused on the development of a system (REAPER) designed to provide usable forensic analysis capability to examiners in developing countries who may not have a lot of training or existing forensic knowledge. They aim to provide this through extreme automation: automated acquisition, processing, analysis, documentation, and case management, with no user interaction. This is the first project I've heard of that is actively using OCFA, which is interesting. I looked into the OCFA project several years ago but it required a lot of setup & a completely different workflow to utilize, and I never found the time to fully commit. This talk was heavily geared toward the sorts of examinations performed by law enforcement, so I didn't get a ton out of it, but it seems to be useful research given the ever-growing backlog forming at most departments.

Vassil Roussev - The Gorilla Approach to Scaling & Integrating Open Source Forensic Tools: Learning From The Web

Sadly I missed Elizabeth Schweinberg's talk because she was scheduled up against Vassil Roussev, who talked about applying web technologies and advancements in scale to forensic analysis. You can understand why this is a topic of interest to me. He opened with an overview of processing challenges and scale issues. He mentioned some vendor chest-beating from AccessData, who boast distributed processing prowess capable of shredding through 1.28 TB in only 6 days, 5 hours! Holy smokes!
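For a sense of scale, the quoted figure works out to a very modest sustained rate. A quick back-of-the-envelope check in Python, assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes); the function name is mine:

```python
def throughput_mb_per_s(terabytes, days, hours):
    """Average throughput in MB/s for a job of the given size and duration."""
    total_bytes = terabytes * 10**12          # assumption: decimal terabytes
    seconds = (days * 24 + hours) * 3600      # 6 days 5 hours = 149 hours
    return total_bytes / seconds / 10**6


rate = throughput_mb_per_s(1.28, 6, 5)
print(f"{rate:.2f} MB/s")  # prints "2.39 MB/s"
```

That is a small fraction of what a single commodity disk can stream sequentially, which is presumably why the number got a laugh.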

Vassil had many interesting points but one that struck me was his assertion that 80% of forensic analysis work is not forensic specific. As an example, text search is not a forensic problem - it is an information retrieval problem. Current forensic tools try to treat all problems as forensic-specific and don't import knowledge and useful solutions from these other domains. This is a mistake, and must be rectified to deal with the increased scale requirements. Big data problems are being solved by big data companies - Google, Amazon, Facebook, etc. These lessons can easily apply directly to the bulk of problems we are trying to solve in digital forensics.

Marcelo Silva - ForeIndex: A Framework for Analysis and Triage of Data Forensics

Marcelo discussed ForeIndex, a forensic distributed indexing framework developed as a partnership between the University of Brasilia & the Brazilian Federal Police. In a single criminal case, they had 250 computers to process, so you can understand the need for distributed processing above and beyond "putting the database on a different machine." They begin by extracting files from collected images by scripting the Sleuth Kit. These files are subsequently indexed in a distributed fashion using Hadoop MapReduce, Lucene, and Tika. I had to duck out of this talk early to prep for my talk but he seemed to be describing a pretty standard Hadoop setup, which is still compelling when applied to forensics. I'm glad to see the open source community eclipsing the proprietary forensics world when it comes to pushing capabilities forward!
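For anyone who hasn't seen indexing expressed as map-reduce, here is a toy single-process Python sketch of the shape of the job. It is not ForeIndex code and uses none of Hadoop, Lucene, or Tika: plain dictionaries stand in for the cluster, a simple regex stands in for real text extraction and tokenizing, and all names are my own.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9]+")  # crude tokenizer, for illustration only


def map_document(doc_id, text):
    """Map step: emit a (term, doc_id) pair for each distinct term."""
    return [(term, doc_id) for term in sorted(set(TOKEN.findall(text.lower())))]


def reduce_postings(pairs):
    """Reduce step: group document ids by term into sorted posting lists."""
    index = defaultdict(set)
    for term, doc_id in pairs:
        index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def build_index(documents):
    """Run the map step over every document, then reduce into one index."""
    pairs = []
    for doc_id, text in documents.items():
        pairs.extend(map_document(doc_id, text))
    return reduce_postings(pairs)
```

The map step touches each document independently, so it can be spread across as many machines as you have; only the grouping in the reduce step needs the results brought together.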

February 1, 2011

Wired article on Romanian cybercrime haven

Wired posted an interesting article on the centralized nature of Romanian cybercrime. I've done some work in this area [investigations side, not operations ;)] and the bulk of it rings pretty true to what I've seen. A particularly interesting segment:

Online thievery as a ticket to the good life spread from the early pioneers to scores of young men, infecting Râmnicu Vâlcea’s social fabric. The con artists were the ones with the nice cars and fancy clothes—the local kids made good. And just as in Silicon Valley, the clustering of operations in one place made it that much easier for more to get started. “There’s a high concentration of people offering the kinds of services you need to build a criminal scheme,” says Gary Dickson, an FBI agent who worked in Bucharest from 2005 to 2010. “If your specialty is auction frauds, you can find a money pick-up guy. If you’re a money pick-up guy, you can find a buyer for your services.”

January 19, 2011

Mac Memory Reader

The fine folks at ATC-NY have released Mac Memory Reader, a free tool for dumping memory from a running 32- or 64-bit OS X 10.4+ system. There aren't currently any free tools for analyzing the resulting output, but some of the structures were documented by Matthieu Suiche in this Black Hat DC 2010 Paper [PDF].

My book is more or less "done"

I've been quiet for a while, mostly because I've been spending the bulk of my "free" time working on a new book with Harlan Carvey: Digital Forensics With Open Source Tools (or, DFWOST for short). It is currently due to be released May 15, 2011.

In the book, we discuss operational aspects of using open source tools to perform an end-to-end forensic investigation, starting from basic file system analysis using the various tools of The Sleuth Kit, to analysis of artifacts of interest found within complex carrier files like ZIP archives and Microsoft Office Documents, to the installation and use of modern forensic apps like the Digital Forensics Framework. We approach this from a purely operational perspective. Each chapter should be full of things that you can implement and use right away. Taken together, you should hopefully be able to perform a complete investigation with open source tools.

I'll be using this site to continue discussing the topics I brought up in the book, and to discuss additional topics that I wasn't able to get to. I'm also happy to field any questions anyone has about the book here. To that end, I've created a Google Group for discussion of the book or any topics related to the book. I hope the book is a useful resource for the forensics community.

September 23, 2010

Sleuth Kit 3.2 Beta Available

Brian Carrier just released a beta of The Sleuth Kit (TSK), which includes "new automation framework and new tools that can recover deleted files into the original directory structure, compare a directory to a disk image, and load image information into a sqlite database."

June 29, 2010

Post Humorous Relaunch

Hey kids -

After some downtime, Post Humorous is relaunching.  I'm back at Google so I'm taking the opportunity to move over to blogger/google apps/appengine/etc for my web presence.  Keep your eyes peeled for more.