June 15, 2011

Sleuth Kit & Open Source Forensics Conference

Yesterday I spoke on the analysis of Web browser artifacts at the Sleuth Kit and Open Source Forensics Conference. By my estimation, attendance was up about 50% over the previous year, which is fantastic to see. I won't review my talk except to note that it was awesome, but I'd like to share my notes on the other talks that I found particularly interesting or compelling.

Brian Carrier - Sleuth Kit and Autopsy 3 Updates

Starting the day, Brian Carrier gave an overview of the current and future states of the Sleuth Kit framework. The Sleuth Kit is the backbone of the open source forensic examiner's toolkit, so any changes are quite interesting. Brian is moving towards making the Sleuth Kit even more accessible to developers looking to automate and extend functionality, and is finally killing the awful Autopsy front-end.

The future for Sleuth Kit is to move more towards a plug-in architecture, so rather than extracting out the Windows registry and parsing it with RegRipper externally, you should be able to have a RegRipper module that will populate results back into the tool. This is similar to the Virtual File System found in PyFLAG and ArxSys Digital Forensics Framework. There is a lot to be said for this model as it enables you to perform broad search and analysis across disparate data sources while maintaining source context and relevance.

Autopsy 3.0 will be a Java/NetBeans GUI that is currently Windows only, which may somehow be worse than ugly framed HTML. ;) As long as I can avoid it and hit the backend directly I'll be okay. The first beta is planned for a July release, so we'll get to see it then.

The final project Brian talked about was a Hadoop framework for leveraging cloud resources for media intake & analysis. This is still pretty raw but there are a lot of forensic tasks that can be solved or sped up with map-reduce. This should be available in some form later this summer, and I'm really looking forward to it.

Jon Stewart - Scripting with The Sleuth Kit

The Daily Show funnyman Jon Stewart took some time off from his busy schedule to come down to NoVA and talk about ways to script against the Sleuth Kit intelligently. He started off by apologizing for 6.5 years of EnScript (APOLOGY NOT ACCEPTED). Jon then showed some fairly simple C++ code that implements the new-ish TskAuto functionality in the Sleuth Kit to walk the file system and produce one JSON object per file. He also showed a small Python tool called 'fsrip' that he created using the Sleuth Kit to produce line-oriented JSON, which I am looking forward to experimenting with. Jon had a lot of good advice for aspiring scripters, so if you are interested in developing forensic utilities I would recommend viewing his talk when it becomes available.
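To give a feel for the line-oriented JSON idea, here's a minimal sketch in Python. It walks a live directory tree with the standard library rather than a disk image via the Sleuth Kit, and the field names are my own invention, not fsrip's actual output schema:

```python
import json
import os

def walk_to_json(root):
    """Yield one JSON document per file, one per line -- the same
    line-oriented shape fsrip aims for, but over a live directory
    tree instead of a forensic image."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # skip files that vanish mid-walk
            yield json.dumps({
                "path": path,
                "size": st.st_size,
                "mtime": int(st.st_mtime),
            })

if __name__ == "__main__":
    for line in walk_to_json("."):
        print(line)
```

The appeal of one object per line is that the output pipes cleanly into grep, sort, and other line-oriented tools without a full JSON parse.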

Harlan Carvey - Extending RegRipper

Harlan started out by introducing RegRipper for the folks in the audience that haven't used it (who I assume are purely theoretical). After that, he described a work-in-progress tool he's tentatively calling "Forensic Scanner," which has the goal of extending RegRipper's ideas beyond the Registry to cover file system indicators, Event Log entries, Scheduled Tasks, Prefetch files, and more. Forensic Scanner runs against a mounted file system and generates reports based on plugins in a similar manner to RegRipper. This sounds like it would provide a lot of the functionality that many examiners lean on EnScripts in EnCase for. It looks like it'll be a good way to formalize process and avoid missing items, and a great way to share knowledge and discoveries across distributed teams.
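To make the plugin idea concrete, here's a minimal sketch of a RegRipper-style plugin runner in Python. The interface, names, and the single example plugin are all hypothetical on my part, not Harlan's actual design:

```python
import os

# Hypothetical plugin interface: each plugin takes the mounted-filesystem
# root and returns a list of report lines, RegRipper-style.
PLUGINS = []

def plugin(func):
    """Decorator that registers a plugin with the runner."""
    PLUGINS.append(func)
    return func

@plugin
def prefetch_listing(root):
    """Example plugin: report Prefetch files under Windows/Prefetch."""
    pf_dir = os.path.join(root, "Windows", "Prefetch")
    if not os.path.isdir(pf_dir):
        return ["prefetch: directory not found"]
    return ["prefetch: %s" % n for n in sorted(os.listdir(pf_dir))]

def run_scanner(root):
    """Run every registered plugin and aggregate their report lines."""
    report = []
    for p in PLUGINS:
        report.append("--- %s ---" % p.__name__)
        report.extend(p(root))
    return report
```

The win of this shape is exactly the one Harlan described: each discovery becomes a small, shareable plugin, and the runner guarantees no registered check is ever skipped.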

Simson Garfinkel - bulk_extractor: A Stream-Based Forensics Tool

Simson's presentations are always brain-melting for mortals and this one was no different. He discussed the speed benefits of streaming the disk front to back and processing from blocks-up rather than seeking randomly from a files-down perspective. Bulk_extractor is the tool developed based on this idea and is able to extract valuable evidence in "real time." It operates using "Named Entity Recognition" via highly parallel regular expression scanning. Bulk_extractor processes images, disks, or files, and extracts "features" into discrete text files.

One of the interesting design features is that "pages" read in by the tool overlap to avoid boundary false-negative problems commonly found in many carving utilities and other forensic tools. Another cool feature is that some scanners are recursive - this is especially useful in the case of scanners that deal with compressed data. With this architecture, the content of compressed files is available for subsequent processing by other text-focused scanners.
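A toy version of the overlapping-page scan might look like the following Python sketch, assuming a single email-address "feature" type. The page and margin handling here is my own simplification, not bulk_extractor's actual scheme:

```python
import re

# Toy feature scanner: one "named entity" type (email addresses).
EMAIL = re.compile(rb"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def scan_pages(data, page_size=4096, overlap=64):
    """Scan a byte stream in pages that overlap by `overlap` bytes,
    so a feature straddling a page boundary is still matched whole.
    `overlap` must be at least the longest feature expected."""
    found = {}
    pos = 0
    while pos < len(data):
        window = data[pos:pos + page_size + overlap]
        for m in EMAIL.finditer(window):
            # Key by absolute offset so the same feature seen from two
            # adjacent pages is recorded once.
            found.setdefault(pos + m.start(), m.group())
        pos += page_size
    # Drop partial suffix matches that fall wholly inside a full match.
    features, last_end = [], -1
    for start, feat in sorted(found.items()):
        if start + len(feat) <= last_end:
            continue
        features.append((start, feat))
        last_end = start + len(feat)
    return features
```

Without the overlap, a feature split across two pages would be missed by both, which is exactly the boundary false-negative problem Simson called out in carving tools.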

This sort of processing should work very well for a lot of investigation types, primarily those centered around hunts for specific types of data. This would encompass the bulk of law enforcement (and intelligence) examinations (i.e., "what's here, what's important"), but I don't think it will be of much use in most intrusion examinations (i.e., "what happened"). There is a RAR scanner in the works which I can see being useful for exfiltration analysis, though, and inline decompression of compressed blocks will be very helpful. Either way, it is very interesting work that will benefit many members of the community.

Joshua James - Rapid Evidence Acquisition Project for Event Reconstruction (REAPER)

Joshua's presentation focused on the development of a system (REAPER) designed to provide usable forensic analysis capability to examiners in developing countries that may not have a lot of training or existing forensic knowledge. They aim to provide this through extreme automation - automated acquisition, processing, analysis, documentation, and case management, with no user interaction. This is the first project I've heard of that is actively using OCFA, which is interesting. I looked into the OCFA project several years ago, but it required a lot of setup and a completely different workflow to utilize, and I never found the time to fully commit. This talk was heavily geared toward the sorts of examinations performed by law enforcement, so I didn't get a ton out of it, but it seems to be useful research given the ever-growing backlog forming at most departments.

Vassil Roussev - The Gorilla Approach to Scaling & Integrating Open Source Forensic Tools: Learning From The Web

Sadly I missed Elizabeth Schweinberg's talk, but she was scheduled up against Vassil Roussev, who talked about applying web technologies and advancements in scale to forensic analysis. You can understand why this is a topic of interest to me. He opened with an overview of processing challenges and scale issues. He mentioned some vendor chest-beating from AccessData, who boast distributed processing prowess capable of shredding through 1.28 TB in only 6 days, 5 hours! Holy smokes!

Vassil had many interesting points, but one that struck me was his assertion that 80% of forensic analysis work is not forensic-specific. As an example, text search is not a forensic problem - it is an information retrieval problem. Current forensic tools treat all problems as forensic-specific and fail to import knowledge and useful solutions from these other domains. This is a mistake, and it must be rectified to deal with the increased scale requirements. Big data problems are being solved by big data companies - Google, Amazon, Facebook, etc. These lessons can apply directly to the bulk of the problems we are trying to solve in digital forensics.

Marcelo Silva - ForeIndex: A Framework for Analysis and Triage of Data Forensics

Marcelo discussed ForeIndex, a distributed forensic indexing framework developed as a partnership between the University of Brasilia and the Brazilian Federal Police. In a single criminal case, they had 250 computers to process, so you can understand the need for distributed processing above and beyond "putting the database on a different machine." They begin by extracting files from collected images by scripting the Sleuth Kit. These files are subsequently indexed in a distributed fashion via Hadoop-based MapReduce, Lucene, and Tika. I had to duck out early to prep for my own talk, but he seemed to be describing a pretty standard Hadoop setup, which is still compelling when applied to forensics. I'm glad to see the open source community eclipsing the proprietary forensics world when it comes to pushing capabilities forward!
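For a sense of how indexing maps onto MapReduce, here's a single-process Python sketch of the map and reduce phases building an inverted index from extracted files. It stands in for the Hadoop/Lucene/Tika pipeline conceptually, and all the names are mine, not ForeIndex's:

```python
import re
from collections import defaultdict

def map_phase(doc_id, text):
    """Map: emit one (term, doc_id) pair per distinct token in a document.
    In the real pipeline this runs in parallel across the cluster."""
    for term in set(re.findall(r"[a-z0-9]+", text.lower())):
        yield term, doc_id

def reduce_phase(pairs):
    """Reduce: group the pairs by term into an inverted index
    mapping each term to the set of documents containing it."""
    index = defaultdict(set)
    for term, doc_id in pairs:
        index[term].add(doc_id)
    return index

def build_index(docs):
    """Drive both phases over a {doc_id: text} corpus."""
    pairs = []
    for doc_id, text in docs.items():
        pairs.extend(map_phase(doc_id, text))
    return reduce_phase(pairs)
```

The map phase is embarrassingly parallel per file, which is why 250 computers' worth of extracted files can be indexed by simply adding worker nodes.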