Global Standardization of Forensics Will Decrease the Bias Factor of Evidence Collection Procedures and Court Rulings

Interviews – 2018

Angus Marshall, Digital Forensic Scientist

Angus, tell us a bit about yourself. What is your role, and how long have you been working in digital forensics?

Where to begin? I have a lot of different roles these days, but by day I’m a Lecturer in Cybersecurity – currently at the University of York, and also run my own digital forensic consultancy business. I drifted into the forensic world almost by accident back in 2001 when a server I managed was hacked. I presented a paper on the investigation of that incident at a forensic science conference and a few weeks later found myself asked to help investigate a missing person case that turned out to be a murder. There’s been a steady stream of casework ever since.

I’m registered as an expert adviser and most of my recent casework seems to deal with difficult-to-explain or difficult-to-analyse material. Alongside that, I’ve spent a lot of time (some might say too much) working on standards during my time on the Forensic Science Regulator’s working group on digital evidence, as a member of BSI’s IST/033 information security group, and as the UK’s digital evidence representative on ISO/IEC JTC1 SC27 WG4, where I led the work to develop ISO/IEC 27041 and 27042 and contributed to the other investigative and eDiscovery standards.

You’ve recently published some research into verification and validation in digital forensics. What was the goal of the study?

It grew out of a proposition in ISO/IEC 27041 that tool verification (i.e. evidence that a tool conforms to its specification) can be used to support method validation (i.e. showing that a particular method can be made to work in a lab). The idea of the 27041 proposal is that if tool vendors can provide evidence from their own development processes and testing, the tool users shouldn’t need to repeat that. We wanted to explore the reality of that by looking at accredited lab processes and real tools. In practice, we found that it currently won’t work because the requirement definitions for the methods don’t seem to exist and the tool vendors either can’t or won’t disclose data about their internal quality assurance.

The effect is that there appears to be a gap in the accreditation process. Rather than having a selection of methods that are known to work correctly (as we see in calibration houses, metallurgical and chemical labs etc. – where the ISO 17025 standard originated) which can be chosen to meet a specific customer requirement, we have methods which satisfy much fuzzier customer requirements. Those requirements are almost always non-technical in nature, because the customers are criminal justice system (CJS) practitioners who simply don’t express things in a technical way.

We’re not saying that anyone is necessarily doing anything wrong, by the way, just that we think they’ll struggle to provide evidence that they’re doing the right things in the right way.

Where do we stand with standardisation in the UK at the moment?

Standardization is a tricky word. It can mean that we all do things the same way, but I think you’re asking about progress towards compliance with the regulations. In that respect, it looks like we’re on the way. It’s slower than the regulator would like. However, our research at York suggests that even the accreditations awarded so far may not be quite as good as they could be. They probably satisfy the letter of the regulator’s documents, but not the spirit of the underlying standard. The technical correctness evidence is missing.

ISO 17025 has faced a lot of controversy since it has been rolled out as the standard for digital forensics in the UK. Could you briefly outline the main reasons why?

Most of the controversy is around cost and complexity. With accreditation costing upwards of £10k for even a small lab, it makes big holes in budgets. For the private sector, where turnover for a small lab can be under £100k per annum, that’s a huge issue. The cost has to be passed on. Then there’s the time and disruption involved in producing the necessary documents, and then maintaining them and providing evidence that they’re being followed for each and every examination.

A lot of that criticism is justified, but adoption of any standard also creates an opportunity to take a step back and review what’s going on in the lab. It’s a chance to find a better way to do things and improve confidence in what you’re doing.

In your opinion, what is the biggest stumbling block either for ISO 17025 specifically, or for standardizing digital forensics in general?

Two things – as our research suggests, the lack of requirements makes the whole verification and validation process harder, and there’s the confusion about exactly what validation means. In ISO terms, it’s proof that you can make a process work for you and your customers. People still seem to think it’s about proving that tools are correct. Even a broken tool can be used in a valid process, if the process accounts for the errors the tool makes.

I guess I’ve had the benefit of seeing how standards are produced and learning how to use the ISO online browsing platform to find the definitions that apply. Standards writers are a lot like Humpty Dumpty. When we use a word it means exactly what we choose it to mean.

Is there a way to properly standardise tools and methods in digital forensics?

It’s not just a UK problem – it’s global. There’s an opportunity for the industry to review the situation, now, and create its own set of standard requirements for methods. If these are used correctly, we can tell the tool makers what we need from them and enable proper objective testing to show that the tools are doing what we need them to. They’ll also allow us to devise proper tests for methods to show that they really are valid, and to learn where the boundaries of those methods are.

Your study also looked at some existing projects in the area: can you tell us about some of these? Do any of them present a potential solution?

NIST and SWGDE both have projects in this space, but specifically looking at tool testing. The guidance and methods look sound, but they have some limitations. Firstly, because they’re only testing tools, they don’t address some of the wider non-technical requirements that we need to satisfy in methods (things like legal considerations, specific local operational constraints etc.).

Secondly, the NIST project in particular lacks a bit of transparency about how they’re establishing requirements and choosing which functions to test. If the industry worked together we could provide some more guidance to help them deal with the most common or highest priority functions.

Both projects, however, could serve as a good foundation for further work and I’d love to see them participating in a community project around requirements definition, test development and sharing of validation information.

Is there anything else you’d like to share about the results?

We need to get away from thinking solely in terms of customer requirements and method scope. These concepts work in other disciplines because there’s a solid base of fundamental science behind the methods. Digital forensics relies on reverse-engineering and trying to understand the mind of a developer in order to work out how to extract and interpret data. That means we have a potentially higher burden of proof for any method we develop. We also need to remember that we deal with a rate of change driven by human ingenuity and marketing, rather than by evolution.

Things move pretty fast in DF; if we don’t stop and look at what we’re doing once in a while, we’ll miss something important.

Read Angus Marshall’s paper on requirements in digital forensics method definition here.

The hottest topic in digital forensics at the moment, standardisation is on the tip of everyone’s tongues. Following various think pieces on the subject and a plethora of meetings at conferences, I spoke to Angus Marshall about his latest paper and what he thinks the future holds for this area of the industry. You can […]

via Angus Marshall talks about standardisation — scar


Forensic science — FBI Bullet-Lead Technique Dead Wrong — Intel Today

 


“For over thirty years, FBI experts testified about comparative bullet lead analysis (CBLA), a technique that was first used in the investigation into President Kennedy’s assassination. CBLA compares trace chemicals found in bullets at crime scenes with ammunition found in the possession of a suspect. (…) Although the FBI eventually ceased using CBLA, the Bureau’s conduct in first employing the technique and then defending it after it was challenged provides an insight into how forensic science sometimes works.”

Paul C. Giannelli

“We cannot afford to be misleading to a jury. We plan to discourage prosecutors from using our previous results in future prosecutions.”

Letter from Dwight E. Adams — then FBI lab Director — to FBI Director Robert S. Mueller III

Since the 1960s, testimony by representatives of the Federal Bureau of Investigation in thousands of criminal cases has relied on evidence from Compositional Analysis of Bullet Lead (CABL), a forensic technique that compares the elemental composition of bullets found at a crime scene to the elemental composition of bullets found in a suspect’s possession. Different from ballistics techniques that compare striations on the barrel of a gun to those on a recovered bullet, CABL is used when no gun is recovered or when bullets are too small or mangled to observe striations.


A True Story — In 1995, former Baltimore police Sgt. James A. Kulbicki was convicted of first-degree murder. The prosecutor convinced the jury that, in 1993, Kulbicki had killed his mistress — 22-year-old Gina Nueslein — with his off-duty .38-caliber revolver.

The scientific evidence was “irrefutable”. The bullets recovered from the victim’s body and from the crime scene had been fired by his gun.

“I wonder what it felt like, Mr. Kulbicki, to have taken this gun, pressed it to the skull of that young woman and pulled the trigger, that cold steel,” the prosecutor asked rhetorically during closing arguments.

Forensic Science — In order to follow a stable, straight trajectory, a bullet must spin about its axis. To achieve such spin, spiralling “grooves” are machined into the inside of the weapon’s barrel.

The size of these “grooves” and of the “lands” between them, the angle of the grooves, their number, and the direction of rotation — clockwise or anticlockwise — generally make it possible to identify the type of weapon. For instance, Colt traditionally uses a left-hand twist while Smith & Wesson uses a right-hand twist.

Moreover, specific imperfections of a barrel may in some cases make it possible to match a bullet to a particular weapon. Even in the best-case scenario, two bullets fired by the same gun will not look identical, but they are likely to show areas of resemblance.

When such a test is not conclusive or not possible — because the bullet fragments are too small or because the gun is not recovered — it is still possible to analyze the lead content of the fragments and compare it to bullets known to belong to a suspect.

The Scientific Evidence Against Kulbicki

Maryland’s top firearms expert told the jury that the size of the bullet was compatible with Kulbicki’s gun and that Kulbicki had cleaned the gun.

He added that he had not been able to identify the marks from the barrel.

Lastly, he testified that the lead content of the bullet that killed the victim was identical to the content of bullets from a box belonging to Kulbicki.

“Out of the billions of bullets in the world, is this just a coincidence that this bullet ended up in the defendant’s off-duty weapon,” a prosecutor asked.

A prosecutor told the jury that the evidence presented by the forensic experts was “a significant piece of evidence” and a “major link” establishing Kulbicki’s guilt.

The jurors agreed. Kulbicki was sentenced to life in prison without the possibility of parole.

False Testimony

Joseph Kopera, one of the forensic experts who testified at the trial, presented the formal reports to the defense.

But his working notes were not given to the defense, either at the trial or at the appeal, which Kulbicki lost.

These notes conflict with the report on every point.

Kopera testified that the fragments were consistent with a large caliber, probably a .38.

His notes state that the first fragment came from a medium-caliber bullet and that the origin of the second fragment could not be determined.

Kopera testified that the gun had been cleaned. His notes read, “Residue in barrel: Yes. Bore condition: Dirty.”

Kopera testified that he could not identify the grooves and lands on the fragments. His notes reveal that the fragment’s land width was 0.072 inches and its groove width was 0.083 inches.

Bullets fired from Kulbicki’s Smith & Wesson revolver had a land width of 0.100 inches and a groove width of 0.113 inches.

The difference is significant enough to state beyond doubt that Kulbicki’s gun did not fire the bullet that killed his mistress.

Kopera testified that he could not identify the twist. His notes indicate that he had detected a “slight left twist” while Kulbicki’s off-duty weapon makes right-twist markings.

Kopera testified that the lead content of the bullets was identical. It was not.

The amount of arsenic in the fragments differed significantly from that in the bullets belonging to Kulbicki.

No Degree — At the trial, Kopera testified that he had an engineering degree from the Rochester Institute of Technology and a mechanical engineering degree from the University of Maryland. Neither institution has ever heard of him.

A Widely Used Technique

“Every critical part of Kopera’s testimony was false, misleading, based on improper assumptions or ignored exculpatory information,” Suzanne K. Drouet, a former Justice Department lawyer, told the judge in her recent motion seeking a new trial for Kulbicki.

“If this could happen to my client, who was a cop who worked within this justice system, what does it say about defendants who know far less about the process and may have far fewer resources to uncover evidence of their innocence that may have been withheld by the prosecution or their scientific experts?”

Following a 2004 National Academy of Sciences report that sharply criticized the FBI’s bullet-lead technique, the agency no longer relies on this method.

After retiring from the firearms section of the Maryland State Police, Kopera committed suicide.

For more than 30 years, his expertise had helped secure countless convictions.

Nationwide, it has been estimated that the method has been used in more than 2,000 cases over four decades.

Several former FBI employees believe that a review of all cases where the CBLA method was used in testimony should be urgently conducted.

“It troubles me that anyone would be in prison for any reason that wasn’t justified. And that’s why these reviews should be done in order to determine whether or not our testimony led to the conviction of a wrongly accused individual,” said Adams, the former FBI lab director.

The second-in-command agreed.

“I don’t believe that we can testify about how many bullets may have come from the same melt and our estimate may be totally misleading,” declared deputy lab director Marc LeBeau in a May 12, 2005, e-mail.

So far, the FBI has rejected such reviews on the grounds that they would be very expensive. A sum of US$70,000 was mentioned.

Since 2005, the nonpartisan Forensic Justice Project, run by former FBI lab whistle-blower Frederic Whitehurst, has tried to force the bureau to release a list of bullet-lead cases under the Freedom of Information Act.

In academic circles, some experts have not hidden their anger toward the program and what seems to be an attempt to cover up decades of fraudulent forensic science.

Clifford Spiegelman is a statistician at Texas A&M University. He reviewed the FBI’s statistical methods for the science academy.

“They said the FBI agents who went after Al Capone were the untouchables, and I say the FBI experts who gave this bullet-lead testimony are the unbelievables.”

Conclusion

Several lessons can be gleaned from the CBLA experience. In the conclusion of his excellent paper on the subject, Paul Giannelli wrote:

First, the failure to publish the empirical data that supports scientific conclusions is unacceptable. Scientists “are generally expected to exchange research data as well as unique research materials that are essential to the replication or extension of reported findings.”

Second, defense attorneys were unable to successfully challenge the evidence until William Tobin, the retired FBI expert, became a defense witness. This is not surprising because no defendant, no matter how rich, can conduct extensive empirical studies. A defense expert in a particular case can critique the bases of a prosecution expert’s opinion but can rarely replicate the research upon which that opinion rests.

Forensic Science: Last Week Tonight with John Oliver (HBO)

Forensic science used in criminal trials can be surprisingly unscientific. Maybe a new television procedural could help change the public perception.

REFERENCES

Comparative Bullet Lead Analysis: A Retrospective — Paul C. Giannelli

Comparative bullet-lead analysis – Wikipedia


via Forensic science — FBI Bullet-Lead Technique Dead Wrong — Intel Today

Digital Forensics as a Big Data Challenge — Forensic Focus – Articles


Abstract

Digital Forensics, as a science and part of the forensic sciences, is facing new challenges that may well render established models and practices obsolete. The size of potential digital evidence sources has grown exponentially, be it hard disks in desktops and laptops or solid-state memories in mobile devices like smartphones and tablets, even while latency times lag behind. Cloud services are now sources of potential evidence in a vast range of investigations, network traffic also follows a growing trend, and in cyber security the necessity of sifting through vast amounts of data quickly is now paramount. On a higher level, investigations – and intelligence analysis – can profit from sophisticated analysis of such datasets as social network structures or corpora of text to be analysed for authorship and attribution. All of the above highlights the convergence between so-called data science and digital forensics, whose fundamental challenge is to analyse vast amounts of data (“big data”) in actionable time while preserving forensic principles so that the results can be presented in a court of law. The paper, after introducing digital forensics and data science, explores the challenges above and proceeds to propose how techniques and algorithms used in big data analysis can be adapted to the unique context of digital forensics, ranging from the management of evidence via Map-Reduce to machine learning techniques for triage and analysis of big forensic disk images and network traffic dumps. In the conclusion the paper proposes a model to integrate this new paradigm into established forensic standards and best practices and tries to foresee future trends.

1 Introduction

1.1 Digital Forensics

What is digital forensics? We report here one of the most useful definitions of digital forensics yet formulated. It was developed during the first Digital Forensics Research Workshop (DFRWS) in 2001 and it is still very much relevant today:

Digital Forensics is the use of scientifically derived and proven methods toward the preservation, collection, validation, identification, analysis, interpretation, documentation and presentation of digital evidence derived from digital sources for the purpose of facilitating or furthering the reconstruction of events found to be criminal, or helping to anticipate unauthorized actions shown to be disruptive to planned operations. [Pear01]

This formulation stresses first and foremost the scientific nature of digital forensics methods, at a point in time when the discipline was transitioning from being a “craft” to an established field and rightful part of the forensic sciences. At that point digital forensics was also transitioning from being practised mainly in separate environments – such as law enforcement bodies and enterprise audit offices – to being a unified field. Nowadays this process is well advanced and it can be said that digital forensics principles, procedures and methods are shared by a large part of its practitioners, coming from different backgrounds (criminal prosecution, defence consultants, corporate investigators and compliance officers). Applying scientifically valid methods implies important concepts and principles to be respected when dealing with digital evidence. Among others we can cite:

  • Previous validation of tools and procedures. Tools and procedures should be validated by experiment prior to their application on actual evidence.
  • Reliability. Processes should yield consistent results and tools should present consistent behaviour over time.
  • Repeatability. Processes should generate the same results when applied to the same test environment.
  • Documentation. Forensic activities should be well documented, from the inception to the end of the evidence life-cycle. On the one hand, strict chain-of-custody procedures should be enforced to assure evidence integrity; on the other hand, complete documentation of every activity is necessary to ensure repeatability by other analysts.
  • Preservation of evidence. Digital evidence is easily altered and its integrity must be preserved at all times, from the very first stages of operations, to avoid spoliation and degradation. Both technical (e.g. hashing – see the sketch after this list) and organizational (e.g. clear accountability for operators) measures are to be taken.
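As a concrete illustration of the hashing measure mentioned in the last bullet, here is a minimal Python sketch (the image path is purely illustrative) that streams an acquired image in chunks and records its SHA-256 digest so that integrity can be re-verified at any later stage of the evidence life-cycle.

```python
import hashlib

def sha256_of_image(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a (potentially very large) evidence image and return its SHA-256 digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Hash the image right after acquisition; the file name is hypothetical.
acquisition_hash = sha256_of_image("evidence/disk001.dd")
print("Acquisition hash:", acquisition_hash)

# At any later point, recompute and compare to demonstrate integrity.
assert sha256_of_image("evidence/disk001.dd") == acquisition_hash, "Integrity check failed"
```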

These basic tenets are currently being challenged in many ways by the shifting technological and legal landscape practitioners have to contend with. While this paper shall not dwell much on the legal side of things, this is also obviously something that is always to be considered in forensics.

Regarding the phases that usually make up the forensic workflow, we refer here again to the only international standard available [ISO12] and describe them as follows:

  • Identification. This process includes the search, recognition and documentation of the physical devices on the scene potentially containing digital evidence. [ISO12]
  • Collection. Devices identified in the previous phase can be collected and transferred to an analysis facility, or acquired (next step) on site.
  • Acquisition. This process involves producing an image of a source of potential evidence, ideally identical to the original.
  • Preservation. Evidence integrity, both physical and logical, must be ensured at all times.
  • Analysis. Interpretation of the data from the evidence acquired. It usually depends on the context, the aims or the focus of the investigation and can range from malware analysis to image forensics, database forensics, and many more application-specific areas. On a higher level, analysis could include content analysis via, for instance, forensic linguistics or sentiment analysis techniques.
  • Reporting. Communication and/or dissemination of the results of the digital investigation to the parties concerned.

1.2 Data Science

Data Science is an emerging field growing at the intersection of statistical techniques and machine learning, complemented by domain-specific knowledge and fuelled by big datasets. Hal Varian gave a concise definition of the field:

[Data science is] the ability to take data – to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it. [Vari09]

We can see here the complete cycle of data management and understand that data science in general is concerned with the collection, preparation, analysis, visualization, communication and preservation of large sets of information; this is a paraphrase of another insightful definition by Jeffrey Stanton of Syracuse University’s School of Information Studies. The parallels with the digital forensics workflow are clear, but the mention of visualization in both definitions deserves to be stressed. Visualization is hardly ever mentioned in digital forensics guidelines and standards, yet as the object of analysis moves towards “Big Data” it will necessarily become one of the most useful tools in the analyst’s box – for instance in the prioritization phase, but also for dissemination and reporting: visual communication is probably the most efficient way into a human brain, and this channel is underused by most of today’s forensic practitioners.
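As a simple illustration of the kind of visualization that can support prioritization, the sketch below (a toy example; the mount point is hypothetical and should be a working copy, never the original evidence) plots a histogram of file modification times – a rough activity timeline that can quickly direct an analyst’s attention to periods of interest.

```python
import os
import datetime as dt
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

def modification_times(root):
    """Walk a (read-only working copy of a) filesystem and yield file mtimes."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            try:
                yield dt.datetime.fromtimestamp(os.path.getmtime(os.path.join(dirpath, name)))
            except OSError:
                continue  # skip unreadable entries rather than abort

# Hypothetical mount point of a working copy of the evidence.
times = list(modification_times("/mnt/evidence_copy"))

fig, ax = plt.subplots()
ax.hist(mdates.date2num(times), bins=100)
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d"))
ax.set_xlabel("Modification time")
ax.set_ylabel("Number of files")
ax.set_title("File activity timeline")
fig.autofmt_xdate()
fig.savefig("timeline.png")
```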

If Data Science is concerned with “Big Data”, what is Big Data anyway? After all, “big” is a relative concept, prone to change with time. Big data is any data that is difficult to manage and work with – in other words, datasets so big that conventional tools such as relational databases are not practical or useful for them. [ISAC13] From the point of view of data science, the challenges of managing big data can be summarized as three Vs: Volume (size), Velocity (needed for interactivity) and Variety (different sources of data). In the next paragraph we shall see how these three challenges dovetail nicely with the digital forensics context.

2 Challenges

“Golden Age” is a common name for the period in the history of digital forensics that ran roughly from the 1990s to the first decade of the twenty-first century. During that period the technological landscape was dominated by the personal computer, and mostly by a single architecture – x86 plus Windows – and data stored in hard drives represented the vast majority of evidence, so much so that “Computer Forensics” was the accepted term for the discipline. Storage sizes also allowed for complete bitwise forensic copies of the evidence for subsequent analysis in the lab. The relative uniformity of the evidence facilitated the development of the digital forensic principles outlined above, enshrined in several guidelines and eventually in the ISO/IEC 27037 standard. Inevitably, though, they lagged behind real-world developments: recent years have brought many challenges to the “standard model”, first among them the explosion in the average size of the evidence examined for a single case. Historical motivations for this include:

  • A dramatic drop in hard drive and solid state storage cost (currently estimated at $80 per Terabyte) and consequently an increase in storage size per computer or device;
  • Substantial increase in magnetic storage density and diffusion of solid-state removable media (USB sticks, SD and other memory cards etc) in smartphones, notebooks, cameras and many other kinds of devices;
  • Worldwide huge penetration of personal mobile devices like smartphones and tablets, not only in Europe and America, but also in Africa – where they constitute the main communication mode in many areas – and obviously in Asia;
  • Introduction and increasing adoption by individuals and businesses of cloud services – infrastructure services (IAAS), platform services (PAAS) and applications (SAAS) – made possible in part by virtualization technology enabled in turn by the modern multi-core processors;
  • Network traffic is ever more part of the evidence in cases and the sheer size of it has – again – obviously increased in the last decade, both on the Internet and on 3G-4G mobile networks, with practical but also ethical and political implications;
  • Connectivity is rapidly becoming ubiquitous and the “Internet of things” is near, especially considering the transition to IPv6 in the near future. Even when not networked, sensors are everywhere, from appliances to security cameras, from GPS receivers to embedded systems in cars, from smart meters to Industrial Control Systems.

To give a few quantitative examples of the trend: in 2008 the FBI Regional Computer Forensics Laboratories (RCFLs) Annual Report [FBI08] explained that the agency’s RCFLs processed 27 percent more data than they did during the preceding year; the 2010 report [FBI10] gave an average case size of 0.4 Terabytes. According to a recent (2013) informal survey among forensic professionals on Forensic Focus, half of the cases involve more than one Terabyte of data, with one in five over five Terabytes in size.

The sheer quantity of evidence associated with a case is not the only measure of its complexity, and growth in size is not the only challenge that digital forensics is facing: evidence is becoming more and more heterogeneous in nature and provenance, following the evolving trends in computing. The workflow phase impacted by this new aspect is clearly analysis where, even when proper prioritization is applied, it is necessary to sort through diverse categories and sources of evidence, structured and unstructured. Data sources themselves are much more differentiated than in the past: it is common now for a case to include evidence originating from personal computers, servers, cloud services, phones and other mobile devices, digital cameras, even embedded systems and industrial control systems.

3 Rethinking Digital Forensics

In order to face the many challenges, but also to leverage the opportunities it is encountering, the discipline of digital forensics will have to rethink some established principles and reorganize well-known workflows, and even include and use tools not previously considered viable for forensic use – concerns regarding the security of some machine learning algorithms have been voiced, for instance in [BBC+08]. On the other hand, forensic analysts’ skills need to be rounded out to make better use of these new tools in the first place, but also to help integrate them into forensic best practices and validate them. The dissemination of “big data” skills will have to include all actors in the evidence lifecycle, starting with Digital Evidence First Responders (DEFRs), as identification and prioritization will grow in importance and skilled operators will be needed from the very first steps of the investigation.

3.1 Principles

Well-established principles shall need to undergo at least a partial extension and rethinking because of the challenges of Big Data.

  • Validation and reliability of tools and methods gain even more relevance in big data scenarios because of the size and variety of datasets, coupled with the use of cutting-edge algorithms that still need validation efforts, including a body of test work – first on methods and then on tools – in controlled environments and on test datasets before their use in court.
  • Repeatability has long been a basic tenet in digital forensics but most probably we will be forced to abandon it, at least in its strictest sense, for a significant part of evidence acquisition and analysis. Already repeatability stricto sensu is impossible to achieve in nearly all instances of forensic acquisition of mobile devices, and the same applies to cloud forensics. When Machine Learning tools and methods become widespread, reliance on previous validation will be paramount. As an aside, this stresses once more the importance of using open methods and tools that can be independently and scientifically validated as opposed to black box tools or – worse – LE-reserved ones.
  • As for documentation, its importance for a sound investigation is even greater when non-repeatable operations and live analysis routinely become part of the investigation process. Published data about validation results of the tools and methods used – or at least pointers to it – should be an integral part of the investigation report.

3.2 Workflow

Keeping in mind how the forensic principles may need to evolve, we present here a brief summary of the forensics workflow and how each phase may have to adapt to big data scenarios. The ISO/IEC 27037 International Standard covers the identification, collection, acquisition and preservation of digital evidence (or, literally, “potential” evidence). Analysis and disposal are not covered by this standard, but will be addressed by future guidelines, currently in development, in the 27xxx series.

Identification and collection

Here the challenge is selecting evidence in a timely manner, right on the scene. Guidelines for proper prioritization of evidence should be further developed, abandoning the copy-all paradigm and strict evidence integrity in favour of appropriate triage procedures: this implies skimming through all the (potential) evidence right at the beginning and selecting the relevant parts. First responders’ skills will be even more critical than they currently are and, in corporate environments, so will preparation procedures.

Acquisition

When classic bitwise imaging is not feasible due to the size of the evidence, prioritization procedures or “triage” can be conducted, properly justified and documented, because integrity is no longer absolute and the original source has been modified, if only by selecting what to acquire. Visualization can be a very useful tool, both for low-level filesystem analysis and for higher-level content analysis. Volume of evidence is a challenge because dedicated hardware is required for acquisition – be it storage or online traffic – while in the not so distant past an acquisition machine could be built with off-the-shelf hardware and software. Variety poses a challenge of a slightly different kind, especially when acquiring mobile devices, due to the huge number of physical connectors and platforms.

Preservation

Again, preserving all evidence securely and in compliance with legal requirements calls for quite a substantial investment from forensic labs working on a significant number of cases.

Analysis

Integrating methods and tools from data science implies moving beyond the “sausage factory” forensics still widespread today, where under-skilled operators rely heavily on point-and-click all-in-one tools to perform the analysis. Analysts will need to include a plurality of tools in their panoply and, not only that, understand and evaluate the algorithms and implementations those tools are based upon. The absolute need for highly skilled analysts and operators is clear, and suitable professional qualifications will develop to certify this.

Reporting

The final report for an analysis conducted using data science concepts should contain accurate evaluations of the tools and methods used, including data from the validation process; accurate documentation becomes even more fundamental as strict repeatability becomes very hard to uphold.

3.3 Some tools for tackling the Big Data Challenge

At this stage, due also to the fast-changing landscape in data science, it is hard to systematically categorize its tools and techniques. We review here some of them.

Map-Reduce is a framework for massively parallel tasks. It works well when the datasets do not involve a lot of internal correlation. This does not seem to be the case for digital evidence in general, but a task like file fragment classification is well suited to being modelled in a Map-Reduce paradigm. Attribution of file fragments – coming from a filesystem image or from unallocated space – to specific file types is a common task in forensics: machine learning classification algorithms – e.g. logistic regression, support vector machines – can be adapted to Map-Reduce if the analyst forgoes the possible correlations among single fragments. A combined approach, where a classification algorithm is paired for instance with a decision tree method, would probably yield higher accuracy.
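The toy sketch below illustrates the pattern: the map step classifies each fragment independently (here with a crude byte-entropy heuristic standing in for the trained logistic-regression or SVM classifier mentioned above), and the reduce step aggregates the per-fragment labels. It is a minimal single-machine analogue of Map-Reduce, not a production implementation.

```python
import math
from collections import Counter
from functools import reduce
from multiprocessing import Pool

def byte_entropy(fragment: bytes) -> float:
    """Shannon entropy (bits per byte) of a fragment."""
    counts = Counter(fragment)
    total = len(fragment)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def map_classify(fragment: bytes) -> str:
    """Map step: classify one fragment independently.
    A real deployment would call a trained classifier here (e.g. logistic regression)."""
    e = byte_entropy(fragment)
    if e > 7.5:
        return "compressed_or_encrypted"
    if e < 5.0:
        return "text_like"
    return "other"

def reduce_counts(acc: Counter, label: str) -> Counter:
    """Reduce step: aggregate per-fragment labels into overall counts."""
    acc[label] += 1
    return acc

if __name__ == "__main__":
    # Toy fragments; in practice these would come from unallocated space or a disk image.
    fragments = [b"hello world " * 40, bytes(range(256)) * 2, b"\x00" * 512]
    with Pool() as pool:
        labels = pool.map(map_classify, fragments)      # parallel map
    summary = reduce(reduce_counts, labels, Counter())  # reduce / aggregate
    print(summary)
```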

Decision trees and random forests are fruitfully brought to bear in fraud detection software, where the objective is to find in a vast dataset the statistical outliers – in this case anomalous transactions, or in another application, anomalous browsing behaviour.
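As one hedged illustration of this idea, the sketch below uses scikit-learn’s IsolationForest – a tree-ensemble outlier detector – on a synthetic set of transaction features; the data and thresholds are invented for the example, and a real fraud-detection deployment would use richer features and validated models.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
# Synthetic transactions: columns are amount (currency units) and hour of day.
normal = np.column_stack([rng.normal(50, 15, 1000), rng.normal(14, 3, 1000)])
suspect = np.array([[4000.0, 3.0], [2500.0, 4.0]])   # large, late-night transfers
transactions = np.vstack([normal, suspect])

# Tree-ensemble outlier detection; contamination is the expected outlier fraction.
model = IsolationForest(contamination=0.01, random_state=0).fit(transactions)
flags = model.predict(transactions)                   # -1 marks statistical outliers
print("Flagged rows:", np.where(flags == -1)[0])
```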

In audio forensics, unsupervised learning techniques under the general heading of “blind signal separation” give good results in separating two superimposed speakers, or a voice from background noise. They rely on mathematical underpinnings to find, among the possible solutions, the least correlated signals.
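A minimal sketch of the idea, using scikit-learn’s FastICA on two synthetic signals (a tone standing in for a speaker and a square wave standing in for background hum): the algorithm recovers the least correlated source estimates from the two mixtures, up to scale and ordering.

```python
import numpy as np
from sklearn.decomposition import FastICA

t = np.linspace(0, 1, 8000)
voice = np.sin(2 * np.pi * 220 * t)             # stand-in for a speaker
noise = np.sign(np.sin(2 * np.pi * 50 * t))     # stand-in for background hum

sources = np.column_stack([voice, noise])
mixing = np.array([[1.0, 0.6], [0.4, 1.0]])
recordings = sources @ mixing.T                 # two "microphones", two mixtures

ica = FastICA(n_components=2, random_state=0)
estimated = ica.fit_transform(recordings)       # least-correlated source estimates
print(estimated.shape)                          # (8000, 2): separated signals, up to scale/order
```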

In image forensics, classification techniques are again useful for automatically reviewing big sets of hundreds or thousands of image files, for instance to separate suspect images from the rest.

Neural Networks are suited for complex pattern recognition in network forensics. A supervised approach is used, where successive snapshots of the file system are used to train the network to recognize the normal behaviour of an application. After the event, the system can be used to automatically build an execution timeline on a forensic image of a filesystem. [KhCY07] Neural Networks have also been used to analyse network traffic, but in this case the results do not yet show high levels of accuracy.
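The following toy sketch shows only the supervised pattern described above, not the cited framework itself: feature vectors summarizing the delta between successive filesystem snapshots (counts of created/modified/deleted files and bytes written – all values synthetic) are used to train a small multilayer perceptron to recognize which kind of application produced the changes.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)

def fake_snapshot_delta(app: str) -> list:
    """Toy features for the delta between two filesystem snapshots:
    [files created, files modified, files deleted, MB written]."""
    profiles = {
        "browser":   [30, 5, 10, 80],
        "installer": [400, 50, 5, 600],
        "wiper":     [0, 2, 300, 1],
    }
    base = np.array(profiles[app], dtype=float)
    return list(base + rng.normal(0, base * 0.1 + 1))

apps = ["browser", "installer", "wiper"]
X = [fake_snapshot_delta(a) for a in apps for _ in range(200)]
y = [a for a in apps for _ in range(200)]

clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0).fit(X, y)
# Given deltas observed on a forensic image, predict which application most likely produced them.
print(clf.predict([[350, 40, 3, 550]]))   # most likely: ['installer']
```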

Natural Language Processing (NLP) techniques, including Bayesian classifiers and unsupervised clustering algorithms like k-means, have been successfully employed for authorship verification and for the classification of large bodies of unstructured text, emails in particular.
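A brief sketch of the unsupervised side of this (k-means over TF-IDF vectors, on a handful of invented email snippets): authorship attribution proper would need labelled training data and far richer stylometric features, so treat this only as an illustration of clustering unstructured text.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Toy email bodies; a real corpus would come from an acquired mailbox.
emails = [
    "Please find attached the quarterly invoice and payment schedule.",
    "Invoice attached, payment due by the end of the quarter.",
    "Are we still on for the match on Saturday afternoon?",
    "Saturday's match is confirmed, see you at the ground.",
]

X = TfidfVectorizer(stop_words="english").fit_transform(emails)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)   # e.g. [0 0 1 1]: business-related vs. personal messages grouped together
```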

4 Conclusion

The challenges of big data evidence already highlight the necessity of revising tenets and procedures firmly established in digital forensics. New validation procedures, analysts’ training and analysis workflows will be needed in order to confront the changed landscape. Furthermore, few forensic tools implement machine learning algorithms and, conversely, most machine learning tools and libraries are not suitable and/or validated for forensic work, so there is still wide scope for the development of innovative tools leveraging machine learning methods.

References

[BBC+08] Barreno, M. et al.: “Open Problems in the Security of Learning”. In: D. Balfanz and J. Staddon, eds., AISec, ACM, 2008, pp. 19-26
[FBI08] FBI: “RCFL Program Annual Report for Fiscal Year 2008”, FBI 2008. http://www.fbi.gov/news/stories/2009/august/rcfls_081809
[FBI10] FBI: “RCFL Program Annual Report for Fiscal Year 2010”, FBI 2010.
[ISAC13] ISACA: “What Is Big Data and What Does It Have to Do with IT Audit?”, ISACA Journal, 2013, pp. 23-25
[ISO12] ISO/IEC 27037:2012 International Standard: “Guidelines for identification, collection, acquisition and preservation of digital evidence”
[KhCY07] Khan, M., Chatwin, C. and Young, R.: “A framework for post-event timeline reconstruction using neural networks”, Digital Investigation 4, 2007
[Pear01] Pearson, G.: “A Road Map for Digital Forensic Research”. In: Report from DFRWS 2001, First Digital Forensic Research Workshop, 2001.
[Vari09] Varian, Hal in: “The McKinsey Quarterly”, Jan 2009

About the Author

Alessandro Guarino is a senior Information Security professional and independent researcher. He is the founder and principal consultant of StudioAG, a consultancy firm based in Italy and active since 2000, serving clients both in the private and public sector and providing cybersecurity, data protection and compliance consulting services. He is also a digital forensics analyst and consultant, as well as expert witness in Court. He holds an M.Sc in Industrial Engineering and a B.Sc. in economics, with a focus on Information Security Economics. He is an ISO active expert in JTC 1/SC 27 (IT Security Techniques committee) and contributed in particular to the development of cybersecurity and digital investigation standards. He represents Italy in the CEN-CENELEC Cybersecurity Focus Group and ETSI TC CYBER. He is the chair of the recently formed CEN/CENELEC TC 8 “Privacy management in products and services”. As an independent researcher, he delivered presentations at international conferences and published several peer-reviewed papers.

Find out more and get in touch with the author at StudioAG.


via Digital Forensics as a Big Data Challenge — Forensic Focus – Articles