A Scalable Approach to Source Camera Identification over Hadoop
In this paper, we explore the possibility of solving a well-known digital image forensics problem, the Source Camera Identification (SCI) problem, using a distributed approach. The SCI problem requires recognizing the camera used to acquire a given digital image, distinguishing even among cameras of the same brand and model. The solution we present is based on the algorithm by Lukas et al., which many recognize as the reference solution for this problem, and is formulated according to the MapReduce paradigm, as implemented by the Hadoop framework. Our first implementation was straightforward to obtain, as we leveraged the ability of the Hadoop framework to turn a stand-alone Java application into a distributed one with very few changes to its original source code. However, our first experimental results with this code were not encouraging.
Thus, we conducted a careful profiling activity that allowed us to pinpoint some serious performance issues arising with this vanilla port of the algorithm. We then developed several optimizations to improve the performance of the Lukas et al. algorithm by taking better advantage of the Hadoop framework. The resulting implementations were subjected to a thorough experimental analysis, conducted using a cluster of 33 commodity PCs and a data set of 5,160 images. The experimental results show that our optimized implementations scale well with the number of computing nodes, with running times that are, at most, two times slower than those implied by the maximum speedup theoretically achievable.
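To make the closing claim concrete, the following is a minimal sketch of how speedup and parallel efficiency are commonly computed, assuming that the "maximum speedup theoretically achievable" means ideal linear speedup (n nodes give an n-fold speedup). The class name, method names, and timings below are hypothetical and are not taken from the paper. Under this assumption, "at most two times slower" corresponds to a parallel efficiency of at least 0.5.

```java
// Hypothetical helper, not part of the paper's code base.
public class SpeedupSketch {

    // Speedup: sequential running time divided by running time on n nodes.
    static double speedup(double sequentialTime, double parallelTime) {
        return sequentialTime / parallelTime;
    }

    // Parallel efficiency: observed speedup divided by the ideal linear speedup (n).
    static double efficiency(double sequentialTime, double parallelTime, int nodes) {
        return speedup(sequentialTime, parallelTime) / nodes;
    }

    // True when the parallel run takes no more than `factor` times the ideal
    // running time, where ideal = sequentialTime / nodes (assumed linear speedup).
    static boolean withinFactorOfIdeal(double sequentialTime, double parallelTime,
                                       int nodes, double factor) {
        double idealTime = sequentialTime / nodes;
        return parallelTime <= factor * idealTime;
    }

    public static void main(String[] args) {
        // Made-up timings (in hours) for illustration only.
        double sequentialTime = 100.0;
        double parallelTime = 5.0;
        int nodes = 32;

        System.out.println("speedup    = " + speedup(sequentialTime, parallelTime));
        System.out.println("efficiency = " + efficiency(sequentialTime, parallelTime, nodes));
        System.out.println("within 2x of ideal: "
                + withinFactorOfIdeal(sequentialTime, parallelTime, nodes, 2.0));
    }
}
```

With these made-up timings the run is within a factor of two of the ideal, since the ideal time would be 100 / 32 hours and the measured time is less than twice that.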
Similar IEEE Project Titles
- Hadoop based enhanced cloud architecture for bioinformatic algorithms
- LsPS: A Job Size-Based Scheduler for Efficient Assignments in Hadoop
- Data-Driven Computer Go Based on Hadoop
- Translation Memory for a Machine Translation System Using the Hadoop Framework
- Signature based malware detection for unstructured data in Hadoop