Projects Using Hadoop

Projects using Hadoop process large amounts of data on clusters of commodity hardware.
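As a rough illustration of the processing model behind Hadoop, the sketch below imitates the map, shuffle, and reduce phases of a word count in plain Python. It runs in a single process and does not call Hadoop itself; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are assumptions made for this example only.

```python
from collections import defaultdict


def map_phase(line):
    """Emit (word, 1) pairs for one input line, as a mapper would."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Sum the counts collected for one key, as a reducer would."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    sample = ["big data on Hadoop", "Hadoop processes big data"]  # made-up input
    print(word_count(sample))
```

On a real cluster the map and reduce steps would run in parallel on many machines, with the framework handling the grouping step and the distribution of input data.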



  • Big Data as an e-Health Service
    Big Data is transforming healthcare, business, and ultimately society itself, as e-Health becomes one of the key driving factors in the innovation process. We investigate BDeHS (Big Data e-Health Service) to fulfill Big Data applications in the e-Health service domain. In this paper we explain why existing Big Data technologies such as Hadoop, MapReduce, STORM and the like cannot simply be applied to e-Health services directly. We then describe the additional capabilities required to make Big Data services for e-Health practical. Next we report our design of the BDeHS architecture, which supplies data operation management capabilities, regulatory compliance, and e-Health meaningful usages.


  • Transforming Big Data into Smart Data: Deriving value via harnessing Volume, Variety, and Velocity using semantic techniques and technologies
    Big Data has captured a lot of interest in industry, with anticipation of better decisions, efficient organizations, and many new jobs. Much of the emphasis is on the challenges of the four V’s of Big Data: Volume, Variety, Velocity, and Veracity, and on technologies that handle volume, including storage and computational techniques to support analysis (Hadoop, NoSQL, MapReduce, etc.). However, the most important feature of Big Data, the raison d’être, is none of these four V’s, but value. In this talk, I will put forward the concept of Smart Data, which is realized by extracting value from a variety of data, and show how Smart Data for the growing variety of Big Data (e.g., social, sensor/IoT, health care) enables a much larger class of applications that can benefit not just large companies but each individual. This requires organized ways to harness and overcome the four V-challenges. In particular, we will need to utilize metadata, employ semantics and intelligent processing, and go beyond the traditional reliance on ML and NLP.


  • Big data processing framework of road traffic collision using distributed
    Traffic information is big data that comes from varying sources, such as social sites, mobile phone GPS signals, and so on. Hadoop and HBase can store and analyze real-time collision data in a distributed processing framework. This framework can be designed to be flexible and scalable, using distributed CEP (complex event processing) to process massive real-time traffic data and an ESB (enterprise service bus) to integrate other services. In this paper, we propose a new architecture for distributed processing that enables big data processing of road traffic data and analysis of its related information. We tested the proposed framework on road traffic data from the 400 km Seoul-to-Busan freeway section in Korea. By integrating freeway traffic big data and collision data over a seven-year period (1 TB in size), we obtained the collision probability data.


  • Design of handover self-optimization using big data analytics
    As cloud computing platforms and cluster file systems mature and more applications are developed on distributed parallel processing, they can be usefully applied to the analytics of cellular network service quality. Thus, in this paper, handover optimization is selected as a use of big data analysis techniques to improve the service quality of the cellular network. Handover is one of the most critical issues for the quality of cellular network services, and configuring the neighbor cell list of each cell is a very important task. Traditionally, network operators configure the neighbor cell list of each cell manually and cannot respond rapidly to changes in the network to optimize handover. In this paper, a method for handover self-optimization is introduced. It includes initial self-configuration of the neighbor cell list and self-optimization that further refines the list based on performance measurements, in order to improve the handover success rate and thus the service quality of the cellular network. Because a large volume of measurement data must be analyzed, a big data platform is built. All the self-optimization algorithms are deployed on the Hadoop computing platform to enhance efficiency.


  • A scalable machine learning online service for big data real-time analysis
    This work describes a proposal for developing and testing a scalable machine learning architecture able to provide real-time predictions or analytics as a service over domain-independent big data, working on top of the Hadoop ecosystem and exposed through a RESTful API. Systems implementing this architecture could provide companies with on-demand tools that facilitate storing, analyzing, understanding, and reacting to their data, in either batch or stream fashion. Such systems could become a valuable asset for improving business performance and a key market differentiator in this fast-paced environment. To validate the proposed architecture, two systems are developed, each providing classical machine-learning services in a different domain: the first is a recommender system for web advertising, while the second is a prediction system that learns from gamers’ behavior and tries to predict future events such as purchases or churn. An evaluation is carried out on these systems. The results show that both services provide fast responses even under a number of concurrent requests and, in the particular case of the second system, that the computed predictions significantly outperform those obtained by random guessing.


  • Scalable big data computing for the personalization of machine learned models and its application to automatic speech recognition service
    We observe that recent advances in big data computing have empowered model-based services such as speech recognition, face recognition, context-aware services, and many others. Various sources of users’ logs can be utilized in remodeling or adapting existing models to improve the quality of service. We propose a system that can store, retrieve, and process data in a scalable manner. Recent advances in ASR and big data technologies drive more personalized services in many areas. Speaker adaptation is one good example: it requires a huge computation cost to create a personalized acoustic model and a corresponding language model for hundreds of millions of Samsung product users. We propose a personalized and scalable ASR system powered by the big data infrastructure, which brings data-driven personalization opportunities to voice-enabled services such as voice-to-text transcription and voice-enabled web search at petabyte scale. We verify the feasibility of speaker adaptation based on 107 testers’ recordings and obtain about 10% of recognition accuracy. We study an optimal set of execution environments by running jobs on either a Hadoop 1 or a Hadoop 2 cluster, and put forward performance optimization strategies: workflow compaction, file compression, and selection of the best file system among several distributed file systems. We devise a metric for the cost of personalized model creation to compare the efficiency of one cluster with that of another; it provides the estimated total execution time for a given number of machines. Finally, we introduce our in-house object storage and data storage design, optimized for voice-enabled services to effectively support small and large files, and their high performance compared to state-of-the-art systems.
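The last abstract mentions a metric for the cost of personalized model creation that yields an estimated total execution time for a given number of machines, but it does not define that metric. The sketch below shows one plausible form, machine-seconds per model, purely as an assumption: the function names, the linear-scaling assumption, and all the figures are hypothetical and are not taken from the paper.

```python
def machine_seconds_per_model(machines, elapsed_seconds, models_created):
    """Hypothetical cost metric: machine-seconds spent per personalized model."""
    if machines <= 0 or models_created <= 0:
        raise ValueError("machines and models_created must be positive")
    return machines * elapsed_seconds / models_created


def estimated_total_seconds(cost_per_model, total_models, machines):
    """Estimate wall-clock time, assuming work scales linearly with machines."""
    if machines <= 0:
        raise ValueError("machines must be positive")
    return cost_per_model * total_models / machines


if __name__ == "__main__":
    # Made-up benchmark runs on two clusters (not measurements from the paper).
    cluster_a = machine_seconds_per_model(10, 3600, 1200)
    cluster_b = machine_seconds_per_model(20, 3000, 1500)
    print("cluster A cost:", cluster_a)
    print("cluster B cost:", cluster_b)

    # Projected time to build one million models on 100 machines of type A.
    print("estimate:", estimated_total_seconds(cluster_a, 1_000_000, 100))
```

A lower cost per model indicates the more efficient cluster under this assumed metric; real clusters rarely scale perfectly linearly, so such an estimate would be a lower bound at best.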


Hadoop Solutions offers projects using Hadoop (topics, theses, and projects), the best in the market. Contact us for more details.
