Abstract

1 Introduction

This paper explains one of the concepts and technologies behind Big Data, MapReduce, and the ways in which Hadoop can be queried and then used. Business Intelligence has long-standing issues with the speed of computation, transformation and analysis. Since the explosion in the amount of data, prediction and data mining are no longer separate disciplines. Customers therefore need to be able to go beyond simple reports and find new ways to understand the data and detect trends and opportunities. Big data can bring many advantages, but two are the most important: the first concerns the financial benefits and outcome, and the second concerns the entire process flow, from organization through to delivery.

Some organizations researching big data say that terabyte-scale storage of structured data is currently provided most cheaply by big data technologies such as Hadoop clusters. For example, for one company the cost of storing one terabyte for a year was $37,000 for a traditional relational database, $5,000 for a database appliance, and $2,000 for a Hadoop cluster. Of course, these figures are not directly comparable, because the more traditional technologies may be somewhat safer and easier to administer [1].

Big Data is a concept that promises to help in all of these areas, using the three V's: volume, velocity and variety.

> Volume: big data is the ocean of data mentioned above. It consists of information that can come from every possible sensor, and some even say that people are also sensors and gatherers for big data [9]. The challenge of having such a large quantity of data is that it is very hard to sustain, to store, to analyze and ultimately to use.

> Velocity: this is about the speed at which data travels from one point to another and the speed at which it is processed. Sometimes it is crucial for a manager to be able to decide on a variety of issues in very little time [2]. The main issue is that the resources available for analysis are limited compared to the volume of data, while requests for information are unlimited, and information usually passes through at least one bottleneck.

> Variety: the third characteristic is represented by the types of data that are stored. Because there are many types of sensors and sources, the data that comes from them varies greatly in size and type. It is very complicated to analyze text, images and sounds in the same context and obtain a result that can be relied on. There is also the issue of dark data, which sits unused in the organization and yet is not free.

> Veracity: one new dimension has been added to the existing ones. Veracity is the hardest thing to achieve with big data because, given the volume of information and the variety of its types, it is hard to separate the useful and accurate data from the dirty data. The biggest problem is that dirty data can very easily lead to an avalanche of errors and incorrect results, and can affect the Velocity attribute of Big Data. The main purpose of Big Data can be corrupted, and all the information can lead to a useless and very expensive Big Data environment if there is not a good cleaning team. The Veracity attribute is in itself also an objective for Big Data developers. If the data is inaccurate, redundant or unreliable, the whole company can have a big problem, especially companies that use big data to sell information, such as marketing companies or those that carry out market studies. Many social media responses to campaigns, for instance, could come from a small number of disgruntled former employees or from people employed by the competition to post negative comments.

2 MapReduce Model

MapReduce is a programming model used for generating and processing large data sets. The computation takes a set of input key/value pairs, produces a set of output key/value pairs, and has two levels of processing. …
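The two levels of processing are not detailed in this excerpt. As a minimal sketch, assuming they are the usual map and reduce levels, the model can be illustrated with a single-process word count in Python. The names `map_words`, `reduce_counts` and `run_mapreduce` and the sample documents are hypothetical and chosen only for illustration; this is not the Hadoop API and it does not distribute any work across a cluster.

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator


def map_words(doc_id: str, text: str) -> Iterator[tuple[str, int]]:
    """Map level: emit one intermediate (word, 1) pair per word in the input value."""
    for word in text.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: list[int]) -> tuple[str, int]:
    """Reduce level: merge all intermediate values that share the same key."""
    return word, sum(counts)


def run_mapreduce(
    records: Iterable[tuple[str, str]],
    mapper: Callable[[str, str], Iterable[tuple[str, int]]],
    reducer: Callable[[str, list[int]], tuple[str, int]],
) -> dict[str, int]:
    """Hypothetical in-memory driver: map every record, group by key, then reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            # Grouping by intermediate key stands in for the shuffle between the two levels.
            groups[out_key].append(out_value)
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


# Assumed sample input: (document id, document text) key/value pairs.
documents = [
    ("doc1", "big data needs big storage"),
    ("doc2", "hadoop stores big data"),
]
word_counts = run_mapreduce(documents, map_words, reduce_counts)
print(word_counts)
```

In this sketch the input pairs are (document id, text) and the output pairs are (word, count). In a real cluster the map and reduce calls would run on different machines, with the grouping step handled by the framework.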
