Running analytics over erroneous, poor-quality data generates maintenance and repair costs. Beyond these economic aspects, poor data quality can also affect customer satisfaction, reputation or even strategic decisions, which may result in monetary loss to the organization. It is therefore very important for organizations to be able to measure the quality of the data residing in their data lake. Data scientists usually spend 60% of their effort cleaning data so that they can build effective data science models on top of it.

Data quality is an ongoing process, not something you do just once. Jumbune helps at every stage, making it easy to profile data, identify problems and manage the entire data quality life cycle to maintain a high level of data quality. The Jumbune Data Analysis framework gives users much-needed visibility into the quality and profiles of the data present on the Hadoop Distributed File System (HDFS). Users can assess the quality of the data within a dataset over a period of time for consistency and logic, and can also profile the data to quickly categorize it based on its contents. Users do not need to write any code, and all functionality is carried out without any data movement. With Jumbune Data Quality, you can manage the entire data quality life cycle: cleansing, profiling, standardizing, matching and monitoring. It is a fast, easy way to improve your data over its lifespan.

Jumbune's Analyze Data module follows the data quality validation process. It offers a comprehensive set of features: Data Validation, Data Profiling, Data Quality Timeline and Data Cleansing.

Data Validation - The key dimensions to address in a data management strategy are accuracy, integrity, consistency, completeness, validity, timeliness, accessibility, cleanliness, relevancy and profiling. Data validation is a measure that evaluates the correctness of collected (or acquired) data sets, and the Jumbune Data Validation module provides deep insights into the quality of the data present in your data lakes. It supports various data formats, including plain text, JSON and XML documents, and analyzes terabytes of data in comparatively little time. It finds anomalies using generic categories of validation: Null, Regex and Data Type.

Data Quality Timelines - With the Jumbune Analyze Data module, you can increase business value by ensuring that all key initiatives and processes are fueled with relevant, timely and trustworthy data. Data validation tasks can be scheduled, and Data Quality Timelines are used to infer the health of the data over a period of time. Each task validates the data on HDFS against a custom set of rules defined according to business needs. The rules can take the form of null checks, data type checks and/or regular expressions that express the expected business format of the data.
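
One way to picture a Data Quality Timeline is as a series of per-run validity percentages, ordered by date. The sketch below assumes each scheduled run reports a total record count and an invalid record count; `RunResult`, `quality_timeline` and the 95% threshold are hypothetical illustrations, not Jumbune's internals.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class RunResult:
    """Outcome of one scheduled validation run (hypothetical structure)."""
    run_date: date
    total_records: int
    invalid_records: int


def quality_timeline(
    runs: list[RunResult], threshold: float = 95.0
) -> list[tuple[date, float, bool]]:
    """Return (run date, percent valid, below threshold?) per run, oldest first."""
    timeline = []
    for run in sorted(runs, key=lambda r: r.run_date):
        if run.total_records == 0:
            pct_valid = 100.0  # assumption: an empty run has no invalid data
        else:
            valid = run.total_records - run.invalid_records
            pct_valid = 100.0 * valid / run.total_records
        timeline.append((run.run_date, round(pct_valid, 2), pct_valid < threshold))
    return timeline


if __name__ == "__main__":
    runs = [
        RunResult(date(2024, 1, 2), 1000, 80),
        RunResult(date(2024, 1, 1), 1000, 10),
        RunResult(date(2024, 1, 3), 2000, 40),
    ]
    for run_date, pct, degraded in quality_timeline(runs):
        print(run_date, pct, "DEGRADED" if degraded else "ok")
    # 2024-01-01 99.0 ok
    # 2024-01-02 92.0 DEGRADED
    # 2024-01-03 98.0 ok
```

Using a percentage rather than a raw invalid count makes runs of different sizes comparable on the same timeline.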

Data Cleansing - Data cleansing, or data scrubbing, is an essential part of ensuring that data is complete, accurate and relevant. Data cleansing procedures work to remove incorrect or inaccurate data. This module deals with detecting and removing errors and inconsistencies from data in order to improve its quality. It provides a Dead Letter Channel facility, where inconsistent data is stored for future use. Jumbune also provides reports that show counts of clean and dirty file sets.
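
The Dead Letter Channel idea can be sketched as routing each record either to the clean output or to a dead-letter list for later inspection, then reporting the counts. This minimal sketch assumes an arbitrary validity predicate and keeps everything in memory; `cleanse` and `no_empty_fields` are hypothetical names, and the report counts records rather than file sets as a simplification. In a real pipeline the dead-letter records would be persisted, for example to a separate directory.

```python
from typing import Callable, Iterable


def cleanse(
    records: Iterable[str], is_valid: Callable[[str], bool]
) -> tuple[list[str], list[str], dict[str, int]]:
    """Route each record to the clean output or to a dead-letter list.

    Returns (clean, dead_letter, report), where report holds the counts.
    """
    clean: list[str] = []
    dead_letter: list[str] = []
    for record in records:
        # Invalid records are kept, not discarded, so they can be inspected later.
        (clean if is_valid(record) else dead_letter).append(record)
    report = {"clean": len(clean), "dirty": len(dead_letter)}
    return clean, dead_letter, report


def no_empty_fields(record: str) -> bool:
    """Example predicate: every comma-separated field must be non-blank."""
    return all(field.strip() for field in record.split(","))


if __name__ == "__main__":
    rows = ["1,alice", "2,", "3,carol"]
    clean, dead, report = cleanse(rows, no_empty_fields)
    print(clean)   # ['1,alice', '3,carol']
    print(dead)    # ['2,']
    print(report)  # {'clean': 2, 'dirty': 1}
```

Passing the predicate in as a function keeps the routing logic independent of the validation rules, so the same routine could reuse any record-level check.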

Data Profiling - Data profiling evaluates the structure of the data sets in the enterprise data lake according to a set of business rules. It computes various profiles that help users become familiar with the data and determine whether the existing data can be used for further analytics. Information obtained during data profiling, such as data type, length, discrete values, uniqueness, occurrence of null values and typical string patterns, can be used to discover problems such as illegal values, misspellings, missing values, varying value representations and duplicates. Using the Data Profiling tool, HDFS data can be profiled with or without a set of rules to obtain quick insights into the ingested data. The profiles can later be used to get a high-level view of the data values.
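
Several of the profile statistics listed above can be computed per column with a short sketch. It assumes column values arrive as strings with empty strings as nulls, and it derives "typical string patterns" by mapping letters to `A` and digits to `9`, a common profiling convention that is an assumption here rather than Jumbune's documented method. `profile_column`, `infer_type` and `string_pattern` are hypothetical names.

```python
import re
from collections import Counter


def infer_type(value: str) -> str:
    """Classify a string as 'integer', 'float' or 'string'."""
    for convert, name in ((int, "integer"), (float, "float")):
        try:
            convert(value)
            return name
        except ValueError:
            pass
    return "string"


def string_pattern(value: str) -> str:
    """Map letters to 'A' and digits to '9', e.g. 'AB-12' -> 'AA-99'."""
    return re.sub(r"[0-9]", "9", re.sub(r"[A-Za-z]", "A", value))


def profile_column(values: list[str]) -> dict:
    """Compute simple profile statistics for one column; '' counts as null."""
    non_null = [v for v in values if v != ""]
    distinct = set(non_null)
    lengths = [len(v) for v in non_null]
    return {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(distinct),
        # True when no non-null value occurs more than once
        "unique": len(distinct) == len(non_null),
        "min_length": min(lengths) if lengths else None,
        "max_length": max(lengths) if lengths else None,
        "types": dict(Counter(infer_type(v) for v in non_null)),
        "top_patterns": Counter(string_pattern(v) for v in non_null).most_common(3),
    }


if __name__ == "__main__":
    column = ["AB-12", "CD-34", "", "XY-9", "AB-12"]
    print(profile_column(column))
```

For the sample column, the profile reports 5 values with 1 null, 3 distinct values (so `unique` is False), lengths from 4 to 5, all non-null values typed as strings, and top patterns `AA-99` (3 occurrences) and `AA-9` (1). The odd `AA-9` pattern is the kind of varying value representation that profiling is meant to surface.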

About Us

Jumbune, a product from Impetus Technologies, helps optimize and analyze Big Data applications running on enterprise clusters. It is built on open source, is highly scalable, and provides deep insights into the performance of Hadoop applications and clusters.