Release 1.2.0 (17 January 2018)
New features available
- Added support for Windows Azure HDInsight Service.
- Added support for secured (kerberized) clusters.
- Added support for the Tez execution engine in Resource Utilization Metering.
- Added support for auto-discovery of Amazon EC2 spot instances.
- Added the ability to auto-discover the worker nodes present in the cluster.
- Added Relative and Absolute Queue utilization summary with respect to defined and total cluster capacity respectively.
- Added support for Resource Manager HA.
- Added support for HTTPS transport on Jetty.
- Added a detailed chargeback report option showing a detailed analysis of cluster usage for a particular user.
- Added an alert for monitoring the maximum number of items an HDFS path may contain.
- Added the capability to capture actual user details in resource utilization metering when Hive impersonation is disabled.
- Added filtering options in metered queue, long running and high resource usage applications.
- Added queue information in long running and high resource usage apps.
- Added functionality to save widget and time settings, and added a reset button to restore default settings.
Deployment
- Created a sample cluster JSON during installation of Jumbune.
Agent
- Added the capability to configure the agent log directory location.
Bug Fixes & Performance Improvements
- In the Alert section, under-replicated block numbers were displayed in decimal format.
- Improved setting options of widgets.
- Optimized Failed Applications alert.
- Implemented lazy loading in capacity utilization and optimized Jobs Capacity Utilization time.
- Made some visualization changes in instantaneous queue utilization and data center.
- Job profiling graph display issue when no reducer is run.
- Optimized long running application, InfluxDB HTTP client, and Resource Manager API calls, and addressed performance issues.
- Stopped requesting user queue utilization when a request is already pending.
Optimize Job
- Not considering reduce phase optimization when running a map-only job.
- IllegalArgumentException is observed when capability per node is chosen as "Defined in FS".
- Out of memory is observed during Reduce phase.
- Unbounded cluster occurring due to low vcores available.
Data Quality
- JSON Data Validation not working; a null pointer exception is observed.
- In Data Validation, when checking number-of-fields violations, the last field is ignored if it is blank or null.
- Multiple job execution issue when switching from recent tabs while the first job is executing.
- Incorrect and inconsistent data validation results are shown if the data contains both null and non-null values.
Release 1.1.0 (12 May 2017)
New features available
- Resource Utilization Metering - This feature helps organizations using multi-tenant clusters to charge their customers on the basis of the resources they consume. Consumption is derived from the resources used, i.e. vcores and memory, with respect to the different execution engines.
- Relative pool utilization by users - This enables organizations to analyze the usage of different resources in queues by submitting users. It gives visibility into which customers are making the most use of queues and resources.
- Queue Pool Utilization summary - This feature helps organizations analyze their customers' queue utilization month by month. It helps in re-defining the maturity level of queues, which in turn leads to appropriate resource utilization in the Data Lake.
- Data Cleansing - The Data Cleansing module helps to classify data on the basis of user-defined criteria. It detects irrelevant data in a data set based on data violation constraints and persists both data files on HDFS. This gives assurance that the clean data, when ingested, will let the application run correctly on the first attempt.
- Spark Recommendations - Optimal configurations such as executor memory, executor cores, and executor instances are recommended for Spark jobs.
- Hybrid Cluster support - Jumbune is enhanced to support hybrid clusters, i.e. clusters comprising some on-premise nodes and some compute nodes in the cloud. This helps organizations analyze/optimize their jobs on a newly built hybrid cluster.
- Offline statistics capture - This feature enables Jumbune to capture offline statistics of cluster resources/queues. It lets organizations monitor all the offline statistics on Analyze Cluster at any point of time.
- Export - This feature is introduced in the Analyze Cluster module to let users persist all the queue/daemon statistics. It can be used as an analysis tool to run/schedule new jobs effectively in the future.
Bug Fixes & Performance Improvements
- Analyze Cluster -
  - Reflect configuration change immediately for component “High Resource Usage Applications” on UI.
  - Showing Capacity scheduler statistics instead of Fair scheduler, though Fair scheduler is configured.
  - Recommendation not getting generated for all nodes, though recommended configurations are done only for one node of cluster.
  - "Queue Utilization" statistics are not getting captured for Fair Scheduler configuration when “Background Process” is turned ON.
- Optimize Job -
  - Optimize Job, when run with the manual option, shows the graph with only one iteration.
  - Text box to give recommendation shows time in both seconds and minutes as an example; it should show only minutes.
  - Legends should be shown parallel to the graph on the result page.
  - Optimize Job fails when run with the "Defined in FS" option.
  - Optimize Job fails when Capacity scheduler is configured with child queues.
  - Job is not getting tripped in the given maximum time.
  - Jobs are getting logged with the Agent user and not with the user who submitted the job in real time.
- Data Cleansing -
  - On the Result page, ‘ROOT’ is coming as one straight line.
  - Data Quality Timeline: When a job is submitted to execute every hour, no entry is getting logged in the CRON file.
  - Data Validation: Module names shown on the Resource Manager UI for executed jobs should be specific to the running module, i.e. Data Cleansing for the “Data Cleansing” module, Data Profiling for the “Data Profiling” module.
  - Data Validation: The "parameter" text box should be optional, and the tool tip should be updated to help users provide correct values.
- Manage Cluster - “Alert” tab updated to show enable/disable in place of “true/false”.
- Dashboard - Corrected the messages for “License To” and “License From” in the “License” section.
- Deployment - InfluxDB port is not getting configured with the port given at deployment time.