Hive is an all-in-one project management tool developed to “help teams move faster” regardless of how they work. Features are created based on users’ requests and are updated weekly, making Hive the world’s first democratic software platform. It’s best known for its capabilities in project management, time management, team collaboration, automation, and an array of integrations with third-party software. Hive is free to use for solo users, with premium versions available to teams and enterprises.
Capabilities | |
---|---|
Segment | |
Deployment | Cloud / SaaS / Web-Based, Mobile Android, Mobile iPad, Mobile iPhone |
Support | 24/7 (Live rep), Chat, Email/Help Desk, FAQs/Forum, Knowledge Base, Phone Support |
Training | Documentation |
Languages | English |
It is very simple to use because it feels like using plain SQL to query data. When I started I had no experience with Hive, and within about a week I was able to query big data and do some analysis. Within a month I was able to administer data and create my own databases with useful data...
Hive does not have many built-in functions. The window functions are very useful, but they are not enough... Modifying data inside a table is not that simple...
Analyzing user experience and user behavior in the application or web client every day, every hour, or even every minute, etc...
Apache Hive is a tool built on top of Hadoop for analyzing large, unstructured data sets. Most BI and SQL developer tools can connect to Hive as easily as to any other database.
Unable to cancel a running query. Query tuning is difficult compared to an RDBMS.
We had a requirement to scan a large dataset for our prediction algorithm. Initially we used an RDBMS, but the performance was very slow and users were not happy with it. We replaced the RDBMS with Hive and saw a drastic improvement in performance.
HiveQL is much like SQL and really easy to learn.
Doesn't work well if you want low-latency queries.
Performance for 1 TB of data.
If you know SQL you will be able to pick up Hive really quickly. Lots of the same functionality, but not exactly SQL. It is easy to create tables and start writing queries, allowing you to dive deeper into your data.
As with all Hadoop tools, there are lots of knobs to tweak. It takes a good bit of time to optimize and finely tune your Hive install.
Putting structure on unstructured data. Once we chose Hive to accomplish this task, we were able to bring our data to our data scientists quickly. It also made the Big Data idea easier to accept.
- Easy-to-use interface - multiple clients (CLIs) - easy to debug issues with the help of fully descriptive logs - the product is constantly being improved to meet all DB developer requirements - can be accessed from multiple applications - access through Knox for additional security - no indexing - multiple file formats - the Tez architecture
- authentication gaps - issues when routing through ZooKeeper - not as mature a tool as regular database tools
- the BI team is helping all enterprise users ingest and access data from Hadoop - most of the users are well versed in standard SQL tools - to make Hadoop an enterprise-wide solution, we are training all users on Hive
Hive has a simple and intuitive interface and gets the job done.
So far Hive has met and exceeded all my expectations.
Working on a Hadoop system to determine recruiters that are spamming members too much.
Its performance using distributed computation.
Limited options for query performance optimization.
It is very good for OLAP-related tasks.
Leverage SQL skills to perform operations on data stored in Hadoop.
Runs on MapReduce, so data retrieval is a little slow.
Allowed business users to query data using their SQL skills.
The best thing about Hive is that anyone who is familiar with SQL can take advantage of its ability to run MapReduce jobs. Newer versions of Hive are getting better at supporting windowing functions and ironing out inconsistencies. So far the documentation has been good enough to get me through my tasks, and there is still ongoing support for this product, which is a pretty good sign to me.
Older versions of Hive suck. There are lots of limitations that force you to write HiveQL queries that are not straightforward and are potentially even inefficient. For example, the lack of window functions and the restriction to equality comparisons in joins can make your life very difficult, so you need to fall back on some wacky full joins or self joins to accomplish the same task.
We are using Hive as a data warehouse. One of the benefits of Hive is that it can break your SQL queries into a series of MapReduce jobs, so it's supposed to speed up your queries if given enough compute nodes.
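As a rough illustration of how a SQL query can be broken into map and reduce steps, the sketch below mimics a query such as `SELECT team, COUNT(*) FROM members GROUP BY team` in plain Python. It is a conceptual model only, not Hive's actual planner or execution engine, and the table, column names, and data are hypothetical.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical input, pre-split into two chunks as if stored on two nodes.
SPLITS = [
    [("alice", "search"), ("bob", "ads")],
    [("carol", "search"), ("dave", "search"), ("erin", "ads")],
]


def map_phase(rows):
    """Emit one (key, 1) pair per row; the key is the GROUP BY column."""
    return [(team, 1) for _name, team in rows]


def shuffle(pairs):
    """Group emitted values by key, as happens between the two phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Aggregate each key's values; here the aggregate is COUNT(*)."""
    return {key: sum(values) for key, values in grouped.items()}


def count_by_team(splits):
    # Each split could be mapped on a different node; here they run in turn.
    mapped = chain.from_iterable(map_phase(split) for split in splits)
    return reduce_phase(shuffle(mapped))


result = count_by_team(SPLITS)
print(result)
```

Because each split is mapped independently, adding nodes lets more splits be processed at once, which is where the expected speed-up comes from.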
Hive is the best out there for answering ad-hoc queries in a parallel paradigm. It works very well with the Hadoop ecosystem (mainly, it integrates perfectly with HDFS). - Easy to use, as it implements most SQL functions.
- Needs more optimization for complex queries (such as caching, auto-partitioning, etc.) to reduce query latency. - Tuning the Hive parameters is really challenging for users. The default settings don't work with large queries. - Hive is perfect if 90-95% of the queries are read-only. It is not suitable for update-heavy applications.
Get quick insights from big data when the customers' data doesn't fit on one machine. It helps a lot with data preparation (i.e. creating temporary tables) that can be consumed by other machine learning solutions like Spark to build machine learning models that add more business value.
For all its processing power, Pig requires programmers to learn and master something new on top of SQL. Hive statements are remarkably similar to SQL, and despite the limitations of Hive Query Language (HQL) in terms of the commands it understands, it is still very useful. Hive provides an excellent open source SQL layer on top of MapReduce. It works well for processing data stored in a distributed manner, unlike traditional SQL databases, which require strict adherence to schemas when storing data.
Despite the working differences, once you enter the Hive world from SQL, the similarity in language ensures a smooth transition, but it is important to note the differences in constructs and syntax, or else you’re in for frustrating times.
Data extraction, processing, and analysis. It's fast.
Stable product; easy to use; multiple computation engines (Tez, MR); almost all SQL capabilities.
Delete support is still missing, although it is nearly there.
Primary query engine for data analytics.
Provides quick results from a Hadoop database, with an easy-to-use interface and simple setup steps.
Some quirks with HiveQL may require referencing the documentation, but there is a lot of similarity with other SQL based languages.
Data analytics, making vast amounts of data available for general BI uses
The best part is being able to use a familiar syntax.
Doesn't support all MySQL use cases (understandably).
Ad-hoc queries on ETL'd production data.
Performing SQL-like queries, partitioning tables, denormalizing data, and compressing map/reduce output are the biggest benefits.
In some cases you cannot do complicated operations using Hive, e.g. when the output of one job acts as input to another job (SequenceFileFormat file) or when writing a query against an image file; Hive is not useful there.
Hive helps in resolving big data problems.
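To show why partitioning tables is counted as a benefit, here is a minimal Python sketch of partition pruning. It assumes the common layout in which each partition value gets its own `column=value` directory under the table; the dictionary below stands in for those directories, and the partition names and rows are hypothetical. It models the idea only and does not call Hive.

```python
# Hypothetical table layout: one entry per partition directory, named like
# <table>/dt=2024-01-01/. Partition names and rows are made up.
TABLE = {
    "dt=2024-01-01": [("u1", "click"), ("u2", "view")],
    "dt=2024-01-02": [("u1", "view")],
    "dt=2024-01-03": [("u3", "click"), ("u1", "click")],
}


def scan(table, dt=None):
    """Return (rows, partitions_read), reading only matching partitions."""
    rows, partitions_read = [], 0
    for partition, partition_rows in table.items():
        _column, value = partition.split("=", 1)
        if dt is not None and value != dt:
            continue  # pruned: this partition's data is never read
        partitions_read += 1
        rows.extend(partition_rows)
    return rows, partitions_read


rows, partitions_read = scan(TABLE, dt="2024-01-03")
print(rows, partitions_read)
```

A filter on the partition column touches one directory instead of three, while a query without that filter still has to read everything.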
Hadoop does not have a native query language, but Hive is a great addition to use on top of Hadoop. I could point a table at data stored in Hadoop and use normal queries as I usually do in SQL. We can do joins, aggregations, etc. It makes life pretty simple.
It can be very slow, as it runs on MapReduce jobs underneath. Data cannot be updated in place; we have to rewrite it instead.
People who are used to writing SQL queries will have a very good time using Hive on top of Hadoop for files stored in HDFS.
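The idea of pointing a table at files that already exist and then querying them can be pictured as applying a schema at read time. The Python sketch below is a simplified model under that assumption, not Hive itself; the file contents, schema, and function names are hypothetical.

```python
import csv
import io

# Hypothetical raw file contents, as they might already sit in HDFS.
RAW = "1,alice,34.5\n2,bob,12.0\n3,carol,7.25\n"

# Hypothetical schema, declared separately from the data: (column, type).
SCHEMA = [("id", int), ("name", str), ("amount", float)]


def read_table(raw_text, schema):
    """Apply the schema to each line at read time; the file is not changed."""
    reader = csv.reader(io.StringIO(raw_text))
    return [
        {name: cast(field) for (name, cast), field in zip(schema, fields)}
        for fields in reader
    ]


def total_amount(rows, min_amount):
    """Roughly SELECT SUM(amount) FROM t WHERE amount >= min_amount."""
    return sum(row["amount"] for row in rows if row["amount"] >= min_amount)


rows = read_table(RAW, SCHEMA)
print(total_amount(rows, 10.0))
```

The data file stays untouched: only the schema declaration decides how each field is interpreted when a query runs.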
If you are a data analyst and an expert in SQL, then use Hive. Hive is very easy to work with, especially if you are a SQL person. I use both Hive and Pig at work. I use Hive mainly for ad hoc queries and reports. For BI reports Hive is the best, since you can reuse all the SQL that you have written for traditional data warehouses. Also, with HiveServer2 you get real JDBC support, so you can plug your BI tools into it. Many more SQL features like cubes, rollups, windowing, lag, lead, etc. are being added to Hive through the Hortonworks Stinger initiative. Hive also produces very compact code, which is always good for reading and debugging.
I would suggest using Hive for large projects where you want to implement SQL-like data access, schemas, metadata, partitions, server-based deployment, JDBC, etc. Pig is a good language and can be very handy for immediate tasks or small projects. I would recommend Pig for small projects.
Hive Hadoop provides users with strong and powerful statistics functions. Hive Hadoop is like SQL, so for any SQL developer the learning curve for Hive will be almost negligible. Hive Hadoop can be integrated with HBase for querying the data in HBase, whereas this is not possible with Pig. In the case of Pig, a function named HBaseStorage() is used for loading the data from HBase. Hive Hadoop has gained popularity as it is supported by Hue. Hive Hadoop has various users such as CNET, Facebook, Digg, and so on.
Hive can tell us the detailed progress of a query, and can incorporate UDFs in different languages
The query speed is way too slow, and it does not support positional arguments in GROUP BY and ORDER BY.
We use Hive to run our nightly workflows on HDFS in batch for data aggregation and analysis.
Hive syntax is almost exactly like SQL, so for someone already familiar with SQL it takes almost no effort to pick up Hive. It can perform a wide variety of analyses over very large sets of data and requires very little tuning if you are willing to wait a while for the results.
Hive can be a bit slow in comparison to other languages like Pig. It also does not have as rich a scripting language. This is what makes it the second-choice language for most data analysis jobs at LinkedIn.
We are trying to mine data from massive data sets for a wide variety of purposes (debugging production issues, creating business metrics, models, and forecasts, among other things). We have been able to do this very easily using our data warehouse and a combo of Hive and Pig.