How to Create an External Table in Hive

Introduction

In Hive terminology, external tables are tables whose data is not managed by Hive. Their purpose is to let Hive query data that already exists in an external file, while only the table schema is registered in the metastore.

The external table data is stored outside the Hive warehouse, and the Hive metastore contains only the table metadata (the schema). Consequently, dropping an external table does not delete the underlying data.

In this tutorial, you will learn how to create, query, and drop an external table in Hive.

Prerequisites

  • Ubuntu 18.04 LTS or later
  • Access to command line with sudo privileges
  • Apache Hadoop installed and running
  • Apache Hive installed and running

Note: This tutorial uses Ubuntu 20.04. However, Hive works the same on all operating systems. This means the process of creating, querying, and dropping external tables can be applied to Hive on Windows, macOS, other Linux distributions, etc.


Creating an External Table in Hive – Syntax Explained

When creating an external table in Hive, you need to provide the following information:

  • Name of the table – The create external table command creates the table. If a table of the same name already exists in the system, this will cause an error. To avoid this, add if not exists to the statement. Table names are case insensitive.
  • Column names and types – Just like table names, column names are case insensitive. Column types include int, char, string, etc.
  • Row format – Rows use native or custom SerDe (Serializer/Deserializer) formats. Native SerDe will be used if the row format is not defined, or if it is specified as delimited.
  • Field termination character – This is the single character that separates the field values in a row.
  • Storage format – You can specify storage formats such as textfile, sequencefile, jsonfile, etc.
  • Location – This is the HDFS directory location of the file containing the table data.

The correct syntax for providing this information to Hive is:

create external table if not exists [external-table-name] (
[column1-name] [column1-type], [column2-name] [column2-type], …)
comment '[comment]'
row format [format-type]
fields terminated by '[termination-character]'
stored as [storage-type]
location '[location]';

Create a Hive External Table – Example

For the purpose of a practical example, this tutorial will show you how to import data from a CSV file into an external table.

Step 1: Prepare the Data File

1. Create a CSV file titled ‘countries.csv’:

sudo nano countries.csv

2. For each country in the list, write a row number, the country’s name, its capital city, and its population in millions:

1,USA,Washington,328
2,France,Paris,67
3,Spain,Madrid,47
4,Russia,Moscow,145
5,Indonesia,Jakarta,267
6,Nigeria,Abuja,196
Editing a CSV file in Nano.

3. Save the file and make a note of its location.
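If you prefer to create the file non-interactively, a heredoc produces the same countries.csv in the current directory (the rows match the list above):

```shell
# Create countries.csv without opening an editor
cat > countries.csv <<'EOF'
1,USA,Washington,328
2,France,Paris,67
3,Spain,Madrid,47
4,Russia,Moscow,145
5,Indonesia,Jakarta,267
6,Nigeria,Abuja,196
EOF

# Confirm the file contains six rows
wc -l countries.csv
```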

Step 2: Import the File to HDFS

1. Create an HDFS directory. You will use this directory as the HDFS location for the file you created.

hdfs dfs -mkdir [hdfs-directory-name]

2. Import the CSV file into HDFS:

hdfs dfs -put [original-file-location] [hdfs-directory-name]
Using hdfs dfs put command to copy the file in the HDFS directory

3. Use the -ls command to verify that the file is in the HDFS folder:

hdfs dfs -ls [hdfs-directory-name]
Using hdfs dfs ls command to list the contents of an HDFS directory

The output displays all the files currently in the directory.
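Putting the three commands together with concrete values might look like this (the directory name test_data and the path under /user/hive are assumptions for illustration; substitute your own):

```shell
# Create the target directory in HDFS (assumed name: /user/hive/test_data)
hdfs dfs -mkdir /user/hive/test_data

# Copy the local CSV file into the new HDFS directory
hdfs dfs -put countries.csv /user/hive/test_data

# List the directory to confirm the file arrived
hdfs dfs -ls /user/hive/test_data
```

These commands require a running Hadoop cluster, so run them on the node where you prepared countries.csv.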


Note: To learn more about HDFS, refer to What Is HDFS? Hadoop Distributed File System Guide.


Step 3: Create an External Table

1. After you import the data file to HDFS, start the Hive shell and use the syntax explained above to create an external table.

Creating an external table in Hive
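For the countries.csv file from Step 1, a working statement might look like the following (the table name countries and the column names are illustrative choices, and the HDFS path is an assumption — use the directory you created in Step 2):

```sql
create external table if not exists countries (
    RowNum int, CountryName string, CapitalCity string, Population int)
comment 'Countries, their capitals, and populations in millions'
row format delimited
fields terminated by ','
stored as textfile
location '/user/hive/test_data';
```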

2. To verify that the external table creation was successful, type:

select * from [external-table-name];

The output should list the data from the CSV file you imported into the table:

Using the select command to see the contents of an external table

3. If you wish to create a managed table using the data from an external table, type:

create table if not exists [managed-table-name](
[column1-name] [column1-type], [column2-name] [column2-type], …)
comment '[comment]';
Creating a managed table in Hive
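Assuming an external table named countries with the columns from the CSV example (all names here are illustrative), a concrete version of this statement might be:

```sql
-- countries_managed is an assumed name for the new managed table
create table if not exists countries_managed (
    RowNum int, CountryName string, CapitalCity string, Population int)
comment 'Managed copy of the countries data';
```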

4. Next, import the data from the external table:

insert overwrite table [managed-table-name] select * from [external-table-name];
Using the insert command to import data from an external table to a managed table

5. Verify that the data is successfully inserted into the managed table:

select * from [managed-table-name];
Using the select command to see if the content of the external table successfully transferred to the managed table

How to Query a Hive External Table

To display all the data stored in a table, use the select * from command followed by the table name. Hive offers an extensive list of query commands that let you narrow down your searches and sort the results.

For example, you can add a where clause after select * from to specify a condition:

select * from [table_name] where [condition];

Hive will output only the rows which satisfy the condition given in the query:

Using conditions to narrow down the output of the select command in Hive

Instead of the asterisk, which stands for "all columns", you can use more specific determiners. Replacing the asterisk with a column name (such as CountryName, from the example above) will show you only the data from the chosen column.
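For instance, on an example table named countries with CountryName and Population columns (assumed names matching the CSV data above), a conditional query on a single column might look like:

```sql
-- Show only the names of countries with more than 100 million people
select CountryName from countries where Population > 100;
```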

Here are some other useful query functions and their syntax:

  • Query a table according to multiple conditions – select * from [table_name] where [condition1] and [condition2];
  • Order table data – select [column1_name], [column2_name] from [table_name] order by [column_name];
  • Order table data in descending order – select [column1_name], [column2_name] from [table_name] order by [column_name] desc;
  • Show the row count – select count(*) from [table_name];
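Applied to an example table named countries with the columns from the CSV data above (names are assumptions), these functions look like:

```sql
-- Combine two conditions
select * from countries where Population > 50 and Population < 200;

-- Sort countries by population, largest first
select CountryName, Population from countries order by Population desc;

-- Count the rows in the table
select count(*) from countries;
```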

How to Drop a Hive External Table

1. Dropping an external table in Hive is performed using the same drop command used for managed tables:

drop table [table_name];

The output will confirm the success of the operation:

Dropping an external table in Hive

2. Querying the dropped table will return an error:

Querying a dropped external table returns an error.

However, the data from the external table remains in the system and can be retrieved by creating another external table in the same location.
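For example, a new external table pointing at the same HDFS directory (the name countries_restored, the columns, and the path are illustrative assumptions) makes the data queryable again:

```sql
-- The data file is still in HDFS, so a new external table
-- at the same location restores access to it
create external table if not exists countries_restored (
    RowNum int, CountryName string, CapitalCity string, Population int)
row format delimited
fields terminated by ','
stored as textfile
location '/user/hive/test_data';
```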

Conclusion

After reading this tutorial, you should have a general understanding of the purpose of external tables in Hive, as well as the syntax for creating, querying, and dropping them.

Marko Aleksic
Marko Aleksić is a Technical Writer at phoenixNAP. His innate curiosity regarding all things IT, combined with over a decade long background in writing, teaching and working in IT-related fields, led him to technical writing, where he has an opportunity to employ his skills and make technology less daunting to everyone.
© 2021 Copyright phoenixNAP | Global IT Services. All Rights Reserved.