The Hadoop ETL UDFs are the main way to load data from Hadoop (HCatalog tables on HDFS) into EXASOL. In order to deploy the ETL UDFs, you need to set up the connectivity between EXASOL and Hadoop. This SOL describes the network requirements for doing so.
All EXASOL nodes need access to either the Hive Metastore (recommended) or to WebHCatalog:
The Hive Metastore typically runs on port 9083 of the Hive Metastore server (hive.metastore.uris property in Hive). It uses a native Thrift API, which is faster than WebHCatalog.
The WebHCatalog server (formerly called Templeton) typically runs on port 50111 on a specific server (templeton.port property).
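To verify that the EXASOL nodes can actually reach the Metastore or WebHCatalog port, a plain TCP connection test is usually enough. The following sketch assumes placeholder host names (metastore.example.com, webhcat.example.com); substitute your own servers.

```python
import socket

def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Placeholder host names -- replace with your actual Hadoop servers.
checks = {
    "Hive Metastore": ("metastore.example.com", 9083),
    "WebHCatalog": ("webhcat.example.com", 50111),
}

for name, (host, port) in checks.items():
    print(f"{name} {host}:{port} reachable: {port_reachable(host, port)}")
```

Run this from every EXASOL node (not just one), since all nodes need connectivity.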
All EXASOL nodes need access to the namenode and all datanodes, either via the native HDFS interface (recommended) or via the HTTP REST API (WebHDFS or HttpFS):
HDFS (recommended): The namenode service typically runs on port 8020 (fs.defaultFS property), the datanode service on port 50010, or 1004 in Kerberos environments (dfs.datanode.address property).
WebHDFS: The namenode service for WebHDFS typically runs on port 50070 on each namenode (dfs.namenode.http-address property), and on port 50075 on each datanode (dfs.datanode.http.address property). If you use HTTPS, the ports are 50470 for the namenode (dfs.namenode.https-address) and 50475 for the datanode (dfs.datanode.https-address).
HttpFS: As an alternative to WebHDFS you can use HttpFS, which exposes the same REST API as WebHDFS. It typically runs on port 14000 of each namenode. The disadvantage compared to WebHDFS is that all data is streamed through a single service, whereas WebHDFS redirects clients to the datanodes for the data transfer.
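Since WebHDFS and HttpFS share the same REST API under the /webhdfs/v1 path, only the host and port differ between the two. The helper below builds such a URL; the host names and the HDFS path are placeholders, and the ports shown are the typical defaults mentioned above.

```python
def webhdfs_url(host: str, port: int, hdfs_path: str, op: str,
                use_https: bool = False) -> str:
    """Build a WebHDFS/HttpFS REST URL for the given operation.

    WebHDFS and HttpFS expose the same REST API under /webhdfs/v1;
    only the server and port differ.
    """
    scheme = "https" if use_https else "http"
    return f"{scheme}://{host}:{port}/webhdfs/v1{hdfs_path}?op={op}"

# Placeholder hosts -- replace with your namenode / HttpFS server.
print(webhdfs_url("namenode.example.com", 50070, "/user/hive", "LISTSTATUS"))
# For HttpFS only the server and port (default 14000) change:
print(webhdfs_url("httpfs.example.com", 14000, "/user/hive", "LISTSTATUS"))
```

Requesting such a URL (e.g. with curl) from an EXASOL node is a quick way to confirm that the REST interface is reachable.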
Kerberos: If your Hadoop cluster uses Kerberos authentication, the UDFs authenticate using a keytab file. Each EXASOL node needs access to the Kerberos KDC (key distribution center), which typically runs on port 88. The KDC is configured in the Kerberos configuration file (krb5.conf), which is used for the authentication.
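As a reference, a minimal krb5.conf might look like the following sketch; the realm and host names are placeholders, and the :88 suffix makes the KDC port explicit.

```ini
# Minimal example krb5.conf -- realm and host names are placeholders.
[libdefaults]
    default_realm = EXAMPLE.COM

[realms]
    EXAMPLE.COM = {
        kdc = kdc.example.com:88
        admin_server = kdc.example.com
    }
```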