Command To Generate SSH Key In Hadoop

  

On each node, generate an SSH key pair. For example:

ssh-keygen -t rsa -b 4096 -C someemail@example.com

Then copy the public key to all the nodes:

ssh-copy-id hadoop@master
ssh-copy-id hadoop@slave-01
ssh-copy-id hadoop@slave-02

and so on for the remaining nodes. Repeat this on each node, so that every node holds the keys of all the others.

On the SSH server side, open the sshd configuration file with sudo gedit /etc/ssh/sshd_config and set PubkeyAuthentication to yes. Once the changes are made, reload ssh with sudo /etc/init.d/ssh reload.

Next, generate an SSH key for the hduser user. Switch to that account with su hduser and enter the password of hduser. Hadoop uses SSH to access its nodes, which would normally require the user to ...
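The replication step described above can be sketched as a small loop. This is a minimal sketch, assuming the hadoop user and the node names master, slave-01, and slave-02 from the example; the RUN variable and the copy_key_to_nodes function are hypothetical helpers that print each command by default, so the loop can be checked before it touches any host.

```shell
# Dry run by default: commands are printed, not executed.
# Set RUN= (empty) to actually run ssh-copy-id against the nodes.
RUN="${RUN-echo}"

# Node names taken from the example above; replace them with your own hosts.
NODES="master slave-01 slave-02"

# Copy the current user's public key to the hadoop account on every node.
copy_key_to_nodes() {
    for node in $NODES; do
        $RUN ssh-copy-id "hadoop@${node}"
    done
}

copy_key_to_nodes
```

Running the same script on every node gives each node the keys of all the others, which is what passwordless SSH between Hadoop nodes relies on.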

This command works on Linux, macOS, and Windows 10. Unless you have a reason to change it, leave the default location of ~/.ssh/id_rsa. If the command says the key already exists, you can either overwrite it or continue to the next step with your existing key.

Hadoop does not rely on interactive SSH logins. Instead, passwordless SSH is set up at the start, because Hadoop needs to log in to its nodes many times during its operations.

Learn how to use Secure Shell (SSH) to securely connect to Apache Hadoop on Azure HDInsight. For information on connecting through a virtual network, see Azure HDInsight virtual network architecture and Plan a virtual network deployment for Azure HDInsight clusters.

The following table contains the address and port information needed when connecting to HDInsight using an SSH client:

| Address | Port | Connects to |
| --- | --- | --- |
| <clustername>-ssh.azurehdinsight.net | 22 | Primary headnode |
| <clustername>-ssh.azurehdinsight.net | 23 | Secondary headnode |
| <clustername>-ed-ssh.azurehdinsight.net | 22 | Edge node (ML Services on HDInsight) |
| <edgenodename>.<clustername>-ssh.azurehdinsight.net | 22 | Edge node (any other cluster type, if an edge node exists) |

Replace <clustername> with the name of your cluster. Replace <edgenodename> with the name of the edge node.

If your cluster contains an edge node, we recommend that you always connect to the edge node using SSH. The head nodes host services that are critical to the health of Hadoop. The edge node runs only what you put on it. For more information on using edge nodes, see Use edge nodes in HDInsight.

Tip

When you first connect to HDInsight, your SSH client may display a warning that the authenticity of the host can't be established. When prompted, select 'yes' to add the host to your SSH client's trusted server list.

If you have previously connected to a server with the same name, you may receive a warning that the stored host key does not match the host key of the server. Consult the documentation for your SSH client on how to remove the existing entry for the server name.

SSH clients

Linux, Unix, and macOS systems provide the ssh and scp commands. The ssh client is commonly used to create a remote command-line session with a Linux or Unix-based system. The scp client is used to securely copy files between your client and the remote system.

Microsoft Windows doesn't install any SSH clients by default. The ssh and scp clients are available for Windows through the following packages:

  • OpenSSH Client. This client is an optional feature introduced in the Windows 10 Fall Creators Update.

  • Bash on Ubuntu on Windows 10.

  • Azure Cloud Shell. The Cloud Shell provides a Bash environment in your browser.

  • Git.

There are also several graphical SSH clients, such as PuTTY and MobaXterm. While these clients can be used to connect to HDInsight, the process of connecting is different than using the ssh utility. For more information, see the documentation of the graphical client you're using.

Authentication: SSH Keys

SSH keys use public-key cryptography to authenticate SSH sessions. SSH keys are more secure than passwords, and provide an easy way to secure access to your Hadoop cluster.

If your SSH account is secured using a key, the client must provide the matching private key when you connect:

  • Most clients can be configured to use a default key. For example, the ssh client looks for a private key at ~/.ssh/id_rsa on Linux and Unix environments.

  • You can specify the path to a private key. With the ssh client, the -i parameter is used to specify the path to the private key. For example, ssh -i ~/.ssh/id_rsa sshuser@myedge.mycluster-ssh.azurehdinsight.net.

  • If you have multiple private keys for use with different servers, consider using a utility such as ssh-agent (https://en.wikipedia.org/wiki/Ssh-agent). The ssh-agent utility can be used to automatically select the key to use when establishing an SSH session.

Important

If you secure your private key with a passphrase, you must enter the passphrase when using the key. Utilities such as ssh-agent can cache the passphrase for your convenience.

Create an SSH key pair

Use the ssh-keygen command to create public and private key files. The following command generates a 2048-bit RSA key pair that can be used with HDInsight:

ssh-keygen -t rsa -b 2048

You're prompted for information during the key creation process. For example, where the keys are stored or whether to use a passphrase. After the process completes, two files are created: a public key and a private key.

  • The public key is used to create an HDInsight cluster. The public key has an extension of .pub.

  • The private key is used to authenticate your client to the HDInsight cluster.
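For scripted setups, the same key generation can run without prompts. This is a minimal sketch, assuming OpenSSH's ssh-keygen is installed; the temporary KEY_DIR and the empty passphrase (-N "") are illustrative assumptions chosen so the example runs unattended. A real key would normally live in ~/.ssh and be protected by a passphrase.

```shell
# Illustrative location: a throwaway directory instead of ~/.ssh.
KEY_DIR="$(mktemp -d)"
KEY_FILE="$KEY_DIR/id_rsa"

# -t rsa -b 2048 : 2048-bit RSA key pair
# -f             : path of the private key; the public key gets a .pub suffix
# -N ""          : empty passphrase (assumption for unattended runs only)
# -q             : suppress the progress output
ssh-keygen -t rsa -b 2048 -f "$KEY_FILE" -N "" -q

# The private key and the matching public key now sit side by side.
ls "$KEY_DIR"
```

The file ending in .pub is the one whose contents are handed to HDInsight when the cluster is created; the other file stays on the client.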

Important

You can secure your keys using a passphrase. A passphrase is effectively a password on your private key. Even if someone obtains your private key, they must have the passphrase to use the key.

Create HDInsight using the public key

| Creation method | How to use the public key |
| --- | --- |
| Azure portal | Uncheck Use cluster login password for SSH, and then select Public Key as the SSH authentication type. Finally, select the public key file or paste the text contents of the file in the SSH public key field. |
| Azure PowerShell | Use the -SshPublicKey parameter of the New-AzHdinsightCluster cmdlet and pass the contents of the public key as a string. |
| Azure CLI | Use the --ssh-public-key parameter of the az hdinsight create command and pass the contents of the public key as a string. |
| Resource Manager Template | For an example of using SSH keys with a template, see Deploy HDInsight on Linux with SSH key. The publicKeys element in the azuredeploy.json file is used to pass the keys to Azure when creating the cluster. |

Authentication: Password

SSH accounts can be secured using a password. When you connect to HDInsight using SSH, you're prompted to enter the password.

Warning

Microsoft does not recommend using password authentication for SSH. Passwords can be guessed and are vulnerable to brute force attacks. Instead, we recommend that you use SSH keys for authentication.

Important

The SSH account password expires 70 days after the HDInsight cluster is created. If your password expires, you can change it using the information in the Manage HDInsight document.

Create HDInsight using a password

| Creation method | How to specify the password |
| --- | --- |
| Azure portal | By default, the SSH user account has the same password as the cluster login account. To use a different password, uncheck Use cluster login password for SSH, and then enter the password in the SSH password field. |
| Azure PowerShell | Use the -SshCredential parameter of the New-AzHdinsightCluster cmdlet and pass a PSCredential object that contains the SSH user account name and password. |
| Azure CLI | Use the --ssh-password parameter of the az hdinsight create command and provide the password value. |
| Resource Manager Template | For an example of using a password with a template, see Deploy HDInsight on Linux with SSH password. The linuxOperatingSystemProfile element in the azuredeploy.json file is used to pass the SSH account name and password to Azure when creating the cluster. |

Change the SSH password

For information on changing the SSH user account password, see the Change passwords section of the Manage HDInsight document.

Authentication: Domain-joined HDInsight

If you're using a domain-joined HDInsight cluster, you must use the kinit command after connecting as the local SSH user. This command prompts you for a domain user and password, and authenticates your session with the Azure Active Directory domain associated with the cluster.

You can also enable Kerberos authentication on each domain-joined node (for example, head node, edge node) so that you can use ssh with the domain account. To do this, edit the sshd config file:

Uncomment KerberosAuthentication and change its value to yes.

Use the klist command to verify whether the Kerberos authentication was successful.
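The sshd edit described above can be sketched as follows. This is a minimal sketch that works on a stand-in file rather than the real sshd configuration; the SSHD_CONFIG variable and the sample line are assumptions. On a real node, the same sed expression would be applied to the actual sshd config file with root privileges, followed by a reload of the SSH service.

```shell
# Stand-in for the sshd config file so the edit can be tried safely.
SSHD_CONFIG="$(mktemp)"
printf '%s\n' '#KerberosAuthentication no' > "$SSHD_CONFIG"

# Uncomment the KerberosAuthentication line and set its value to yes.
# -i.bak keeps a backup copy and is accepted by both GNU and BSD sed.
sed -i.bak -E 's/^#?[[:space:]]*KerberosAuthentication[[:space:]].*$/KerberosAuthentication yes/' "$SSHD_CONFIG"

# Show the resulting setting.
grep '^KerberosAuthentication' "$SSHD_CONFIG"
```

After a successful kinit, klist lists the cached Kerberos tickets, which is a quick way to confirm that the session is authenticated.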

For more information, see Configure domain-joined HDInsight.

Connect to nodes

The head nodes and edge node (if there's one) can be accessed over the internet on ports 22 and 23.

  • When connecting to the head nodes, use port 22 to connect to the primary head node and port 23 to connect to the secondary head node. The fully qualified domain name to use is clustername-ssh.azurehdinsight.net, where clustername is the name of your cluster.

  • When connecting to the edge node, use port 22. The fully qualified domain name is edgenodename.clustername-ssh.azurehdinsight.net, where edgenodename is a name you provided when creating the edge node. clustername is the name of the cluster.
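The address and port rules above can be sketched as a few commands. This is a minimal sketch, assuming a cluster named mycluster, an edge node named myedge, and an SSH user named sshuser (all illustrative); RUN is a hypothetical helper variable that makes the script print each command instead of opening a connection.

```shell
RUN="${RUN-echo}"      # dry run by default; set RUN= (empty) to really connect
CLUSTER="mycluster"    # illustrative cluster name
EDGE="myedge"          # illustrative edge node name
SSH_USER="sshuser"     # illustrative SSH user name

# Primary head node: port 22.
connect_primary()   { $RUN ssh -p 22 "${SSH_USER}@${CLUSTER}-ssh.azurehdinsight.net"; }
# Secondary head node: same address, port 23.
connect_secondary() { $RUN ssh -p 23 "${SSH_USER}@${CLUSTER}-ssh.azurehdinsight.net"; }
# Edge node: its own host name, port 22.
connect_edge()      { $RUN ssh -p 22 "${SSH_USER}@${EDGE}.${CLUSTER}-ssh.azurehdinsight.net"; }

connect_primary
connect_secondary
connect_edge
```

Because 22 is the default SSH port, the -p 22 option can be dropped; it is spelled out here only to contrast with port 23.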

Important

The previous examples assume that you are using password authentication, or that key authentication is occurring automatically. If you use an SSH key pair for authentication, and the key is not used automatically, use the -i parameter to specify the private key. For example, ssh -i ~/.ssh/mykey sshuser@clustername-ssh.azurehdinsight.net.

Once connected, the prompt changes to indicate the SSH user name and the node you're connected to. For example, when connected to the primary head node as sshuser, the prompt is sshuser@<active-headnode-name>:~$.

Connect to worker and Apache Zookeeper nodes

The worker nodes and Zookeeper nodes aren't directly accessible from the internet. They can be accessed from the cluster head nodes or edge nodes. The following are the general steps to connect to other nodes:

  1. Use SSH to connect to a head or edge node:

  2. From the SSH connection to the head or edge node, use the ssh command to connect to a worker node in the cluster:

    To retrieve a list of the node names, see the Manage HDInsight by using the Apache Ambari REST API document.
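The two steps above can be combined into one command. This is a minimal sketch under assumptions: the edge node address, the sshuser account, and the worker node name wn0-mycluster are illustrative, and RUN is a hypothetical helper that prints the command rather than running it.

```shell
RUN="${RUN-echo}"    # dry run by default; set RUN= (empty) to really connect
GATEWAY="sshuser@myedge.mycluster-ssh.azurehdinsight.net"   # illustrative edge node
WORKER="sshuser@wn0-mycluster"                              # hypothetical worker node name

# -A forwards the local ssh-agent for this connection only.
# -t allocates a terminal so the second ssh can run interactively on the gateway.
hop_to_worker() { $RUN ssh -A -t "$GATEWAY" ssh "$WORKER"; }

hop_to_worker
```

The first ssh logs in to the head or edge node, and the trailing ssh command is executed there to reach the worker node.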

If the SSH account is secured using a password, enter the password when connecting.

If the SSH account is secured using SSH keys, make sure that SSH forwarding is enabled on the client.

Note

Another way to directly access all nodes in the cluster is to install HDInsight into an Azure Virtual Network. Then, you can join your remote machine to the same virtual network and directly access all nodes in the cluster.

For more information, see Plan a virtual network for HDInsight.

Configure SSH agent forwarding

Important

The following steps assume a Linux or UNIX-based system, and work with Bash on Windows 10. If these steps do not work for your system, you may need to consult the documentation for your SSH client.

  1. Using a text editor, open ~/.ssh/config. If this file doesn't exist, you can create it by entering touch ~/.ssh/config at a command line.

  2. Add the following text to the config file.

    Replace the Host information with the address of the node you connect to using SSH. The previous example uses the edge node. This entry configures SSH agent forwarding for the specified node.

  3. Test SSH agent forwarding by using the following command from the terminal:

    This command returns information similar to the following text:

    If nothing is returned, then ssh-agent isn't running. For more information, see the agent startup scripts information at Using ssh-agent with ssh (http://mah.everybody.org/docs/ssh) or consult your SSH client documentation.

  4. Once you've verified that ssh-agent is running, use the following to add your SSH private key to the agent:

    If your private key is stored in a different file, replace ~/.ssh/id_rsa with the path to the file.

  5. Connect to the cluster edge node or head nodes using SSH. Then use the SSH command to connect to a worker or zookeeper node. The connection is established using the forwarded key.
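Steps 2 through 4 can be sketched as follows. This is a minimal sketch, not the original listing: the host name is illustrative, SSH_CONFIG is a hypothetical variable that points at a stand-in file so the real ~/.ssh/config is left untouched, and agent_status is a hypothetical helper.

```shell
# Stand-in for ~/.ssh/config; set SSH_CONFIG="$HOME/.ssh/config" to edit the real file.
SSH_CONFIG="${SSH_CONFIG:-$(mktemp)}"

# Step 2: enable agent forwarding for one host (illustrative edge node address).
cat >> "$SSH_CONFIG" <<'EOF'
Host myedge.mycluster-ssh.azurehdinsight.net
  ForwardAgent yes
EOF

# Step 3: a running ssh-agent exports the path of its socket in SSH_AUTH_SOCK.
agent_status() {
    if [ -n "${SSH_AUTH_SOCK:-}" ]; then
        echo "ssh-agent socket: $SSH_AUTH_SOCK"
    else
        echo "ssh-agent is not running"
    fi
}
agent_status

# Step 4: with a running agent, add the private key to it (left commented out
# because it needs an agent and an existing key file):
# ssh-add ~/.ssh/id_rsa
```

Limiting ForwardAgent to a specific Host entry keeps the agent from being forwarded to every server you connect to.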

Copy files

The scp utility can be used to copy files to and from individual nodes in the cluster. For example, the following command copies the test.txt file from the local system to the primary head node:

Since no path is specified after the :, the file is placed in the sshuser home directory.

The following example copies the test.txt file from the sshuser home directory on the primary head node to the local system:
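Both copy directions can be sketched as follows. This is a minimal sketch, assuming the sshuser account and the clustername placeholder used elsewhere in this article; RUN and the two function names are hypothetical helpers, and by default the commands are printed rather than executed.

```shell
RUN="${RUN-echo}"    # dry run by default; set RUN= (empty) to really copy
TARGET="sshuser@clustername-ssh.azurehdinsight.net"   # primary head node, placeholder cluster name

# Local file -> home directory of sshuser (nothing follows the colon).
copy_to_headnode()   { $RUN scp "$1" "${TARGET}:"; }
# File in the sshuser home directory -> current local directory.
copy_from_headnode() { $RUN scp "${TARGET}:$1" .; }

copy_to_headnode test.txt
copy_from_headnode test.txt
```

A path after the colon, such as ${TARGET}:scripts/, would place the file in that directory relative to the remote home directory instead.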

Important

scp can only access the file system of individual nodes within the cluster. It cannot be used to access data in the HDFS-compatible storage for the cluster.

Use scp when you need to upload a resource for use from an SSH session. For example, upload a Python script and then run the script from an SSH session.

For information on directly loading data into the HDFS-compatible storage, see the following documents:

  • HDInsight using Azure Storage.

  • HDInsight using Azure Data Lake Storage.

#Use SSH with Linux-based Hadoop on HDInsight from Linux, Unix, or OS X

Secure Shell (SSH) allows you to remotely perform operations on your Linux-based HDInsight clusters using a command-line interface. This document provides information on using SSH with HDInsight from Linux, Unix, or OS X clients.

[AZURE.NOTE] The steps in this article assume you are using a Linux, Unix, or OS X client. These steps may be performed on a Windows-based client if you have installed a package that provides ssh and ssh-keygen, such as Bash on Ubuntu on Windows.

If you do not have SSH installed on your Windows-based client, use the steps in Use SSH with Linux-based HDInsight (Hadoop) from Windows for information on installing and using PuTTY.

##Prerequisites

  • ssh-keygen and ssh for Linux, Unix, and OS X clients. These utilities are usually provided with your operating system, or are available through the package management system.

  • A modern web browser that supports HTML5.

OR

  • Azure CLI.

##What is SSH?

SSH is a utility for logging in to a remote server and executing commands on it. With Linux-based HDInsight, SSH establishes an encrypted connection to the cluster headnode and provides a command line that you use to type in commands. Commands are then executed directly on the server.

###SSH user name

An SSH user name is the name you use to authenticate to the HDInsight cluster. When you specify an SSH user name during cluster creation, this user is created on all nodes in the cluster. Once the cluster is created, you can use this user name to connect to the HDInsight cluster headnodes. From the headnodes, you can then connect to the individual worker nodes.

###SSH password or Public key

An SSH user can use either a password or public key for authentication. A password is just a string of text you make up, while a public key is part of a cryptographic key pair generated to uniquely identify you.

A key is more secure than a password; however, it requires additional steps to generate the key, and you must keep the files containing the key in a secure location. If anyone gains access to the key files, they gain access to your account. If you lose the key files, you will not be able to log in to your account.

A key pair consists of a public key (which is sent to the HDInsight server) and a private key (which is kept on your client machine). When you connect to the HDInsight server using SSH, the SSH client will use the private key on your machine to authenticate with the server.

##Create an SSH key

Use the following information if you plan on using SSH keys with your cluster. If you plan on using a password, you can skip this section.

  1. Open a terminal session and use the following command to see if you have any existing SSH keys:

    Look for the following files in the directory listing. These are common names for public SSH keys.

    • id_dsa.pub
    • id_ecdsa.pub
    • id_ed25519.pub
    • id_rsa.pub
  2. If you do not want to use an existing file, or you have no existing SSH keys, use the following to generate a new file:

    You will be prompted for the following information:

    • The file location - The location defaults to ~/.ssh/id_rsa.

    • A passphrase - You will be prompted to re-enter this.

      [AZURE.NOTE] We strongly recommend that you use a secure passphrase for the key. However, if you forget the passphrase, there is no way to recover it.

    After the command finishes, you will have two new files, the private key (for example, id_rsa) and the public key (for example, id_rsa.pub).
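The check for existing keys in step 1 can be sketched as a small helper. This is a minimal sketch: list_public_keys is a hypothetical function name, and the demonstration runs against a throwaway directory with an empty placeholder file instead of a real ~/.ssh.

```shell
# Print the common public key files that exist in a directory (default: ~/.ssh).
list_public_keys() {
    ssh_dir="${1:-$HOME/.ssh}"
    for name in id_dsa.pub id_ecdsa.pub id_ed25519.pub id_rsa.pub; do
        if [ -f "$ssh_dir/$name" ]; then
            echo "$ssh_dir/$name"
        fi
    done
}

# Demonstration on a throwaway directory that holds one placeholder file.
DEMO_DIR="$(mktemp -d)"
: > "$DEMO_DIR/id_rsa.pub"
list_public_keys "$DEMO_DIR"
```

Called without an argument, the function inspects ~/.ssh. If it prints nothing, no key with one of the common names exists and a new pair has to be generated as described in step 2.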

##Create a Linux-based HDInsight cluster

When creating a Linux-based HDInsight cluster, you can provide the public key created previously. From Linux, Unix, or OS X clients, there are two ways to create an HDInsight cluster:

  • Azure Portal - Uses a web-based portal to create the cluster.

  • Azure CLI for Mac, Linux and Windows - Uses command-line commands to create the cluster.

Each of these methods will require either a password or a public key. For complete information on creating a Linux-based HDInsight cluster, see Provision Linux-based HDInsight clusters.

###Azure Portal

When using the Azure Portal to create a Linux-based HDInsight cluster, you must enter an SSH USER NAME, and select to enter a PASSWORD or SSH PUBLIC KEY.

If you select SSH PUBLIC KEY, you can either paste the public key (contained in the file with the .pub extension) into the SSH PublicKey field, or select Select a file to browse and select the public key file.

[AZURE.NOTE] The key file is simply a text file. The contents should appear similar to the following:

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCelfkjrpYHYiks4TM+r1LVsTYQ4jAXXGeOAF9Vv/KGz90pgMk3VRJk4PEUSELfXKxP3NtsVwLVPN1l09utI/tKHQ6WL3qy89WVVVLiwzL7tfJ2B08Gmcw8mC/YoieT/YG+4I4oAgPEmim+6/F9S0lU2I2CuFBX9JzauX8n1Y9kWzTARST+ERx2hysyA5ObLv97Xe4C2CQvGE01LGAXkw2ffP9vI+emUM+VeYrf0q3w/b1o/COKbFVZ2IpEcJ8G2SLlNsHWXofWhOKQRi64TMxT7LLoohD61q2aWNKdaE4oQdiuo8TGnt4zWLEPjzjIYIEIZGk00HiQD+KCB5pxoVtp user@system

This creates a login for the specified user, by using the password or public key you provide.

###Azure Command-Line Interface for Mac, Linux and Windows

You can use the Azure CLI for Mac, Linux and Windows to create a new cluster by using the azure hdinsight cluster create command.

For more information on using this command, see Provision Hadoop Linux clusters in HDInsight using custom options.

##Connect to a Linux-based HDInsight cluster

From a terminal session, use the SSH command to connect to the cluster headnode by providing the address and user name:

  • SSH address - There are two addresses that may be used to connect to a cluster using SSH:

    • Connect to the headnode: The cluster name, followed by -ssh.azurehdinsight.net. For example, mycluster-ssh.azurehdinsight.net.

    • Connect to the edge node: If your cluster is R Server on HDInsight, the cluster will also contain an edge node that can be accessed using RServer.CLUSTERNAME.ssh.azurehdinsight.net, where CLUSTERNAME is the name of the cluster.

  • User name - The SSH user name you provided when you created the cluster.

The following example will connect to the primary headnode of mycluster as the user me:

ssh me@mycluster-ssh.azurehdinsight.net

If you used a password for the user account, you will be prompted to enter the password.

If you used an SSH key that is secured with a passphrase, you will be prompted to enter the passphrase. Otherwise, SSH will attempt to automatically authenticate by using one of the local private keys on your client.

[AZURE.NOTE] If SSH does not automatically authenticate with the correct private key, use the -i parameter and specify the path to the private key. The following example will load the private key from ~/.ssh/id_rsa:

ssh -i ~/.ssh/id_rsa me@mycluster-ssh.azurehdinsight.net

If you are connecting using the address for the headnode, and no port is specified, SSH will default to port 22, which will connect to the primary headnode on the HDInsight cluster. If you use port 23, you will connect to the secondary headnode. For more information on the headnodes, see Availability and reliability of Hadoop clusters in HDInsight.

###Connect to worker nodes

The worker nodes are not directly accessible from outside the Azure datacenter, but they can be accessed from the cluster headnode via SSH.

If you use an SSH key to authenticate your user account, you must complete the following steps on your client:

  1. Using a text editor, open ~/.ssh/config. If this file doesn't exist, you can create it by entering touch ~/.ssh/config in the terminal.

  2. Add the following to the file. Replace CLUSTERNAME with the name of your HDInsight cluster.

    This configures SSH agent forwarding for your HDInsight cluster.

  3. Test SSH agent forwarding by using the following command from the terminal:

    This should return information similar to the following:

    If nothing is returned, this indicates that ssh-agent is not running. Consult your operating system documentation for specific steps on installing and configuring ssh-agent, or see Using ssh-agent with ssh.

  4. Once you have verified that ssh-agent is running, use the following to add your SSH private key to the agent:

    If your private key is stored in a different file, replace ~/.ssh/id_rsa with the path to the file.

Use the following steps to connect to the worker nodes for your cluster.

[AZURE.IMPORTANT] If you use an SSH key to authenticate your account, you must complete the previous steps to verify that agent forwarding is working.

  1. Connect to the HDInsight cluster by using SSH as described previously.

  2. Once you are connected, use the following to retrieve a list of the nodes in your cluster. Replace ADMINPASSWORD with the password for your cluster admin account. Replace CLUSTERNAME with the name of your cluster.

    This will return information in JSON format for the nodes in the cluster, including host_name, which contains the fully qualified domain name (FQDN) for each node. The following is an example of a host_name entry returned by the curl command:

  3. Once you have a list of the worker nodes you want to connect to, use the following command from the SSH session to the server to open a connection to a worker node:

    Replace USERNAME with your SSH user name and FQDN with the FQDN for the worker node. For example, workernode0.workernode-0-e2f35e63355b4f15a31c460b6d4e1230.j1.internal.cloudapp.net.

    [AZURE.NOTE] If you use a password to authenticate your SSH session, you will be prompted to enter the password again. If you use an SSH key, the connection should finish without any prompts.

  4. Once the session has been established, the terminal prompt will change from username@hn#-clustername to username@wk#-clustername to indicate that you are connected to the worker node. Any commands you run at this point will run on the worker node.

  5. Once you have finished performing actions on the worker node, use the exit command to close the session to the worker node. This will return you to the username@hn#-clustername prompt.
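Steps 2 and 3 can be sketched as follows. This is a minimal sketch under stated assumptions: the admin account name, the Ambari REST URL layout, the placeholder password, and the sshuser account are assumptions rather than values taken from this article, and RUN is a hypothetical helper that prints each command instead of running it.

```shell
RUN="${RUN-echo}"           # dry run by default; set RUN= (empty) to really run
CLUSTERNAME="mycluster"     # illustrative cluster name
ADMINPASSWORD="changeme"    # placeholder for the cluster admin password
SSH_USER="sshuser"          # illustrative SSH user name

# Step 2: ask the cluster for its hosts. The account name "admin" and the
# URL path are assumptions based on the usual Ambari REST API layout.
list_hosts() {
    $RUN curl --user "admin:${ADMINPASSWORD}" \
        "https://${CLUSTERNAME}.azurehdinsight.net/api/v1/clusters/${CLUSTERNAME}/hosts"
}

# Step 3: from the headnode session, open a connection to a worker node by FQDN.
connect_worker() { $RUN ssh "${SSH_USER}@$1"; }

list_hosts
connect_worker "workernode0.workernode-0-e2f35e63355b4f15a31c460b6d4e1230.j1.internal.cloudapp.net"
```

Passing the password on the command line leaves it in the shell history; giving curl only the user name (--user admin) makes it prompt for the password instead.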

Connect to a Domain-joined HDInsight cluster

Domain-joined HDInsight integrates Kerberos with Hadoop in HDInsight. Because the SSH user is not an Active Directory domain user, this user account cannot run Hadoop commands from the SSH shell on a domain-joined cluster directly. You must run kinit first.

To run Hive queries on a Domain-joined HDInsight cluster using SSH

  1. Connect to a Domain-joined HDInsight cluster using SSH. For instructions, see Connect to a Linux-based HDInsight cluster.

  2. Run kinit. It will ask you for a domain user name and domain user password. For more information on configuring domain users for domain-joined HDInsight clusters, see Configure Domain-joined HDInsight clusters.

  3. Open the Hive console by entering hive.

    Then you can run Hive commands.

##Add more accounts

  1. Generate a new public key and private key for the new user account, as described in the Create an SSH key section.

    [AZURE.NOTE] The private key should either be generated on a client that the user will use to connect to the cluster, or securely transferred to such a client after creation.

  2. From an SSH session to the cluster, add the new user with the following command:

    This will create a new user account, but will disable password authentication.

  3. Create the directory and files to hold the key by using the following commands:

  4. When the nano editor opens, copy and paste in the contents of the public key for the new user account. Finally, use Ctrl-X to save the file and exit the editor.

  5. Use the following command to change ownership of the .ssh folder and contents to the new user account:

  6. You should now be able to authenticate to the server with the new user account and private key.
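Steps 2 through 5 can be sketched as one function. This is a minimal sketch, not the original listing: the account name newuser is hypothetical, the --disabled-password option assumes a Debian or Ubuntu style adduser, and RUN is a hypothetical helper that prints each command instead of running it.

```shell
RUN="${RUN-echo}"     # dry run by default; set RUN= (empty) to really run
NEW_USER="newuser"    # hypothetical account name

add_ssh_user() {
    # Step 2: create the account with password authentication disabled.
    $RUN sudo adduser --disabled-password "$NEW_USER"
    # Step 3: create the directory and the file that hold the public key.
    $RUN sudo mkdir -p "/home/${NEW_USER}/.ssh"
    $RUN sudo touch "/home/${NEW_USER}/.ssh/authorized_keys"
    # Step 4: open the file and paste in the new user's public key.
    $RUN sudo nano "/home/${NEW_USER}/.ssh/authorized_keys"
    # Step 5: hand the .ssh folder and its contents to the new account.
    $RUN sudo chown -R "${NEW_USER}:${NEW_USER}" "/home/${NEW_USER}/.ssh"
}

add_ssh_user
```

Printing the commands first makes it easy to confirm the paths and the account name before anything is changed on the cluster node.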

##SSH tunneling

SSH can be used to tunnel local requests, such as web requests, to the HDInsight cluster. The request will then be routed to the requested resource as if it had originated on the HDInsight cluster headnode.

[AZURE.IMPORTANT] An SSH tunnel is a requirement for accessing the web UI for some Hadoop services. For example, both the Job History UI and the Resource Manager UI can only be accessed using an SSH tunnel.

For more information on creating and using an SSH tunnel, see Use SSH Tunneling to access Ambari web UI, ResourceManager, JobHistory, NameNode, Oozie, and other web UIs.
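One common way to build such a tunnel is dynamic port forwarding. This is a minimal sketch, assuming the sshuser account and a cluster named mycluster; the local port 9876 is an arbitrary choice, and RUN is a hypothetical helper that prints the command instead of opening the tunnel.

```shell
RUN="${RUN-echo}"    # dry run by default; set RUN= (empty) to really open the tunnel
LOCAL_PORT=9876      # arbitrary free local port (assumption)
TARGET="sshuser@mycluster-ssh.azurehdinsight.net"   # illustrative headnode address

# -D opens a local SOCKS proxy on LOCAL_PORT that forwards through the headnode.
# -N tells ssh not to run a remote command, so the session only carries the tunnel.
open_tunnel() { $RUN ssh -D "$LOCAL_PORT" -N "$TARGET"; }

open_tunnel
```

A browser configured to use localhost:9876 as a SOCKS proxy then sends its requests through the headnode, which is how the cluster-internal web UIs become reachable.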

##Next steps

Now that you understand how to authenticate by using an SSH key, learn how to use MapReduce with Hadoop on HDInsight.