Big Data Hadoop Ecosystems
Lab #1 Setup and General Notes
Dr. Gasan Elkhodari
Installing Hadoop VM on your laptop (Windows users)
Hardware requirements: a 64-bit Windows laptop with an SSD, at least 50 GB of free disk space, and at least 8 GB of memory. The Hadoop/Linux sandbox requires at least 8 GB of memory to run correctly.
Windows 10
1. Download the VM Sandbox image (the executable file) from:
https://ucumberlands.box.com/s/kk6a8mcqvupq6durdnxy7dc85squoji9
2. Download VMware Workstation Player (free license for individual use) from: https://www.vmware.com/products/workstation-player/workstation-player-evaluation.html
3. Install VMware Workstation Player: play the video below and follow the configuration instructions. Don't download the products mentioned in the video; the focus is on the configuration steps for VMware Workstation Player.
https://www.youtube.com/watch?v=4XBXJpYPkUk
4. Start the VM
Installing Hadoop VM on your laptop (Mac users)
1. Download the VM Sandbox image (the executable file) from: https://ucumberlands.box.com/s/kk6a8mcqvupq6durdnxy7dc85squoji9
2. Download VirtualBox from: https://www.virtualbox.org/wiki/Downloads
3. Install VirtualBox: play the video below and follow the configuration instructions. Don't download the products mentioned in the video; the focus is on the configuration steps for VirtualBox. https://www.youtube.com/watch?v=BeCtjd86YXo
4. Start the VM
Lab #1 – General Note
This lab uses a virtual machine running the CentOS Linux distribution. The VM has CDH
(Cloudera's Distribution Including Apache Hadoop) installed in pseudo-distributed mode.
Pseudo-distributed mode is a way of running Hadoop in which all Hadoop daemons run on the
same machine; it is, essentially, a cluster consisting of a single machine. It works just like a
larger Hadoop cluster, the only difference (apart from speed, of course!) being that the block
replication factor is set to 1, since there is only a single DataNode available.
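You can confirm the single-node configuration from a terminal inside the VM. This is a sketch; it assumes the `hdfs` client is on the PATH, as it is in the CDH sandbox:

```shell
# Print the configured block replication factor (expect 1 in pseudo-distributed mode)
hdfs getconf -confKey dfs.replication

# Show a report of live DataNodes (expect a single node)
hdfs dfsadmin -report
```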
Lab#1 – HDFS Setup
This script enables services and sets up the data required for the course. You must run it before starting the lab.
$ $DEV1/scripts/training_setup_dev1.sh
Lab#1 – Access HDFS with Command Line
• Assignment
1) Move the data folder “KB”, located under “/home/training/training_materials/data”, to the Hadoop file system directory /loudacre
Hints:
• Use the ‘hdfs dfs -mkdir’ command to create a new directory ‘/loudacre’ in the HDFS file system
• Use the ‘hdfs dfs -put’ command to copy the data from the local Linux file system into the HDFS file system
• Use ‘hdfs dfs -cat’ to view the data you just moved into HDFS
• Output: view one of the files you just moved using ‘hdfs dfs -cat’, take a screenshot, and upload it to the designated assignment folder.
Example:
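A sketch of the command sequence for the steps above, assuming the paths given in the assignment (the exact file names inside KB will vary, so the file name below is a placeholder):

```shell
# Create the target directory in HDFS
hdfs dfs -mkdir /loudacre

# Copy the KB folder from the local Linux file system into HDFS
hdfs dfs -put /home/training/training_materials/data/KB /loudacre

# List what was copied, then view one of the files
hdfs dfs -ls /loudacre/KB
hdfs dfs -cat /loudacre/KB/<file-name>   # <file-name> is a placeholder; pick one from the listing
```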
Lab#1 – Access HDFS with Command Line