When playing around with distributed technologies like Hadoop or distributed databases, at some point it becomes important to have a number of machines available to run tests in a truly distributed environment. In this note I am going to explain how to set up a virtual cluster using VirtualBox, so that you can simulate such an environment on a single laptop.
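Each host has two network interfaces: one with internet access (wlan0 on the laptop, eth0 via NAT on the virtual machines) and one on the host-only network vboxnet0 (192.168.56.x).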
Network Setup Sketch:
               vboxnet0   inet
                         [GW]
HOST:                     |
[wlan0|auto,NAT]----------+
[vboxnet0|.1]-----+       |
                  |       |
VLB1:             |       |
[eth0|auto]-------|-------+
[eth1|.101]-------+       |
                  |       |
VLB2:             |       |
[eth0|auto]-------|-------+
[eth1|.102]-------+
On the root VM, install openssh and emacs, configure the network (/etc/network/interfaces), and set up ssh access.
I like to be able to get a remote shell by simply typing ssh <hostname>. Use ~/.ssh/config to set the default user name (user), and copy the ssh key to the VM (as described e.g. at https://help.ubuntu.com/community/SSH/OpenSSH/):
cat ~/.ssh/id_rsa.pub | ssh VLB "cat >> .ssh/authorized_keys"
Excerpt from /etc/network/interfaces on the virtual hosts:
auto eth0
iface eth0 inet dhcp
auto eth1
iface eth1 inet dhcp
Excerpt from ~/.ssh/config on the host:
Host VLB
HostName 192.168.56.101
User user
Before shutting down the root VM for cloning, run the following command in its shell:
sudo rm -f /etc/udev/rules.d/70-persistent-net.rules
This will erase the network card configuration. Now shut down the VM and clone it in VirtualBox. Select ‘Reinitialize the MAC addresses of all network cards’.
The two machines need different MAC addresses so that both can be in the same network. If the udev rules file were left in place, the Linux kernel would detect the cards with their new MAC addresses as new interfaces and give them new names (eth1, eth2), and they would not be automatically activated and configured on boot.
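The cloning step can also be done from the command line instead of the GUI. The following is only a sketch, assuming the root VM is called LinuxBox and the clone LinuxBox_1 (the names used in the start script at the end of this note) and that VBoxManage is on the PATH. To my knowledge, clonevm generates new MAC addresses unless --options keepallmacs is given, which corresponds to the ‘Reinitialize’ checkbox, but check the manual of your VirtualBox version. The VBOXMANAGE variable is a hypothetical hook of mine so that the command can be previewed with echo.

```shell
#!/bin/sh
# Clone a powered-off VM and register the clone with VirtualBox.
# VBOXMANAGE is a hypothetical override hook; set it to "echo" for a dry run.
clone_vm() {
    src=$1      # name of the existing VM, e.g. LinuxBox
    clone=$2    # name for the new VM, e.g. LinuxBox_1
    "${VBOXMANAGE:-VBoxManage}" clonevm "$src" --name "$clone" --register
}

# Usage: clone_vm LinuxBox LinuxBox_1
```

Running VBOXMANAGE=echo clone_vm LinuxBox LinuxBox_1 first only prints the arguments that would be passed to VBoxManage.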
First adapt your ~/.ssh/config and /etc/hosts to list both machines as VLB1 and VLB2. Then configure the remote hostnames:
echo "VLB1" | ssh VLB1 "cat | sudo tee /etc/hostname"
echo "127.0.0.1 VLB1" | ssh VLB1 "cat | sudo tee -a /etc/hosts"
Similarly for VLB2.
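The ~/.ssh/config and /etc/hosts entries for both machines can be generated rather than typed by hand. A minimal sketch, assuming the addresses from the network sketch (VLB1 at 192.168.56.101, VLB2 at 192.168.56.102) and the login name user; the function names are my own and purely illustrative.

```shell
#!/bin/sh
# Print a ~/.ssh/config stanza for one node.
ssh_config_entry() {
    # $1 = host alias, $2 = IP address, $3 = login user
    printf 'Host %s\n    HostName %s\n    User %s\n' "$1" "$2" "$3"
}

# Print an /etc/hosts line for one node.
hosts_entry() {
    # $1 = host alias, $2 = IP address
    printf '%s %s\n' "$2" "$1"
}

# Emit the entries for a two-node cluster.
# Addresses follow the network sketch: VLBn -> 192.168.56.(100+n).
print_cluster_entries() {
    kind=$1   # "ssh" or "hosts"
    for n in 1 2; do
        ip="192.168.56.$((100 + n))"
        case $kind in
            ssh)   ssh_config_entry "VLB$n" "$ip" user ;;
            hosts) hosts_entry "VLB$n" "$ip" ;;
        esac
    done
}
```

On the laptop, print_cluster_entries ssh >> ~/.ssh/config and print_cluster_entries hosts | sudo tee -a /etc/hosts would then add the entries. Note that both commands append, so they should be run only once.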
Remark: A drawback of this approach is that each time the second command is executed, a new line is appended to /etc/hosts. In particular, the command is not idempotent. An alternative would be to use sed for a global string replacement, which has similar issues: sed s/VLB/VLB1/g transforms VLB -> VLB1 -> VLB11 on repeated runs. Anchoring the pattern at the end of the line, as in sed s/VLB$/VLB1/g, should work, since VLB1 no longer ends in VLB and is left alone on a second run.
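The idempotency issue from the remark can be avoided by checking for the line before appending it. The sketch below works on a local file; append_once and rename_vlb are hypothetical helper names, not existing tools. It also shows the anchored sed variant, written without sed -i so that it does not depend on a particular sed implementation.

```shell
#!/bin/sh
# Append a line to a file only if that exact line is not already present.
append_once() {
    line=$1
    file=$2
    # -x: match the whole line, -F: fixed string, -q: no output
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Rename a trailing "VLB" to "VLB<n>"; the $ anchor leaves "VLB1" untouched.
rename_vlb() {
    n=$1
    file=$2
    tmp=$(mktemp) || return 1
    sed "s/VLB\$/VLB$n/" "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

The same check can be sent over ssh, for example: ssh VLB1 "grep -qxF '127.0.0.1 VLB1' /etc/hosts || echo '127.0.0.1 VLB1' | sudo tee -a /etc/hosts". Running it twice leaves a single entry.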
Also, it would be nice to set /etc/hosts correctly on the remote machines, but this goes too far here. It seems ZooKeeper is the right tool for this kind of problem.
To see if all hosts are successfully connected to the network, run an IP range scan, e.g.
nmap 192.168.56.*
You should see three machines at .1, .101, .102.
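If nmap is not installed, the three expected addresses can also be probed one by one. This is a rough sketch: check_hosts and ping_once are names I made up, and the ping options assume the Linux ping (-c for the count, -W for the timeout in seconds). The probe is passed in as a parameter so that another command can be substituted.

```shell
#!/bin/sh
# Report which of the given addresses respond to a probe command.
# $1 is the probe: any command that takes an address and succeeds if it is up.
check_hosts() {
    probe=$1
    shift
    down=0
    for ip in "$@"; do
        if "$probe" "$ip" >/dev/null 2>&1; then
            echo "up   $ip"
        else
            echo "DOWN $ip"
            down=$((down + 1))
        fi
    done
    # Exit status is 0 only if every host answered.
    [ "$down" -eq 0 ]
}

# Send one ICMP echo request and wait at most one second (Linux ping assumed).
ping_once() {
    ping -c 1 -W 1 "$1"
}
```

Usage: check_hosts ping_once 192.168.56.1 192.168.56.101 192.168.56.102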
Convenience script to start the whole “cluster” at once:
#!/bin/bash
nohup vboxheadless -s LinuxBox &
nohup vboxheadless -s LinuxBox_1 &
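A counterpart for shutting the “cluster” down again could look like the sketch below. It assumes that VBoxManage is on the PATH and that the guests react to the ACPI power button (on a minimal server install this may require acpid). As before, VBOXMANAGE is a hypothetical hook that allows a dry run with echo.

```shell
#!/bin/sh
# Ask each named VM to shut down via the ACPI power button.
# VBOXMANAGE is a hypothetical override hook; set it to "echo" for a dry run.
stop_cluster() {
    for vm in "$@"; do
        "${VBOXMANAGE:-VBoxManage}" controlvm "$vm" acpipowerbutton
    done
}

# Usage: stop_cluster LinuxBox LinuxBox_1
```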