Hadoop is not the only software supported: HOD can also set up HBase databases, IPython notebooks, and Spark environments.
There are two main benefits:
- Users can run Hadoop jobs on a traditional batch cluster. This suits small to medium Hadoop jobs where the framework is useful but a dedicated 'big data' cluster isn't required. At that scale, the performance benefits of a parallel file system outweigh those of the 'share nothing' architecture of an HDFS-style file system.
- Users from different groups can run whichever version of Hadoop they like. This removes the need for painful upgrades of long-running YARN clusters, and for hoping that all users' jobs are backwards compatible.
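Concretely, a session might look like the sketch below. The subcommand names follow the HOD documentation, but the distribution labels are only examples; check `hod --help` and `hod dists` on your own system.

```shell
# Sketch of a typical HOD session (dist label is an example --
# list the distributions actually available with `hod dists`).
hod create --dist Hadoop-2.3.0-cdh5.0.0 --label my-hadoop  # submits a job to the batch system
hod list                                                   # show cluster labels and their state
hod connect my-hadoop                                      # shell with the Hadoop environment loaded
# ... run hadoop/yarn jobs ...
hod destroy my-hadoop                                      # release the nodes
```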
Users connect to their running clusters (with the `hod connect` command), where they can interact with their services.

The prerequisites are:
- A cluster using Torque.
- environment-modules, to manage the environment.
- Python and various libraries:
  - mpi4py (e.g. on Fedora: `yum install -y mpi4py-mpich2`); if you build this yourself, you will probably need to set the `$MPICC` environment variable.
  - vsc-base, used for command line parsing.
  - vsc-mympirun, used for setting up the MPI job.
  - pbs_python, used for interacting with the PBS (a.k.a. Torque) server.
  - netifaces
  - netaddr
- Java: Oracle JDK or OpenJDK, both installable with EasyBuild.
- Hadoop binaries, e.g. the Cloudera distribution versions (used to test HOD).
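As a quick sanity check before installing, the Python-side dependencies can be probed with a small loop. This is a hypothetical helper: the importable module names are inferred from the package names above, so adjust them to your installation.

```shell
# Probe for HOD's Python dependencies; prints OK/MISSING per module.
# (Module names inferred from the package names listed above.)
for mod in mpi4py vsc netifaces netaddr; do
  if python -c "import $mod" >/dev/null 2>&1; then
    echo "OK: $mod"
  else
    echo "MISSING: $mod"
  fi
done
# If building mpi4py from source, point it at your MPI compiler wrapper:
# export MPICC=mpicc
```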
More information:

- http://hod.readthedocs.io/en/latest/
- https://archive.fosdem.org/2016/schedule/event/hpc_bigdata_hanythingondemand/