Humio Self Hosted Installation

Today, I am going to install Humio on our self-hosted server. This will be a single-node setup.

Hardware Requirements

Hardware requirements depend mostly on how much data we will ingest and how many concurrent searches we will be running. Some things to remember:

You need to be able to hold 48 hours of compressed data in 80% of your RAM.
You want enough hyper-threads/vCPUs (each giving you 1GB/s search) to be able to search 24 hours of data in less than 10 seconds.
You need disk space to hold your compressed data. Never fill your disk more than 80%.
Use a separate disk for Kafka data (not done in this article). It is also wise to set up an alert for when disk usage exceeds 80%.
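The rules of thumb above boil down to a little arithmetic. Here is a minimal sketch of it, assuming you already know your daily data volume in whole GB. The function names are hypothetical helpers of my own, not part of Humio, and the results are rough estimates only:

```shell
# Sizing sketch for the rules of thumb above. Inputs are whole GB.
# All function names here are hypothetical helpers, not Humio tools.

# RAM (GB) so that 48 hours of compressed data fits in 80% of RAM.
# Two days of data divided by 0.8, rounded up: ceil(d * 20 / 8).
ram_needed_gb() {
  echo $(( ($1 * 20 + 7) / 8 ))
}

# vCPUs so that 24 hours of data can be scanned in roughly 10 seconds,
# assuming 1 GB/s per vCPU: ceil(d / 10).
vcpus_needed() {
  echo $(( ($1 + 9) / 10 ))
}

# Disk (GB) so that the stored data stays within 80% of the disk:
# ceil(s * 10 / 8).
disk_needed_gb() {
  echo $(( ($1 * 10 + 7) / 8 ))
}

ram_needed_gb 10     # prints 25
vcpus_needed 35      # prints 4
disk_needed_gb 160   # prints 200
```

For example, 10GB of compressed data per day needs about 25GB of RAM by the first rule, which fits the 32GB machine used in this article.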

For this article, we are going to use a machine with 32GB of RAM and 4 cores, running Oracle Linux 7. It has a 200GB /data partition for data and log files, an 80GB / partition, and a single WEB LAN setup. No swap space is needed. In addition to port 22 for SSH, the Humio node requires port 8080 opened to incoming traffic for handling requests by the web application (i.e., the Humio User Interface) and API.
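To keep an eye on the 80% disk rule, a small check like the one below could be run from cron. This is only a sketch: the function names are my own, and how the warning is delivered (mail, monitoring agent, etc.) is left out.

```shell
# Print the usage percentage (without the % sign) of the filesystem
# that holds the given path. Hypothetical helper, not a Humio tool.
disk_usage_pct() {
  df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Warn when usage of the given path exceeds a limit (default 80).
# Returns 0 when fine, 1 when over the limit, 2 when df gave no answer.
check_disk() {
  pct=$(disk_usage_pct "$1" 2>/dev/null)
  limit=${2:-80}
  if [ -z "$pct" ]; then
    echo "ERROR: could not read disk usage for $1"
    return 2
  fi
  if [ "$pct" -gt "$limit" ]; then
    echo "WARNING: $1 is ${pct}% full (limit ${limit}%)"
    return 1
  fi
  echo "OK: $1 is ${pct}% full"
}
```

On this server, the call would be check_disk /data 80, and the same for / and for the Kafka disk if you use a separate one.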

Preparing the Humio Server

1. Humio needs to be able to keep a lot of files open for sockets and actual files from the file system.

Create a file named 99-humio-limits.conf in the /etc/security/limits.d/ sub-directory and add these lines:

# Raise limits for files:
humio soft nofile 250000
humio hard nofile 250000

Create another file with a text editor, this time in the /etc/pam.d/ sub-directory, and name it common-session. Copy these lines into it:

# Apply limits:
session required pam_limits.so

2. User Kafka

To run the Kafka component for the Humio server, I will create a user named kafka. This is a non-administrative user.

useradd -r kafka

3. User zookeeper

Kafka requires Zookeeper for coordination. So we will add a non-administrative zookeeper user.

useradd -r zookeeper

4. User humio

Create a non-administrative user named humio to run the Humio software in the background. You can do this by executing the following from the command-line:

useradd -r humio
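To confirm that all three accounts were created, a quick check along these lines can help. The function name is a hypothetical helper of my own; it relies only on the standard id command.

```shell
# Print each of the given user names that does not exist on this system.
# Prints nothing when every user is present.
missing_users() {
  for u in "$@"; do
    id "$u" >/dev/null 2>&1 || echo "$u"
  done
}

missing_users kafka zookeeper humio
```

If the last line prints nothing, all three users exist.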

Download Software

1. JDK 11 – From Oracle's official JDK 11 site. At the time of this article, version 11.0.11 was available, so we are going to use it. Download the latest jdk-11.X.XX_linux-x64_bin.rpm from the site.

2. Kafka 2.4.0 – Humio's recommended version of Kafka is 2.4.0. Download Kafka from the official site. For this article, I used kafka_2.12-2.4.0.tgz (Scala 2.12), which is the build recommended on the Kafka site.

3. Zookeeper 3.4.x – Humio's recommended version of Zookeeper is 3.4.x. Zookeeper 3.4.14 is available from the Apache Zookeeper archive and will work fine.

4. Humio – Download the software from Humio’s official site. It’s a pretty simple process. Humio gives a 30-day license initially; after that you need to buy a license from them. The jar file can also be downloaded from the repo.

Installing JDK on Humio server

Copy the JDK 11 rpm file to the Humio server and install it. In my case:

yum install jdk-11.0.11_linux-x64_bin.rpm

We will also change the securerandom.source value from random to urandom in the /usr/java/jdk-11.0.11/conf/security/java.security file:

securerandom.source=file:/dev/./urandom
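If you prefer not to edit the file by hand, the change can be sketched with sed as below. The function name is hypothetical, and the sketch assumes the file already contains an uncommented securerandom.source= line. A backup copy with a .bak suffix is kept next to the original.

```shell
# Point securerandom.source at /dev/./urandom in a java.security file.
# Keeps a backup of the original as <file>.bak.
set_urandom_source() {
  sed -i.bak 's|^securerandom\.source=.*|securerandom.source=file:/dev/./urandom|' "$1"
}
```

In my case the call would be set_urandom_source /usr/java/jdk-11.0.11/conf/security/java.security, run as root.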

Installing Kafka

Humio uses Apache Kafka internally for queuing incoming messages and for storing shared state when running Humio in a cluster setup. Now, untar the Kafka file into the /opt directory and create log and data directories for Kafka under /data/kafka. Use these commands:

tar zxf kafka_x.x.x.x.tgz -C /opt
mkdir -p /data/kafka/log/kafka
mkdir -p /data/kafka/kafka-data
chown -R kafka:kafka /data/kafka

ln -s /opt/kafka_x.x.x.x /opt/kafka

Now, edit Kafka's server.properties in the /opt/kafka/config sub-directory. Set these properties to the values given below:

broker.id=1
log.dirs=/data/kafka/kafka-data
delete.topic.enable=true

Then give the kafka user ownership of the installation directory:

chown -R kafka:kafka /opt/kafka_x.x.x.x

Here, the broker.id value must match the server number (myid) you set when configuring Zookeeper (next section).
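Once the myid file from the Zookeeper section exists, the two values can be compared with a small check like this one. It is a sketch with a hypothetical function name, and it assumes broker.id appears once in server.properties.

```shell
# Succeed only when broker.id in a Kafka server.properties file equals
# the number stored in a Zookeeper myid file.
ids_match() {
  broker_id=$(sed -n 's/^broker\.id[[:space:]]*=[[:space:]]*//p' "$1")
  my_id=$(tr -d '[:space:]' < "$2")
  [ -n "$broker_id" ] && [ "$broker_id" = "$my_id" ]
}
```

With the paths used in this article, that would be: ids_match /opt/kafka/config/server.properties /data/zookeeper/data/myid && echo "ids match"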

Now, we will create a service file for starting Kafka. Create a file named kafka.service in the /etc/systemd/system/ sub-directory, then add the following lines to it:

[Unit]
Requires=zookeeper.service
After=zookeeper.service

[Service]
Type=simple
User=kafka
LimitNOFILE=800000
Environment="LOG_DIR=/data/kafka/log/kafka"
Environment="GC_LOG_ENABLED=true"
Environment="KAFKA_HEAP_OPTS=-Xms4G -Xmx8G"
ExecStart=/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties
Restart=on-failure

[Install]
WantedBy=multi-user.target

We haven’t created the zookeeper.service file yet, so if we try to start this service before installing Zookeeper and creating its service file, it will fail.

Installing Zookeeper

Kafka requires Zookeeper for coordination. ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. All of these kinds of services are used in some form or another by distributed applications.

Humio's documentation recommends the 3.4.x version of Zookeeper. Now, untar the previously downloaded Zookeeper file into /opt and create a symbolic link to /opt/zookeeper like so:

tar -zxf zookeeper-x.x.x.tar.gz -C /opt
ln -s /opt/zookeeper-x.x.x /opt/zookeeper

Create a data directory for Zookeeper in /data partition:

mkdir -p /data/zookeeper/data

Create the Zookeeper configuration file, named zoo.cfg, in the conf/ sub-directory of /opt/zookeeper. Copy the lines below into that file:

tickTime = 2000
dataDir = /data/zookeeper/data
clientPort = 2181
initLimit = 5
syncLimit = 2
maxClientCnxns=60
autopurge.purgeInterval=1
admin.enableServer=false
4lw.commands.whitelist=*
server.1=<HOST_IP_ADDRESS>:2888:3888

In the server.1 line, replace <HOST_IP_ADDRESS> with the IP address assigned to the Humio server.

Create a myid file in the data sub-directory with just the number 1 as its contents:

bash -c 'echo 1 > /data/zookeeper/data/myid'

Now, to verify that the configuration works, start the Zookeeper server from /opt/zookeeper and then connect to it with the command-line client like so:

cd /opt/zookeeper
./bin/zkServer.sh start

# ./bin/zkCli.sh
Connecting to localhost:2181
2021-05-03 01:57:29,917 [myid:] - INFO [main:Environment@100] - Client environment:zookeeper.version=3.4.14-4c25d480e66aadd371de8bd2fd8da255ac140bcf, built on 03/06/2019 16:18 GMT
2021-05-03 01:57:29,920 [myid:] - INFO [main:Environment@100] - Client environment:host.name=ppshumio01.therap.net
2021-05-03 01:57:29,920 [myid:] - INFO [main:Environment@100] - Client environment:java.version=11.0.11
2021-05-03 01:57:29,921 [myid:] - INFO [main:Environment@100] - Client environment:java.vendor=Oracle Corporation
2021-05-03 01:57:29,921 [myid:] - INFO [main:Environment@100] - Client environment:java.home=/usr/java/jdk-11.0.11
2021-05-03 01:57:29,922 [myid:] - INFO [main:Environment@100] - Client environment:java.library.path=/usr/java/packages/lib:/usr/lib64:/lib64:/lib:/usr/lib
2021-05-03 01:57:29,922 [myid:] - INFO [main:Environment@100] - Client environment:java.io.tmpdir=/tmp
2021-05-03 01:57:29,922 [myid:] - INFO [main:Environment@100] - Client environment:java.compiler=<NA>
2021-05-03 01:57:29,922 [myid:] - INFO [main:Environment@100] - Client environment:os.name=Linux
2021-05-03 01:57:29,922 [myid:] - INFO [main:Environment@100] - Client environment:os.arch=amd64
2021-05-03 01:57:29,922 [myid:] - INFO [main:Environment@100] - Client environment:os.version=4.14.35-2047.502.4.1.el7uek.x86_64
2021-05-03 01:57:29,922 [myid:] - INFO [main:Environment@100] - Client environment:user.name=root
2021-05-03 01:57:29,922 [myid:] - INFO [main:Environment@100] - Client environment:user.home=/root
2021-05-03 01:57:29,922 [myid:] - INFO [main:Environment@100] - Client environment:user.dir=/opt/zookeeper-3.4.14
2021-05-03 01:57:29,923 [myid:] - INFO [main:ZooKeeper@442] - Initiating client connection, connectString=localhost:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@7276c8cd
Welcome to ZooKeeper!

The results you see should look something like the above. To exit, hit Ctrl-c once the status is reported as connected.
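Since zoo.cfg whitelists the four-letter-word commands, the server can also be probed without the interactive client by sending it ruok, to which a healthy server answers imok. The sketch below assumes the nc (netcat) tool is installed. The function name is my own.

```shell
# Succeed when the Zookeeper server at host:port answers "imok" to "ruok".
# Defaults to localhost:2181. Requires nc (netcat) on the machine.
zk_is_ok() {
  reply=$(echo ruok | nc -w 2 "${1:-localhost}" "${2:-2181}" 2>/dev/null)
  [ "$reply" = "imok" ]
}
```

For example: zk_is_ok localhost 2181 && echo "Zookeeper is healthy"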

There’s a little more configuring to do. Stop Zookeeper and change the ownership of the zookeeper directory like so, adjusting for the version number you installed:

./bin/zkServer.sh stop
chown -R zookeeper:zookeeper /opt/zookeeper-x.x.x
chown -R zookeeper:zookeeper /data/zookeeper/data

Create a Zookeeper service file named zookeeper.service in the /etc/systemd/system/ sub-directory. Use the lines below:

[Unit]
Description=Zookeeper Daemon
Documentation=http://zookeeper.apache.org
Requires=network.target
After=network.target

[Service]
Type=forking
WorkingDirectory=/opt/zookeeper
User=zookeeper
Group=zookeeper
ExecStart=/opt/zookeeper/bin/zkServer.sh start /opt/zookeeper/conf/zoo.cfg
ExecStop=/opt/zookeeper/bin/zkServer.sh stop /opt/zookeeper/conf/zoo.cfg
ExecReload=/opt/zookeeper/bin/zkServer.sh restart /opt/zookeeper/conf/zoo.cfg
TimeoutSec=30
Restart=on-failure

[Install]
WantedBy=default.target

ZooKeeper is not a memory-intensive application when handling only data stored by Kafka. The Zookeeper service file does not set a heap size, so the JVM defaults apply and Zookeeper can use 25% of the system memory (up to 8GB here). To limit it, create a java.env file in the zookeeper/conf/ directory and add this line (limiting the heap to 4GB):

export JVMFLAGS="-Xms4096m -Xmx4096m $JVMFLAGS"

Now you’re ready to start the Zookeeper service. Enter the first line below to start it. When it finishes, enter the second line to check that it’s running and that no errors are reported. The third line enables the service at boot:

systemctl start zookeeper
systemctl status zookeeper

systemctl enable zookeeper

Now that the Zookeeper service is running properly, we can start and enable the Kafka service in the same way:

systemctl start kafka
systemctl status kafka

systemctl enable kafka

Install Humio

Next, create the Humio system directories and give the humio user ownership of them:

mkdir -p /opt/humio /etc/humio/vector /data/humio/log/humio /data/humio/data /data/humio/gc

chown -R humio:humio /opt/humio /etc/humio/vector
chown -R humio:humio /data/humio

Now, to install the Humio software, copy the server-x.x.x.jar file to the /opt/humio directory and create a symbolic link to it:

cd /opt/humio/
ln -s /opt/humio/server-x.x.x.jar /opt/humio/server.jar

Adjust that line for the correct file name, based on the version you downloaded. Also, we created the /etc/humio/vector directory to store

Now, create the Humio configuration file, server.conf in the /etc/humio directory. There are a few environment variables you will need to enter in this configuration file in order to run Humio on a single server or instance.

BOOTSTRAP_HOST_ID=1
DIRECTORY=/data/humio/data
HUMIO_AUDITLOG_DIR=/data/humio/log
HUMIO_DEBUGLOG_DIR=/data/humio/log
HUMIO_PORT=8080
ELASTIC_PORT=9200
ZOOKEEPER_URL=<IP_ADDRESS>:2181
KAFKA_SERVERS=<IP_ADDRESS>:9092
EXTERNAL_URL=http://<HOSTNAME>.<DOMAIN>:8080
PUBLIC_URL=http://<HOSTNAME>.<DOMAIN>:8080
HUMIO_SOCKET_BIND=0.0.0.0
HUMIO_HTTP_BIND=0.0.0.0
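Before starting the service, it can be worth checking that none of the settings was left out or left empty. Here is a minimal sketch with a hypothetical function name. It only checks that each key has some value and does not validate the values themselves.

```shell
# Print every given key that has no non-empty KEY=value line in the file.
# Prints nothing when all keys are set.
missing_keys() {
  file=$1
  shift
  for key in "$@"; do
    grep -Eq "^${key}=.+" "$file" || echo "$key"
  done
}
```

For the file above, a call could look like: missing_keys /etc/humio/server.conf DIRECTORY HUMIO_PORT ZOOKEEPER_URL KAFKA_SERVERS EXTERNAL_URL PUBLIC_URL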

Create humio.service in the /etc/systemd/system/ sub-directory. Add these lines to that file:

[Unit]
Description=Humio service
After=network.service

[Service]
Type=notify
Restart=on-abnormal
User=humio
Group=humio
LimitNOFILE=250000:250000
EnvironmentFile=/etc/humio/server.conf
WorkingDirectory=/data/humio
ExecStart=/usr/bin/java -server -XX:+UseParallelOldGC -Xms4G -Xmx4G -XX:MaxDirectMemorySize=8G -Xss2M --add-exports java.base/jdk.internal.util=ALL-UNNAMED -XX:CompileCommand=dontinline,com/humio/util/HotspotUtilsJ.dontInline -Xlog:gc*,gc+jni=debug:file=/data/humio/gc/gc_humio.log:time,tags:filecount=5,filesize=102400 -Dhumio.auditlog.dir=/data/humio/log -Dhumio.debuglog.dir=/data/humio/log -jar /opt/humio/server.jar

[Install]
WantedBy=default.target

Change the ownership of the Humio files, then start the Humio service. To change the ownership, execute the following two lines from the command-line:

chown -R humio:humio /opt/humio /etc/humio/
chown -R humio:humio /data/humio

Now, we will start Humio with the systemctl utility:

systemctl start humio

Just to be sure Humio is running and everything is fine, check it with the journalctl tool. You can do this by entering the following from the command-line:

journalctl -fu humio
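Besides reading the journal, Humio can be probed over HTTP. The sketch below assumes that your Humio version exposes a status endpoint at /api/v1/status and that curl is installed. Check the API documentation for your release if the call fails while the service looks healthy. The function name is my own.

```shell
# Succeed when the (assumed) Humio status endpoint answers with HTTP success.
# Defaults to localhost:8080. Requires curl.
humio_is_up() {
  curl -fsS --max-time 5 "http://${1:-localhost}:${2:-8080}/api/v1/status" >/dev/null 2>&1
}
```

For example: humio_is_up localhost 8080 && echo "Humio is answering"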

If there are no errors, open a web browser and enter the domain name or IP address with port 8080. For example, you would enter something like http://<HOSTNAME>.<DOMAIN>:8080 in the browser’s address field. You should see the Humio web interface.

Congratulations! You just installed Humio on your own server.

sihamsharif$ _