Digital Hive Architecture Overview

Overview

Digital Hive is a lightweight, web-based application that is easy to install, maintain, and manage. Other than the Microsoft Visual C++ Redistributable (https://www.microsoft.com/en-ca/download/details.aspx?id=40784), there are no software prerequisites. During installation, the Digital Hive installer sets up all of the required components.

 

The Digital Hive solution consists of four main components:

·      Tomcat (Digital Hive application server)

·      PostgreSQL (Digital Hive content store)

·      Elasticsearch (Digital Hive Search engine)

·      PredictionIO (Digital Hive Machine Learning engine)

 

The minimum system requirements are straightforward:

·      Linux or Windows operating system

·      Minimum 16GB of RAM

·      Minimum 4 CPUs

·      At least 100GB of available disk space (additional disk space may be required over time as the application grows)
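To make the minimums above easy to verify before installing, the following is a minimal Python sketch (Python 3.7+, standard library only). It is not part of the Digital Hive installer: the thresholds simply mirror the list above, sizes are compared in decimal gigabytes (10^9 bytes) so that a nominal 16GB server reporting slightly less usable memory is more likely to pass, and the memory lookup relies on `os.sysconf`, which only exists on POSIX systems such as Linux. The function names `shortfalls`, `total_ram_gb`, and `check_host` are illustrative.

```python
from __future__ import annotations

import os
import shutil

# Thresholds copied from the minimum requirements listed in this article.
MIN_CPUS = 4
MIN_RAM_GB = 16
MIN_DISK_GB = 100
GB = 10**9  # decimal gigabyte


def shortfalls(cpus: int, ram_gb: float | None, disk_free_gb: float) -> list[str]:
    """Return one message per requirement that is not met (empty list = OK).

    ram_gb may be None when memory could not be determined; that check is skipped.
    """
    problems = []
    if cpus < MIN_CPUS:
        problems.append(f"CPUs: {cpus} < {MIN_CPUS}")
    if ram_gb is not None and ram_gb < MIN_RAM_GB:
        problems.append(f"RAM: {ram_gb:.1f} GB < {MIN_RAM_GB} GB")
    if disk_free_gb < MIN_DISK_GB:
        problems.append(f"Free disk: {disk_free_gb:.1f} GB < {MIN_DISK_GB} GB")
    return problems


def total_ram_gb() -> float | None:
    """Total physical memory in GB via os.sysconf (POSIX only, e.g. Linux)."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / GB
    except (AttributeError, ValueError, OSError):
        return None  # e.g. on Windows, where os.sysconf does not exist


def check_host(install_dir: str = ".") -> list[str]:
    """Check this machine; install_dir should be on the volume you install to."""
    return shortfalls(
        cpus=os.cpu_count() or 0,
        ram_gb=total_ram_gb(),
        disk_free_gb=shutil.disk_usage(install_dir).free / GB,
    )
```

Calling `check_host()` checks the volume holding the current directory; pass a path on the intended installation volume otherwise. An empty list means all checked minimums are met; on Windows, RAM has to be confirmed manually.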

 

Digital Hive is designed to satisfy a range of architectural requirements and to scale to high volumes of users.

Single Server Installation

A single server installation (the default full installation) is the easiest deployment method. Even a server that only meets the minimum system requirements can support hundreds of users.

The single server deployment consists of the Tomcat application, a single PostgreSQL instance, a Machine Learning engine that drives user recommendations, and two Elasticsearch nodes that handle indexing and search operations within Digital Hive. Note that no BI or reporting data is ever persisted in the Digital Hive content store, so there are no special or additional data security requirements.

 

All browser-based communication between users/admins and Digital Hive is secured via HTTPS. The installation includes a default self-signed certificate, and it is recommended that it be replaced with a proper domain-based SSL certificate after installation. Communication between Digital Hive and the content systems, whether in the Cloud or on premises, can also be configured to use SSL.

 

From an authentication perspective, Digital Hive can operate in a few different modes. First, it can maintain application-specific user accounts and groups, which are stored in an encrypted format in the PostgreSQL content store. It can also use existing Active Directory users and groups, or integrate with a SAML provider. Together, these authentication modes support both basic authentication (username and password) and Single Sign-On (SSO).

Distributed Installation

The Digital Hive architecture is flexible enough to distribute its components across multiple servers to achieve different objectives. For redundancy or high availability, the following installation pattern can be used.

[Diagram: fault-tolerant distributed deployment]

The diagram above shows a fault-tolerant deployment that consists of two full installations of Digital Hive. Each server runs Tomcat, the Machine Learning engine, multiple Elasticsearch nodes, and a PostgreSQL instance. The difference is that one PostgreSQL instance serves as the primary, while the other runs in standby mode in case the primary becomes unresponsive. As with the single server deployment model, no proprietary reporting data is ever persisted in either PostgreSQL instance.

 

This diagram also represents a possible Cloud deployment, and it includes an additional VPN component for cases where access to an on-premises content system residing behind a firewall is required. For Cloud deployments, Virtual Machine images or Kubernetes-managed containers can be created to automate scaling activities and reduce management effort.

 

As with the single server installation, SSL communication is enabled. However, to achieve scalability, this type of installation requires a Load Balancer between the users and the Digital Hive servers.
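To illustrate what the Load Balancer contributes in this pattern, here is a toy, in-process Python sketch of health-aware round-robin selection across the Digital Hive servers. It is not how a production Load Balancer is built (in practice you would use a dedicated appliance or a Cloud provider's load balancing service), the server names `dh1` and `dh2` are hypothetical, and the session pinning ("sticky sessions") is an assumption: this article does not state whether Digital Hive requires session affinity.

```python
from __future__ import annotations

import itertools
from typing import Callable


class Balancer:
    """Round-robin over healthy backends, pinning each session to one backend."""

    def __init__(self, backends: list[str], is_healthy: Callable[[str], bool]):
        self.backends = backends
        self.is_healthy = is_healthy          # health check supplied by the caller
        self._rr = itertools.cycle(backends)  # endless round-robin iterator
        self._pinned: dict[str, str] = {}     # session id -> backend

    def pick(self, session_id: str | None = None) -> str:
        # Keep an existing session on its backend while that backend is healthy.
        pinned = self._pinned.get(session_id) if session_id else None
        if pinned and self.is_healthy(pinned):
            return pinned
        # Otherwise take the next healthy backend, trying each one at most once.
        for _ in range(len(self.backends)):
            candidate = next(self._rr)
            if self.is_healthy(candidate):
                if session_id:
                    self._pinned[session_id] = candidate
                return candidate
        raise RuntimeError("no healthy Digital Hive server available")
```

With both servers healthy, successive requests without a session alternate between `dh1` and `dh2`, while a request carrying a session ID keeps going to the same server until that server fails its health check, at which point it is moved to the next healthy one.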

Cloud Deployment

Digital Hive was built with flexibility in mind: it does not rely on any provider-specific Cloud technologies, so it can be installed on any Cloud hosting platform. The two Cloud hosting platforms that Digital Hive uses for development, testing, and internal production workloads are Microsoft Azure and Amazon Web Services (AWS).

 

The VM instance types typically used for installing Digital Hive are Azure Standard D4s v3 and AWS t3.xlarge, both of which provide 4 vCPUs and 16 GiB of memory, in line with the minimum requirements.

 

There are no specific considerations for installing Digital Hive in the Cloud, except that any firewalls must have a rule allowing incoming traffic on port 9443, the default port on which Digital Hive operates.
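Once the firewall rule is in place, a quick way to confirm that port 9443 is open is to attempt a TCP connection from a machine outside the firewall. The Python sketch below does only that: a successful connection shows that the port is reachable, not that the HTTPS certificate or the application itself is healthy. The function name `port_reachable` is illustrative.

```python
import socket

DEFAULT_PORT = 9443  # default Digital Hive port, per this article


def port_reachable(host: str, port: int = DEFAULT_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or name lookup failed
        return False
```

For example, `port_reachable("dh.example.com")` (a placeholder host name) returning `False` suggests that the firewall or security rule for port 9443 is missing, or that Digital Hive is not running.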

Required Port Numbers for Installation

The Digital Hive installation requires that certain ports be available, both for accessing the user interface and for internal communication between the various Digital Hive components. The default ports are:

 

·      Application: 9443, 9080, 9006, 9010

·      Content Store: 5432

·      Search: 9200, 9201, 9400, 9401

·      Message Queue: 61616

·      Machine Learning: 7070, 8000

 

These are the required ports when no configuration changes are made; the port numbers can be customized.
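Before installing, it can help to confirm that none of the default ports above is already taken by another service on the server. The Python sketch below tries to bind a TCP socket to each port; a failed bind means something else is already using it. It is a hypothetical pre-check rather than a Digital Hive tool, it only covers IPv4 TCP, and if you customized any port numbers, update `DEFAULT_PORTS` to match.

```python
from __future__ import annotations

import socket

# Default ports copied from the list above; edit if they were customized.
DEFAULT_PORTS = {
    "Application": [9443, 9080, 9006, 9010],
    "Content Store": [5432],
    "Search": [9200, 9201, 9400, 9401],
    "Message Queue": [61616],
    "Machine Learning": [7070, 8000],
}


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can bind to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:  # typically "address already in use"
            return False
    return True


def ports_in_use(
    ports: dict[str, list[int]] = DEFAULT_PORTS, host: str = "0.0.0.0"
) -> dict[str, list[int]]:
    """Map each component to the subset of its ports that are already taken."""
    busy = {}
    for component, numbers in ports.items():
        taken = [p for p in numbers if not port_is_free(p, host)]
        if taken:
            busy[component] = taken
    return busy
```

Running `print(ports_in_use() or "All default ports are free")` on the target server lists any conflicts by component. A free port is not necessarily open through the firewall; that is a separate check.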

Supported Content Connectors

Out of the box, Digital Hive provides a variety of connectors that enable communication between the Digital Hive server and market-leading business systems.

  

Business Intelligence Platforms

·      IBM Cognos Analytics

·      QlikView

·      IBM Planning Analytics

·      Salesforce CRM

·      Looker

·      SAP Analytics Cloud

·      Microsoft Power BI

·      Microsoft SSRS

·      MicroStrategy

·      Oracle BI (OBIEE)

·      Qlik Sense

·      Qlik Cloud (2024.2)

·      SAP Business Objects

·      Splunk

·      Tableau

·      ThoughtSpot

·      TIBCO Spotfire

·      GoodData (2024.3)

Content Systems

·      Box

·      Microsoft SharePoint

·      File System

·      Microsoft SharePoint Online

·      Google Drive

·      Microsoft OneDrive

For scenarios that require a connector that isn’t available out of the box, organizations can build their own connectors using the Connector SDK and plug the new system into the Digital Hive architecture.