Digital Hive Architecture Overview

Overview

Digital Hive is a lightweight, web-based application that is easy to install, maintain, and manage. Other than the Microsoft C++ redistributable (https://www.microsoft.com/en-ca/download/details.aspx?id=40784), there are no software prerequisites. The Digital Hive installer manages all the required components as part of the installation process.

 

The Digital Hive solution comprises four main components:

·      Tomcat

·      PostgreSQL (Digital Hive content store)

·      Elasticsearch (Digital Hive Search engine)

·      PredictionIO (Digital Hive Machine Learning engine)

 

The minimum system requirements are straightforward:

·      Linux or Windows Operating Systems

·      Minimum 16GB of RAM

·      4 CPUs

·      At least 100GB of available disk space (additional disk space may be required over time as the application grows)
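The minimums above can be checked before running the installer. The following is a minimal sketch rather than part of Digital Hive: the function names are hypothetical, and the RAM figure is passed in by the caller because the Python standard library has no portable way to read installed memory.

```python
import os
import shutil

# Minimums taken from the list above.
MIN_RAM_GB = 16
MIN_CPUS = 4
MIN_FREE_DISK_GB = 100


def check_minimums(ram_gb, cpus, free_disk_gb):
    """Return a list of shortfall messages; an empty list means every minimum is met."""
    problems = []
    if ram_gb < MIN_RAM_GB:
        problems.append(f"RAM: {ram_gb:g} GB found, {MIN_RAM_GB} GB required")
    if cpus < MIN_CPUS:
        problems.append(f"CPU: {cpus} found, {MIN_CPUS} required")
    if free_disk_gb < MIN_FREE_DISK_GB:
        problems.append(f"Disk: {free_disk_gb:g} GB free, {MIN_FREE_DISK_GB} GB required")
    return problems


def detect_cpus_and_disk(install_path="."):
    """Detect the CPU count and the free space (in GiB) on the volume holding install_path."""
    cpus = os.cpu_count() or 0  # cpu_count() can return None on unusual platforms
    free_disk_gb = shutil.disk_usage(install_path).free / 1024**3
    return cpus, free_disk_gb
```

A typical call would be `cpus, disk = detect_cpus_and_disk()` followed by `check_minimums(16, cpus, disk)`, with the RAM value taken from the operating system's own tools.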

 

Digital Hive is designed to satisfy various architectural requirements, as well as scale to meet high volumes of users.

Single Server Installation

A single server (default full installation) is the easiest deployment method. Even with the minimum system requirements, hundreds of users can be supported.

 

 


The single server deployment comprises the Tomcat application, a single PostgreSQL instance, a Machine Learning engine that drives the user recommendations, and two Elasticsearch nodes that handle the indexing and search operations within Digital Hive. Note that no BI or reporting data is ever persisted within the Digital Hive content store, so there are no special or additional data security requirements.

 

All browser-based communication between users/admins and Digital Hive is secured via HTTPS. The installation includes a default self-signed certificate, and it is recommended that this be replaced with a proper domain-based SSL certificate after installation. Communication between Digital Hive and the content systems, whether in the Cloud or on premises, can also be configured to use SSL.

 

From an authentication perspective, Digital Hive can operate in a few different modes. The first is to maintain application-specific user accounts and groups, which are persisted in an encrypted format within the PostgreSQL content store. It is also possible to leverage existing Active Directory users and groups, or to integrate with a SAML provider. Through these authentication modes, both basic authentication (username / password) and Single Sign On (SSO) are possible.

Distributed Installation

The Digital Hive architecture is flexible enough to handle the distribution of components across numerous servers, to achieve different objectives. For redundancy, or high availability, the following installation pattern could be utilized.

 

 


The diagram above demonstrates a fault-tolerant deployment type that consists of two full installations of Digital Hive. Each server has Tomcat, the Machine Learning engine, multiple Elasticsearch nodes, and a PostgreSQL instance. The difference here is that one PostgreSQL instance serves as the primary, with the other instance running in standby mode in case the first instance becomes unresponsive. As in the single server deployment model, no proprietary reporting data is ever persisted within either PostgreSQL instance.

 

This diagram also represents a possible Cloud deployment and includes an additional VPN component in case access is required to an on-premises content system residing behind a firewall. For Cloud deployments, Virtual Machine images or Kubernetes containers could be created to automate scaling activities and reduce management effort.

 

As with the single server installation, SSL communication is enabled. To achieve scalability, however, this type of installation requires the addition of a Load Balancer between the users and the Digital Hive servers.

Cloud Deployment

Digital Hive was built with flexibility in mind. It does not depend on any specific Cloud technologies, so it can be installed in any Cloud hosting solution. The two Cloud hosting platforms used at Digital Hive for development, testing, and internal production workloads are Microsoft Azure and Amazon AWS.

 

The VM instances that are typically used for the installation of Digital Hive are the Azure Standard D4s v3 and AWS t3.xlarge instance types.

 

There are no specific considerations for installing Digital Hive in the Cloud, except that a rule must be established on any firewalls to allow incoming traffic on port 9443 (the default port on which Digital Hive operates).
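Once the firewall rule is in place, a quick TCP check from a client machine can confirm that the port is reachable. This is a minimal sketch using only the Python standard library. The function name is hypothetical, and the host name in the usage note is a placeholder, not a real Digital Hive address.

```python
import socket


def can_connect(host, port=9443, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS lookup failures.
        return False
```

For example, `can_connect("digitalhive.example.com")` returning False would suggest that the firewall rule is missing or that the server is not listening. The check only confirms that the port accepts connections; it does not validate the SSL certificate.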

Required Port Numbers for Installation

The Digital Hive installation requires that certain ports be available for accessing the user interface, as well as for the internal communication that needs to occur between the various Digital Hive components. This is the list of default ports that are utilized:

 

·      Application: 9443, 9080, 9006, 9010

·      Content Store: 5432

·      Search: 9200, 9201, 9400, 9401

·      Message Queue: 61616

·      Machine Learning: 7070, 8000

 

These are the ports required by a default installation. The port numbers can be customized if needed.
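Before installing, it can help to confirm that none of the default ports is already taken on the target server. The sketch below tries to bind each port briefly and reports the ones that fail. It is an illustration only: the helper names are hypothetical, the port numbers come from the list above, and a failed bind can also mean insufficient privileges rather than a conflicting service.

```python
import socket

# Default ports from the list above, grouped by component.
DEFAULT_PORTS = {
    "Application": [9443, 9080, 9006, 9010],
    "Content Store": [5432],
    "Search": [9200, 9201, 9400, 9401],
    "Message Queue": [61616],
    "Machine Learning": [7070, 8000],
}


def is_port_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def busy_ports(ports_by_component=DEFAULT_PORTS, host="127.0.0.1"):
    """Return {component: [ports that could not be bound]}, omitting components with no conflicts."""
    result = {}
    for component, ports in ports_by_component.items():
        taken = [p for p in ports if not is_port_free(p, host)]
        if taken:
            result[component] = taken
    return result
```

Calling `busy_ports()` on the target server returns an empty dictionary when all default ports are available. Any port it reports would need to be freed, or the corresponding Digital Hive port customized.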

Supported Content Connectors

Out of the box, Digital Hive supports a variety of connectors that enable communication between the Digital Hive server and various market-leading business systems.

  

Business Intelligence Platforms

·      IBM Cognos Analytics

·      QlikView

·      IBM Planning Analytics

·      Salesforce CRM

·      Looker

·      SAP Analytics Cloud

·      Microsoft Power BI

·      Microsoft SSRS

·      MicroStrategy

·      Oracle BI (OBIEE)

·      Qlik Sense

·      SAP BusinessObjects

·      Splunk

·      Tableau

·      ThoughtSpot

·      TIBCO Spotfire

 

 

Content Systems

·      Box

·      Microsoft SharePoint

·      File System

·      Microsoft SharePoint Online

·      Google Drive

·      Microsoft OneDrive

 

 

For scenarios that require a connector that isn’t available out of the box, Digital Hive also enables organizations to build their own connectors using the Connector SDK and plug the new system into the Digital Hive architecture.

