Digital Hive is a lightweight, web-based application that is easy to install, maintain, and manage. Other than the Microsoft Visual C++ Redistributable (https://www.microsoft.com/en-ca/download/details.aspx?id=40784), there are no software prerequisites. As part of the installation process, the Digital Hive installer manages all the required components.
The Digital Hive solution comprises four main components:
· Tomcat
· PostgreSQL (Digital Hive content store)
· Elasticsearch (Digital Hive Search engine)
· PredictionIO (Digital Hive Machine Learning engine)
From a hardware perspective, the minimum requirements are straightforward:
· Linux or Windows Operating Systems
· Minimum 16GB of RAM
· 4 CPUs
· At least 100GB of available disk space (additional disk space may be required over time as the application grows)
Digital Hive is designed to satisfy various architectural requirements, as well as scale to meet high volumes of users.
A single server (default full installation) is the easiest deployment method. Even with the minimum system requirements, hundreds of users can be supported.
The single server deployment comprises the Tomcat application, a single PostgreSQL instance, a Machine Learning engine that drives the user recommendations, and two Elasticsearch nodes that handle the indexing and search operations within Digital Hive. Note that no BI or reporting data is ever persisted within the Digital Hive content store, so no special or additional data-security requirements apply.
All browser-based communication between users/admins and Digital Hive is secured via HTTPS. A default self-signed certificate is included as part of the installation, and it is recommended that a proper domain-based SSL certificate be deployed post installation. Communication between Digital Hive and the content systems, whether in the Cloud or on premises, can also be configured to leverage SSL.
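Because Digital Hive runs on Tomcat, swapping the self-signed certificate for a domain certificate would typically be done in Tomcat's standard HTTPS connector configuration. The fragment below is an illustrative sketch only: the keystore path, alias, and password are placeholders, not Digital Hive defaults, and the actual connector definition in a given installation may differ.

```xml
<!-- Illustrative server.xml excerpt (Tomcat 9 syntax): replacing the default
     self-signed certificate with a domain certificate held in a Java keystore.
     Paths, alias, and password below are placeholders. -->
<Connector port="9443" protocol="org.apache.coyote.http11.Http11NioProtocol"
           maxThreads="150" SSLEnabled="true" scheme="https" secure="true">
    <SSLHostConfig>
        <Certificate certificateKeystoreFile="conf/yourdomain.jks"
                     certificateKeystorePassword="changeit"
                     certificateKeyAlias="yourdomain" />
    </SSLHostConfig>
</Connector>
```

Port 9443 matches the default Digital Hive port noted later in this document; if the port has been customized, the connector should reflect that.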
From an authentication perspective, Digital Hive can operate in a few different modes. The first is the ability to maintain application-specific user accounts and groups, persisted in an encrypted format within the PostgreSQL content store. There is also the ability to leverage existing Active Directory users and groups, or to integrate with a SAML provider. Through these authentication modes, both basic authentication (username/password) and Single Sign-On (SSO) are possible.
The Digital Hive architecture is flexible enough to distribute components across numerous servers to achieve different objectives. For redundancy, or high availability, the following installation pattern could be used.
The diagram above demonstrates a fault-tolerant deployment that comprises two full installations of Digital Hive. Each server has Tomcat, the Machine Learning engine, multiple Elasticsearch nodes, as well as a PostgreSQL instance. The difference here is that one PostgreSQL instance serves as the primary, with the other running in standby mode in case the first becomes unresponsive. As in the single server deployment model, no proprietary reporting data is ever persisted within either PostgreSQL instance.
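The primary/standby arrangement described above maps onto PostgreSQL's built-in streaming replication. As a generic illustration only (the exact replication settings Digital Hive uses are not documented here, and the hostname, user, and password below are placeholders), a PostgreSQL 12+ standby carries an empty `standby.signal` file in its data directory and `postgresql.conf` entries such as:

```ini
# Generic PostgreSQL 12+ streaming-replication standby settings
# (illustrative only -- host, user, and password are placeholders)
primary_conninfo = 'host=hive-node1 port=5432 user=replicator password=secret'
hot_standby = on          ; allow read-only queries while in standby mode
```

Promotion of the standby when the primary becomes unresponsive can then be handled manually (`pg_ctl promote`) or by an external failover manager.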
This diagram also represents a possible Cloud deployment and includes an additional VPN component in case access to an on-premises content system, residing behind a firewall, is required. For Cloud deployments, to reduce management effort, Virtual Machine images or Kubernetes containers could be created to automate scaling activities.
As with the single server installation, SSL communication is enabled, but to achieve scalability this type of installation requires the addition of a Load Balancer between the users and the Digital Hive servers.
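The document does not mandate a specific load balancer, so the following is only a sketch of what that tier might look like using nginx as an example; the hostnames and certificate paths are placeholders, and any load balancer capable of distributing HTTPS traffic across the two Digital Hive servers would serve the same purpose.

```nginx
# Illustrative nginx load-balancer fragment (hostnames and certificate
# paths are placeholders -- Digital Hive does not mandate nginx).
upstream digital_hive {
    server hive-node1:9443;   # first full Digital Hive installation
    server hive-node2:9443;   # second full Digital Hive installation
}

server {
    listen 443 ssl;
    ssl_certificate     /etc/nginx/certs/yourdomain.crt;
    ssl_certificate_key /etc/nginx/certs/yourdomain.key;

    location / {
        # forward user traffic to whichever Digital Hive node is available
        proxy_pass https://digital_hive;
    }
}
```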
Digital Hive was built with flexibility in mind: no Cloud-specific technologies were leveraged, so installation can occur in any Cloud hosting solution. The two Cloud hosting platforms used at Digital Hive for development, testing, and internal production workloads are Microsoft Azure and Amazon AWS.
The VM instances that are typically used for the installation of Digital Hive are the Azure Standard D4s v3 and AWS t3.xlarge instance types.
There are no specific considerations for installing Digital Hive in the Cloud, except that a rule must be established on any firewalls to allow incoming traffic on port 9443 (the default port on which Digital Hive operates).
The Digital Hive installation requires that certain ports be available for accessing the user interface, as well as for the internal communication that needs to occur between the various Digital Hive components. This is the list of default ports that are utilized:
· Application: 9443, 9080, 9006, 9010
· Content Store: 5432
· Search: 9200, 9201, 9400 & 9401
· Message Queue: 61616
· Machine Learning: 7070 & 8000
With no configuration changes, these are the default required ports, but the port numbers can be customized.
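Before installation, it can be useful to confirm that none of these default ports are already occupied on the target server. The sketch below is a minimal illustration (not part of the Digital Hive installer) that probes each default port on the local machine; the port list is copied from the table above and should be adjusted if the installation has been customized.

```python
import socket

# Default Digital Hive ports taken from the list above; adjust these
# values if the installation has been customized.
DEFAULT_PORTS = {
    "Application": [9443, 9080, 9006, 9010],
    "Content Store": [5432],
    "Search": [9200, 9201, 9400, 9401],
    "Message Queue": [61616],
    "Machine Learning": [7070, 8000],
}

def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing is currently listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        # connect_ex returns 0 on a successful connection, i.e. port in use
        return s.connect_ex((host, port)) != 0

def check_ports() -> dict:
    """Map each component to the subset of its ports already in use."""
    return {
        component: [p for p in ports if not port_is_free(p)]
        for component, ports in DEFAULT_PORTS.items()
    }

if __name__ == "__main__":
    for component, busy in check_ports().items():
        status = f"ports in use: {busy}" if busy else "all ports free"
        print(f"{component}: {status}")
```

Running this before installation highlights conflicts (for example, an existing PostgreSQL instance already bound to 5432) that would otherwise require the Digital Hive port numbers to be customized.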
Out of the box, Digital Hive supports a variety of connectors that enable communication between the Digital Hive server and various market-leading business systems.
Business Intelligence Platforms
· IBM Cognos Analytics
· IBM Planning Analytics
· Looker
· Microsoft PowerBI
· Microsoft SSRS
· MicroStrategy
· Oracle BI (OBIEE)
· Qlik Sense
· Qlik Cloud (2024.2)
· Qlik View
· Salesforce CRM
· SAP Analytics Cloud
· SAP Business Objects
· Splunk
· Tableau
· ThoughtSpot
· TIBCO Spotfire
· Good Data (2024.3)
Content Systems
· Box
· File System
· Google Drive
· Microsoft OneDrive
· Microsoft SharePoint
· Microsoft SharePoint Online
For scenarios where a required connector isn't available out of the box, Digital Hive also enables organizations to build their own connectors using the Connector SDK and plug the new system into the Digital Hive architecture.
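The Connector SDK's actual API is not documented here, so the following is a purely hypothetical sketch, in Python, of the shape a custom connector might take: every class and method name below is an illustrative invention, intended only to convey the idea of authenticating against a source system and then enumerating and searching its content.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

# Purely hypothetical sketch -- the real Connector SDK API is not shown in
# this document; the class and method names below are illustrative only.

@dataclass
class ContentItem:
    """A single piece of content surfaced to Digital Hive."""
    id: str
    title: str
    url: str

class CustomConnector(ABC):
    """Hypothetical connector surface: connect to the source system,
    then enumerate and search its content."""

    @abstractmethod
    def connect(self, credentials: dict) -> None: ...

    @abstractmethod
    def list_content(self) -> list[ContentItem]: ...

    @abstractmethod
    def search(self, query: str) -> list[ContentItem]: ...

class InMemoryDemoConnector(CustomConnector):
    """Toy implementation backed by a static list, for illustration only."""

    def __init__(self) -> None:
        self._items = [
            ContentItem("1", "Quarterly Sales", "https://example.invalid/1"),
        ]

    def connect(self, credentials: dict) -> None:
        pass  # a real connector would establish a session here

    def list_content(self) -> list[ContentItem]:
        return list(self._items)

    def search(self, query: str) -> list[ContentItem]:
        q = query.lower()
        return [item for item in self._items if q in item.title.lower()]
```

Whatever the real SDK interface looks like, the principle is the same: the organization implements the connector contract once, and the new system then participates in Digital Hive indexing and search like any out-of-the-box connector.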