Open Source Embedded Analytics for SaaS


I have two questions I wanted to ask you.

Are the apps your company develops deployed as software as a service (SaaS)? According to the Cisco Global Cloud Index (GCI) study, SaaS has held the majority share among global cloud services and will reach 75 percent of the total cloud workload by 2021 [1], so it is safe to say that the probability that your company deploys its applications as SaaS is very high.

My second question is: how do you access the data captured in your application? In ‘The Complete Guide to Embedded Analytics’ [2], Logi Analytics describes four models for offering analytics, with Inline Analytics being the most popular choice for embedded analytics. Inline Analytics is implemented as iframes, with the analytics hosted in a separate tab or page.

Analytics integration models for SaaS

Do you want to monetize the data captured in your system? If so, how do you select a BI tool that allows embedded/inline analytics? The selection of an analytics-as-a-service tool depends on various factors, namely the tool's:

(1) customization or ability to align it with the look and feel of your application;

(2) usability or a seamless customer experience between the BI tool and the application itself;

(3) functionality such as auto-refresh, interactivity, etc.;

(4) multitenancy or ability to set up the required level of security;

(5) scalability or ability to scale to large datasets and to serve a growing number of users over time;

(6) the data structure or ability to support existing/future data structure;

(7) performance or impact it will have on the application performance [3].
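One way to apply these seven criteria is a simple weighted scoring matrix. The sketch below is illustrative only: the weights, the 1-5 scores, and the names "Tool A" and "Tool B" are hypothetical placeholders, not results from this review.

```python
# Hypothetical weights for the seven criteria above (they sum to 1.0).
CRITERIA_WEIGHTS = {
    "customization": 0.15,
    "usability": 0.15,
    "functionality": 0.15,
    "multitenancy": 0.20,
    "scalability": 0.15,
    "data_structure": 0.10,
    "performance": 0.10,
}


def weighted_score(scores):
    """Return the weighted sum of per-criterion scores (each on a 1-5 scale)."""
    missing = set(CRITERIA_WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(CRITERIA_WEIGHTS[name] * scores[name] for name in CRITERIA_WEIGHTS)


# Placeholder scores, for illustration only.
candidates = {
    "Tool A": dict.fromkeys(CRITERIA_WEIGHTS, 3),
    "Tool B": {**dict.fromkeys(CRITERIA_WEIGHTS, 3), "multitenancy": 5},
}

# Rank the candidates from the highest weighted score to the lowest.
ranking = sorted(candidates, key=lambda name: weighted_score(candidates[name]), reverse=True)
print(ranking)
```

Adjust the weights to reflect what matters for your application; for a multi-tenant SaaS, security isolation will often deserve the largest share.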

Embedded analytics vendors offer a range of prices, with some tools costing $200 a month or much more, depending in most cases on the number of page renders or the number of data sources connected. Names worth mentioning are Cumul, Zoomdata, Datapine, Datawrapper, Cluvio, Power BI Embedded, Qlik Sense, Holistics, Yurbi, Keen, and others. Another path to pursue is to implement an open-source dashboarding solution. Some vendors offer community editions with limited functionality, e.g. Knowage, Pyramidanalytics, Redash, Metabase, etc. Just keep in mind that although community editions are technically “free”, there is still some cost associated with them, such as storage or development cost. Nevertheless, many companies still opt for an open-source tool because of the cost savings, which are especially relevant for embedded analytics, where paying per user could break the bank.

The objective of the research behind this article was to review and assess the existing open-source tools with respect to their embedding functionality and to provide a guide to starting your self-hosted open-source analytics.

We started by compiling a list of community-edition tools ranked by the number of times they are mentioned across the web, then filtered for tools that help create professional-looking reports and dashboards and meet the selection criteria above. Three options, Redash, Metabase, and Superset, were evaluated in depth.

Redash

Redash (formerly re:dash), a company founded on open-source technologies in 2014 by Arik Fraimovich, grew to offer both enterprise and community editions in order “to democratize data and make data-driven decision making easy”. To embed a dashboard, follow these steps:

1. Launch a pre-configured Amazon Ubuntu instance (minimum EC2 type of t2.small) from the image provided by Redash and add a security group to control the network traffic that can reach your EC2 instance (inbound ports 22, 80, and 443).

2. Log in using the public IP of the EC2 instance and create an initial user.

3. Add a data source, create and publish a query, add a visualization, and add it to a dashboard as a widget.

4. Create a public link and embed it in an iframe:

 <iframe src="publicIPofEC2Instance.com" width="720" height="391"></iframe>
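If the embed code is generated server-side, the tag from step 4 can be built with a small helper. This is a minimal sketch, not part of Redash: the function name `iframe_snippet` and the example URL are hypothetical, and the default size simply mirrors the snippet above. Note that the attribute values need straight quotes; typographic quotes pasted from a web page will break the tag.

```python
from html import escape


def iframe_snippet(public_url, width=720, height=391):
    """Build an <iframe> tag for a public dashboard link, escaping the URL."""
    return (
        f'<iframe src="{escape(public_url, quote=True)}" '
        f'width="{width}" height="{height}"></iframe>'
    )


# Hypothetical public link; replace it with the one created in step 4.
print(iframe_snippet("https://example.com/public-link"))
```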

Tool limitations observed:

· Redash is a SQL-based tool, which means that all aggregations such as sum, average, minimum, maximum, etc. have to be done at the SQL level.

· Visualizations and dashboards are not easily customizable, and some things, for example changing the look and feel of tooltips, changing the dashboard layout by adjusting a widget’s size, or adding a custom color scheme, are not customizable at all.

· Interactivity is very limited; for example, you cannot add drill-downs or filter controls.

· If you have a large dataset, the service slows down.
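To illustrate the first limitation, the aggregation has to live in the query itself so that the visualization receives one ready-made row per group. The sketch below uses an in-memory SQLite database as a stand-in for whatever source Redash would query; the `orders` table and its values are made up for the example.

```python
import sqlite3

# In-memory SQLite stands in for the database Redash would query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("EU", 10.0), ("EU", 30.0), ("US", 5.0)],
)

# The aggregation is done in SQL, so the chart gets one row per region
# instead of every raw order.
AGGREGATE_QUERY = """
    SELECT region,
           SUM(amount) AS total,
           AVG(amount) AS average,
           MIN(amount) AS smallest,
           MAX(amount) AS largest
    FROM orders
    GROUP BY region
    ORDER BY region
"""
rows = conn.execute(AGGREGATE_QUERY).fetchall()
for row in rows:
    print(row)
```

Pre-aggregating this way also reduces the number of rows returned, which may soften the slowdown on large datasets.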

Metabase

Metabase, like Redash, is a company with open-source roots that offers both enterprise and community licenses to allow “for everyone in your company to ask questions and learn from data”. Listen to the interview with Sameer Al-Sakran, the co-founder and CEO of Metabase, here. To embed a dashboard, follow these steps:

1. Launch an Amazon Ubuntu instance (minimum EC2 type of t2.small) and add a security group to control the network traffic that can reach your EC2 instance (inbound ports 3000 and 22).

2. Create a new folder for your project:

mkdir yourProjectName; cd yourProjectName;

3. Install Java 8 on your instance:

sudo apt update; sudo apt install openjdk-8-jre-headless;

4. Download metabase.jar:

wget http://downloads.metabase.com/v0.34.0/metabase.jar;

5. Start a screen session and run Metabase inside it:

screen
java -jar metabase.jar

You should be able to access your Metabase account at publicIPofYourInstance:3000.

6. Visit Admin >> Databases to add a new data source, create a new dashboard, generate a ‘new question’, and add it to the newly created dashboard.

7. Enable public sharing and copy the iframe into your HTML code:

<iframe src="publicUrl" frameborder="0" width="800" height="600" allowtransparency></iframe>
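Step 1 of the Metabase setup opens inbound ports 3000 and 22 in a security group. If you script your infrastructure rather than clicking through the AWS console, the rules can be expressed as data. The sketch below only builds the rule list; the helper name `ingress_rules` is hypothetical, and the commented-out boto3 call assumes boto3 is installed and AWS credentials are configured. The default CIDR opens the ports to everyone, so consider narrowing it, especially for SSH.

```python
def ingress_rules(ports, cidr="0.0.0.0/0"):
    """Return one TCP inbound rule per port, in the shape boto3 expects for IpPermissions."""
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": cidr}],
        }
        for port in ports
    ]


# Ports from step 1: 3000 for the Metabase web UI, 22 for SSH.
rules = ingress_rules([3000, 22])
print(rules)

# Assumption: with boto3 available, the rules could be applied like this.
#   import boto3
#   boto3.client("ec2").authorize_security_group_ingress(
#       GroupId="sg-...",  # placeholder for your security group ID
#       IpPermissions=rules,
#   )
```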

Tool limitations observed:

· If you have a large dataset, the service slows down. In addition, Metabase shows only the first 10,000 rows, so you need to plan how you write your SQL statements.

· Customization is limited; for example, the color palette is predefined.

Superset

Superset (originally named Panoramix and then Caravel) was initially designed by Airbnb and later open-sourced for the community; it remains open-source only. You can hear the history of Superset in the talk by Max Beauchemin, the creator of Apache Superset. To embed visualizations, follow these steps:

1. Launch an Amazon Ubuntu instance (minimum EC2 type of t2.small) and add a security group to control the network traffic that can reach your EC2 instance (inbound ports 8080 and 22).

2. Install dependency packages and relevant Python libraries on your VM:

sudo apt-get update; sudo apt-get install build-essential libssl-dev libffi-dev python3.6-dev python-pip libsasl2-dev libldap2-dev sasl2-bin
sudo apt install python3-pip
pip3 install cchardet==1.0.0 psycopg2-binary sqlalchemy==1.2.18 # to connect a database other than PostgreSQL, add the matching Python database package following https://superset.incubator.apache.org/installation.html
sudo pip3 install superset # to download the superset library
fabmanager create-admin --app superset # to create an admin user
pip3 install apache-superset --upgrade
superset db upgrade

3. Locate the superset/config.py file, e.g. in my case it was at ./.local/lib/python3.6/site-packages/superset/config.py:

find . -name "*config.py"

And modify the lines:

PUBLIC_ROLE_LIKE_GAMMA = True
SESSION_COOKIE_SAMESITE = None # One of [None, 'Lax', 'Strict']

4. Exit the editor and run:

superset init
superset load_examples # to load the sample datasets
screen
fabmanager run --app superset # to launch the web interface

5. You should be able to access your Superset account at publicIPofYourInstance:8080.

6. Add a Database >> Table and share the table with the public role.

7. Visualize the output in the desired way.

8. Extract the code snippet to embed it into your application.
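Step 3 edits the installed config.py in place, which a later package upgrade may overwrite. As an alternative, Superset's documentation describes putting overrides in a superset_config.py module on the PYTHONPATH. The sketch below assumes the same two settings as in step 3; whether both are honored depends on the Superset version you installed, so treat it as a starting point rather than a verified configuration.

```python
# superset_config.py
# Place this file in a directory that is on PYTHONPATH so Superset can import it.

# Grant the Public (anonymous) role the same permissions as the Gamma role,
# so shared charts can be viewed without logging in.
PUBLIC_ROLE_LIKE_GAMMA = True

# Leave the SameSite attribute unset on the session cookie (a Flask setting).
# The intent is to let the cookie be sent when Superset is loaded in an iframe
# on another site; browser defaults vary. One of [None, 'Lax', 'Strict'].
SESSION_COOKIE_SAMESITE = None
```

Keeping overrides in a separate file also makes it easier to see which defaults you changed.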

Tool limitations observed:

  • Installing and maintaining the tool requires knowledge of Unix and Python, e.g. you need to install the required Python package for the database you want to connect to.
  • The community edition is not white-labeled.
  • It does not seem possible to embed a whole dashboard. A workaround is to embed links to the individual charts, each in its own div.
  • There is limited interactivity within and between visualizations.
  • Superset will show only the first 10,000 rows, so you need to plan how you write your SQL statements. However, it might be possible to change the number of rows retrieved by adjusting the superset/config.py file.

Remember that the main objective of embedding dashboards into your SaaS is to bring data closer to end users, into their day-to-day workflows and decision-making processes. As long as you deliver that, it doesn’t matter whether you buy a commercial BI tool or use an open-source solution.

References:

[1] Cisco Global Cloud Index: Forecast and Methodology, 2016–2021

[2] The Complete Guide to Embedded Analytics (2014) by Logi Analytics. Accessed 1/06/2019

[3] Embedding Analytics in Modern Applications. How to Provide Distraction-Free Insights to End Users (2016) by Courtney Webster


Eka Ponkratova (@thatdatabackpacker)

Written by Eka Ponkratova (@thatdatabackpacker)

I’m a data consultant, interacting closely with you to get data to work for you www.linkedin.com/in/eponkratova
