Open Source for Analytics

As you may know by now, I am a big fan of open source tools and software that can be used by everyone. This approach is quite remarkable in a society that promotes capitalism: you create something that you think is worth sharing, and you give it away for free when you could charge for it.

Let’s be honest, the reason is not always generosity. In this post I will look at the current state of open source for analytics, as I see it, and at the good and bad sides of this way of working.

What is open source

First, I will explain what open source is exactly, because it is easily confused with free software. The two have the same origin but they are quite different.
I can only recommend reading this article: https://dzone.com/articles/free-software-vs-open-source-vs-freeware-whats-the but I will give a quick summary below.

Free software

This type of software allows you to do anything you want with it, even improving it and profiting from it. But it doesn’t mean that the software itself is free of charge.
You may need to pay to obtain the software, but once that is done, you have total freedom over what you do with it.

There are 4 pillars for software to be considered free:

  1. The freedom to use the software for any use case without any restrictions. A 30-day free trial doesn’t make software free.
  2. The freedom to study how the software works and to customise it legally.
  3. The freedom to freely redistribute the software to assist someone in need. Important note: you can make a profit on the redistribution.
  4. The freedom to enhance the performance of the software and release your enhancements for the whole community to benefit, programmers and non-programmers alike. You can do this at a cost or at no cost.

As you have understood, free software may cost something to acquire, but once you have it, there is nothing stopping you from doing whatever you want with it.

Good to know: the MIT License and the GNU General Public License v2 are free-software licences. So when you see those licences, you now know what you can do with the software.
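If you work in Python, one quick way to see which licence an installed package declares is to read its packaging metadata. The snippet below is a minimal sketch using the standard library's importlib.metadata; the helper names license_classifiers and declared_license are my own, and "pip" is only an example package name. Keep in mind that this metadata is self-declared by the package author and may be missing or incomplete, so treat it as a hint and not as legal advice.

```python
from importlib import metadata


def license_classifiers(classifiers):
    """Keep only the licence-related classifiers from a list of classifier strings."""
    prefix = "License ::"
    return [c for c in classifiers if c.startswith(prefix)]


def declared_license(package_name):
    """Return (License field, licence classifiers) declared by an installed package.

    Raises importlib.metadata.PackageNotFoundError if the package is not installed.
    """
    meta = metadata.metadata(package_name)
    # get_all returns None when the package declares no classifiers at all
    classifiers = meta.get_all("Classifier") or []
    return meta.get("License"), license_classifiers(classifiers)


if __name__ == "__main__":
    # "pip" is only an example; use any package installed in your environment.
    try:
        print(declared_license("pip"))
    except metadata.PackageNotFoundError:
        print("package not installed")
```

The filtering is kept in its own small function so it can be checked without depending on which packages happen to be installed.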

Open Source

The Open Source Initiative, the organisation that maintains the Open Source Definition, defines open-source software through ten criteria, which include the following:

  1. Free redistribution of the software.
  2. The source code should be publicly available.
  3. The software can be modified and distributed in a different format from the original software.
  4. The licence should not discriminate against persons or groups.
  5. The licence should not restrict the usage of other software.

As you can see, there are some similarities between these 2 definitions but they are not the same.

The idea behind the Free Software Foundation was more to provide a base layer of software that can be built upon and redistributed later on, typically in a low-level environment. The Open Source Initiative is more oriented towards collaboration in order to improve the same software.

With the first, you can say that children will be born and have their own lives (see the Linux distributions); with the second, the goal is to modify and improve the current project.

Current State

For web analytics, this has some implications. There are 2 types of tools available now: the open source ones and the enterprise ones.
Through my web analytics journey, some broad interests, the value I believe they bring and a bit of luck, I now work mostly with open source solutions or free tools, but not exclusively: Adobe Analytics is not open source.

You may think that I am talking about Python and JavaScript as examples of open source, but it can also be stated that Adobe Launch is now open source.
Overall, there is a trend among analytical tools to go open source, or to let open source languages be used with them.

Not so long ago, most analytical work was done with SAS or SPSS. However, in a world where more and more smart people are being educated, developing your software or language without help from the community is very hard.

It requires a lot of resources, and you may miss some quick wins or feedback that the community can give directly on the product.
In my opinion, these companies need a drastic shift in their business model and in how they approach the market.
Free solutions are really taking a big part of the market, and developing a proper response is quite hard.
Here are the different possibilities that I can see:

  • Increase your tool's functionalities
    • To do that, you need developers and investment
    • It can make the price rise and create a gap that free software can fill for small companies
  • Improve your tool's existing possibilities
    • By that I mean no new possibilities, but improving the current ones based on customer feedback.
    • By doing that, you may not improve the tool enough to justify its price.
  • Specialize for a specific audience
    • I feel that this is maybe the best strategy in the long run.
      It makes your tool more or less the standard tool for this specific field, but it may only delay the inevitable.

As I said earlier, the proprietary tools in use today are not yet threatened by the number of free and open source projects being developed. I do believe that there is still a lot of room to grow for tools such as SAS (vs Python or R), Tableau (vs PowerBI, R ggplot or Python viz), SPSS (vs Python or R) and Google Tag Manager (vs Launch). But as time passes, and as users are more and more trained in versatile programming languages, I also believe that the price of these tools will be less and less justified.

Adding to that, some companies have a hard time paying for the licences.
Not only will it get harder to justify the price, but even now, when the price can still be justified quite easily, it is quite hard to get companies to pay it.

However, it is not all good news for open-source software, and it rarely equals the full capacity of what a licensed tool can offer.
Here are some disadvantages of open-source tools, or advantages of licensed ones:

  • There is no real support.
    Even with big communities like Python or R, it can take some time to debug something very specific.
    When your whole company's analytics depends on something and it breaks… community help may not cut it.
    To be honest, this can also happen with paid licensed tools.
  • Open source software is usually a lot more techy to use than licensed software.
    Licence fees don’t automatically go to profit; they help the tool improve, and one thing that is really hard and often less of a focus in open source communities is UX design.
    Licensed products have the luxury of dedicating a proper team to good UX, and good UX can change a lot of things.
  • Licensed products often come with free by-products and support.
    With open source, it may be that the core is free but additional patches or functionality are paid.
    A bad example, but one that everyone will understand: Google Analytics is not open source but it is free. However, if you want unsampled data, you need to pay for it and it isn’t cheap.
    Imagine you have built your whole analytics history with Google and you now want unsampled data analysis… then you are kind of stuck.
    When you start to spend money, you usually consider what else you can get for the same price, and sometimes the paid option of a free product does not stand comparison with a paid-only solution.
  • Licensed products often provide official support. With open source tools, you rely on experts. The difference is slim, but it can matter a lot if you face big challenges in a very political company.
    Experts are, for better or worse, considered smart people, but support shows you the way that uses the tool to its best capacity, not the way that pleases a department.
    I sometimes need to make my clients consider the implications of their decisions for other functionalities of the product. What is best for Analytics may not be recommended with Audience Manager, or at least requires some caution.
  • There is someone responsible if things go wrong (and that someone is outside the company).
    This is an internal safeguard for companies, so that no one (in the company) is badly impacted if a tool needs to be ditched at some point.
    Reality is always a bit more nuanced, but you get the idea.

In conclusion, it feels that there are different answers to the question: “Should I use Open Source for my analytics?”
The answers would be:

  • Yes – if you are a small to medium enterprise and don’t have lots of money.
    With a small budget, you may end up with paid tools that are not very competitive and that open-source software can replace.
    If you get a licensed tool, make sure you keep enough money to hire an actual person or team for that tool. The tool won’t give you the answers you want; the person / team will.
  • No – if you are a large enterprise. The solutions provided by licensed software are pretty good and are still worth their money. You also crave support, because you will encounter issues that are very specific to your company and you need guidance.
    Also, politically, it is easier to justify spending money and time on a renowned solution.
  • Yes – if you are a very, very large enterprise. I believe that there is a critical mass beyond which it makes more sense to switch to your own or to open source software. You need a large set of skills in your team and plenty of money, because you will need teams to take care of lots of things that were hidden by the solution you used.
    This is a very rare case and only a few companies actually reach that state (Amazon and Google, for example). Some companies believe that they are in this state because they are large, but they miss the most important parts: the team skills and the experience of dealing with data at this scale.
    Internalization is the path, but do not go too fast too soon, or you will just create a big mess and no one will trust your data anymore.

Note:

As you may have noticed, I am using the GNU General Public License v3.0 for my Python wrapper. It is the successor to the GNU General Public License v2 discussed above.
Some explanations of the differences are published here. Overall, I read that it increased compatibility with the Apache License v2 and that it still keeps me credited as the original contributor if the code gets redistributed.
