
Analytics Tools: How to select a Data Storage platform

This is part 3 of a 5-part series on Selecting Analytics Tools.

Next step in the value chain: Storage tools.

Data Storage link of the analytics value chain

Not long ago, you had to make a choice when selecting a storage platform: store huge amounts of data, or be able to use SQL [1] to query and manipulate your data in real time? Luckily, those days are behind us. Many storage platforms now hold nearly limitless amounts of data while offering SQL access and query responses in seconds.

Before considering tools, first answer these questions:

  • What questions do you need to answer? Where and how you store your data determines the questions you can answer with it in the future. The better you understand those questions, the better your choices will be.
  • How much data will you store? As with collection tools, storage tools get more expensive as your data grows. Estimating your volume means knowing not just everything you track, but also how long you will keep it. At 10 GB per day, a single year adds up to 3.65 TB, which is a lot of data.
  • How fast do you need answers? Some solutions answer queries in minutes, others in milliseconds. Decide on this up front, because once you choose a storage platform it is unlikely you can change your mind.
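
To make the volume question concrete, here is a minimal sketch of the arithmetic in Python. The function name and its parameters are made up for illustration, and it assumes decimal units (1 TB = 1,000 GB) and a constant daily volume, which real workloads rarely have.

```python
def estimate_storage_tb(gb_per_day: float, retention_days: int) -> float:
    """Estimate total stored volume in decimal terabytes.

    Assumes a constant daily volume and 1 TB = 1000 GB; both are
    simplifications chosen for this illustration.
    """
    return gb_per_day * retention_days / 1000


# The example from the text: 10 GB per day kept for one year.
one_year_tb = estimate_storage_tb(10, 365)
print(f"{one_year_tb:.2f} TB")  # 3.65 TB
```

Running the same function with your own daily volume and retention period gives a rough number to bring to vendor pricing pages.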

The more data you store in a data storage solution the harder it will be to switch vendors later, so choose carefully. Here are the criteria to use:

Great tools will be…

  • High capacity. There should be no drop in performance or quality as your data size scales up, so read the documentation closely for any stated limits. You should never be forced to throw data away.
  • Data format agnostic. You should be able to store your data in whatever format you prefer using the schema you prefer. Modern technology enables high performance in many scenarios, so you don’t have to compromise by contorting your data in ways that are not helpful.
  • SQL compliant. I mentioned this before, but as a reminder: use something that provides SQL access. It gives you much greater flexibility in the future and makes many more visualization tools available.
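
As a rough illustration of why SQL access matters, the sketch below runs a plain SQL aggregate from Python. It uses the standard library's sqlite3 module purely as a stand-in for whatever platform you choose; the events table, its columns, and the sample rows are all hypothetical.

```python
import sqlite3


def daily_event_counts(conn: sqlite3.Connection) -> list:
    """Count events per day with ordinary SQL (hypothetical 'events' table)."""
    query = (
        "SELECT event_date, COUNT(*) "
        "FROM events "
        "GROUP BY event_date "
        "ORDER BY event_date"
    )
    return conn.execute(query).fetchall()


# In-memory database with made-up sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_date TEXT, user_id INTEGER, action TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("2024-01-01", 1, "login"),
        ("2024-01-01", 2, "purchase"),
        ("2024-01-02", 1, "login"),
    ],
)

print(daily_event_counts(conn))  # [('2024-01-01', 2), ('2024-01-02', 1)]
```

Because the query is ordinary SQL, the same question can usually be asked of another SQL-accessible store with little or no rewriting, which is the flexibility this criterion is about.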

Beware of…

  • Proprietary interfaces. Some data storage systems promise high performance, but at the cost of lock-in. They will try to force you to use their data visualization tools or data formats, which will only raise your costs and reduce your flexibility later.
  • Security holes. You’d be surprised how many tools don’t provide the security and access control features necessary to keep your data secure. Make sure that your needs are met and that the vendor has not set up any backdoors.
  • Disasters. Your vendor’s disaster recovery plan is critical to keeping your data available. If a single data center failure can take your data offline for days, you have a serious problem. Ensure the vendor has geographic distribution (multiple locations) and is not dependent on another provider for service. Ask how often they test their disaster recovery plan!

The great challenge with storage tools is that once you have a significant amount of data in them, the switching cost is very high. You can change your data collection and visualization tools frequently, but the effort (and cost) of moving huge amounts of data makes it prohibitively expensive to change your storage platform. Choose wisely.

Tomorrow we’ll continue moving down the chain when we cover Visualization tools.

[1] Structured Query Language, a standard way to query data stored in relational databases. Interestingly, the structure of SQL was defined in the 1970s and remains largely the same today, making it one of the oldest computer languages in mainstream use.

Quote of the Day: “The very first version was Oracle version 2,” he said. “We knew no one would want to buy version 1.” – Larry Ellison, on Oracle v2, the first commercial SQL product
