You Might Have Small Data (And That’s Okay)

The following is adapted from Jesse Anderson’s blog. Jesse Anderson is a data engineer and director of the Big Data Institute.

There is a common beginner question for engineers starting out with Big Data. An engineer will post to social media saying “I need to know which Big Data technology to use. I have 3 billion rows in 10,000 files. The whole dataset is 100 GB. Is Big Data Technology X efficient for processing this?”

The short answer is no. The long answer is “more than likely no,” and only a qualified data engineer can tell you for sure.

The issue starts with a misunderstanding of what Big Data is and isn’t. The original poster is assuming that small data technologies can’t do something for them. After all, 3 billion rows sounds like a lot. It isn’t.

If you think about it, you can easily provision a VM with 256 GB of RAM. For a dataset of 100 GB, the entire dataset could fit in memory. There are some nuances like how much this dataset will grow and the complexity of the processing, but this probably isn’t a Big Data problem.
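To make that back-of-the-envelope reasoning concrete, here is a minimal Python sketch using the numbers from the question above. The `fits_in_memory` helper and its `overhead_factor` of 2.0 are illustrative assumptions, not a rule: real in-memory overhead depends heavily on the file format, the data types, and the tool doing the processing.

```python
def fits_in_memory(dataset_gb: float, ram_gb: float, overhead_factor: float = 2.0) -> bool:
    """Rough check: does the dataset fit in RAM once inflated by an assumed overhead?

    overhead_factor is a hypothetical multiplier for in-memory expansion
    (decompression, object overhead, working copies). Adjust it for your tools.
    """
    return dataset_gb * overhead_factor <= ram_gb


# Numbers taken from the question in the post.
rows = 3_000_000_000
files = 10_000
dataset_gb = 100
ram_gb = 256

# Average size of one row and of one file (using 1 GB = 1e9 bytes = 1000 MB).
bytes_per_row = dataset_gb * 1e9 / rows
mb_per_file = dataset_gb * 1000 / files

print(f"Average row size:  {bytes_per_row:.1f} bytes")
print(f"Average file size: {mb_per_file:.1f} MB")
print(f"Fits in {ram_gb} GB of RAM: {fits_in_memory(dataset_gb, ram_gb)}")
```

Seen this way, 3 billion rows works out to a few dozen bytes per row and about 10 MB per file, which is why the row count sounds bigger than it is. If the assumed overhead were 3x rather than 2x, the same check would fail on a 256 GB machine, which is one of the nuances mentioned above.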

On threads with answers to these questions, there is often another person who responds that the use case doesn’t need Big Data. Sometimes, the original poster will get insulted or think that people are belittling their use case. They aren’t.

The responders are pointing out that the use case would be much better served by a “small” data technology, such as a cloud data warehouse, as its data store. Using a technology with a relational structure instead of a Big Data technology like the Hadoop ecosystem has these major benefits:

Less conceptual complexity
More prevalent in the marketplace
More people know the technology
Easier operationally
Faster queries
Cheaper in operations, technology, and staffing
Shorter development cycles

When someone is telling you that your use case is small data, they aren’t belittling you or your use case. They’re saving you time, money, and effort.

If you do have a Big Data problem, a specific limitation of a small data technology is holding you back. You are saying “can’t” because you are hitting a known technical limitation. For example:

You’re a manager and you ask for a new feature or a report and the technical person says they can’t due to a technical limitation.
You’re a developer and you can’t add new features because the database or data warehouse will fall over and die.
You’re an analyst and you can’t do your report because it would take too long or process too much data.
You’re a Data Warehouse Engineer and you still can’t do the most intensive queries because they take too much time and resources to run.

These problems often accompany a scale of hundreds of billions of rows or petabytes of data. For these problems, you will need highly trained data engineers.

I’ve seen companies succeed with Big Data in the following ways:

Allowing enough time to have a sane project plan
Having realistic expectations for what Big Data would do for the company
Spending the money on excellent training
Getting the team the mentoring and help they need
Realizing Big Data is a complex animal

And I’ve seen companies fail in the following ways:

Thinking Big Data is the silver bullet that will save the company from itself
Rushing through the process and not giving the team the time and resources to succeed
Thinking the team can just read some books or watch some YouTube videos to learn Big Data
Cheaping out on training and help for the team
Having a team without the right skills

Remember that even if your organization does have Big Data use cases, not every data-related use case within your organization is a Big Data one. You can simultaneously have small data and Big Data use cases coexisting within the same organization, and the two should be approached somewhat differently. Don’t hit a fly with a sledgehammer – using Big Data technologies for small data will bring a high expense with little reward.

If you’re running a business that needs help with your Big Data strategy, you can read about my mentoring service.
