Data-First Enterprises

While AI and Large Language Models (LLMs) represent groundbreaking advancements for humanity, if we zoom in a little, these technologies form only one part of the broader digital transformation journey for the majority of enterprises.

Yep, this is the reality, and it is especially relevant in the EU, where digital budgets are notably lower than in North America, averaging $140 million compared to North America's $254 million.

Despite challenging economic conditions, I anticipate a significant increase in EU digital budgets towards the tail-end of 2024. A key focus for executives and technology leaders will be transforming their organizations into "data-first enterprises".

For us, a "data-first organization" is one where no one has to make important business decisions on gut feel alone. Instead, data and analytics combined provide a solid foundation for business decision making and success.

However, this transformation won't occur overnight. It will involve a gradual overhaul of enterprise applications and tools, shifting them towards data-centricity. Some applications might not make the transition at all and will sadly be left behind.

With these changes underway, executives need to consider, pretty quickly, where to focus their efforts and resources. But where to start?

Data Pipeline Orchestration:


The first step for many decision makers will be to think carefully about becoming cloud-first enterprises, as being cloud-first is an important component of a modern data pipeline: it allows for scalable infrastructure and the quick setup of new pipelines without continuous IT overhead.

Today's data pipelines must navigate complex requirements like integrating multiple data sources, managing large volumes of data, and providing near real-time data delivery. Achieving this is a formidable task, especially when enterprises lack the specialized skillsets needed to weave together data from intricate business environments.

The shift from traditional software solutions to cloud-based models began about five years ago. For enterprises to grow, they must recognize and pursue opportunities in the cloud, and understand that our societies and customers are moving towards a cloud-centric, data-first world.

For sure, a few years ago, on-premises SQL servers with minimal data needs and usage were the norm. Now, data engineers and BI analysts are tasked with creating far more sophisticated data pipelines. This complexity arises from the need to integrate data from many sources, with most enterprises using anywhere from 40 to 400 SaaS applications across various workflows.

To address these challenges, supporting multiple data storage layers, such as cloud-based data lakes and warehouses, to meet various data pipeline use cases will be essential for lift-off. Yet, first and foremost, decision makers must identify key data lifecycles across their business landscapes and think carefully about how these can be accommodated, or even automated, in the cloud.
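
To make the storage-layer idea concrete, here is a minimal sketch in Python, with local Parquet files and SQLite standing in for a cloud data lake and warehouse; every table and field name is a hypothetical example, not a prescription:

```python
# A minimal sketch of one load step feeding two storage layers at once:
# a columnar "lake" layer (local Parquet files) and a queryable "warehouse"
# layer (SQLite standing in for a cloud warehouse). All names are hypothetical.
import sqlite3
from pathlib import Path

import pandas as pd

def load_to_lake_and_warehouse(records: list[dict], table: str) -> None:
    df = pd.DataFrame(records)

    # Lake layer: cheap, append-friendly columnar files, suited to ML and archival.
    Path("lake").mkdir(exist_ok=True)
    df.to_parquet(f"lake/{table}.parquet", index=False)

    # Warehouse layer: structured tables, suited to BI dashboards and SQL.
    with sqlite3.connect("warehouse.db") as conn:
        df.to_sql(table, conn, if_exists="append", index=False)

if __name__ == "__main__":
    orders = [
        {"order_id": 1, "customer": "acme", "total": 120.50},
        {"order_id": 2, "customer": "globex", "total": 89.99},
    ]
    load_to_lake_and_warehouse(orders, "orders")
```

The design point is that the same extract lands in both layers in one step, so analytics and BI consumers never diverge on what the "source of truth" is.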

Thankfully, this job does not have to be completed alone. Collaborating with various business units and understanding customer needs will be essential, not only to store data in lakes or warehouses but also to figure out how to surface that data effectively into internal business applications - something we commonly call "data reusability".


Data Engineering Orchestration:


Integrating data pipelines is a complex task, and the next challenge for technology decision makers will be to identify common use cases and business scenarios that will benefit from data-centric applications, such as business analytics, AI/ML, user-facing analytics, and even scientific research.

With enterprises often employing dozens or hundreds of SaaS tools, whether to interact with customers or to communicate between business units and supply chains, the key initial step will be to devise strategies for sharing data securely and efficiently across these systems.

Fortunately, many SaaS tools used by enterprises, such as MS Teams, Notion, or Slack, offer no-code interfaces and managed API integrations. These APIs provide accessible points for reading and writing data between the SaaS systems and an enterprise's core application portfolio. However, there will be times when gaps in the data appear, or external API connections drop over the wire.

So, the ability to rapidly construct data pipelines without direct access to every data source at any one time will become essential for future data-first success. This data engineering capability ensures that gaps in data availability do not hinder the effectiveness of the end-to-end pipeline.
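
What might that look like in code? Below is a minimal Python sketch (the endpoint URL and cache path are hypothetical placeholders): the fetch retries with exponential backoff and, if the source stays unreachable, falls back to the last cached extract, so a dropped connection does not stall the whole pipeline.

```python
# A sketch of a fetch step that tolerates a flaky upstream API: retry with
# exponential backoff, then fall back to the last cached extract so the rest
# of the pipeline keeps moving.
import json
import time
import urllib.request
from pathlib import Path

CACHE = Path("cache/customers.json")  # hypothetical cache location

def fetch_with_fallback(url: str, retries: int = 3) -> list[dict]:
    delay = 1.0
    for _ in range(retries):
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                data = json.loads(resp.read())
            CACHE.parent.mkdir(exist_ok=True)
            CACHE.write_text(json.dumps(data))  # refresh the cache on success
            return data
        except OSError:
            time.sleep(delay)  # back off before retrying
            delay *= 2
    # Source unreachable: reuse stale-but-usable data rather than halting.
    if CACHE.exists():
        return json.loads(CACHE.read_text())
    raise RuntimeError(f"{url} is unavailable and no cached copy exists")
```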

Gradually, we foresee a growing need for bespoke data mechanisms across enterprise workflows that facilitate the quick integration of SaaS tools with other business applications, primarily through low-code.

Essentially, this involves creating tailored solutions that function independently of centralized resources while maintaining the capacity to interact and integrate data across various business applications. It's a big challenge to tackle, and again, not one that can be done alone, but it must be faced in order to become a data-first enterprise.


No-code or Low-code:

Already I have alluded to those dreaded words that many an engineer fears - low-code and no-code. But we believe low-code and no-code platforms will continue to mature over 2024 and become an important part of the enterprise toolbox in the near future.

While certain elements of the data pipeline can seamlessly transition to a no-code development experience, others will still require custom code. Bridging these two worlds presents a significant challenge for executives and technology decision makers, and one that cannot be ignored.

Many employees lack the time or incentive to master data ETL processing, or to go and learn Rust or Zig, which places extra burdens on developers who act as human middleware and face increased risk and resource demands. So, we think that in urgent situations, no-code and low-code tools will become a practical, time-saving solution for enterprises.

In fact, low-code and no-code tooling is particularly suitable for BI analysts familiar with Python, or regular users of platforms like Tableau, who need support only for highly specialized system integrations.

Adopting a hybrid approach, which combines the simplicity of no-code and low-code solutions with the flexibility of custom code (think platforms like synera.io or n8n.io) for deeper and more complex systems integration, will be essential for modernizing enterprises and transitioning towards a data-first model.
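
As an illustration of where the custom-code half of that hybrid earns its keep, here is a small Python sketch of the kind of transform step a low-code platform might hand off to; the field names and cleanup rules are hypothetical:

```python
# A sketch of the kind of small custom-code step a hybrid pipeline might
# embed between two no-code nodes: the platform hands in raw items, the
# function applies logic too fiddly for drag-and-drop, and returns cleaned
# items for the next node.
def transform(items: list[dict]) -> list[dict]:
    cleaned = []
    for item in items:
        # Normalize inconsistent SaaS exports: trim, lowercase, validate.
        email = item.get("email", "").strip().lower()
        if "@" not in email:
            continue  # drop records the downstream CRM would reject
        name = f"{item.get('first', '').strip()} {item.get('last', '').strip()}"
        cleaned.append({"email": email, "full_name": name.strip()})
    # Deduplicate by email, keeping the first occurrence.
    seen: set[str] = set()
    return [i for i in cleaned if not (i["email"] in seen or seen.add(i["email"]))]

if __name__ == "__main__":
    raw = [
        {"first": "Ada", "last": "Lovelace", "email": " ADA@example.com "},
        {"first": "Ada", "last": "L.", "email": "ada@example.com"},
        {"first": "", "last": "", "email": "not-an-email"},
    ]
    print(transform(raw))  # one cleaned, deduplicated record
```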



Data Usage vs Costs:

Transitioning to a data-first enterprise necessitates the management of well-ordered, data-driven systems. In parallel to building data pipelines and enabling business units with data analytical tooling and applications, another challenge will be addressing anomalies through data monitoring.

These tools scrutinize data pipelines for inconsistencies or contradictions - violations of what is known as "data consistency" or "linearizability" in systems parlance. Such problems are particularly challenging in microservices-based projects and can lead to significant cost inefficiencies if not managed properly.
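
To ground the idea, here is a minimal Python sketch of such a consistency probe; SQLite stands in for the source and destination stores, and all table and column names are hypothetical:

```python
# A minimal sketch of a consistency probe between a pipeline's source and
# destination stores: compare row counts plus an order-insensitive checksum
# of a key column, and flag any drift.
import hashlib
import sqlite3

def table_fingerprint(db_path: str, table: str, key: str) -> tuple[int, str]:
    with sqlite3.connect(db_path) as conn:
        rows = conn.execute(f"SELECT {key} FROM {table}").fetchall()
    digest = hashlib.sha256()
    for value in sorted(str(r[0]) for r in rows):
        digest.update(value.encode())
    return len(rows), digest.hexdigest()

def check_consistency(src_db: str, dst_db: str, table: str, key: str) -> bool:
    src = table_fingerprint(src_db, table, key)
    dst = table_fingerprint(dst_db, table, key)
    if src != dst:
        # In a real pipeline this would page an alerting system, not print.
        print(f"DRIFT in {table}: source={src}, destination={dst}")
        return False
    return True
```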

Without complete visibility on the ground, and without understanding how data pipelines are actively being used, unforeseen cost inefficiencies will soon bite back.

No doubt, data-usage-based pricing models offered through managed services can optimize costs and enhance value. However, maintaining observability across data pipelines, including visibility into the operation and cost implications of data usage, is critical.

For example, data engineers working with data inside Google Sheets might need to lift and shift that data into another tool used by the marketing and sales team - say, Notion. Such a data engineer might not know the extent of data growth or updates by month's end, nor how often the data will require updating, or when. This uncertainty underscores the need for additional tooling that can provide upfront estimates of data usage costs.
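
A back-of-the-envelope estimator is often enough to start. The Python sketch below projects the monthly call volume and cost of such a sync; every figure in it, including the unit price, is a hypothetical assumption, not a real rate card:

```python
# A back-of-the-envelope sketch of the upfront estimate described above:
# given assumed row growth and sync frequency, project the monthly API call
# volume and cost of a Sheets-to-Notion style sync.
import math

def estimate_monthly_sync_cost(
    new_rows_per_day: int,
    syncs_per_day: int,
    rows_per_call: int = 100,          # rows fetched per API request
    cost_per_1k_calls: float = 0.40,   # hypothetical unit price
    days: int = 30,
) -> float:
    # Each sync pages through the day's changed rows, one API call per page.
    calls_per_sync = math.ceil(new_rows_per_day / rows_per_call)
    total_calls = calls_per_sync * syncs_per_day * days
    return total_calls / 1000 * cost_per_1k_calls

if __name__ == "__main__":
    # e.g. 250 new rows per day, synced 4 times a day
    print(f"Estimated monthly cost: ${estimate_monthly_sync_cost(250, 4):.2f}")
```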

Therefore, a deep understanding of data pricing and management, along with addressing observability and data consistency challenges, will also be vital for many looking to become data-first.

In fact, this will be one of the final steps for executives and technology decision makers aiming to pivot their organizations towards being data-first. It's not just about integrating data into warehouses and central repositories and then integrating it across workflows, but also about gaining visibility into how that data is actively being used and the costs that come along with such usage.