10 Hot Hadoop Startups to Watch in 2014
It’s no secret that data volumes are growing exponentially. What’s a bit more mysterious is figuring out how to unlock the value of all of that data. A big part of the problem is that traditional databases weren’t designed for big data-scale volumes, nor were they designed to incorporate different types of data (structured and unstructured) from different apps.
Lately, Apache Hadoop, an open-source framework that enables the processing of large data sets in a distributed environment, has become almost synonymous with big data. With Hadoop, end users can run applications on systems composed of thousands of nodes that pull in thousands of terabytes of data.
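To make "processing large data sets in a distributed environment" concrete, here is a minimal single-machine sketch of the MapReduce pattern that Hadoop popularized, using word counting as the example. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are illustrative assumptions, not Hadoop APIs; a real Hadoop job would run the map and reduce steps in parallel across many nodes.

```python
from collections import defaultdict


def map_phase(record):
    """Emit a (word, 1) pair for every word in one input line."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values emitted for one key into a single count."""
    return key, sum(values)


def word_count(records):
    """Run the three phases in sequence on one machine (illustration only)."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Hadoop data"]))
```

Because each map call looks at one record and each reduce call looks at one key, the work can be split across machines without the steps needing to coordinate, which is what lets the approach scale to very large clusters.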
According to Gartner estimates, the current Hadoop ecosystem market is worth roughly $77 million. The research firm expects that figure to balloon to $813 million by 2016.
Here are 10 startups hoping to grab a piece of that nearly $1 billion pie. These startups were chosen and ranked based on a combination of funding, named customers, competitive positioning, the track record of their executives, and their ability to articulate a real-world problem and explain why their solution is well suited to solving it.
1. Platfora
What They Do: Provide a big data analytics solution that transforms raw data in Hadoop into interactive, in-memory business intelligence.
Platfora tries to simplify the data collection and analysis process, automatically transforming raw data in Hadoop into interactive, in-memory business intelligence, with no ETL or data warehousing required. Platfora provides an exploratory BI and analytics platform designed for business analysts. Platfora gives business analysts visual, self-service analytical tools that help them navigate from events, actions, and behaviors to business facts.
2. Alpine Data Labs
What They Do: Provide a Hadoop-based data analysis platform.
Alpine Data provides a visual drag-and-drop approach that allows data analysts (or any designated user) throughout an organization to work with large data sets, develop and refine models, and collaborate at scale without having to code. Data is analyzed in the live environment, without migrating or sampling, via a Web app that can be locally hosted.
3. Altiscale
What They Do: Provide Hadoop-as-a-Service (HaaS).
Hadoop has become almost synonymous with Big Data, yet the number of Hadoop experts available in the wild cannot hope to keep up with demand. Thus, the market for HaaS should rise in step with big data. In fact, according to TechNavio, the HaaS market will top $19 billion by 2016.
Altiscale’s service is intended to abstract the complexity of Hadoop. Altiscale’s engineers set up, run, and manage Hadoop environments for their customers, allowing customers to focus on their data and applications. When customers’ needs change, services are scaled to fit — one of the core advantages of a cloud-based service.
4. Trifacta
What They Do: Provide a platform that enables users to transform raw, complex data into clean and structured formats for analysis.
Trifacta is backed by $16.3 million in funding raised in two rounds from Accel Partners, XSeed Capital, Data Collective, Greylock Partners, and individual investors.
Why They’re on This List: According to Trifacta, there is a bottleneck in the data chain between the technology platforms for Big Data and the tools used to analyze data. Business analysts, data scientists, and IT programmers spend an inordinate amount of time transforming data. Data scientists, for example, spend as much as 60 to 80 percent of their time transforming data. At the same time, business data analysts don’t have the technical ability to work with new data sets on their own.
5. Splice Machine
What They Do: Provide a Hadoop-based, SQL-compliant database designed for big data applications.
Splice Machine provides all the benefits of NoSQL databases, such as auto-sharding, scalability, fault tolerance, and high availability, while retaining SQL, which is still the industry standard. Splice Machine optimizes complex queries to power real-time OLTP and OLAP applications at scale without rewriting existing SQL-based apps and BI tool integrations. By leveraging distributed computing, Splice Machine can scale from terabytes to petabytes by simply adding more commodity servers. Splice Machine is able to provide this scalability without sacrificing the SQL functionality or the ACID compliance that are cornerstones of an RDBMS.
6. DataTorrent
What They Do: Provide a real-time stream processing platform built on Hadoop.
DataTorrent argues that latency will soon become a central concern in Big Data solutions. The company points out that “data is happening now, streaming-in from various sources — in real-time, all the time.” Many organizations struggle to process, analyze, and act on this never-ending, ever-growing stream of information at all.
7. Qubole
What They Do: Offer Big Data-as-a-Service with a “true auto-scaling Hadoop cluster.”
Qubole handles the initial setup and then maintains the clusters. Qubole’s auto-scaling feature automatically spins up a user’s cluster when a job is started and then expands or contracts it based on workload, cutting back on costs and management requirements.
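The general idea behind workload-based auto-scaling can be sketched in a few lines. The example below is a toy policy, not Qubole's actual algorithm: the function name `target_nodes` and the parameters `tasks_per_node`, `min_nodes`, and `max_nodes` are hypothetical, chosen only to illustrate sizing a cluster from the amount of pending work.

```python
import math


def target_nodes(pending_tasks, tasks_per_node=10, min_nodes=0, max_nodes=50):
    """Return how many nodes a cluster should have for the current workload.

    All parameters are hypothetical knobs for this illustration:
    - pending_tasks: tasks waiting to run or currently running
    - tasks_per_node: how many tasks one node is assumed to handle
    - min_nodes / max_nodes: lower and upper bounds on cluster size
    """
    if pending_tasks <= 0:
        # No work queued: contract to the minimum (zero means shut down).
        return min_nodes
    # Enough nodes to cover the workload, rounded up to a whole node.
    needed = math.ceil(pending_tasks / tasks_per_node)
    # Clamp to the allowed range so the cluster never exceeds its ceiling.
    return max(min_nodes, min(needed, max_nodes))


if __name__ == "__main__":
    for load in (0, 25, 10_000):
        print(load, "pending tasks ->", target_nodes(load), "nodes")
```

A production service would also have to account for node startup time, data locality, and billing increments, which this sketch deliberately ignores.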
8. Continuuity
What They Do: Provide a Hadoop-based big data application hosting platform.
The company’s flagship product, Reactor, is a Java-based integrated data and application framework that layers on top of Apache Hadoop, HBase, and other Hadoop ecosystem components. It surfaces capabilities of the infrastructure through simple Java and REST APIs, shielding end users from unnecessary complexity.
9. Xplenty
What They Do: Provide HaaS.
Xplenty’s technology provides Hadoop processing in the cloud via a coding-free design environment, so businesses can quickly benefit from the opportunities offered by Big Data without having to invest in hardware, software, or highly specialized personnel.
10. Nuevora
What They Do: Provide Big Data analytics applications.
Nuevora has set its sights on one of big data’s early growth areas: marketing and customer engagement. Nuevora’s nBAAP (Big Data Analytics & Apps) Platform features purpose-built analytics apps based on best-practices-driven predictive algorithms. nBAAP is based on three key big data technologies: Hadoop (data processing), R (predictive analytics), and Tableau (visualizations).