Big Data and Data Science White Papers

OLAP cubes first rose to prominence in the 1990s, making them one of the first pervasive forms of analytical tooling: they represent a dataset as a multi-dimensional cube that can be sliced and diced to reveal more granular detail. Having been somewhat sidelined as the modeling approach fundamental to OLAP-based analysis gave way to more free-form query building and data exploration, OLAP is now back in favor. Moreover, it is being championed by a new breed of startups that are applying cube building to the thorny problem of interactively analyzing data in Hadoop.

What are the three core components of a big data strategy? How do you move from strategy to production, and what pitfalls might you encounter along the way? Read this comprehensive Think Big whitepaper to find out!

Big Data has exploded onto the scene as a tremendous opportunity for companies across most major industries to gain competitive advantage. But a Big Data initiative is not a small project to be taken lightly. It represents a business imperative requiring active participation by business leaders and their teams, as well as by technical leaders and their teams. This paper discusses some of the key factors that are critical to the success of Big Data projects.

A high-level exploration of the skills, tools, and techniques needed to achieve early success and build your data science practice.

One of the first roadblocks many developers face when trying to learn about Hadoop is simply getting a local installation of Hadoop working that they can use for testing. In this guide, we show you how to install Hadoop on a Mac so you can start experimenting with Hadoop today. We also share tips and tricks for debugging and testing the distributed MapReduce jobs you create. Note that this setup is for developers only; it is not one you would deploy in a production cluster.
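One practical way to debug MapReduce logic before touching a cluster is to simulate the map, shuffle (sort), and reduce phases in a single process. The sketch below is a hypothetical word-count mapper and reducer, not taken from the guide itself; the same two functions could be wired up to Hadoop Streaming, but here they are exercised entirely locally, assuming whitespace-delimited text input.

```python
#!/usr/bin/env python3
# Hypothetical word-count example for testing MapReduce logic locally,
# without a running Hadoop cluster. Assumes whitespace-delimited text.
from itertools import groupby

def mapper(lines):
    """Map phase: emit a (word, 1) pair for every token in the input."""
    for line in lines:
        for word in line.strip().split():
            yield word.lower(), 1

def reducer(pairs):
    """Reduce phase: sum the counts for each word.

    Input must be sorted by key -- locally we call sorted(), which
    stands in for the shuffle/sort Hadoop performs between phases.
    """
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

if __name__ == "__main__":
    # Local smoke test: map -> sort (simulated shuffle) -> reduce.
    text = ["the quick brown fox", "the lazy dog"]
    for word, count in reducer(sorted(mapper(text))):
        print(f"{word}\t{count}")
```

Because the map and reduce steps are plain generator functions, they can be unit-tested with ordinary Python tooling first, then adapted to read stdin and write tab-separated stdout when submitted as a Hadoop Streaming job.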

The growth of Internet businesses led to a whole new scale of data processing challenges. Companies like Google, Facebook, Yahoo, Twitter, and Quantcast now routinely collect and process hundreds to thousands of terabytes of data on a daily basis. This whitepaper discusses the most important of the storage techniques these companies use.

Previous knowledge of Hadoop is not necessary, but you should be comfortable using R interactively from a command shell in addition to a GUI.

Think Big provides data science, engineering, and training services that quickly help companies meet their business goals. We identify and prioritize the best opportunities for Big Data projects based on your desired business outcomes. We then assemble the right architecture and build custom applications that create real value. Our unique and proven methodology ensures our clients begin to see ROI within the first 40 days of a project.