Data Science & Engineering

Helping public and private organizations address and manage the hurdles of Big Data.

The digital realm is bursting at the seams with data. The volume of today’s data deluge is so colossal that it almost defies description. That’s why the information technology community invented the term “Big Data,” as a catchy, concise way of expressing the unprecedented magnitude and rapid growth of continuously generated data. The Age of Big Data is affecting all enterprises, from healthcare and financial services to energy and manufacturing.

The term Big Data, however, is perhaps too shallow, shortsighted and even misleading. To be sure, today’s data is high in volume, velocity and complexity. But data is just the raw material; it’s the beginning, not the end. Raw data needs to be transformed into insight, understanding, and wisdom. The data is the input, not the output. It is of high value and high impact only if merged with the right business intelligence, advanced analytics and other new technologies to deliver meaningful outcomes. Even if Big Data is one of those stylish terms riding the wave of technology’s “hype cycle,” the data deluge is very real and presents an array of challenges for enterprises.

To help public and private organizations address and manage those hurdles, we have created a Data Science & Engineering Practice (DS&E) containing four competencies: data systems architecture, data engineering, descriptive analytics, and artificial intelligence. Together, these competencies span the data life cycle, from back-end data systems and the data engineering “glue” that makes data available to front-end results delivered through descriptive and predictive analytics.

A closer look at the four capabilities: 

Data Systems Architecture focuses on technology: obtaining and managing the tools and systems that support the other three competencies. This work includes leveraging automation to improve reliability and scalability, architecting and administering Big Data systems, and ensuring data security at rest and in transit.

Data Engineering seeks to understand the flow of data from collection to display so that the right data is sourced, prepared, and made available. Data Engineering performs data modeling to meet analytical requirements and creates new metrics, dimensions, and features in support of analytics programs. It also develops and administers ETL and data integration processes.
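For illustration only, here is a minimal sketch of the kind of ETL step described above, written in Python with pandas (plus a Parquet engine such as pyarrow); the file names, columns, and derived metric are hypothetical.

    import pandas as pd

    # Extract: read raw transaction records from a hypothetical source file.
    raw = pd.read_csv("transactions_raw.csv", parse_dates=["transaction_date"])

    # Transform: standardize a dimension and derive a new metric for analytics.
    raw["region"] = raw["region"].str.strip().str.upper()
    raw["revenue_per_unit"] = raw["total_revenue"] / raw["units_sold"]

    # Model around an analytical dimension (month) before loading.
    raw["month"] = raw["transaction_date"].dt.strftime("%Y-%m")
    monthly = raw.groupby(["month", "region"], as_index=False)["total_revenue"].sum()

    # Load: write the prepared table where downstream analytics can reach it.
    monthly.to_parquet("monthly_revenue_by_region.parquet", index=False)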

Descriptive Analytics focuses on an organization’s mission and on understanding the nature of the business so that mission-related performance can be summarized in dashboards and reports. Based on analyses performed by its data technologists, KYC creates and delivers presentations to stakeholders and builds reports, dashboards, and visualizations that promote self-service.
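As a sketch only, the summary behind such a report might look like the following in Python with pandas; the table and column names are hypothetical.

    import pandas as pd

    # Hypothetical operational data: one row per completed service request.
    requests = pd.read_parquet("service_requests.parquet")

    # Summarize mission-related performance per program: the kind of table
    # that feeds a recurring report or a self-service dashboard.
    summary = (
        requests.groupby("program", as_index=False)
                .agg(request_count=("request_id", "count"),
                     avg_days_to_resolve=("days_to_resolve", "mean"))
                .sort_values("request_count", ascending=False)
    )

    print(summary.to_string(index=False))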

Artificial Intelligence includes Natural Language Processing (NLP), computer vision, machine learning, and prescriptive analytics.

These four capabilities must contend with an enormous amount of data generated from disparate sources: everything from government and public data to social media, commercial transactions and sensor databases. And it’s not just the obvious sources like Amazon or Facebook, but data from the Internet of Things and other burgeoning, even disruptive, sources like telemetry from self-driving vehicles. Government and industry’s ability to aggregate, securely process, and harmonize mountains of both structured and unstructured data is rapidly evolving with the adoption of the cloud, advanced data analytics tools, and, increasingly and perhaps most importantly, the advent of AI and its fields of machine learning (ML) and NLP.

As the data deluge grows, the demand for new solutions also intensifies. There is a vastly increased need for sophistication and speed in delivering intelligent and actionable insights from data. To keep up, the technology world is accelerating into automation and real-time decision analytics. Much of the data is unstructured, so it is difficult to analyze without technologies like NLP that can make it actionable. Organizations are also quickly exploring and adopting technologies like storage as a service and serverless compute to handle the scale.
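As one hedged example of making unstructured text actionable, the snippet below uses spaCy (one NLP toolkit among many; the model name and sample text are illustrative) to pull named entities out of free text so they can be stored and queried like structured fields.

    import spacy

    # Assumes the small English pipeline has been installed:
    #   python -m spacy download en_core_web_sm
    nlp = spacy.load("en_core_web_sm")

    note = ("Acme Logistics reported a two-day shipping delay out of Chicago "
            "on March 12, 2024, affecting roughly 4,000 orders.")

    doc = nlp(note)

    # Turn free text into rows that can live in an ordinary table.
    entities = [(ent.text, ent.label_) for ent in doc.ents]
    print(entities)  # e.g. [('Acme Logistics', 'ORG'), ('Chicago', 'GPE'), ...]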

The emergence of these new cloud architecture options and infrastructures for immense volumes of data poses stumbling blocks for conventional thinking about data and co-located storage and compute. The latest push has been toward separating compute from storage and accessing the data through open formats. With open formats, users can access information via a variety of options, letting them perform multiple analytics operations or use different, competing tools against the same data set at the same time. These open formats and the decoupling of storage and compute are also critical in helping organizations keep up with continuous changes in the technology sphere and the surging pace of the data deluge by allowing experimentation in hours instead of months. Finally, decoupling offers independent scalability and access to underlying storage without adding costly compute resources that may not be needed.
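A minimal sketch of the “same data, multiple tools” idea under assumed tooling: the data is written once in an open format (Parquet), and two independent engines read it directly, with no copies or conversions. The file path and column names are hypothetical; pandas, pyarrow, and DuckDB are assumed to be installed.

    import duckdb
    import pandas as pd

    # One copy of the data in an open format; in practice this would sit in
    # object storage, decoupled from whichever engine happens to query it.
    path = "events.parquet"

    # Tool 1: pandas, for interactive exploration in a notebook.
    events = pd.read_parquet(path)
    print(events.head())

    # Tool 2: DuckDB, running SQL directly against the same file at the same time.
    top_sources = duckdb.sql(
        f"SELECT source, count(*) AS n FROM '{path}' GROUP BY source ORDER BY n DESC"
    ).df()
    print(top_sources)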

As new solutions and technologies come roaring down the pike in the next few years and into the 2020s, we will also see the maturation of serverless compute, yielding data systems that are an order of magnitude cheaper and scalability that is effectively unlimited. Patching, provisioning and backup will be handled by the cloud provider, letting organizational leaders focus on mission objectives rather than technology problems.

Data Challenges for Healthcare Organizations

The healthcare information technology sector, a key arena of expertise for KYC, represents a microcosm of the formidable challenges wrought by the data deluge, but it is also fertile ground for leading-edge data analytics solutions, such as those offered by KYC’s Data Science & Engineering Practice.

A focal point in healthcare IT is staying ahead of the field’s swiftly moving advances. Not so long ago, the IT function of healthcare organizations was confined to a back office where staff oversaw basic operations, such as managing infrastructure, email systems and networks to ensure that everything ran smoothly for mission-focused users. Today, healthcare IT teams face some of the most daunting challenges of any sector: the staggering growth of unstructured data, the need for interoperability across systems and increasingly sophisticated cyber threats. With digitization and automation, most healthcare data is collected in IT systems, including sensitive healthcare and patient data, clinical information, lab results and diagnostic imaging. Yet some critical healthcare information, like that produced by encounters between doctors and patients, is still processed manually.