Monday, October 05, 2015

Massive Parallel Procssing of Data with SCOPE

SCOPE is an acronym for Structured Computations Optimized for Parallel Execution, a declarative language for working with large-scale data. It is still under development at Microsoft. If you know SQL then working with SCOPE will be quite easy as SCOPE builds on SQL.
The execution environment is different from that RDBMS oriented data.
Data is still modeled as rows. Every row has typed columns and eveyr rowset has a well-defined schema. There is a SCOPe compiler that comes up with optimized execution plan and a runtime execution plan.

Look at this QCount query in SCOPE:

SELECT query, COUNT(*) AS count
FROM "search.log" USING LogExtractor
GROUP BY query
HAVING count > 1000
OUTPUT TO "qcount.result";

You probably know most and the rest you are able to guess.

In the above there is a built-in LogExtractor. You can get it step-by-step going line by line; each step output being the input of next step.

SCOPE requires a software platform for storing and analyzing massive amounts of data and Microsoft has one called 'Cosmos'. Here is graphic of SCOPE processing is carried out.

This post is based on the PDF document you will find here and the image is taken from the same PDF.

No comments: Protection Status