This blog post shows how to load data from blob storage to the following (in order):
- SQL Data Warehouse (SQL DW for short). SQL DW has the following advantages:
  - it can scale up and down in a few seconds
  - you can load data easily from flat files in Azure Blob storage with PolyBase
  - SQL is a well-known language
  - you can do joins
  - you can query it from Azure ML
- Azure Machine Learning, where you can:
  - learn from the dataset
  - tune your model
  - evaluate scoring and how well the model generalizes
  - operationalize the model as a Web API
- a Jupyter notebook, where you can:
  - explore, plot, and transform the dataset in Python
  - document your work in Markdown
- Power BI, where you can:
  - visualize your data
  - share the dataviz with others
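As an illustration of the scaling point, changing the number of DWUs is a single T-SQL statement run against the master database. A sketch (the database name mysqldw34 is a placeholder; use your own):

```sql
-- Scale the warehouse to 400 DWU.
-- Run this against the master database of the server hosting the SQL DW.
-- 'mysqldw34' is a placeholder database name.
ALTER DATABASE mysqldw34 MODIFY (SERVICE_OBJECTIVE = 'DW400');
```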
There is also a good article with demo videos: Using Azure Machine Learning with SQL Data Warehouse.
This is a sample.
By convention, all values ending in 34 should be replaced by your own values. For instance, the data storage account is mydata34; yours should have a different name.
The data is available as a number of flat delimited files in a blob storage container. In this example, there are 2 files. Here are a few lines of data:
Fields are separated by the pipe character ('|'). The names of the fields are:
The data is available in a storage account. In this example, the storage account name is mydata34 and its key is
The key can be found in the portal, for example:
NB: by the time you read this page, the key may have changed. I share the key so that you can find it in the code where it is necessary.
Let's create a SQL DW. In the Azure portal, choose SQL Data Warehouse and click Create.
Here is the data you can enter:
- 200 DWU
- Create a new Server
- West Europe
Once the SQL DW has been created, we must connect to it. One of the tools you can use is Visual Studio; you can download Visual Studio Community edition from visualstudio.com. Please refer to Connect to SQL Data Warehouse with Visual Studio for details.
You must allow your own IP address to access the SQL DW server:
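Besides the portal, the firewall rule can also be created with T-SQL from the master database. A sketch (the rule name and IP address are placeholders for your own values):

```sql
-- Run against the master database of the SQL DW server.
-- Replace the rule name and the IP range with your own public IP address.
EXEC sp_set_firewall_rule
    @name = N'myClientIP',
    @start_ip_address = '131.107.0.1',
    @end_ip_address = '131.107.0.1';
```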
Let’s load the data.
Here is the code that you can paste in Visual Studio and execute:
--also refer to https://azure.microsoft.com/en-us/documentation/articles/sql-data-warehouse-load-with-polybase/
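The original script is not reproduced here, but a minimal PolyBase load along the lines of the article referenced above looks like the following sketch. The storage key, container name, column list, and table names are placeholders to adapt; only date, time, and sc-status are fields actually used later in this post:

```sql
-- Credential to access the mydata34 storage account (secret is a placeholder)
CREATE MASTER KEY;
CREATE DATABASE SCOPED CREDENTIAL AzureStorageCredential
WITH IDENTITY = 'user',
     SECRET = '<storage account key>';

-- External data source pointing at the blob container holding the flat files
-- ('mycontainer' is a placeholder container name)
CREATE EXTERNAL DATA SOURCE AzureStorage
WITH (
    TYPE = HADOOP,
    LOCATION = 'wasbs://mycontainer@mydata34.blob.core.windows.net',
    CREDENTIAL = AzureStorageCredential
);

-- Pipe-delimited text format matching the files described above
CREATE EXTERNAL FILE FORMAT PipeDelimitedText
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = '|')
);

-- External table over the files; the column list is illustrative
CREATE EXTERNAL TABLE ext_mydata (
    [date] varchar(32),
    [time] varchar(32),
    [sc-status] varchar(16)
    -- other columns from the file go here
)
WITH (
    LOCATION = '/',
    DATA_SOURCE = AzureStorage,
    FILE_FORMAT = PipeDelimitedText,
    REJECT_TYPE = VALUE,
    REJECT_VALUE = 100
);

-- Load into a distributed internal table with CTAS, then check the line count
CREATE TABLE mydata
WITH (DISTRIBUTION = ROUND_ROBIN)
AS SELECT * FROM ext_mydata;

SELECT COUNT(*) FROM mydata;
```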
The result of the last statement is
Each file has ~22,000 lines, so the total number of lines seems good.
The rejected lines are headers that appear inside regular rows:
We can now query the data:
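For instance, assuming the loaded table is named mydata as in the Azure ML query later in this post, a sample query (the exact query is illustrative):

```sql
-- Count lines per HTTP status; sc-status is one of the fields used in this post
SELECT [sc-status], COUNT(*) AS [lines]
FROM mydata
GROUP BY [sc-status]
ORDER BY [lines] DESC;
```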
Let’s now get a subset of the data in Azure Machine Learning.
For example, the query we need is
select top 1500 * from mydata WHERE [sc-status]='200' ORDER BY date, time asc
Let's assume you have an Azure Machine Learning workspace available, and that you've created a new experiment from the Studio at studio.azureml.net.
Then add a Reader module.
In the properties, choose and fill:
- Azure SQL Database
- select top 1500 * from mydata WHERE [sc-status]='200' ORDER BY date, time asc
Click Run at the bottom of the page; then you can visualize the dataset.
You may also want to use the dataset in a Jupyter notebook.
For that, you just have to convert the dataset to CSV and then generate the code to access it from Azure Machine Learning.
Drag and drop the Convert to CSV module and connect it to the Reader. Then you can generate the Python code to access that dataset, or directly ask for a new notebook that will have access to the dataset:
You may also want to see your dataset in Power BI.
For that, go to your Power BI environment at app.powerbi.com, choose Get Data, then Databases & More, and choose Azure SQL Data Warehouse.
From there, you have the dataset available and can visualize it:
We saw how to load flat files into a SQL DW, then use the data from Azure Machine Learning and Jupyter, as well as Power BI.