MSBI (SSIS/SSRS/SSAS) Online Training

Wednesday, October 8, 2014

BI Tool Comparisons

I get this question often: which is the best Data Visualization product? Let’s compare some DV platforms: Spotfire, Qlikview, Tableau Software and the Microsoft BI platform (PowerPivot, Excel 2010, SQL Server with its VertiPaq engine, SSAS, SSRS and SSIS). Two key factors for tool selection are:
·         how easy the tool makes it to comprehend the data;
·         price-performance.
Because modern datasets are huge (or growing very fast!), they are usually best comprehended using Data Visualization (DV), with the ability to interact with the data through visual drill-down capabilities and dashboards. There is no single best visualization product. Each has its place. For example, Spotfire has the best web client and analytical functionality. On the other hand, Qlikview may be the best visualization product for interactive drill-down capabilities. The Microsoft BI platform provides a better price-performance ratio and is good as a backend for DV (especially with the release of SQL Server 2012) or for people who wish to build their own DV (which would be a mistake). Tableau has the best ability to interact with OLAP cubes. The summary table below compares the 4 platforms to help you evaluate DV products based on your needs.
MS BI Stack
Business Criteria
Speed, Scalability, Price
Implementation Speed
Qlikview is fastest to implement
Limited by RAM
Very Good
Needs an expert in scalable SaaS
Above Average
Microsoft is the price leader
Licensing/support cost
Smart Client is the best way to save
Enterprise Readiness
Good for SMB
Good for SMB
Partners are the key to SMB market
Long-term viability
1 product
Microsoft has been in business for 35+ years
Analytics Market
Growing fast
Growing fast
3rd attempt to win BI
Qlikview is a DV Leader, Successful IPO
Technical Criteria
Drilldown, Analytics, UI
Clients for End Users
ZFC, Spotfire Silver
RIA, ZFC, Mobile
Excel, .NET
Free Qlikview Personal Edition is a big plus
Interactive Visualization
Very Good
Very Good
As good as Excel
Most users value Visualization over Modeling
Data Integration
Needs a data integration expert
Visual Drill-Down
Very Good
Qlikview is fastest thanks to in-memory database
Dashboard Support
Very Good
Below Average
Spotfire and Qlikview are best for Dashboards
Integration with GIS
Spotfire has the best GIS integration
Modeling and Analytics
Excellent OLAP
Good with SSAS
Spotfire is the best, Excel is the most popular
UI & set of Visual Controls
Very Good
Very Good
Needs a UI expert to integrate DV components
Development Environment
Rich API, S+
Scripting, Rich API
Tableau requires less consulting than competitors
64-bit In-Memory Columnar DB
Very Good
In-memory Data Engine
Very Good
64-bit RAM allows huge datasets in memory
Summary – Best for:
Visual Analytics
DV, Drilldown
Visual OLAP
Backend for DV
Good visualization requires customization!
We also suggest taking a look at other comparisons, such as Altheon’s comparison of Qlikview, Tableau and Omniscope (please ping me if you find a good comparison of DV tools).
·         TIBCO is the vendor of the balanced Spotfire Platform: Spotfire Professional, Server, Enterprise and Web Players, S-Plus, DecisionSite, Metrics, Data Automation and Miner technologies, combining an in-memory database with interactive Data Visualization, advanced analytics (S+), data mining and multi-filtering functionality. Spotfire 3.2.1 provides the best web client and supports the iPad.
·         Qliktech is the vendor of the Qlikview DV Platform: Qlikview Developer, Qlikview Server, Publisher and Access Point tools, with a leading in-memory columnar database combined with advanced Visual Drill-Down, Interactive Dashboards and a comprehensive set of client software, running even on smartphones.
·         Microsoft is the vendor of the most comprehensive set of BI and DV components, including SQL Server, Analysis, Reporting and Integration Services (SSAS, SSRS, SSIS), SharePoint, Excel 2010 with the PowerPivot add-in (Excel is the most popular BI tool, regardless) and the VertiPaq engine. I did not include PerformancePoint Server: it was discontinued in April 2009, and I cannot recommend PerformancePoint Services for SharePoint because I cannot recommend SharePoint.
·         A DV expert can be a cost-effective customizer of all of the above, customizing DV with development components from leading technology vendors like Dundas, DevExpress, etc.
·         I did not include Oracle, because it does not have its own DV tool; however, Oracle is an OEM partner of Tableau and resells it as part of the OBIEE toolset.
A DV expert can help you select the appropriate DV approach for your specific application. Additional services include the following:
·         Professional gathering of business requirements, system analysis of workflow and dataflow, and functional specifications;
·         Custom software development for Extract, Transform and Load (ETL) processing from large client databases into superfast in-memory columnar DBs and interactive OLAP cubes;
·         Design of custom Data Visualization and dashboards deployed over the Internet through smart client and RIA technologies.
Custom DV applications enable the user to perform visual drill-down, fast searches for outliers in large datasets, easy-to-use root-cause and business logic analysis, interactive data search, and visual and predictive analytics. Some factors and/or features are very important, and some I did not mention because they are covered on other pages and posts of this blog (e.g. see the “in-memory” page and the DV Tools pages). I perceive that the DV area has 4 super-leaders: Qlikview 10, Spotfire 3.3, Tableau 6.1 and PowerPivot, but for completeness (because the real function of these tools is to be “Data Visualizers”), I wish to add Visokio’s Omniscope. I do not include vendors who are 1-2 generations behind: SAP, SAS, IBM, Oracle, Microstrategy, and I could add a dozen more mis-leaders to this list. Many vendors are working on some in-memory technology; Tableau 6.x now has a 64-bit in-memory data engine.
Additional factors to consider when comparing DV tools (the table above overlaps with the list below):
·         memory optimization [Qlikview is the leader in in-memory columnar database technology];
·         load time [I tested all the products above and PowerPivot is the fastest];
·         memory swapping [Spotfire is the only one that can use disk as virtual memory, while Qlikview is limited by RAM only];
·         incremental updates [Qlikview is probably the best in this area];
·         thin clients [Spotfire has the best thin client, especially with its recent releases of Spotfire 3.2 and Spotfire Silver];
·         thick clients [Qlikview has the best THICK client, Tableau has the free Desktop Reader, Visokio has the Java-based Omniscope Desktop Viewer];
·         access by 3rd-party tools [PowerPivot's integration with Excel 2010, SQL Server 2008 R2 Analysis Services and SharePoint 2010 is a big attraction];
·         interface with SSAS cubes [PowerPivot has it, Tableau has it, Omniscope will have it very soon, Qlikview and Spotfire do not have it];
·         GUI [a 3-way tie; it depends heavily on personal preference, but Qlikview is easier to use than the others];
·         advanced analytics [Spotfire 3.2 is the leader here with its integration with S-PLUS and support for IronPython and other add-ons];
·         developer productivity with the tools mentioned above [in my experience, Qlikview is a much more productive tool in this regard].

Thursday, October 2, 2014

SSAS cube performance improvement Best methods – Part 2

In Part 1 we looked at a method to quantify the work done by SQL Server Analysis Services and found that the OLE DB provider with a network packet size of 32767 bytes gives the best throughput when processing a single partition, maxing out the contribution of a single CPU.
In this 2nd part we will focus on how to leverage 10 or more cores (up to 64!) and benefit from every one of the CPUs available in your server while processing multiple partitions in parallel. We hope the tips and approach will help you test and determine the maximum processing capacity of the cubes on your SSAS server and process them as fast as possible!
Quick Wins
If you have more than 10 cores in your SSAS server, the first thing you’ll notice when you start processing multiple partitions in parallel is that the Windows performance counter ‘% Processor Time’ of the msmdsrv process is steady at 1000%, which means 10 full CPUs are 100% busy processing. Also, the ‘Rows read/sec’ counter will top out and produce a steady flat line, similar to the one below, at 2 million Rows read/sec (== 200K rows read/sec per CPU):

In our search for maximum processing performance, we will increase the number of connections to match the number of cores by modifying the Data Source properties: change ‘Maximum number of connections’ from 10 to the number of cores in your server. Our test server has 32 cores plus 32 Hyper-Threaded logical processors, so 64 cores are available.
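If you keep the cube definition as a scripted database (XMLA), the same Data Source setting appears, as far as we know, as the MaxActiveConnections element of each ASSL DataSource. Below is a minimal sketch, assuming the standard ASSL engine namespace and a script that already contains that element, which updates it with Python's standard ElementTree; the helper name and the data source ID are hypothetical.

```python
import xml.etree.ElementTree as ET

# Default namespace of ASSL engine objects in a scripted SSAS database.
ASSL_NS = "http://schemas.microsoft.com/analysisservices/2003/engine"


def set_max_active_connections(assl_xml: str, connections: int) -> tuple[str, int]:
    """Set MaxActiveConnections on every DataSource that already defines it.

    Returns the rewritten XML and the number of elements updated. A count of 0
    means the script relies on the default (10).
    """
    ET.register_namespace("", ASSL_NS)  # serialize without an ns0: prefix
    root = ET.fromstring(assl_xml)
    updated = 0
    for ds in root.iter(f"{{{ASSL_NS}}}DataSource"):
        elem = ds.find(f"{{{ASSL_NS}}}MaxActiveConnections")
        if elem is not None:
            elem.text = str(connections)
            updated += 1
    return ET.tostring(root, encoding="unicode"), updated


# Hypothetical, heavily trimmed database script:
script = (
    f'<Database xmlns="{ASSL_NS}"><DataSources><DataSource>'
    "<ID>Adventure Works DW</ID><MaxActiveConnections>10</MaxActiveConnections>"
    "</DataSource></DataSources></Database>"
)
new_script, n = set_max_active_connections(script, 64)
print(n)  # -> 1
```

The sketch deliberately only touches existing elements, so the element order expected by the schema is never disturbed; if it reports 0, the Data Source properties dialog is the safer place to make the change.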
1) # Connections
By default each cube will open a maximum of 10 connections to a data source, which means that up to 10 partitions are processed at the same time. See the picture below: 10x status ‘In Progress’ for the AdventureWorks cube, which was slightly expanded to span multiple years:

Just by changing the number of connections to 64, processing 64 partitions in parallel results in an average throughput of over 5 million Rows read/sec, utilizing 40 cores (yellow line).
This already seems a great number, but it is effectively (5 million rows / 40 cores =) 125K rows per core, and we still see a flat line when looking at the effective throughput; this tells us that we are hitting the next bottleneck. Also, the CPU usage visible in Windows Task Manager isn’t at full capacity yet!
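The per-core arithmetic and the ‘flat line’ check can be made explicit. This sketch assumes you already have samples of ‘Rows read/sec’ and of the ‘% Processor Time’ of msmdsrv; the function names and the 5% tolerance are illustrative assumptions, not SSAS settings. Because the ‘% Processor Time’ of a process is summed over all CPUs, 4000% equals 40 fully busy CPUs.

```python
from statistics import mean


def rows_per_cpu(rows_read_per_sec: float, processor_time_pct: float) -> float:
    """Effective rows read/sec per fully busy CPU (100% == one CPU)."""
    busy_cpus = processor_time_pct / 100.0
    return rows_read_per_sec / busy_cpus


def is_flat_line(samples: list[float], rel_tolerance: float = 0.05) -> bool:
    """True if the samples stay within rel_tolerance of their mean (a plateau).

    A throughput counter pinned to a steady ceiling hints at a bottleneck
    rather than at natural variation in the data being processed.
    """
    if len(samples) < 2:
        return False
    avg = mean(samples)
    if avg == 0:
        return False
    return (max(samples) - min(samples)) / avg <= rel_tolerance


print(rows_per_cpu(2_000_000, 1000))  # 10 connections -> 200000.0
print(rows_per_cpu(5_000_000, 4000))  # 64 connections, 40 cores busy -> 125000.0
# Hypothetical 'Rows read/sec' samples taken once per second:
print(is_flat_line([5.01e6, 4.98e6, 5.02e6, 4.99e6]))  # -> True
print(is_flat_line([1.2e6, 3.9e6, 5.1e6, 2.4e6]))      # -> False
```

Only feed it samples from the steady middle of a run; the ramp-up at the start and the tail at the end of processing naturally vary.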

Time to fire up another Xperf or Kernrate session to dig a bit deeper and zoom into the CPU ticks that are spent by the data provider:
Command syntax:
Kernrate -s 60 -w -v 0 -i 80000 -z sqlncli11 -z msmdsrv -z oleaut32 -z sqloledb -nv msmdsrv.exe -a -x -j c:\websymbols > SSAS_trace.txt

This shows an almost identical result to the profiling of a single partition in Part 1.
By profiling a bit more and checking both OLE DB and some SQL Server Native Client sessions, you will surprisingly find that most of the CPU ticks are spent on… data type conversions.

The other steps make sense and include lots of data validation; for example, while fetching new rows it checks for invalid characters, etc., before the data gets pushed into an AS buffer. But the number 1 CPU consumer, CDataSource::DataConvert, is an area that we can optimize!
(To download a local copy of the symbol files yourself, install the Windows Debugger, which you can find by searching the net for ‘windbg download’, and run the symchk.exe utility to download the symbols of all resident processes into the folder c:\websymbols\:
"C:\Program Files (x86)\Windows Kits\8.1\Debuggers\x64\symchk.exe" /r /ip * /s SRV*c:\websymbols\* )
2) Eliminate Data type conversions
This is an important topic: if the data types of your data source and the cube don’t match, the transport driver needs a lot of time to do the conversions, and this affects the overall processing capacity. Basically, Analysis Server has to wait for the conversion to complete before it can process new incoming data, and this should be avoided.
Let’s go over an AdventureWorksDW2012 Internet_Sales partition as an example:

By looking at the table or query that is the source for the partition, we can determine it uses a range from the FactInternetSales table. But what data types are defined under the hood?
To get all the data type information, just right-click the SSAS database name and script the entire database into a new Query Editor window.
Search through the XML for the query source name that is used for the partition, like: msprop:DbTableName="FactInternetSales"

These should match the SQL Server data types; check especially for unsignedByte, short, string lengths and Doubles (slow) vs Floats (fast). (We do have to warn you about precision here: both are approximate floating-point types, but a Float is single precision and keeps fewer significant digits than a Double, so narrowing a Double to a Float can lose data.)
A link to a list of how to map the Data types is available here.
How can we quickly check and align the data types? Going over them all manually, one by one, isn’t fun, as you probably just found out. By searching the net I ran into a really nice and useful utility written by John Tunnicliffe called ‘CheckCubeDataTypes’ that does the job for us; it compares a cube’s data source view with the data types/sizes of the corresponding dimension attributes. (Kudos John!) Unfortunately, even after making sure the data types are aligned, running Kernrate again shows that DataConvert is still the number one consumer of CPU ticks on the SSAS side.
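The core of such a check is a comparison between the source column types and the types in the scripted data source view. Here is a hypothetical sketch (this is not John’s utility): it assumes you have already extracted both sides into dictionaries, for example from INFORMATION_SCHEMA.COLUMNS and from the type="xs:..." attributes in the database script, and it covers only common SQL Server types, mapped to their usual .NET/XML Schema equivalents.

```python
# Common SQL Server type -> XML Schema type as used in a scripted DSV.
# Not exhaustive; unknown types are skipped rather than reported.
SQL_TO_XS = {
    "tinyint": "xs:unsignedByte",
    "smallint": "xs:short",
    "int": "xs:int",
    "bigint": "xs:long",
    "real": "xs:float",
    "float": "xs:double",
    "money": "xs:decimal",
    "decimal": "xs:decimal",
    "bit": "xs:boolean",
    "datetime": "xs:dateTime",
    "varchar": "xs:string",
    "nvarchar": "xs:string",
}


def find_type_mismatches(sql_columns: dict[str, str],
                         dsv_columns: dict[str, str]) -> list[tuple[str, str, str]]:
    """Return (column, sql_type, dsv_type) where the DSV type is not the expected one."""
    mismatches = []
    for name, sql_type in sql_columns.items():
        dsv_type = dsv_columns.get(name)
        if dsv_type is None:
            continue  # column not used by the cube
        base_type = sql_type.split("(")[0].strip().lower()  # nvarchar(25) -> nvarchar
        expected = SQL_TO_XS.get(base_type)
        if expected is not None and expected != dsv_type:
            mismatches.append((name, sql_type, dsv_type))
    return mismatches


# Illustrative FactInternetSales columns:
sql = {"OrderQuantity": "smallint", "UnitPrice": "money", "SalesAmount": "money"}
dsv = {"OrderQuantity": "xs:short", "UnitPrice": "xs:double", "SalesAmount": "xs:decimal"}
print(find_type_mismatches(sql, dsv))  # -> [('UnitPrice', 'money', 'xs:double')]
```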
3) Optimize the data types at the source
To prove that this conversion is our next bottleneck, we can also create a view on the database source side and explicitly cast all fields to make sure they match the cube definition. (This is also an option for test environments where you don’t own the cube source and databases.)
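Writing such a view by hand for a wide fact table is tedious, so here is a hypothetical generator for the CREATE VIEW statement. It assumes the dbo schema and a dictionary of the SQL Server types the cube expects; columns the measure group doesn’t need are simply left out of the SELECT list, as the next paragraph suggests. The partition query would still apply its own WHERE range on top of the view.

```python
def build_cast_view(view_name: str, source_table: str,
                    columns: dict[str, str], exclude: set[str]) -> str:
    """Build a CREATE VIEW statement that CASTs every kept column explicitly.

    columns maps column name -> SQL Server type the cube expects; names in
    exclude are dropped so they never cross the wire.
    """
    select_list = [
        f"    CAST([{name}] AS {sql_type}) AS [{name}]"
        for name, sql_type in columns.items()
        if name not in exclude
    ]
    if not select_list:
        raise ValueError("no columns left to select")
    return (
        f"CREATE VIEW [dbo].[{view_name}] AS\nSELECT\n"
        + ",\n".join(select_list)
        + f"\nFROM [dbo].[{source_table}];"
    )


# Illustrative subset of FactInternetSales columns:
cols = {
    "ProductKey": "int",
    "OrderQuantity": "smallint",
    "SalesAmount": "money",
    "CarrierTrackingNumber": "nvarchar(25)",
}
print(build_cast_view("Speed", "FactInternetSales", cols, {"CarrierTrackingNumber"}))
```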
As a best practice, you may want to CAST all columns even if you think the data types are right, and also exclude from the view the columns that are not used for processing the measure group. (For example, to process the FactInternetSales measure group of the AdventureWorks2012 DW cube we don’t need [CarrierTrackingNumber], [SalesOrderNumber], [PromotionKey] and [CustomerPONumber].) Every bit that we don’t have to push over the wire and process from the database source is a pure win. Just create a view named ‘Speed’, for example, to give it a try.

(Note: always be careful when changing data types!
For example, in the picture above, using the ‘Money’ data type is OK because it is what FactInternetSales uses, but Money is not a replacement for all decimals (it keeps only 4 digits after the decimal point and doesn’t provide the same range), so be careful when casting data types and double-check that you don’t lose any data!)
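To double-check a column before casting it to Money, a small sketch with Python’s decimal module can test each value against Money’s four decimal places and its range. The limits below are SQL Server’s documented money range; the helper name is hypothetical, and how you fetch the values is left out.

```python
from decimal import Decimal

# SQL Server money: 4 decimal places, fixed range.
MONEY_MIN = Decimal("-922337203685477.5808")
MONEY_MAX = Decimal("922337203685477.5807")
FOUR_PLACES = Decimal("0.0001")


def fits_in_money(value: Decimal) -> bool:
    """True if value can be stored as money without overflow or lost digits."""
    if not MONEY_MIN <= value <= MONEY_MAX:
        return False
    # Quantizing to 4 places must not change the value.
    return value.quantize(FOUR_PLACES) == value


print(fits_in_money(Decimal("1234.5678")))   # -> True
print(fits_in_money(Decimal("1234.56789")))  # -> False: a 5th decimal would be rounded away
print(fits_in_money(Decimal("1e20")))        # -> False: outside the money range
```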
Result: by using the data-type-optimized Speed view as the source, the total throughput increased from 5 to 6.6-6.8 million Rows read/sec at 4600% CPU usage (== 147K rows/CPU). That’s 36% faster. We’re getting there!
The picture also shows that one of the physical CPU sockets (look at the 2nd line of 16 cores, NUMA node 1) is completely maxed out:

4) Create a ‘Static Speed’ View for testing
If you would like to take database performance out of the equation, something I found useful is to create a static view in the database with all the values pre-populated. This way there will still be a few logical reads from the database, but significantly less physical IO.
1) Copy the original query from the cube:

2) Request just the SELECT TOP (1):

3) Create a Static view:
Add these values to a view named ‘Static_Speed’ and cast them all:

4) Create an additional test partition that queries the new Static_Speed view

5) Copy this test partition multiple times
Create at least as many test partitions as the number of cores in your server, or more.
Script the test partition created in step 4):

Create multiple new partitions from it by just changing the and ; these will run the same query using just the static view. This way you can test the impact of your modifications to the view quickly and at scale!
6) Processing the test partitions
Process all the newly created test partitions, which only query the static view; select at least as many of them as the number of CPUs available in your SSAS server.
Determine the maximum processing capacity of your cube server by monitoring ‘Rows read/sec’!
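Steps 2) and 3) can be scripted as well. This is a sketch under explicit assumptions: the captured TOP (1) row is passed in as a dictionary of value and target type, and because the post does not say how the Static_Speed view produces many rows, the constants are simply repeated once per row of a row-source table you choose (here FactInternetSales, whose pages may then be read, ideally from cache). All names and values are illustrative.

```python
from datetime import datetime
from decimal import Decimal


def sql_literal(value: object) -> str:
    """Render a Python value as a T-SQL literal (common types only)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # test bool before int: bool subclasses int
        return "1" if value else "0"
    if isinstance(value, (int, float, Decimal)):
        return str(value)
    if isinstance(value, datetime):
        # ISO 8601 with 'T' is read the same way under any language setting;
        # fractional seconds are dropped in this sketch.
        return "'" + value.strftime("%Y-%m-%dT%H:%M:%S") + "'"
    return "N'" + str(value).replace("'", "''") + "'"


def build_static_view(view_name: str, row: dict[str, tuple[object, str]],
                      row_source: str) -> str:
    """CREATE VIEW returning the captured constants, CAST to the cube's types,
    once for every row of row_source (an assumed, user-chosen table)."""
    select_list = ",\n".join(
        f"    CAST({sql_literal(value)} AS {sql_type}) AS [{name}]"
        for name, (value, sql_type) in row.items()
    )
    return (f"CREATE VIEW [dbo].[{view_name}] AS\nSELECT\n{select_list}\n"
            f"FROM {row_source};")


# Illustrative values, as if copied from the SELECT TOP (1) result:
captured = {
    "ProductKey": (310, "int"),
    "SalesAmount": (Decimal("3578.27"), "money"),
    "OrderDate": (datetime(2005, 7, 1), "datetime"),
}
print(build_static_view("Static_Speed", captured, "[dbo].[FactInternetSales]"))
```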

Wrap Up
If you have a spare moment to check out the workload performance counters of your most demanding cube servers, you may find that there is room for improvement. If you see flat lines during cube processing, I hope they will now catch your eye: by increasing the number of connections or checking that you don’t spend your CPU cycles on data type conversions, you may get a similar improvement of over 3x, as shown in the example above. Seeing in Task Manager that just one of the NUMA nodes is completely maxed out might indicate it’s time to start looking into some of the msmdsrv.ini file settings…

SSAS cube performance improvement Best methods – Part 1

Recently, with some colleagues, I was working on a project with a serious challenge: an Analysis Services 2012 system with 40 physical cores, half a terabyte of RAM and 10 TB of SSD storage was waiting to be pushed to its limits, but it had been installed via the famous ‘next, next, finish’ setup approach and we had to tune the box from scratch. We also had to pull the data from a database running on another box, which means the data processing is impacted by the network round-tripping overhead.
With a few simple but effective tricks for tuning the basics and a methodology on how to check upon the effective workload processed by Analysis Server you will see there’s a lot to gain! If you take the time to optimize the basic throughput, your cubes will process faster and I’m sure, one day, your end-users will be thankful! This Part 1 is about tuning just the processing of a single partition.

Quantifying a baseline

So, where to start? Well, to quantify the effective processing throughput, just looking at Windows Task Manager to check whether the CPUs run at 100% load isn’t enough; the metric that works best for me is the ‘Rows read/sec’ counter, which you can find under the MSOLAP Processing object in Windows Performance Monitor.
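If you log the counter to a CSV file, for example with typeperf and its CSV output format, averaging it is easy to script. This sketch assumes the usual PDH CSV layout (a timestamp column followed by one column per counter path). Since the object name varies per instance (typically MSAS11 for a default SSAS 2012 instance or MSOLAP$<instance> for a named one), the column is matched on the end of its path; the function name is hypothetical.

```python
import csv
import io
from statistics import mean


def average_counter(csv_text: str,
                    counter_suffix: str = r"Processing\Rows read/sec") -> float:
    """Average one counter column from a Performance Monitor CSV log.

    Blank samples, such as the first sample of a rate counter, are skipped.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    for col, name in enumerate(header):
        if name.endswith(counter_suffix):
            break
    else:
        raise ValueError(f"counter {counter_suffix!r} not found in header")
    values = [
        float(row[col])
        for row in reader
        if len(row) > col and row[col].strip()  # skip blank samples
    ]
    if not values:
        raise ValueError("no samples found")
    return mean(values)
```

Usage: read the log file into a string and call average_counter(text); for a baseline, compare the average against the ‘% Processor Time’ of msmdsrv over the same interval.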
Just for fun… looking back in history, the first SSAS 2000 cube I ever processed was capable of handling 75,000 Rows read/sec, but that was before partitioning was introduced; 8 years ago, on a 64-CPU Unisys ES7000 server with SQL and SSAS 2005 running side by side, I managed to process many partitions in parallel and effectively process 5+ million Rows read/sec (== 85K Rows read/sec per core).

Establishing a baseline – Process a single Partition

Today, with SSAS 2012, your server should be able to process much more data; if you run SQL and SSAS side by side on a server or on your laptop, you will be surprised at how fast you can process a single partition: expect 250-450K Rows read/sec while maxing out a single CPU at 100%.
As an impression of processing a single partition on a server running SSAS 2012 and SQL 2012 side by side using the SQL Server Native Client: the % Processor Time of the SSAS process (MSMDSRV.exe) is a flat line at 100%. Does this mean we have reached the maximum processing capacity? Well… no! There is an area where we will find a lot of quick wins; let’s see if we can move data from A (the SQL Server) to B (the Analysis Server) faster.

100% CPU?

A flat line at 100% load, which equals one full CPU, may look like we are limited by a hardware bottleneck. But just to be sure, let’s profile for a minute where we really spend our CPU ticks. My favorite tool for a quick & dirty check is Kernrate (or Xperf if you prefer).
Command line:
Kernrate -s 60 -w -v 0 -i 80000 -z sqlncli11 -z msmdsrv -nv msmdsrv.exe -a -x -j c:\symbols;
Surprisingly, more than half of our time isn’t spent in Analysis Server (or SQL Server) at all, but in the SQL Server Native Client data provider! Let’s see what we can do to improve this.

Quick Wins

1) Tune the BIOS settings & operating system

Quick wins sometimes come from something that you may overlook completely, like checking the BIOS settings of the server. There is a lot to gain there; expect a 30% improvement, or more, if you disable a couple of energy-saving options. (It’s up to you to revert them and save the planet when testing is done…)
For example:
- Enter the BIOS Power Options menu and see if you can disable settings like ‘Processor Power Idle state’.
- In the Windows Control Panel, set the server power plan to High performance (maximum throughput). Up to Windows 2008 R2 this is like pressing the turbo switch; on Windows 2012 the effect is marginal but still worth it.

2) Testing multiple data providers

As the Kernrate profiling shows, a lot of time is spent in the network stack reading the data from the source. This applies to side-by-side (local) processing as well as to pulling the data in over the network.
Since the data provider has a significant impact on how fast SSAS can consume incoming data, let’s check for a moment what other choices we have available; just double-click the cube Data Source:

Switching from the SQL Server Native Client to Native OLE DB\Microsoft OLE DB Provider for SQL Server brings the best result: 32% higher throughput!

SSAS is still using a single CPU to process a single partition, but the overall throughput is significantly higher when using the OLE DB Provider for SQL Server:

To summarize: with just a couple of changes, the overall throughput per core doubled!

Reading source data from a remote Server faster

If you run SSAS on a separate server and have to pull all the data from a database running on another box, expect the base throughput to be significantly lower due to the processing in the network stack and the round-tripping overhead. The tricks that apply to side-by-side processing also apply in this scenario:
1) Establish the partition processing baseline against the remote server.
Fewer rows are processed when reading from a remote server (see fig.); also, the MSMDSRV process is effectively utilizing only half a CPU. The impact of transporting the data from A to B over the network is significant and worth optimizing, so let’s focus our efforts on that first.

2) Increase the network packet size from 4096 bytes to 32 KB.
Get more work done with each network packet sent over the wire by increasing the packet size from 4096 to 32767 bytes; this property can also be set via the Data Source connection string: just select ‘All’ on the left and scroll down until you see the ‘Packet Size’ field.
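Both changes from this post, switching the provider and raising the packet size, end up in the Data Source connection string. Below is a minimal sketch of editing that string programmatically, assuming SQLNCLI11.1 and SQLOLEDB.1 as the ProgIDs of SQL Server Native Client 11.0 and the Microsoft OLE DB Provider for SQL Server; the server name is hypothetical, and this naive parser does not handle quoted values that contain ‘;’.

```python
def set_connection_property(conn_str: str, key: str, value: str) -> str:
    """Add or replace one key=value pair in an OLE DB connection string.

    Keys compare case-insensitively; other pairs keep their order.
    """
    pairs = [p for p in conn_str.split(";") if p.strip()]
    out, found = [], False
    for pair in pairs:
        current_key = pair.partition("=")[0]
        if current_key.strip().lower() == key.lower():
            out.append(f"{key}={value}")
            found = True
        else:
            out.append(pair)
    if not found:
        out.append(f"{key}={value}")
    return ";".join(out)


conn = ("Provider=SQLNCLI11.1;Data Source=SQLSRV01;"
        "Integrated Security=SSPI;Initial Catalog=AdventureWorksDW2012")
conn = set_connection_property(conn, "Provider", "SQLOLEDB.1")
conn = set_connection_property(conn, "Packet Size", "32767")
print(conn)
# -> Provider=SQLOLEDB.1;Data Source=SQLSRV01;Integrated Security=SSPI;Initial Catalog=AdventureWorksDW2012;Packet Size=32767
```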

The throughput gain is significant:


When you have a lot of data to process with your SQL Server Analysis Services cubes, every second less spent on updating and processing may count for your end users; by monitoring the throughput while processing a single partition from a measure group, you can set the foundation for further optimizations. With the tips described above, the effective processing capacity on a standard server more than doubled. Every performance gain achieved in the baseline will pay off later when processing multiple partitions in parallel and helps you provide information faster!

In Part 2 we will zoom into optimizing the processing of multiple partitions in parallel.