Subscribe: Mike Diehl's WebLog
Added By: Feedage Forager Feedage Grade B rated
Language: English
attr  azure  career  column  data  database  group  project  query  rows  server  source  sql server  sql  text  time 
Rate this Feed
Rate this feedRate this feedRate this feedRate this feedRate this feed
Rate this feed 1 starRate this feed 2 starRate this feed 3 starRate this feed 4 starRate this feed 5 star

Comments (0)

Feed Details and Statistics Feed Statistics
Preview: Mike Diehl's WebLog

Mike Diehl's WebLog

Much aBlog about nothing...


DB2 timestamps and SQL Server

Thu, 13 Apr 2017 22:44:22 GMT

I've been working with a document database in DB2, getting the document metadata into a SQL Server data warehouse. I've run across some issues with the way DB2 timestamp data types are converted into SQL Server datetime2 when you access the DB2 server via a linked server in SQL. (Aside: I generally don't like using linked servers in SQL, they tend to increase coupling and contribute to "data spaghetti". I'd rather specify a data connection in an SSIS package that uses the DB2 OLE DB provider. However, in this case, for a bunch of reasons I won't go into here, we are using a linked server with the OLE2DB provider.) This is the query that I wanted to execute: select COMPONENTID,ATTR0000001053 as AccountNumber,ATTR0000001040 as CreationDateTime,ATTR0000001054 as Description,ATTR0000001044 as DocumentSource,ATTR0000001043 as DocumentType,ATTR0000001046 as Creator,LASTCHANGEDTS FROM cm.LS.ICMADMIN.ICMUT01024001 where LASTCHANGEDTS > '2017-4-1' At runtime, I would replace the date with the last refresh date. SQL Server recognized the data types of the two columns ATTR0000001040 and LASTCHANGEDTS as datetime2. In DB/2, these are known as the data type timestamp - basically the same as datetime2 (and totally different from the SQL server data type called timestamp). Occasionally I would get this error: Msg 8114, Level 16, State 8, Line 1Error converting data type DBTYPE_DBTIMESTAMP to datetime2. I decided to use OPENQUERY to have more control over what exactly got executed on the DB2 server. This function sends the query directly to the linked server without any local parsing, so I could use DB2 functions to help me with those pesky dates. This SQLServerCentral.Com forum post helped me to figure out how to convert the timestamp in DB2, but it used CAST([SomeDateField] AS varchar(26)), which returned text-formatted dates like this: 2017-03-16-11:04:37.765300 I thought this looked like the "ODBC canonical format", which is style #121 for the CONVERT or TRY_CONVERT functions in SQL. Those dates like above wouldn't convert using the style format 121, and it took me a minute to see why. There is a dash between the date and time in the string. Did you see it? I went on a short goose chase considering how I could use SUBSTRING to extract the parts of the string and put it back together with a space, but I decided there must be a better way in DB2. It turns out there is a helpful DB2 function called VARCHAR_FORMAT that allows you to specify a format string to use when converting from one data type (like timestamp) to VARCHAR. I could specify the format string 'YYYY-MM-DD HH24:MI:SS.NNNNNN' and get the "ODBC canonical" format exactly. varchar_format(ATTR0000001040, 'YYYY-MM-DD HH24:MI:SS.NNNNNN') From the resultset of the OPENQUERY statement, I could use TRY_CONVERT to convert the text-formatted value back to a SQL datetime2. try_convert(datetime2, [ATTR0000001040],121) [CreationDatetime] Try_convert is nice because if it fails to convert, it returns NULL rather than an error. Now my full statement looks like this: SELECT rtrim([COMPONENTID]) AS ComponentId ,ATTR0000001053 as AccountNumber,try_convert(datetime2, ATTR0000001040,121) as CreationDatetime,ATTR0000001054 as Description ,ATTR0000001044 as DocumentSource ,ATTR0000001043 as DocumentType ,ATTR0000001046 as Creator ,try_convert(datetime2,LASTCHANGEDTS,121) as RecordModifiedDatefrom OPENQUERY(cm, 'select COMPONENTID,ATTR0000001053,varchar_format(ATTR0000001040, ''YYYY-MM-DD HH24:MI:SS.NNNNNN'') as ATTR0000001040,ATTR0000001054,ATTR0000001044,ATTR0000001039,ATTR0000001048,ATTR0000001043,ATTR0000001046,varchar_format(LASTCHANGEDTS , ''YYYY-MM-DD HH24:MI:SS.NNNNNN'') as LASTCHANGEDTS FROM LS.ICMADMIN.ICMUT01024001 where LASTCHANGEDTS > ''2017-4-1''') Oddly enough, I don't actually get any NULL values for those two dates after doing it this way. All the text-formatted dates convert successfully. (Yes, I am puzzled by this, but whatever, mission accomplished!) I find that data type conversion issues between systems can often consume a lot of time. Hopefully someone [...]

Speaking at SQL Saturday 594 in Edmonton - Agile Analytics - BI builds and deployments using VSTS

Wed, 05 Apr 2017 14:36:07 GMT

Hey, I'll be in Edmonton on April 22, at the SQL Saturday #594 event.

I'll be speaking on using Visual Studio Team Services build and release systems to automate builds and deployments of BI artifacts.

So, if you currently manually deploy databases, reports, tabular or multidimensional Analysis Services models (cubes), you likely find it either error-prone or highly regimented and with a ton of ceremony.

You can be much more agile and confident in your deployments if you automate them. They become repeatable (by definition) and reliable. Also, when you use source code version control, you gain much better historical insight and change control without imposing significant additional work.

I may stand on my soapbox and shout about the benefits of automated deployments and source code control and such, but I really want to focus on the "how to" so you can take this back to your workplace and implement it and realize the benefits yourselves!

There are three tracks of great speakers all day, and the price is right! (It's free!) Register here.

I hope to see you there!


Trouble finding the Microsoft OLEDB Provider for DB/2 install package?

Fri, 31 Mar 2017 21:35:03 GMT

Today I needed to install the Microsoft OLEDB Provider for DB/2. A quick search brought up this download page:

However, if you click on the Download button, you only get the PDF instructions, not the installer file.

The actual files to download are in the Microsoft SQL Server Feature Pack.

Microsoft® SQL Server® 2012 Feature Pack

If you look at the very bottom of the Install Instructions, you will find two links - for the x86 package and the x64 package.

Microsoft® SQL Server® 2014 Feature Pack

In this one, the files are version 5 of the provider, in the Download link.

This was a pain to find, so I thought I'd blog so that someone could benefit from my pain.


Power Query Pivot Swap

Wed, 29 Mar 2017 22:28:00 GMT

I recently was given a CSV file that listed users and the groups they belonged to, from an Active Directory dump. The person asked me if it I could transform it to show the list of groups and the users that belonged to them. I thought I could, but I wasn't really sure how hard or easy it would be. It turned out to be easier than I thought. I call it a Pivot Swap (leave a comment at the bottom if you have a better name!) If you're the type who doesn't care about the process, and just wants the answer up front, here is the Power Query:let    Source = Excel.CurrentWorkbook(){[Name="Survey"]}[Content],    #"Split Column by Commas" = Table.SplitColumn(Source,"Career Interests",Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), {"Career Interests.1", "Career Interests.2", "Career Interests.3", "Career Interests.4", "Career Interests.5" , "Career Interests.6", "Career Interests.7", "Career Interests.8"}),    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Split Column by Commas", {"Respondent"}, "Attribute", "Career Interest"),    #"Removed Attribute Column" = Table.RemoveColumns(#"Unpivoted Other Columns",{"Attribute"}),    #"Grouped Rows by Concat Respondents" = Table.Group(#"Removed Attribute Column", {"Career Interest"}, {{"Respondents", each Text.Combine([Respondent],", "), type text}})in    #"Grouped Rows by Concat Respondents" I've reproduced the scenario using Survey respondents and their career interests. The first column is the name of the respondent, and the second column contains a list of careers they are interested in, separated by commas. The desired result is to have a column that contains the career interests and the second column containing a list of respondents that are interested in them (separated by commas). The first step is to split the second column by commas. That's pretty easy. Note that the wizard will determine how many commas there are in all the rows and create new columns accordingly. If I needed this to be something that would need to run again, and there might be a different maximum number of columns, I'd probably insert some fancy dynamic column stuff that I would learn by reading Chris Webb's blog post "Dynamic Column Selection In Custom Columns In Power BI/Power Query/M" but here I'm just leaving it simple. Ok, now to the kinda cool part - the Unpivot operation. We're going to select the Respondent column, right click, and choose Unpivot Other Columns. This is nice because we don't have to worry about how many career interests were listed by each respondent. Lovely! Now we have a (much longer) table that repeats the Respondent for each of the other columns it had a value in, and puts the column name in the new Attribute column, and the value in the Career Interest column (the default wizard will use the name Value). We don't really care about the order in which they listed their career interests, so we don't need the Attribute column. That's simple to remove. Next is the REALLY COOL part. What I want to do is group the Career Interest column and aggregate the Respondents. This is the part that isn't so obvious. If you select the Career Interest column, right-click and choose Group By, the dialog box only gives you numerical aggregations, like Count, Sum, Avg, etc. There's no option for text concatenation. Ok, we can overcome that! For now, choose Count rows. Notice that the formula bar shows: = Table.Group(#"Removed Attribute Column", {"Career Interest"}, {{"Respondents", each Table.RowCount(_), type number}}) Here's the magic: after you close the dialog box, edit the formula bar to use the Text.Combine function instead of Table.RowCount, and give it the [Respondent] column and ", " as parameters. = Table.Group(#"Removed Attribute Column", {"Career Interest"}, {{"Respondents", each Text.Combine([Respondent],", "), type text}}) It's pretty simple once you know how, right? My original estimat[...]

Scaling Azure SQL databases up and down

Fri, 04 Nov 2016 01:04:00 GMT

I've recently been working on a project where we push data from an on-premise database to a set of  Azure SQL databases daily, and use that Azure SQL database as the source for a Power BI dataset. The dataset gets refreshed once a day. Our Azure SQL data marts are pretty small - all but one of them are under 2 GB, and the largest one is about 3 GB. The ETL and Power BI refresh happen overnight - the ETL starts at 1 am, and the Power BI refresh is scheduled at 3 am. In my SSIS packages, I am loading a few tables that have anywhere from a few thousand to hundreds of thousands of rows in them, and a few tables that have less than 100 rows. For the small tables, I just truncate them and insert the new rows. For the large tables, I use some change detection logic to determine which rows are the same in Azure as in the source, and which ones are new or changed, and which ones should no longer exist in Azure. That pattern will be the subject of another blog post. What I was finding was that I was exceeding my DTU allocation for my Azure SQL databases frequently, and Azure was restricting the SQL database's response times according to its DTU limit. I decided that I could decrease the overnight data refresh window by scaling up the Azure SQL databases before loading them, and scaling them down again after the Power BI refresh was complete. After a failed attempt at using Azure automation and adapting a runbook that uses AzureRM PowerShell scripts to set database properties, I happened across this T-SQL equivalent. ALTER DATABASE MyAzureDb MODIFY (Edition='basic', Service_objective='basic') When you run this statement, it returns immediately. In the background, Azure is making a copy of the current database with the new service objective level, and when the new copy is ready, it swaps out the current database for the new copy. Based on how much activity is happening at the time of the switch, the change will be transparent to any existing connections. Microsoft does warn that some transactions may be rolled back, so I wanted to wait until the scaling request was complete before I started the heavy-duty ETL (even though I did build in retry logic into my SSIS packages). Finding the T-SQL that told me when a scale request was complete was a little more difficult. If you try this: select DATABASEPROPERTYEX('MyAzureDb','Edition') as Edition, DATABASEPROPERTYEX('MyAzureDb','Service_Objective') as ServiceObjective ... you will see the old service objective until the switch occurs. However, if you happen to try another ALTER Database statement to change the objective again while the switch is still happening, you'll get an error message like this: A service objective assignment on server 'yyyyy' and database 'CopyOfXXXX' is already in progress. Please wait until the service objective assignment state for the database is marked as 'Completed'. So I went looking to find this "objective assignment state" which was a little difficult to find. Eventually, I came across the Azure SQL-only view sys.dm_operation_status that tells me all the operations that are applied in Azure SQL. It has a row for every operation, so if you've done a few service objective changes, you'll see a row for each. So basically I needed to find the most recent operation and see if it has an IN_PROGRESS status. with currentDb as ( select *, row_number() over (partition by resource_type, major_resource_id, minor_resource_id order by start_time desc) as rowNum from sys.dm_operation_status)select major_resource_id as DbName, operation, state, state_desc, percent_complete, error_code, error_desc, error_severity, start_timefrom currentdbwhere rowNum = 1 and resource_type_desc = 'Database' You must be in master to get any results back from sys.dm_operation_status. This is what it returns for me: You can see that I kicked off some service objective changes that were in progress when I captured this image. In my s[...]

Power Query for Azure tables and updating column names

Tue, 18 Oct 2016 19:41:03 GMT

I recently did some PowerBI work using Azure Table Storage. In the query editor, it seemed that each table I pulled in from Azure Tables had a very similar process.: Get the list of tables available for the storage account. Select a table Select the Content of the table (excluding the RowKey, PartitionKey, and timestamp columns) Expand the columns in the Content Rename the columns Azure table column names don't allow spaces, but they are case sensitive, so I found that if I was careful to use PascalCasing in my Azure Table columns (which I am careful to do in any storage container), I could adapt the steps above to make a function that would do all those things for me, just by providing the name of the account and the name of the table. let GetAzTable = (account as text, table as text) =>let    Source = AzureStorage.Tables(account),    Table1 = Source{[Name=table]}[Data],    ContentOnly = Table.SelectColumns(Table1,{"Content"}),    ColumnName = Table.ColumnNames(ContentOnly){0},    Content = Table.Column(Table.FirstN(ContentOnly,100),ColumnName),    ColumnsToExpand = List.Distinct(List.Combine(List.Transform(Content,            each if _ is record then Record.FieldNames(_) else {}))),    NewColumnNames = List.Transform(ColumnsToExpand, each NewColumnName(_)),    ExpandedContent = Table.ExpandRecordColumn(ContentOnly, ColumnName, ColumnsToExpand,NewColumnNames)in    ExpandedContent in GetAzTable In this code, I reference another function that generates the new "nice" column name based on the column name in the Azure table. This function doesn't do the name formatting, but it determines if a column should get a nice name or not. I use a naming convention for columns that are meant to be used internally - either as a join key or for a sort column, or for a measure calculation. This function returns the column name "as is" if it follows the naming convention for internal columns, or calls the function GetPascalName to make it a "nice" name. let NewColumnName = (colName as text) as text =>let    Source = if Text.EndsWith(colName, "Key") or Text.EndsWith(colName,"Num") or Text.EndsWith(colName, "Sort") or Text.EndsWith(colName,"SortOrder") or colName = Text.Upper(colName) then colName else GetPascalName(colName)in    Sourcein NewColumnName Here is the cool code: let PascalName = (pascalName as text) as text =>let    Source = Text.ToList(pascalName),    Caps = { "A" .. "Z" },    NewSource = List.Combine(List.Transform(Source, (x) => if List.PositionOf(Caps, x) >= 0 then {" ",x} else {x})),    result = Text.Trim(Text.Combine(NewSource, ""))in    resultin PascalName This code takes a text value and turns it into a list (of individual characters). Then it recombines the list of characters, adding a space before each capital letter. Finally, it trims the leading space. This isn't perfect by all means - it could be improved by only putting a space in front of the first capital in a series of capitals. A column named "FTPUrl" will come back as "F T P Url". (Note that I've already caught the case of a column with ALLCAPS in the NewColumnName function.) Just looking at this edge case, I think the algorithm would need to put a space before the first and the last capital in a series of capitals (" FTP Url", trimmed to "FTP Url"). Care to put your improved algorithm below in the comments? [...]

Power Query function for dense ranking

Tue, 18 Oct 2016 19:06:45 GMT

Recently I was having issues writing a measure in DAX to rank items properly (not an issue with the DAX, but in my understanding of the DAX). I did some searching and came across Chris Webb's article that had a nice implemenation of ranking using Power Query. The technique he proposed uses Power Query to sort a table by the column you want to rank by, add an index column to the table The result of his code would give you a unique ranking number for every row, even if there are ties. He went on to show how you could get grouped rankings, such as by month: you can add a step at the beginning to group by those columns first (using the All Rows aggregation), then ungroup them at the end. The index value gets reset for each group of rows, and within the group, ties get a different rank value. I wanted to adapt his technique but where rows with ties would have the same rank. Also, I wanted rows with a null value in the column to rank by to get a null ranking. Here's my adaptation: Select rows with a null value in the column to rank by into a table Select rows with a non-null value in the column into a separate table Add a new column (with null values) to the table with null rows Group the non-null table by the ranking column add an index column to the grouped table ungroup the table combine the null table with the ungrouped table Step 2 excludes those rows in the table with a null value in the column you want to rank by. Step 1 and 3 keeps these null value rows to put back in the table after the non-null rows have been ranked. Step 4 allows me to have ties with the same ranking value. I wrote it as a function. When I need to rank a table without grouping it, it's a simple call to the RankFunction. When I want to have grouped rankings, I group the table first (using the All Rows aggregation), then use Table.TransformColumns to rank each group's rows. let  RankFunction = (tabletorank as table, sortcolumn as text, newcolumn as text) as table =>  let      NullRows = Table.SelectRows(tabletorank, each (Table.Column(_, sortcolumn) = null)),      NullWithIndex = Table.AddColumn(NullRows, newcolumn, each null, Int64.Type),            NotNullRows =Table.SelectRows(tabletorank, each (Table.Column(_, sortcolumn) <> null)),      DenseGrouping = Table.Group(NotNullRows, {sortcolumn}, {{"Groupedrows", each _, type table}}),      SortRows = Table.Sort(DenseGrouping,{{sortcolumn, Order.Descending}}),       AddIndex = Table.AddIndexColumn(SortRows, newcolumn, 1, 1),      ChangedType = Table.TransformColumnTypes(AddIndex,{{"Employer Rate Rank", Int64.Type}}),            DenseUngrouping = Table.ExpandTableColumn(ChangedType,"Groupedrows",             List.RemoveMatchingItems(Table.ColumnNames(NotNullRows), {sortcolumn})),            AggregateRows =  Table.Combine( { NullWithIndex, DenseUngrouping } )  in      AggregateRowsin RankFunction Here is a sample of how I use the RankFunction to rank an Employer table by their AssessmentRate, first by PayrollYear, then by PayrollYear and Industry: let    Source = GetData("Employer"),    #"Grouped Rows" = Table.Group(Source, {"PayrollYear"}, {{"AllRows", each _, type table}}),    AddedRank = Table.TransformColumns(#"Grouped Rows", {"AllRows", each RankFunction(_, "AssessmentRate","Employer Rate Rank")}),    #"Expanded AllRows" = Table.ExpandTableColumn(AddedRank, "AllRows", {...column names...}, {{...column names...}),    #"Grouped Rows1" = Table.Group(#"Exp[...]

Power Query to Promote Headers and removing Special Characters

Thu, 31 Mar 2016 22:50:21 GMT

I had an Excel spreadsheet that had "nicely" formatted column headers that made the text appear on separate lines (ie. the text included Alt-Enter linefeeds). I wanted to use it as a source for Power Query (and PowerBI), but remove the linefeeds in the names before promoting them to the column header.

I found a post on MSDN forums that did something similar:

In the thread, Ehren posted a snippet that combines the first four lines of the source to become the column header names:

= let
    firstN = Table.FirstN(Source, 4),
    renames = List.Transform(
        each {_, Text.Combine(List.Transform(Table.Column(firstN, _), each Text.From(_)), ",")}),
    renamedTable = Table.RenameColumns(Source, renames)
    Table.Skip(renamedTable, 4)

I modified this slightly to only use the first line, but remove the linefeeds using the special token "#{lf}"

= let
     firstN = Table.FirstN(Source, 1),
     renames = List.Transform(
         each {_, Text.Combine(
                     List.Transform(Table.Column(firstN, _), each Text.Trim(Text.Replace(Text.From(_),"#(lf)"," "))
     renamedTable = Table.RenameColumns(Source, renames)
     Table.Skip(renamedTable, 1)

Ehren's original instructions still apply to using this formula:

Click the little fx button to add a custom step to your query, and paste this in. It handles doing the promotion for all columns in the table without the need for hard-coding. (It assumes the previous step in the query is called Source. Please update it accordingly if that's not the case.)


SQL Server Maintenance Plans - how I use them

Mon, 04 May 2015 22:18:08 GMT

Jonathan Cox (@hackdba) has a good post for starting out on SQL Maintenance Plans.

I like the maintenance plan wizard too, but it doesn't really give good advice about some of the operations.

The critical maintenance pieces are backup and integrity checking. You gotta do both of these. And you need to understand full backups and log backups and database recovery modes (Full & Simple primarily) otherwise you'll end up with a disk full error eventually. The Cleanup task also ensures that you delete your old backup files and don't fill up the backup drive either. I could (and maybe should) write another post about the right kind of SQL backups and recovery models, but lots of people already have. If you're reading this and don't know what I mean, then please google SQL backups and recovery models and educate yourself (before it's too late!).

I don't like to use the maintenance plan for index maintenance, because it's a very large hammer approach to a mostly non-existent problem. Ola Hallengren has a much better approach for index optimization. The default settings for this script are much better choices with regards to choosing when to REORG or REBUILD (and there's no point in doing both).

As far as I'm concerned, updating statistics is optional in most scenarios.

PLEASE PLEASE PLEASE DO NOT shrink databases and/or log files in a maintenance plan.


A pseudo-listener for AlwaysOn Availability Groups for SQL Server virtual machines running in Azure

Thu, 01 Aug 2013 03:47:00 GMT

I am involved in a project that is implementing SharePoint 2013 on virtual machines hosted in Azure. The back end data tier consists of two Azure VMs running SQL Server 2012, with the SharePoint databases contained in an AlwaysOn Availability Group. I used this "Tutorial: AlwaysOn Availability Groups in Windows Azure (GUI)" to help me implement this setup.Because Azure DHCP will not assign multiple unique IP addresses to the same VM, having an AG Listener in Azure is not currently supported.  I wanted to figure out another mechanism to support a "pseudo listener" of some sort. First, I created a CNAME (alias) record in the DNS zone with a short TTL (time to live) of 5 minutes (I may yet make this even shorter). The record represents a logical name (let's say the alias is SPSQL) of the server to connect to for the databases in the availability group (AG). When Server1 was hosting the primary replica of the AG, I would set the CNAME of SPSQL to be SERVER1. When the AG failed over to Server1, I wanted to set the CNAME to SERVER2. Seemed simple enough.(It's important to point out that the connection strings for my SharePoint services should use the CNAME alias, and not the actual server name. This whole thing falls apart otherwise.)To accomplish this, I created identical SQL Agent Jobs on Server1 and Server2, with two steps:1. Step 1: Determine if this server is hosting the primary replica.This is a TSQL step using this script:declare @agName sysname = 'AGTest'set nocount on declare @primaryReplica sysnameselect @primaryReplica = agState.primary_replicafrom sys.dm_hadr_availability_group_states agState   join sys.availability_groups ag on agstate.group_id = ag.group_id   where = @AGname if not exists(   select *    from sys.dm_hadr_availability_group_states agState   join sys.availability_groups ag on agstate.group_id = ag.group_id   where @@Servername = agstate.primary_replica    and = @AGname)begin   raiserror ('Primary replica of %s is not hosted on %s, it is hosted on %s',17,1,@Agname, @@Servername, @primaryReplica) endThis script determines if the primary replica value of the AG group is the same as the server name, which means that our server is hosting the current AG (you should update the value of the @AgName variable to the name of your AG). If this is true, I want the DNS alias to point to this server. If the current server is not hosting the primary replica, then the script raises an error. Also, if the script can't be executed because it cannot connect to the server, that also will generate an error. For the job step settings, I set the On Failure option to "Quit the job reporting success". The next step in the job will set the DNS alias to this server name, and I only want to do that if I know that it is the current primary replica, otherwise I don't want to do anything. I also include the step output in the job history so I can see the error message.Job Step 2: Update the CNAME entry in DNS with this server's name.I used a PowerShell script to accomplish this:$cname = ""$query = "Select * from MicrosoftDNS_CNAMEType"$dns1 = ""$dns2 = ""if ((Test-Connection -ComputerName $dns1 -Count 1 -Quiet) -eq $true){    $dnsServer = $dns1}elseif ((Test-Connection -ComputerName $dns2 -Count 1 -Quiet) -eq $true) {   $dnsServer = $dns2}else{  $msg = "Unable to connect to DNS servers: " + $dns1 + ", " + $dns2   Throw $msg}$record = Get-WmiObject -Namespace "root\microsoftdns" -Query $query -ComputerName $dnsServer  | ? { $_.Ownername -match $cname }$thisServer = [System.Net.Dns]::GetHostEntry("LocalHost").HostName + "."$currentServer = $record.RecordData if ($currentSer[...]

Query Logging in Analysis Services

Thu, 01 Aug 2013 02:32:00 GMT

On a project I work on, we capture the queries that get executed on our Analysis Services instance (SQL Server 2008 R2) and use the table for helping us to build aggregations and also we aggregate the query log daily into a data warehouse of operational data so we can track usage of our Analysis databases by users over time. We've learned a couple of helpful things about this logging that I'd like to share here.First off, the query log table automatically gets cleaned out by SSAS under a few conditions - schema changes to the analysis database and even regular data and aggregation processing can delete rows in the table. We like to keep these logs longer than that, so we have a trigger on the table that copies all rows into another table with the same structure:Here is our trigger code:CREATE TRIGGER [dbo].[SaveQueryLog] on [dbo].[OlapQueryLog] AFTER INSERT AS       INSERT INTO dbo.[OlapQueryLog_History] (MSOLAP_Database, MSOLAP_ObjectPath, MSOLAP_User, Dataset, StartTime, Duration)      SELECT MSOLAP_Database, MSOLAP_ObjectPath, MSOLAP_User, Dataset, StartTime, Duration FROM inserted Second, the query logging process is "best effort" - if SSAS cannot connect to the database listed in the QueryLogConnectionString in the Analysis Server properties, it just stops logging - it doesn't generate any errors to the client at all, which is a good thing. Once it stops logging, it doesn't retry later - an hour, a day, a week, or even a month later, so long as the service doesn't restart.That has burned us a couple of times, when we have made changes to the service account that is used for SSAS, and that account doesn't have access to the database we want to log to. The last time this happened, we noticed a while later that no logging was taking place, and I determined that the service account didn't have sufficient permissions, so I made the necessary changes to give that service account access to the logging database. I first tried just the db_datawriter role and that wasn't enough, so I granted the service account membership in the db_owner role. Yes, that's a much bigger set of permissions, but I didn't want to search out the specific permissions at the time. Once I determined that the service account had the appropriate permissions, I wanted to get query logging restarted from SSAS, and I wondered how to do that? Having just used a larger hammer than necessary with the db_owner role membership, I considered just restarting SSAS to get it logging again. However, this was a production server, and it was in the middle of business hours, and there were active users connecting to that SSAS instance, so I thought better of it.As I considered the options, I remembered that the first time I set up query logging, by putting in a valid connection string to the QueryLogConnectionString server property, logging started immediately after I saved the properties. I wondered if I could make some other change to the connection string so that the query logging would start again without restarting the service. I went into the connection string dialog, went to the All page, and looked at the properties I could change that wouldn't affect the actual connection. Aha! The Application Name property would do just nicely - I set it to "SSAS Query Logging" (it was previously blank) and saved the changes to the server properties. And the query logging started up right away. If I need to get this running again in the future, I could just make a small change in the Application Name property again, save it, and even change it back again if I wanted to.The other nice side effect of setting the Application Name property is that now I can see (and possibly filter for or filter out) the SQL activity in that database that is related to[...]

Joel's Predictions for 2011 (and Mike's comments)

Fri, 31 Dec 2010 21:32:00 GMT

Joel has posted his predictions for 2011. I find his predictions very interesting, mostly because I am crappy at doing predictions myself. However, I am seldom at a loss for commenting on someone else's work:  1. The Kanban Influence: I have seen a little bit of this, and I like what I see. I would like to try to implement this in the project I am currently on, but I think it will take a lot of education of many involved in the project, as most of them don't even know the term. 2. Digital Entertainment Crossing the Chasm: In December 2009, our family purchased a high-def digital cable video recorder (DVR), and a new LCD flatscreen TV. The DVR has completely changed my viewing habits, I seldom watch programs live anymore. I watched the Vancouver olympics on two channels, in near-real-time, by using the two tuners on the DVR and using pause/skip to split our viewing across the channels, and avoid commercials. The new TV has USB connections, so we occasionally watch videos and view photos on the TV as well. We recently received an XBox 360 as a Christmas gift and I am probably going to explore its integration with Windows Media player on our home PCs. 3. Many App Stores: I recently noticed an "App Store" tab in the latest version of uTorrent. It lists the uTorrent add-ons available for download. 4. Kinecting with your PC We also received the Kinect with our Xbox 360, and it significantly changes the gaming experience, I think. Still a little lagging in responsiveness in some games and situations, and really hard to be precise, but it's a huge leap over the WiiMote (which we also have). In terms of using Kinect on the PC, there is already a burgeoning community for this:  6. Mobile Really Race Heats Up: I am following this somewhat, but our house is certainly not "bleeding edge" with our mobile phones. My daughter is on a pay-as-you-go plan, the cheapest service for what she needs - texting and very little voice calls for $15/month. My wife has a relatively old phone, and she is on a pick-5 plan with unlimited texts (no data). What she likes most about the phone is the large number buttons, which makes me wonder about the aging demographic and when the mobility companies will start catering directly to people who shouldn't need to put on their reading glasses to send/read a text or make a phone call. My (non-smart) phone has a full qwerty keyboard (about the size of a Blackberry) and I have still refused to read/send email from my phone. I do find myself using the web occasionally from my phone, but primarily I use my phone for texts and voice (in that order). With those limited phones, I still pay between $120-$150 per month. That seems crazy, and I can't imagine a decent plan for a smart phone would reduce my costs. Add $30-40 per month for my landline that hardly ever gets used anymore, and I have long since concluded that I pay MTS too much (but changing providers probably wouldn't reduce anything either). What I would love as a feature on my phone: voice recognition for texts - I speak and the phone types. Or the phone reads aloud the texts that I receive.   7. Cloud Apps will gain momentum: This is only recently that I have looked at Windows Azure, and it has changed the way I think of software architectures. The development experience "just worked" - things that I thought would be pretty complex to do (configure Visual Studio 2010 to deploy an app to the cloud) worked first try, and pretty darn simply. The large multi-national company that I am currently working on a project for may never use a public cloud for their applications, but they are starting to use Verizon cloud services, and I have already talked to some people there about using the Azure app fabric in a corpo[...]

Imaginet is hiring

Tue, 01 Jun 2010 20:08:00 GMT

We have an immediate need for new staff members!   
  • Project Manager
  • Systems Analysts
  • Test Manager
  • Web Developer
  • Database Admin

Please contact me if you are interested.

SQL Table stored as a Heap - the dangers within

Fri, 30 Apr 2010 16:55:00 GMT

Nearly all of the time I create a table, I include a primary key, and often that PK is implemented as a clustered index. Those two don't always have to go together, but in my world they almost always do. On a recent project, I was working on a data warehouse and a set of SSIS packages to import data from an OLTP database into my data warehouse. The data I was importing from the business database into the warehouse was mostly new rows, sometimes updates to existing rows, and sometimes deletes. I decided to use the MERGE statement to implement the insert, update or delete in the data warehouse, I found it quite performant to have a stored procedure that extracted all the new, updated, and deleted rows from the source database and dump it into a working table in my data warehouse, then run a stored proc in the warehouse that was the MERGE statement that took the rows from the working table and updated the real fact table. Use Warehouse CREATE TABLE Integration.MergePolicy (PolicyId int, PolicyTypeKey int, Premium money, Deductible money, EffectiveDate date, Operation varchar(5)) CREATE TABLE fact.Policy (PolicyKey int identity primary key, PolicyId int, PolicyTypeKey int, Premium money, Deductible money, EffectiveDate date) CREATE PROC Integration.MergePolicy as begin begin tran Merge fact.Policy as tgtUsing Integration.MergePolicy as SrcOn (tgt.PolicyId = Src.PolicyId) When not matched by Target then Insert (PolicyId, PolicyTypeKey, Premium, Deductible, EffectiveDate)values (src.PolicyId, src.PolicyTypeKey, src.Premium, src.Deductible, src.EffectiveDate) When matched and src.Operation = 'U' then Update set PolicyTypeKey = src.PolicyTypeKey,Premium = src.Premium,Deductible = src.Deductible,EffectiveDate = src.EffectiveDate When matched and src.Operation = 'D' then Delete ;delete from Integration.WorkPolicy commit end Notice that my worktable (Integration.MergePolicy) doesn't have any primary key or clustered index. I didn't think this would be a problem, since it was relatively small table and was empty after each time I ran the stored proc. For one of the work tables, during the initial loads of the warehouse, it was getting about 1.5 million rows inserted, processed, then deleted. Also, because of a bug in the extraction process, the same 1.5 million rows (plus a few hundred more each time) was getting inserted, processed, and deleted. This was being sone on a fairly hefty server that was otherwise unused, and no one was paying any attention to the time it was taking. This week I received a backup of this database and loaded it on my laptop to troubleshoot the problem, and of course it took a good ten minutes or more to run the process. However, what seemed strange to me was that after I fixed the problem and happened to run the merge sproc when the work table was completely empty, it still took almost ten minutes to complete. I immediately looked back at the MERGE statement to see if I had some sort of outer join that meant it would be scanning the target table (which had about 2 million rows in it), then turned on the execution plan output to see what was happening under the hood. Running the stored procedure again took a long time, and the plan output didn't show me much - 55% on the MERGE statement, and 45% on the DELETE statement, and table scans on the work table in both places. I was surprised at the relative cost of the DELETE statement, because there were really 0 rows to delete, but I was expecting to see the table scans. (I was beginning now to suspect that my problem was because the work table was being stored as a heap.) Then I turned on STATS_IO and ran the sproc again. The output was quite interesting.Table 'Worktable'. Scan count 0, logical re[...]

SSIS Bulk Insert task that bit me in the butt...

Fri, 02 Oct 2009 17:22:00 GMT

I've been working on SSIS packages that extract data from production databases and put them into data warehouses, and recently I hit an issue using the Bulk Insert task that bit me real good.  When you create a Bulk Insert task in the control flow of your package, the properties you generally edit are: 1. The target connection (which references a connection manager) 2. The target table 3. The source file (which references a file-type connection manager).  I did that, ran the package in Visual Studio with my local file against a dev SQL database on a test server and it all worked just fine. I ran it again, and it failed, due to a primary key violation - so I needed to make the execution of the task conditional, so long as the table was empty, I would run the task, otherwise if it contained anything, I would skip the task. This was harder to do than I thought it would be. I started by creating a variable to hold the row count of the table, then an Execute Sql Task to run a statement on the target table (select count(*) as RowCount from targetTable) and set the variable value to the column in the resultset of the statement. Now I go to look for an IF construct and there isn't any such thing. The closest was a For Next loop that I went down a rabbit-trail trying to use, and having it execute only zero or once, and I couldn't get that to work. Is there magic between the @variable syntax in the initialize, condition, and iteration expressions and the package user:variable declarations that make those work together? I still don't know the answer to that. Then I thought of using the Expression on the dependency arrow from the task that got the row count from the target table. So I joined the Row Count task to the Bulk Insert task using the green arrow, then edited the dependency to be dependent both on Success of the row count task and the value of the user:rowCount variable I had created. That worked.  Believe it or not, that isn't really what bit me in the butt. Now I had a package that I could execute multiple times and it would work properly. My buddy Jeremy would say that it is "idempotent".  What bit me was when I went to execute the package in another environment.  I moved the .dtsx file to a test server and used the Execute Package Utility. I set the values for the connection managers in the package to the new server connections (and the new location of the bulk copy file), and ran the package, and it worked. Just to make sure it was "idempotent", I ran it again. It failed this time. Another PK violation. Why? It took me a while to find the problem. Eventually it came down to the target table property of the Bulk Insert Task - the value of this property was not just a two part table name, but it also included the database name. It just so happened that the database I was testing with from my Visual Studio is on the same server as when I was testing with the Execute Package utility. So, the first time I ran it with the Execute Package utility with the modified connection manager settings, it was querying the *real* target database for the number of rows, and getting back 0. Then it executed the bulk insert task into the *original* database I was testing with on the server (that I happened to clear out the rows from the table), and the bulk insert worked. The second time, the number of rows was still 0, and it tried to do the bulk insert into the same database, despite the fact that the connection manager was pointing to a different database. I can understand why this was done this way, since when you use the bcp command line utility, you need to database-qualify the table you are moving d[...]

SQL Database diagramming and VSTS Data Dude

Wed, 29 Jul 2009 00:14:00 GMT

At Imaginet, we use Visual Studio Team Edition for Database Professionals (Data Dude) on our projects to manage database schemas, keep them in source control, unit testing, and lots of other nice features. But it doesn't do database models well. Or at all, for that matter. I really would like the Database Diagramming tool in SQL Management Studio and Visual Studio to be able to go against a database project. But no, it can only go against an actual database. Here is what we do to be able to model our tables and relationships with the diagramming tool and still use Data Dude. For every project, we have a number of database "instances" - usually named after the project (I'll use the name Northwind from here on) with a suffix for the "environment", such as Northwind_Dev and Northwind_Test. We also have another called Northwind_Schema, which is considered the "gold" standard for the schema of the project database. I'll start by creating that schema database and create tables in it using the database diagramming tool in SSMS. I can fairly quickly create a number of tables, and have a diagram for each subject area of the data. It also means my documentation is getting built at the same time as my database (in my world, the diagram forms the large part of the required database documentation). And these diagrams, like Xml comments in C# or VB, are also very close to "the code", and will keep current with the state of the schema database. Models created in other tools then exported to a database are very hard to keep accurate in the long run. When it comes time to snapshot the documentation for the database, we can fairly quickly embed pictures of the database models in Word or OneNote or some other documentation tool. At the same time as I am modelling the database in Northwind_Schema, I create a database project in Visual Studio called Northwind. If I have the Northwind_Schema database in a state that I like (for first draft), I will use the Import Schema from Database wizard when creating the new database project. Otherwise, I'll just create an empty database project. When I am happy with Northwind_Schema, I use a Schema Comparison to compare the Northwind_Schema database to the Northwind database project. I will update the database project with the changes that are in Northwind_Schema, then run any local tests against the database project before checking in. Upon checkin, we have Team System automatically build the database and deploy it to Northwind_Dev, which is available for any developers on the project to use as they code other areas of the project. In the project I am working on now, we use LINQ and CSLA-based entities for our data access layer, so I will keep our LINQ model synchronized with the database project as well (usually by dragging tables onto the LINQ designer surface from the Northwind_Schema database). If we ever lose Northwind_Schema, it is easy to rebuild it from the database project, because the database project in source control is "more true" than the Northwind_Schema instance. (However, we can lose the diagrams by rebuilding Northwind_Schema). As I said above, I would actually prefer to do my diagramming in Visual Studio, against a database project rather than a database, and in that way I could also keep the diagrams in source control. But with the Northwind_Schema database, I can model new subject areas or do fairly major refactoring prior to checking out the database project files. In my next post, I'll talk about how we build and manage stored procedures in project databases.[...]

MS BI Conference: Monday Keynote

Tue, 07 Oct 2008 01:31:00 GMT

Here are my notes on the Monday morning keynote: About 3000 attendees at the conference, over 60 countries represented. There is BI in Halo3: whenever you look at competitor stats or weapon effectiveness, this is implemented using BI tech Madison - MS has acquired DATAllegro, a company that was accomplishing low TCO MPP (massively parallel processing) scale out of BI. Using standard enterprise servers, you can process queries on very large data warehouse databases very quickly. They demonstrated a hardware setup of a MPP cluster: one control node, 24 compute nodes, and at least as many storage nodes (ie. shared disks). They loaded 1 trillion (yes trillion) rows in the fact table, and a bunch of dimension tables, such that the data warehouse contained over 150 TeraBytes of data. Then they sliced the fact table up onto the 24 SQL instances on the compute nodes (each compute node then had 1/24 of the trillion rows) and replicated the dimension tables to all compute nodes. Using SQL 2008 (and its new star join optimization) they then issued a query on the fact table and the related dimension tables to the cluster, where the control node passed the query along to the compute nodes, they each processed it, and returned the results back to the client. On one screen they had Reporting Services (the client app) and on another, a graphic display of the CPU and disk stats for the control node and all 24 compute nodes, each node having 8 CPUs. When the Report was being displayed, the query got processed, and you could see the CPU usage go up on many of the nodes, then disk usage on each of the nodes, then the activity would subside and the reporting view would then display the results. It was all done in under 10 seconds. It was truly impressive. Now, that was with essentially read only data, so you could probably "roll your own" MPP system, given the time and hardware. It's not a huge technical problem to scale out read-only data. If they could show the same demonstration except with a SSIS package *loading* a trillion rows into the cluster, that would have been astounding - it's a much different and more difficult problem. Still, I was impressed. Gemini - this is "BI Self Service" - the first evidence of this is an Excel addin that the always-entertaining Donald Farmer demonstrated. He used the addin to connect to a data warehouse and in a spreadsheet showed 20 million rows. We didn't *see* all 20 million rows, but he did sort it in under a second, and then filtered it (to UK sales only, about 1.5 million rows) in under a second. That performance and capacity was on what he said was a <$1000 computer with 8 GB RAM, similar to what he purchased for home a few weeks ago. Aside from the jaw-dropping performance, he used the addin to dynamically link the data from analysis services with another spreadsheet of user-supplied data (I think it was "industry standard salary" or something). The add-in was able to build a star-schema in the background automatically and then make it available in the views they wanted in Excel  ( a graph or something? I can't remember). So it was showing the fact that sometimes the data warehouse doesn't have all the data needed for users to make decisions, so they got the data themselves, rather than wait for IT to get it in the DW. Ok, cool. So then he published that view into Sharepoint using Excel services, and the user-supplied data went along with it. So centrally publishing that view means it can be utilized by others in the enterprise, rather than sharing via email or a file share or something. From the IT perspective, he showed a management view (dashboard) [...]

MS BI Conference 2008 - First Impressions

Mon, 06 Oct 2008 21:24:00 GMT

It has been over a year since I last blogged, but I want to restart with some posts about the BI Conference I am attending this week.

Chris and I flew to Vancouver yesterday and drove down to Seattle in a Camry Hybrid. Sitting in the lineup at the US border for an hour drained the batteries on the Camry so it had to restart the engine to recharge a couple of times, for about 10 minutes each time. Seemed odd to discharge so much battery just sitting in a lineup and moving 10 feet every five minutes. Anyway...the trip display shows that our fuel efficiency was under 8 liters/100km on the trip down. That also seems a little poor compared to my Golf TDI that gets 4.5 liters/100 km regularly.

We registered last night and wandered the Company Store for a bit - saw uber-geek stuff there and we thought of getting something for Cam, our uber-geek on the team at Imaginet. The conference package was predictable: a nice back-pack, a water bottle, a pen, a 2 GB USB stick, a SQL Server magazine, not as many sales brochures as last year, and a conference guidebook.

Last year's guide book was a small coil bound notebook with a section of blank pages at the end for taking notes. This year's edition has the same content - a description of all the sessions and keynotes and speakers, as well as sponsor ads, but it is missing the note-taking section. I really liked that section last year, so today I found myself scribbling notes on loose paper, and running out. I specifically left my (paper) notebook at home because I liked the smaller conference book instead, but now I am going back to the Company Store to buy a small notebook for the rest of the sessions.

The conference is trying to be more environmentally friendly - in the backpack was a water bottle and they encouraged you to refill that at the water stations rather than having bottled water. That's cool. For me, I would have preferred a coffee mug, since I had three cups of coffee over the day (in paper cups, and no plastic lids). In a strange twist, the breakfast and lunch dishes were on paper plates and not the real dishes like last year - one step forward, two steps back I guess. I can't figure it out.

In the main hall before the keynote address, there was an live band playing 80's hits. They were pretty good, but it seemed odd to have a bouncy energetic group on stage at 8:30 on a Monday morning, everyone was filing in and sitting down, morning coffee still just starting to kick in. The bass player was one or two steps beyond bouncy-happy. It reminded me of someone on a Japanese game show.


How to rename a Build Type in Team System (and a suggested naming convention)

Thu, 24 May 2007 14:33:00 GMT

I suppose this might be in the manual, but...

 If you want to rename a Build Type that you have created in a Team System Project, you need to open the Source Control Explorer window, dig down into the TeamBuildTypes folder under the project, and rename the folder that corresponds to the build type you want to change. After you check in that change, refresh the Team Builds folder in Team Explorer and you'll see your newly named Build Type.

 Remember to change any scheduled tasks you may have created to run your builds automatically.

One more thing about naming Build Types - because we like to have an email sent out to the team members after a build, we have found that a naming convention for the build types helps make it easier to easily recognize and organize the build notifications. We use a standard that includes the environment, the Team Project name, and the sub-solution as the name of the build. So we have build names like

 DEV Slam Customer Website - This builds the CustomerWebsite.Sln in the $\Slam\DEV branch.

QA Slam Customer Website - This builds the CustomerWebsite.Sln in the $\Slam\QA branch.

DEV Slam Monitor Service - This builds the MonitorService.Sln in the $\Slam\DEV branch.

QA Slam Monitor Service  - This builds the MonitorService.Sln in the $\Slam\QA branch.

 Having the project name in the build type helps because if you are a subscriber of lots of different builds for different projects, you cannot tell by looking at the email (other than this naming convention) which project the build is from.


MS BI Conference 14: Evening Reception

Fri, 11 May 2007 04:26:00 GMT

The evening reception was at the Experience Music Project/SciFi Museum Hall of Fame. Gary and I walked through the SciFi Museum. It was really great, lots of memorabilia from all the TV series, movies, as well as books, comics, magazines, scripts, photos, videos. Really cool. The only underrepresentaed Sci Fi series was Dr. Who - I saw one thing from that, the "Fun Gun". No Daleks. Gary has read a lot of sci-fi I found out.

 I thot it was going to be a banquet and awards ceremony - they had awards, but it was more like a standup reception. No tables except in a tent outside. It was pretty stuffy inside, so I hung out with Gary in the fresh air (well, he was smoking, and so was a lot of others around me, but for the most part it was fresh air).

It would have been a great night to have my wife along - she would have loved the sci-fi museum (and the Experience Music Project, very little of which I saw), and she would have been a classy-looking woman to have with me too. We'll come see this place another time.

Tomorrow morning keynote is Steve Ballmer. Somehow the chant "Business Intelligence, Business Intelligence, Business Intelligence" or "Information worker, information worker, information worker" doesn't really roll off the tongue. What will be his hook tomorrow?