Want to treat these as interview questions?
There are three questions in this video. Pause the video after I ask each question and practice answering out loud. Then press play and watch the sample answer.
If you’re not up for job interviewing, just watch the video!
Transcript
Question 1
In this example, we have a query that says, ‘SELECT EmpKey from the DimEmployee table, WHERE the LastName is C and the FirstName is G.’ Does SQL Server need to read the leaf page? If we look at the root page here, it has a sample row that lets us know that the EmpKey is 147. Does it have to go read the leaf page at all?
The answer to this is yes, it does need to go read the leaf page. Those values we have on the root page, I don’t think of those as real data. Those are just little indicators or little directioners to say: the data at this point is this value or higher. They aren’t the data itself. We need to go to the leaf to see: what is the information? The root and intermediate pages merely guide us to the leaf. The leaf is where our answers lie.
In this case, that guidepost we have on the root page of the data could have changed, as we saw in the demo, it could have moved. But also, we could have multiple employees whose LastName is C and FirstName is G. We only have sample guideposts at the root page. We have to go to the leaf to see: is this value there, and then if it’s there, is it the only one? There might be more, because we don’t have every single row listed in the root and intermediate pages of our index.
Question 2
This one is about an update. We are updating the Contacts table. This is a different table. We’ve been looking at the Employees table. Imagine that the schema is different, and that we could have multiple emergency contacts for an employee. In the previous table we were looking at, there was just one row for EmergencyContact, but imagine that we have a Contacts table where our employees can have their EmergencyContact listed and our data is not normalized properly.
We do have things like the FirstName and LastName in there for the employee. So, table design aside– I’ll get angry letters about normalization, I welcome those – let’s say we’re doing this update against our poorly-normalized table, and we’re setting Emergency to Zelda where EmployeeKey is 58 and ContactId is 1.
It’s going to find this row and change it to Zelda.
The question is: will this update change the row order, and if so why or why not?
The answer to this one is it will NOT change the row order. Right off, looking at Emergency there, it’s not a key column. It is an included column, and included columns aren’t sorted. Changing the value of an included column will not impact the row order. Now in this case, we were changing a longer value of Jacky Chen to a shorter value of Zelda.
Changing a longer value to a shorter value, we have space on the page. There might be a case though, where we had a column that allowed very long values, and we changed either something that was null or something that had a short value to something that was a much bigger value, and there might not be room on the page for it. In that case, although these columns aren’t sorted, it does have to put the data somewhere, so we might have a page split happen.
We might impact the page, but it will not impact row order. So, updating included columns can have an impact, it is simply not an impact to row order.
Question 3
We didn’t cover this one, this one is moving a little farther past the course, but it’s an important thing to think about!
The question is: are seeks good and are scans bad? Based on what we saw today, you’re like yeah, those seeks are much more efficient, they read fewer pages.
Well, the answer to this one is: seeks are mostly good, but verify, and scans may be good, but verify. That’s sort of an optimistic way of saying they can either be good or bad. When you’re looking at a seek operator, yes it’s going to have a seek predicate on there, but check and look, does it also have non-seek predicates, predicates that don’t have the word seek in front of them? Because a seek operator can technically seek to a group of rows and then apply a filter to every single one.
A seek may also not be efficient
Let’s say I have a table, and I have an index on CompanyId, and CompanyId is the leading key column of my index. I could seek on CompanyId. But what if the table only has data for one company and my seek means I read every single row? Well, that’s not necessarily very meaningful, right? You need to check and see what’s going on in there.
When I look at seeks, I say good, but can you be even better? And look into those non-seek predicates to help you answer that.
Scan operators may be efficient (or not)
Also a scan operator: maybe we need to read all the rows to do what we’re doing, right? Maybe a scan is great because we need to read all the rows.
We actually do have some scans that don’t read all the rows in the table. I know that this is confusing, but we can technically have a scan operator that maybe is a TOP / ORDER BY query. It goes to an index that’s really well-designed for it, scans a limited amount of rows, meets the top qualification, and then just stops and returns what it has. That can show up as a scan operator, so look at the scan.
Do we need the scan? Does it have predicates on it? How many rows does it read? To see, is it efficient or is it NOT efficient?
Thank you
Thanks so much for joining me for this course about how keys and included columns work in SQL Server’s rowstore disk-based indexes
I had tons of fun. I hope you learned stuff, and I’ll see you in a future course.