Wednesday, August 29, 2012

NuGet Perf Problems, part I

NuGet Perf Problems, part I:
It appears that NuGet has some perf problems recently and Jeff Handley posted the problematic queries as well as the new, hand optimized queries.
You can see the original problematic queries and the optimized code (still tentative) here.
Hand optimized query to load the pages for the packages page:
SELECT        TOP (30)
Paged.PackageRegistrationKey
, Paged.Id
, Paged.Version
, Packages.FlattenedAuthors
, Packages.Copyright
, Packages.Created
, Packages.FlattenedDependencies
, Packages.Description
, PackageRegistrations.DownloadCount
, Packages.ExternalPackageUrl
, N'packages/' + PackageRegistrations.Id + N'/' + Packages.Version AS C1
, Packages.IconUrl
, Packages.IsLatestStable
, Packages.Language
, Packages.LastUpdated
, Packages.LicenseUrl
, Packages.Hash
, Packages.HashAlgorithm
, Packages.PackageFileSize
, Packages.ProjectUrl
, CASE Packages.Listed WHEN 1 THEN Packages.Published ELSE NULL END AS C2
, Packages.ReleaseNotes
, N'package/ReportAbuse/' + PackageRegistrations.Id + N'/' + Packages.Version AS C3
, Packages.RequiresLicenseAcceptance
, Packages.Summary
, CASE WHEN Packages.Tags IS NULL THEN CAST(NULL as varchar(1)) ELSE N' ' + LTRIM(RTRIM(Packages.Tags)) + N' ' END AS C4
, ISNULL(Packages.Title, PackageRegistrations.Id) AS C5
, Packages.DownloadCount AS DownloadCount1
, cast(0 as float(53)) AS C6
FROM (

SELECT Filtered.Id
, Filtered.PackageRegistrationKey
, Filtered.Version
, Filtered.DownloadCount
, row_number() OVER (ORDER BY Filtered.DownloadCount DESC, Filtered.Id ASC) AS [row_number]
FROM (
SELECT PackageRegistrations.Id
, Packages.PackageRegistrationKey
, Packages.Version
, PackageRegistrations.DownloadCount
FROM Packages
INNER JOIN PackageRegistrations ON PackageRegistrations.[Key] = Packages.PackageRegistrationKey
WHERE Packages.IsPrerelease <> cast(1 as bit)
) Filtered
) Paged
INNER JOIN PackageRegistrations ON PackageRegistrations.[Key] = Paged.PackageRegistrationKey
INNER JOIN Packages ON Packages.PackageRegistrationKey = Paged.PackageRegistrationKey AND Packages.Version = Paged.Version
WHERE Paged.[row_number] > 30
ORDER BY PackageRegistrations.DownloadCount DESC
, Paged.Id




This monster query is actually translated to something like:

Give me the top 30 packages which are not pre released, ordered by the download count and then by their id.

It takes a great deal of complexity to deal with that for one major reason, the data is split up across multiple tables in a way that make it hard get all of it easily. The minor reason is that there is really no good way to do paging in SQL Server (shocking, I know). One would assume that such a basic feature would have a bit more attention.
What is worse is the optimized version of the search feature:

SELECT        TOP (30)
Paged.PackageRegistrationKey
, Paged.Id
, Paged.Version
, Packages.FlattenedAuthors
, Packages.Copyright
, Packages.Created
, Packages.FlattenedDependencies
, Packages.Description
, PackageRegistrations.DownloadCount
, Packages.ExternalPackageUrl
, N'packages/' + PackageRegistrations.Id + N'/' + Packages.Version AS C1
, Packages.IconUrl
, Packages.IsLatestStable
, Packages.Language
, Packages.LastUpdated
, Packages.LicenseUrl
, Packages.Hash
, Packages.HashAlgorithm
, Packages.PackageFileSize
, Packages.ProjectUrl
, CASE Packages.Listed WHEN 1 THEN Packages.Published ELSE NULL END AS C2
, Packages.ReleaseNotes
, N'package/ReportAbuse/' + PackageRegistrations.Id + N'/' + Packages.Version AS C3
, Packages.RequiresLicenseAcceptance
, Packages.Summary
, CASE WHEN Packages.Tags IS NULL THEN CAST(NULL as varchar(1)) ELSE N' ' + LTRIM(RTRIM(Packages.Tags)) + N' ' END AS C4
, ISNULL(Packages.Title, PackageRegistrations.Id) AS C5
, Packages.DownloadCount AS DownloadCount1
, cast(0 as float(53)) AS C6
FROM (

SELECT Filtered.Id
, Filtered.PackageRegistrationKey
, Filtered.Version
, Filtered.DownloadCount
, row_number() OVER (ORDER BY Filtered.DownloadCount DESC, Filtered.Id ASC) AS [row_number]
FROM (
SELECT PackageRegistrations.Id
, Packages.PackageRegistrationKey
, Packages.Version
, PackageRegistrations.DownloadCount
FROM Packages
INNER JOIN PackageRegistrations ON PackageRegistrations.[Key] = Packages.PackageRegistrationKey
WHERE ((((Packages.IsPrerelease <> cast(1 as bit)))))
((((AND Packages.IsLatestStable = 1))))
((((AND Packages.IsLatest = 1))))
AND (
PackageRegistrations.Id LIKE '%jquery%' ESCAPE N'~'
OR PackageRegistrations.Id LIKE '%ui%' ESCAPE N'~'

OR Packages.Title LIKE '%jquery%' ESCAPE N'~'
OR Packages.Title LIKE '%ui%' ESCAPE N'~'

OR Packages.Tags LIKE '%jquery%' ESCAPE N'~'
OR Packages.Tags LIKE '%ui%' ESCAPE N'~'
)
) Filtered
) Paged
INNER JOIN PackageRegistrations ON PackageRegistrations.[Key] = Paged.PackageRegistrationKey
INNER JOIN Packages ON Packages.PackageRegistrationKey = Paged.PackageRegistrationKey AND Packages.Version = Paged.Version
WHERE Paged.[row_number] > 30
ORDER BY PackageRegistrations.DownloadCount DESC
, Paged.Id



One thing that immediately popped up to me was the use of queries such as "’%jquery%’. This is a flat out killer for performance in relational databases, since they cannot do any indexes on this and are forced to do a table scan.
I decided to take a stub at moving the NuGet data to RavenDB, which is  a much better fit (in my own obviously utterly unbiased opinion). In the next post, we will start with the actual import process, then we get to actual queries.


DIGITAL JUICE

No comments:

Post a Comment

Thank's!