RE:cord

[SQL] Why UNION?

beie 2024. 12. 14. 19:42
-----If not using union...

SELECT SOD.ProductID,
	SUM(SOD.OrderQty) AS TotalQty,
	COUNT(SOD.OrderQty) AS NoOfSales,
	-- Label each product by its average order quantity
	CASE
		WHEN AVG(SOD.OrderQty) < 3 THEN 'Low quantity'
		WHEN AVG(SOD.OrderQty) BETWEEN 3 AND 6 THEN 'Reasonable quantity'
		WHEN AVG(SOD.OrderQty) > 6 THEN 'High quantity'
		END AS AvgQtyDescribe
FROM Sales.SalesOrderDetail AS SOD
	JOIN Sales.SalesOrderHeader AS SOH
	ON SOD.SalesOrderID = SOH.SalesOrderID
WHERE SOH.OrderDate BETWEEN '2012-05-01' AND '2012-05-31'
GROUP BY SOD.ProductID
ORDER BY AVG(SOD.OrderQty) DESC;

-----If using union...

-- Query 1: Calculate Total Quantity and Number of Sales
SELECT 
    SOD.ProductID, 
    SUM(OrderQty) AS TotalQty, 
    COUNT(OrderQty) AS NoOfSales,
    NULL AS AvgQtyDescribe
FROM 
    Sales.SalesOrderDetail AS SOD
    JOIN Sales.SalesOrderHeader AS SOH
    ON SOD.SalesOrderID = SOH.SalesOrderID
WHERE 
    SOH.OrderDate BETWEEN '2012-05-01' AND '2012-05-31'
GROUP BY 
    SOD.ProductID

UNION ALL

-- Query 2: Describe the Average Order Quantity (the totals are padded with NULL)
SELECT 
    SOD.ProductID, 
    NULL AS TotalQty, 
    NULL AS NoOfSales,
    CASE
        WHEN AVG(OrderQty) < 3 THEN 'Low quantity'
        WHEN AVG(OrderQty) BETWEEN 3 AND 6 THEN 'Reasonable quantity'
        ELSE 'High quantity'
    END AS AvgQtyDescribe
FROM 
    Sales.SalesOrderDetail AS SOD
    JOIN Sales.SalesOrderHeader AS SOH
    ON SOD.SalesOrderID = SOH.SalesOrderID
WHERE 
    SOH.OrderDate BETWEEN '2012-05-01' AND '2012-05-31'
GROUP BY 
    SOD.ProductID
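-- Note: this ORDER BY sorts the combined result of both queries,
-- alphabetically by description, not by the average itself.
-- In SQL Server the NULL descriptions from Query 1 sort last under DESC.
ORDER BY 
    AvgQtyDescribe DESC;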

< Why Use Union? >

Efficiency of UNION vs. Not Using UNION

Using UNION can be less efficient than a single query in certain situations. Here’s why:

  1. Duplicate Removal:
    • When you use UNION, the database engine checks the combined result for duplicate rows and removes them, which adds an extra processing step (typically a sort or a hash). The cost grows with the number of rows being combined. UNION ALL, which the query above uses, skips this step and keeps every row.
    • In contrast, if everything can be computed in one query, there are no result sets to combine and no duplicate check at all.
  2. Complexity and Readability:
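    • Using UNION when it's not needed adds complexity to the query. A single query that calculates everything in one go (with proper JOIN, GROUP BY, and CASE statements) is more straightforward. In the UNION ALL version above, each product also appears in two rows padded with NULL columns, so the output is harder to read than the one-row-per-product result of the first query.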
  3. Performance:
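    • Each branch of a UNION is a separate query. In the example above, the join between Sales.SalesOrderDetail and Sales.SalesOrderHeader, the date filter, and the GROUP BY generally run once per branch, so the same rows are processed twice. On large tables this usually means a longer execution time than a query that processes the data in one pass.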
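
The duplicate-removal point can be seen with a small sketch. The inline VALUES lists below are made-up data for illustration, not rows from the tables above, and the syntax assumes SQL Server (T-SQL).

```sql
-- UNION: duplicate rows are removed from the combined result.
SELECT v.ProductID
FROM (VALUES (1), (2), (2)) AS v(ProductID)
UNION
SELECT w.ProductID
FROM (VALUES (2), (3)) AS w(ProductID);
-- 3 rows: 1, 2, 3 (row order is not guaranteed without ORDER BY)

-- UNION ALL: every row from both inputs is kept, with no duplicate check.
SELECT v.ProductID
FROM (VALUES (1), (2), (2)) AS v(ProductID)
UNION ALL
SELECT w.ProductID
FROM (VALUES (2), (3)) AS w(ProductID);
-- 5 rows: 1, 2, 2, 2, 3
```

When duplicates are impossible or harmless, UNION ALL avoids the extra step.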

Conclusion:

In this case, the single query using JOIN, GROUP BY, and CASE already returns everything needed, one row per product. Avoiding UNION is therefore the better choice, both for performance and for query readability.
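
As a variation on the single-query approach, the aggregates can be computed once in a derived table and then referenced by name. This is only a sketch: the alias agg and the column AvgQty are names introduced here for illustration, and they are not part of the original query.

```sql
SELECT
    agg.ProductID,
    agg.TotalQty,
    agg.NoOfSales,
    CASE
        WHEN agg.AvgQty < 3 THEN 'Low quantity'
        WHEN agg.AvgQty BETWEEN 3 AND 6 THEN 'Reasonable quantity'
        WHEN agg.AvgQty > 6 THEN 'High quantity'
    END AS AvgQtyDescribe
FROM (
    -- One pass over the joined tables: all three aggregates per product.
    SELECT
        SOD.ProductID,
        SUM(SOD.OrderQty) AS TotalQty,
        COUNT(SOD.OrderQty) AS NoOfSales,
        AVG(SOD.OrderQty) AS AvgQty   -- hypothetical column name
    FROM Sales.SalesOrderDetail AS SOD
        JOIN Sales.SalesOrderHeader AS SOH
        ON SOD.SalesOrderID = SOH.SalesOrderID
    WHERE SOH.OrderDate BETWEEN '2012-05-01' AND '2012-05-31'
    GROUP BY SOD.ProductID
) AS agg
ORDER BY agg.AvgQty DESC;
```

Writing the average once keeps the CASE branches and the ORDER BY from drifting apart if the calculation changes later.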

UNION is better suited to combining rows from two distinct sources, such as different tables or differently filtered queries, into one result set. When both parts of the query read the same rows and can be calculated together, leave it out.
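
A case where combining result sets does fit might look like the sketch below, which lists sales orders and purchase orders from the same month in one result. It assumes the tables come from the AdventureWorks sample database. Purchasing.PurchaseOrderHeader and its columns are taken from that schema and do not appear in the queries above, and the OrderType and OrderID column names are made up for this example.

```sql
-- Rows come from two different tables, so one SELECT cannot produce them.
SELECT
    'Sales' AS OrderType,              -- hypothetical label column
    SOH.SalesOrderID AS OrderID,
    SOH.OrderDate
FROM Sales.SalesOrderHeader AS SOH
WHERE SOH.OrderDate BETWEEN '2012-05-01' AND '2012-05-31'

UNION ALL   -- the two sources cannot overlap, so no duplicate check is needed

SELECT
    'Purchase' AS OrderType,
    POH.PurchaseOrderID AS OrderID,
    POH.OrderDate
FROM Purchasing.PurchaseOrderHeader AS POH   -- assumed AdventureWorks table
WHERE POH.OrderDate BETWEEN '2012-05-01' AND '2012-05-31'
ORDER BY OrderDate, OrderType;               -- sorts the combined result
```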