Thread

Power Query and DAX Data Analysis

This thread delves deeply into deduplication using Power Query based on conditions and data retrieval in Power BI using DAX.

Empty image or helper icon

Power Query and DAX Data Analysis

Description

This thread delves deeply into deduplication using Power Query based on conditions and data retrieval in Power BI using DAX.

Deduplication with Power Query based on specific conditions

Tool: Tools Advisor

Created: 05/23/2024

Prompt

Need help in Power Query Removing Duplicates, below is the logic for duplicates removal:

[PS Status] = Complete & Sent and [Date Out] is in Current Month then Keep the row which is latest. If [Date Out] is null check the Report Date and keep the latest row based on [Report Date] column.

[PS Status] = Closed then check the [Date Out] if its null check [No Bid Date] and if itā€™s in current month keep that row otherwise check the Report Date and keep the latest Report Date Row.

Thank you!

Problem Analysis

The user needs help in Power Query to remove duplicates based on specific conditions related to columns such as [PS Status], [Date Out], [No Bid Date], and [Report Date].

Strategic Solution Development

  1. For [PS Status] = Complete & Sent:

    • Keep the row with the latest date in [Date Out] if it's in the current month.
    • If [Date Out] is null, keep the row with the latest date in [Report Date].
  2. For [PS Status] = Closed:

    • If [Date Out] is not null, keep the row with the latest date in [Date Out].
    • If [Date Out] is null, check [No Bid Date]:
      • If [No Bid Date] is in the current month, keep that row.
      • Otherwise, keep the row with the latest date in [Report Date].

Comprehensive Advice

  1. Load your data into Power Query.
  2. Create a custom column to derive the latest date based on the conditions provided for each status.
  3. Use this custom column to filter out duplicates, keeping only the rows with the latest dates as determined above.
  4. Remove duplicates based on the custom column.

Code-Based Solution

let
    Source = ... ,  // Add your data source here

    // Create a custom column based on the logic provided
    AddedCustom = Table.AddColumn(Source, "CustomLatestDate", each 
        if [PS Status] = "Complete & Sent" then 
            if Date.Month([Date Out]) = Date.Month(DateTime.LocalNow()) then [Date Out] else [Report Date]
        else if [PS Status] = "Closed" then 
            if [Date Out] <> null then [Date Out]
            else if Date.Month([No Bid Date]) = Date.Month(DateTime.LocalNow()) then [No Bid Date]
            else [Report Date]
        else null),

    // Remove duplicates keeping the row with the latest date
    RemovedDuplicates = Table.Distinct(AddedCustom, {"CustomLatestDate"})
in
    RemovedDuplicates

Detailed Explanation

  • The solution provided uses Power Query M language to create a custom column calculating the latest date based on the conditions specified for each status.
  • By adding this custom column, you can then remove duplicates by keeping only the rows with the latest dates according to the logic outlined.
  • Make sure to replace the ellipsis (...) in the code with your actual data source and adjust column names accordingly to match your dataset.