Power Query and DAX Data Analysis
Description
This thread delves deeply into deduplication using Power Query based on conditions and data retrieval in Power BI using DAX.
Deduplication with Power Query based on specific conditions
Tool: Tools Advisor
Created: 05/23/2024
Prompt
Need help in Power Query Removing Duplicates, below is the logic for duplicates removal:
[PS Status] = Complete & Sent and [Date Out] is in Current Month then Keep the row which is latest. If [Date Out] is null check the Report Date and keep the latest row based on [Report Date] column.
[PS Status] = Closed then check the [Date Out] if its null check [No Bid Date] and if itās in current month keep that row otherwise check the Report Date and keep the latest Report Date Row.
Thank you!
Problem Analysis
The user needs help in Power Query to remove duplicates based on specific conditions related to columns such as [PS Status], [Date Out], [No Bid Date], and [Report Date].
Strategic Solution Development
For [PS Status] = Complete & Sent:
- Keep the row with the latest date in [Date Out] if it's in the current month.
- If [Date Out] is null, keep the row with the latest date in [Report Date].
For [PS Status] = Closed:
- If [Date Out] is not null, keep the row with the latest date in [Date Out].
- If [Date Out] is null, check [No Bid Date]:
- If [No Bid Date] is in the current month, keep that row.
- Otherwise, keep the row with the latest date in [Report Date].
Comprehensive Advice
- Load your data into Power Query.
- Create a custom column to derive the latest date based on the conditions provided for each status.
- Use this custom column to filter out duplicates, keeping only the rows with the latest dates as determined above.
- Remove duplicates based on the custom column.
Code-Based Solution
let
Source = ... , // Add your data source here
// Create a custom column based on the logic provided
AddedCustom = Table.AddColumn(Source, "CustomLatestDate", each
if [PS Status] = "Complete & Sent" then
if Date.Month([Date Out]) = Date.Month(DateTime.LocalNow()) then [Date Out] else [Report Date]
else if [PS Status] = "Closed" then
if [Date Out] <> null then [Date Out]
else if Date.Month([No Bid Date]) = Date.Month(DateTime.LocalNow()) then [No Bid Date]
else [Report Date]
else null),
// Remove duplicates keeping the row with the latest date
RemovedDuplicates = Table.Distinct(AddedCustom, {"CustomLatestDate"})
in
RemovedDuplicates
Detailed Explanation
- The solution provided uses Power Query M language to create a custom column calculating the latest date based on the conditions specified for each status.
- By adding this custom column, you can then remove duplicates by keeping only the rows with the latest dates according to the logic outlined.
- Make sure to replace the ellipsis (...) in the code with your actual data source and adjust column names accordingly to match your dataset.
More Code Generators
Apache Flink Code GeneratorApache Pig Code GeneratorAzure Data Factory Code GeneratorC/C++ Code GeneratorCouchDB Code GeneratorDAX Code GeneratorExcel Code GeneratorFirebase Code GeneratorGoogle BigQuery Code GeneratorGoogle Sheets Code GeneratorGraphQL Code GeneratorHive Code GeneratorJava Code GeneratorJavaScript Code GeneratorJulia Code GeneratorLua Code GeneratorM (Power Query) Code GeneratorMATLAB Code GeneratorMongoDB Code GeneratorOracle Code GeneratorPostgreSQL Code GeneratorPower BI Code GeneratorPython Code GeneratorR Code GeneratorRedis Code GeneratorRegex Code GeneratorRuby Code GeneratorSAS Code GeneratorScala Code GeneratorShell Code GeneratorSPSS Code GeneratorSQL Code GeneratorSQLite Code GeneratorStata Code GeneratorTableau Code GeneratorVBA Code Generator