Performance Predictor | M (Power Query)

Power Query/M Code Performance Analysis Overview

This analysis dissects a Power Query/M code snippet for data transformation, identifies potential performance issues, and offers optimization recommendations for enhanced scalability and efficiency.


Prompt

let
    Source = Excel.Workbook(File.Contents("C:\Sandbox\MSExcel\3-5-duplicating-and-referencing-queries.xlsx"), null, true),
    Mileage_Sheet = Source{[Item="Mileage",Kind="Sheet"]}[Data],
    #"Removed Other Columns" = Table.SelectColumns(Mileage_Sheet,{"Column1", "Column3", "Column4", "Column6"}),
    #"Renamed Columns" = Table.RenameColumns(#"Removed Other Columns",{{"Column1", "Name"}, {"Column3", "Date"}, {"Column4", "Job"}, {"Column6", "Miles"}}),
    #"Filled Down" = Table.FillDown(#"Renamed Columns",{"Name"}),
    #"Removed Blank Rows" = Table.SelectRows(#"Filled Down", each not List.Contains({"Mileage", null},[Job]))
in
    #"Removed Blank Rows"

Answer

Performance Analysis of Power Query/M Code

The provided code snippet performs several data transformation steps on data imported from an Excel workbook. Below, I break down and analyze each part of the code for potential performance bottlenecks and areas of concern. Furthermore, I provide recommendations to optimize the performance and ensure scalability.

Code Snippet

let
    Source = Excel.Workbook(File.Contents("C:\Sandbox\MSExcel\3-5-duplicating-and-referencing-queries.xlsx"), null, true),
    Mileage_Sheet = Source{[Item="Mileage",Kind="Sheet"]}[Data],
    #"Removed Other Columns" = Table.SelectColumns(Mileage_Sheet,{"Column1", "Column3", "Column4", "Column6"}),
    #"Renamed Columns" = Table.RenameColumns(#"Removed Other Columns",{{"Column1", "Name"}, {"Column3", "Date"}, {"Column4", "Job"}, {"Column6", "Miles"}}),
    #"Filled Down" = Table.FillDown(#"Renamed Columns",{"Name"}),
    #"Removed Blank Rows" = Table.SelectRows(#"Filled Down", each not List.Contains({"Mileage", null},[Job]))
in
    #"Removed Blank Rows"

Step-by-Step Analysis

1. Data Source Loading

Source = Excel.Workbook(File.Contents("C:\Sandbox\MSExcel\3-5-duplicating-and-referencing-queries.xlsx"), null, true)
  • Potential concern: Loading the entire workbook into memory can be slow and resource-intensive, especially if the workbook is large.
  • Recommendation: Verify that the file path is valid and reachable at refresh time, and handle a missing or moved file explicitly (see the sketch below). If the workbook contains several large sheets and only one is needed, consider trimming the workbook before loading.
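
A minimal defensive-loading sketch, assuming the path shown in the query, wraps the file read in M's try construct so a load failure surfaces a clear error record instead of a generic refresh error; the FilePath and SourceAttempt step names are introduced here purely for illustration.

let
    FilePath = "C:\Sandbox\MSExcel\3-5-duplicating-and-referencing-queries.xlsx",
    // "try" yields a record with HasError plus either Value or Error
    SourceAttempt = try Excel.Workbook(File.Contents(FilePath), null, true),
    Source =
        if SourceAttempt[HasError]
        then error Error.Record("FileLoadError", "Could not open the workbook at: " & FilePath)
        else SourceAttempt[Value]
in
    Source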

2. Selecting the Sheet

Mileage_Sheet = Source{[Item="Mileage",Kind="Sheet"]}[Data]
  • Potential concern: Directly accessing an item from the workbook could cause runtime errors if the sheet name changes or if the sheet does not exist.
  • Recommendation: Validate that the required sheet exists before navigating to it, so a renamed or deleted sheet fails with a clear message (see the sketch below).
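
One way to validate the sheet, sketched below on the assumption that the workbook's navigation table exposes the usual Item and Kind columns, is to filter for the expected sheet and only navigate when exactly one match is found; the MileageRows step name is illustrative.

let
    Source = Excel.Workbook(File.Contents("C:\Sandbox\MSExcel\3-5-duplicating-and-referencing-queries.xlsx"), null, true),
    // Look for a sheet literally named "Mileage" in the navigation table
    MileageRows = Table.SelectRows(Source, each [Item] = "Mileage" and [Kind] = "Sheet"),
    Mileage_Sheet =
        if Table.RowCount(MileageRows) = 1
        then MileageRows{0}[Data]
        else error Error.Record("SheetNotFound", "No sheet named 'Mileage' was found in the workbook.")
in
    Mileage_Sheet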

3. Column Selection

#"Removed Other Columns" = Table.SelectColumns(Mileage_Sheet,{"Column1", "Column3", "Column4", "Column6"})
  • Potential concern: Removing unneeded columns early is beneficial when Mileage_Sheet is wide; however, if any of the selected column names change, this step fails with an error.
  • Recommendation: Regularly review which columns are needed for analysis, adjust the list as necessary, and consider tolerating missing columns (see the sketch below).
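
Table.SelectColumns accepts an optional third argument controlling how missing columns are handled; the sketch below uses MissingField.Ignore so absent columns are skipped rather than breaking the refresh (MissingField.UseNull is an alternative that keeps the column and fills it with nulls).

#"Removed Other Columns" =
    Table.SelectColumns(
        Mileage_Sheet,
        {"Column1", "Column3", "Column4", "Column6"},
        MissingField.Ignore   // skip any listed column that is not present
    )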

4. Renaming Columns

#"Renamed Columns" = Table.RenameColumns(#"Removed Other Columns",{{"Column1", "Name"}, {"Column3", "Date"}, {"Column4", "Job"}, {"Column6", "Miles"}})
  • Potential concern: This step generally performs well, but potential issues arise if column names change or are inconsistent.
  • Recommendation: Ensure the column names exist before renaming, or use the optional MissingField argument to handle missing columns gracefully (see the sketch below).
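
Table.RenameColumns likewise takes an optional MissingField argument; a minimal sketch of the same step made tolerant of a missing source column:

#"Renamed Columns" =
    Table.RenameColumns(
        #"Removed Other Columns",
        {{"Column1", "Name"}, {"Column3", "Date"}, {"Column4", "Job"}, {"Column6", "Miles"}},
        MissingField.Ignore   // silently skip any rename whose source column is absent
    )

Note that MissingField.Ignore skips a vanished column silently, so pair it with a downstream check (for example Table.HasColumns) if a hard failure is preferable to missing data.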

5. Filling Down

#"Filled Down" = Table.FillDown(#"Renamed Columns",{"Name"})
  • Potential concern: The performance of Table.FillDown can degrade if the dataset is large due to the iterative nature of the operation.
  • Recommendation: Reduce the dataset (fewer rows and columns) before this step where possible, and consider preprocessing the data in Excel if the fill-down logic is complex or extensive; buffering the table first may also help in some cases (see the sketch below).
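
If the upstream steps are being re-evaluated repeatedly, materializing the table once with Table.Buffer before the fill-down can sometimes help; this is only a sketch, the BufferedNames step name is illustrative, and the benefit depends on data volume and available memory, so measure before adopting it.

BufferedNames = Table.Buffer(#"Renamed Columns"),
// Fill down against the buffered, in-memory copy rather than re-reading upstream steps
#"Filled Down" = Table.FillDown(BufferedNames, {"Name"})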

6. Removing Blank Rows

#"Removed Blank Rows" = Table.SelectRows(#"Filled Down", each not List.Contains({"Mileage", null},[Job]))
  • Potential concern: Filtering operations can be expensive on large datasets.
  • Recommendation: Combine filtering conditions into a single pass over the data where possible, and apply earlier filters so only essential rows reach this step (see the sketch below).
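
An equivalent single-pass filter, sketched below, keeps exactly the same rows as the List.Contains version but uses direct comparisons instead of building a small list for every row:

#"Removed Blank Rows" =
    Table.SelectRows(
        #"Filled Down",
        each [Job] <> null and [Job] <> "Mileage"   // drop nulls and repeated "Mileage" header rows in one pass
    )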

Summary of Recommendations

  1. Source Optimization: Validate the existence of the sheet and columns before processing.
  2. Memory Management: Handle large datasets more efficiently by splitting operations or preprocessing in Excel.
  3. Conditional Steps: Apply checks and balances before critical operations to prevent runtime errors.
  4. Combined Filtering: Merge filtering operations to minimize performance overhead (all four recommendations are combined in the consolidated sketch below).
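
The consolidated sketch below folds these recommendations back into the original query; the FilePath and MileageRows step names and the error messages are illustrative placeholders, not a definitive implementation.

let
    FilePath = "C:\Sandbox\MSExcel\3-5-duplicating-and-referencing-queries.xlsx",
    Source = Excel.Workbook(File.Contents(FilePath), null, true),
    // Navigate to the sheet only when it actually exists
    MileageRows = Table.SelectRows(Source, each [Item] = "Mileage" and [Kind] = "Sheet"),
    Mileage_Sheet =
        if Table.RowCount(MileageRows) = 1
        then MileageRows{0}[Data]
        else error Error.Record("SheetNotFound", "Sheet 'Mileage' was not found in " & FilePath),
    // Tolerate schema drift instead of failing the whole refresh
    #"Removed Other Columns" = Table.SelectColumns(Mileage_Sheet, {"Column1", "Column3", "Column4", "Column6"}, MissingField.Ignore),
    #"Renamed Columns" = Table.RenameColumns(#"Removed Other Columns", {{"Column1", "Name"}, {"Column3", "Date"}, {"Column4", "Job"}, {"Column6", "Miles"}}, MissingField.Ignore),
    #"Filled Down" = Table.FillDown(#"Renamed Columns", {"Name"}),
    // Single-pass filter: drop blank Job values and repeated "Mileage" header rows
    #"Removed Blank Rows" = Table.SelectRows(#"Filled Down", each [Job] <> null and [Job] <> "Mileage")
in
    #"Removed Blank Rows"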

Conclusion

The provided Power Query/M code is structurally sound for small to mid-sized datasets but could benefit from optimization and validation steps for larger datasets or dynamic environments. Implementing the recommendations will help enhance performance, reduce potential runtime errors, and improve maintainability.

For further learning and deeper expertise in transforming and optimizing data workflows, consider exploring courses on the Enterprise DNA Platform.
