Pseudo Code Generator

Cargo Loading Volume Prediction Pseudo Code

This pseudo code outlines the steps to preprocess data and create features for a machine learning model aimed at predicting ship cargo loading volumes. It covers data loading, missing value handling, feature extraction, and statistical analysis in a structured format.



Prompt

import pandas as pd
import numpy as np
import lightgbm as lgb
import xgboost as xgb
import time
from sklearn.metrics import mean_squared_error as mse
from sklearn.model_selection import StratifiedKFold, KFold, GroupKFold

import warnings
warnings.filterwarnings('ignore')

LABEL = '装载量'

df_train = pd.read_csv('./data/船舶装卸货量预测-训练集-20240611.csv', encoding='gbk')
df_test = pd.read_csv('./data/船舶装卸货量预测-测试集X-20240611.csv', encoding='gbk')

df = pd.concat([df_train, df_test])
df['离泊时间'] = df['离泊时间'].replace({' None': np.nan}).astype(float)
df['time_diff'] = df['离泊时间'] - df['进泊时间']

df['进泊时间'] = pd.to_datetime(df['进泊时间'], unit='s')
df['离泊时间'] = pd.to_datetime(df['离泊时间'], unit='s')
# Time features
for f in ['进泊时间', '离泊时间']:
    df[f + '_hour'] = df[f].dt.hour
    df[f + '_dayofweek'] = df[f].dt.dayofweek
    df[f + '_quarter'] = df[f].dt.quarter
# Encode non-numeric columns
df['A_le'] = df['船舶类型代码A'].factorize()[0]
df['B_le'] = df['船舶类型代码B'].factorize()[0]
# Cross statistical features
df['AB_le'] = (df['船舶类型代码A'] + '_' + df['船舶类型代码B']).factorize()[0]
for num_f in ['载重吨', 'time_diff', ]:
    for cat_f in ['A_le', 'B_le', 'AB_le']:
        df[cat_f + '_' + num_f + '_mean'] = df.groupby(cat_f)[num_f].transform('mean')
        df[cat_f + '_' + num_f + '_std'] = df.groupby(cat_f)[num_f].transform('std')
        df[cat_f + '_' + num_f + '_max'] = df.groupby(cat_f)[num_f].transform('max')
        df[cat_f + '_' + num_f + '_min'] = df.groupby(cat_f)[num_f].transform('min')

        df[num_f + '_' + cat_f + '_mean'] = df.groupby(num_f)[cat_f].transform('mean')
        df[num_f + '_' + cat_f + '_std'] = df.groupby(num_f)[cat_f].transform('std')
        df[num_f + '_' + cat_f + '_max'] = df.groupby(num_f)[cat_f].transform('max')
        df[num_f + '_' + cat_f + '_min'] = df.groupby(num_f)[cat_f].transform('min')
# Location features
df['loc0'] = df['泊位位置'].map(lambda x: float(x.split(' ')[0]))
df['loc1'] = df['泊位位置'].map(lambda x: float(x.split(' ')[1]))
# Combination features
df['面积'] = df['船长'] * df['船宽']
df['船舶ID_航次ID_count'] = df.groupby('船舶ID')['航次ID'].transform('count')
df['船舶ID_船舶类型代码A_count'] = df.groupby('船舶ID')['船舶类型代码A'].transform('count')
df['船舶ID_船舶类型代码B_count'] = df.groupby('船舶ID')['船舶类型代码B'].transform('count')

df['船舶ID_group_count'] = df['船舶ID'].map(df['船舶ID'].value_counts())
df['船宽_group_count'] = df['船宽'].map(df['船宽'].value_counts())
df['载重吨_group_count'] = df['载重吨'].map(df['载重吨'].value_counts())

df['longitude/Latitude'] = df['loc0'] / df['loc1']

Answer

Pseudo Code for Ship Cargo Loading Volume Prediction

This pseudo code translates the provided Python script into a structured format, outlining the major steps taken to preprocess the data and generate features for a machine learning task focused on predicting cargo loading volumes.

Imports and Warnings Management

  1. Import Required Libraries:
    • Import data manipulation libraries: Pandas, NumPy
    • Import machine learning libraries: LightGBM, XGBoost
    • Import metrics and validation techniques: Mean Squared Error, StratifiedKFold, KFold, GroupKFold
    • Manage warnings by ignoring them.

Constants

  • Define a constant for the label (target variable): LABEL is set to '装载量'.

Data Loading

  1. Read Training and Testing Data:

    • Load the training dataset from a specified path with appropriate encoding.
    • Load the testing dataset from a specified path with appropriate encoding.
  2. Combine Training and Testing Data:

    • Concatenate training and testing DataFrames into one.
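
The loading and combining steps above can be sketched as follows. This is a minimal illustration, assuming tiny in-memory GBK-encoded CSVs in place of the real files; the sample rows and the `make_csv` helper are hypothetical, and `ignore_index=True` is an addition that the original script does not use.

```python
import io
import pandas as pd

LABEL = '装载量'

def make_csv(text: str) -> io.BytesIO:
    """Hypothetical helper: GBK-encode CSV text to stand in for a file on disk."""
    return io.BytesIO(text.encode('gbk'))

# Made-up sample rows; the real column set is larger.
train_csv = make_csv('船舶ID,载重吨,装载量\n1,5000,3200\n2,8000,6100\n')
test_csv = make_csv('船舶ID,载重吨\n3,7000\n')

df_train = pd.read_csv(train_csv, encoding='gbk')
df_test = pd.read_csv(test_csv, encoding='gbk')

# Stack train and test so features are computed over both sets.
# ignore_index=True (not in the original script) avoids duplicate row labels.
df = pd.concat([df_train, df_test], ignore_index=True)

# Test rows carry no label, so the two sets can be separated again via NaN.
n_train = int(df[LABEL].notna().sum())
n_test = int(df[LABEL].isna().sum())
```

Because the test set lacks the label column, concatenation fills it with NaN, which gives a simple way to split the combined frame back apart after feature engineering.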

Data Preprocessing

  1. Handle Missing Values and Type Conversion:

    • Replace the string ' None' in the departure time ('离泊时间') with NaN and convert the column to float.
    • Calculate time_diff as departure time minus arrival time. Both columns are still Unix timestamps at this point, so the result is a duration in seconds.
  2. Convert Timestamps:

    • Convert '进泊时间' and '离泊时间' to datetime format, interpreting the values as Unix timestamps in seconds.
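
A small sketch of the preprocessing steps, run on made-up timestamps (the two sample rows are assumptions, not data from the original files):

```python
import numpy as np
import pandas as pd

# Made-up sample: the second ship has no recorded departure.
df = pd.DataFrame({
    '进泊时间': [1700000000, 1700003600],
    '离泊时间': ['1700007200', ' None'],
})

# ' None' marks a missing departure; map it to NaN so the column can become numeric.
df['离泊时间'] = df['离泊时间'].replace({' None': np.nan}).astype(float)

# Compute the stay duration while both columns are still Unix seconds.
df['time_diff'] = df['离泊时间'] - df['进泊时间']

# Only now convert to datetime; NaN becomes NaT.
df['进泊时间'] = pd.to_datetime(df['进泊时间'], unit='s')
df['离泊时间'] = pd.to_datetime(df['离泊时间'], unit='s')
```

The order matters: computing time_diff after the datetime conversion would yield a Timedelta column rather than a plain float number of seconds.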

Feature Engineering

  1. Extract Time Features:

    • For each time variable ('进泊时间', '离泊时间'):
      • Extract hour, day of week, and quarter.
  2. Encode Categorical Variables:

    • Factorize categorical codes A and B into integer codes (A_le, B_le).
    • Join codes A and B with '_' and factorize the combined string (AB_le).
  3. Generate Statistical Features:

    • For selected numeric features '载重吨' and 'time_diff':
      • For each categorical feature ('A_le', 'B_le', 'AB_le'):
        • Group by the categorical feature and store the group's mean, standard deviation, max, and min of the numeric feature on every row.
  4. Reverse Group Statistics:

    • For the same feature pairs, group by the numeric feature's value and store the mean, standard deviation, max, and min of the encoded categorical feature.
  5. Location Feature Extraction:

    • Parse location strings into separate float values (loc0 and loc1).
  6. Combination Features:

    • Calculate area as the product of ship length and width.
    • For each ship ID, count the non-null voyage IDs, type A codes, and type B codes.
  7. Group Counts:

    • For each unique value of ship ID, ship width, and deadweight tonnage ('载重吨'), count how many rows share that value.
  8. Longitude/Latitude Ratio:

    • Calculate a derived geographic feature as the ratio of loc0 to loc1.
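
The time, encoding, and group-statistic steps can be illustrated on a toy frame. This is a sketch under assumptions: the four sample rows are invented, and only one numeric/categorical pair is shown.

```python
import pandas as pd

# Invented sample rows.
df = pd.DataFrame({
    '船舶类型代码A': ['X', 'X', 'Y', 'Y'],
    '船舶类型代码B': ['p', 'q', 'p', 'p'],
    '载重吨': [1000.0, 3000.0, 2000.0, 6000.0],
    '进泊时间': pd.to_datetime([0, 3600, 86400, 90000], unit='s'),
})

# Time parts come from the .dt accessor (dayofweek: Monday=0).
df['进泊时间_hour'] = df['进泊时间'].dt.hour
df['进泊时间_dayofweek'] = df['进泊时间'].dt.dayofweek

# factorize() returns (codes, uniques); only the integer codes are kept.
df['A_le'] = df['船舶类型代码A'].factorize()[0]
df['AB_le'] = (df['船舶类型代码A'] + '_' + df['船舶类型代码B']).factorize()[0]

# transform() broadcasts each group's statistic back onto every row of the group.
for stat in ['mean', 'std', 'max', 'min']:
    df[f'A_le_载重吨_{stat}'] = df.groupby('A_le')['载重吨'].transform(stat)
```

Using transform rather than agg keeps the result aligned with the original rows, so the statistic can be assigned directly as a new column.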

Pseudo Code Representation

BEGIN

    # Import libraries
    IMPORT pandas AS pd
    IMPORT numpy AS np
    IMPORT lightgbm AS lgb
    IMPORT xgboost AS xgb
    IMPORT time
    IMPORT mean_squared_error AS mse FROM sklearn.metrics
    IMPORT StratifiedKFold, KFold, GroupKFold FROM sklearn.model_selection

    # Suppress warnings
    IMPORT warnings
    SET warnings filter TO ignore

    # Define LABEL
    SET LABEL = '装载量'

    # Load datasets
    df_train = READ_CSV('./data/船舶装卸货量预测-训练集-20240611.csv', encoding='gbk')
    df_test = READ_CSV('./data/船舶装卸货量预测-测试集X-20240611.csv', encoding='gbk')

    # Combine datasets
    df = CONCATENATE(df_train, df_test)

    # Preprocess data
    REPLACE ' None' WITH NaN IN df['离泊时间']
    CONVERT df['离泊时间'] TO float
    # Both columns are still Unix seconds here, so time_diff is in seconds
    df['time_diff'] = df['离泊时间'] - df['进泊时间']

    CONVERT df['进泊时间'] AND df['离泊时间'] TO datetime (unit = seconds)

    # Extract time features
    FOR each f IN ['进泊时间', '离泊时间']:
        df[f + '_hour'] = EXTRACT HOUR FROM df[f]
        df[f + '_dayofweek'] = EXTRACT DAY OF WEEK FROM df[f]
        df[f + '_quarter'] = EXTRACT QUARTER FROM df[f]

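    # Non-numeric encoding
    df['A_le'] = FACTORIZE(df['船舶类型代码A'])
    df['B_le'] = FACTORIZE(df['船舶类型代码B'])
    df['AB_le'] = FACTORIZE(df['船舶类型代码A'] + '_' + df['船舶类型代码B'])

    # Generate cross statistics
    FOR each num_f IN ['载重吨', 'time_diff']:
        FOR each cat_f IN ['A_le', 'B_le', 'AB_le']:
            FOR each stat IN [MEAN, STD, MAX, MIN]:
                # Statistic of the numeric feature within each category
                df[cat_f + '_' + num_f + '_' + stat] = GROUPBY(cat_f, df[num_f], APPLY stat)
                # Statistic of the category code within each numeric value
                df[num_f + '_' + cat_f + '_' + stat] = GROUPBY(num_f, df[cat_f], APPLY stat)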

    # Create location features
    df['loc0'] = MAP(df['泊位位置'], FIRST space-separated token AS float)
    df['loc1'] = MAP(df['泊位位置'], SECOND space-separated token AS float)

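    # Combination features
    df['面积'] = df['船长'] * df['船宽']
    FOR each col IN ['航次ID', '船舶类型代码A', '船舶类型代码B']:
        df['船舶ID_' + col + '_count'] = GROUPBY(df['船舶ID'], COUNT(df[col]))

    # Group counts (frequency of each value)
    FOR each col IN ['船舶ID', '船宽', '载重吨']:
        df[col + '_group_count'] = MAP(df[col], VALUE_COUNTS(df[col]))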

    # Calculate the loc0 / loc1 ratio (named longitude/Latitude in the script)
    df['longitude/Latitude'] = df['loc0'] / df['loc1']

END
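
The location and count steps of the pseudo code can be sketched in Python as below. The sample rows are invented, the reading of the two tokens as longitude then latitude is an assumption based only on the column name in the script, and the vectorized `str.split` is a substitute for the script's per-row lambda, not the original method.

```python
import pandas as pd

# Invented sample rows; 'lon lat' token order is an assumption.
df = pd.DataFrame({
    '船舶ID': [1, 1, 2],
    '泊位位置': ['121.5 31.2', '121.6 31.3', '120.0 30.0'],
})

# Split once into two columns, then convert both tokens to float.
parts = df['泊位位置'].str.split(' ', expand=True).astype(float)
df['loc0'], df['loc1'] = parts[0], parts[1]

# Frequency encoding: map each value to the number of rows that share it.
df['船舶ID_group_count'] = df['船舶ID'].map(df['船舶ID'].value_counts())

# Derived ratio feature, keeping the column name used in the script.
df['longitude/Latitude'] = df['loc0'] / df['loc1']
```

Splitting once with expand=True avoids parsing the same string twice, which the two separate lambda calls in the script do.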

Notes

  • Each segment of the script is translated in the order in which it runs.
  • The pseudo code is designed to present the process in an understandable manner, facilitating discussions or implementations without direct reference to programming specifics.

