Prompt
Extend Here's a tricky one for you, Code Extender AI tool. Generate a code snippet to perform the following complex operations using Google BigQuery SQL:
1. Create a table in Google BigQuery with the name `employee_infos`. This table should include the following fields: `employee_id` (numeric), `first_name` (string), `last_name` (string), `employment_date` (date), `department` (string).
2. Insert some mock data into the `employee_infos` table for testing.
3. Write a query to select all records from the `employee_infos` where `employment_date` is on or after '2020-01-01'.
4. Group the selected records by the `department` field, and for each department output, the total number of employees and the earliest `employment_date`.
5. Define a user-defined function within this SQL that calculates the duration of employment in years for each employee and use it within the SQL query you wrote above to also include the employment duration in the final output.
Remember, all these should be accomplished using Google BigQuery SQL. by
Answer
-- Creating the employee_infos table
CREATE TABLE employee_infos (
employee_id NUMERIC,
first_name STRING,
last_name STRING,
employment_date DATE,
department STRING
);
-- Inserting mock data into the employee_infos table
INSERT INTO employee_infos (employee_id, first_name, last_name, employment_date, department)
VALUES (1, 'John', 'Doe', '2020-01-15', 'Marketing'),
(2, 'Jane', 'Smith', '2020-02-20', 'HR'),
(3, 'Mike', 'Johnson', '2019-12-10', 'Finance'),
(4, 'Emily', 'Williams', '2020-05-03', 'IT');
-- Query to select all records with employment_date on or after '2020-01-01'
SELECT *
FROM employee_infos
WHERE employment_date >= '2020-01-01';
-- Query to group records by department and calculate total number of employees and earliest employment_date
SELECT department, COUNT(*) as total_employees, MIN(employment_date) as earliest_employment_date
FROM employee_infos
WHERE employment_date >= '2020-01-01'
GROUP BY department;
-- User-defined function to calculate employment duration in years
CREATE TEMP FUNCTION calculate_duration(employment_date DATE) AS (
DATE_DIFF(CURRENT_DATE(), employment_date, YEAR)
);
-- Final query including employment duration
SELECT department, COUNT(*) as total_employees, MIN(employment_date) as earliest_employment_date,
calculate_duration(employment_date) as employment_duration
FROM employee_infos
WHERE employment_date >= '2020-01-01'
GROUP BY department;
Explanation:
- First, we create a table called
employee_infos
using theCREATE TABLE
statement. This defines the structure of the table with the specified fields:employee_id
,first_name
,last_name
,employment_date
, anddepartment
. - Then, we insert mock data into the
employee_infos
table using theINSERT INTO
statement. This adds sample records for testing purposes. - Next, we write a simple
SELECT
query to retrieve all records from theemployee_infos
table where theemployment_date
is on or after '2020-01-01'. - To group the selected records by the
department
field and calculate the total number of employees and the earliestemployment_date
, we use theGROUP BY
clause along with the aggregate functionsCOUNT()
andMIN()
. - After that, we define a user-defined function called
calculate_duration
using theCREATE FUNCTION
statement. This function calculates the employment duration in years by taking theemployment_date
as input and using theDATE_DIFF()
function with theYEAR
unit. - Finally, we modify the previous query to include the employment duration by calling the
calculate_duration()
function. This adds theemployment_duration
column to the final output.
Note: This code is written in Google BigQuery SQL.
Description
This project involves analyzing a table called "employee_infos" which contains information about employees such as their employee ID, first name, last name, employment date, and department. The project includes creating the table, inserting mock data, and writing queries to retrieve relevant information such as the total number of employees, earliest employment date, and employment duration for each department. The queries also involve grouping the records by department and using aggregate functions to calculate the required information. Additionally, a user-defined function called "calculate_duration" is created to calculate the employment duration in years for each employee.