Overview of Basic Transformations
Let us define a few problem statements to learn more about Data Frame APIs. The solutions to these problem statements cover filtering, aggregations, and sorting.
Get the total number of flights, as well as the number of flights delayed in departure and the number of flights delayed in arrival.
Output should contain 3 columns - FlightCount, DepDelayedCount, ArrDelayedCount
For each day, get the number of flights that departed, along with the number of flights delayed in departure and the number of flights delayed in arrival.
Output should contain 4 columns - FlightDate, FlightCount, DepDelayedCount, ArrDelayedCount
FlightDate should be in yyyy-MM-dd format.
Data should be sorted in ascending order by FlightDate.
Get all the flights that departed late but arrived early (IsArrDelayed is NO).
Output should contain - FlightCRSDepTime, UniqueCarrier, FlightNum, Origin, Dest, DepDelay, ArrDelay
FlightCRSDepTime needs to be computed using Year, Month, DayOfMonth, CRSDepTime
FlightCRSDepTime should be displayed using yyyy-MM-dd HH:mm format.
Output should be sorted by FlightCRSDepTime and then by the difference between DepDelay and ArrDelay
Also get the count of such flights