Solutions - Problem 1¶
Get total number of flights as well as number of flights which are delayed in departure and number of flights delayed in arrival.
Output should contain 3 columns - FlightCount, DepDelayedCount, ArrDelayedCount
Let us start spark context for this Notebook so that we can execute the code provided. You can sign up for our 10 node state of the art cluster/labs to learn Spark SQL using our unique integrated LMS.
from pyspark.sql import SparkSession
import getpass
username = getpass.getuser()
spark = SparkSession. \
builder. \
config('spark.ui.port', '0'). \
config("spark.sql.warehouse.dir", f"/user/{username}/warehouse"). \
enableHiveSupport(). \
appName(f'{username} | Python - Basic Transformations'). \
master('yarn'). \
getOrCreate()
If you are going to use CLIs, you can use Spark SQL using one of the 3 approaches.
Using Spark SQL
spark2-sql \
--master yarn \
--conf spark.ui.port=0 \
--conf spark.sql.warehouse.dir=/user/${USER}/warehouse
Using Scala
spark2-shell \
--master yarn \
--conf spark.ui.port=0 \
--conf spark.sql.warehouse.dir=/user/${USER}/warehouse
Using Pyspark
pyspark2 \
--master yarn \
--conf spark.ui.port=0 \
--conf spark.sql.warehouse.dir=/user/${USER}/warehouse
Reading airtraffic data¶
airtraffic_path = "/public/airtraffic_all/airtraffic-part/flightmonth=200801"
airtraffic = spark. \
read. \
parquet(airtraffic_path)
airtraffic.printSchema()
root
|-- Year: integer (nullable = true)
|-- Month: integer (nullable = true)
|-- DayofMonth: integer (nullable = true)
|-- DayOfWeek: integer (nullable = true)
|-- DepTime: string (nullable = true)
|-- CRSDepTime: integer (nullable = true)
|-- ArrTime: string (nullable = true)
|-- CRSArrTime: integer (nullable = true)
|-- UniqueCarrier: string (nullable = true)
|-- FlightNum: integer (nullable = true)
|-- TailNum: string (nullable = true)
|-- ActualElapsedTime: string (nullable = true)
|-- CRSElapsedTime: integer (nullable = true)
|-- AirTime: string (nullable = true)
|-- ArrDelay: string (nullable = true)
|-- DepDelay: string (nullable = true)
|-- Origin: string (nullable = true)
|-- Dest: string (nullable = true)
|-- Distance: string (nullable = true)
|-- TaxiIn: string (nullable = true)
|-- TaxiOut: string (nullable = true)
|-- Cancelled: integer (nullable = true)
|-- CancellationCode: string (nullable = true)
|-- Diverted: integer (nullable = true)
|-- CarrierDelay: string (nullable = true)
|-- WeatherDelay: string (nullable = true)
|-- NASDelay: string (nullable = true)
|-- SecurityDelay: string (nullable = true)
|-- LateAircraftDelay: string (nullable = true)
|-- IsArrDelayed: string (nullable = true)
|-- IsDepDelayed: string (nullable = true)
Get flights with delayed arrival¶
# SQL Style
airtraffic.filter("IsArrDelayed = 'YES' AND Cancelled = 0").show()
+----+-----+----------+---------+-------+----------+-------+----------+-------------+---------+-------+-----------------+--------------+-------+--------+--------+------+----+--------+------+-------+---------+----------------+--------+------------+------------+--------+-------------+-----------------+------------+------------+
|Year|Month|DayofMonth|DayOfWeek|DepTime|CRSDepTime|ArrTime|CRSArrTime|UniqueCarrier|FlightNum|TailNum|ActualElapsedTime|CRSElapsedTime|AirTime|ArrDelay|DepDelay|Origin|Dest|Distance|TaxiIn|TaxiOut|Cancelled|CancellationCode|Diverted|CarrierDelay|WeatherDelay|NASDelay|SecurityDelay|LateAircraftDelay|IsArrDelayed|IsDepDelayed|
+----+-----+----------+---------+-------+----------+-------+----------+-------------+---------+-------+-----------------+--------------+-------+--------+--------+------+----+--------+------+-------+---------+----------------+--------+------------+------------+--------+-------------+-----------------+------------+------------+
|2008| 1| 17| 4| 1717| 1701| 1915| 1855| OH| 4977| N967CA| 118| 114| 101| 20| 16| SYR| CVG| 527| 2| 15| 0| null| 0| 16| 0| 4| 0| 0| YES| YES|
|2008| 1| 17| 4| 1530| 1530| 1645| 1637| OH| 5426| N779CA| 75| 67| 45| 8| 0| CVG| GRR| 268| 5| 25| 0| null| 0| NA| NA| NA| NA| NA| YES| NO|
|2008| 1| 18| 5| 1215| 1009| 1540| 1251| OH| 5260| N446CA| 145| 102| 140| 169| 126| MCI| CVG| 539| 2| 3| 0| null| 0| 126| 0| 43| 0| 0| YES| YES|
|2008| 1| 19| 6| 835| 835| 1145| 1130| OH| 5276| N523CA| 130| 115| 83| 15| 0| TUL| CVG| 646| 4| 43| 0| null| 0| 0| 0| 15| 0| 0| YES| NO|
|2008| 1| 20| 7| 1925| 1935| 2148| 2124| OH| 5215| N729CA| 143| 109| 34| 24| -10| JFK| PHL| 94| 5| 104| 0| null| 0| 0| 0| 24| 0| 0| YES| NO|
|2008| 1| 20| 7| 825| 830| 1045| 1007| OH| 5324| N933CA| 140| 97| 92| 38| -5| RDU| CVG| 390| 1| 47| 0| null| 0| 0| 0| 38| 0| 0| YES| NO|
|2008| 1| 21| 1| 1200| 1204| 1600| 1559| OH| 5056| N371CA| 180| 175| 147| 1| -4| MSY| LGA| 1183| 13| 20| 0| null| 0| NA| NA| NA| NA| NA| YES| NO|
|2008| 1| 21| 1| 1950| 1830| 2225| 2007| OH| 5595| N641CA| 155| 97| 136| 138| 80| DCA| JFK| 213| 10| 9| 0| null| 0| 0| 80| 58| 0| 0| YES| YES|
|2008| 1| 21| 1| 700| 700| 955| 950| OH| 5610| N964CA| 115| 110| 87| 5| 0| HSV| DCA| 613| 8| 20| 0| null| 0| NA| NA| NA| NA| NA| YES| NO|
|2008| 1| 22| 2| 2020| 1910| 2223| 2125| OH| 5032| N538CA| 63| 75| 44| 58| 70| ORD| CVG| 264| 1| 18| 0| null| 0| 0| 58| 0| 0| 0| YES| YES|
|2008| 1| 22| 2| 1320| 1320| 1600| 1528| OH| 5331| N805CA| 160| 128| 102| 32| 0| CVG| JFK| 589| 12| 46| 0| null| 0| 0| 0| 32| 0| 0| YES| NO|
|2008| 1| 23| 3| 908| 908| 1216| 1149| OH| 5033| N963CA| 188| 161| 124| 27| 0| LGA| SAV| 722| 6| 58| 0| null| 0| 0| 0| 27| 0| 0| YES| NO|
|2008| 1| 23| 3| 630| 635| 840| 831| OH| 5355| N926CA| 130| 116| 85| 9| -5| GSP| LGA| 610| 20| 25| 0| null| 0| NA| NA| NA| NA| NA| YES| NO|
|2008| 1| 24| 4| 930| 930| 1049| 1019| OH| 5689| N403CA| 139| 109| 90| 30| 0| CVG| DSM| 505| 5| 44| 0| null| 0| 0| 0| 30| 0| 0| YES| NO|
|2008| 1| 27| 7| 1950| 1950| 2132| 2129| OH| 5433| N447CA| 102| 99| 65| 3| 0| CVG| SAV| 515| 4| 33| 0| null| 0| NA| NA| NA| NA| NA| YES| NO|
|2008| 1| 27| 7| 1315| 1315| 1506| 1459| OH| 5645| N916CA| 111| 104| 66| 7| 0| BOS| DCA| 399| 3| 42| 0| null| 0| NA| NA| NA| NA| NA| YES| NO|
|2008| 1| 29| 2| 915| 915| 945| 925| OH| 5249| N960CA| 90| 70| 54| 20| 0| CVG| BNA| 230| 13| 23| 0| null| 0| 0| 0| 20| 0| 0| YES| NO|
|2008| 1| 29| 2| 1150| 1150| 1448| 1430| OH| 5670| N695CA| 118| 100| 76| 18| 0| MCI| CVG| 539| 6| 36| 0| null| 0| 0| 0| 18| 0| 0| YES| NO|
|2008| 1| 31| 4| 1505| 1505| 1651| 1645| OH| 5107| N912CA| 106| 100| 85| 6| 0| CVG| SAV| 515| 3| 18| 0| null| 0| NA| NA| NA| NA| NA| YES| NO|
|2008| 1| 1| 2| 1742| 1711| 1748| 1718| OO| 1998| N815SK| 66| 67| 40| 30| 31| ATL| BNA| 214| 6| 20| 0| null| 0| 30| 0| 0| 0| 0| YES| YES|
+----+-----+----------+---------+-------+----------+-------+----------+-------------+---------+-------+-----------------+--------------+-------+--------+--------+------+----+--------+------+-------+---------+----------------+--------+------------+------------+--------+-------------+-----------------+------------+------------+
only showing top 20 rows
# Data Frame Style
airtraffic.filter((airtraffic["IsArrDelayed"] == 'YES') & (airtraffic["Cancelled"] == 0)).show()
+----+-----+----------+---------+-------+----------+-------+----------+-------------+---------+-------+-----------------+--------------+-------+--------+--------+------+----+--------+------+-------+---------+----------------+--------+------------+------------+--------+-------------+-----------------+------------+------------+
|Year|Month|DayofMonth|DayOfWeek|DepTime|CRSDepTime|ArrTime|CRSArrTime|UniqueCarrier|FlightNum|TailNum|ActualElapsedTime|CRSElapsedTime|AirTime|ArrDelay|DepDelay|Origin|Dest|Distance|TaxiIn|TaxiOut|Cancelled|CancellationCode|Diverted|CarrierDelay|WeatherDelay|NASDelay|SecurityDelay|LateAircraftDelay|IsArrDelayed|IsDepDelayed|
+----+-----+----------+---------+-------+----------+-------+----------+-------------+---------+-------+-----------------+--------------+-------+--------+--------+------+----+--------+------+-------+---------+----------------+--------+------------+------------+--------+-------------+-----------------+------------+------------+
|2008| 1| 17| 4| 1717| 1701| 1915| 1855| OH| 4977| N967CA| 118| 114| 101| 20| 16| SYR| CVG| 527| 2| 15| 0| null| 0| 16| 0| 4| 0| 0| YES| YES|
|2008| 1| 17| 4| 1530| 1530| 1645| 1637| OH| 5426| N779CA| 75| 67| 45| 8| 0| CVG| GRR| 268| 5| 25| 0| null| 0| NA| NA| NA| NA| NA| YES| NO|
|2008| 1| 18| 5| 1215| 1009| 1540| 1251| OH| 5260| N446CA| 145| 102| 140| 169| 126| MCI| CVG| 539| 2| 3| 0| null| 0| 126| 0| 43| 0| 0| YES| YES|
|2008| 1| 19| 6| 835| 835| 1145| 1130| OH| 5276| N523CA| 130| 115| 83| 15| 0| TUL| CVG| 646| 4| 43| 0| null| 0| 0| 0| 15| 0| 0| YES| NO|
|2008| 1| 20| 7| 1925| 1935| 2148| 2124| OH| 5215| N729CA| 143| 109| 34| 24| -10| JFK| PHL| 94| 5| 104| 0| null| 0| 0| 0| 24| 0| 0| YES| NO|
|2008| 1| 20| 7| 825| 830| 1045| 1007| OH| 5324| N933CA| 140| 97| 92| 38| -5| RDU| CVG| 390| 1| 47| 0| null| 0| 0| 0| 38| 0| 0| YES| NO|
|2008| 1| 21| 1| 1200| 1204| 1600| 1559| OH| 5056| N371CA| 180| 175| 147| 1| -4| MSY| LGA| 1183| 13| 20| 0| null| 0| NA| NA| NA| NA| NA| YES| NO|
|2008| 1| 21| 1| 1950| 1830| 2225| 2007| OH| 5595| N641CA| 155| 97| 136| 138| 80| DCA| JFK| 213| 10| 9| 0| null| 0| 0| 80| 58| 0| 0| YES| YES|
|2008| 1| 21| 1| 700| 700| 955| 950| OH| 5610| N964CA| 115| 110| 87| 5| 0| HSV| DCA| 613| 8| 20| 0| null| 0| NA| NA| NA| NA| NA| YES| NO|
|2008| 1| 22| 2| 2020| 1910| 2223| 2125| OH| 5032| N538CA| 63| 75| 44| 58| 70| ORD| CVG| 264| 1| 18| 0| null| 0| 0| 58| 0| 0| 0| YES| YES|
|2008| 1| 22| 2| 1320| 1320| 1600| 1528| OH| 5331| N805CA| 160| 128| 102| 32| 0| CVG| JFK| 589| 12| 46| 0| null| 0| 0| 0| 32| 0| 0| YES| NO|
|2008| 1| 23| 3| 908| 908| 1216| 1149| OH| 5033| N963CA| 188| 161| 124| 27| 0| LGA| SAV| 722| 6| 58| 0| null| 0| 0| 0| 27| 0| 0| YES| NO|
|2008| 1| 23| 3| 630| 635| 840| 831| OH| 5355| N926CA| 130| 116| 85| 9| -5| GSP| LGA| 610| 20| 25| 0| null| 0| NA| NA| NA| NA| NA| YES| NO|
|2008| 1| 24| 4| 930| 930| 1049| 1019| OH| 5689| N403CA| 139| 109| 90| 30| 0| CVG| DSM| 505| 5| 44| 0| null| 0| 0| 0| 30| 0| 0| YES| NO|
|2008| 1| 27| 7| 1950| 1950| 2132| 2129| OH| 5433| N447CA| 102| 99| 65| 3| 0| CVG| SAV| 515| 4| 33| 0| null| 0| NA| NA| NA| NA| NA| YES| NO|
|2008| 1| 27| 7| 1315| 1315| 1506| 1459| OH| 5645| N916CA| 111| 104| 66| 7| 0| BOS| DCA| 399| 3| 42| 0| null| 0| NA| NA| NA| NA| NA| YES| NO|
|2008| 1| 29| 2| 915| 915| 945| 925| OH| 5249| N960CA| 90| 70| 54| 20| 0| CVG| BNA| 230| 13| 23| 0| null| 0| 0| 0| 20| 0| 0| YES| NO|
|2008| 1| 29| 2| 1150| 1150| 1448| 1430| OH| 5670| N695CA| 118| 100| 76| 18| 0| MCI| CVG| 539| 6| 36| 0| null| 0| 0| 0| 18| 0| 0| YES| NO|
|2008| 1| 31| 4| 1505| 1505| 1651| 1645| OH| 5107| N912CA| 106| 100| 85| 6| 0| CVG| SAV| 515| 3| 18| 0| null| 0| NA| NA| NA| NA| NA| YES| NO|
|2008| 1| 1| 2| 1742| 1711| 1748| 1718| OO| 1998| N815SK| 66| 67| 40| 30| 31| ATL| BNA| 214| 6| 20| 0| null| 0| 30| 0| 0| 0| 0| YES| YES|
+----+-----+----------+---------+-------+----------+-------+----------+-------------+---------+-------+-----------------+--------------+-------+--------+--------+------+----+--------+------+-------+---------+----------------+--------+------------+------------+--------+-------------+-----------------+------------+------------+
only showing top 20 rows
airtraffic.filter((airtraffic.IsArrDelayed == 'YES') & (airtraffic.Cancelled == 0)).show()
+----+-----+----------+---------+-------+----------+-------+----------+-------------+---------+-------+-----------------+--------------+-------+--------+--------+------+----+--------+------+-------+---------+----------------+--------+------------+------------+--------+-------------+-----------------+------------+------------+
|Year|Month|DayofMonth|DayOfWeek|DepTime|CRSDepTime|ArrTime|CRSArrTime|UniqueCarrier|FlightNum|TailNum|ActualElapsedTime|CRSElapsedTime|AirTime|ArrDelay|DepDelay|Origin|Dest|Distance|TaxiIn|TaxiOut|Cancelled|CancellationCode|Diverted|CarrierDelay|WeatherDelay|NASDelay|SecurityDelay|LateAircraftDelay|IsArrDelayed|IsDepDelayed|
+----+-----+----------+---------+-------+----------+-------+----------+-------------+---------+-------+-----------------+--------------+-------+--------+--------+------+----+--------+------+-------+---------+----------------+--------+------------+------------+--------+-------------+-----------------+------------+------------+
|2008| 1| 17| 4| 1717| 1701| 1915| 1855| OH| 4977| N967CA| 118| 114| 101| 20| 16| SYR| CVG| 527| 2| 15| 0| null| 0| 16| 0| 4| 0| 0| YES| YES|
|2008| 1| 17| 4| 1530| 1530| 1645| 1637| OH| 5426| N779CA| 75| 67| 45| 8| 0| CVG| GRR| 268| 5| 25| 0| null| 0| NA| NA| NA| NA| NA| YES| NO|
|2008| 1| 18| 5| 1215| 1009| 1540| 1251| OH| 5260| N446CA| 145| 102| 140| 169| 126| MCI| CVG| 539| 2| 3| 0| null| 0| 126| 0| 43| 0| 0| YES| YES|
|2008| 1| 19| 6| 835| 835| 1145| 1130| OH| 5276| N523CA| 130| 115| 83| 15| 0| TUL| CVG| 646| 4| 43| 0| null| 0| 0| 0| 15| 0| 0| YES| NO|
|2008| 1| 20| 7| 1925| 1935| 2148| 2124| OH| 5215| N729CA| 143| 109| 34| 24| -10| JFK| PHL| 94| 5| 104| 0| null| 0| 0| 0| 24| 0| 0| YES| NO|
|2008| 1| 20| 7| 825| 830| 1045| 1007| OH| 5324| N933CA| 140| 97| 92| 38| -5| RDU| CVG| 390| 1| 47| 0| null| 0| 0| 0| 38| 0| 0| YES| NO|
|2008| 1| 21| 1| 1200| 1204| 1600| 1559| OH| 5056| N371CA| 180| 175| 147| 1| -4| MSY| LGA| 1183| 13| 20| 0| null| 0| NA| NA| NA| NA| NA| YES| NO|
|2008| 1| 21| 1| 1950| 1830| 2225| 2007| OH| 5595| N641CA| 155| 97| 136| 138| 80| DCA| JFK| 213| 10| 9| 0| null| 0| 0| 80| 58| 0| 0| YES| YES|
|2008| 1| 21| 1| 700| 700| 955| 950| OH| 5610| N964CA| 115| 110| 87| 5| 0| HSV| DCA| 613| 8| 20| 0| null| 0| NA| NA| NA| NA| NA| YES| NO|
|2008| 1| 22| 2| 2020| 1910| 2223| 2125| OH| 5032| N538CA| 63| 75| 44| 58| 70| ORD| CVG| 264| 1| 18| 0| null| 0| 0| 58| 0| 0| 0| YES| YES|
|2008| 1| 22| 2| 1320| 1320| 1600| 1528| OH| 5331| N805CA| 160| 128| 102| 32| 0| CVG| JFK| 589| 12| 46| 0| null| 0| 0| 0| 32| 0| 0| YES| NO|
|2008| 1| 23| 3| 908| 908| 1216| 1149| OH| 5033| N963CA| 188| 161| 124| 27| 0| LGA| SAV| 722| 6| 58| 0| null| 0| 0| 0| 27| 0| 0| YES| NO|
|2008| 1| 23| 3| 630| 635| 840| 831| OH| 5355| N926CA| 130| 116| 85| 9| -5| GSP| LGA| 610| 20| 25| 0| null| 0| NA| NA| NA| NA| NA| YES| NO|
|2008| 1| 24| 4| 930| 930| 1049| 1019| OH| 5689| N403CA| 139| 109| 90| 30| 0| CVG| DSM| 505| 5| 44| 0| null| 0| 0| 0| 30| 0| 0| YES| NO|
|2008| 1| 27| 7| 1950| 1950| 2132| 2129| OH| 5433| N447CA| 102| 99| 65| 3| 0| CVG| SAV| 515| 4| 33| 0| null| 0| NA| NA| NA| NA| NA| YES| NO|
|2008| 1| 27| 7| 1315| 1315| 1506| 1459| OH| 5645| N916CA| 111| 104| 66| 7| 0| BOS| DCA| 399| 3| 42| 0| null| 0| NA| NA| NA| NA| NA| YES| NO|
|2008| 1| 29| 2| 915| 915| 945| 925| OH| 5249| N960CA| 90| 70| 54| 20| 0| CVG| BNA| 230| 13| 23| 0| null| 0| 0| 0| 20| 0| 0| YES| NO|
|2008| 1| 29| 2| 1150| 1150| 1448| 1430| OH| 5670| N695CA| 118| 100| 76| 18| 0| MCI| CVG| 539| 6| 36| 0| null| 0| 0| 0| 18| 0| 0| YES| NO|
|2008| 1| 31| 4| 1505| 1505| 1651| 1645| OH| 5107| N912CA| 106| 100| 85| 6| 0| CVG| SAV| 515| 3| 18| 0| null| 0| NA| NA| NA| NA| NA| YES| NO|
|2008| 1| 1| 2| 1742| 1711| 1748| 1718| OO| 1998| N815SK| 66| 67| 40| 30| 31| ATL| BNA| 214| 6| 20| 0| null| 0| 30| 0| 0| 0| 0| YES| YES|
+----+-----+----------+---------+-------+----------+-------+----------+-------------+---------+-------+-----------------+--------------+-------+--------+--------+------+----+--------+------+-------+---------+----------------+--------+------------+------------+--------+-------------+-----------------+------------+------------+
only showing top 20 rows
Get delayed counts¶
airtraffic. \
select('IsDepDelayed', 'IsArrDelayed', 'Cancelled'). \
distinct(). \
show()
+------------+------------+---------+
|IsDepDelayed|IsArrDelayed|Cancelled|
+------------+------------+---------+
| NO| NO| 0|
| YES| YES| 1|
| NO| YES| 0|
| YES| NO| 0|
| YES| YES| 0|
+------------+------------+---------+
## Departure Delayed Count
airtraffic. \
filter(airtraffic.IsDepDelayed == "YES"). \
count()
265198
## Departure Delayed Count
airtraffic. \
filter((airtraffic.IsDepDelayed == "YES") & (airtraffic.Cancelled == 0)). \
count()
247905
## Arrival Delayed Count
airtraffic. \
filter(airtraffic.IsArrDelayed == "YES"). \
count()
297956
## Arrival Delayed Count
airtraffic. \
filter((airtraffic.IsArrDelayed == "YES") & (airtraffic.Cancelled == 0)). \
count()
280663
airtraffic. \
filter("(IsDepDelayed = 'YES' OR IsArrDelayed = 'YES') AND Cancelled = 0"). \
select('Year', 'Month', 'DayOfMonth',
'FlightNum', 'IsDepDelayed', 'IsArrDelayed'
). \
show()
+----+-----+----------+---------+------------+------------+
|Year|Month|DayOfMonth|FlightNum|IsDepDelayed|IsArrDelayed|
+----+-----+----------+---------+------------+------------+
|2008| 1| 17| 4977| YES| YES|
|2008| 1| 17| 5426| NO| YES|
|2008| 1| 18| 5260| YES| YES|
|2008| 1| 19| 5276| NO| YES|
|2008| 1| 20| 5215| NO| YES|
|2008| 1| 20| 5324| NO| YES|
|2008| 1| 21| 5056| NO| YES|
|2008| 1| 21| 5215| YES| NO|
|2008| 1| 21| 5595| YES| YES|
|2008| 1| 21| 5610| NO| YES|
|2008| 1| 22| 5032| YES| YES|
|2008| 1| 22| 5331| NO| YES|
|2008| 1| 23| 5033| NO| YES|
|2008| 1| 23| 5355| NO| YES|
|2008| 1| 24| 5689| NO| YES|
|2008| 1| 27| 5433| NO| YES|
|2008| 1| 27| 5645| NO| YES|
|2008| 1| 29| 5249| NO| YES|
|2008| 1| 29| 5670| NO| YES|
|2008| 1| 31| 5107| NO| YES|
+----+-----+----------+---------+------------+------------+
only showing top 20 rows
from pyspark.sql.functions import col, lit, count, sum, expr
## Both Departure Delayed and Arrival Delayed
airtraffic. \
filter('Cancelled = 0'). \
agg(count(lit(1)).alias("FlightCount"),
sum(expr("CASE WHEN IsDepDelayed = 'YES' THEN 1 ELSE 0 END")).alias("DepDelayedCount"),
sum(expr("CASE WHEN IsArrDelayed = 'YES' THEN 1 ELSE 0 END")).alias("ArrDelayedCount")
). \
show()
+-----------+---------------+---------------+
|FlightCount|DepDelayedCount|ArrDelayedCount|
+-----------+---------------+---------------+
| 588366| 247905| 280663|
+-----------+---------------+---------------+
from pyspark.sql.functions import when
## Both Departure Delayed and Arrival Delayed
airtraffic. \
filter('Cancelled = 0'). \
agg(count(lit(1)).alias("FlightCount"),
sum(when(col('IsDepDelayed') == 'YES', 1).otherwise(lit(0))).alias("DepDelayedCount"),
sum(when(col('IsArrDelayed') == lit('YES'), 1).otherwise(lit(0))).alias("ArrDelayedCount")
). \
show()
+-----------+---------------+---------------+
|FlightCount|DepDelayedCount|ArrDelayedCount|
+-----------+---------------+---------------+
| 588366| 247905| 280663|
+-----------+---------------+---------------+