Overview of Collections
Let us quickly recap collections and tuples in Python. We will primarily talk about the collections that come as part of the Python standard library: list, set, dict and tuple.
Group of elements with length and index - list
Group of unique elements - set
Group of key value pairs - dict
list and set typically contain a group of homogeneous elements, while dict and tuple typically contain a group of heterogeneous elements.
list and set are analogous to a database table, while dict and tuple are analogous to an individual record.
Typically we create a list of tuples or dicts, or a set of tuples (a dict is not hashable, so it cannot be an element of a set). Also, a dict can be considered a list of pairs, where a pair is nothing but a tuple with 2 elements.
list and dict are used far more extensively than set and tuple.
We typically use Map Reduce APIs such as map and filter to process the data in collections. There are also some pre-defined functions such as len, sum, min and max for aggregating data in collections.
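As a quick illustration of the points above, here is a minimal sketch of the four collection types and the built-in aggregate functions. The sample values are made up for illustration and are not read from the retail_db files.

```python
# Hypothetical sample values, used only to illustrate the collection types.
order_ids = [1, 2, 3, 2]                  # list: ordered, indexed, allows duplicates
unique_order_ids = set(order_ids)         # set: keeps only the unique elements
order = (1, "2013-07-25 00:00:00.0", 11599, "CLOSED")    # tuple: one heterogeneous record
order_dict = {"order_id": 1, "order_status": "CLOSED"}   # dict: key value pairs

# A dict can be viewed as a list of pairs, each pair being a 2-element tuple.
pairs = list(order_dict.items())

# Built-in functions aggregate the data in a collection directly.
summary = (len(order_ids), sum(order_ids), min(order_ids), max(order_ids))

print(order_ids[0], unique_order_ids, order[3], pairs, summary)
```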
Let us perform a few tasks to quickly recap collections and tuples in Python. We will also quickly recap the Map Reduce APIs.
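Before working with the files, the Map Reduce style of processing can be sketched on a small in-memory collection. The records below are hypothetical (order id, amount) pairs, and functools.reduce is used here only to show the reduce step explicitly; sum would do the same job for this case.

```python
from functools import reduce

# Hypothetical (order_id, amount) pairs, made up for illustration.
items = [(1, 100.0), (2, 50.5), (2, 25.25)]

# filter keeps only the elements that satisfy the condition.
items_for_order_2 = filter(lambda item: item[0] == 2, items)

# map transforms each remaining element, here by extracting the amount.
amounts = map(lambda item: item[1], items_for_order_2)

# reduce folds the amounts into a single value, starting from 0.0.
total = reduce(lambda acc, amount: acc + amount, amounts, 0.0)

print(total)
```

Note that filter and map return lazy iterators in Python 3, so nothing is computed until reduce consumes them.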
Create a collection of orders by reading data from a file.
%%sh
ls -ltr /data/retail_db/orders/part-00000
orders_path = "/data/retail_db/orders/part-00000"
orders = open(orders_path). \
    read(). \
    splitlines()
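The path above exists only in the environment that hosts the retail_db data. To try the same read pattern elsewhere, a minimal sketch is to write a couple of lines to a temporary file first. The two sample lines and the assumed field order (order id, order date, customer id, order status) are illustrative assumptions, not the real file contents.

```python
import os
import tempfile

# Hypothetical sample lines standing in for the real orders file.
sample = (
    "1,2013-07-25 00:00:00.0,11599,CLOSED\n"
    "2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT\n"
)

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "part-00000")
    with open(sample_path, "w") as f:
        f.write(sample)
    # The with statement closes the file even if reading fails.
    with open(sample_path) as f:
        sample_orders = f.read().splitlines()

print(len(sample_orders), sample_orders[0])
```

Using a with block instead of a bare open call is a small design choice that avoids leaving the file handle open.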
Get all unique order statuses. Make sure data is sorted in alphabetical order.
# Assumes order_status is the 4th comma-separated field (index 3).
sorted(set(map(lambda o: o.split(",")[3], orders)))
Get count of all unique dates.
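# Assumes order_date is the 2nd comma-separated field (index 1).
# set keeps only the unique dates before counting.
len(set(map(lambda o: o.split(",")[1], orders)))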
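The two tasks above can also be written with set comprehensions. This is a sketch on hypothetical in-memory records, assuming the field order is order id, order date, customer id, order status.

```python
# Hypothetical records; the real data would come from the orders file.
sample_orders = [
    "1,2013-07-25 00:00:00.0,11599,CLOSED",
    "2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT",
    "3,2013-07-26 00:00:00.0,12111,COMPLETE",
    "4,2013-07-26 00:00:00.0,8827,CLOSED",
]

# A set comprehension removes duplicates; sorted returns an alphabetical list.
unique_statuses = sorted({o.split(",")[3] for o in sample_orders})

# Counting the elements of the set gives the number of unique dates.
unique_date_count = len({o.split(",")[1] for o in sample_orders})

print(unique_statuses, unique_date_count)
```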
Sort the data in orders in ascending order by order_customer_id and then order_date.
# Assumes order_customer_id is at index 2 and order_date at index 1.
sorted(orders, key=lambda k: (int(k.split(",")[2]), k.split(",")[1]))
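Sorting by a composite key relies on tuple comparison: the first element is compared first, and the second only breaks ties. The sketch below uses hypothetical records (same assumed field order as above) and a named key function, sort_key, which is introduced here purely for illustration.

```python
# Hypothetical records: order id, order date, customer id, order status.
sample_orders = [
    "1,2013-07-26 00:00:00.0,1000,CLOSED",
    "2,2013-07-25 00:00:00.0,1000,COMPLETE",
    "3,2013-07-25 00:00:00.0,25,PENDING",
]

def sort_key(order):
    """Return (customer id as int, order date) for one comma-separated record."""
    fields = order.split(",")
    # Converting to int matters: as strings, "1000" would sort before "25".
    return int(fields[2]), fields[1]

sorted_orders = sorted(sample_orders, key=sort_key)

# Order ids in sorted sequence, to make the result easy to inspect.
sorted_ids = [o.split(",")[0] for o in sorted_orders]
print(sorted_ids)
```

The dates sort correctly as plain strings only because they are in a fixed-width year-month-day format.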
Create a collection of order_items by reading data from a file.
order_items_path = "/data/retail_db/order_items/part-00000"
order_items = open(order_items_path). \
    read(). \
    splitlines()
Get revenue for a given order_item_order_id.
def get_order_revenue(order_items, order_id):
    # Assumes order_item_order_id is at index 1 and order_item_subtotal at index 4.
    order_items_filtered = filter(
        lambda oi: int(oi.split(",")[1]) == order_id,
        order_items
    )
    order_items_map = map(
        lambda oi: float(oi.split(",")[4]),
        order_items_filtered
    )
    return round(sum(order_items_map), 2)
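To check the function without the data files, it can be exercised on a few in-memory records. This is a self-contained sketch: the function is restated with a generator expression, the records are hypothetical, and the assumed field order is order item id, order id, product id, quantity, subtotal, product price.

```python
def get_order_revenue(order_items, order_id):
    """Sum the subtotal (index 4) of all items whose order id (index 1) matches."""
    return round(
        sum(
            float(oi.split(",")[4])
            for oi in order_items
            if int(oi.split(",")[1]) == order_id
        ),
        2,
    )

# Hypothetical order item records, made up for illustration.
sample_order_items = [
    "1,1,957,1,299.98,299.98",
    "2,2,1073,1,199.99,199.99",
    "3,2,502,5,250.0,50.0",
    "4,2,403,1,129.99,129.99",
]

revenue_for_order_2 = get_order_revenue(sample_order_items, 2)
revenue_for_missing_order = get_order_revenue(sample_order_items, 99)
print(revenue_for_order_2, revenue_for_missing_order)
```

An order id with no matching items yields 0, since the sum of an empty sequence is 0.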