## Using LIKE Operator or like Function
Let us understand how to use the `LIKE` operator (SQL style) or the `like` function (API style) to filter data in Data Frames.
* `like` is primarily used for partial comparison (e.g., searching for names that start with **Sco**).
* We can use `like` to get results that start with, end with, or contain a pattern.
* We can also negate `like` (`NOT LIKE` in SQL, `~` in the API style).
* Spark also provides `rlike` for partial comparison using regular expressions.
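To make the wildcard rules concrete, here is a minimal plain-Python model of how `LIKE` and `rlike` decide whether a value matches. It is an illustration only, not Spark's implementation. It assumes `%` matches any sequence of characters (including none), `_` matches exactly one character, and `\` escapes the next character, while `rlike` succeeds when the regular expression matches anywhere in the value. The helper names `like_to_regex`, `sql_like`, and `sql_rlike` are hypothetical.

```python
import re


def like_to_regex(pattern, escape="\\"):
    """Translate a SQL LIKE pattern into a Python regex (illustrative model, not Spark code)."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == escape and i + 1 < len(pattern):
            # The escaped character is matched literally (e.g. \% matches a real %)
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "%":
            parts.append(".*")  # any sequence of characters, including the empty one
        elif ch == "_":
            parts.append(".")  # exactly one character
        else:
            parts.append(re.escape(ch))  # every other character is literal
        i += 1
    return "".join(parts)


def sql_like(value, pattern):
    """LIKE must match the entire value; a NULL (None) value yields NULL (None)."""
    if value is None:
        return None
    return re.fullmatch(like_to_regex(pattern), value, flags=re.DOTALL) is not None


def sql_rlike(value, regex):
    """rlike succeeds if the regular expression matches anywhere in the value."""
    if value is None:
        return None
    return re.search(regex, value) is not None


print(sql_like("Scott", "Sco%"))   # True  (starts with Sco)
print(sql_like("Scott", "%ott"))   # True  (ends with ott)
print(sql_like("Scott", "%co%"))   # True  (contains co)
print(sql_like("Scott", "Sco"))    # False (no wildcard means exact match)
print(sql_like("Scott", "Sc_tt"))  # True  (_ matches exactly one character)
print(sql_rlike("Scott", "co"))    # True  (regex matches anywhere)
print(sql_rlike("Scott", "^Sco"))  # True  (^ anchors the regex at the start)
```

In PySpark the corresponding checks would be written as `col('first_name').like('Sco%')` and `col('first_name').rlike('^Sco')`.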

Let us start the Spark session for this notebook so that we can execute the code provided. You can sign up for our [10-node state-of-the-art cluster/labs](https://labs.itversity.com/plans) to learn Spark SQL using our unique integrated LMS.

```python
from pyspark.sql import SparkSession

import getpass
username = getpass.getuser()

spark = SparkSession. \
    builder. \
    config('spark.ui.port', '0'). \
    config("spark.sql.warehouse.dir", f"/user/{username}/warehouse"). \
    enableHiveSupport(). \
    appName(f'{username} | Python - Basic Transformations'). \
    master('yarn'). \
    getOrCreate()
```

If you prefer to work from the command line, you can launch Spark using one of these 3 approaches.

**Using Spark SQL**

```
spark2-sql \
    --master yarn \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse
```

**Using Scala**

```
spark2-shell \
    --master yarn \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse
```

**Using PySpark**

```
pyspark2 \
    --master yarn \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse
```

```python
employees = [
    (1, "Scott", "Tiger", 1000.0, 10,
     "united states", "+1 123 456 7890", "123 45 6789"
    ),
    (2, "Henry", "Ford", 1250.0, None,
     "India", "+91 234 567 8901", "456 78 9123"
    ),
    (3, "Nick", "Junior", 750.0, '',
     "united KINGDOM", "+44 111 111 1111", "222 33 4444"
    ),
    (4, "Bill", "Gomes", 1500.0, 10,
     "AUSTRALIA", "+61 987 654 3210", "789 12 6118"
    )
]
```

```python
employeesDF = spark. \
    createDataFrame(employees,
                    schema="""employee_id INT, first_name STRING,
                    last_name STRING, salary FLOAT, bonus STRING, nationality STRING,
                    phone_number STRING, ssn STRING"""
                   )
```

```python
employeesDF.show()
```

```
+-----------+----------+---------+------+-----+--------------+----------------+-----------+
|employee_id|first_name|last_name|salary|bonus|   nationality|    phone_number|        ssn|
+-----------+----------+---------+------+-----+--------------+----------------+-----------+
|          1|     Scott|    Tiger|1000.0|   10| united states| +1 123 456 7890|123 45 6789|
|          2|     Henry|     Ford|1250.0| null|         India|+91 234 567 8901|456 78 9123|
|          3|      Nick|   Junior| 750.0|     |united KINGDOM|+44 111 111 1111|222 33 4444|
|          4|      Bill|    Gomes|1500.0|   10|     AUSTRALIA|+61 987 654 3210|789 12 6118|
+-----------+----------+---------+------+-----+--------------+----------------+-----------+
```

* Get employees whose first name starts with **Sco**

```python
employeesDF. \
    filter("first_name LIKE 'Sco%'"). \
    show()
```

```
+-----------+----------+---------+------+-----+-------------+---------------+-----------+
|employee_id|first_name|last_name|salary|bonus|  nationality|   phone_number|        ssn|
+-----------+----------+---------+------+-----+-------------+---------------+-----------+
|          1|     Scott|    Tiger|1000.0|   10|united states|+1 123 456 7890|123 45 6789|
+-----------+----------+---------+------+-----+-------------+---------------+-----------+
```

`LIKE` is case-sensitive. To match irrespective of case, convert the column to upper case and use an upper-case pattern.

```python
employeesDF. \
    filter("upper(first_name) LIKE 'SCO%'"). \
    show()
```

```
+-----------+----------+---------+------+-----+-------------+---------------+-----------+
|employee_id|first_name|last_name|salary|bonus|  nationality|   phone_number|        ssn|
+-----------+----------+---------+------+-----+-------------+---------------+-----------+
|          1|     Scott|    Tiger|1000.0|   10|united states|+1 123 456 7890|123 45 6789|
+-----------+----------+---------+------+-----+-------------+---------------+-----------+
```

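Because `LIKE` compares characters exactly, `'Sco%'` does not match `scott` or `SCOTT`. The plain-Python sketch below mirrors the `upper(first_name) LIKE 'SCO%'` idea, using `str.startswith` as a stand-in for a prefix `LIKE`; the mixed-case names are hypothetical sample values, not rows from `employeesDF`.

```python
names = ["Scott", "scott", "SCOTT", "Henry"]  # hypothetical mixed-case values

# first_name LIKE 'Sco%'  -> case-sensitive prefix match
case_sensitive = [n for n in names if n.startswith("Sco")]

# upper(first_name) LIKE 'SCO%'  -> normalize case first, then match an upper-case pattern
case_insensitive = [n for n in names if n.upper().startswith("SCO")]

print(case_sensitive)    # ['Scott']
print(case_insensitive)  # ['Scott', 'scott', 'SCOTT']
```

Note that the pattern itself must also be upper case: `upper(first_name) LIKE 'Sco%'` would never match anything. Newer Spark releases (3.3 and later, if available in your environment) also provide `ilike` for case-insensitive matching.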
* API Style

```python
from pyspark.sql.functions import col

c = col('x')
help(c.like)
```

```
Help on method _ in module pyspark.sql.column:

_(other) method of pyspark.sql.column.Column instance
    SQL like expression. Returns a boolean :class:`Column` based on a SQL LIKE match.

    :param other: a SQL LIKE pattern

    See :func:`rlike` for a regex version

    >>> df.filter(df.name.like('Al%')).collect()
    [Row(age=2, name='Alice')]
```

```python
# The trailing % is required for a prefix match;
# like('Sco') would only match the exact value 'Sco'
employeesDF. \
    filter(col('first_name').like('Sco%')). \
    show()
```

```
+-----------+----------+---------+------+-----+-------------+---------------+-----------+
|employee_id|first_name|last_name|salary|bonus|  nationality|   phone_number|        ssn|
+-----------+----------+---------+------+-----+-------------+---------------+-----------+
|          1|     Scott|    Tiger|1000.0|   10|united states|+1 123 456 7890|123 45 6789|
+-----------+----------+---------+------+-----+-------------+---------------+-----------+
```

```python
from pyspark.sql.functions import upper

employeesDF. \
    filter(upper(col('first_name')).like('SCO%')). \
    show()
```

```
+-----------+----------+---------+------+-----+-------------+---------------+-----------+
|employee_id|first_name|last_name|salary|bonus|  nationality|   phone_number|        ssn|
+-----------+----------+---------+------+-----+-------------+---------------+-----------+
|          1|     Scott|    Tiger|1000.0|   10|united states|+1 123 456 7890|123 45 6789|
+-----------+----------+---------+------+-----+-------------+---------------+-----------+
```

* Get employees whose first name contains `ott`, irrespective of case.

```python
employeesDF. \
    filter("upper(first_name) LIKE '%OTT%'"). \
    show()
```

```
+-----------+----------+---------+------+-----+-------------+---------------+-----------+
|employee_id|first_name|last_name|salary|bonus|  nationality|   phone_number|        ssn|
+-----------+----------+---------+------+-----+-------------+---------------+-----------+
|          1|     Scott|    Tiger|1000.0|   10|united states|+1 123 456 7890|123 45 6789|
+-----------+----------+---------+------+-----+-------------+---------------+-----------+
```

```python
employeesDF. \
    filter(upper(col('first_name')).like('%OTT%')). \
    show()
```

```
+-----------+----------+---------+------+-----+-------------+---------------+-----------+
|employee_id|first_name|last_name|salary|bonus|  nationality|   phone_number|        ssn|
+-----------+----------+---------+------+-----+-------------+---------------+-----------+
|          1|     Scott|    Tiger|1000.0|   10|united states|+1 123 456 7890|123 45 6789|
+-----------+----------+---------+------+-----+-------------+---------------+-----------+
```

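Where `%` appears determines the kind of match: starts with, ends with, or contains. As a rough mental model (plain Python, not Spark), patterns whose only wildcards are a leading and/or trailing `%` correspond to familiar string methods. The function name `simple_like` is hypothetical, and it deliberately ignores `_` and any `%` in the middle of a pattern.

```python
def simple_like(value, pattern):
    """Model of LIKE for patterns whose only wildcards are a leading and/or trailing %."""
    starts = pattern.startswith("%")
    ends = pattern.endswith("%") and len(pattern) > 1
    begin = 1 if starts else 0
    stop = len(pattern) - 1 if ends else len(pattern)
    core = pattern[begin:stop]  # the literal text between the wildcards
    if starts and ends:
        return core in value            # '%OTT%' -> contains
    if ends:
        return value.startswith(core)   # 'SCO%'  -> starts with
    if starts:
        return value.endswith(core)     # '%OTT'  -> ends with
    return value == core                # 'SCOTT' -> exact match


name = "SCOTT"  # upper(first_name) for Scott
for p in ["%OTT%", "SCO%", "%OTT", "SCO", "SCOTT"]:
    print(p, simple_like(name, p))
# %OTT% True
# SCO% True
# %OTT True
# SCO False
# SCOTT True
```

PySpark also offers `Column.startswith`, `Column.endswith`, and `Column.contains`, which take plain strings rather than `LIKE` patterns and are case-sensitive in the same way.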
* Get employees whose phone number does not start with **+44**

```python
employeesDF. \
    filter("phone_number NOT LIKE '+44%'"). \
    show()
```

```
+-----------+----------+---------+------+-----+-------------+----------------+-----------+
|employee_id|first_name|last_name|salary|bonus|  nationality|    phone_number|        ssn|
+-----------+----------+---------+------+-----+-------------+----------------+-----------+
|          1|     Scott|    Tiger|1000.0|   10|united states| +1 123 456 7890|123 45 6789|
|          2|     Henry|     Ford|1250.0| null|        India|+91 234 567 8901|456 78 9123|
|          4|      Bill|    Gomes|1500.0|   10|    AUSTRALIA|+61 987 654 3210|789 12 6118|
+-----------+----------+---------+------+-----+-------------+----------------+-----------+
```

```python
employeesDF. \
    filter(~col('phone_number').like('+44%')). \
    show()
```

```
+-----------+----------+---------+------+-----+-------------+----------------+-----------+
|employee_id|first_name|last_name|salary|bonus|  nationality|    phone_number|        ssn|
+-----------+----------+---------+------+-----+-------------+----------------+-----------+
|          1|     Scott|    Tiger|1000.0|   10|united states| +1 123 456 7890|123 45 6789|
|          2|     Henry|     Ford|1250.0| null|        India|+91 234 567 8901|456 78 9123|
|          4|      Bill|    Gomes|1500.0|   10|    AUSTRALIA|+61 987 654 3210|789 12 6118|
+-----------+----------+---------+------+-----+-------------+----------------+-----------+
```

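One caveat with negation: in Spark SQL, `NULL LIKE '...'` evaluates to `NULL`, `NOT NULL` is also `NULL`, and `filter` keeps only rows where the condition is true. Rows with a null in the column are therefore dropped by both `LIKE` and `NOT LIKE`. `phone_number` has no nulls, so the results above are unaffected, but `bonus` does (Henry). The sketch below models this three-valued logic in plain Python, with `None` standing in for `NULL`; the helper names `like_prefix` and `sql_not` are hypothetical.

```python
# bonus values from employeesDF (None stands for NULL)
bonuses = {"Scott": "10", "Henry": None, "Nick": "", "Bill": "10"}


def like_prefix(value, prefix):
    """Model of value LIKE 'prefix%' with SQL NULL semantics."""
    return None if value is None else value.startswith(prefix)


def sql_not(condition):
    """SQL NOT: NULL stays NULL, otherwise the boolean is negated."""
    return None if condition is None else not condition


# filter keeps a row only when the condition is TRUE (not FALSE, not NULL)
not_like = [name for name, bonus in bonuses.items()
            if sql_not(like_prefix(bonus, "1")) is True]
print(not_like)  # ['Nick']  (Henry's NULL bonus is dropped)

# Explicitly keep NULLs, mirroring: bonus IS NULL OR bonus NOT LIKE '1%'
with_nulls = [name for name, bonus in bonuses.items()
              if bonus is None or sql_not(like_prefix(bonus, "1")) is True]
print(with_nulls)  # ['Henry', 'Nick']
```

In PySpark, the null-inclusive version could be written as `col('bonus').isNull() | ~col('bonus').like('1%')`.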