VaporecMaster
This project implemented the recommendation engine for liquidbox.fr. The system takes a memory-based approach with user-item filtering. Each user has a profile preferences vector and a ratings vector (scores given to already purchased items). These are combined into a single user preferences vector, which is then compared with each item's properties vector using the dot product as the similarity metric.
$$S_{(user,item)} = (\alpha P_{user} + \beta \displaystyle\sum_{i} (I_{i} R_{(user,i)})) \cdot I_{item}$$
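Here $$P_{user}$$ is the profile preferences vector, $$I_{i}$$ is the properties vector of a previously purchased item, $$R_{(user,i)}$$ is the user's rating of that item, and $$\alpha$$ and $$\beta$$ weight the two terms. A set of hard conditions is then applied on top of the similarity score.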
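
The scoring formula can be sketched with NumPy. This is a minimal illustration rather than the project's code: the function name `score_items`, the weight values, the toy vectors, and the boolean `allowed` mask standing in for the hard conditions are all assumptions.

```python
import numpy as np

def score_items(p_user, rated_items, ratings, items,
                alpha=1.0, beta=0.5, allowed=None):
    """Return S(user, item) for every row of `items`.

    p_user:      (d,)   profile preferences vector P_user
    rated_items: (k, d) property vectors I_i of already purchased items
    ratings:     (k,)   ratings R_(user,i) of those items
    items:       (n, d) property vectors of the candidate items
    allowed:     (n,)   optional boolean mask; False marks an item ruled
                        out by a hard condition (assumed representation)
    """
    # Combined user vector: alpha * P_user + beta * sum_i(I_i * R_(user,i))
    user_vec = alpha * p_user + beta * (ratings @ rated_items)
    # Dot product of the user vector with every candidate item
    scores = items @ user_vec
    if allowed is not None:
        # Excluded items can never win, whatever their similarity
        scores = np.where(allowed, scores, -np.inf)
    return scores

# Toy data (made up for illustration)
p_user = np.array([1.0, 0.0, 1.0])
rated_items = np.array([[1.0, 1.0, 0.0],
                        [0.0, 0.0, 1.0]])
ratings = np.array([2.0, -1.0])
items = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 1.0],
                  [1.0, 1.0, 1.0]])

scores = score_items(p_user, rated_items, ratings, items)
masked = score_items(p_user, rated_items, ratings, items,
                     allowed=np.array([True, True, False]))
```

With the mask applied, the third item is excluded even though it has the highest raw score, which is the intended effect of layering hard conditions over the similarity ranking.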

Item selection algorithm



The following features were implemented:


  • Assigning items to users using the following information:
    -ratings given to previous items;
    -user history;
    -user preferences;
    -user package configuration;
    -favourite items;

  • Ensuring diversity in brands and e-liquids assigned to a user;

  • Extracting data from WooCommerce database, saving results both to DB and CSV suitable for further processing;

  • Time complexity is $$O(\text{bottles} \times \text{items})$$, where "bottles" is the total number of item slots being assigned across all users and "items" is the size of the inventory list;
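
One way to read the diversity requirement and the stated cost together is a greedy pass that scans the whole inventory once for each bottle slot. The sketch below is an assumption about the shape of such a loop, not the shipped algorithm: `assign_bottles`, the `max_per_brand` limit, and the toy data are hypothetical.

```python
from collections import Counter

def assign_bottles(scores, brands, n_slots, max_per_brand=1):
    """Greedily pick up to `n_slots` distinct items for one user.

    scores: dict item_id -> similarity score for this user
    brands: dict item_id -> brand name
    """
    chosen = []            # picked items, in order
    taken = set()          # same items, for O(1) membership checks
    per_brand = Counter()  # how many picks each brand already has
    for _ in range(n_slots):                # one iteration per bottle slot
        best = None
        for item, score in scores.items():  # one scan of the inventory
            if item in taken or per_brand[brands[item]] >= max_per_brand:
                continue
            if best is None or score > scores[best]:
                best = item
        if best is None:   # nothing left that satisfies the constraints
            break
        chosen.append(best)
        taken.add(best)
        per_brand[brands[best]] += 1
    return chosen

# Toy data (made up for illustration)
scores = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.1}
brands = {"a": "X", "b": "X", "c": "Y", "d": "Z"}

two = assign_bottles(scores, brands, n_slots=2)
three = assign_bottles(scores, brands, n_slots=3)
```

Item "b" is skipped despite its high score because brand "X" is already represented by "a". Each slot costs one pass over the inventory, so filling every slot for every user costs bottles × items steps in total.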



  • The scikit-learn library was utilized to vectorize text data:

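    from sklearn.feature_extraction import DictVectorizer

    # Split the (name, features) pairs into parallel tuples
    unames, uf_dict = zip(*preferences.items())
    inames, if_dict = zip(*item_features.T.to_dict().items())

    # Fit on users and items together so both share one feature space
    vectorizer = DictVectorizer(sparse=True)
    vectorizer.fit(uf_dict + if_dict)

    U = vectorizer.transform(uf_dict)  # sparse user feature matrix
    I = vectorizer.transform(if_dict)  # sparse item feature matrix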
    


    The pandas library was used heavily throughout the project to manipulate all the data:

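    import pandas as pd

    # Translate field ids through fid_map (Series.map replaces the deprecated applymap)
    prefs_right["field_id"] = prefs_right["field_id"].map(lambda x: fid_map[x])

    # One row per (item_id, user_id), one column per field
    prefs_right = pd.pivot_table(data=prefs_right, index=["item_id", "user_id"], columns="field_id", values="meta_value", aggfunc="first", fill_value="")
    # Order by item_id and keep the last values for each user
    prefs_right = prefs_right.sort_index(level="item_id").reset_index("user_id").groupby(["user_id"]).last().reset_index()

    preferences = pd.merge(left=prefs_left, right=prefs_right, how="inner")
    # Drop rows whose email repeats an earlier one, ignoring case
    preferences = preferences[~preferences["email"].str.lower().duplicated()]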


    The project was integrated into the website's WordPress/WooCommerce/MySQL backend and successfully shipped to the customer.

    # Extraction of the input user list.
    #
    # Pivot postmeta and extract the user id so that preferences can be retrieved afterwards.
    # Merge first and last name for use in the recommendations table.
    connection.execute(
        """create or replace algorithm=temptable view 9L8fl_vm_in_users as """
        """select """
        """post_id as order_id, """
        """group_concat( """
        """ case """
        """    when pome.meta_key='_customer_user' then pome.meta_value """
        """    else null """ 
        """ end """
        """) as user_id, """
        """group_concat( """
        """ case """
        """    when pome.meta_key='_billing_email' then pome.meta_value """
        """    else null """ 
        """ end """
        """) as email, """
        """group_concat( """
        """ case """
        """    when pome.meta_key='_billing_first_name' then meta_value """
        """    when pome.meta_key='_billing_last_name' then meta_value """ 
        """    else null """
        """ end """
        """ separator ' ' """
        """) as user_name """
        """from 9L8fl_postmeta as pome join 9L8fl_vm_processing_orders """
        """on """
        """9L8fl_vm_processing_orders.ID=pome.post_id """
        """and """ 
        """(pome.meta_key='_customer_user' or """
        """pome.meta_key='_billing_email' or """
        """pome.meta_key='_billing_first_name' or """
        """pome.meta_key='_billing_last_name') """
        """group by pome.post_id """
    )
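
The query above pivots key-value rows from the post meta table into one row per order by wrapping a `case` expression in `group_concat`. The same conditional-aggregation pattern can be tried on an in-memory SQLite database, as sketched below. This is only an illustration under stated assumptions: SQLite stands in for MySQL, the table name `postmeta` and its rows are made up, and the join against the processing-orders view and the name concatenation are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical key-value table shaped like WordPress post meta
conn.executescript("""
    create table postmeta (post_id integer, meta_key text, meta_value text);
    insert into postmeta values
        (10, '_customer_user', '7'),
        (10, '_billing_email', 'a@example.com'),
        (10, '_order_total', '19.90'),
        (11, '_customer_user', '8'),
        (11, '_billing_email', 'b@example.com');
""")

# Each group_concat sees only the rows whose key matches its case branch;
# rows with other keys yield NULL, which group_concat skips.
rows = conn.execute("""
    select post_id as order_id,
           group_concat(case when meta_key = '_customer_user'
                             then meta_value end) as user_id,
           group_concat(case when meta_key = '_billing_email'
                             then meta_value end) as email
    from postmeta
    where meta_key in ('_customer_user', '_billing_email')
    group by post_id
    order by post_id
""").fetchall()

for row in rows:
    print(row)
```

Each order collapses to a single row with `user_id` and `email` columns, which is the shape the later preference lookup needs.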