I am working on a project that involves building a recommender system using movie ratings data. I am looking for someone to help me with this who HAS experience with machine learning and data mining techniques in Python, using K nearest neighbor functions, clustering, collaborative filtering, etc. I believe numpy and scikit-learn (sklearn) will be used here, along with a couple of others. I have a lot of sample code that can be used for this, and I believe I have all the resources necessary to complete it, but I am low on time and understanding. If you can include similar work with your bid to prove that you have done something like this before, it will be helpful when picking out bids. It's hard just to take someone's word for it, since this is a specific assignment and not just regular Python coding.
1. I want this to be coded in an IPython Notebook.
2. This system will import a set of ratings from the data set provided below. Split the data set into training data and test data. I think the MovieLens data is already split up into groups, so it might not need to be split up. I don't need to use any demographic data; I just need to focus on ratings for movies. Here is the data I need to use: MovieLens Ratings
3. From here I want to train the system with the training data, then import the test data and predict ratings for it. Then compare the actual test ratings with the predictions and do some analysis on that. The data may need to be cleaned up; I'm not sure.
4. Predictions can be done with K Nearest Neighbors, or with something else if you have a better suggestion. KNN code is provided here in IPython format: KNN.ipynb. It uses a module included here: kNN.py, and this data file: video_store_2.csv. The KNN example shows output along the lines of "the user picked this, the prediction says this", that kind of thing.
5. Recommender system article: recommender-systems-eml2010.pdf
6. I also have item-based recommender system code provided here: itemBasedRec.py. Some of the code is missing, but it may be useful. The following notebook can be useful too and also applies to filling in the blanks of the itemBasedRec.py code, but it may or may not help with this project: Matrix Factorization.ipynb
7. Do some simple regression analysis to see how the predicted ratings compare to the actual ratings given. Some sample code on regression analysis is here: IPython Notebook.html
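As a rough illustration of the regression step, the sketch below fits a straight line of actual ratings against predicted ratings and reports the slope, intercept, and R-squared. It is only one possible approach, using numpy's polyfit; the function name and the sample numbers are made up for illustration and do not come from the provided sample code.

```python
import numpy as np

def regress_actual_on_predicted(predicted, actual):
    """Least-squares fit of: actual ~ slope * predicted + intercept."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    slope, intercept = np.polyfit(predicted, actual, deg=1)
    fitted = slope * predicted + intercept
    ss_res = np.sum((actual - fitted) ** 2)         # unexplained variation
    ss_tot = np.sum((actual - actual.mean()) ** 2)  # total variation
    r_squared = 1.0 - ss_res / ss_tot
    return float(slope), float(intercept), float(r_squared)

# Made-up ratings, purely for illustration (not MovieLens results).
predicted = [3.5, 4.1, 2.0, 4.8, 3.0, 1.5]
actual = [4, 4, 2, 5, 3, 1]

slope, intercept, r2 = regress_actual_on_predicted(predicted, actual)
print(f"slope={slope:.3f} intercept={intercept:.3f} R^2={r2:.3f}")
```

A slope near 1 and an intercept near 0 would suggest the predictions track the actual ratings without systematic bias, while R-squared indicates how much of the variation in actual ratings the predictions explain.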
Summary: Import the MovieLens data, train the system, import more data (or "users") with their ratings, and predict how they will rate each of the movies. Do some analysis on the data, with some regression analysis at the end.
If you have any questions, just ask. See below for full descriptions.
Deliverables for Application Development projects include the following.
A detailed project report, including a description of the system (including the specific techniques and algorithms you used) and the interaction between the components (make references to code segments, modules, methods, functions, as necessary). Your write-up should also include a description of the evaluation of your system demonstrating its correctness, functionality, accuracy, etc., and a description of the data set(s) used in evaluating your system.
Appendices to the main report should include:
Complete (actual) sample runs of your program with descriptions, illustrating how your system works, along with any intermediate input or output used for the sample runs.
Fully documented code for the system, including any programs and scripts used for data preparation and analysis.
Binary files (e.g., executables, DLLs, Class files) or other components necessary to run your program.
Samples or detailed descriptions of the data sets used.
Readme file containing instructions on how to compile, install, and/or run your program, if appropriate, including relevant references to outside resources used.