I'm working on a movie prediction project. How can I create a PDF document covering the data mining steps (discovery of the data set and preparation of the data)?

Kaggle has many competitions where you will find interesting data mining problems. You can choose one from there, or make a simpler version of one of those problems for your project. It also helps to look at other sources of data, such as the UCI Repository. Browse the datasets that are available and think about what classification, prediction, or inference can be made from the data.

Note: if you meant that the system output should be visible on a website, you can have a server serve pages that show the results. You can run an online learning system that continuously scans new data and updates its predictions (or other outputs), and then have a website that fetches the results and displays them in real time.
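To make the online learning idea a bit more concrete, here is a minimal sketch, not a definitive implementation. It assumes the "prediction" is simply the running average of movie ratings seen so far; the names `RunningMeanPredictor` and `process_stream` are made up for illustration, and a real project would replace the average with an actual model.

```python
from dataclasses import dataclass


@dataclass
class RunningMeanPredictor:
    """Hypothetical stand-in model: predicts the mean of all ratings seen so far."""
    count: int = 0
    mean: float = 0.0

    def update(self, rating: float) -> None:
        # Incremental mean update, so past observations need not be stored.
        self.count += 1
        self.mean += (rating - self.mean) / self.count

    def predict(self) -> float:
        return self.mean


def process_stream(predictor, ratings):
    """Feed new ratings one at a time and record the prediction after each."""
    history = []
    for rating in ratings:
        predictor.update(rating)
        history.append(predictor.predict())
    return history


if __name__ == "__main__":
    model = RunningMeanPredictor()
    # Illustrative ratings only, not real data.
    print(process_stream(model, [4.0, 2.0, 3.0]))
```

A website would then only need to read the latest value from `predict()` (or from wherever you store it) each time a page is requested.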

There are lots of ways to do this, and there is plenty of research on it. Google has multiple open datasets available. Apache Hadoop is a framework for storing and processing large datasets rather than a source of them, but it is commonly used alongside such data.

Note: there are numerous tools and platforms, such as Spark, that can be used for this kind of work. If you want to build a big data search or machine learning task, Spark is often the tool of choice.

Note: for a large dataset you might want a web server to serve it. If you are interested in large-scale data analysis on cloud platforms such as Amazon's cloud computing services or Google Compute, it is a good idea to host the necessary data there. These services are also good candidates for running a web scraping service. Finally, if you want to run a big data search for a particular problem, or set of problems, over large datasets, it is often useful to expose the data through an API.

