
Posted: 2024-03-12



IEMS 5730 Spring 2024 Homework 2
Release date: Feb 23, 2024
Due date: Mar 11, 2024 (Monday) 11:59:00 pm
We will discuss the solution soon after the deadline. No late homework will be accepted!
Every Student MUST include the following statement, together with his/her signature in the
submitted homework.
I declare that the assignment submitted on Elearning system is original
except for source material explicitly acknowledged, and that the same or
related material has not been previously submitted for another course. I
also acknowledge that I am aware of University policy and regulations on
honesty in academic work, and of the disciplinary guidelines and
procedures applicable to breaches of such policy and regulations, as
contained in the website
http://www.cuhk.edu.hk/policy/academichonesty/.
Signed (Student_________________________) Date:______________________________
Name_________________________________ SID_______________________________
Submission notice:
● Submit your homework via the elearning system.
● All students are required to submit this assignment.
General homework policies:
A student may discuss the problems with others. However, the work a student turns in must
be created COMPLETELY by oneself ALONE. A student may not share ANY written work or
pictures, nor may one copy answers from any source other than one’s own brain.
Each student MUST LIST on the homework paper the name of every person he/she has
discussed or worked with. If the answer includes content from any other source, the
student MUST STATE THE SOURCE. Failure to do so is cheating and will result in
sanctions. Copying answers from someone else is cheating even if one lists their name(s) on
the homework.
If there is information you need to solve a problem, but the information is not stated in the
problem, try to find the data somewhere. If you cannot find it, state what data you need,
make a reasonable estimate of its value, and justify any assumptions you make. You will be
graded not only on whether your answer is correct, but also on whether you have done an
intelligent analysis.
Submit your output, explanations, and commands/scripts in one SINGLE PDF file.
Q1 [20 marks + 5 Bonus marks]: Basic Operations of Pig
You are required to perform some simple analysis using Pig on the Google Books n-grams
dataset. An ‘n-gram’ is a phrase with n words. The dataset lists all n-grams present in
books from books.google.com along with some statistics.
In this question, you only use the Google Books 1-grams (single words; the first field is
labeled "bigram" in the format below, but each record holds one word). Please go to
References [1] and [2] to download the two datasets. Each line in these two files has the following format
(TAB separated):
bigram year match_count volume_count
An example for 1-grams would be:
circumvallate 1978 335 91
circumvallate 1979 261 95
This means that in 1978 (1979), the word "circumvallate" occurred 335 (261) times overall,
in 91 (95) distinct books.
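To make the record format concrete, here is a minimal Python sketch (not the Pig loader the question expects) that splits one TAB-separated line into its four typed fields. The names `NgramRecord` and `parse_line` are hypothetical, chosen only for illustration.

```python
from typing import NamedTuple


class NgramRecord(NamedTuple):
    """One TAB-separated line of the Google Books n-gram files (hypothetical type)."""
    ngram: str
    year: int
    match_count: int
    volume_count: int


def parse_line(line: str) -> NgramRecord:
    """Split a TAB-separated line into its four fields and convert the counts to int."""
    ngram, year, match_count, volume_count = line.rstrip("\n").split("\t")
    return NgramRecord(ngram, int(year), int(match_count), int(volume_count))


if __name__ == "__main__":
    # Parse the first example record from the handout
    print(parse_line("circumvallate\t1978\t335\t91\n"))
```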
(a) [Bonus 5 marks] Install Pig in your Hadoop cluster. You can reuse your Hadoop
cluster in IEMS 5730 HW#0 and refer to the following link to install Pig 0.17.0 on
the master node of your Hadoop cluster:
http://pig.apache.org/docs/r0.17.0/start.html#Pig+Setup
Submit the screenshot(s) of your installation process.
If you choose not to do the bonus question in (a), you can use any well-installed Hadoop
cluster, e.g., the IE DIC, or the Hadoop cluster provided by the Google Cloud/AWS [5, 6, 7]
to complete the following parts of the question:
(b) [5 marks] Upload these two files to HDFS and join them into one table.
(c) [5 marks] For each unique bigram, compute its average number of occurrences per
year. In the above example, the result is:
circumvallate (335 + 261) / 2 = 298
Note: the denominator is the number of years in which that word appears.
For example, assume the dataset contains all the 1-grams of the last 100 years, and the above
records are the only records for the word ‘circumvallate’. Then the average value is:
(335 + 261) / 2 = 298,
instead of
(335 + 261) / 100 = 5.96
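The averaging rule in (c) can be checked with a small Python sketch. It only illustrates the definition, under the assumption that each (word, year) pair occurs at most once in the combined data; it is not a replacement for the Pig script, and `average_per_year` is a hypothetical helper name. The denominator is the number of distinct years per word, so the example reproduces 298 rather than 5.96.

```python
from collections import defaultdict


def average_per_year(records):
    """Average match_count per word over the years in which the word appears.

    `records` is an iterable of (word, year, match_count) tuples.
    Assumes each (word, year) pair appears at most once.
    """
    totals = defaultdict(int)   # word -> sum of match_count
    years = defaultdict(set)    # word -> distinct years the word appears in
    for word, year, match_count in records:
        totals[word] += match_count
        years[word].add(year)
    # Divide by the number of years the word appeared, not the full time span
    return {word: totals[word] / len(years[word]) for word in totals}


if __name__ == "__main__":
    example = [("circumvallate", 1978, 335), ("circumvallate", 1979, 261)]
    print(average_per_year(example))  # {'circumvallate': 298.0}
```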
(d) [10 marks] Output the 20 bigrams with the highest average number of occurrences
per year along with their corresponding average values sorted in descending order. If
multiple bigrams have the same average value, write down any one you like (that is,
break ties as you wish).
You need to write a Pig script to perform this task and save the output into HDFS.
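Part (d) is a descending sort followed by a limit of 20. A minimal Python sketch of that step, assuming the averages from (c) are already held in a dictionary (the helper name `top_n_by_average` is hypothetical):

```python
import heapq


def top_n_by_average(averages, n=20):
    """Return the n (word, average) pairs with the largest averages, in descending order.

    Ties are broken arbitrarily, which the question explicitly allows.
    """
    return heapq.nlargest(n, averages.items(), key=lambda item: item[1])


if __name__ == "__main__":
    print(top_n_by_average({"a": 1.0, "b": 3.0, "c": 2.0}, n=2))  # [('b', 3.0), ('c', 2.0)]
```

`heapq.nlargest` avoids sorting the whole dictionary when only 20 entries are needed; in Pig the same step is usually expressed with ORDER ... BY ... DESC followed by LIMIT.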
Hints:
● This problem is very similar to the word counting example shown in the Pig lecture
notes. You can use the code there and just make some minor changes to perform
this task.
Q2 [20 marks + 5 bonus marks]: Basic Operations of Hive
In this question, you are asked to repeat Q1 using Hive and then compare the performance
between Hive and Pig.
(a) [Bonus 5 marks] Install Hive on top of your own Hadoop cluster. You can reuse your
Hadoop cluster in IEMS 5730 HW#0 and refer to the following link to install Hive
2.3.8 on the master node of your Hadoop cluster:
https://cwiki.apache.org/confluence/display/Hive/GettingStarted
Submit the screenshot(s) of your installation process.
If you choose not to do the bonus question in (a), you can use any well-installed Hadoop
cluster, e.g., the IE DIC, or the Hadoop cluster provided by the Google Cloud/AWS [5, 6, 7].
(b) [20 marks] Write a Hive script to perform exactly the same task as in Q1 on
the same datasets stored in HDFS. Rerun the Pig script on this cluster, compare the
overall run-time of Pig and Hive, and explain your observations.
Hints:
● Hive stores its tables on HDFS, and those locations need to be bootstrapped:
$ hdfs dfs -mkdir -p /tmp
$ hdfs dfs -mkdir -p /user/hive/warehouse
$ hdfs dfs -chmod g+w /tmp
$ hdfs dfs -chmod g+w /user/hive/warehouse
● While working with the interactive shell (or otherwise), you should first test on a small
subset of the data instead of the whole dataset. Once your Hive commands/scripts
work as desired, you can then run them on the complete dataset.
Q3 [30 marks + 10 Bonus marks]: Similar Users Detection in the MovieLens Dataset using Pig
Similar-user detection, which aims to group users with similar interests, behaviors, actions,
or general patterns, has drawn a lot of attention in the machine learning field. In this
homework, you will implement a similar-user detection algorithm for an online movie rating
system. Basically, users who give similar ratings to the same movies may share tastes or
interests and can be grouped as similar users.
To detect similar users, we need to calculate the similarity between each user pair. In this
homework, the similarity between a given pair of users (e.g. A and B) is measured as the
total number of movies both A and B have watched divided by the total number of
movies watched by either A or B. The following is the formal definition of similarity: Let
M(A) be the set of all the movies user A has watched. Then the similarity between user A
and user B is defined as:
$$\mathrm{Similarity}(A, B) = \frac{|M(A) \cap M(B)|}{|M(A) \cup M(B)|} \qquad (**)$$
where |S| means the cardinality of set S.
(Note: if $|M(A) \cup M(B)| = 0$, we set the similarity to be 0.)
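This ratio is the Jaccard similarity of the two movie sets. A short Python sketch of definition (**), including the zero-union rule, with a worked example: for M(A) = {1, 2, 3} and M(B) = {2, 3, 4}, the intersection has 2 movies and the union 4, so the similarity is 2/4 = 0.5. The function name `similarity` is illustrative only.

```python
def similarity(movies_a: set, movies_b: set) -> float:
    """|M(A) ∩ M(B)| / |M(A) ∪ M(B)|, defined as 0 when the union is empty."""
    union = movies_a | movies_b
    if not union:
        return 0.0  # neither user watched anything
    return len(movies_a & movies_b) / len(union)


if __name__ == "__main__":
    print(similarity({1, 2, 3}, {2, 3, 4}))  # 0.5
```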
The following figure illustrates the idea:
Two datasets [3][4] with different sizes are provided by MovieLens. Each user is represented
by its unique userID and each movie is represented by its unique movieID. The format of the
data set is as follows:
<userID>, <movieID>
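Before computing similarities, each user's set M(A) has to be built from these lines. A minimal Python sketch, assuming one `<userID>, <movieID>` pair per line and keeping IDs as strings (the helper `load_watch_sets` is hypothetical):

```python
from collections import defaultdict


def load_watch_sets(lines):
    """Build M(user) = set of movieIDs from lines of the form "<userID>, <movieID>"."""
    watched = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the comma and drop the surrounding whitespace of each field
        user_id, movie_id = (field.strip() for field in line.split(","))
        watched[user_id].add(movie_id)
    return dict(watched)


if __name__ == "__main__":
    print(load_watch_sets(["1, 10", "1, 20", "2, 10"]))
```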
Write a program in Pig to detect the TOP K similar users for each user. You can use the
cluster you built for Q1 and Q2, the IE DIC, or one provided by Google
Cloud/AWS [5, 6, 7].
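To make the expected output of the TOP K task concrete, here is a brute-force Python sketch that scores every user pair with definition (**) and keeps the K best matches per user. It is an illustration under stated assumptions, not the required Pig program: it compares all pairs (quadratic in the number of users), so it is only practical on a small test subset, and it breaks ties by userID so that the result is deterministic. `top_k_similar` is a hypothetical name.

```python
from itertools import combinations


def top_k_similar(watched, k):
    """For every user, return the k other users with the highest similarity.

    `watched` maps userID -> set of movieIDs. Ties are ordered by userID.
    """
    scores = {user: [] for user in watched}
    for a, b in combinations(watched, 2):
        union = watched[a] | watched[b]
        # Similarity as in (**), with 0 for an empty union
        sim = len(watched[a] & watched[b]) / len(union) if union else 0.0
        scores[a].append((sim, b))
        scores[b].append((sim, a))
    # Highest similarity first, then smallest userID among ties
    return {
        user: [other for _, other in sorted(pairs, key=lambda p: (-p[0], p[1]))[:k]]
        for user, pairs in scores.items()
    }


if __name__ == "__main__":
    demo = {"u1": {"m1", "m2"}, "u2": {"m1", "m2"}, "u3": {"m3"}}
    print(top_k_similar(demo, 1))
```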
(a) [10 marks] For each pair of users in datasets [3] and [4], output the number of
movies they have both watched.
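One common way to count common movies without comparing every user pair is to group the records by movie and count each user pair that appears together in a group, similar in spirit to a self-join on movieID in Pig. A Python sketch of that idea (the helper `common_movie_counts` is hypothetical); pairs with no common movie are simply absent from the result:

```python
from collections import Counter, defaultdict
from itertools import combinations


def common_movie_counts(pairs):
    """Count movies watched by both users, for every user pair sharing at least one movie.

    `pairs` is an iterable of (userID, movieID) tuples.
    """
    viewers = defaultdict(set)  # movieID -> users who watched it
    for user_id, movie_id in pairs:
        viewers[movie_id].add(user_id)
    counts = Counter()
    for users in viewers.values():
        # sorted() gives each unordered pair one canonical (smaller, larger) key
        for a, b in combinations(sorted(users), 2):
            counts[(a, b)] += 1
    return counts


if __name__ == "__main__":
    data = [("1", "m1"), ("2", "m1"), ("1", "m2"), ("2", "m2"), ("3", "m2")]
    print(common_movie_counts(data).most_common(10))
```

`most_common(10)` then lists the 10 pairs with the most movies in common. A movie watched by n users contributes n(n - 1)/2 pairs, so very popular movies dominate the cost.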
For your homework submission, you need to submit i) the Pig script and ii) the
list of the 10 pairs of users having the largest number of movies watched by
both users in the pair within the corresponding dataset. The format of your
answer should be as follows: