AI's stumbling block in content distribution

2019-04-13 10:09:52

Since the Internet was commercialized, every platform, whether a news client, a video site, or an e-commerce site, has by default cast itself as an expert feeder: it pushes (feeds) content to users according to its own ideas.


These gatekeepers are trained professionals; in industry jargon, "website editors set the agenda for users and select content according to the tastes of the majority."


Later, the editors became too busy and enlisted machines to help. The simplest machine method was "popular recommendation," such as ranking content by clicks or other metrics.


The biggest problem with the feeder model is that the platform does not actually know what its customers' appetites are, which leads to two significant consequences. First, customers are left unsatisfied: users' personalized needs go unmet. Second, the platform wastes its own resources: a large number of long-tail items sit unexposed for long periods, driving up sunk costs.


Then someone discovered the advantage of machines: a machine can recommend content based on user characteristics. Just as a clever cook can tailor a meal to each diner's taste, a sufficiently smart machine can, to some extent, meet the personalized needs of every user. Isn't this the C2M of the content industry?


To be precise, this is the C2M of content distribution. It speaks to a single user and breaks away from the stereotype of mass communication. Is that enough to revolutionize every search engine and portal?


This intelligent content C2M has a deep historical background. Today you are standing on the edge of an era, watching AI technology light the fuse of IoT. Next, you will find that you cannot refuse to enter the coming information explosion: an explosion of information terminals, of information scale, of information platforms.


On the information superhighway, the cars you drive and the roads you travel have all changed their rules, and every knowledge framework you are familiar with that was built on the feeder model now faces disruption.


In this era, the feeder model has failed, and intelligent machines will become the biggest variable.


The first scenario is that humans produce content and machines distribute it.


The next scenario is that machines both produce and distribute content.


Is the content industry ready for the C2M revolution?


"Of course not. The machine is stupid." If you think so, it's a pity that you are doomed to not see the sun tomorrow.


"Of course." If you think so, congratulations on falling into the pit.


The real situation may be unexpected.


1、 The essence of content C2M is the move toward individualized communication


As an independent research direction, recommendation systems can be traced back to the collaborative filtering algorithms of the early 1990s. The middle period was represented by traditional machine learning algorithms, such as the latent factor models popularized by the Netflix Prize; today the representatives are more complex deep learning models.


In recent years, the rapid progress of deep learning has made machine recommendation the darling of the entire Internet. Driven by these new technologies, personalized communication has become more feasible and closer to true single-user communication.


(1) Collaborative filtering starts slowly


By the textbook definition, collaborative filtering recommends information of interest to you by taking advantage of the preferences of user groups who share similar interests or common experiences. The site then filters and analyzes your feedback (such as ratings) to help other users screen information.


Of course, users' preferences are not limited to the information they are particularly interested in; records of the information they are not interested in are also valuable. Collaborative filtering showed excellent results and began to dominate the Internet industry.
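To make the idea concrete, here is a minimal sketch of user-based collaborative filtering in Python; the toy rating matrix and the similarity-weighting scheme are assumptions chosen for illustration, not any particular site's algorithm.

```python
import numpy as np

# Toy rating matrix (rows = users, columns = items, 0 = not rated).
# The data here is invented purely for illustration.
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

def cosine_sim(a, b):
    """Cosine similarity between two rating vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return a @ b / denom if denom else 0.0

def recommend(user_idx, ratings, top_n=2):
    """Score unrated items for one user from the ratings of similar users."""
    sims = np.array([cosine_sim(ratings[user_idx], r) for r in ratings])
    sims[user_idx] = 0.0                        # ignore the user themself
    scores = sims @ ratings                     # similarity-weighted ratings
    scores[ratings[user_idx] > 0] = -np.inf     # drop items already rated
    return np.argsort(scores)[::-1][:top_n]

print(recommend(0, ratings))   # items that similar users liked
```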


At first, collaborative filtering was applied to mail filtering.


In 1992, Xerox scientists proposed the Tapestry system, the first design to apply collaborative filtering. It was built to solve the information overload problem at Xerox's research center in Palo Alto: staff received large numbers of emails every day but had no way to filter and classify them, so the center developed this experimental email system to help.


Then, collaborative filtering ideas began to be applied to content recommendation.


In 1994, the GroupLens project team at the University of Minnesota built a news filtering system that helped readers screen the news content they were interested in. After reading an article, a reader gave it a score, which the system recorded for future reference on the assumption that the reader would be interested in similar content later; readers who did not want to reveal their identity could also score anonymously. As the oldest content recommendation research group, GroupLens went on to create the movie recommendation system MovieLens in 1997, alongside similar systems such as the music recommender Ringo and Video Recommender.


Later, another milestone emerged: the e-commerce recommendation system.


In 1998, Amazon's Linden and his colleagues filed a patent for item-to-item collaborative filtering, a classic algorithm Amazon used in its early days that became widely popular.


Is collaborative filtering AI? Technically, it falls within the scope of AI. But it must be pointed out that collaborative filtering is a relatively weak algorithm: whether user-based or item-based, its recommendation quality often leaves something to be desired.


How can a systematic methodology guide the continuous optimization of a recommendation system? How can complex real-world factors be folded into the recommendation results? These questions gave engineers plenty of headaches, but a large enough reward always draws brave contenders, and eventually someone found a more flexible line of thinking.


(2) Traditional machine learning begins to accelerate


In 2006, Netflix announced the Netflix Prize. Netflix, a veteran online movie rental site, held the contest to push machine learning and data mining work on movie rating prediction. The sponsor put up serious money, promising $1 million to any individual or team that could improve the accuracy of Netflix's recommendation system, Cinematch, by 10%.


Netflix disclosed some striking numbers on its blog, for example:


We have billions of user rating data, which is growing by millions every day.


Our system logs millions of playback events every day, each carrying many features, such as playback duration, playback time point, and device type.


Our users add millions of videos to their playlists every day.


Obviously, in the face of such massive data, it is no longer possible to rely on classification standards drawn up by hand or by small systems to describe the preferences of an entire platform's users.


One year into the competition, Korbell's team won the first progress prize with an 8.43% improvement. They had spent more than 2,000 hours blending 107 algorithms. The two most effective were matrix factorization (usually called SVD, singular value decomposition) and the restricted Boltzmann machine (RBM).


As a complement to collaborative filtering, the core of matrix factorization is to decompose a very sparse user rating matrix R into two matrices, a user-factor matrix P and an item-factor matrix Q, and to fit these vectors to the known data in order to predict the unknown entries. The approach not only improves prediction accuracy but also allows extra modeling elements to be added, so that more diverse information can be folded in and large volumes of data put to better use.
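A minimal sketch of this decomposition in Python, assuming a toy 4x4 rating matrix, two latent factors, and plain stochastic gradient descent; it illustrates the idea of R ≈ P·Qᵀ rather than the prize-winning implementation.

```python
import numpy as np

# Sparse toy rating matrix R (0 means "unknown"); values are illustrative.
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 1, 5, 4],
], dtype=float)

n_users, n_items = R.shape
k = 2                                   # number of latent factors (assumed)
rng = np.random.default_rng(0)
P = rng.normal(scale=0.1, size=(n_users, k))   # user-factor matrix
Q = rng.normal(scale=0.1, size=(n_items, k))   # item-factor matrix

lr, reg = 0.01, 0.02                    # learning rate and regularization
for epoch in range(2000):
    for u, i in zip(*R.nonzero()):      # iterate over known ratings only
        err = R[u, i] - P[u] @ Q[i]
        P[u] += lr * (err * Q[i] - reg * P[u])
        Q[i] += lr * (err * P[u] - reg * Q[i])

print(np.round(P @ Q.T, 2))             # predicted ratings, known and unknown
```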


However, matrix factorization has its drawbacks. Like collaborative filtering, it is framed as supervised learning over known ratings, and in this plain form it is simple and best suited to smaller systems. The problem facing the Internet giants was that building a truly large recommendation system with collaborative filtering or matrix factorization takes too long. What then?


As a result, some engineers turned their eyes to unsupervised learning. The essence of clustering in unsupervised learning is to identify user groups and recommend the same content to every user in a group. When enough data is available, clustering works well as a first step that narrows the range of candidate neighbors in a collaborative filtering algorithm, as the sketch below illustrates.
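A hedged sketch of that two-step idea, assuming scikit-learn's KMeans is available and using invented user feature vectors: cluster the users first, then restrict the neighbor search to the target user's own cluster.

```python
import numpy as np
from sklearn.cluster import KMeans   # assumed dependency, for illustration only

# Toy user feature vectors (e.g. rating or behaviour profiles); data is invented.
rng = np.random.default_rng(0)
users = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(50, 4)),   # one behaviour group
    rng.normal(loc=3.0, scale=0.3, size=(50, 4)),   # another behaviour group
])

# Step 1: cluster users into coarse groups.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(users)

# Step 2: when looking for a user's collaborative-filtering neighbours,
# only search inside that user's own cluster instead of the whole user base.
target = 0
candidates = np.where(labels == labels[target])[0]
candidates = candidates[candidates != target]
print(f"neighbour search narrowed from {len(users) - 1} to {len(candidates)} users")
```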


The latent factor model builds on this cluster-analysis idea. Its major advantage is that it can not only predict scores but also model text content at the same time, which greatly improves content-based recommendation.


Traditional analysis methods are imprecise at both steps: labeling users, and mapping those labels to results. For example, the age a user fills in may not be true, and not every teenager likes comics. The core of the latent factor model is to go beyond these surface semantic tags and use machine learning to mine deeper latent associations in user behavior, so that recommendations become more accurate.


Under the banner of the Netflix Prize's million-dollar tournament, talent poured in from around the world, and by 2009 the contest had reached its peak as the most spectacular event in the recommendation systems field. It drew many professionals into recommendation research and carried the technology from the professional circle into the business world, sparking heated discussion and gradually winning over mainstream websites. Content-based recommendation, knowledge-based recommendation, hybrid recommendation, and trust-network-based recommendation all stepped onto a track of rapid development.


These recommendation engines differ from collaborative filtering. Content-based recommendation, for example, relies on the content attributes of an item rather than on users' ratings of it, and it leans more on machine learning to extract users' interest profiles from the feature descriptions of the content. Content filtering mainly draws on natural language processing, artificial intelligence, probability and statistics, machine learning, and related techniques.
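As an illustration of content-based filtering, here is a small sketch using TF-IDF term vectors and cosine similarity; scikit-learn is an assumed dependency and the article texts are invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer   # assumed dependency
from sklearn.metrics.pairwise import cosine_similarity

# Toy article texts; titles and contents are invented for illustration.
articles = [
    "ATM in Texas prints note asking for help",
    "Bank upgrades ATM network security in Texas",
    "New smartphone reading app launches personalized feed",
    "Deep learning model improves video recommendation",
]

# Represent each article by its TF-IDF term vector.
vectors = TfidfVectorizer().fit_transform(articles)

# Pretend the user just read article 0; rank the rest by content similarity.
sims = cosine_similarity(vectors[0], vectors).ravel()
ranked = sims.argsort()[::-1]
recommendations = [i for i in ranked if i != 0]
print("recommended article indices:", recommendations)
```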


Was the million dollars worth it? By Netflix's 2016 figures, 65 million registered members watch 100 million hours of video every day, and Netflix has said the recommendation system saves it $1 billion a year.


(3) Deep learning brings "driverless" recommendation


In recent years, users have run into a real pain point: the popularity of smartphones has turned the huge volume of information and the small reading screen into a contradiction that is hard to resolve. Reading no longer happens at a computer screen but in fragmented mobile moments. Search engines fall short, manual recommendation cannot keep up, and simple machine recommendation is not good enough. For large content platforms this shift is a life-and-death test: meet the need and live; fail and die.


Facing this problem, YouTube and Facebook proposed a new solution: use deep learning to build smarter machines. Deep learning has made a huge leap over the past decade and is especially well suited to very large data volumes.


If manual content recommendation is like a human driver, the content recommendation brought by deep learning is like a driverless car: user data is used to "perceive" user preferences. Such a recommendation system can roughly be divided into a data layer, a trigger (candidate) layer, a fusion and filtering layer, and a ranking layer. Once the data produced and stored in the data layer reaches the candidate layer, the core recommendation task is triggered.
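Purely as an illustration of that layered structure, the toy pipeline below stands in for the four layers; the function names and data are assumptions, not any platform's real code.

```python
# Toy stand-ins for the four layers described above.

def data_layer():
    """Collect and store raw user behaviour (here: a hard-coded toy profile)."""
    return {"watched": ["a", "b"], "device": "phone"}

def trigger_layer(profile, catalog):
    """Generate a rough candidate set: everything the user has not seen yet."""
    return [item for item in catalog if item not in profile["watched"]]

def fusion_filter_layer(candidates):
    """Merge candidate sources and drop items that fail business rules."""
    return [c for c in candidates if not c.startswith("blocked_")]

def ranking_layer(candidates):
    """Order the surviving candidates by a (toy) relevance score."""
    return sorted(candidates)

catalog = ["c", "a", "ee", "blocked_d", "b"]
profile = data_layer()
print(ranking_layer(fusion_filter_layer(trigger_layer(profile, catalog))))
# -> ['c', 'ee']
```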


Take YouTube as an example. Its published recommendation algorithm consists of two neural networks, one for candidate generation and one for ranking. First, taking the user's browsing history as input, the candidate generation network sharply reduces the number of recommendable videos, selecting a small set of the most relevant items from a huge library.


The generated candidates are those most relevant to the user, and the system then predicts the user's rating for each. The goal of this network is broad personalization via collaborative filtering. The ranking network's task is to analyze the candidates carefully and pick a small number of the best: using a designed objective function, it scores each video on the basis of its descriptive data and the user's behavior, and presents the highest-scoring videos to the user.
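A rough sketch of that two-stage flow, using random placeholder embeddings instead of trained networks; the numbers and function names are assumptions for illustration, not YouTube's actual system.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, dim = 1000, 16

# Pretend these item embeddings were learned by a neural network;
# here they are random placeholders purely for illustration.
item_emb = rng.normal(size=(n_items, dim))

def candidate_generation(watch_history, k=50):
    """Stage 1: shrink the full library to the k items closest to the
    user's averaged watch-history embedding (stand-in for the first network)."""
    user_vec = item_emb[watch_history].mean(axis=0)
    scores = item_emb @ user_vec
    scores[watch_history] = -np.inf          # don't re-recommend watched items
    return np.argsort(scores)[::-1][:k]

def ranking(candidates, watch_history, n=10):
    """Stage 2: re-score only the candidates with a richer (toy) objective,
    standing in for the second, ranking network."""
    user_vec = item_emb[watch_history].mean(axis=0)
    fine_scores = item_emb[candidates] @ user_vec \
        + rng.normal(scale=0.01, size=len(candidates))   # toy extra features
    return candidates[np.argsort(fine_scores)[::-1][:n]]

history = np.array([1, 7, 42])
cands = candidate_generation(history)
print(ranking(cands, history))               # final top-10 videos for this user
```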


In this mode, the machine takes over the platform completely. With continuous training, deep learning systems grow smarter, their ability to deal with people gradually improves, and in a sense they gradually take on the role of gatekeeper.


2、 Will the content industry be disrupted by C2M?


Strange things happen. On the 11th, an ATM at a bank in Corpus Christi, Texas unexpectedly spat out a note reading "Save me." The news quickly spread across Chinese networks and became a headline on many websites.


Do you really need to see the identical article on N different websites?


This redundant information drains your energy and your data plan. Just as you see the same instant-noodle advertisement no matter which TV channel you open, it is hard to quickly find what you want in a flood of information.


How can this information redundancy be resolved for users?


In the past there were many unsuccessful technical attempts: personal portals were short-lived, RSS subscriptions never became mainstream, and cross-site tracking was unavailable. Only C2M can lead the way forward.


The C2M model can be applied across the whole web, as Toutiao does today, or built on a giant platform, as Facebook does. Its core is to extract, sort, and deliver massive amounts of information to users according to their behavior, characteristics, and demands; that is the secret to overcoming the pain points.


There are, however, many doubts. Some argue that recommendations such as collaborative filtering easily trap users in information cocoons, cannot recognize reading scenarios, suffer from poor timeliness, and take too long to compute. Toutiao's model is also frequently criticized and must cope with challenges such as the difficulty of capturing user interests and the management of user data and privacy.


Supporters and skeptics each hold one end of the argument; who is right? There are two major opportunities ahead, but three mountains to cross first.


1. The supporting reasons are as follows:

① A thousand users, a thousand faces: content tailored to each person

A personalized recommendation mechanism can recommend information according to each user's preferences. Through various algorithms that analyze users' historical behavior and compare related users and related items, the system guesses what a user may like, builds candidate sets, and verifies them, so that users receive more accurate content. Information distribution becomes "a thousand faces for a thousand people," a precise connection between content and user, rather than the traditional "one face for everyone."

② Finding the needle in the sea: higher efficiency

Personalized recommendation spares users from digging through massive amounts of information themselves. The user no longer has to fish a needle out of the sea; to some extent the system removes useless information, narrows the scope of search, and improves reading efficiency.

③ Catering to users' tastes: stronger stickiness

Continuously recommending content that suits a user increases stickiness. Personalized recommendation uses algorithms to accurately surface the content a user is interested in and helps them find it quickly.

