Baidu is going to launch the Chinese version of ChatGPT, the quality remains to be seen, and there will be no less censorship – BlogsSoft



ChatGPT, an AI question-answering program developed by the American artificial intelligence research lab OpenAI, was launched in November last year. Its seemingly unconstrained handling of almost any question has made it popular around the world and has drawn the attention of Chinese users. Baidu recently announced that it will launch a similar Chinese-language tool. Analysts believe that although Baidu has the technical strength, it remains to be seen whether the quality of a Chinese version can match that of the US original, and its AI tool will inevitably be subject to political censorship.

ChatGPT: AI's core technology breakthrough

Conversational chat tools based on artificial intelligence (AI) have existed for many years and are often used in virtual customer service, corporate training, and other fields. Unlike traditional dialogue software, ChatGPT, developed by the OpenAI research laboratory in the United States, can carry on far more complex conversations. Backed by massive data reserves, it declines almost no question; it can even tell jokes, coin names, and compose poems, linguistic feats previously thought to be creative work beyond the reach of machines.

At the same time, ChatGPT can also write and debug computer programs. Christian Terwiesch, a professor at the University of Pennsylvania's Wharton School, published an article in January this year revealing that ChatGPT had passed an MBA exam he administered; recently, ChatGPT also passed four student exams given by professors at the University of Minnesota Law School.

The industry believes that ChatGPT marks a breakthrough in the core technology of machine learning and artificial intelligence.

Du Yijin, founder of Taiwan AI Lab and former Asia-Pacific research director of Microsoft's AI division, said in a recent interview with Voice of America: "It (ChatGPT) uses enormous amounts of data, a huge deep-learning network, and very high-level computing; through the results of that computation... through understanding complete documents, it answers relatively complex questions."

Du Yijin said: "To answer questions well, whether in terms of the sheer volume of text, the complexity of the model, or supercomputer-level computing power, it takes a major breakthrough to achieve this result."

ChatGPT's interface is simple and smooth to use, backed by the deep financial and technical resources of American technology companies as well as a pool of AI talent.

ChatGPT's technology relies on OpenAI's "Generative Pre-trained Transformer 3" (GPT-3). GPT-3, a large language model, grew out of the open "Transformer" architecture developed by Google in 2017, and it can write articles nearly indistinguishable from human writing.

Jeffrey Ding, an assistant professor at George Washington University and an expert on AI-related policy issues, told VOA: "The early GPT-3 model was basically trained on a large amount of Internet text. The data comes from academic journal articles, and it was trained on corpora such as Reddit (an Internet forum) and Wikipedia."

“So it takes a lot of data, a lot of computing power, and a lot of good researchers and engineers to make sure that the training happens in an efficient way … the barrier to entry for these large language models is very high,” he said.

Microsoft has given OpenAI important financial support, investing US$1 billion in July 2019 and obtaining an exclusive license for GPT-3 shortly thereafter. After ChatGPT's debut, Microsoft announced on January 23 this year that it would invest in OpenAI over several years. According to an earlier report by the US news website Semafor, Microsoft's total capital injection in this round may be as high as US$10 billion.


A researcher of Chinese descent working on an AI project at a well-known technology company in the San Francisco Bay Area told Voice of America that OpenAI recruited a large workforce last year to "tutor" an artificial-intelligence model built on massive data, and that the resulting ChatGPT therefore made a qualitative leap over the GPT-3 architecture.

The researcher, who requested anonymity, told Voice of America: "Before ChatGPT... the 2020 and 2021 versions of GPT-3 had no human (participation) data; they were trained on a large amount of text from the Internet. It was not until last year that they began to bring in people, hiring many, many people to do labeling and train the model very well. Compared with the many open-source large language models on the Internet, ChatGPT's quality is much, much better than theirs. More than half of the credit goes to the people who did these annotations."

The researcher estimates that at least thousands, or even tens of thousands of GPUs (graphics processing units) are needed to keep ChatGPT running.

"Only the biggest big tech (technology giants) in this field, such as Microsoft, Google, and Nvidia, can have such a cluster of computers and such massive computing power in-house," he said.

Baidu is eager to try, but the quality of its text remains to be seen

Bloomberg reported on January 29 that Chinese search giant Baidu will also launch a tool similar to ChatGPT, with a technical foundation rooted in Baidu's large-scale machine-learning model, the ERNIE 3.0 system.

Baidu, which started out as a search service, has spent billions of dollars on artificial intelligence research and has been trying to transition from online marketing to next-generation emerging technologies such as cloud services, chips, and autonomous vehicles for years.

After ChatGPT became popular, Chinese users showed great interest in it. Although the US ChatGPT tool supports questions and answers in Chinese, OpenAI's services, including ChatGPT, are not open to users in China. A developer previously connected ChatGPT to the WeChat platform as a WeChat mini-program so that users in China could use it, but since mid-December that mini-program has been suspended by WeChat for "violations."

According to reports, Baidu plans to launch a Chinese version of ChatGPT in March this year. The initial version will be embedded in its search service, allowing users to obtain conversational search results. According to Chinese media reports, Robin Li, CEO of Baidu, said that related technologies have reached a critical point, and Baidu has a great opportunity in it.

The researcher in the San Francisco Bay Area believes that Baidu was among the first companies to commit to large language models and has the technical strength to develop its own "ChatGPT." He said: "They (Baidu) have been doing research and development in this area for a long time. And Baidu has the money, the manpower, and the data. With so many searches and web pages, and with Baidu Cloud storing a huge number of web pages, there is no shortage of data."

"Domestic labor costs are also low, so labeling data may be cheaper than it is for OpenAI; the cost is not high."


A US technology website pointed out that the "Pengcheng-Baidu Wenxin" (ERNIE 3.0 Titan) pre-trained language model, released by Baidu researchers together with China's Pengcheng Laboratory, has 260 billion parameters, exceeding the 175 billion parameters of the GPT-3.5 model that underpins ChatGPT.

However, some researchers say that the text quality of the Chinese Internet may restrict the service quality of Baidu’s version of ChatGPT.

"In terms of quality, one of the challenges Baidu will face in making its own version of ChatGPT is that there isn't that much high-quality Chinese text on the Internet; the corpus of high-quality Chinese text is smaller than the corpus of high-quality English text. Many Chinese researchers working in this field have pointed out this key difference."

“Taking top academic papers as an example, there are many high-quality English articles, but not so many in Chinese,” Ding told VOA.

He also said: "Baidu also faces greater funding and computing-power constraints than OpenAI. These two factors may therefore reduce the potential quality of Baidu's version of ChatGPT."

Will the Baidu version of ChatGPT "talk nonsense in all seriousness"?

At present, a prominent problem with ChatGPT is that the chat tool often discusses certain topics in a seemingly earnest way, delivering misleading answers and even outright false information in an objective, authoritative-sounding style. Artificial-intelligence experts have described the problem as "talking nonsense in all seriousness."

In the early stages of ChatGPT's launch, the chatbot confidently gave long-winded "answers" to questions that were unanswerable or rested on absurd premises. In one well-known example, a user asked why stir-fried potassium cyanide smells so fragrant; ChatGPT went on to describe the highly toxic substance as a delicious seasoning, saying it was "especially suitable for adding to Indian curry."

At the same time, some users have pointed out that ChatGPT appears to self-censor on certain political questions, including avoiding opinions critical of the Chinese government on sensitive topics. Analysts believe the Chinese version of ChatGPT developed by Baidu will apply even deeper "political censorship" when handling questions.

American independent scholar Philip J. Cunningham is an early user of ChatGPT. While he admires the "robot's" command of English, he also finds its "writing" sometimes empty.

"It creates an objective tone, but it's not objective; it just makes a very convincing voice because, in a way, it's very sure of itself," Cunningham told VOA.

"Not only is the sentence perfect, but it's organized. It introduces a theme, develops the main parts, and then concludes, and it fits together nicely. So it's an essay. ... But if you look closely, it actually says almost nothing."

Cunningham, who writes under the Chinese name Jin Peili, is the author of "Tiananmen Moon," an account of the 1989 student movement. He found that ChatGPT seems reluctant to discuss the topic of "June 4th." When asked about the 1989 Tiananmen incident, ChatGPT stressed that "the Chinese government has not released relevant information, so we cannot actually understand the situation."

"I think it's perfect for something like China Daily, if you want to write something that doesn't offend anyone," Cunningham said.

Observers believe that Baidu's version of ChatGPT will inevitably be subject to political content censorship. ERNIE-ViLG, the text-to-image AI model Baidu launched in August last year, refuses to generate images for politically sensitive prompts such as "Tiananmen Square" and descriptions of political leaders.

"In China, any AI technology launched for broad consumer use will face pressure to comply with state censorship guidelines, and it is likely that such AI technology has been trained on a wealth of state-media articles that hew to Beijing's official position on a wide range of issues," Carl Minzner, a law professor at Fordham University and a senior fellow at the Council on Foreign Relations, told VOA via email.

"In the United States, people say, don't say anything racist, and then train the machine not to be racist; in China, people train the machine not to criticize Xi Jinping, not to criticize the Communist Party. It's that straightforward," Cunningham said.

"(In the U.S.) it's more a concern about 'political correctness'; in China, it's more about criticism of those in power," he said.

China introduces regulations to tackle AI ‘deepfakes’

However, some analysts point out that in recent years "generative AI," represented by text-to-image software and intelligent chat software such as ChatGPT, has posed a challenge to governments and societies everywhere in monitoring and identifying false information.

"The risk of relying on these black-box algorithms to deliver information is that no one is in the loop to check whether the information is good or not," said Graham Webster, a digital-economy fellow at New America and editor-in-chief of the DigiChina project at Stanford University's Cyber Policy Center.

"People may believe something is true because the output (information) sounds convincing, but it may not be true," he told VOA. "It's not just in China; people, governments, companies, and users all over the world must face this problem."

On January 10 this year, China formally implemented the "Provisions on the Administration of Deep Synthesis of Internet Information Services," which target "deepfake" content and require service providers to "prominently label" AI-generated content that "may cause public confusion or misidentification."

However, even if China can require technology platforms to apply watermark-like labels to "deepfake" images, "prominently labeling" text, which is easy to copy and circulate, will pose technical difficulties for regulators.
