Skip to main content

2023-04 Reading Paper

Forcing myself to read AI-related papers for 30 minutes every day, not aiming to finish reading all the new ones, but to stay focused. To make myself stick to it, I plan to post the links to the papers I've read every day, along with some of my thoughts. I also hope to get feedback from everyone, such as pointing out my misunderstandings. Learn in public ๐Ÿ˜„

I can't guarantee continuous updates, and my time is limited. I usually only read the conclusions carefully, skim the research process, and bookmark those that need careful reading for later. So the quality of my sharing may not meet expectations.

Please follow cautiously ๐Ÿคฃ

The main tools I use are:

  • BriefGPT.xyz (I recommend everyone use this)
  • Zotero (for saving papers and taking notes)
  • Alarm clock (a physical one, to open the first tool and set a 30-minute reminder)

2023-04-24โ€‹

  • GPT4Tools: Teaching LLM to Use Tools via Self-instruction (not yet on Arxiv): Among many Microsoft and Google papers, I finally saw a paper from Tencent AI Lab. In short, this tool allows users to interact with images through textual dialogues, such as changing backgrounds and removing subjects from photos. What I think is worth noting is that it can replace LLMs as needed, and uses LoRA for fine-tuning.
  • Improving Patient Pre-screening for Clinical Trials:Assisting Physicians with Large Language Models: The paper says they are the first team to introduce LLMs to help clinicians screen eligibility for clinical trials. Doctors were able to simplify 90% of the screening using LLMs tools.
  • Low-code LLM: Visual Programming over LLMs: This paper introduces a new human-LLM interaction framework - Low-code LLM. Through visual interaction with a graphical user interface, users can incorporate their ideas into the workflow without writing trivial prompts. I think this is very suitable for the recent Auto or Agent GPT.
  • What is it like to program with artificial intelligence?๏ผš: Because I've been using Github Copilot lately, I don't feel anything about the content in this paper. The part that was valuable to me was learning that Microsoft has a demo called GridBook that allows users to write Excel formulas using natural language.

2023-04-25โ€‹

  • Scaling Transformer to 1M tokens and beyond with RMT: This paper was quite popular on Twitter. The biggest problem with large language models at present is the token limit. This paper mentions that recurrent memory techniques can be used to expand BERT's role in natural language processing, increasing the effective context length up to two million tokens while maintaining high accuracy. Simply converted, it should be close to 1.5 million English words. Looking at it in detail, large-scale engineering applications still need to wait. The resource consumption still looks quite large at present.
  • Renate: A Library for Real-World Continual Learning: Continual learning refers to the ability of machine learning models to be applied to dynamic data streams and adaptively update models as new data arrives without having to retrain from scratch. Unlike traditional machine learning approaches, continual learning needs to take into account the non-stationarity and uncertainty of data streams during model training to accommodate various changes that may occur in different scenarios. The Renate library mentioned in this paper is a PyTorch continual learning library that can be used in production environments.

2023-04-26โ€‹

I still think the two papers from yesterday (especially the first one) are awesome.

  • AudioGPT: Understanding and Generating Speech,Music, Sound, and Talking Head: This paper proposes AudioGPT, a multi-modal AI system that combines foundation models to handle complex audio information and solve many understanding and generation tasks, as well as support conversational input/output interfaces for speech (ASR, TTS), and demonstrates AudioGPT's capabilities on speech, music, sound, and conversational understanding and generation tasks through a series of experiments.
  • Answering Questions by Meta-Reasoning over Multiple Chains of Thought: Modern systems for multi-hop question answering (QA) typically break questions into a sequence of reasoning steps, termed chain-of-thought (CoT), before arriving at a final answer. Often, multiple chains are sampled and aggregated through a voting mechanism over the final answers, but the intermediate steps themselves are discarded. While such approaches improve performance, they do not consider the relations between intermediate steps across chains and do not provide a unified explanation for the predicted answer. The Multi-Chain Reasoning (MCR) approach prompts large language models to meta-reason over multiple reasoning chains, rather than aggregating their answers. MCR examines different reasoning chains, mixes information between them and selects the most relevant facts in generating an explanation and predicting the answer.

2023-04-27โ€‹

  • Towards Explainable and Safe Conversational Agents for Mental Health: A Survey: If you are interested in the application of AI in the field of mental health, you might want to take a look at this paper. Based on 297 studies on mental health, the authors of this paper systematically surveyed about 80 intelligent technologies.
  • Evaluation of GPT-3.5 and GPT-4 for supporting real-world information needs in healthcare delivery: This paper introduces the application of two large language models in healthcare. The results show that the answers of GPT-3.5 and GPT-4 pose no obvious harm or risk to patients, but only less than 20% of the answers are consistent with the known answers from the informatics consultation service (I don't quite understand this service, my understanding should still be similar to online medical consultation services?), and their answers contain hallucinated references (however, doctors have disagreements on whether these AI-hallucinated answers pose harm to humans, some doctors think they are harmful). However, the paper believes that fine-tuning the data and using more advanced prompt engineering can improve their usefulness.
  • HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face: In this paper, the authors propose a HuggingGPT system. My understanding is that it is a bit like ChatGPT's plugin system. Users provide text to communicate with LLMs, and LLMs call the right APIs at the right time. And the system it calls is not API, but models on Hugging Face, so it can read images with model A and generate images with model B.

2023-04-28โ€‹

  • Industrial Engineering with Large Language Models: A case study of ChatGPTโ€™s performance on Oil & Gas problems: This is a case study, and the main conclusion is that LLMs may be useful in industrial engineering, especially in oil and gas engineering. This paper demonstrates the potential application of LLMs in solving problems in various areas of oil and gas engineering through examples of rock physics. These areas generally include full waveform inversion (FWI), acoustic reflectometry, hydrodynamic pressure pulse reflection and downhole intervention.
  • Multi-Party Chat: Conversational Agents in Group Settings with Humans and Models: At present, many LLMs Chat products are in single chat mode, for example, you let AI play a role, and then it chats with you based on that role. This paper studies the case of multiple roles, letting AI play multiple roles. It should be very interesting if chat products could have multiple roles in a conversation. By the way, the first author of this paper is also called Jimmy ๐Ÿคฃ