
Will AI Summary replace Manual Summary?

AI has been a hot topic recently, and some friends have commented that, with the emergence of AI, our kind of manual summarization will eventually become obsolete. Others have asked whether summarizing wouldn't simply be faster with AI.

So I wanted to write a bonus article to share my thoughts on the following questions:

  1. How do current AI summarization tools work? What are their limitations?
  2. What are the disadvantages of AI summarization?
  3. Will AI summarization replace manual summarization?

Popularization of Science

Before comparing the two, I want to cover some background on ChatGPT. With this background, it is easier to compare AI summarization and manual summarization.

  • Token: This refers to the basic unit that OpenAI uses to process text, which can be word fragments or characters. For example, "hamburger" is divided into three tokens "ham", "bur" and "ger", while "pear" is one token. 1 token is approximately equal to 4 characters or 0.75 English words.

  • Some limitations:

    1. OpenAI models have a fixed token limit. For example, GPT-3's Davinci model can handle up to 2049 tokens, about 1500 English words. The latest Turbo model handles about 4096 tokens, about 3000 English words.
    2. This limit has one more detail: the token count includes both the input and the output text. In other words, it's not that I can input 3000 English words and OpenAI can return 3000 English words; rather, the total of input + output cannot exceed 3000 English words.
    3. Text limit. GPT-3 models currently only handle text.
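To make the token budget concrete, here is a minimal sketch that uses only the rules of thumb quoted above (1 token ≈ 4 characters ≈ 0.75 English words) and the 4096-token figure. The function names are my own, and a real tokenizer will give somewhat different counts, so treat the numbers as rough estimates.

```python
import math

# Rules of thumb from the text, not exact tokenizer behavior.
CHARS_PER_TOKEN = 4
WORDS_PER_TOKEN = 0.75


def estimate_tokens_from_chars(text: str) -> int:
    """Rough token estimate from character count (1 token ~ 4 characters)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_tokens_from_words(word_count: int) -> int:
    """Rough token estimate from an English word count (1 token ~ 0.75 words)."""
    return math.ceil(word_count / WORDS_PER_TOKEN)


def remaining_output_tokens(input_words: int, token_limit: int = 4096) -> int:
    """Tokens left for the reply, since input and output share one limit."""
    return max(0, token_limit - estimate_tokens_from_words(input_words))


print(remaining_output_tokens(3000))  # almost nothing left for the answer
print(remaining_output_tokens(2500))  # a few hundred tokens left
```

Under these assumptions, a 3000-word input leaves almost no room for a reply, which is why the input has to be shortened or split.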

With these limitations in mind, here is how the video/podcast summarization products on the market work. In fact, you can simulate these plugins directly in ChatGPT.

First, because of limitation #3, the audio of the video/podcast needs to be converted to text. For YouTube, many products will directly use the transcript.

The second step is to pass the transcript and a prompt (usually "please summarize the following content") to OpenAI together. What is passed to OpenAI looks something like the text below. You can also try it in ChatGPT by replacing "Transcript" with a real YouTube transcript:

Please summarize the following sentences.
Text: """
Transcript
"""

Finally, OpenAI will return the summary result.
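As a small sketch of the second step, the prompt layout shown above can be assembled like this. `build_summary_prompt` is a name I made up for illustration; sending the prompt to a model is left out, since each product does that in its own way.

```python
def build_summary_prompt(transcript: str) -> str:
    """Wrap a transcript in the instruction + triple-quote layout shown above."""
    return (
        "Please summarize the following sentences.\n"
        'Text: """\n'
        f"{transcript.strip()}\n"
        '"""'
    )


# Placeholder transcript, for illustration only.
print(build_summary_prompt("Hello and welcome to the show..."))
```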

However, some videos and podcasts can be very long, with tens of thousands of words per episode. Because of the token limit (limitations #1 and #2), it is impossible to pass the full transcript of an episode to OpenAI, so different products handle this differently. The most common method currently is "split summarization": divide the transcript into multiple 5-minute segments, pass each segment to OpenAI to summarize, then pass these segment summaries back to OpenAI to be summarized again into the final summary.
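Here is a rough sketch of split summarization. It is not how any particular product implements it. I am assuming segments are cut by word count (about 750 words, roughly 5 minutes of speech at 150 words per minute) rather than by timestamps, and `summarize` is a placeholder for whatever model call you use.

```python
from typing import Callable, List


def split_into_chunks(transcript: str, words_per_chunk: int = 750) -> List[str]:
    """Split a transcript into consecutive chunks of at most `words_per_chunk` words."""
    words = transcript.split()
    return [
        " ".join(words[i:i + words_per_chunk])
        for i in range(0, len(words), words_per_chunk)
    ]


def layered_summary(
    transcript: str,
    summarize: Callable[[str], str],
    words_per_chunk: int = 750,
) -> str:
    """Summarize each chunk, then summarize the joined chunk summaries."""
    chunks = split_into_chunks(transcript, words_per_chunk)
    partial = [summarize(chunk) for chunk in chunks]
    if len(partial) <= 1:
        # Zero or one chunk: no second pass is needed.
        return partial[0] if partial else ""
    return summarize("\n".join(partial))


def fake_summarize(text: str) -> str:
    """Stand-in for a real model call (assumption): keep the first 5 words."""
    return " ".join(text.split()[:5])


demo_transcript = " ".join(f"w{i}" for i in range(2000))
print(len(split_into_chunks(demo_transcript)))
print(layered_summary(demo_transcript, fake_summarize))
```

Note that this sketch does not check whether the joined segment summaries themselves still fit within the token limit; a very long episode might need a third pass.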

Current disadvantages of AI summarization

Once you understand how these summarization applications work, their disadvantages are also relatively easy to understand:

  • Disadvantage 1: Content relies on transcripts.

Content without a transcript basically cannot be summarized. Examples are short videos or vlogs, many of which are just visuals without speech.

Of course, no one probably needs short video summaries 😂

  • Disadvantage 2: Content quality relies on transcript quality.

If the transcript quality is poor, the summary generated by the AI will be very strange. For example, if the transcript contains lyrics from background music, the AI will summarize those lyrics, and readers will find the result very odd.

  • Disadvantage 3: Token limitations lead to missing content.

People can speak about 125-150 English words or 180-200 Chinese characters per minute. Given people's video watching habits, video content is usually not longer than 15 minutes - which works out to around 2200 English words, or 3000 Chinese characters. However, most videos don't have continuous speech, so the word count is lower than this.
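The word counts above follow from simple multiplication. This small sketch reproduces them from the speaking rates quoted in the paragraph; the helper name is my own.

```python
def words_spoken(minutes: float, rate_per_minute: float) -> int:
    """Words (or characters) spoken at a constant rate with no pauses."""
    return round(minutes * rate_per_minute)


# 15 minutes at the speaking rates quoted above.
english_range = (words_spoken(15, 125), words_spoken(15, 150))
chinese_range = (words_spoken(15, 180), words_spoken(15, 200))

print(english_range)  # English words, low and high estimate
print(chinese_range)  # Chinese characters, low and high estimate
```

The upper ends (about 2250 English words, 3000 Chinese characters) assume nonstop speech, so real videos come in lower.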

So a common way for video AI summary tools to handle the token limit is to pass in, say, only 2500 English words and ask the AI to return a 500-word summary. This is a crude approach (but still workable!). For long videos, however, it misses the later sections, and if there's a twist or reversal in the second half, the summary could be completely wrong.
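The truncation approach can be sketched as follows, assuming the cut is made by word count. `truncate_to_words` is a hypothetical helper, not code from any of the tools mentioned here. Returning the number of dropped words makes it visible how much of a long video never reaches the model.

```python
from typing import Tuple


def truncate_to_words(transcript: str, max_words: int = 2500) -> Tuple[str, int]:
    """Keep only the first `max_words` words; also report how many were dropped."""
    words = transcript.split()
    kept = " ".join(words[:max_words])
    dropped = max(0, len(words) - max_words)
    return kept, dropped


# A made-up 10,000-word transcript, for illustration only.
long_transcript = " ".join(["word"] * 10_000)
kept, dropped = truncate_to_words(long_transcript)
print(len(kept.split()), dropped)
```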

For example, the summary Glarity generates for Knowledge Project #141 is:

In this video, Kunal Shah discusses some of the elements of business success he learned from his family's business. Many of them came from a merchant community, which was more willing to take risks, had lower shame, better understood value, spotted new trends, and helped others in their community succeed. These traits made them more likely to succeed in business.

Compare this to my summary - it seems like only the first 15 minutes of the video were input into the prompt here. (BTW, I'm not saying this design can't work - Glarity is one of my most used AI plugins currently, and it supports custom prompts. Just thought it could be even better!)

  • Disadvantage 4: Layered summarization leads to missing content.

Of course, not everyone does it this way. Splitting the summarization can partially resolve the missing content issue, but the split duration is very important. If a segment is too long, such as 15 minutes (about 3000 English words), the input alone uses up the token limit and the AI can't return a summary.

In addition, this layered summarization can also lead to missing information. It is like cutting a photo into blocks, pixelating each block, combining the pixelated blocks into one photo again, and then pixelating it once more - you lose content and it becomes less clear.

  • Disadvantage 5: AI does not know what is important.

The first four disadvantages may be resolved technically in the future (for example, by OpenAI relaxing the limit to 8K tokens). For the last one, however, I have not yet thought of a solution. Let me give a real example.

Still using Knowledge Project #141 as an example: Summarize.tech divides the video into 5-minute segments, and then summarizes the generated content a second time. Let's look at this result:

00:15:00 The author discusses how he has learned that many concepts in western society are not applicable to Asian societies, such as the value of time. He also discusses how Hinduism is not as scalable as other religions because it is not standardized.

It mentions "such as the value of time". When I was listening to this part, I felt it was very insightful, because here Kunal talks about why many tool products can't make money in Asia, and he explains that in many Asian countries the idea that time has value has never been taught.

But the AI summary above actually omits a lot of that content, and it is not compelling enough: I could easily miss it when reading quickly. Let's look at another example:

00:35:00 In India, less than six percent of urbanIndian women have financial income of their own, and 94 of them are currently taking care of kids or taking care of the family and not contributing to the labor force. Another interesting thing is 95 of all financial products in India are bought by men. Credit cards, car loans, and home loans are all by men, while investments are only by men. India has now nearly two thousand dollars per capita income yearly, but if you remove the top 30 million families or 30 million individuals, the per capita income would drop to maybe 600. This is why many western markets love to come to India, because its per capita income is never going to beat and grow like China's because before China started becoming affluent, 96 of Chinese urban women were working because of the one child policy which forced it to become a general neutral society. However, in India, female participation of labor is going down. The per capita income is not going to grow and therefore a lot of foreign companies love to come to India because India is the "dau farm of the world." All the big internet giants, like Facebook, Twitter, and YouTube, will say "I have 500 million billion users in India, but look at the arpu and peel the ar

First, this summary probably hit the token limit, so the output was cut off at the end. Comparing this summary with the previous one, you probably have the same question I do: why is this one longer and more detailed?

I also tried writing some prompts myself, and even ChatGPT is not very consistent in this area (in other words, if you ask the same question repeatedly, it will give you different answers), so I don't know what criteria it uses for summarization right now.

But I have also tested out some interesting things that I will share with everyone later.

Will AI summarization replace manual summarization?

My thinking is:

  1. For content that is worth summarizing, it still can't do a good job for now.
  2. For content it can summarize well, the summary doesn't seem to be of much value.

From my testing so far, AI is best at summarizing tech product reviews, especially unboxing reviews like those done by Zhongwenze. The summaries are accurate and very complete. But would I only read the text version of these reviews?

I wouldn't.

So I'm also very curious what the retention is like for these AI summarization tools.

Instead, I feel that recombining these results into a new product could be more interesting.

For example, summarize all the videos reviewing the iPhone 14 across the web, and then do some statistics on those summary results - I could know how various influencers reviewed the iPhone 14, who gave it praise and for what aspects, who criticized it and what they criticized.

Current AI products still can't break away from one pattern: text-based, direct interaction with the AI. Why not try using AI results to build products instead? In the past, creating something like What to Buy would have been difficult and required a lot of manpower, but wouldn't it be simpler now?

As for the content it currently can't handle well, I may use it as an aid, but that raises two concerns:

  1. I don't know if its summarization is comprehensive.
  2. Passive learning becomes active learning. This is somewhat related to the first concern. When I simply listen or read the transcript, I mostly learn passively and can quickly judge whether something is worth recording. Once the AI summarizes it for me, I have to actively think about what it summarized and whether the original is worth listening to in detail.

So far in my testing, written transcripts have provided the most assistance to me. AI summaries come second - they help me more with identifying key points rather than comprehending the full content.

However, I believe better solutions will emerge in the future, such as support for more media types and less restrictive token limits.

One More Thing

Earlier I mentioned that during my prompt testing, I discovered some interesting things.

Here's what happened:

I said earlier that AI Summary doesn't know what the key points are. But if your prompt contains some examples, it can optimize based on the examples you provide - essentially showing it what the highlights should be.

To put it simply, you can do something like this:

In India, less than six percent of urbanIndian women have financial income of their own, and 94 of them are currently taking care of kids or taking care of the family and not contributing to the labor force. Another interesting thing is 95 of all financial products in India are bought by men. Credit cards, car loans, and home loans are all by men, while investments are only by men. India has now nearly two thousand dollars per capita income yearly, but if you remove the top 30 million families or 30 million individuals, the per capita income would drop to maybe 600.

Highlight: less than six percent of urbanIndian women have financial income of their own.

Then, for the next passage, you can ask like this, and ChatGPT will return the highlights as it understands them:

This is why many western markets love to come to India, because its per capita income is never going to beat and grow like China's because before China started becoming affluent, 96 of Chinese urban women were working because of the one child policy which forced it to become a general neutral society. However, in India, female participation of labor is going down. The per capita income is not going to grow and therefore a lot of foreign companies love to come to India because India is the "dau farm of the world."

Highlight:
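The example-then-question layout above can be assembled programmatically. This is only a sketch of the prompt formatting: `build_highlight_prompt` is a made-up name, the passages in the demo are placeholders, and how well the model follows the examples will vary.

```python
from typing import List, Tuple


def build_highlight_prompt(examples: List[Tuple[str, str]], new_passage: str) -> str:
    """Lay out (passage, highlight) examples, then the new passage with an empty slot."""
    parts = []
    for passage, highlight in examples:
        parts.append(f"{passage.strip()}\n\nHighlight: {highlight.strip()}")
    # The trailing "Highlight:" is left open for the model to complete.
    parts.append(f"{new_passage.strip()}\n\nHighlight:")
    return "\n\n".join(parts)


prompt = build_highlight_prompt(
    [("First passage of the transcript.", "The sentence I highlighted myself.")],
    "Next passage of the transcript.",
)
print(prompt)
```

Adding more of your own (passage, highlight) pairs to `examples` gives the model more signal about what you consider important, at the cost of more input tokens.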

It is possible to build a personalized AI recommendation or summary system tuned to an individual in the future. But you would need to provide a significant amount of initial training data to the model.

For example, if a person highlights the introductory paragraphs of an article, the AI can learn to automatically highlight similar content that might be highlighted in the rest of the article. Once enough training data has been collected, the AI can automatically highlight new articles.