From ELIZA to ChatGPT

    I asked ChatGPT to write an editorial about the use of ChatGPT in research. It enthused about itself, but I am more sceptical. The technology is immature, and its use needs to be discussed and regulated.

    Photo: Einar Nilsen

    New large language models such as ChatGPT are built with deep learning and trained on vast volumes of freely accessible digital text. From this material they produce synthetic text in response to open-ended questions, and the result is almost indistinguishable from human-written text.

    What implications will such technology have for research? In order to get good answers to difficult questions, it is often wise to go directly to the source. I therefore logged into ChatGPT and set it the following task: 'Write an editorial for the Journal of the Norwegian Medical Association about the use of ChatGPT in research.' In less than a minute, it had written an article in impeccable Norwegian, which, with a bit of good grace, could have passed as a slightly boring editorial here in the Journal.

    In the editorial, ChatGPT is rather enthusiastic about its own role in research. It 'can be of great help in the research and development of new treatment methods', it writes, and can 'be used to generate hypotheses and ideas for new research projects'. By its own estimation, it has also contributed to 'several successful projects that have led to new discoveries and better treatment methods'.

    In clinical medicine, the potential application of such technology is almost infinite. It can generate medical notes based solely on the conversation between patient and doctor and then recommend supplementary investigations (1). It can answer medical questions, propose solutions to clinical problems and create simple patient summaries of complex medical information – to mention just a few of the many potential applications.

    In the world of research, many journals and universities are chiefly concerned that chatbots should be neither formal nor de facto authors of academic articles. This is primarily because a chatbot is not a legal entity and thus cannot meet the authorship requirement of taking personal responsibility for the content of the article (2). Since the technology uses existing texts to generate new text, without citing sources, the result may constitute both intellectual and verbatim plagiarism that goes undetected. Chatbots also have a tendency to 'hallucinate', i.e. make outright false statements (1). For example, I asked ChatGPT to tell me about 'well-known Norwegian researchers in oral cancer'. It immediately listed 'Professor Jon Sudbø' as the leading light in Norway in the field, based on the claim that he 'has published numerous articles in recognised journals'.

    With the risk of both factual errors and plagiarism in the texts, it is paradoxical that both modern automated plagiarism detectors and experienced human peer reviewers are so bad at spotting chatbot-generated academic texts (3). Better software is already being developed for this purpose. Meanwhile, the chatbots are getting better and better at imitating text written by humans. This arms race will continue, and it is one of the reasons why the use of chatbots in academic writing needs to be regulated (4).

    Machine learning and artificial intelligence have long been used in the research process itself, for example in data analysis, but their application can be significantly expanded with the new, advanced language models. A chatbot can perform literature searches, write summaries of previous articles in the field, weigh up findings, create figures and other illustrations, and generate ideas and hypotheses for further research, among much else (5). Some of this is uncontroversial, but ChatGPT can produce outright factual errors in even simple summaries of academic manuscripts (6). The chatbots' impressive capabilities can also easily seduce the reader into trusting the text more than there is reason to – a phenomenon known as automation bias (7).

    One of the very first chatbots, the computer program ELIZA, was developed in the 1960s. ELIZA could hold a written conversation with a human by reflecting the other party's own keywords back in the form of questions (8). The program was named after the fictional character Eliza Doolittle in the play Pygmalion – which in turn took its inspiration from the legendary sculptor in Greek mythology who carved a statue so lifelike that he fell in love with it and had the gods bring it to life (9). Many hoped that ELIZA could be used in the treatment of mental disorders. That never happened.

    There is no doubt that ChatGPT and similar chatbots hold major potential for clinical medicine and research in the years to come. However, there is still a long way to go before Pygmalion's statue is fully brought to life. ChatGPT's pitfalls and frequent errors mean that its use still needs to be developed, discussed and regulated. For instance, out of sheer vanity, I asked ChatGPT who Are Brean is. He is 'a recognised and experienced plastic surgeon' who 'works at Aleris Plastic Surgery in Oslo' and 'is an authority in his field', came the reply. Not even the last claim is correct.
