GPT-ScoreAID for correcting and grading descriptive responses

1 Introduction

Generative AI, represented by ChatGPT, is being utilized in various aspects of society, and our perception of generative AI is significantly changing. The initial image of it being somewhat “suspicious” has faded, and the recent trend is to consider how to utilize this new tool in our work.

In the field of education, teachers with a progressive spirit continue to experiment boldly with its application. Moreover, even among teachers who have not shown much interest so far, the perception that “it might be useful…” is gradually spreading, and it seems that generative AI is steadily penetrating the education sector.

In the comprehensive study time where students summarize what they have investigated, generative AI can suggest adding a certain perspective to the drafts of reports written by students, or it can help to refine expressions by saying, “Wouldn’t it be easier to understand if you wrote it this way?” It can also suggest ways of presenting ideas, such as “There is this material about the idea of XX that you are discussing. Let’s add this to clarify the basis of your argument.” I feel that the application of generative AI holds a variety of possibilities in education.

2 The Use of Generative AI in Japanese Schools…

The perception that “Perhaps it might be useful…” is increasingly becoming widespread among teachers, but the reality is that it has not yet reached the point of changing actual work in real educational settings and tasks in the staff room.

As A. Toffler states in his book “Revolutionary Wealth” (published in Japan as “The Future of Wealth”), the world of education is very slow to change. In Japan, efforts to digitize staff-room tasks and classes are progressing very slowly, except in a few schools. The situation is similar for generative AI: there is no rule that it must be used, so at present it is used only by teachers who like such things or are proficient in them. (Looking back, it hasn’t changed much from half a year ago ^^;)

However, over the past few months, I have tried using Chat-GPT, a generative AI, in various school tasks and I strongly felt, “Education is no longer unrelated to generative AI, and its use will undoubtedly become commonplace in the world of education. This will bring about significant changes in educational activities such as classes and teachers’ work.”

In the next section, I will provide some examples to illustrate why I have come to feel this way, and I would like to discuss the development of a new software called GPT_ScoreAid, which can quickly grade and correct the descriptive responses given by students. I aim to create an environment where the integration and utilization of generative AI in the field of education becomes much easier to implement.

3 Examples of the Use of Generative AI in Several Scenarios in Japanese Schools

(1) Use in the Compilation of Descriptive Surveys
The moment I strongly felt the effectiveness of generative AI in education was when I was involved in analyzing a survey. The traditional method involves the teacher in charge reading each sentence written in the survey, then classifying these sentences into several categories and summarizing the results.

In the survey I was involved in this time, which asked “What do you think about students staying out overnight?”, there were three choices: in favor, against, and undecided, and parents submitted many written opinions on the matter. From these written opinions, I carefully removed any confidential or personal information and decided to try using Chat-GPT to classify the responses. (Note: In this inquiry, the information entered was processed using GPT-ScoreAID, which sends it through an API so that the input data is not used to train the AI.)

When I asked AI to condense nearly 200 opinions written by those in favor into various categories, it categorized the opinions from parents into four major arguments: “1. It aids in children’s growth and social development, 2. It’s acceptable if conducted under parental approval and supervision, 3. One should respect children’s freedom, 4. It can be accepted if there’s a relationship of trust established.”

The opinions of those who were against and those who were undecided could also be summarized in the same way in a few minutes, so through this trial, I was able to strongly feel the potential of generative AI in school tasks.

If schools can use such methods, the cost of reading and summarizing sentences will be significantly reduced. Furthermore, by starting from the points grouped into several categories, we will be able to focus more on the essence of problem-solving, such as contemplating what direction to take for solutions, or devising countermeasures.

(2) Analyzing the Growth of a New Teacher

Working in education, I sometimes find myself mentoring new teachers. When I do, I advise them on their growth over a slightly longer span of time, such as two to three months or a year. However, I sometimes find myself moved by the efforts of these new teachers, which can lead to somewhat lenient guidance or judgment.

Therefore, I decided to use AI to analyze the growth and changes of these new teachers in a more objective manner. Last year, I was involved in a job where I visited schools with new teachers once a week to observe their classes and various activities. I recorded these observations every week. The documents accumulated from April to July during the first semester amounted to about 72,000 characters in Japanese text files. From this, I removed personal and confidential information, and used AI to analyze the growth of the young teachers.

The prompt was as follows:

  • The following text is a record of weekly observations of a newly hired teacher’s classes and activities. Please read this record and summarize in about four points what you think this teacher has gained as a teacher from April to July. Also, please advise in about four points what should be noted and learned in the training from August onwards. (The observation records from April to July are then inputted as a text file)

Chat-GPT: The skills this teacher gained from April to July

  • Improvement in class progression: The teacher has been trying various methods to smoothly conduct the class and draw out the children’s interest and understanding. For example, using an overhead projector and incorporating pair activities in the class.
  • Communication with children: The teacher has improved their ability to build relationships with children by actively speaking to them during classes and class management, listening to their stories, and paying attention to their opinions and worries.
  • Safety management and response to trouble: The teacher has learned and practiced how to respond appropriately when trouble occurs among the children during class management, and how to ensure the safety of the children.
  • Improvement in the use of teaching materials: The teacher has flexibly incorporated the use of textbooks and other teaching materials in the class, contributing to increasing the children’s interest in learning.

Chat-GPT: What should be noted and learned in future training

  • Improving class management skills: Effective class management techniques, including how to conduct classes when it’s noisy or when attention needs to be gathered.
  • Diversification of evaluation methods: Learn various evaluation methods to accurately grasp the children’s learning progress and understanding.
  • Understanding of special education: Knowledge about appropriate understanding and support measures for children who need special support.
  • Education using ICT: Aim for more attractive and understandable classes by learning effective ways to conduct classes using ICT tools.

The answers I received matched my own impressions. In particular, the points about “improvement in class progression” and “use of teaching materials” were conclusions drawn by the AI from the weekly observation records I had made. By pointing out more specific examples of what I had intuitively felt to be the new teacher’s “improvement in teaching,” it helped to increase the objectivity of my judgment.

The year-end review amounted to a massive 190,000 characters in Japanese text. I thought I’d take advantage of this and try to analyze the annual growth. However, with such a large amount of information, the software I used for analysis, GPT_ScoreAid, timed out and couldn’t provide an answer. The latest model of Chat-GPT can apparently handle this amount of data without any problem, but the amount of data exchanged would be enormous. Consequently, the number of tokens, the unit used to calculate the usage fee, would also increase, so I decided to refrain from making any further inquiries.

From this initiative, I realized that various records written in text that schools usually keep can also be analyzed if necessary. Also, in using Chat-GPT, I realized that it’s important in actual operation to balance the amount of text exchanged and the value of the information obtained from it.
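One way to keep the amount of text per inquiry in check is to split a long record into smaller pieces before sending it. The following Python sketch illustrates the idea; it is not how GPT_ScoreAid works internally. The function names, the character limit, and the `tokens_per_char` ratio are all assumptions for illustration, and real token counts depend on the tokenizer of the model being used.

```python
import math


def estimate_tokens(text: str, tokens_per_char: float = 1.0) -> int:
    """Very rough token estimate.

    tokens_per_char is a placeholder assumption; measure it for your
    own model and language before relying on the result.
    """
    return math.ceil(len(text) * tokens_per_char)


def split_into_chunks(text: str, max_chars: int) -> list[str]:
    """Split text into chunks of at most max_chars characters.

    Chunks end at line boundaries where possible, so that a weekly
    observation entry is not cut in the middle of a line.
    """
    chunks: list[str] = []
    current = ""
    for line in text.splitlines(keepends=True):
        # A single line longer than the limit is cut into fixed-size pieces.
        while len(line) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:max_chars])
            line = line[max_chars:]
        # Start a new chunk when adding this line would exceed the limit.
        if len(current) + len(line) > max_chars:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks
```

For a record of about 190,000 characters, `split_into_chunks(record, 20000)` would give pieces of at most 20,000 characters each. Each piece could then be summarized in its own inquiry and the summaries combined in a final one, at the cost of extra inquiries.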

(3) Evaluating Reports Written as a Summary of Learning at the End of a Unit

Mr. Naotaka Gunji, who works in the Inquiry Department at Toin Gakuen Junior and Senior High School in Kanagawa Prefecture, is a teacher with whom I previously conducted research at Fuzoku Hakodate Junior High School. When I consulted Mr. Gunji about materials for verifying whether generative AI can really be utilized in class, he provided me with reports from a task he had previously assigned in class. The task was to comprehensively discuss Japan as a country, based on various perspectives learned in geography.

Task

  • From a geographical perspective, how would you describe Japan as a country? Please discuss based on your learning about “topography”, “climate”, “population”, “industry”, “transportation and communication”, “natural disasters”, “resources and energy”.

The student reports provided by Mr. Gunji were excellent pieces exceeding 1700 words in Japanese, and I was curious to see if Chat-GPT could really provide analysis and advice on such high-quality reports. So, I immediately started creating prompts to verify this.

Prompt:

  • “I gave a problem to the second-year junior high school students, ‘From a geographical perspective, how would you describe Japan as a country? Please discuss based on your learning about “topography”, “climate”, “population”, “industry”, “transportation and communication”, “natural disasters”, “resources and energy”.’ The model answer to this problem is as follows. (Note: Here, I will have the AI read the entire 1300-word model answer written by Mr. Gunji) After that, please read each student’s answer in order, and compare the student’s answer with the model answer, and give two points each for the excellent points and the points that need improvement in the student’s answer.”
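A prompt like the one above can be assembled from its three parts (the task, the model answer, and one student’s answer) so that only the student’s answer changes from one inquiry to the next. The Python sketch below is an illustration under that assumption; the function name `build_grading_prompt` and the exact wording of the template are hypothetical and are not taken from GPT_ScoreAid.

```python
def build_grading_prompt(task: str, model_answer: str, student_answer: str) -> str:
    """Combine the fixed parts of the prompt with one student's answer.

    The task and the model answer stay the same for every student;
    only student_answer is replaced between inquiries.
    """
    return (
        "I gave the following problem to second-year junior high school students:\n"
        f"{task}\n\n"
        "The model answer to this problem is as follows.\n"
        f"{model_answer}\n\n"
        "Please read the student's answer below, compare it with the model answer, "
        "and give two excellent points and two points that need improvement.\n\n"
        f"Student's answer:\n{student_answer}"
    )
```

Keeping the template in one place makes it easier to revise the instruction wording without touching the task text or the model answer.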

After some trial and error, I arrived at this prompt and used GPT_ScoreAid with an API key to query Chat-GPT about each student’s report in turn, and obtained Chat-GPT’s analysis. Let’s take a look at the advice for two students below.

Answer from GPT to Student 1:

  • I will list the excellent points and the points that need improvement by comparing this student’s answer with the model answer.

Excellent points

  • Detailed explanation about the climate: The student’s answer specifically explains that Japan has a warm and humid climate, and the characteristic climates of each region (such as the climate of Hokkaido and the southwestern islands). This shows that the student understands the diversity of Japan’s climate.
  • Specific examples of measures against natural disasters: The student has added a specific explanation about the emergency earthquake warning system and hazard maps, showing that they understand the measures and mechanisms Japan has in place against natural disasters and can accurately convey them.

Points that need improvement

  • Logical structure: The answer as a whole is not organized according to clear categories (topography, climate, population, etc.) as in the model answer. The information is diverse, and the views on “resources and energy” and “industry” are scattered in particular. It is necessary to organize the information by topic and clarify its relevance.

This student’s answer shows a deep understanding of Japan’s geographical features and problems, but there is room for improvement in the structure of the argument and the use of data. By using a more organized structure and specific facts and data, the student can further enhance the reader’s understanding.

Answer from GPT to Student 2:

I will evaluate this student’s answer by comparing it with the model answer as well.

Excellent points

  • Deep explanation about geography and topography: The student shows a deep understanding of how Japan’s topography was formed by plate movement and became the current island country, and the existence of the Fossa Magna and the Japanese Alps. In addition, the student specifically explains the characteristics of alluvial fans and basins, and how they affect agriculture, reflecting concrete knowledge about geography.
  • Insight into the transition of energy and industry: The student provides a detailed explanation of Japan’s low energy self-sufficiency rate and the transition of power generation methods according to the times (transition from hydropower to thermal and nuclear power). The student also clearly points out how Japan’s industrial structure has changed (transition from primary to tertiary industries), which is an important point common to the model answer.

Points that need improvement

  • Organization of information: The student’s answer contains a lot of important information, but it gives the impression that the information is somewhat scattered. For example, topics such as climate, topography, population, and industry are not clearly divided as in the model answer, and the overall flow is somewhat difficult to follow. By dividing each topic into paragraphs and providing headings for each, the student can clarify the structure of the information and deepen the reader’s understanding.
  • Detail of climate classification: The student divides Japan’s climate into three major climate classifications, but the explanation of the finer differences in climate, such as the climate on the Sea of Japan side with a lot of precipitation in winter and the climate on the Pacific side, is lacking. These classifications are important for understanding Japan’s climate, and understanding

On January 19, 2024, an article titled “I had various generative AIs solve the Common Test” was published on the web. The article was written by someone named LifePrompt. It showed that Chat-GPT can score above the average of university entrance exam candidates in subjects such as Japanese language and social studies (world history, Japanese history). Seeing Chat-GPT read the student reports provided by Mr. Gunji and return advice, I found LifePrompt’s point convincing.

(4) Furthermore…

In addition to this, in Japan, students’ achievements in all subjects are evaluated from three perspectives: “knowledge and skills”, “thinking, judgment, and expression”, and “attitude towards proactive learning”. However, it is difficult to grasp the achievement status from the perspective of “attitude towards proactive learning”. Therefore, in many subjects, worksheets are used to record and reflect on one’s learning history. It seems possible to use GPT-ScoreAID to extract descriptions where the following t1 and t2 are expressed from the descriptions written by the students on these sheets.

  • t1: Descriptions that can be seen as the student’s persistent efforts to acquire knowledge and skills, and to develop thinking, judgment, and expression abilities.
  • t2: Descriptions that can be seen as the student’s various efforts to adjust their own learning.

In Japan, all sixth-grade elementary school students and third-grade junior high school students take a national academic achievement survey in April, which consists of two subjects: Japanese language and arithmetic/mathematics. This test is very important, and it is used to analyze how the academic ability of a school’s students compares with that of schools nationwide, and what issues they may have. However, because the official grading results are not available until around July, schools in Japan grade the tests themselves immediately after the April exam to see how well their students were able to solve them.

Especially in the Japanese language test, there are many descriptive questions, and the teachers at each school use the grading standards provided by MEXT (Ministry of Education, Culture, Sports, Science and Technology) to read and grade the students’ written answers. Grading by the school’s own teachers is very labor-intensive. To alleviate this burden, we believe it may be possible to automate the grading of students’ descriptive responses with GPT-ScoreAid by reflecting the MEXT grading standards in the prompt.

4 GPT_ScoreAid for Efficiently Correcting and Grading Descriptive Responses Written by Students

Actually, the usage examples of Chat-GPT in the previous section were not carried out through the web-based chat that many people use, but with a new software I developed called GPT_ScoreAid. It was created based on code written by Wagner Landgraf in Delphi, a Pascal-based language. This program lets you query student responses stored in Excel one after another, using the same prompt, with an API key provided by OpenAI.
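The idea of applying one prompt to many stored responses can be sketched in Python as follows. This is not the Delphi code of GPT_ScoreAid. It assumes that the Excel sheet has been exported to CSV with a column named `answer`, and the `ask` parameter is a hypothetical stand-in for whatever function actually sends a query to the API and returns the reply.

```python
import csv
import io
from typing import Callable


def grade_rows(
    csv_text: str,
    prompt: str,
    ask: Callable[[str], str],
    answer_column: str = "answer",
) -> list[dict[str, str]]:
    """Apply the same prompt to every row and attach the reply.

    csv_text      : contents of a CSV export of the sheet (assumed layout)
    prompt        : the fixed instruction shared by all inquiries
    ask           : hypothetical function that sends one query and
                    returns the model's reply as text
    answer_column : name of the column holding the student's response
    """
    results: list[dict[str, str]] = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        query = f"{prompt}\n\nStudent's answer:\n{row[answer_column]}"
        # Keep the original columns and add the reply next to them.
        results.append({**row, "feedback": ask(query)})
    return results
```

Passing `ask` in as a parameter means the loop can first be tried with a dummy function that returns fixed text, without spending any tokens.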

The most crucial point when using Chat-GPT in education lies in creating a “prompt”, a directive that instructs how to handle children’s responses and from what perspective to grade them. To correctly create this directive, it is necessary to repeatedly test the collected children’s responses and sentences, confirm their validity, and make necessary modifications to the prompt.

Therefore, GPT_ScoreAid has a feature that allows you to easily perform tasks such as repeatedly testing and remaking prompts (interpretation instructions) for sentences written by children.
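The repeated testing described above amounts to running several candidate prompts against the same small set of collected responses and reading the results side by side. A minimal Python sketch of that loop follows; the names `compare_prompts` and `ask` are hypothetical, and no particular API is assumed.

```python
from typing import Callable


def compare_prompts(
    prompts: dict[str, str],
    samples: list[str],
    ask: Callable[[str], str],
) -> dict[str, list[str]]:
    """Run every prompt version against every sample response.

    prompts : maps a version label (e.g. "v1") to the prompt text
    samples : a few collected student responses used for testing
    ask     : hypothetical function that sends one query and
              returns the model's reply as text
    Returns the replies grouped by prompt version, in sample order.
    """
    outcome: dict[str, list[str]] = {}
    for name, prompt in prompts.items():
        outcome[name] = [ask(f"{prompt}\n\n{sample}") for sample in samples]
    return outcome
```

Because each version is run on identical samples, any difference in the replies can be attributed to the change in the prompt wording rather than to a change in the responses being graded.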

GPT_ScoreAid is a software with new features incorporated, so I think there may be parts that cannot be fully conveyed even if I explain in detail here. Therefore, to make it easier to imagine, we have created a video explaining how to use it. If you are interested, please follow the link in this article to watch the video and deepen your understanding of the usage scenarios in schools that GPT-ScoreAid is aiming for.

As for the verification of the effectiveness and application scenarios of GPT_ScoreAid, we hope to collect as much information as possible in the future and convey it through this blog.

5 Conclusion

During the discussion with Mr. Gunji described in section 3, I became aware of a concern that “With the emergence of such tools, some teachers might start grading students’ responses using the tool and return them without checking them themselves.” Generative AI is indeed an innovative tool that clearly distinguishes itself from the tools used in schools so far. However, if used incorrectly, its power could work against us.

In the endeavor of “education” that nurtures children, there are important elements that cannot be fulfilled by knowledge and cognition alone, as Kuniyoshi Obara advocated with the term “holistic education”. Gandhi also pointed out “Knowledge without Character” in his “Seven Social Sins”. It can be said that the essential form of education is to nurture and pass on important elements that reach the realm of humanity and personality, such as moral views and sensitivity to beauty, along with cognitive elements, when people teach and nurture others.

I believe that even if generative AI is utilized in various fields of education in the future, the involvement and activities of teachers as human beings will always remain at the core. I hope that the various activities and tasks in schools, which will be updated with the help of generative AI, will evolve into a form that allows us to devote time and effort to the truly essential parts of education.

6 Addendum: From the Perspective of Work Style Reform

At the end of this essay, I would like to add some thoughts on the use of GPT_ScoreAid from the perspective of work style reform for busy vice principals in Japan. Various surveys have already shown that the busiest person in a Japanese school is the vice principal, and one of the tasks that keeps them busy is checking the comments on report cards. Mistakes on report cards can greatly damage a school’s reputation, so it is common in Japanese schools to check them repeatedly and with great care.

When it comes time to issue report cards, vice principals and head teachers scrutinize the comments on the report cards, as well as the written evaluations for comprehensive learning time and moral education, for any mistakes.

I believe that GPT_ScoreAid can also be effective in these areas. One of the tasks that GPT_ScoreAid excels at is taking draft comments arranged in a column in Excel and correcting awkward expressions and mistakes such as typos and omissions. Especially in schools with a large number of students, it has the potential to dramatically reduce the workload. Schools that are struggling with such tasks should definitely give GPT_ScoreAid a try.

Former Principal of Hakodate Kita Junior High School, Toshiyuki Okuzaki.

Initial settings of GPT-scoreAID
Utilization for grading descriptive problems integrated with Excel

※If you are interested in using GPT-ScoreAid ver1.0, please contact us through the following form, clearly stating your school name and contact information. The form can be found here.
