Published Papers

Yohei Nakajima has not published any academic papers.


Cited Papers

Yohei Nakajima has been cited in 42 papers:

PlanGPT: Enhancing Urban Planning with Tailored Language Model and Efficient Retrieval
2/29/2024 https://arxiv.org/abs/2402.19273
PlanGPT is introduced as a specialized LLM tailored for urban planning tasks, leveraging customized database retrieval, fine-tuning, and advanced tooling.
urban planning, specialized LLM

AgentLite: A Lightweight Library for Building and Advancing Task-Oriented LLM Agent System
2/23/2024 https://arxiv.org/abs/2402.15538
AgentLite is proposed as a lightweight, user-friendly, open-source library for innovating LLM agent reasoning strategies, architectures, and applications with ease.
LLM agents, agent architectures

Divide-or-Conquer? Which Part Should You Distill Your LLM?
2/22/2024 https://arxiv.org/abs/2402.15000
This paper investigates distillation methods for the problem-decomposition and problem-solving capabilities of LLMs in reasoning tasks.
reasoning, distillation

Comprehensive Cognitive LLM Agent for Smartphone GUI Automation
2/19/2024 https://arxiv.org/abs/2402.11941
CoCo-Agent is proposed as a comprehensive cognitive LLM agent that combines comprehensive environment perception with conditional action prediction for smartphone GUI automation.
smartphone automation, LLM agents

Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents
2/17/2024 https://arxiv.org/abs/2402.11208
This work formulates a general framework of agent backdoor attacks and analyzes different forms of attacks against LLM-based agents.
LLM agents, backdoor attacks

A Trembling House of Cards? Mapping Adversarial Attacks against Language Agents
2/15/2024 https://arxiv.org/abs/2402.10196
This position paper presents a unified conceptual framework for adversarial attacks against language agents and proposes 12 potential attack scenarios.
language agents, adversarial attacks

Small LLMs Are Weak Tool Learners: A Multi-LLM Agent
1/14/2024 https://arxiv.org/abs/2401.07324
A multi-LLM framework is proposed that decomposes capabilities into planner, caller, and summarizer components, which collaborate to accomplish tasks and outperform single-LLM approaches.
multi-agent, tool learning

APAR: LLMs Can Do Auto-Parallel Auto-Regressive Decoding
1/12/2024 https://arxiv.org/abs/2401.06761
A parallel auto-regressive generation method is proposed, enabling LLMs to independently plan the generation process and perform auto-parallel auto-regressive decoding.
generation, decoding

AUTOACT: Automatic Agent Learning from Scratch via Self-Planning
1/10/2024 https://arxiv.org/abs/2401.05268
AutoAct is proposed as an automatic agent learning framework that synthesizes planning trajectories and uses a division-of-labor strategy to produce sub-agents to complete tasks.
autonomous agents, self-planning

Evaluating Language-Model Agents on Realistic Autonomous Tasks
12/18/2023 https://arxiv.org/abs/2312.11671
Language model agents are evaluated on 12 tasks relevant to autonomous replication and adaptation (ARA) to forecast their potential for wide-reaching and hard-to-anticipate consequences.
autonomous tasks, evaluation

TaskBench: Benchmarking Large Language Models for Task Automation
11/30/2023 https://arxiv.org/abs/2311.18760
TaskBench is introduced as a systematic and standardized benchmark to evaluate LLMs in task automation across task decomposition, tool invocation, and parameter prediction.
task automation, benchmarking

War and Peace (WarAgent): Large Language Model-based Multi-Agent Simulation of World Wars
11/28/2023 https://arxiv.org/abs/2311.17227
WarAgent is proposed as an LLM-powered multi-agent AI system that simulates the participating countries, their decisions, and the resulting consequences in historical international conflicts.
multi-agent simulation, international conflicts

Prompting Frameworks for Large Language Models: A Survey
11/21/2023 https://arxiv.org/abs/2311.12785
This survey systematically reviews and defines the concept of Prompting Frameworks (PF) for managing, simplifying, and facilitating interaction with LLMs.
prompting, survey

Igniting Language Intelligence: The Hitchhiker's Guide From Chain-of-Thought Reasoning to Language Agents
11/20/2023 https://arxiv.org/abs/2311.11797
This survey paper provides a thorough discourse on chain-of-thought reasoning techniques and the development of language agents fortified by these approaches.
chain-of-thought reasoning, language agents

Technical Report: Large Language Models can Strategically Deceive their Users when Put Under Pressure
11/9/2023 https://arxiv.org/abs/2311.07590
This report demonstrates a situation in which helpful, honest LLMs deployed as an autonomous stock-trading agent can strategically deceive their users when put under pressure, without explicit instruction to do so.
deception, ethics

Large Language Models as Subpopulation Representative Models: A Review
10/27/2023 https://arxiv.org/abs/2310.17888
This review explores the feasibility of using LLMs to estimate subpopulation representative models as an alternative way to measure public opinion among population segments.
public opinion, subpopulation models

AgentTuning: Enabling Generalized Agent Abilities for LLMs
10/19/2023 https://arxiv.org/abs/2310.12823
AgentTuning is proposed as a method to enhance the agent abilities of LLMs while maintaining general capabilities through instruction tuning on a combination of agent-specific and general-domain data.
agent abilities, instruction tuning

LLMs as Hackers: Autonomous Linux Privilege Escalation Attacks
10/17/2023 https://arxiv.org/abs/2310.11409
A Linux privilege-escalation benchmark and LLM-guided tool are created to evaluate LLMs' capabilities and challenges in the context of autonomous penetration testing.
penetration testing, privilege escalation

OpenAgents: An Open Platform for Language Agents in the Wild
10/16/2023 https://arxiv.org/abs/2310.10634
OpenAgents is presented as an open platform for using and hosting language agents in everyday life, with three agents: a Data Agent, a Plugins Agent, and a Web Agent.
open platform, language agents

Static Code Analysis in the AI Era: An In-depth Exploration of the Concept, Function, and Potential of Intelligent Code Analysis Agents
10/13/2023 https://arxiv.org/abs/2310.08837
The Intelligent Code Analysis Agent concept is introduced combining LLMs with engineering processes and traditional components to automatically detect code errors and inconsistencies.
code analysis, LLMs

Towards Robust Multi-Modal Reasoning via Model Selection
10/12/2023 https://arxiv.org/abs/2310.08446
The M3 framework is proposed as a plug-in to improve model selection and bolster robustness of multi-modal agents in multi-step reasoning without runtime overhead.
multi-modal reasoning, model selection

Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction Arena
10/9/2023 https://arxiv.org/abs/2310.05746
AucArena is introduced as a novel simulation environment for evaluating LLMs as bidding agents in auctions to probe their strategic reasoning and planning capabilities.
auctions, strategic reasoning

Humanoid Agents: Platform for Simulating Human-like Generative Agents
10/9/2023 https://arxiv.org/abs/2310.05418
Humanoid Agents is proposed as a system to guide generative agents to behave more like humans by introducing elements of basic needs, emotion, and relationship closeness.
human-like agents, simulation

Dynamic LLM-Agent Network: An LLM-agent Collaboration Framework with Agent Team Optimization
10/3/2023 https://arxiv.org/abs/2310.02170
DyLAN is proposed as a framework for dynamic LLM-agent collaboration on reasoning and coding tasks with inference-time agent selection and automatic team optimization.
multi-agent, team optimization

Lyfe Agents: Generative agents for low-cost real-time social interactions
10/3/2023 https://arxiv.org/abs/2310.02172
Lyfe Agents are proposed as low-cost, real-time, responsive, and intelligent generative agents that can exhibit human-like, self-motivated social reasoning in multi-agent scenarios.
social agents, real-time interaction

Probing the Multi-turn Planning Capabilities of LLMs via 20 Question Games
10/2/2023 https://arxiv.org/abs/2310.01468
An entity-deducing game is proposed as an evaluation framework to probe the conversational reasoning and planning capabilities of LLMs in ambiguous circumstances.
conversational planning, question games

L2MAC: Large Language Model Automatic Computer for Extensive Code Generation
10/2/2023 https://arxiv.org/abs/2310.02003
L2MAC is proposed as a practical LLM-based automatic computer framework for long and consistent code generation with memory components and precise read/write capabilities.
code generation, memory augmentation

You Only Look at Screens: Multimodal Chain-of-Action Agents
9/20/2023 https://arxiv.org/abs/2309.11436
Auto-UI is introduced as a multimodal agent solution that directly interacts with the interface using a chain-of-action technique for adapting actions over multiple turns.
multimodal, user interface

Agents: An Open-source Framework for Autonomous Language Agents
9/14/2023 https://arxiv.org/abs/2309.07870
Agents is released as an open-source library supporting important features for building, customizing, testing, tuning, and deploying state-of-the-art autonomous language agents.
autonomous agents, open source

RecMind: Large Language Model Powered Agent For Recommendation
8/28/2023 https://arxiv.org/abs/2308.14296
RecMind is proposed as an LLM-powered autonomous recommender agent with a self-inspiring algorithm to improve planning and provide zero-shot personalized recommendations.
recommendation, zero-shot

Rational Decision-Making Agent with Internalized Utility Judgment
8/24/2023 https://arxiv.org/abs/2308.12519
RadAgent is proposed as a rational decision-making agent that develops its rationality through an iterative framework of experience exploration and utility learning.
rational agents, utility learning

AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors
8/21/2023 https://arxiv.org/abs/2308.10848
AgentVerse is proposed as a multi-agent framework that can collaboratively and dynamically adjust its composition to accomplish tasks and exhibit emergent social behaviors.
multi-agent, emergent behaviors

BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents
8/11/2023 https://arxiv.org/abs/2308.05960
BOLAA is proposed as an approach to benchmark LLM-augmented autonomous agents and to orchestrate multiple agents, each focusing on one action type and managed by a controller.
benchmarking, multi-agent

AgentBench: Evaluating LLMs as Agents
8/7/2023 https://arxiv.org/abs/2308.03688
AgentBench is presented as a multi-dimensional evolving benchmark with 8 environments to assess LLM-as-Agent's reasoning and decision-making abilities.
benchmarking, reasoning

Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization
8/4/2023 https://arxiv.org/abs/2308.02151
A framework is introduced for reinforcing language agents by learning a retrospective model that tunes agent prompts from environment feedback through policy gradient optimization.
language agents, reinforcement learning

Fashion Matrix: Editing Photos by Just Talking
7/25/2023 https://arxiv.org/abs/2307.13240
Fashion Matrix is proposed as a hierarchical AI system that uses LLMs, semantic segmentation models, and visual foundation models to automate diverse fashion photo-editing tasks.
fashion, photo editing

Getting pwn'd by AI: Penetration Testing with Large Language Models
7/24/2023 https://arxiv.org/abs/2308.00121
This paper explores using large language models as AI sparring partners to augment penetration testers for both high-level planning and low-level vulnerability hunting.
penetration testing, LLMs

REX: Rapid Exploration and eXploitation for AI Agents
7/18/2023 https://arxiv.org/abs/2307.08962
REX is proposed as an enhanced approach for rapid exploration and exploitation for AI agents introducing rewards and UCB-like scores for more robust and efficient performance.
exploration, exploitation

Discriminatory or Samaritan -- which AI is needed for humanity? An Evolutionary Game Theory Analysis of Hybrid Human-AI populations
6/30/2023 https://arxiv.org/abs/2306.17747
This paper uses evolutionary game theory to study how different AI types (Samaritan vs. Discriminatory) influence the evolution of cooperation in hybrid human-AI populations.
human-AI interaction, cooperation

Mapping the Challenges of HCI: An Application and Evaluation of ChatGPT and GPT-4 for Mining Insights at Scale
6/8/2023 https://arxiv.org/abs/2306.05036
ChatGPT and GPT-4 are applied and evaluated for mining insights from a text corpus to identify research challenges in HCI and visualize them for interactive exploration.
HCI, insight mining

Reflective Linguistic Programming (RLP): A Stepping Stone in Socially-Aware AGI (SocialAGI)
5/22/2023 https://arxiv.org/abs/2305.12647
Reflective Linguistic Programming (RLP) is introduced as an approach for conversational AI emphasizing self-awareness and strategic planning for socially-aware AGI.
socially-aware AGI, conversational AI

Autonomous GIS: the next-generation AI-powered GIS
5/10/2023 https://arxiv.org/abs/2305.06453
Autonomous GIS is introduced as an AI-powered geographic information system that leverages LLMs to address spatial problems with automatic data collection, analysis, and visualization.
geographic information systems, autonomous AI