
Hatena Bookmark


arXiv.org e-Print archive


Computational Life: How Well-formed, Self-replicating Programs Emerge from Simple Interaction

3 users

The fields of Origin of Life and Artificial Life both question what life is and how it emerges from a distinct set of "pre-life" dynamics. One common feature of most substrates where life emerges is a marked shift in dynamics when self-replication appears. While there are some hypotheses regarding how self-replicators arose in nature, we know very little about the general dynamics, computational p…

arxiv.org ● Learning ● 2024/07/17 18:19 ● read later

Mixture-of-Agents Enhances Large Language Model Capabilities

4 users

Recent advances in large language models (LLMs) demonstrate substantial capabilities in natural language understanding and generation tasks. With the growing number of LLMs, how to harness the collective expertise of multiple LLMs is an exciting open direction. Toward this goal, we propose a new approach that leverages the collective strengths of multiple LLMs through a Mixture-of-Agents (MoA) met…

arxiv.org ● Technology ● 2024/06/18 08:44 ● research ● read later

Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B

4 users

This paper introduces the MCT Self-Refine (MCTSr) algorithm, an innovative integration of Large Language Models (LLMs) with Monte Carlo Tree Search (MCTS), designed to enhance performance in complex mathematical reasoning tasks. Addressing the challenges of accuracy and reliability in LLMs, particularly in strategic and mathematical reasoning, MCTSr leverages systematic exploration and heuristic s…

arxiv.org ● Technology ● 2024/06/13 12:36

Scalable MatMul-free Language Modeling

4 users

Matrix multiplication (MatMul) typically dominates the overall computational cost of large language models (LLMs). This cost only grows as LLMs scale to larger embedding dimensions and context lengths. In this work, we show that MatMul operations can be completely eliminated from LLMs while maintaining strong performance at billion-parameter scales. Our experiments show that our proposed MatMul-fr…

arxiv.org ● Technology ● 2024/06/10 18:25 ● AI ● research ● technology

Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations

4 users

Scale has become a main ingredient in obtaining strong machine learning models. As a result, understanding a model's scaling properties is key to effectively designing both the right training setup as well as future generations of architectures. In this work, we argue that scale and training research has been needlessly complex due to reliance on the cosine schedule, which prevents training across…

arxiv.org ● Technology ● 2024/06/07 18:49

Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models

3 users

Large Language Models (LLMs) are often described as being instances of foundation models - that is, models that transfer strongly across various tasks and conditions in few-shot or zero-shot manner, while exhibiting scaling laws that predict function improvement when increasing the pre-training scale. These claims of excelling in different functions and tasks rely on measurements taken across vari…

arxiv.org ● Technology ● 2024/06/06 19:13

Your Transformer is Secretly Linear

4 users

This paper reveals a novel linear characteristic exclusive to transformer decoders, including models such as GPT, LLaMA, OPT, BLOOM and others. We analyze embedding transformations between sequential layers, uncovering a near-perfect linear relationship (Procrustes similarity score of 0.99). However, linearity decreases when the residual component is removed due to a consistently low output norm o…

arxiv.org ● Technology ● 2024/05/26 01:32

Agent Design Pattern Catalogue: A Collection of Architectural Patterns for Foundation Model based Agents

6 users

Foundation model-enabled generative artificial intelligence facilitates the development and implementation of agents, which can leverage distinguished reasoning and language processing capabilities to take a proactive, autonomous role to pursue users' goals. Nevertheless, there is a lack of systematic knowledge to guide practitioners in designing the agents considering challenges of goal-seeking…

arxiv.org ● Technology ● 2024/05/23 08:58 ● language ● AI ● read later

Sakuga-42M Dataset: Scaling Up Cartoon Research

4 users

Hand-drawn cartoon animation employs sketches and flat-color segments to create the illusion of motion. While recent advancements like CLIP, SVD, and Sora show impressive results in understanding and generating natural video by scaling large models with extensive datasets, they are not as effective for cartoons. Through our empirical experiments, we argue that this ineffectiveness stems from a not…

arxiv.org ● Learning ● 2024/05/17 23:26

Seven Failure Points When Engineering a Retrieval Augmented Generation System

6 users

Software engineers are increasingly adding semantic search capabilities to applications using a strategy known as Retrieval Augmented Generation (RAG). A RAG system involves finding documents that semantically match a query and then passing the documents to a large language model (LLM) such as ChatGPT to extract the right answer using an LLM. RAG systems aim to: a) reduce the problem of hallucinat…

arxiv.org ● Technology ● 2024/05/17 11:50 ● read later

A Primer on the Inner Workings of Transformer-based Language Models

3 users

The rapid progress of research aimed at interpreting the inner workings of advanced language models has highlighted a need for contextualizing the insights gained from years of work in this area. This primer provides a concise technical introduction to the current techniques used to interpret the inner workings of Transformer-based language models, focusing on the generative decoder-only architect…

arxiv.org ● Technology ● 2024/05/06 22:09

KAN: Kolmogorov-Arnold Networks

12 users

Inspired by the Kolmogorov-Arnold representation theorem, we propose Kolmogorov-Arnold Networks (KANs) as promising alternatives to Multi-Layer Perceptrons (MLPs). While MLPs have fixed activation functions on nodes ("neurons"), KANs have learnable activation functions on edges ("weights"). KANs have no linear weights at all -- every weight parameter is replaced by a univariate function parametriz…

arxiv.org ● Technology ● 2024/05/01 16:37 ● machine learning ● paper

Building a Large Japanese Web Corpus for Large Language Models

3 users

Open Japanese large language models (LLMs) have been trained on the Japanese portions of corpora such as CC-100, mC4, and OSCAR. However, these corpora were not created for the quality of Japanese texts. This study builds a large Japanese web corpus by extracting and refining text from the Common Crawl archive (21 snapshots of approximately 63.4 billion pages crawled between 2020 and 2023). This c…

arxiv.org ● Learning ● 2024/04/30 17:29

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

9 users

We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. The innovation lies entirely in our dataset…

arxiv.org ● Technology ● 2024/04/23 11:46 ● read later

Many-Shot In-Context Learning

3 users

Large language models (LLMs) excel at few-shot in-context learning (ICL) -- learning from a few examples provided in context at inference, without any weight updates. Newly expanded context windows allow us to investigate ICL with hundreds or thousands of examples -- the many-shot regime. Going from few-shot to many-shot, we observe significant performance gains across a wide variety of generative…

arxiv.org ● Technology ● 2024/04/22 02:20

A Survey on Retrieval-Augmented Text Generation for Large Language Models

4 users

Retrieval-Augmented Generation (RAG) merges retrieval methods with deep learning advancements to address the static limitations of large language models (LLMs) by enabling the dynamic integration of up-to-date external information. This methodology, focusing primarily on the text domain, provides a cost-effective solution to the generation of plausible but incorrect responses by LLMs, thereby enha…

arxiv.org ● Learning ● 2024/04/18 20:02

Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention

4 users

This work introduces an efficient method to scale Transformer-based Large Language Models (LLMs) to infinitely long inputs with bounded memory and computation. A key component in our proposed approach is a new attention technique dubbed Infini-attention. The Infini-attention incorporates a compressive memory into the vanilla attention mechanism and builds in both masked local attention and long-te…

arxiv.org ● Technology ● 2024/04/13 01:46

Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws

7 users

Scaling laws describe the relationship between the size of language models and their capabilities. Unlike prior studies that evaluate a model's capability via loss or benchmarks, we estimate the number of knowledge bits a model stores. We focus on factual knowledge represented as tuples, such as (USA, capital, Washington D.C.) from a Wikipedia page. Through multiple controlled datasets, we establi…

arxiv.org ● Learning ● 2024/04/10 22:16

ReALM: Reference Resolution As Language Modeling

10 users

Reference resolution is an important problem, one that is essential to understand and successfully handle context of different kinds. This context includes both previous turns and context that pertains to non-conversational entities, such as entities on the user's screen or those running in the background. While LLMs have been shown to be extremely powerful for a variety of tasks, their use in ref…

arxiv.org ● Technology ● 2024/04/03 15:14 ● Apple ● language ● read later ● *read later

Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch

3 users

In this paper, we unveil that Language Models (LMs) can acquire new capabilities by assimilating parameters from homologous models without retraining or GPUs. We first introduce DARE to set most delta parameters (i.e., the disparity between fine-tuned and pre-trained parameters) to zeros without affecting the abilities of Supervised Fine-Tuning (SFT) LMs, which randomly Drops delta parameters with…

arxiv.org ● Technology ● 2024/04/02 15:06

Jamba: A Hybrid Transformer-Mamba Language Model

4 users

We present Jamba, a new base large language model based on a novel hybrid Transformer-Mamba mixture-of-experts (MoE) architecture. Specifically, Jamba interleaves blocks of Transformer and Mamba layers, enjoying the benefits of both model families. MoE is added in some of these layers to increase model capacity while keeping active parameter usage manageable. This flexible architecture allows reso…

arxiv.org ● Technology ● 2024/04/01 22:38

The Elements of Differentiable Programming

18 users

Artificial intelligence has recently experienced remarkable advances, fueled by large models, vast datasets, accelerated hardware, and, last but not least, the transformative power of differentiable programming. This new programming paradigm enables end-to-end differentiation of complex computer programs (including those with control flows and data structures), making gradient-based optimization o…

arxiv.org ● Technology ● 2024/03/23 16:06 ● read later

Evolutionary Optimization of Model Merging Recipes

9 users

We present a novel application of evolutionary algorithms to automate the creation of powerful foundation models. While model merging has emerged as a promising approach for LLM development due to its cost-effectiveness, it currently relies on human intuition and domain knowledge, limiting its potential. Here, we propose an evolutionary approach that overcomes this limitation by automatically disc…

arxiv.org ● Learning ● 2024/03/21 09:47

RAFT: Adapting Language Model to Domain Specific RAG

6 users

Pretraining Large Language Models (LLMs) on large corpora of textual data is now a standard paradigm. When using these LLMs for many downstream applications, it is common to additionally bake in new knowledge (e.g., time-critical news, or private domain knowledge) into the pretrained model either through RAG-based-prompting, or fine-tuning. However, the optimal methodology for the model to gain su…

arxiv.org ● Technology ● 2024/03/19 00:15

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

5 users

In this work, we discuss building performant Multimodal Large Language Models (MLLMs). In particular, we study the importance of various architecture components and data choices. Through careful and comprehensive ablations of the image encoder, the vision language connector, and various pre-training data choices, we identified several crucial design lessons. For example, we demonstrate that for la…

arxiv.org ● Technology ● 2024/03/17 18:13

Is Cosine-Similarity of Embeddings Really About Similarity?

3 users

Cosine-similarity is the cosine of the angle between two vectors, or equivalently the dot product between their normalizations. A popular application is to quantify semantic similarity between high-dimensional objects by applying cosine-similarity to a learned low-dimensional feature embedding. This can work better but sometimes also worse than the unnormalized dot-product between embedded vectors…

arxiv.org ● Learning ● 2024/03/12 15:29 ● paper

Stealing Part of a Production Language Model

5 users

We introduce the first model-stealing attack that extracts precise, nontrivial information from black-box production language models like OpenAI's ChatGPT or Google's PaLM-2. Specifically, our attack recovers the embedding projection layer (up to symmetries) of a transformer model, given typical API access. For under $20 USD, our attack extracts the entire projection matrix of OpenAI's Ada and Ba…

arxiv.org ● Technology ● 2024/03/12 13:42 ● security

Applied Causal Inference Powered by ML and AI

5 users

An introduction to the emerging fusion of machine learning and causal inference. The book presents ideas from classical structural equation models (SEMs) and their modern AI equivalent, directed acyclical graphs (DAGs) and structural causal models (SCMs), and covers Double/Debiased Machine Learning methods to do inference in such models using modern predictive tools.

arxiv.org ● Technology ● 2024/03/06 20:07 ● machine learning ● learning ● AI ● *read later

https://arxiv.org/pdf/2402.17764.pdf

3 users

arxiv.org ● Learning ● 2024/02/29 05:29

