サクサク読めて、
アプリ限定の機能も多数!
アプリで開く
●はてなブックマークって?
●アプリ・拡張の紹介
●ユーザー登録
●ログイン
●ログアウト
トップへ戻る
総合
●人気
●新着
●
IT
●
最新ガジェット
●
自然科学
●
経済・金融
●
おもしろ
●
マンガ
●
ゲーム
●
はてなブログ︵総合︶
一般
●人気
●新着
●
社会ニュース
●
地域
●
国際
●
天気
●
グルメ
●
映画・音楽
●
スポーツ
●
はてな匿名ダイアリー
世の中
●人気
●新着
●
新型コロナウイルス
●
働き方
●
生き方
●
地域
●
医療・ヘルス
●
教育
●
はてな匿名ダイアリー
政治と経済
●人気
●新着
●
政治
●
経済・金融
●
企業
●
仕事・就職
●
マーケット
●
国際
●
はてなブログ︵政治と経済︶
暮らし
●人気
●新着
●
カルチャー・ライフスタイル
●
ファッション
●
運動・エクササイズ
●
結婚・子育て
●
住まい
●
グルメ
●
お金
●
はてなブログ︵暮らし︶
●
掃除・整理整頓
●
雑貨
●
買ってよかったもの
●
旅行
●
アウトドア
●
趣味
学び
●人気
●新着
●
人文科学
●
社会科学
●
自然科学
●
語学
●
ビジネス・経営学
●
デザイン
●
法律
●
本・書評
●
将棋・囲碁
●
はてなブログ︵学び︶
テクノロジー
●人気
●新着
●
IT
●
セキュリティ技術
●
はてなブログ︵テクノロジー︶
●
AI・機械学習
●
プログラミング
●
エンジニア
おもしろ
●人気
●新着
●
まとめ
●
ネタ
●
おもしろ
●
これはすごい
●
かわいい
●
雑学
●
癒やし
エンタメ
●人気
●新着
●
スポーツ
●
映画
●
音楽
●
アイドル
●
芸能
●
お笑い
●
サッカー
●
話題の動画
アニメとゲーム
●人気
●新着
●
マンガ
●
Webマンガ
●
ゲーム
●
任天堂
●
PlayStation
●
アニメ
●
バーチャルYouTuber
●
オタクカルチャー
●
おすすめ
デスク環境を整える
﹃arXiv.org e-Print archive﹄
●
人気
●
新着
●
すべて
3users
arxiv.org
The fields of Origin of Life and Artificial Life both question what life is and how it emerges from a distinct set of "pre-life" dynamics. One common feature of most substrates where life emerges is a marked shift in dynamics when self-replication appears. While there are some hypotheses regarding how self-replicators arose in nature, we know very little about the general dynamics, computational p
●
学び
●2024/07/17 18:19
●あとで読む
4users
arxiv.org
Recent advances in large language models (LLMs) demonstrate substantial capabilities in natural language understanding and generation tasks. With the growing number of LLMs, how to harness the collective expertise of multiple LLMs is an exciting open direction. Toward this goal, we propose a new approach that leverages the collective strengths of multiple LLMs through a Mixture-of-Agents (MoA) met
●
テクノロジー
●2024/06/18 08:44
●research
●あとで読む
4users
arxiv.org
This paper introduces the MCT Self-Refine (MCTSr) algorithm, an innovative integration of Large Language Models (LLMs) with Monte Carlo Tree Search (MCTS), designed to enhance performance in complex mathematical reasoning tasks. Addressing the challenges of accuracy and reliability in LLMs, particularly in strategic and mathematical reasoning, MCTSr leverages systematic exploration and heuristic s
●
テクノロジー
●2024/06/13 12:36
4users
arxiv.org
Matrix multiplication (MatMul) typically dominates the overall computational cost of large language models (LLMs). This cost only grows as LLMs scale to larger embedding dimensions and context lengths. In this work, we show that MatMul operations can be completely eliminated from LLMs while maintaining strong performance at billion-parameter scales. Our experiments show that our proposed MatMul-fr
●
テクノロジー
●2024/06/10 18:25
●AI
●研究
●技術
4users
arxiv.org
Scale has become a main ingredient in obtaining strong machine learning models. As a result, understanding a model's scaling properties is key to effectively designing both the right training setup as well as future generations of architectures. In this work, we argue that scale and training research has been needlessly complex due to reliance on the cosine schedule, which prevents training across
●
テクノロジー
●2024/06/07 18:49
3users
arxiv.org
Large Language Models (LLMs) are often described as being instances of foundation models - that is, models that transfer strongly across various tasks and conditions in few-show or zero-shot manner, while exhibiting scaling laws that predict function improvement when increasing the pre-training scale. These claims of excelling in different functions and tasks rely on measurements taken across vari
●
テクノロジー
●2024/06/06 19:13
4users
arxiv.org
This paper reveals a novel linear characteristic exclusive to transformer decoders, including models such as GPT, LLaMA, OPT, BLOOM and others. We analyze embedding transformations between sequential layers, uncovering a near-perfect linear relationship (Procrustes similarity score of 0.99). However, linearity decreases when the residual component is removed due to a consistently low output norm o
●
テクノロジー
●2024/05/26 01:32
6users
arxiv.org
Foundation model-enabled generative artificial intelligence facilitates the development and implementation of agents, which can leverage distinguished reasoning and language processing capabilities to takes a proactive, autonomous role to pursue users' goals. Nevertheless, there is a lack of systematic knowledge to guide practitioners in designing the agents considering challenges of goal-seeking
●
テクノロジー
●2024/05/23 08:58
●言語
●AI
●あとで読む
4users
arxiv.org
Hand-drawn cartoon animation employs sketches and flat-color segments to create the illusion of motion. While recent advancements like CLIP, SVD, and Sora show impressive results in understanding and generating natural video by scaling large models with extensive datasets, they are not as effective for cartoons. Through our empirical experiments, we argue that this ineffectiveness stems from a not
●
学び
●2024/05/17 23:26
6users
arxiv.org
Software engineers are increasingly adding semantic search capabilities to applications using a strategy known as Retrieval Augmented Generation (RAG). A RAG system involves finding documents that semantically match a query and then passing the documents to a large language model (LLM) such as ChatGPT to extract the right answer using an LLM. RAG systems aim to: a) reduce the problem of hallucinat
●
テクノロジー
●2024/05/17 11:50
●あとで読む
3users
arxiv.org
The rapid progress of research aimed at interpreting the inner workings of advanced language models has highlighted a need for contextualizing the insights gained from years of work in this area. This primer provides a concise technical introduction to the current techniques used to interpret the inner workings of Transformer-based language models, focusing on the generative decoder-only architect
●
テクノロジー
●2024/05/06 22:09
12users
arxiv.org
Inspired by the Kolmogorov-Arnold representation theorem, we propose Kolmogorov-Arnold Networks (KANs) as promising alternatives to Multi-Layer Perceptrons (MLPs). While MLPs have fixed activation functions on nodes ("neurons"), KANs have learnable activation functions on edges ("weights"). KANs have no linear weights at all -- every weight parameter is replaced by a univariate function parametriz
●
テクノロジー
●2024/05/01 16:37
●機械学習
●論文
3users
arxiv.org
Open Japanese large language models (LLMs) have been trained on the Japanese portions of corpora such as CC-100, mC4, and OSCAR. However, these corpora were not created for the quality of Japanese texts. This study builds a large Japanese web corpus by extracting and refining text from the Common Crawl archive (21 snapshots of approximately 63.4 billion pages crawled between 2020 and 2023). This c
●
学び
●2024/04/30 17:29
9users
arxiv.org
We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. The innovation lies entirely in our dataset
●
テクノロジー
●2024/04/23 11:46
●あとで読む
3users
arxiv.org
Large language models (LLMs) excel at few-shot in-context learning (ICL) -- learning from a few examples provided in context at inference, without any weight updates. Newly expanded context windows allow us to investigate ICL with hundreds or thousands of examples -- the many-shot regime. Going from few-shot to many-shot, we observe significant performance gains across a wide variety of generative
●
テクノロジー
●2024/04/22 02:20
4users
arxiv.org
Retrieval-Augmented Generation (RAG) merges retrieval methods with deep learning advancements to address the static limitations of large language models (LLMs) by enabling the dynamic integration of up-to-date external information. This methodology, focusing primarily on the text domain, provides a cost-effective solution to the generation of plausible but incorrect responses by LLMs, thereby enha
●
学び
●2024/04/18 20:02
4users
arxiv.org
This work introduces an efficient method to scale Transformer-based Large Language Models (LLMs) to infinitely long inputs with bounded memory and computation. A key component in our proposed approach is a new attention technique dubbed Infini-attention. The Infini-attention incorporates a compressive memory into the vanilla attention mechanism and builds in both masked local attention and long-te
●
テクノロジー
●2024/04/13 01:46
7users
arxiv.org
Scaling laws describe the relationship between the size of language models and their capabilities. Unlike prior studies that evaluate a model's capability via loss or benchmarks, we estimate the number of knowledge bits a model stores. We focus on factual knowledge represented as tuples, such as (USA, capital, Washington D.C.) from a Wikipedia page. Through multiple controlled datasets, we establi
●
学び
●2024/04/10 22:16
10users
arxiv.org
Reference resolution is an important problem, one that is essential to understand and successfully handle context of different kinds. This context includes both previous turns and context that pertains to non-conversational entities, such as entities on the user's screen or those running in the background. While LLMs have been shown to be extremely powerful for a variety of tasks, their use in ref
●
テクノロジー
●2024/04/03 15:14
●Apple
●言語
●あとで読む
●*あとで読む
3users
arxiv.org
In this paper, we unveil that Language Models (LMs) can acquire new capabilities by assimilating parameters from homologous models without retraining or GPUs. We first introduce DARE to set most delta parameters (i.e., the disparity between fine-tuned and pre-trained parameters) to zeros without affecting the abilities of Supervised Fine-Tuning (SFT) LMs, which randomly Drops delta parameters with
●
テクノロジー
●2024/04/02 15:06
4users
arxiv.org
We present Jamba, a new base large language model based on a novel hybrid Transformer-Mamba mixture-of-experts (MoE) architecture. Specifically, Jamba interleaves blocks of Transformer and Mamba layers, enjoying the benefits of both model families. MoE is added in some of these layers to increase model capacity while keeping active parameter usage manageable. This flexible architecture allows reso
●
テクノロジー
●2024/04/01 22:38
18users
arxiv.org
Artificial intelligence has recently experienced remarkable advances, fueled by large models, vast datasets, accelerated hardware, and, last but not least, the transformative power of differentiable programming. This new programming paradigm enables end-to-end differentiation of complex computer programs (including those with control flows and data structures), making gradient-based optimization o
●
テクノロジー
●2024/03/23 16:06
●あとで読む
9users
arxiv.org
We present a novel application of evolutionary algorithms to automate the creation of powerful foundation models. While model merging has emerged as a promising approach for LLM development due to its cost-effectiveness, it currently relies on human intuition and domain knowledge, limiting its potential. Here, we propose an evolutionary approach that overcomes this limitation by automatically disc
●
学び
●2024/03/21 09:47
6users
arxiv.org
Pretraining Large Language Models (LLMs) on large corpora of textual data is now a standard paradigm. When using these LLMs for many downstream applications, it is common to additionally bake in new knowledge (e.g., time-critical news, or private domain knowledge) into the pretrained model either through RAG-based-prompting, or fine-tuning. However, the optimal methodology for the model to gain su
●
テクノロジー
●2024/03/19 00:15
5users
arxiv.org
In this work, we discuss building performant Multimodal Large Language Models (MLLMs). In particular, we study the importance of various architecture components and data choices. Through careful and comprehensive ablations of the image encoder, the vision language connector, and various pre-training data choices, we identified several crucial design lessons. For example, we demonstrate that for la
●
テクノロジー
●2024/03/17 18:13
3users
arxiv.org
Cosine-similarity is the cosine of the angle between two vectors, or equivalently the dot product between their normalizations. A popular application is to quantify semantic similarity between high-dimensional objects by applying cosine-similarity to a learned low-dimensional feature embedding. This can work better but sometimes also worse than the unnormalized dot-product between embedded vectors
●
学び
●2024/03/12 15:29
●論文
5users
arxiv.org
We introduce the first model-stealing attack that extracts precise, nontrivial information from black-box production language models like OpenAI's ChatGPT or Google's PaLM-2. Specifically, our attack recovers the embedding projection layer (up to symmetries) of a transformer model, given typical API access. For under \$20 USD, our attack extracts the entire projection matrix of OpenAI's Ada and Ba
●
テクノロジー
●2024/03/12 13:42
●セキュリティ
5users
arxiv.org
An introduction to the emerging fusion of machine learning and causal inference. The book presents ideas from classical structural equation models (SEMs) and their modern AI equivalent, directed acyclical graphs (DAGs) and structural causal models (SCMs), and covers Double/Debiased Machine Learning methods to do inference in such models using modern predictive tools.
●
テクノロジー
●2024/03/06 20:07
●機械学習
●学習
●AI
●*あとで読む
3users
arxiv.org
●
学び
●2024/02/29 05:29
次のページ
このページはまだ
ブックマークされていません
このページを最初にブックマークしてみませんか?
﹃arXiv.org e-Print archive﹄の新着エントリーを見る
キーボードショートカット一覧
j次のブックマーク
k前のブックマーク
lあとで読む
eコメント一覧を開く
oページを開く
●総合
●一般
●世の中
●政治と経済
●暮らし
●学び
●テクノロジー
●エンタメ
●アニメとゲーム
●おもしろ
●アプリ・拡張機能
●開発ブログ
●ヘルプ
●お問い合わせ
●ガイドライン
●利用規約
●プライバシーポリシー
●利用者情報の外部送信について
●ガイドライン
●利用規約
●プライバシーポリシー
●利用者情報の外部送信について
●公式アカウント
●ホットエントリー
●はてなブログ
●はてなブログPro
●人力検索はてな
●はてなブログ タグ
●はてなニュース
●ソレドコ
Copyright © 2005-2024 Hatena. All Rights Reserved.
設定を変更しましたx