Ph.D. course in Statistical Science
The Graduate University for Advanced Studies
About Me
Ph.D. candidate in Statistical Science
I am a researcher and engineer working in computer science and machine learning.
Transactions on Machine Learning Research
The Wasserstein Weisfeiler-Lehman (WWL) graph kernel is a popular and efficient approach, used in various kernel-based machine learning frameworks for practical applications with graph data. It incorporates optimal transport geometry into the Weisfeiler-Lehman graph kernel to mitigate the information loss inherent in the aggregation strategies of graph kernels. While the WWL graph kernel demonstrates superior performance in many applications, it suffers from a drawback in its computational complexity, i.e., at least $\mathcal{O}(n_{1} n_{2})$, where $n_{1}, n_{2}$ denote the numbers of vertices in the input graphs. This hinders the practical applicability of the WWL graph kernel, especially in large-scale settings. In this paper, we propose the \emph{Tree Wasserstein Weisfeiler-Lehman} (TWWL) algorithm, which leverages a \emph{tree structure} to scale up the exact computation of the WWL graph kernel for graph data with categorical node labels. In particular, the computational complexity of the TWWL algorithm is $\mathcal{O}(n_{1} + n_{2})$, which enables its application to large-scale graphs. Numerical experiments demonstrate that the performance of the proposed algorithm compares favorably with baseline kernels, while its computation is several orders of magnitude faster than that of the classic WWL graph kernel. This paves the way for applications to large-scale datasets on which the WWL kernel is computationally prohibitive.
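The linear-time claim rests on a well-known closed form: the 1-Wasserstein distance between two distributions supported on the nodes of a tree metric is a weighted sum, over edges, of the absolute difference in subtree mass. The sketch below illustrates that closed form only; it is not the authors' implementation, and the function name and array layout are illustrative assumptions.

```python
import numpy as np

def tree_wasserstein(parent, weight, mu, nu):
    """Exact 1-Wasserstein distance between distributions mu and nu
    supported on the nodes of a tree, computed in O(n) time.

    parent[v] is the parent of node v (parent[root] = -1);
    weight[v] is the length of the edge (parent[v], v);
    nodes are assumed topologically ordered so that parent[v] < v.

    Closed form on trees: W1 = sum over edges of
    edge_length * |mu-mass below the edge - nu-mass below the edge|.
    """
    n = len(parent)
    diff = np.asarray(mu, dtype=float) - np.asarray(nu, dtype=float)
    # One bottom-up pass: children (larger indices) are processed first,
    # so diff[v] accumulates the full subtree mass difference at v.
    for v in range(n - 1, 0, -1):
        diff[parent[v]] += diff[v]
    return sum(weight[v] * abs(diff[v]) for v in range(1, n))
```

For example, on a path 2-0-1-3 with unit edge weights, moving a unit of mass from node 2 to node 3 costs 3, and a single linear pass suffices regardless of the number of support points.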
Journal of Complex Networks
Graph representation matrices are essential tools in graph data analysis. Recently, Hermitian adjacency matrices have been proposed to investigate directed graph structures. Previous studies have demonstrated that these matrices can extract valuable information for clustering. In this paper, we propose the complex non-backtracking matrix, which integrates the properties of the Hermitian adjacency matrix and the non-backtracking matrix. The proposed matrix has properties similar to those of the non-backtracking matrix of undirected graphs. We reveal relationships between the complex non-backtracking matrix and the Hermitian adjacency matrix, and provide intriguing insights showing that this matrix representation holds cluster information, particularly for sparse directed graphs.
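As background, a common convention for the Hermitian adjacency matrix encodes each one-way directed edge with the imaginary unit and each reciprocated pair with 1, which makes the matrix Hermitian. A minimal NumPy sketch under that convention (the function name and edge-list format are illustrative, not from the paper):

```python
import numpy as np

def hermitian_adjacency(n, edges):
    """Hermitian adjacency matrix of a directed graph on n vertices.

    For each pair (u, v): the entry is 1 if both u->v and v->u exist,
    i if only u->v exists, -i if only v->u exists, and 0 otherwise
    (a common convention; others exist in the literature).
    """
    H = np.zeros((n, n), dtype=complex)
    E = set(edges)
    for (u, v) in E:
        if (v, u) in E:
            H[u, v] = H[v, u] = 1      # reciprocated edge
        else:
            H[u, v] = 1j               # one-way edge u -> v
            H[v, u] = -1j
    return H
```

By construction `H` equals its conjugate transpose, so its eigenvalues are real, which is what makes spectral methods on directed graphs tractable with this representation.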
Neural Computation
Principal component analysis (PCA) is a widely used method for data processing tasks such as dimension reduction and visualization. Standard PCA is known to be sensitive to outliers, and various robust PCA methods have been proposed. It has been shown that the robustness of many statistical methods can be improved by using mode estimation instead of mean estimation, because mode estimation is not significantly affected by the presence of outliers. This study therefore proposes modal principal component analysis (MPCA), a robust PCA method based on mode estimation. The proposed method finds the minor component by estimating the mode of the projected data points. As theoretical contributions, the probabilistic convergence property, the influence function, and the finite-sample breakdown point together with its lower bound are derived for the proposed MPCA. The experimental results show that the proposed method has advantages over conventional methods.
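The robustness argument hinges on the mode being insensitive to outliers. One standard way to estimate a mode is Gaussian-kernel mean-shift iteration; the toy sketch below illustrates why a mode estimate stays put while the mean is dragged by outliers (an illustration of the general principle, not the paper's exact estimator):

```python
import numpy as np

def kde_mode(x, bandwidth, n_iter=100):
    """Estimate the mode of 1-D data by mean-shift fixed-point
    iterations with a Gaussian kernel (a simple textbook scheme)."""
    m = np.median(x)  # robust starting point
    for _ in range(n_iter):
        # Kernel weights: points far from the current estimate
        # (e.g. outliers) get exponentially small weight.
        w = np.exp(-0.5 * ((x - m) / bandwidth) ** 2)
        m_new = np.sum(w * x) / np.sum(w)
        if abs(m_new - m) < 1e-10:
            break
        m = m_new
    return m
```

On data with nine points at 0 and one outlier at 100, the sample mean is 10, while the mode estimate remains at 0 because the outlier receives negligible kernel weight.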
Information Geometry
Modal linear regression (MLR) models the conditional mode of a response as a linear predictor of explanatory variables. It is an effective approach for dealing with response variables that have a multimodal distribution or are contaminated by outliers. Because of the semiparametric nature of MLR, constructing a statistical model manifold is difficult with the conventional approach. To overcome this difficulty, we first consider the information-geometric perspective of the modal expectation–maximization (EM) algorithm. Based on this perspective, model manifolds for MLR are constructed from observations, and a data manifold is constructed from the empirical distribution. In this paper, the em algorithm of MLR, a geometric formulation of the EM algorithm, is shown to be equivalent to the conventional EM algorithm of MLR. The robustness of the MLR model is also discussed in terms of the influence function and information geometry.
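The modal EM iteration referenced here alternates two simple steps: an E-step that assigns each observation a Gaussian-kernel weight based on its residual, and an M-step that refits the coefficients by weighted least squares. A minimal sketch under those standard assumptions (the function name is illustrative, and this is not the authors' implementation):

```python
import numpy as np

def modal_linear_regression(X, y, bandwidth, n_iter=50):
    """Modal EM for modal linear regression (sketch).

    E-step: weight each point by a Gaussian kernel on its residual,
    so points far from the current modal line are down-weighted.
    M-step: weighted least-squares update of the coefficients.
    """
    Xb = np.column_stack([np.ones(len(y)), X])     # add intercept
    beta = np.linalg.lstsq(Xb, y, rcond=None)[0]   # OLS initialization
    for _ in range(n_iter):
        r = y - Xb @ beta
        w = np.exp(-0.5 * (r / bandwidth) ** 2)            # E-step
        WX = w[:, None] * Xb
        beta = np.linalg.solve(Xb.T @ WX, Xb.T @ (w * y))  # M-step
    return beta
```

On noiseless data generated as y = 2 + 3x the iteration recovers the coefficients (2, 3) exactly, and on contaminated data the kernel weights steer the fit toward the conditional mode rather than the conditional mean.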
Proceedings of the 25th International Conference on Neural Information Processing (ICONIP 2018)
Modal linear regression (MLR) is a standard method for modeling the conditional mode of a response variable using a linear combination of explanatory variables. It is effective for response variables with an asymmetric, multimodal distribution. Because of the nonparametric nature of MLR, it is difficult to construct a statistical model manifold in the sense of information geometry. In this work, a model manifold is constructed using observations instead of explicit parametric models. We also propose a method for constructing a data manifold based on an empirical distribution. The em algorithm of MLR, a geometric formulation of the EM algorithm, is shown to be equivalent to the conventional EM algorithm of MLR.
The Interim Conference of the Asian Regional Section of the International Association for Statistical Computing
Graph representation matrices are essential tools in graph data analysis. Recently, Hermitian adjacency matrices have been proposed to study the structure of directed graphs. Previous studies have demonstrated that these matrices can extract valuable information for clustering. In this paper, we introduce the complex non-backtracking (CNBT) matrix corresponding to a Hermitian adjacency matrix. The proposed matrix has properties similar to those of the non-backtracking matrix of undirected graphs. We reveal relationships between the complex non-backtracking matrix and the Hermitian adjacency matrix. Furthermore, we experimentally show that our spectral clustering method based on the CNBT matrix outperforms an existing method, particularly on sparse directed graphs, which are common in real-world data, where the number of edges is only a few times the number of vertices.
The 28th Workshop on Information-Based Induction Sciences (IBIS)
The Wasserstein Weisfeiler-Lehman (WWL) kernel, which introduces optimal transport into the Weisfeiler-Lehman (WL) kernel based on vertex labels of graphs, shows excellent performance, but its high computational cost makes it difficult to apply to large-scale graph datasets. This work proposes a speed-up that exploits the label structure generated by the WL algorithm, and shows that the exact Wasserstein distance of the WWL kernel for discrete labels can be obtained in time proportional to the number of vertices.
The 27th Workshop on Information-Based Induction Sciences (IBIS)
We propose the complex non-backtracking (CNBT) matrix as a non-backtracking (NBT) matrix corresponding to the Hermitian adjacency matrix, one of the matrix representations of directed graphs. We show that several relations that hold between the adjacency matrix and the NBT matrix of undirected graphs also hold between the Hermitian adjacency matrix and the CNBT matrix, and demonstrate through numerical experiments that, for node clustering on sparse graphs, the proposed method enables more robust estimation than an existing method.
The 39th Technical Meeting on Information-Based Induction Sciences and Machine Learning (IBISML)
Principal component analysis is widely used as a data analysis method for tasks such as dimension reduction and visualization. However, it is known to be vulnerable to outliers, and various methods have been proposed to make it robust. For statistical methods in general, approaches that incorporate the concept of mode estimation have been proposed to improve robustness against outliers, but they had not previously been applied to principal component analysis. In this paper, we propose modal principal component analysis, which incorporates mode estimation into principal component analysis by focusing on the minor component direction rather than the principal component direction. We evaluate the theoretical properties of the proposed estimator in terms of convergence, the influence function, and the breakdown point, and confirm through numerical experiments that the proposed method performs comparably to existing methods and has superior performance in some situations.
The 32nd Technical Meeting on Information-Based Induction Sciences and Machine Learning (IBISML)
Modal linear regression (MLR) is a regression method based on a model in which the mode of the response is expressed as a linear predictor of the explanatory variables. One feature of MLR, unlike linear regression, is that it imposes weaker constraints in the sense that no parametric distribution is assumed. On the other hand, the absence of a parametric distributional assumption makes the construction of the model manifold and the data manifold nontrivial when viewing MLR in the framework of information geometry. In this paper, we propose using the observed data, which are generally used to construct the empirical distribution, to construct the model manifold instead. Through a study of the MEM algorithm, we also propose a way to construct the empirical distribution using the modal assumption on the error distribution in MLR. We obtain the result that the $em$ algorithm on the proposed model and data manifolds coincides with the E-step and M-step computations obtained when MLR is viewed in the framework of the EM algorithm.
The Graduate University for Advanced Studies
University of Tsukuba
University of Tsukuba
National Institute for Environmental Studies
Global AI Innovations Laboratory
KDDI CORPORATION
Added an article for the SOKENDAI Statistical Science Course Advent Calendar 2025
Fine-tuned the design, such as the spacing of each element
Fixed a bug with the SNS links in the footer
Added an article for the SOKENDAI Advent Calendar 2025
Organized information and migrated content
Initial setup of the Astro project