SlideShare a Scribd company logo
1 of 31
Download to read offline
An Intelligent Storage
~SQL Execution on GPU Closely Connected with SSD~
PG-Strom Development Team
KaiGai Kohei <kaigai@kaigai.gr.jp>
The PG-Strom Project
Who am I?
▌KaiGai Kohei
 PG-Strom Development Team
▌Background
 Linux kernel (SELinux, …)
 PostgreSQL database
 Enhancement of security features
 Writable FDW, Remote Join
 Custom Scan Interface
 GPU and CUDA
▌PG-Strom
 SQL Acceleration Technology using GPU
 Solution for data analytics in-database
2 Open Source Summit Japan 2017 - An Intelligent Storage
The PG-Strom Project
PG-Strom
PG-Strom (1/3) – GPU as accelerator of SQL
3
GPU CPU
Model
NVIDIA
Tesla P40
Intel Xeon
E5-2699v4
Code name Pascal Broadwell
Release Q3-2016 Q1-2016
# of transistors 12billion 7.2billion
# of cores
3840
(simple)
22
(functional)
Core clocks
1.306GHz
~1.53GHz
2.20GHz
~3.60GHz
Performance
(FP32)
12 TFLOPS
1.2 TFLOPS
(with AVX2)
RAM Size 24GB (GDDR5) max1.5TB (DDR4)
Memory Band 347GB/s 76.8GB/s
Power
consumption
250W 145W
Manufacturing
process
16nm 14nm
Application
GPU
Off-load
WHERE, JOIN,
GROUP BY
Auto generation
of GPU binary
from SQL
PG-Strom: Extension module of PostgreSQL, to accelerate SQL workloads
using multi-thousands cores and high bandwidth memory.
Same
SQL
Statement
Same
Data
Structure
Open Source Summit Japan 2017 - An Intelligent Storage
The PG-Strom Project
PG-Strom (2/3) – Transparent GPU code generation on demand
Open Source Summit Japan 2017 - An Intelligent Storage4
QUERY: SELECT cat, count(*), avg(x) FROM t0
WHERE x between y and y + 20.0 GROUP BY cat;
:
STATIC_FUNCTION(bool)
gpupreagg_qual_eval(kern_context *kcxt,
kern_data_store *kds,
size_t kds_index)
{
pg_float8_t KPARAM_1 = pg_float8_param(kcxt,1);
pg_float8_t KVAR_3 = pg_float8_vref(kds,kcxt,2,kds_index);
pg_float8_t KVAR_4 = pg_float8_vref(kds,kcxt,3,kds_index);
return EVAL((pgfn_float8ge(kcxt, KVAR_3, KVAR_4) &&
pgfn_float8le(kcxt, KVAR_3,
pgfn_float8pl(kcxt, KVAR_4, KPARAM_1))));
} :
E.g) Generate CUDA C code from
calculation formula in WHERE-clause
Reference to input data
SQL expression in CUDA source code
Run-time
Compiler
(nvrtc)
Just-in-time
Compile
Parallel
Execution
The PG-Strom Project
PG-Strom (3/3) – An example of SQL acceleration
5
▌Test Query:
SELECT cat, count(*), avg(x)
FROM t0 NATURAL JOIN t1 [NATURAL JOIN t2 ...]
GROUP BY cat;
 t0 contains 100M rows, t1...t8 contains 100K rows (like a start schema)
Open Source Summit Japan 2017 - An Intelligent Storage
8.48
13.23
18.28
23.42
28.88
34.50
40.77
47.16
5.00 5.46 5.91 6.45 7.17 8.07
9.22 10.21
0.0
5.0
10.0
15.0
20.0
25.0
30.0
35.0
40.0
45.0
50.0
2 3 4 5 6 7 8 9
QueryResponseTime[sec]
Number of tables joined
PG-Strom microbenchmark with JOIN/GROUP BY
PostgreSQL v9.6 PG-Strom 2.0devel
CPU: Xeon E5-2640v4
GPU: Tesla P40
RAM: 128GB
OS: CentOS 7.3
DB: PostgreSQL 9.6.2 +
PG-Strom 2.0devel
The PG-Strom Project
In-database Analytics
▌Computing close to data
 Cost to move data
 Cost to transform / validate data
 Waste of time for DBAs
▌Utilization of SQL’s flexibility
 Pre-/Post-process of analytic
algorithms
 Conjunction with other data (JOIN)
 Summarization of results
(GROUP BY/Window Functions)
Open Source Summit Japan 2017 - An Intelligent Storage6
The PG-Strom Project
What we want to develop (1/2)
Open Source Summit Japan 2017 - An Intelligent Storage7
Data
Collection
Summarize
Pre-process
Data
Generation
Analytics
& ML
Visualization
Data Size: Large  Small
Amount of computing: Small  Large
SSD-to-GPU
Direct SQL Exec
Transparent
GPU code generation
& Asynchronous Exec
Statistical Analytics
& Machine Learning
Libraries
Advanced algorithms
as a part of SQL statement
RowColum
Hardware
Transform
BI Tools
Execution of entire lifecycle
for data analytics in-database
DNN k-means
SVM
Random
Forest
PL/CUDA
NVMe-SSD GPU
The PG-Strom Project
What we want to develop (2/2) – elemental technology
Open Source Summit Japan 2017 - An Intelligent Storage8
Data
Collection
Summarize
Pre-process
Data
Generation
Analytics
& ML
Visualization
Data Size: Large  Small
Amount of computing: Small  Large
SSD-to-GPU
Direct SQL Exec
Transparent
GPU code generation
& Asynchronous Exec
Statistical Analytics
& Machine Learning
Libraries
Advanced algorithms
as a part of SQL statement
RowColum
Hardware
Transform
BI Tools
Execution of entire lifecycle
for data analytics in-database
DNN k-means
SVM
Random
Forest
PL/CUDA
NVMe-SSD GPU
SSD-to-GPU
Direct SQL
Execution
(for I/O)
PG-Strom
(for SQL)
PL/CUDA
and
Library
(for computing)
The PG-Strom Project
I/O Acceleration with GPU?
Open Source Summit Japan 2017 - An Intelligent Storage9
The PG-Strom Project
Re-definition of GPU’s role
I/O workloads Computing workloads
Open Source Summit Japan 2017 - An Intelligent Storage10
The PG-Strom Project
Brief Architecture of GPU
GPU Device
Memory
(GDDR5 or HBM2)
Host RAM
Data Transfer with DMA
over the PCIe Bus
CPU
PCIe Bus
Also data transfer via DMA
from: particular blocks
to: destination physical
address
Storage
GPU Device
Open Source Summit Japan 2017 - An Intelligent Storage11
The PG-Strom Project
Technology Basis (1/2) – GPUDirect RDMA
▌Feature for peer-to-peer DMA between GPU and PCIe devices
▌Originally, designed for MPI over Infiniband
▌But available for any kind of PCIe devices, if custom kernel module
Copyright (c) NVIDIA corporation, 2015
Open Source Summit Japan 2017 - An Intelligent Storage12
The PG-Strom Project
Technology Basis (2/2) – GPUDirect RDMA
Physical
Address Space
PCIe BAR1 AreaGPU
RAM
Host
RAM
NVMe-SSD Infiniband
HBA
PCIe devices
GPUDirect RDMA
A feature to map GPU’s device
memory on physical address space
of the host system Once GPU device memory gets physical
address of the host system, driver can
use the address as source/destination
of DMA operations.
13 Open Source Summit Japan 2017 - An Intelligent Storage
The PG-Strom Project
Idea of SSD-to-GPU Direct SQL Execution
Open Source Summit Japan 2017 - An Intelligent Storage14
Large PostgreSQL Tables
PCIe Bus
NVMe SSD GPUSSD-to-GPU P2P DMA
(NVMe-Strom driver)
WHERE
JOIN
GROUP BYPostgreSQL
Data Block
Reduction of unnecessary
rows according to the SQL
 Smaller amount of I/O
Existing Data Flow
(Large I/O Size)
SSD-to-GPU Direct SQL Execution – Load PostgreSQL’s data block to GPU directly,
then reduce unnecessary rows with multi-thousands cores of GPUs.
The PG-Strom Project
As destination of
SSD to GPU DMA commands!!
Linux kernel APIs of GPUDirect RDMA
int nvidia_p2p_get_pages(uint64_t p2p_token, /* deprecated */
uint32_t va_space, /* deprecated */
uint64_t virtual_address,
uint64_t length,
struct nvidia_p2p_page_table **page_table,
void (*free_callback)(void *private_data),
void *private_data);
struct nvidia_p2p_page_table {
uint32_t version;
uint32_t page_size;
struct nvidia_p2p_page **pages;
uint32_t entries;
uint8_t *gpu_uuid;
} nvidia_p2p_page_table_t;
Array of I/O mapped
physical memory address
to the GPU device memory
CUresult cuMemAlloc(CUdeviceptr *daddr,
size_t bytesize);
Open Source Summit Japan 2017 - An Intelligent Storage15
The PG-Strom Project
NVMe-Strom Software Stack
Open Source Summit Japan 2017 - An Intelligent Storage16
pg-strom
NVMe-Strom
VFS
Page Cache
NVMe SSD
Driver
nvidia
driver
GPU
device
memory
GPU
device
memory
PostgreSQL
file offset
DMA
request
block number
SSD-to-GPU Peer-to-Peer DMA
cuMemAlloc()
/proc/nvme-strom
ioctl(2)
read(2)
User
Space
Kernel
Space
The PG-Strom Project
Benchmark (1/2) – Star Schema Benchmark
Open Source Summit Japan 2017 - An Intelligent Storage17
0
500
1000
1500
2000
2500
3000
3500
4000
4500
Q1-1 Q1-2 Q1-3 Q2-1 Q2-2 Q2-3 Q3-1 Q3-2 Q3-3 Q3-4 Q4-1 Q4-2 Q4-3
QueryExecutionThroughput[MB/s]
PostgreSQL SSDx1 PostgreSQL SSDx2 PG-Strom SSDx1 PG-Strom SSDx2
 SQL with WHERE-clause, JOIN, Aggregation/GROUP BY towards 353GB database.
 Usually, entire workloads are dominated by I/O.
 PostgreSQL achieves up to 1.6GB/s with regular filesystem based access.
 PG-Strom with SSD-to-GPU direct SQL execution achieved up to 3.8GB/s at the peak performance.
※ (Query Execution Throughput) = (Database Size; 353GB) / (Entire Query Response Time [sec])
Theoretical limitation
by 2x SSD [4.4GB/s]
Under
Investigation
Theoretical limitation
by 1x SSD [4.4GB/s]
The PG-Strom Project
Benchmark (2/2) – Star Schema Benchmark
Open Source Summit Japan 2017 - An Intelligent Storage18
 Example of Query
SELECT sum(lo_revenue), d_year, p_brand1
FROM lineorder, date1, part, supplier
WHERE lo_orderdate = d_datekey
AND lo_partkey = p_partkey
AND lo_suppkey = s_suppkey
AND p_category = 'MFGR#12‘
AND s_region = 'AMERICA‘
GROUP BY d_year, p_brand1
ORDER BY d_year, p_brand1;
customer
12M Rows
(1.6GB)
date1
2500 Rows
(400KB)
part
1.8M Rows
(206MB)
supplier
4M Rows
(528MB)
lineorder
2.4B Rows
(351GB)
Typical queries for summary; which
consists of WHERE-clause, JOIN and
GROUP BY.
Size of lineorder table is larger than
physical RAM size, majority of the
workload is dominated by I/O.
The PG-Strom Project
Hardware configuration on the benchmark
19
▌Hardware/Software configuration
 Model: DELL PowerEdge R730
 CPU: Xeon E5-2670v3 (12C, 2.3GHz) x2
 RAM: 128GB
 HDD: SAS 300GB x8 (RAID5)
 OS: CentOS7.2 (3.10.0-327.el7)
 SW: CUDA 7.5.17
NVIDIA driver 352.99
PostgreSQL 9.6.1
PG-Strom v2.0devel
▌GPU Specifications
 NVIDIA Tesla K80
 2x 2496 CUDA cores (560MHz)
 2x 12GB GDDR5 RAM (240GB/s)
▌SSD Specifications
 Intel SSD 750 (400GB) x2
 Interface: PCIe 3.0 x4 (NVMe 1.2)
 Max SeqRead: 2.2GB/s
Open Source Summit Japan 2017 - An Intelligent Storage
CPU2CPU1
SSD
x2
Tesla
K80
GPU
Tesla
K20
GPU
(unused)
The PG-Strom Project
(OT) Consistency with disk cache
20
 Problem ① Storage block may not be up-to-date, if target blocks are kept on page caches.
② Short size DMA is not efficient.
 Solution – Write back cached PostgreSQL blocks (= 8KB/block) to userspace buffer once,
then kicks bulk transfer from RAM to GPU by CUDA APIs.
 Almost equivalent cost with read(2) + cuMemcpyHtoDAsync()
 Although PostgreSQL blocks are reordered on GPU memory, it generates correct results.
Open Source Summit Japan 2017 - An Intelligent Storage
BLK-100: uncached
BLK-101: cached
BLK-102: uncached
BLK-103: uncached
BLK-104: cached
BLK-105: cached
BLK-106: uncached
BLK-107: uncached
BLK-108: cached
BLK-109: uncached
BLCKSZ
(=8KB)
Transfer Size
Per Request
BLCKSZ *
NChunks
BLK-108: cached
BLK-105: cached
BLK-104: cached
BLK-101: cached
BLK-100: uncached
BLK-102: uncached
BLK-103: uncached
BLK-106: uncached
BLK-107: uncached
BLK-109: uncached
BLK-108: cached
BLK-105: cached
BLK-104: cached
BLK-101: cached
unused SSD-to-GPU
P2P DMA
File Userspace DMA
Buffer (RAM)
Device Memory
(GPU)
CUDA API
(userspace)
cuMemcpyHtoDAsync
The PG-Strom Project
Not only Scan, but GpuJoin also...
Open Source Summit Japan 2017 - An Intelligent Storage21
Inner
relation
Outer
relation
Inner
relation
Outer
relation
Hash table Hash table
Next Step Next Step
Just reference
to the results
on CPU side
Hash Table
Search by CPU
Generation of
results by CPU
Generation of
results by GPU
Parallel
Hash Table
Search by GPU
HashJoin by CPU GpuHashJoin
Here we have
SSD-to-GPU
P2P DMA
The PG-Strom Project
GpuPreAgg
Not only Scan, but GpuPreAgg also...
Open Source Summit Japan 2017 - An Intelligent Storage22
Aggregation by CPU Aggregation with GPU
Relation Relation
Aggregation
Aggregation
1st Step:
Parallel Hash
Reduction
2nd Step:
Atomic Merge
Read of
1M rows
Process
1M rows
by CPU
Smaller
# of rows
GpuPreAgg:
Reduction of rows
to be processed by CPU
Interim
results
Here we have SSD-to-GPU P2P DMA.
The PG-Strom Project
The Future
Open Source Summit Japan 2017 - An Intelligent Storage23
The PG-Strom Project
Single Node Solution
2x Enterprise Grade SSD
Based on PCIe x8
NVIDIA
Tesla P40
CPU
RAM
DMI
SSD-1
SSD-2
GPU
max 10GB/s capability of query processing / node
PCI-E
Hardware Block Diagram
Open Source Summit Japan 2017 - An Intelligent Storage24
The PG-Strom Project
Multi Node Solution (?)
NVIDIA
Tesla P40
Joint solution with PostgreSQL’s scale-out technologies
2x Enterprise Grade SSD
Based on PCIe x8
Open Source Summit Japan 2017 - An Intelligent Storage25
The PG-Strom Project
Scale Up Solution (?)
5x NVIDIA
Tesla P40
DMI
GPU-1
PCI-SW
PCI-SW
QPIx2
GPU-2
GPU-4
GPU-5
SSD-1
SSD-2
SSD-3
SSD-4
SSD-5
GPU-3
RAM
RAM
CPU 1
CPU 2
PCIe x16
x2 slot
PCIe
96lane
PCIe
96lane
Deployment on GPU monster
5x Enterprise Grade SSD
Based on PCIe x16
Open Source Summit Japan 2017 - An Intelligent Storage26
The PG-Strom Project
Towards columnar structure (1/3) – On-memory Columnar Cache
27
▌On-memory Columnar Cache
 Caches PostgreSQL’s data-blocks (row-data) on memory as column-data
 Reduction of data transfer between RAM<->GPU
 Optimization of memory bandwidth of GPU
 Switch GPU kernel to be invoked according to existence of column-cache
Open Source Summit Japan 2017 - An Intelligent Storage
BlockNum = 101
all_visible=false
BlockNum = 102
all_visible=true
BlockNum = 103
all_visible=true
BlockNum = 104
all_visible=true
Columnar Cache
(BlockNum=102)
Columnar Cache
(BlockNum=102)
background
worker
Construction
UPDATE
invalidation
PG-Strom
Row-data
GPU Kernel
for Row
Column-data
GPU Kernel
for Column
Row-data
GPU Kernel
for Row
Column-data
GPU Kernel
for Column
The PG-Strom Project
Towards columnar structure (2/3) – Background
28
▌Row data format – Random memory access
 Less usage ratio of memory bus, larger number of memory transactions
Open Source Summit Japan 2017 - An Intelligent Storage
▌Column data format – Coalesced memory access
 Optimal memory bus usage, least number of memory transactions
32bit
Memory transaction width: 256bit
32bit 32bit32bit 32bit 32bit
32bit 32bit 32bit 32bit 32bit 32bit 32bit 32bit
Memory transaction width: 256bit
32bit x8 = 256bit is valid
in 256bit memory transaction.
(usage ratio: 100.0%)
Only 32bit x1 = 32bit is valid
in 256bit memory transaction .
(usage ratio: 12.5%)
GPU Cores
GPU Cores
Why columnar structure is preferable for GPU architecture
The PG-Strom Project
World of OLTP World of OLAP
Towards columnar structure (3/3) – FPGA engine on SSD
Open Source Summit Japan 2017 - An Intelligent Storage29
FPGA Logic on SSD enables write by row, then read as column
RowColumn
Transformer
column-format
Read of
analytic data
(column-format)
Write of
transactional
data
(row-format)
Data-format
transformation by
FPGA on SSD
SQL Execution
– based on columnar
format; best for GPU
Pre-processed
data
Only required
columns are
fetched
The PG-Strom Project
Towards the “true” intelligent storage system
Open Source Summit Japan 2017 - An Intelligent Storage30
Large PostgreSQL Tables
PCIe Bus
NVMe SSD GPUSSD-to-GPU P2P DMA
(NVMe-Strom driver)
WHERE
JOIN
GROUP BYPostgreSQL
Data Block
Reduction of unnecessary
rows according to the SQL
 Smaller amount of I/O
Step-1: A pair of NVMe-SSD and GPU looks like an intelligent storage which understand
SQL workloads for filtering-out unnecessary rows and pre-process of JOIN/GROUP BY
prior to data loading onto the main memory.
Step-2: A special NVMe-SSD
transforms data format for more
suitable one to process, based on
knowledge of the requester.
(PostgreSQL’s block format in this
case)
Any Questions?

More Related Content

What's hot

SQL+GPU+SSD=∞ (English)
SQL+GPU+SSD=∞ (English)SQL+GPU+SSD=∞ (English)
SQL+GPU+SSD=∞ (English)Kohei KaiGai
 
PG-Strom - GPU Accelerated Asyncr
PG-Strom - GPU Accelerated AsyncrPG-Strom - GPU Accelerated Asyncr
PG-Strom - GPU Accelerated AsyncrKohei KaiGai
 
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsPL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsKohei KaiGai
 
20171206 PGconf.ASIA LT gstore_fdw
20171206 PGconf.ASIA LT gstore_fdw20171206 PGconf.ASIA LT gstore_fdw
20171206 PGconf.ASIA LT gstore_fdwKohei KaiGai
 
PG-Strom v2.0 Technical Brief (17-Apr-2018)
PG-Strom v2.0 Technical Brief (17-Apr-2018)PG-Strom v2.0 Technical Brief (17-Apr-2018)
PG-Strom v2.0 Technical Brief (17-Apr-2018)Kohei KaiGai
 
20201006_PGconf_Online_Large_Data_Processing
20201006_PGconf_Online_Large_Data_Processing20201006_PGconf_Online_Large_Data_Processing
20201006_PGconf_Online_Large_Data_ProcessingKohei KaiGai
 
20181212 - PGconfASIA - LT - English
20181212 - PGconfASIA - LT - English20181212 - PGconfASIA - LT - English
20181212 - PGconfASIA - LT - EnglishKohei KaiGai
 
20201128_OSC_Fukuoka_Online_GPUPostGIS
20201128_OSC_Fukuoka_Online_GPUPostGIS20201128_OSC_Fukuoka_Online_GPUPostGIS
20201128_OSC_Fukuoka_Online_GPUPostGISKohei KaiGai
 
Let's turn your PostgreSQL into columnar store with cstore_fdw
Let's turn your PostgreSQL into columnar store with cstore_fdwLet's turn your PostgreSQL into columnar store with cstore_fdw
Let's turn your PostgreSQL into columnar store with cstore_fdwJan Holčapek
 
20210301_PGconf_Online_GPU_PostGIS_GiST_Index
20210301_PGconf_Online_GPU_PostGIS_GiST_Index20210301_PGconf_Online_GPU_PostGIS_GiST_Index
20210301_PGconf_Online_GPU_PostGIS_GiST_IndexKohei KaiGai
 
PG-Strom - A FDW module utilizing GPU device
PG-Strom - A FDW module utilizing GPU devicePG-Strom - A FDW module utilizing GPU device
PG-Strom - A FDW module utilizing GPU deviceKohei KaiGai
 
20181025_pgconfeu_lt_gstorefdw
20181025_pgconfeu_lt_gstorefdw20181025_pgconfeu_lt_gstorefdw
20181025_pgconfeu_lt_gstorefdwKohei KaiGai
 
20181210 - PGconf.ASIA Unconference
20181210 - PGconf.ASIA Unconference20181210 - PGconf.ASIA Unconference
20181210 - PGconf.ASIA UnconferenceKohei KaiGai
 
GPGPU programming with CUDA
GPGPU programming with CUDAGPGPU programming with CUDA
GPGPU programming with CUDASavith Satheesh
 
PGConf.ASIA 2019 Bali - AppOS: PostgreSQL Extension for Scalable File I/O - K...
PGConf.ASIA 2019 Bali - AppOS: PostgreSQL Extension for Scalable File I/O - K...PGConf.ASIA 2019 Bali - AppOS: PostgreSQL Extension for Scalable File I/O - K...
PGConf.ASIA 2019 Bali - AppOS: PostgreSQL Extension for Scalable File I/O - K...Equnix Business Solutions
 
Applying of the NVIDIA CUDA to the video processing in the task of the roundw...
Applying of the NVIDIA CUDA to the video processing in the task of the roundw...Applying of the NVIDIA CUDA to the video processing in the task of the roundw...
Applying of the NVIDIA CUDA to the video processing in the task of the roundw...Ural-PDC
 
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)Kohei KaiGai
 
GPU Accelerated Data Science with RAPIDS - ODSC West 2020
GPU Accelerated Data Science with RAPIDS - ODSC West 2020GPU Accelerated Data Science with RAPIDS - ODSC West 2020
GPU Accelerated Data Science with RAPIDS - ODSC West 2020John Zedlewski
 
PGConf.ASIA 2019 Bali - Performance Analysis at Full Power - Julien Rouhaud
PGConf.ASIA 2019 Bali - Performance Analysis at Full Power - Julien RouhaudPGConf.ASIA 2019 Bali - Performance Analysis at Full Power - Julien Rouhaud
PGConf.ASIA 2019 Bali - Performance Analysis at Full Power - Julien RouhaudEqunix Business Solutions
 

What's hot (20)

SQL+GPU+SSD=∞ (English)
SQL+GPU+SSD=∞ (English)SQL+GPU+SSD=∞ (English)
SQL+GPU+SSD=∞ (English)
 
PG-Strom - GPU Accelerated Asyncr
PG-Strom - GPU Accelerated AsyncrPG-Strom - GPU Accelerated Asyncr
PG-Strom - GPU Accelerated Asyncr
 
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsPL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
 
20171206 PGconf.ASIA LT gstore_fdw
20171206 PGconf.ASIA LT gstore_fdw20171206 PGconf.ASIA LT gstore_fdw
20171206 PGconf.ASIA LT gstore_fdw
 
PG-Strom v2.0 Technical Brief (17-Apr-2018)
PG-Strom v2.0 Technical Brief (17-Apr-2018)PG-Strom v2.0 Technical Brief (17-Apr-2018)
PG-Strom v2.0 Technical Brief (17-Apr-2018)
 
20201006_PGconf_Online_Large_Data_Processing
20201006_PGconf_Online_Large_Data_Processing20201006_PGconf_Online_Large_Data_Processing
20201006_PGconf_Online_Large_Data_Processing
 
20181212 - PGconfASIA - LT - English
20181212 - PGconfASIA - LT - English20181212 - PGconfASIA - LT - English
20181212 - PGconfASIA - LT - English
 
20201128_OSC_Fukuoka_Online_GPUPostGIS
20201128_OSC_Fukuoka_Online_GPUPostGIS20201128_OSC_Fukuoka_Online_GPUPostGIS
20201128_OSC_Fukuoka_Online_GPUPostGIS
 
PostgreSQL with OpenCL
PostgreSQL with OpenCLPostgreSQL with OpenCL
PostgreSQL with OpenCL
 
Let's turn your PostgreSQL into columnar store with cstore_fdw
Let's turn your PostgreSQL into columnar store with cstore_fdwLet's turn your PostgreSQL into columnar store with cstore_fdw
Let's turn your PostgreSQL into columnar store with cstore_fdw
 
20210301_PGconf_Online_GPU_PostGIS_GiST_Index
20210301_PGconf_Online_GPU_PostGIS_GiST_Index20210301_PGconf_Online_GPU_PostGIS_GiST_Index
20210301_PGconf_Online_GPU_PostGIS_GiST_Index
 
PG-Strom - A FDW module utilizing GPU device
PG-Strom - A FDW module utilizing GPU devicePG-Strom - A FDW module utilizing GPU device
PG-Strom - A FDW module utilizing GPU device
 
20181025_pgconfeu_lt_gstorefdw
20181025_pgconfeu_lt_gstorefdw20181025_pgconfeu_lt_gstorefdw
20181025_pgconfeu_lt_gstorefdw
 
20181210 - PGconf.ASIA Unconference
20181210 - PGconf.ASIA Unconference20181210 - PGconf.ASIA Unconference
20181210 - PGconf.ASIA Unconference
 
GPGPU programming with CUDA
GPGPU programming with CUDAGPGPU programming with CUDA
GPGPU programming with CUDA
 
PGConf.ASIA 2019 Bali - AppOS: PostgreSQL Extension for Scalable File I/O - K...
PGConf.ASIA 2019 Bali - AppOS: PostgreSQL Extension for Scalable File I/O - K...PGConf.ASIA 2019 Bali - AppOS: PostgreSQL Extension for Scalable File I/O - K...
PGConf.ASIA 2019 Bali - AppOS: PostgreSQL Extension for Scalable File I/O - K...
 
Applying of the NVIDIA CUDA to the video processing in the task of the roundw...
Applying of the NVIDIA CUDA to the video processing in the task of the roundw...Applying of the NVIDIA CUDA to the video processing in the task of the roundw...
Applying of the NVIDIA CUDA to the video processing in the task of the roundw...
 
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
 
GPU Accelerated Data Science with RAPIDS - ODSC West 2020
GPU Accelerated Data Science with RAPIDS - ODSC West 2020GPU Accelerated Data Science with RAPIDS - ODSC West 2020
GPU Accelerated Data Science with RAPIDS - ODSC West 2020
 
PGConf.ASIA 2019 Bali - Performance Analysis at Full Power - Julien Rouhaud
PGConf.ASIA 2019 Bali - Performance Analysis at Full Power - Julien RouhaudPGConf.ASIA 2019 Bali - Performance Analysis at Full Power - Julien Rouhaud
PGConf.ASIA 2019 Bali - Performance Analysis at Full Power - Julien Rouhaud
 

Similar to 20170602_OSSummit_an_intelligent_storage

20180920_DBTS_PGStrom_EN
20180920_DBTS_PGStrom_EN20180920_DBTS_PGStrom_EN
20180920_DBTS_PGStrom_ENKohei KaiGai
 
20181116 Massive Log Processing using I/O optimized PostgreSQL
20181116 Massive Log Processing using I/O optimized PostgreSQL20181116 Massive Log Processing using I/O optimized PostgreSQL
20181116 Massive Log Processing using I/O optimized PostgreSQLKohei KaiGai
 
Introduction to Accelerators
Introduction to AcceleratorsIntroduction to Accelerators
Introduction to AcceleratorsDilum Bandara
 
20190909_PGconf.ASIA_KaiGai
20190909_PGconf.ASIA_KaiGai20190909_PGconf.ASIA_KaiGai
20190909_PGconf.ASIA_KaiGaiKohei KaiGai
 
PGConf.ASIA 2019 Bali - Full-throttle Running on Terabytes Log-data - Kohei K...
PGConf.ASIA 2019 Bali - Full-throttle Running on Terabytes Log-data - Kohei K...PGConf.ASIA 2019 Bali - Full-throttle Running on Terabytes Log-data - Kohei K...
PGConf.ASIA 2019 Bali - Full-throttle Running on Terabytes Log-data - Kohei K...Equnix Business Solutions
 
Hardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and MLHardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and MLinside-BigData.com
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
組み込みから HPC まで ARM コアで実現するエコシステム
組み込みから HPC まで ARM コアで実現するエコシステム組み込みから HPC まで ARM コアで実現するエコシステム
組み込みから HPC まで ARM コアで実現するエコシステムShinnosuke Furuya
 
Steen_Dissertation_March5
Steen_Dissertation_March5Steen_Dissertation_March5
Steen_Dissertation_March5Steen Larsen
 
Red Hat Storage Day Atlanta - Designing Ceph Clusters Using Intel-Based Hardw...
Red Hat Storage Day Atlanta - Designing Ceph Clusters Using Intel-Based Hardw...Red Hat Storage Day Atlanta - Designing Ceph Clusters Using Intel-Based Hardw...
Red Hat Storage Day Atlanta - Designing Ceph Clusters Using Intel-Based Hardw...Red_Hat_Storage
 
S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdf
S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdfS51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdf
S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdfDLow6
 
BladeCenter GPU Expansion Blade (BGE) - Client Presentation
BladeCenter GPU Expansion Blade (BGE) - Client PresentationBladeCenter GPU Expansion Blade (BGE) - Client Presentation
BladeCenter GPU Expansion Blade (BGE) - Client PresentationCliff Kinard
 
Advancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and AlluxioAdvancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and AlluxioAlluxio, Inc.
 
20181016_pgconfeu_ssd2gpu_multi
20181016_pgconfeu_ssd2gpu_multi20181016_pgconfeu_ssd2gpu_multi
20181016_pgconfeu_ssd2gpu_multiKohei KaiGai
 

Similar to 20170602_OSSummit_an_intelligent_storage (20)

20180920_DBTS_PGStrom_EN
20180920_DBTS_PGStrom_EN20180920_DBTS_PGStrom_EN
20180920_DBTS_PGStrom_EN
 
20181116 Massive Log Processing using I/O optimized PostgreSQL
20181116 Massive Log Processing using I/O optimized PostgreSQL20181116 Massive Log Processing using I/O optimized PostgreSQL
20181116 Massive Log Processing using I/O optimized PostgreSQL
 
Advances in GPU Computing
Advances in GPU ComputingAdvances in GPU Computing
Advances in GPU Computing
 
Introduction to Accelerators
Introduction to AcceleratorsIntroduction to Accelerators
Introduction to Accelerators
 
20190909_PGconf.ASIA_KaiGai
20190909_PGconf.ASIA_KaiGai20190909_PGconf.ASIA_KaiGai
20190909_PGconf.ASIA_KaiGai
 
PGConf.ASIA 2019 Bali - Full-throttle Running on Terabytes Log-data - Kohei K...
PGConf.ASIA 2019 Bali - Full-throttle Running on Terabytes Log-data - Kohei K...PGConf.ASIA 2019 Bali - Full-throttle Running on Terabytes Log-data - Kohei K...
PGConf.ASIA 2019 Bali - Full-throttle Running on Terabytes Log-data - Kohei K...
 
Hardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and MLHardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and ML
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
RAPIDS Overview
RAPIDS OverviewRAPIDS Overview
RAPIDS Overview
 
組み込みから HPC まで ARM コアで実現するエコシステム
組み込みから HPC まで ARM コアで実現するエコシステム組み込みから HPC まで ARM コアで実現するエコシステム
組み込みから HPC まで ARM コアで実現するエコシステム
 
Steen_Dissertation_March5
Steen_Dissertation_March5Steen_Dissertation_March5
Steen_Dissertation_March5
 
Red Hat Storage Day Atlanta - Designing Ceph Clusters Using Intel-Based Hardw...
Red Hat Storage Day Atlanta - Designing Ceph Clusters Using Intel-Based Hardw...Red Hat Storage Day Atlanta - Designing Ceph Clusters Using Intel-Based Hardw...
Red Hat Storage Day Atlanta - Designing Ceph Clusters Using Intel-Based Hardw...
 
S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdf
S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdfS51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdf
S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdf
 
BladeCenter GPU Expansion Blade (BGE) - Client Presentation
BladeCenter GPU Expansion Blade (BGE) - Client PresentationBladeCenter GPU Expansion Blade (BGE) - Client Presentation
BladeCenter GPU Expansion Blade (BGE) - Client Presentation
 
Advancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and AlluxioAdvancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
 
20181016_pgconfeu_ssd2gpu_multi
20181016_pgconfeu_ssd2gpu_multi20181016_pgconfeu_ssd2gpu_multi
20181016_pgconfeu_ssd2gpu_multi
 
NWU and HPC
NWU and HPCNWU and HPC
NWU and HPC
 

More from Kohei KaiGai

20221116_DBTS_PGStrom_History
20221116_DBTS_PGStrom_History20221116_DBTS_PGStrom_History
20221116_DBTS_PGStrom_HistoryKohei KaiGai
 
20221111_JPUG_CustomScan_API
20221111_JPUG_CustomScan_API20221111_JPUG_CustomScan_API
20221111_JPUG_CustomScan_APIKohei KaiGai
 
20211112_jpugcon_gpu_and_arrow
20211112_jpugcon_gpu_and_arrow20211112_jpugcon_gpu_and_arrow
20211112_jpugcon_gpu_and_arrowKohei KaiGai
 
20210928_pgunconf_hll_count
20210928_pgunconf_hll_count20210928_pgunconf_hll_count
20210928_pgunconf_hll_countKohei KaiGai
 
20210731_OSC_Kyoto_PGStrom3.0
20210731_OSC_Kyoto_PGStrom3.020210731_OSC_Kyoto_PGStrom3.0
20210731_OSC_Kyoto_PGStrom3.0Kohei KaiGai
 
20210511_PGStrom_GpuCache
20210511_PGStrom_GpuCache20210511_PGStrom_GpuCache
20210511_PGStrom_GpuCacheKohei KaiGai
 
20201113_PGconf_Japan_GPU_PostGIS
20201113_PGconf_Japan_GPU_PostGIS20201113_PGconf_Japan_GPU_PostGIS
20201113_PGconf_Japan_GPU_PostGISKohei KaiGai
 
20200828_OSCKyoto_Online
20200828_OSCKyoto_Online20200828_OSCKyoto_Online
20200828_OSCKyoto_OnlineKohei KaiGai
 
20200806_PGStrom_PostGIS_GstoreFdw
20200806_PGStrom_PostGIS_GstoreFdw20200806_PGStrom_PostGIS_GstoreFdw
20200806_PGStrom_PostGIS_GstoreFdwKohei KaiGai
 
20200424_Writable_Arrow_Fdw
20200424_Writable_Arrow_Fdw20200424_Writable_Arrow_Fdw
20200424_Writable_Arrow_FdwKohei KaiGai
 
20191211_Apache_Arrow_Meetup_Tokyo
20191211_Apache_Arrow_Meetup_Tokyo20191211_Apache_Arrow_Meetup_Tokyo
20191211_Apache_Arrow_Meetup_TokyoKohei KaiGai
 
20191115-PGconf.Japan
20191115-PGconf.Japan20191115-PGconf.Japan
20191115-PGconf.JapanKohei KaiGai
 
20190926_Try_RHEL8_NVMEoF_Beta
20190926_Try_RHEL8_NVMEoF_Beta20190926_Try_RHEL8_NVMEoF_Beta
20190926_Try_RHEL8_NVMEoF_BetaKohei KaiGai
 
20190925_DBTS_PGStrom
20190925_DBTS_PGStrom20190925_DBTS_PGStrom
20190925_DBTS_PGStromKohei KaiGai
 
20190516_DLC10_PGStrom
20190516_DLC10_PGStrom20190516_DLC10_PGStrom
20190516_DLC10_PGStromKohei KaiGai
 
20190418_PGStrom_on_ArrowFdw
20190418_PGStrom_on_ArrowFdw20190418_PGStrom_on_ArrowFdw
20190418_PGStrom_on_ArrowFdwKohei KaiGai
 
20190314 PGStrom Arrow_Fdw
20190314 PGStrom Arrow_Fdw20190314 PGStrom Arrow_Fdw
20190314 PGStrom Arrow_FdwKohei KaiGai
 
20181212 - PGconf.ASIA - LT
20181212 - PGconf.ASIA - LT20181212 - PGconf.ASIA - LT
20181212 - PGconf.ASIA - LTKohei KaiGai
 
20181211 - PGconf.ASIA - NVMESSD&GPU for BigData
20181211 - PGconf.ASIA - NVMESSD&GPU for BigData20181211 - PGconf.ASIA - NVMESSD&GPU for BigData
20181211 - PGconf.ASIA - NVMESSD&GPU for BigDataKohei KaiGai
 
20180920_DBTS_PGStrom_JP
20180920_DBTS_PGStrom_JP20180920_DBTS_PGStrom_JP
20180920_DBTS_PGStrom_JPKohei KaiGai
 

More from Kohei KaiGai (20)

20221116_DBTS_PGStrom_History
20221116_DBTS_PGStrom_History20221116_DBTS_PGStrom_History
20221116_DBTS_PGStrom_History
 
20221111_JPUG_CustomScan_API
20221111_JPUG_CustomScan_API20221111_JPUG_CustomScan_API
20221111_JPUG_CustomScan_API
 
20211112_jpugcon_gpu_and_arrow
20211112_jpugcon_gpu_and_arrow20211112_jpugcon_gpu_and_arrow
20211112_jpugcon_gpu_and_arrow
 
20210928_pgunconf_hll_count
20210928_pgunconf_hll_count20210928_pgunconf_hll_count
20210928_pgunconf_hll_count
 
20210731_OSC_Kyoto_PGStrom3.0
20210731_OSC_Kyoto_PGStrom3.020210731_OSC_Kyoto_PGStrom3.0
20210731_OSC_Kyoto_PGStrom3.0
 
20210511_PGStrom_GpuCache
20210511_PGStrom_GpuCache20210511_PGStrom_GpuCache
20210511_PGStrom_GpuCache
 
20201113_PGconf_Japan_GPU_PostGIS
20201113_PGconf_Japan_GPU_PostGIS20201113_PGconf_Japan_GPU_PostGIS
20201113_PGconf_Japan_GPU_PostGIS
 
20200828_OSCKyoto_Online
20200828_OSCKyoto_Online20200828_OSCKyoto_Online
20200828_OSCKyoto_Online
 
20200806_PGStrom_PostGIS_GstoreFdw
20200806_PGStrom_PostGIS_GstoreFdw20200806_PGStrom_PostGIS_GstoreFdw
20200806_PGStrom_PostGIS_GstoreFdw
 
20200424_Writable_Arrow_Fdw
20200424_Writable_Arrow_Fdw20200424_Writable_Arrow_Fdw
20200424_Writable_Arrow_Fdw
 
20191211_Apache_Arrow_Meetup_Tokyo
20191211_Apache_Arrow_Meetup_Tokyo20191211_Apache_Arrow_Meetup_Tokyo
20191211_Apache_Arrow_Meetup_Tokyo
 
20191115-PGconf.Japan
20191115-PGconf.Japan20191115-PGconf.Japan
20191115-PGconf.Japan
 
20190926_Try_RHEL8_NVMEoF_Beta
20190926_Try_RHEL8_NVMEoF_Beta20190926_Try_RHEL8_NVMEoF_Beta
20190926_Try_RHEL8_NVMEoF_Beta
 
20190925_DBTS_PGStrom
20190925_DBTS_PGStrom20190925_DBTS_PGStrom
20190925_DBTS_PGStrom
 
20190516_DLC10_PGStrom
20190516_DLC10_PGStrom20190516_DLC10_PGStrom
20190516_DLC10_PGStrom
 
20190418_PGStrom_on_ArrowFdw
20190418_PGStrom_on_ArrowFdw20190418_PGStrom_on_ArrowFdw
20190418_PGStrom_on_ArrowFdw
 
20190314 PGStrom Arrow_Fdw
20190314 PGStrom Arrow_Fdw20190314 PGStrom Arrow_Fdw
20190314 PGStrom Arrow_Fdw
 
20181212 - PGconf.ASIA - LT
20181212 - PGconf.ASIA - LT20181212 - PGconf.ASIA - LT
20181212 - PGconf.ASIA - LT
 
20181211 - PGconf.ASIA - NVMESSD&GPU for BigData
20181211 - PGconf.ASIA - NVMESSD&GPU for BigData20181211 - PGconf.ASIA - NVMESSD&GPU for BigData
20181211 - PGconf.ASIA - NVMESSD&GPU for BigData
 
20180920_DBTS_PGStrom_JP
20180920_DBTS_PGStrom_JP20180920_DBTS_PGStrom_JP
20180920_DBTS_PGStrom_JP
 

Recently uploaded

Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 

Recently uploaded (20)

Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 

20170602_OSSummit_an_intelligent_storage

  • 1. An Intelligent Storage ~SQL Execution on GPU Closely Connected with SSD~ PG-Strom Development Team KaiGai Kohei <kaigai@kaigai.gr.jp>
  • 2. The PG-Strom Project Who am I? ▌KaiGai Kohei  PG-Strom Development Team ▌Background  Linux kernel (SELinux, …)  PostgreSQL database  Enhancement of security features  Writable FDW, Remote Join  Custom Scan Interface  GPU and CUDA ▌PG-Strom  SQL Acceleration Technology using GPU  Solution for data analytics in-database 2 Open Source Summit Japan 2017 - An Intelligent Storage
  • 3. The PG-Strom Project PG-Strom PG-Strom (1/3) – GPU as accelerator of SQL 3 GPU CPU Model NVIDIA Tesla P40 Intel Xeon E5-2699v4 Code name Pascal Broadwell Release Q3-2016 Q1-2016 # of transistors 12billion 7.2billion # of cores 3840 (simple) 22 (functional) Core clocks 1.306GHz ~1.53GHz 2.20GHz ~3.60GHz Performance (FP32) 12 TFLOPS 1.2 TFLOPS (with AVX2) RAM Size 24GB (GDDR5) max1.5TB (DDR4) Memory Band 347GB/s 76.8GB/s Power consumption 250W 145W Manufacturing process 16nm 14nm Application GPU Off-load WHERE, JOIN, GROUP BY Auto generation of GPU binary from SQL PG-Strom: Extension module of PostgreSQL, to accelerate SQL workloads using multi-thousands cores and high bandwidth memory. Same SQL Statement Same Data Structure Open Source Summit Japan 2017 - An Intelligent Storage
  • 4. The PG-Strom Project PG-Strom (2/3) – Transparent GPU code generation on demand Open Source Summit Japan 2017 - An Intelligent Storage4 QUERY: SELECT cat, count(*), avg(x) FROM t0 WHERE x between y and y + 20.0 GROUP BY cat; : STATIC_FUNCTION(bool) gpupreagg_qual_eval(kern_context *kcxt, kern_data_store *kds, size_t kds_index) { pg_float8_t KPARAM_1 = pg_float8_param(kcxt,1); pg_float8_t KVAR_3 = pg_float8_vref(kds,kcxt,2,kds_index); pg_float8_t KVAR_4 = pg_float8_vref(kds,kcxt,3,kds_index); return EVAL((pgfn_float8ge(kcxt, KVAR_3, KVAR_4) && pgfn_float8le(kcxt, KVAR_3, pgfn_float8pl(kcxt, KVAR_4, KPARAM_1)))); } : E.g) Generate CUDA C code from calculation formula in WHERE-clause Reference to input data SQL expression in CUDA source code Run-time Compiler (nvrtc) Just-in-time Compile Parallel Execution
  • 5. The PG-Strom Project PG-Strom (3/3) – An example of SQL acceleration 5 ▌Test Query: SELECT cat, count(*), avg(x) FROM t0 NATURAL JOIN t1 [NATURAL JOIN t2 ...] GROUP BY cat;  t0 contains 100M rows, t1...t8 contains 100K rows (like a start schema) Open Source Summit Japan 2017 - An Intelligent Storage 8.48 13.23 18.28 23.42 28.88 34.50 40.77 47.16 5.00 5.46 5.91 6.45 7.17 8.07 9.22 10.21 0.0 5.0 10.0 15.0 20.0 25.0 30.0 35.0 40.0 45.0 50.0 2 3 4 5 6 7 8 9 QueryResponseTime[sec] Number of tables joined PG-Strom microbenchmark with JOIN/GROUP BY PostgreSQL v9.6 PG-Strom 2.0devel CPU: Xeon E5-2640v4 GPU: Tesla P40 RAM: 128GB OS: CentOS 7.3 DB: PostgreSQL 9.6.2 + PG-Strom 2.0devel
  • 6. The PG-Strom Project In-database Analytics ▌Computing close to data  Cost to move data  Cost to transform / validate data  Waste of time for DBAs ▌Utilization of SQL’s flexibility  Pre-/Post-process of analytic algorithms  Conjunction with other data (JOIN)  Summarization of results (GROUP BY/Window Functions) Open Source Summit Japan 2017 - An Intelligent Storage6
  • 7. The PG-Strom Project What we want to develop (1/2) Open Source Summit Japan 2017 - An Intelligent Storage7 Data Collection Summarize Pre-process Data Generation Analytics & ML Visualization Data Size: Large  Small Amount of computing: Small  Large SSD-to-GPU Direct SQL Exec Transparent GPU code generation & Asynchronous Exec Statistical Analytics & Machine Learning Libraries Advanced algorithms as a part of SQL statement RowColum Hardware Transform BI Tools Execution of entire lifecycle for data analytics in-database DNN k-means SVM Random Forest PL/CUDA NVMe-SSD GPU
  • 8. The PG-Strom Project What we want to develop (2/2) – elemental technology Open Source Summit Japan 2017 - An Intelligent Storage8 Data Collection Summarize Pre-process Data Generation Analytics & ML Visualization Data Size: Large  Small Amount of computing: Small  Large SSD-to-GPU Direct SQL Exec Transparent GPU code generation & Asynchronous Exec Statistical Analytics & Machine Learning Libraries Advanced algorithms as a part of SQL statement RowColum Hardware Transform BI Tools Execution of entire lifecycle for data analytics in-database DNN k-means SVM Random Forest PL/CUDA NVMe-SSD GPU SSD-to-GPU Direct SQL Execution (for I/O) PG-Strom (for SQL) PL/CUDA and Library (for computing)
  • 9. The PG-Strom Project I/O Acceleration with GPU? Open Source Summit Japan 2017 - An Intelligent Storage9
  • 10. The PG-Strom Project Re-definition of GPU’s role I/O workloads Computing workloads Open Source Summit Japan 2017 - An Intelligent Storage10
  • 11. The PG-Strom Project Brief Architecture of GPU GPU Device Memory (GDDR5 or HBM2) Host RAM Data Transfer with DMA over the PCIe Bus CPU PCIe Bus Also data transfer via DMA from: particular blocks to: destination physical address Storage GPU Device Open Source Summit Japan 2017 - An Intelligent Storage11
  • 12. The PG-Strom Project Technology Basis (1/2) – GPUDirect RDMA ▌Feature for peer-to-peer DMA between GPU and PCIe devices ▌Originally, designed for MPI over Infiniband ▌But available for any kind of PCIe devices, if custom kernel module Copyright (c) NVIDIA corporation, 2015 Open Source Summit Japan 2017 - An Intelligent Storage12
  • 13. The PG-Strom Project Technology Basis (2/2) – GPUDirect RDMA Physical Address Space PCIe BAR1 AreaGPU RAM Host RAM NVMe-SSD Infiniband HBA PCIe devices GPUDirect RDMA A feature to map GPU’s device memory on physical address space of the host system Once GPU device memory gets physical address of the host system, driver can use the address as source/destination of DMA operations. 13 Open Source Summit Japan 2017 - An Intelligent Storage
  • 14. The PG-Strom Project Idea of SSD-to-GPU Direct SQL Execution Open Source Summit Japan 2017 - An Intelligent Storage14 Large PostgreSQL Tables PCIe Bus NVMe SSD GPUSSD-to-GPU P2P DMA (NVMe-Strom driver) WHERE JOIN GROUP BYPostgreSQL Data Block Reduction of unnecessary rows according to the SQL  Smaller amount of I/O Existing Data Flow (Large I/O Size) SSD-to-GPU Direct SQL Execution – Load PostgreSQL’s data block to GPU directly, then reduce unnecessary rows with multi-thousands cores of GPUs.
  • 15. The PG-Strom Project As destination of SSD to GPU DMA commands!! Linux kernel APIs of GPUDirect RDMA int nvidia_p2p_get_pages(uint64_t p2p_token, /* deprecated */ uint32_t va_space, /* deprecated */ uint64_t virtual_address, uint64_t length, struct nvidia_p2p_page_table **page_table, void (*free_callback)(void *private_data), void *private_data); struct nvidia_p2p_page_table { uint32_t version; uint32_t page_size; struct nvidia_p2p_page **pages; uint32_t entries; uint8_t *gpu_uuid; } nvidia_p2p_page_table_t; Array of I/O mapped physical memory address to the GPU device memory CUresult cuMemAlloc(CUdeviceptr *daddr, size_t bytesize); Open Source Summit Japan 2017 - An Intelligent Storage15
  • 16. The PG-Strom Project NVMe-Strom Software Stack Open Source Summit Japan 2017 - An Intelligent Storage16 pg-strom NVMe-Strom VFS Page Cache NVMe SSD Driver nvidia driver GPU device memory GPU device memory PostgreSQL file offset DMA request block number SSD-to-GPU Peer-to-Peer DMA cuMemAlloc() /proc/nvme-strom ioctl(2) read(2) User Space Kernel Space
  • 17. The PG-Strom Project Benchmark (1/2) – Star Schema Benchmark Open Source Summit Japan 2017 - An Intelligent Storage17 0 500 1000 1500 2000 2500 3000 3500 4000 4500 Q1-1 Q1-2 Q1-3 Q2-1 Q2-2 Q2-3 Q3-1 Q3-2 Q3-3 Q3-4 Q4-1 Q4-2 Q4-3 QueryExecutionThroughput[MB/s] PostgreSQL SSDx1 PostgreSQL SSDx2 PG-Strom SSDx1 PG-Strom SSDx2  SQL with WHERE-clause, JOIN, Aggregation/GROUP BY towards 353GB database.  Usually, entire workloads are dominated by I/O.  PostgreSQL achieves up to 1.6GB/s with regular filesystem based access.  PG-Strom with SSD-to-GPU direct SQL execution achieved up to 3.8GB/s at the peak performance. ※ (Query Execution Throughput) = (Database Size; 353GB) / (Entire Query Response Time [sec]) Theoretical limitation by 2x SSD [4.4GB/s] Under Investigation Theoretical limitation by 1x SSD [4.4GB/s]
  • 18. The PG-Strom Project Benchmark (2/2) – Star Schema Benchmark Open Source Summit Japan 2017 - An Intelligent Storage18  Example of Query SELECT sum(lo_revenue), d_year, p_brand1 FROM lineorder, date1, part, supplier WHERE lo_orderdate = d_datekey AND lo_partkey = p_partkey AND lo_suppkey = s_suppkey AND p_category = 'MFGR#12‘ AND s_region = 'AMERICA‘ GROUP BY d_year, p_brand1 ORDER BY d_year, p_brand1; customer 12M Rows (1.6GB) date1 2500 Rows (400KB) part 1.8M Rows (206MB) supplier 4M Rows (528MB) lineorder 2.4B Rows (351GB) Typical queries for summary; which consists of WHERE-clause, JOIN and GROUP BY. Size of lineorder table is larger than physical RAM size, majority of the workload is dominated by I/O.
  • 19. The PG-Strom Project Hardware configuration on the benchmark 19 ▌Hardware/Software configuration  Model: DELL PowerEdge R730  CPU: Xeon E5-2670v3 (12C, 2.3GHz) x2  RAM: 128GB  HDD: SAS 300GB x8 (RAID5)  OS: CentOS7.2 (3.10.0-327.el7)  SW: CUDA 7.5.17 NVIDIA driver 352.99 PostgreSQL 9.6.1 PG-Strom v2.0devel ▌GPU Specifications  NVIDIA Tesla K80  2x 2496 CUDA cores (560MHz)  2x 12GB GDDR5 RAM (240GB/s) ▌SSD Specifications  Intel SSD 750 (400GB) x2  Interface: PCIe 3.0 x4 (NVMe 1.2)  Max SeqRead: 2.2GB/s Open Source Summit Japan 2017 - An Intelligent Storage CPU2CPU1 SSD x2 Tesla K80 GPU Tesla K20 GPU (unused)
  • 20. The PG-Strom Project (OT) Consistency with disk cache 20  Problem ① Storage block may not be up-to-date, if target blocks are kept on page caches. ② Short size DMA is not efficient.  Solution – Write back cached PostgreSQL blocks (= 8KB/block) to userspace buffer once, then kicks bulk transfer from RAM to GPU by CUDA APIs.  Almost equivalent cost with read(2) + cuMemcpyHtoDAsync()  Although PostgreSQL blocks are reordered on GPU memory, it generates correct results. Open Source Summit Japan 2017 - An Intelligent Storage BLK-100: uncached BLK-101: cached BLK-102: uncached BLK-103: uncached BLK-104: cached BLK-105: cached BLK-106: uncached BLK-107: uncached BLK-108: cached BLK-109: uncached BLCKSZ (=8KB) Transfer Size Per Request BLCKSZ * NChunks BLK-108: cached BLK-105: cached BLK-104: cached BLK-101: cached BLK-100: uncached BLK-102: uncached BLK-103: uncached BLK-106: uncached BLK-107: uncached BLK-109: uncached BLK-108: cached BLK-105: cached BLK-104: cached BLK-101: cached unused SSD-to-GPU P2P DMA File Userspace DMA Buffer (RAM) Device Memory (GPU) CUDA API (userspace) cuMemcpyHtoDAsync
  • 21. The PG-Strom Project Not only Scan, but GpuJoin also... Open Source Summit Japan 2017 - An Intelligent Storage21 Inner relation Outer relation Inner relation Outer relation Hash table Hash table Next Step Next Step Just reference to the results on CPU side Hash Table Search by CPU Generation of results by CPU Generation of results by GPU Parallel Hash Table Search by GPU HashJoin by CPU GpuHashJoin Here we have SSD-to-GPU P2P DMA
  • 22. The PG-Strom Project GpuPreAgg Not only Scan, but GpuPreAgg also... Open Source Summit Japan 2017 - An Intelligent Storage22 Aggregation by CPU Aggregation with GPU Relation Relation Aggregation Aggregation 1st Step: Parallel Hash Reduction 2nd Step: Atomic Merge Read of 1M rows Process 1M rows by CPU Smaller # of rows GpuPreAgg: Reduction of rows to be processed by CPU Interim results Here we have SSD-to-GPU P2P DMA.
  • 23. The PG-Strom Project The Future Open Source Summit Japan 2017 - An Intelligent Storage23
  • 24. The PG-Strom Project Single Node Solution 2x Enterprise Grade SSD Based on PCIe x8 NVIDIA Tesla P40 CPU RAM DMI SSD-1 SSD-2 GPU max 10GB/s capability of query processing / node PCI-E Hardware Block Diagram Open Source Summit Japan 2017 - An Intelligent Storage24
  • 25. The PG-Strom Project Multi Node Solution (?) NVIDIA Tesla P40 Joint solution with PostgreSQL’s scale-out technologies 2x Enterprise Grade SSD Based on PCIe x8 Open Source Summit Japan 2017 - An Intelligent Storage25
  • 26. The PG-Strom Project Scale Up Solution (?) 5x NVIDIA Tesla P40 DMI GPU-1 PCI-SW PCI-SW QPIx2 GPU-2 GPU-4 GPU-5 SSD-1 SSD-2 SSD-3 SSD-4 SSD-5 GPU-3 RAM RAM CPU 1 CPU 2 PCIe x16 x2 slot PCIe 96lane PCIe 96lane Deployment on GPU monster 5x Enterprise Grade SSD Based on PCIe x16 Open Source Summit Japan 2017 - An Intelligent Storage26
  • 27. The PG-Strom Project Towards columnar structure (1/3) – On-memory Columnar Cache 27 ▌On-memory Columnar Cache  Caches PostgreSQL’s data-blocks (row-data) on memory as column-data  Reduction of data transfer between RAM<->GPU  Optimization of memory bandwidth of GPU  Switch GPU kernel to be invoked according to existence of column-cache Open Source Summit Japan 2017 - An Intelligent Storage BlockNum = 101 all_visible=false BlockNum = 102 all_visible=true BlockNum = 103 all_visible=true BlockNum = 104 all_visible=true Columnar Cache (BlockNum=102) Columnar Cache (BlockNum=102) background worker Construction UPDATE invalidation PG-Strom Row-data GPU Kernel for Row Column-data GPU Kernel for Column Row-data GPU Kernel for Row Column-data GPU Kernel for Column
  • 28. The PG-Strom Project Towards columnar structure (2/3) – Background 28 ▌Row data format – Random memory access  Less usage ratio of memory bus, larger number of memory transactions Open Source Summit Japan 2017 - An Intelligent Storage ▌Column data format – Coalesced memory access  Optimal memory bus usage, least number of memory transactions 32bit Memory transaction width: 256bit 32bit 32bit32bit 32bit 32bit 32bit 32bit 32bit 32bit 32bit 32bit 32bit 32bit Memory transaction width: 256bit 32bit x8 = 256bit is valid in 256bit memory transaction. (usage ratio: 100.0%) Only 32bit x1 = 32bit is valid in 256bit memory transaction . (usage ratio: 12.5%) GPU Cores GPU Cores Why columnar structure is preferable for GPU architecture
  • 29. The PG-Strom Project World of OLTP World of OLAP Towards columnar structure (3/3) – FPGA engine on SSD Open Source Summit Japan 2017 - An Intelligent Storage29 FPGA Logic on SSD enables write by row, then read as column RowColumn Transformer column-format Read of analytic data (column-format) Write of transactional data (row-format) Data-format transformation by FPGA on SSD SQL Execution – based on columnar format; best for GPU Pre-processed data Only required columns are fetched
  • 30. The PG-Strom Project Towards the “true” intelligent storage system Open Source Summit Japan 2017 - An Intelligent Storage30 Large PostgreSQL Tables PCIe Bus NVMe SSD GPUSSD-to-GPU P2P DMA (NVMe-Strom driver) WHERE JOIN GROUP BYPostgreSQL Data Block Reduction of unnecessary rows according to the SQL  Smaller amount of I/O Step-1: A pair of NVMe-SSD and GPU looks like an intelligent storage which understand SQL workloads for filtering-out unnecessary rows and pre-process of JOIN/GROUP BY prior to data loading onto the main memory. Step-2: A special NVMe-SSD transforms data format for more suitable one to process, based on knowledge of the requester. (PostgreSQL’s block format in this case)