摘要:什么是簡而言之,就是,不存在就插入,存在就更新。當(dāng)然也有功能,和的類似。然而,命令不支持,這使一些增量的工作非常不方便。五把表清空上面過程確實很麻煩,如果使用的話,只需要簡單的相關(guān)鏈接
什么是 Upsert
"UPSERT" is a DBMS feature that allows a DML statement"s author to atomically either insert a row, or on the basis of the row already existing, UPDATE that existing row instead, while safely giving little to no further thought to concurrency. One of those two outcomes must be guaranteed, regardless of concurrent activity, which has been called "the essential property of UPSERT".
簡而言之,就是,不存在就插入,存在就更新。
單記錄 UpsertMySQL有INSERT...ON DUPLICATE KEY UPDATE語法,可以實現(xiàn)Upsert:
INSERT INTO customers (id, first_name, last_name, email) VALUES (30797, "hooopo1", "wang", "[email protected]") ON DUPLICATE KEY UPDATE first_name = VALUES(first_name), last_name = VALUES(last_name);
PostgreSQL 從 9.5 也有了INSERT ... ON CONFLICT UPDATE語法,效果和 MySQL 類似:
INSERT INTO customers (id, first_name, last_name, email) VALUES (30797, "hooopo1", "wang", "[email protected]") ON CONFLICT(id) DO UPDATE SET first_name = EXCLUDED.first_name, last_name = EXCLUDED.last_name;批量 Upsert
之前研究 MySQL 里如何插入最快 ,里面提到 LOAD INFILE 方式批量插入,并且 MySQL 的 bulk insert 是支持 REPLACE 語意的,即批量插入的同時還可以 upsert。
LOAD DATA LOCAL INFILE "/Users/hooopo/data/out/product_sales_facts.txt" REPLACE INTO TABLE product_sale_facts FIELDS TERMINATED BY "," (`id`,`date_id`,`order_id`,`product_id`,`address_id`,`unit_price`,`purchase_price`,`gross_profit`,`quantity`,`channel_id`,`gift`)
當(dāng)然 PostgreSQL 也有 Copy功能,和 MySQL 的 LOAD INFILE 類似。然而,copy 命令不支持 Upsert,這使一些增量 ETL 的工作非常不方便。
不過有一種利用 staging 表的方式實現(xiàn) bulk upsert,大致步驟如下:
一. 目標(biāo)表
二. 把增量數(shù)據(jù)批量插入中間表
CREATE TABLE IF NOT EXISTS staging LIKE customers INCLUDING DEFAULTS INCLUDING CONSTRAINTS INCLUDING INDEXES; COPY staging (id, email, first_name, last_name) FROM STDIN WITH DELIMITER "," NULL "N" CSV;
三. 把目標(biāo)表中與 staging 表沖突部分刪掉
DELETE FROM customers USING staging WHERE customers.id = staging.id
四. 把 staging 表批量插入到目標(biāo)表,因為沖突部分已經(jīng)刪掉,所以這步不會有任何沖突。
INSERT INTO customers (SELECT * FROM staging);
五. 把 staging 表清空
TRUNCATE TABLE staging;
上面過程確實很麻煩,如果使用 kiba-plus 的話,只需要簡單的 DSL:
destination Kiba::Plus::Destination::PgBulk2, { :connect_url => DEST_URL, :table_name => "customers", :truncate => false, :columns => [:id, :email, :first_name, :last_name], :incremental => true, :unique_by => :id }
相關(guān)鏈接:
https://wiki.postgresql.org/w...
http://www.silota.com/blog/am...
https://hashrocket.com/blog/p...
http://docs.aws.amazon.com/re...
http://stackoverflow.com/ques...
文章版權(quán)歸作者所有,未經(jīng)允許請勿轉(zhuǎn)載,若此文章存在違規(guī)行為,您可以聯(lián)系管理員刪除。
轉(zhuǎn)載請注明本文地址:http://systransis.cn/yun/39019.html
摘要:用例測試表繼續(xù)用上述的創(chuàng)建測試文件,內(nèi)容如下創(chuàng)建文件,內(nèi)容如下根據(jù)文件加載數(shù)據(jù)檢查數(shù)據(jù)是否轉(zhuǎn)換成功初試主要是針對大數(shù)據(jù)量的且一般是批量裝載的方式。 Trafodion 的數(shù)據(jù)加載主要包括兩種方法,即 Trickle Load(持續(xù)加載) 和 Bulk Load(批量加載)。下表介紹了兩種加載方法的區(qū)別: 類型 描述 方法/工具 Trickle Load 數(shù)據(jù)量較小,立即插入 ...
閱讀 2774·2021-11-17 09:33
閱讀 3109·2021-10-25 09:44
閱讀 1216·2021-10-11 10:59
閱讀 2410·2021-09-27 13:34
閱讀 2918·2021-09-07 10:19
閱讀 2146·2019-08-29 18:46
閱讀 1541·2019-08-29 12:55
閱讀 935·2019-08-23 17:11