COPY INTO <table>
This command loads data into Databend from files in a variety of locations.
See Also: COPY INTO location
Supported File Locations
Your data files must be located in one of these locations for COPY INTO to work:
- Named internal stage: Databend internal named stages. Files can be staged using the PUT to Stage API.
- Named external stage: Stages created in Supported Object Storage Solutions.
- External location:
- Buckets created in Supported Object Storage Solutions.
- Remote servers from which you can access the files by their URL (starting with "https://...").
- IPFS.
Syntax
COPY INTO [<database>.]<table_name>
FROM { internalStage | externalStage | externalLocation }
[ FILES = ( '<file_name>' [ , '<file_name>' ] [ , ... ] ) ]
[ PATTERN = '<regex_pattern>' ]
[ FILE_FORMAT = ( TYPE = { CSV | TSV | NDJSON | PARQUET | XML} [ formatTypeOptions ] ) ]
[ copyOptions ]
Where:
internalStage
internalStage ::= @<internal_stage_name>[/<path>]
externalStage
externalStage ::= @<external_stage_name>[/<path>]
externalLocation
AWS S3 Compatible Object Storage Service
externalLocation ::=
's3://<bucket>[<path>]'
CONNECTION = (
ENDPOINT_URL = 'https://<endpoint-URL>'
ACCESS_KEY_ID = '<your-access-key-ID>'
SECRET_ACCESS_KEY = '<your-secret-access-key>'
SESSION_TOKEN = '<your-session-token>'
REGION = '<region-name>'
ENABLE_VIRTUAL_HOST_STYLE = 'true|false'
)
Parameter | Description | Required |
---|---|---|
s3://<bucket>[<path>] | External files located at the AWS S3 compatible object storage. | Required |
ENDPOINT_URL | The bucket endpoint URL starting with "https://". To use a URL starting with "http://", set allow_insecure to true in the [storage] block of the file databend-query-node.toml. | Required |
ACCESS_KEY_ID | Your access key ID for connecting to the AWS S3 compatible object storage. If not provided, Databend will access the bucket anonymously. | Optional |
SECRET_ACCESS_KEY | Your secret access key for connecting to the AWS S3 compatible object storage. | Optional |
SESSION_TOKEN | Your temporary credential for connecting to the AWS S3 service. | Optional |
REGION | AWS region name. For example, us-east-1. | Optional |
ENABLE_VIRTUAL_HOST_STYLE | If you use virtual hosting to address the bucket, set it to "true". | Optional |
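For example, a sketch of a connection that authenticates with temporary credentials and addresses the bucket in virtual-host style; all values are placeholders, and the endpoint and region shown are the standard AWS ones, so adjust them for your S3 compatible service:
COPY INTO mytable
FROM 's3://mybucket/data.parquet'
CONNECTION = (
-- temporary credentials issued by your identity provider (placeholders)
ENDPOINT_URL = 'https://s3.amazonaws.com'
ACCESS_KEY_ID = '<your-access-key-ID>'
SECRET_ACCESS_KEY = '<your-secret-access-key>'
SESSION_TOKEN = '<your-session-token>'
REGION = 'us-east-1'
ENABLE_VIRTUAL_HOST_STYLE = 'true')
FILE_FORMAT = (type = 'PARQUET');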
Azure Blob storage
externalLocation ::=
'azblob://<container>[<path>]'
CONNECTION = (
ENDPOINT_URL = 'https://<endpoint-URL>'
ACCOUNT_NAME = '<your-account-name>'
ACCOUNT_KEY = '<your-account-key>'
)
Parameter | Description | Required |
---|---|---|
azblob://<container>[<path>] | External files located at the Azure Blob storage. | Required |
ENDPOINT_URL | The container endpoint URL starting with "https://". To use a URL starting with "http://", set allow_insecure to true in the [storage] block of the file databend-query-node.toml. | Required |
ACCOUNT_NAME | Your account name for connecting to the Azure Blob storage. If not provided, Databend will access the container anonymously. | Optional |
ACCOUNT_KEY | Your account key for connecting to the Azure Blob storage. | Optional |
Google Cloud Storage
externalLocation ::=
'gcs://<bucket>[<path>]'
CONNECTION = (
ENDPOINT_URL = 'https://<endpoint-URL>'
CREDENTIAL = '<your-credential>'
)
Parameter | Description | Required |
---|---|---|
gcs://<bucket>[<path>] | External files located at the Google Cloud Storage. | Required |
ENDPOINT_URL | The bucket endpoint URL starting with "https://". To use a URL starting with "http://", set allow_insecure to true in the [storage] block of the file databend-query-node.toml. | Optional |
CREDENTIAL | Your credential for connecting to GCS. If not provided, Databend will access the bucket anonymously. | Optional |
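For example, a minimal sketch of loading a Parquet file from a GCS bucket; the bucket name, path, and credential are placeholders, and the endpoint shown is the standard GCS endpoint:
COPY INTO mytable
FROM 'gcs://mybucket/data.parquet'
CONNECTION = (
ENDPOINT_URL = 'https://storage.googleapis.com'
CREDENTIAL = '<your-credential>')
FILE_FORMAT = (type = 'PARQUET');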
Huawei Object Storage
externalLocation ::=
'obs://<bucket>[<path>]'
CONNECTION = (
ENDPOINT_URL = 'https://<endpoint-URL>'
ACCESS_KEY_ID = '<your-access-key-id>'
SECRET_ACCESS_KEY = '<your-secret-access-key>'
)
Parameter | Description | Required |
---|---|---|
obs://<bucket>[<path>] | External files located at the Huawei Object Storage. | Required |
ENDPOINT_URL | The bucket endpoint URL starting with "https://". To use a URL starting with "http://", set allow_insecure to true in the [storage] block of the file databend-query-node.toml. | Required |
ACCESS_KEY_ID | Your access key ID for connecting to OBS. If not provided, Databend will access the bucket anonymously. | Optional |
SECRET_ACCESS_KEY | Your secret access key for connecting to OBS. | Optional |
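For example, a sketch of loading a CSV file from an OBS bucket; the bucket name, path, and endpoint are placeholders, so substitute the endpoint of your own OBS region:
COPY INTO mytable
FROM 'obs://mybucket/data.csv'
CONNECTION = (
ENDPOINT_URL = 'https://obs.<region>.myhuaweicloud.com'
ACCESS_KEY_ID = '<your-access-key-id>'
SECRET_ACCESS_KEY = '<your-secret-access-key>')
FILE_FORMAT = (type = 'CSV' field_delimiter = ',' record_delimiter = '\n' skip_header = 1);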
Remote Files
externalLocation ::=
'https://<url>'
You can use glob patterns to specify more than one file. For example:
- ontime_200{6,7,8}.csv represents ontime_2006.csv, ontime_2007.csv, and ontime_2008.csv.
- ontime_200[6-8].csv represents ontime_2006.csv, ontime_2007.csv, and ontime_2008.csv.
IPFS
externalLocation ::=
'ipfs://<your-ipfs-hash>'
CONNECTION = (ENDPOINT_URL = 'https://<your-ipfs-gateway>')
FILES = ( 'file_name' [ , 'file_name' ... ] )
Specifies a list of one or more file names (separated by commas) to be loaded.
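For example, a sketch that loads two explicitly named Parquet files from an internal stage; the stage and file names are placeholders:
COPY INTO mytable
FROM @my_internal_s1
FILES = ('books-part-1.parquet', 'books-part-2.parquet')
FILE_FORMAT = (type = 'PARQUET');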
PATTERN = 'regex_pattern'
A PCRE2-based regular expression pattern string, enclosed in single quotes, specifying the file names to match. See Loading Data with Pattern Matching below for an example. For PCRE2 syntax, see http://www.pcre.org/current/doc/html/pcre2syntax.html.
formatTypeOptions
formatTypeOptions ::=
RECORD_DELIMITER = '<character>'
FIELD_DELIMITER = '<character>'
SKIP_HEADER = <integer>
COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | XZ | NONE
RECORD_DELIMITER = '<character>'
Description: One character that separates records in an input file.
Default: '\n'
FIELD_DELIMITER = '<character>'
Description: One character that separates fields in an input file.
Default: ',' (comma)
SKIP_HEADER = <integer>
Description: Number of lines at the start of the file to skip.
Default: 0
COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | XZ | NONE
Description: String that represents the compression algorithm.
Default: NONE
Values:
Values | Notes |
---|---|
AUTO | Auto detect compression via file extensions |
GZIP | |
BZ2 | |
BROTLI | Must be specified if loading/unloading Brotli-compressed files. |
ZSTD | Zstandard v0.8 (and higher) is supported. |
DEFLATE | Deflate-compressed files (with zlib header, RFC1950). |
RAW_DEFLATE | Deflate-compressed files (without any header, RFC1951). |
XZ | |
NONE | Indicates that the files have not been compressed. |
copyOptions
copyOptions ::=
[ SIZE_LIMIT = <num> ]
[ PURGE = <bool> ]
[ FORCE = <bool> ]
Parameters | Description | Required |
---|---|---|
SIZE_LIMIT = <num> | Specifies the maximum number of rows to be loaded for a given COPY statement. Defaults to 0, meaning no limit. | Optional |
PURGE = <bool> | If True, the command will purge the files in the stage after they are loaded successfully into the table. Default: False. | Optional |
FORCE = <bool> | Defaults to False, meaning the command will skip duplicate files in the stage when copying data. If True, duplicate files will not be skipped. | Optional |
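For example, a sketch that combines the three options: it loads at most 100 rows, reloads files even if they were copied before, and removes the staged files after a successful load; the stage name is a placeholder:
COPY INTO mytable
FROM @my_internal_s1
FILE_FORMAT = (type = 'PARQUET')
-- copy options follow the file format options
SIZE_LIMIT = 100
FORCE = true
PURGE = true;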
Examples
Loading Data from an Internal Stage
COPY INTO mytable FROM @my_internal_s1 pattern = 'books.*parquet' file_format = (type = 'PARQUET');
Loading Data from an External Stage
COPY INTO mytable FROM @my_external_s1 pattern = 'books.*parquet' file_format = (type = 'PARQUET');
Loading Data from External Locations
AWS S3 compatible object storage services
This example reads 10 rows from a CSV file and inserts them into a table:
COPY INTO mytable
FROM 's3://mybucket/data.csv'
CONNECTION = (
ENDPOINT_URL = 'https://<endpoint-URL>'
ACCESS_KEY_ID = '<your-access-key-ID>'
SECRET_ACCESS_KEY = '<your-secret-access-key>')
FILE_FORMAT = (type = 'CSV' field_delimiter = ',' record_delimiter = '\n' skip_header = 1) size_limit=10;
This example reads 10 rows from a CSV file compressed as GZIP and inserts them into a table:
COPY INTO mytable
FROM 's3://mybucket/data.csv.gz'
CONNECTION = (
ENDPOINT_URL = 'https://<endpoint-URL>'
ACCESS_KEY_ID = '<your-access-key-ID>'
SECRET_ACCESS_KEY = '<your-secret-access-key>')
FILE_FORMAT = (type = 'CSV' field_delimiter = ',' record_delimiter = '\n' skip_header = 1 compression = 'GZIP') size_limit=10;
This example loads data from a CSV file without specifying the endpoint URL:
COPY INTO mytable
FROM 's3://mybucket/data.csv'
FILE_FORMAT = (type = 'CSV' field_delimiter = ',' record_delimiter = '\n' skip_header = 1) size_limit=10;
This example loads data from a Parquet file:
COPY INTO mytable
FROM 's3://mybucket/data.parquet'
CONNECTION = (
ACCESS_KEY_ID = '<your-access-key-ID>'
SECRET_ACCESS_KEY = '<your-secret-access-key>')
FILE_FORMAT = (type = 'PARQUET');
Azure Blob storage
This example reads data from a CSV file and inserts it into a table:
COPY INTO mytable
FROM 'azblob://mybucket/data.csv'
CONNECTION = (
ENDPOINT_URL = 'https://<account_name>.blob.core.windows.net'
ACCOUNT_NAME = '<account_name>'
ACCOUNT_KEY = '<account_key>'
)
FILE_FORMAT = (type = 'CSV');
Remote Files
This example reads data from three remote CSV files and inserts it into a table:
COPY INTO mytable
FROM 'https://repo.databend.rs/dataset/stateful/ontime_200{6,7,8}_200.csv'
FILE_FORMAT = (type = 'CSV');
IPFS
This example reads data from a CSV file on IPFS and inserts it into a table:
COPY INTO mytable
FROM 'ipfs://<your-ipfs-hash>' connection = (endpoint_url = 'https://<your-ipfs-gateway>')
FILE_FORMAT = (type = 'CSV' field_delimiter = ',' record_delimiter = '\n' skip_header = 1);
Loading Data with Pattern Matching
This example uses pattern matching to load only from CSV files containing sales in their names:
COPY INTO mytable
FROM 's3://mybucket/'
PATTERN = '.*sales.*[.]csv'
FILE_FORMAT = (type = 'CSV' field_delimiter = ',' record_delimiter = '\n' skip_header = 1);
Where .* is interpreted as zero or more occurrences of any character. The square brackets escape the period character (.) that precedes a file extension.
If you want to load from all the CSV files, use PATTERN = '.*[.]csv':
COPY INTO mytable
FROM 's3://mybucket/'
PATTERN = '.*[.]csv'
FILE_FORMAT = (type = 'CSV' field_delimiter = ',' record_delimiter = '\n' skip_header = 1);
Tutorials
Here are some tutorials to help you get started with COPY INTO:
- Tutorial: Load from an internal stage: In this tutorial, you will create an internal stage, stage a sample file, and then load data from the file into Databend with the COPY INTO command.
- Tutorial: Load from an Amazon S3 bucket: In this tutorial, you will upload a sample file to your Amazon S3 bucket, and then load data from the file into Databend with the COPY INTO command.
- Tutorial: Load from a remote file: In this tutorial, you will load data from a remote sample file into Databend with the COPY INTO command.