Supported Types

PyMongoArrow currently supports a small subset of all BSON types. Support for additional types will be added in subsequent releases.

Note

PyMongoArrow does not yet fully support extension types with pandas, NumPy, or Arrow, although they can be used in schemas. As a result, ObjectId and Decimal128 are not fully supported in pandas DataFrames or Arrow Tables; the schema type is instead converted to a string or object representation of the type. For more information, see Extension Types.

Note

For more information about BSON types, see the BSON specification.

| BSON Type | Type Identifiers |
|---|---|
| String | py.str, an instance of pyarrow.string |
| Embedded document | py.dict, an instance of pyarrow.struct |
| Embedded array | py.list, an instance of pyarrow.list_ |
| ObjectId | py.bytes, bson.ObjectId, an instance of pymongoarrow.types.ObjectIdType, an instance of pyarrow.FixedSizeBinaryScalar |
| Boolean | an instance of pyarrow.bool_, bool |
| 64-bit binary floating point | py.float, an instance of pyarrow.float64() |
| 32-bit integer | an instance of pyarrow.int32() |
| 64-bit integer | int, bson.int64.Int64, an instance of pyarrow.int64() |
| UTC datetime | an instance of pyarrow.timestamp with ms resolution, py.datetime.datetime |

Type identifiers can be used to specify that a field is of a certain type during pymongoarrow.api.Schema declaration. For example, if your data has fields ‘f1’ and ‘f2’ bearing types 32-bit integer and UTC datetime respectively, and ‘_id’ that is an ObjectId, your schema can be defined as:

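from bson import ObjectId
import pyarrow
from pymongoarrow.api import Schema

# '_id' is an ObjectId, 'f1' a 32-bit integer, 'f2' a UTC datetime (ms resolution)
schema = Schema({
  '_id': ObjectId,
  'f1': pyarrow.int32(),
  'f2': pyarrow.timestamp('ms')
})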

Unsupported data types in a schema cause a ValueError identifying the field and its data type.