api – PyMongoArrow APIs

class pymongoarrow.api.Schema(schema)

A mapping of field names to data types.

To create a schema, pass the constructor a mapping of field names to their expected types, e.g.:

schema1 = Schema({'field_1': int, 'field_2': float})

Each key in schema is a field name and its corresponding value is the expected type of the data contained in the named field.

Data types can be specified as pyarrow type instances (e.g. pyarrow.int64()), bson types (e.g. bson.Int64), or Python type-identifiers (e.g. int, float). For a complete list of supported data types and their corresponding type-identifiers, see Supported Types.

pymongoarrow.api.aggregate_arrow_all(collection, pipeline, *, schema, **kwargs)

Method that returns the results of an aggregation pipeline as a pyarrow.Table instance.

Parameters
  • collection: Instance of Collection against which to run the aggregate operation.

  • pipeline: A list of aggregation pipeline stages.

  • schema: Instance of Schema.

Any additional keyword arguments are passed directly to the underlying aggregate operation.

Returns

An instance of pyarrow.Table.

pymongoarrow.api.aggregate_numpy_all(collection, pipeline, *, schema, **kwargs)

Method that returns the results of an aggregation pipeline as a dict instance whose keys are field names and values are ndarray instances bearing the appropriate dtype.

Parameters
  • collection: Instance of Collection against which to run the aggregate operation.

  • pipeline: A list of aggregation pipeline stages.

  • schema: Instance of Schema.

Any additional keyword arguments are passed directly to the underlying aggregate operation.

This method attempts to create each NumPy array as a view on the Arrow data corresponding to each field in the result set. When this is not possible, the underlying data is copied into a new NumPy array. See pyarrow.Array.to_numpy() for more information.

NumPy arrays returned by this method that are views on Arrow data are not writable. Users seeking to modify such arrays must first create an editable copy using numpy.copy().

Returns

An instance of dict.

pymongoarrow.api.aggregate_pandas_all(collection, pipeline, *, schema, **kwargs)

Method that returns the results of an aggregation pipeline as a pandas.DataFrame instance.

Parameters
  • collection: Instance of Collection against which to run the aggregate operation.

  • pipeline: A list of aggregation pipeline stages.

  • schema: Instance of Schema.

Any additional keyword arguments are passed directly to the underlying aggregate operation.

Returns

An instance of pandas.DataFrame.

pymongoarrow.api.find_arrow_all(collection, query, *, schema, **kwargs)

Method that returns the results of a find query as a pyarrow.Table instance.

Parameters
  • collection: Instance of Collection against which to run the find operation.

  • query: A mapping containing the query to use for the find operation.

  • schema: Instance of Schema.

Any additional keyword arguments are passed directly to the underlying find operation.

Returns

An instance of pyarrow.Table.

pymongoarrow.api.find_numpy_all(collection, query, *, schema, **kwargs)

Method that returns the results of a find query as a dict instance whose keys are field names and values are ndarray instances bearing the appropriate dtype.

Parameters
  • collection: Instance of Collection against which to run the find operation.

  • query: A mapping containing the query to use for the find operation.

  • schema: Instance of Schema.

Any additional keyword arguments are passed directly to the underlying find operation.

This method attempts to create each NumPy array as a view on the Arrow data corresponding to each field in the result set. When this is not possible, the underlying data is copied into a new NumPy array. See pyarrow.Array.to_numpy() for more information.

NumPy arrays returned by this method that are views on Arrow data are not writable. Users seeking to modify such arrays must first create an editable copy using numpy.copy().

Returns

An instance of dict.

pymongoarrow.api.find_pandas_all(collection, query, *, schema, **kwargs)

Method that returns the results of a find query as a pandas.DataFrame instance.

Parameters
  • collection: Instance of Collection against which to run the find operation.

  • query: A mapping containing the query to use for the find operation.

  • schema: Instance of Schema.

Any additional keyword arguments are passed directly to the underlying find operation.

Returns

An instance of pandas.DataFrame.

pymongoarrow.api.write(collection, tabular)

Write data from tabular into the given MongoDB collection.

Parameters
  • collection: Instance of Collection against which to run the operation.

  • tabular: A tabular data store to use for the write operation.

Returns

An instance of pymongoarrow.result.ArrowWriteResult.