api – PyMongoArrow APIs

class pymongoarrow.api.Schema(schema)

Create a Schema instance from a mapping or an iterable.

Parameters:
  • schema: A mapping.

classmethod from_arrow(aschema)

Create a Schema instance from a pyarrow.Schema.

Parameters:
  • aschema: A pyarrow.Schema instance.

to_arrow()

Output the Schema as an instance of pyarrow.Schema.
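As a rough illustration of the three entry points above, the sketch below builds a Schema from a mapping, converts it to a pyarrow.Schema, and converts it back. The field names (`_id`, `amount`, `last_updated`) and both helper functions are hypothetical; the sketch assumes pymongoarrow and pyarrow are installed.

```python
from datetime import datetime


def example_fields():
    """Hypothetical field-name -> type mapping for an orders collection."""
    return {"_id": int, "amount": float, "last_updated": datetime}


def roundtrip_schema(fields):
    """Build a Schema from a mapping and round-trip it through PyArrow."""
    # Imported lazily so the helpers above stay importable even where
    # pymongoarrow is not installed.
    from pymongoarrow.api import Schema

    schema = Schema(fields)                 # Schema from a mapping
    arrow_schema = schema.to_arrow()        # pyarrow.Schema
    return Schema.from_arrow(arrow_schema)  # back to a PyMongoArrow Schema
```

Calling `roundtrip_schema(example_fields())` should return a Schema describing the same three fields.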

pymongoarrow.api.aggregate_arrow_all(collection, pipeline, *, schema=None, allow_invalid=False, **kwargs)

Method that returns the results of an aggregation pipeline as a pyarrow.Table instance.

Parameters:
  • collection: Instance of Collection against which to run the aggregate operation.

  • pipeline: A list of aggregation pipeline stages.

  • schema (optional): Instance of Schema. If the schema is not given, it will be inferred using the data in the result set.

  • allow_invalid (optional): If set to True, results will have all fields that do not conform to the schema silently converted to NaN.

Additional keyword-arguments passed to this method will be passed directly to the underlying aggregate operation.

Returns:

An instance of pyarrow.Table.
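A possible call pattern for this function is sketched below. The pipeline, the field names (`amount`, `region`), and the helper names are assumptions made for illustration; `collection` is expected to be a PyMongo Collection supplied by the caller.

```python
def sales_pipeline(min_amount):
    """Hypothetical pipeline: filter on amount, then keep two fields."""
    return [
        {"$match": {"amount": {"$gte": min_amount}}},
        {"$project": {"_id": 0, "amount": 1, "region": 1}},
    ]


def sales_table(collection, min_amount=0):
    """Run the pipeline and return the results as a pyarrow.Table."""
    # Imported lazily so sales_pipeline() can be used without pymongoarrow.
    from pymongoarrow.api import Schema, aggregate_arrow_all

    # An explicit schema avoids inferring types from the result set.
    schema = Schema({"amount": float, "region": str})
    return aggregate_arrow_all(collection, sales_pipeline(min_amount), schema=schema)
```

Omitting `schema` would leave the types to be inferred from the data in the result set, as described above.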

pymongoarrow.api.aggregate_numpy_all(collection, pipeline, *, schema=None, allow_invalid=False, **kwargs)

Method that returns the results of an aggregation pipeline as a dict instance whose keys are field names and values are ndarray instances bearing the appropriate dtype.

Parameters:
  • collection: Instance of Collection against which to run the aggregate operation.

  • pipeline: A list of aggregation pipeline stages.

  • schema (optional): Instance of Schema. If the schema is not given, it will be inferred using the data in the result set.

  • allow_invalid (optional): If set to True, results will have all fields that do not conform to the schema silently converted to NaN.

Additional keyword-arguments passed to this method will be passed directly to the underlying aggregate operation.

This method attempts to create each NumPy array as a view on the Arrow data corresponding to each field in the result set. When this is not possible, the underlying data is copied into a new NumPy array. See pyarrow.Array.to_numpy() for more information.

NumPy arrays returned by this method that are views on Arrow data are not writable. Users seeking to modify such arrays must first create an editable copy using numpy.copy().

Returns:

An instance of dict.
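Because arrays that are views on Arrow data are read-only, a common pattern is to copy before modifying. The sketch below is one way to do that; `to_writable` and `amounts_doubled` are hypothetical helpers, and the `amount` field is an assumption. It uses the `.copy()` method of each array, which is the method form of `numpy.copy()`.

```python
def to_writable(arrays):
    """Return a new dict holding an independent copy of every array.

    Works for any mapping of name -> object with a .copy() method,
    which includes numpy.ndarray.
    """
    return {name: arr.copy() for name, arr in arrays.items()}


def amounts_doubled(collection):
    """Fetch the hypothetical 'amount' field and double it in a copy."""
    # Imported lazily so to_writable() can be used without pymongoarrow.
    from pymongoarrow.api import Schema, aggregate_numpy_all

    arrays = aggregate_numpy_all(
        collection,
        [{"$project": {"_id": 0, "amount": 1}}],
        schema=Schema({"amount": float}),
    )
    editable = to_writable(arrays)  # the originals may be read-only views
    editable["amount"] *= 2         # modifies the copy only
    return editable
```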

pymongoarrow.api.aggregate_pandas_all(collection, pipeline, *, schema=None, allow_invalid=False, **kwargs)

Method that returns the results of an aggregation pipeline as a pandas.DataFrame instance.

Parameters:
  • collection: Instance of Collection against which to run the aggregate operation.

  • pipeline: A list of aggregation pipeline stages.

  • schema (optional): Instance of Schema. If the schema is not given, it will be inferred using the data in the result set.

  • allow_invalid (optional): If set to True, results will have all fields that do not conform to the schema silently converted to NaN.

Additional keyword-arguments passed to this method will be passed directly to the underlying aggregate operation.

Returns:

An instance of pandas.DataFrame.

pymongoarrow.api.aggregate_polars_all(collection, pipeline, *, schema=None, allow_invalid=False, **kwargs)

Method that returns the results of an aggregation pipeline as a polars.DataFrame instance.

Parameters:
  • collection: Instance of Collection against which to run the aggregate operation.

  • pipeline: A list of aggregation pipeline stages.

  • schema (optional): Instance of Schema. If the schema is not given, it will be inferred using the data in the result set.

  • allow_invalid (optional): If set to True, results will have all fields that do not conform to the schema silently converted to NaN.

Additional keyword-arguments passed to this method will be passed directly to the underlying aggregate operation.

Returns:

An instance of polars.DataFrame.

pymongoarrow.api.find_arrow_all(collection, query, *, schema=None, allow_invalid=False, parallelism='off', **kwargs)

Method that returns the results of a find query as a pyarrow.Table instance.

Parameters:
  • collection: Instance of Collection against which to run the find operation.

  • query: A mapping containing the query to use for the find operation.

  • schema (optional): Instance of Schema. If the schema is not given, it will be inferred using the data in the result set.

  • allow_invalid (optional): If set to True, results will have all fields that do not conform to the schema silently converted to NaN.

  • parallelism (Literal['threads', 'processes', 'off'], optional): Controls how batch processing is parallelized. Possible values are:

    • “off”: (default) Disable parallelism and use the single-process behavior.

    • “threads”: Always use a threaded implementation.

    • “processes”: Always use a multiprocess implementation.

Additional keyword-arguments passed to this method will be passed directly to the underlying find operation.

Returns:

An instance of pyarrow.Table.
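The sketch below shows one way the `parallelism` option and the forwarded keyword arguments might be used together. `find_table` and `VALID_PARALLELISM` are hypothetical names; the early validation is an illustrative choice, not something the library requires.

```python
VALID_PARALLELISM = ("off", "threads", "processes")


def find_table(collection, query, parallelism="off", **find_kwargs):
    """Run a find query and return the results as a pyarrow.Table."""
    # Fail early on a typo rather than after the call has been dispatched.
    if parallelism not in VALID_PARALLELISM:
        raise ValueError(
            f"parallelism must be one of {VALID_PARALLELISM}, got {parallelism!r}"
        )

    # Imported lazily so the validation can be exercised without pymongoarrow.
    from pymongoarrow.api import find_arrow_all

    # Extra keyword arguments go straight to the underlying find operation.
    return find_arrow_all(collection, query, parallelism=parallelism, **find_kwargs)
```

For example, `find_table(db.orders, {"amount": {"$gt": 0}}, limit=100)` would forward `limit` to the find operation, where `db.orders` stands in for any Collection.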

pymongoarrow.api.find_numpy_all(collection, query, *, schema=None, allow_invalid=False, parallelism='off', **kwargs)

Method that returns the results of a find query as a dict instance whose keys are field names and values are ndarray instances bearing the appropriate dtype.

Parameters:
  • collection: Instance of Collection against which to run the find operation.

  • query: A mapping containing the query to use for the find operation.

  • schema (optional): Instance of Schema. If the schema is not given, it will be inferred using the data in the result set.

  • allow_invalid (optional): If set to True, results will have all fields that do not conform to the schema silently converted to NaN.

  • parallelism (Literal['threads', 'processes', 'off'], optional): Controls how batch processing is parallelized. Possible values are:

    • “off”: (default) Disable parallelism and use the single-process behavior.

    • “threads”: Always use a threaded implementation.

    • “processes”: Always use a multiprocess implementation.

Additional keyword-arguments passed to this method will be passed directly to the underlying find operation.

This method attempts to create each NumPy array as a view on the Arrow data corresponding to each field in the result set. When this is not possible, the underlying data is copied into a new NumPy array. See pyarrow.Array.to_numpy() for more information.

NumPy arrays returned by this method that are views on Arrow data are not writable. Users seeking to modify such arrays must first create an editable copy using numpy.copy().

Returns:

An instance of dict.

pymongoarrow.api.find_pandas_all(collection, query, *, schema=None, allow_invalid=False, parallelism='off', **kwargs)

Method that returns the results of a find query as a pandas.DataFrame instance.

Parameters:
  • collection: Instance of Collection against which to run the find operation.

  • query: A mapping containing the query to use for the find operation.

  • schema (optional): Instance of Schema. If the schema is not given, it will be inferred using the data in the result set.

  • allow_invalid (optional): If set to True, results will have all fields that do not conform to the schema silently converted to NaN.

  • parallelism (Literal['threads', 'processes', 'off'], optional): Controls how batch processing is parallelized. Possible values are:

    • “off”: (default) Disable parallelism and use the single-process behavior.

    • “threads”: Always use a threaded implementation.

    • “processes”: Always use a multiprocess implementation.

Additional keyword-arguments passed to this method will be passed directly to the underlying find operation.

Returns:

An instance of pandas.DataFrame.
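A find query with an explicit schema could look like the sketch below. The date-range query, the field names (`amount`, `last_updated`), and both helpers are illustrative assumptions; pandas must be installed for the call to succeed.

```python
from datetime import datetime


def updated_between(start, end):
    """Hypothetical query: documents last updated in [start, end)."""
    return {"last_updated": {"$gte": start, "$lt": end}}


def recent_orders_frame(collection, start, end):
    """Return matching documents as a pandas.DataFrame."""
    # Imported lazily so updated_between() can be used without pymongoarrow.
    from pymongoarrow.api import Schema, find_pandas_all

    schema = Schema({"amount": float, "last_updated": datetime})
    return find_pandas_all(collection, updated_between(start, end), schema=schema)
```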

pymongoarrow.api.find_polars_all(collection, query, *, schema=None, allow_invalid=False, parallelism='off', **kwargs)

Method that returns the results of a find query as a polars.DataFrame instance.

Parameters:
  • collection: Instance of Collection against which to run the find operation.

  • query: A mapping containing the query to use for the find operation.

  • schema (optional): Instance of Schema. If the schema is not given, it will be inferred using the data in the result set.

  • allow_invalid (optional): If set to True, results will have all fields that do not conform to the schema silently converted to NaN.

  • parallelism (Literal['threads', 'processes', 'off'], optional): Controls how batch processing is parallelized. Possible values are:

    • “off”: (default) Disable parallelism and use the single-process behavior.

    • “threads”: Always use a threaded implementation.

    • “processes”: Always use a multiprocess implementation.

Additional keyword-arguments passed to this method will be passed directly to the underlying find operation.

Returns:

An instance of polars.DataFrame.

Added in version 1.3.

pymongoarrow.api.write(collection, tabular, *, exclude_none=False, auto_convert=True)

Write data from tabular into the given MongoDB collection.

Parameters:
  • collection (Collection): Instance of Collection against which to run the operation.

  • tabular: A tabular data store to use for the write operation.

  • exclude_none (bool, optional): Whether to skip writing null fields in documents.

  • auto_convert (bool, optional): Whether to attempt a best-effort conversion of unsupported types.

Returns:

An instance of result.ArrowWriteResult.
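A minimal sketch of a write call follows, assuming a pyarrow.Table is an acceptable tabular data store and that pyarrow is installed. The column names and both helpers are hypothetical.

```python
def order_columns():
    """Hypothetical column-oriented data: one list per field."""
    return {"order_id": [1, 2, 3], "amount": [9.5, 20.0, 3.25]}


def write_orders(collection, columns, exclude_none=False):
    """Write the columns to the collection, one document per row."""
    # Imported lazily so order_columns() can be used without these packages.
    import pyarrow as pa
    from pymongoarrow.api import write

    table = pa.table(columns)  # pyarrow.Table built from a dict of lists
    return write(collection, table, exclude_none=exclude_none)
```

Passing `exclude_none=True` would skip null fields rather than writing them into the documents.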