api – PyMongoArrow APIs
- class pymongoarrow.api.Schema(schema)
Create a Schema instance from a mapping or an iterable.
- Parameters:
schema: A mapping.
- classmethod from_arrow(aschema)
Create a Schema instance from a pyarrow.Schema.
- Parameters:
aschema (pyarrow.Schema): PyArrow Schema.
- to_arrow()
Output the Schema as an instance of pyarrow.Schema.
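The three entry points above can be sketched as a round trip. This is a minimal illustration, not a definitive usage pattern: the field names in `FIELDS` and the helper `schema_round_trip` are hypothetical, and the sketch assumes the Python types `int`, `float`, and `datetime` are accepted as field types.

```python
from datetime import datetime

# Hypothetical field layout (an assumption for illustration only).
FIELDS = {"_id": int, "amount": float, "last_updated": datetime}


def schema_round_trip(fields=FIELDS):
    """Build a Schema from a mapping, convert it to PyArrow, and back."""
    # Imported here so FIELDS can be reused without pymongoarrow installed.
    from pymongoarrow.api import Schema

    schema = Schema(fields)                    # from a mapping
    arrow_schema = schema.to_arrow()           # a pyarrow.Schema
    rebuilt = Schema.from_arrow(arrow_schema)  # back to a PyMongoArrow Schema
    return schema, arrow_schema, rebuilt
```

Calling `schema_round_trip()` requires `pymongoarrow` and `pyarrow` to be installed; no MongoDB connection is needed.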
- pymongoarrow.api.aggregate_arrow_all(collection, pipeline, *, schema=None, allow_invalid=False, **kwargs)
Method that returns the results of an aggregation pipeline as a pyarrow.Table instance.
- Parameters:
collection: Instance of Collection against which to run the aggregate operation.
pipeline: A list of aggregation pipeline stages.
schema (optional): Instance of Schema. If the schema is not given, it will be inferred using the data in the result set.
allow_invalid (optional): If set to True, results will have all fields that do not conform to the schema silently converted to NaN.
Additional keyword arguments passed to this method are passed directly to the underlying aggregate operation.
- Returns:
An instance of pyarrow.Table.
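As a rough sketch of how the pipeline and the extra keyword arguments fit together, the helper below builds a two-stage pipeline and forwards it to `aggregate_arrow_all`. The field names (`amount`, `customer`) and both helper functions are hypothetical; no schema is passed, so it is inferred from the result set as described above.

```python
def totals_pipeline(min_amount):
    """Aggregation stages: keep large orders, then sum amounts per customer."""
    return [
        {"$match": {"amount": {"$gte": min_amount}}},
        {"$group": {"_id": "$customer", "total": {"$sum": "$amount"}}},
    ]


def totals_table(collection, min_amount=0, **aggregate_kwargs):
    """Run the pipeline and return a pyarrow.Table.

    Extra keyword arguments are forwarded to the underlying aggregate call.
    """
    # Imported here so totals_pipeline can be used without the driver installed.
    from pymongoarrow.api import aggregate_arrow_all

    return aggregate_arrow_all(
        collection, totals_pipeline(min_amount), **aggregate_kwargs
    )
```

With a PyMongo collection `coll`, a call might look like `totals_table(coll, 100, allowDiskUse=True)`, where `allowDiskUse` is an option of PyMongo's `aggregate`.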
- pymongoarrow.api.aggregate_numpy_all(collection, pipeline, *, schema=None, allow_invalid=False, **kwargs)
Method that returns the results of an aggregation pipeline as a dict instance whose keys are field names and whose values are ndarray instances bearing the appropriate dtype.
- Parameters:
collection: Instance of Collection against which to run the aggregate operation.
pipeline: A list of aggregation pipeline stages.
schema (optional): Instance of Schema. If the schema is not given, it will be inferred using the data in the result set.
allow_invalid (optional): If set to True, results will have all fields that do not conform to the schema silently converted to NaN.
Additional keyword arguments passed to this method are passed directly to the underlying aggregate operation.
This method attempts to create each NumPy array as a view on the Arrow data corresponding to each field in the result set. When this is not possible, the underlying data is copied into a new NumPy array. See pyarrow.Array.to_numpy() for more information.
NumPy arrays returned by this method that are views on Arrow data are not writable. Users seeking to modify such arrays must first create an editable copy using numpy.copy().
- Returns:
An instance of dict.
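Because some of the returned arrays may be read-only views, code that needs to modify them can copy every array up front. The sketch below uses hypothetical helper names and calls each array's `copy()` method, which, like `numpy.copy()`, returns a new writable array.

```python
def editable_copies(arrays):
    """Return a new dict holding a writable copy of every array."""
    return {name: values.copy() for name, values in arrays.items()}


def aggregate_editable(collection, pipeline):
    """Run the pipeline and return arrays that are safe to modify in place."""
    # Imported here so editable_copies can be used without the driver installed.
    from pymongoarrow.api import aggregate_numpy_all

    arrays = aggregate_numpy_all(collection, pipeline)
    return editable_copies(arrays)
```

Copying every column costs extra memory, so copying only the arrays that will actually be modified may be preferable for large result sets.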
- pymongoarrow.api.aggregate_pandas_all(collection, pipeline, *, schema=None, allow_invalid=False, **kwargs)
Method that returns the results of an aggregation pipeline as a pandas.DataFrame instance.
- Parameters:
collection: Instance of Collection against which to run the aggregate operation.
pipeline: A list of aggregation pipeline stages.
schema (optional): Instance of Schema. If the schema is not given, it will be inferred using the data in the result set.
allow_invalid (optional): If set to True, results will have all fields that do not conform to the schema silently converted to NaN.
Additional keyword arguments passed to this method are passed directly to the underlying aggregate operation.
- Returns:
An instance of pandas.DataFrame.
- pymongoarrow.api.aggregate_polars_all(collection, pipeline, *, schema=None, allow_invalid=False, **kwargs)
Method that returns the results of an aggregation pipeline as a polars.DataFrame instance.
- Parameters:
collection: Instance of Collection against which to run the aggregate operation.
pipeline: A list of aggregation pipeline stages.
schema (optional): Instance of Schema. If the schema is not given, it will be inferred using the data in the result set.
allow_invalid (optional): If set to True, results will have all fields that do not conform to the schema silently converted to NaN.
Additional keyword arguments passed to this method are passed directly to the underlying aggregate operation.
- Returns:
An instance of polars.DataFrame.
- pymongoarrow.api.find_arrow_all(collection, query, *, schema=None, allow_invalid=False, parallelism='off', **kwargs)
Method that returns the results of a find query as a pyarrow.Table instance.
- Parameters:
collection: Instance of Collection against which to run the find operation.
query: A mapping containing the query to use for the find operation.
schema (optional): Instance of Schema. If the schema is not given, it will be inferred using the data in the result set.
allow_invalid (optional): If set to True, results will have all fields that do not conform to the schema silently converted to NaN.
parallelism (Literal['threads', 'processes', 'off'], optional): Controls how batch processing is parallelized. Possible values are:
“off”: (default) Disable parallelism and use the single-process behavior.
“threads”: Always use a threaded implementation.
“processes”: Always use a multiprocess implementation.
Additional keyword arguments passed to this method are passed directly to the underlying find operation.
- Returns:
An instance of pyarrow.Table.
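The three `parallelism` values can be checked before the query is sent. The wrapper below is a hypothetical convenience, not part of the library: it rejects unknown modes early and forwards everything else, including extra keyword arguments for the underlying find operation.

```python
# The three values documented for the parallelism parameter.
PARALLELISM_MODES = ("off", "threads", "processes")


def find_table(collection, query, parallelism="off", **find_kwargs):
    """Run a find query and return a pyarrow.Table.

    Raises ValueError for a parallelism value outside the documented set.
    """
    if parallelism not in PARALLELISM_MODES:
        raise ValueError(
            f"parallelism must be one of {PARALLELISM_MODES}, got {parallelism!r}"
        )
    # Imported after validation so the check works without the driver installed.
    from pymongoarrow.api import find_arrow_all

    return find_arrow_all(
        collection, query, parallelism=parallelism, **find_kwargs
    )
```

With a PyMongo collection `coll`, a call might look like `find_table(coll, {"amount": {"$gt": 0}}, parallelism="threads", limit=1000)`, where `limit` is an option of PyMongo's `find`.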
- pymongoarrow.api.find_numpy_all(collection, query, *, schema=None, allow_invalid=False, parallelism='off', **kwargs)
Method that returns the results of a find query as a dict instance whose keys are field names and whose values are ndarray instances bearing the appropriate dtype.
- Parameters:
collection: Instance of Collection against which to run the find operation.
query: A mapping containing the query to use for the find operation.
schema (optional): Instance of Schema. If the schema is not given, it will be inferred using the data in the result set.
allow_invalid (optional): If set to True, results will have all fields that do not conform to the schema silently converted to NaN.
parallelism (Literal['threads', 'processes', 'off'], optional): Controls how batch processing is parallelized. Possible values are:
“off”: (default) Disable parallelism and use the single-process behavior.
“threads”: Always use a threaded implementation.
“processes”: Always use a multiprocess implementation.
Additional keyword arguments passed to this method are passed directly to the underlying find operation.
This method attempts to create each NumPy array as a view on the Arrow data corresponding to each field in the result set. When this is not possible, the underlying data is copied into a new NumPy array. See pyarrow.Array.to_numpy() for more information.
NumPy arrays returned by this method that are views on Arrow data are not writable. Users seeking to modify such arrays must first create an editable copy using numpy.copy().
- Returns:
An instance of dict.
- pymongoarrow.api.find_pandas_all(collection, query, *, schema=None, allow_invalid=False, parallelism='off', **kwargs)
Method that returns the results of a find query as a pandas.DataFrame instance.
- Parameters:
collection: Instance of Collection against which to run the find operation.
query: A mapping containing the query to use for the find operation.
schema (optional): Instance of Schema. If the schema is not given, it will be inferred using the data in the result set.
allow_invalid (optional): If set to True, results will have all fields that do not conform to the schema silently converted to NaN.
parallelism (Literal['threads', 'processes', 'off'], optional): Controls how batch processing is parallelized. Possible values are:
“off”: (default) Disable parallelism and use the single-process behavior.
“threads”: Always use a threaded implementation.
“processes”: Always use a multiprocess implementation.
Additional keyword arguments passed to this method are passed directly to the underlying find operation.
- Returns:
An instance of pandas.DataFrame.
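Passing an explicit schema avoids inference from the result set. The sketch below pairs a small query builder with `find_pandas_all`; the field names, the assumption that `_id` holds integers, and both helper functions are hypothetical.

```python
def amount_range_query(low, high):
    """Query matching documents with low <= amount < high."""
    return {"amount": {"$gte": low, "$lt": high}}


def amounts_frame(collection, low, high):
    """Return matching documents as a pandas.DataFrame with a fixed schema."""
    # Imported here so amount_range_query can be used without the driver installed.
    from pymongoarrow.api import Schema, find_pandas_all

    # Assumption: documents store integer _id values and float amounts.
    schema = Schema({"_id": int, "amount": float})
    return find_pandas_all(
        collection, amount_range_query(low, high), schema=schema
    )
```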
- pymongoarrow.api.find_polars_all(collection, query, *, schema=None, allow_invalid=False, parallelism='off', **kwargs)
Method that returns the results of a find query as a polars.DataFrame instance.
- Parameters:
collection: Instance of Collection against which to run the find operation.
query: A mapping containing the query to use for the find operation.
schema (optional): Instance of Schema. If the schema is not given, it will be inferred using the data in the result set.
allow_invalid (optional): If set to True, results will have all fields that do not conform to the schema silently converted to NaN.
parallelism (Literal['threads', 'processes', 'off'], optional): Controls how batch processing is parallelized. Possible values are:
“off”: (default) Disable parallelism and use the single-process behavior.
“threads”: Always use a threaded implementation.
“processes”: Always use a multiprocess implementation.
Additional keyword arguments passed to this method are passed directly to the underlying find operation.
- Returns:
An instance of polars.DataFrame.
Added in version 1.3.
- pymongoarrow.api.write(collection, tabular, *, exclude_none=False, auto_convert=True)
Write data from tabular into the given MongoDB collection.
- Parameters:
collection (Collection): Instance of Collection against which to run the operation.
tabular: A tabular data store to use for the write operation.
exclude_none (bool): Whether to skip writing null fields in documents.
auto_convert (bool, optional): Whether to attempt a best-effort conversion of unsupported types.
- Returns:
An instance of result.ArrowWriteResult.
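A minimal sketch of a write, assuming the rows start out as a list of dicts: they are pivoted into columns, wrapped in a `pyarrow.Table`, and passed to `write`. The helper names and the example fields are hypothetical; missing values become `None`, which `exclude_none=True` then skips when documents are written.

```python
def rows_to_columns(rows, fields):
    """Pivot a list of dict rows into a dict of column lists.

    Fields missing from a row are filled with None.
    """
    return {name: [row.get(name) for row in rows] for name in fields}


def write_rows(collection, rows, fields, exclude_none=False):
    """Write rows to the collection and return the ArrowWriteResult."""
    # Imported here so rows_to_columns can be used without the driver installed.
    import pyarrow as pa
    from pymongoarrow.api import write

    table = pa.table(rows_to_columns(rows, fields))
    return write(collection, table, exclude_none=exclude_none)
```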