pandas_types – Pandas Extension Types

class pymongoarrow.pandas_types.PandasBSONDtype

The base class for BSON Pandas extension data types.

classmethod construct_array_type()

Return the array type associated with this dtype.

Returns

type

Return type:

type_t[ExtensionArray]

classmethod construct_from_string(string)

Construct this type from a string.

This is useful mainly for data types that accept parameters. For example, a period dtype accepts a frequency parameter that can be set as period[h] (where h means hourly frequency).

By default, in the abstract class, only the name of the type is expected. Subclasses can override this method to accept parameters.

Parameters

string : str

The name of the type, for example category.

Returns

ExtensionDtype

Instance of the dtype.

Raises

TypeError

If a class cannot be constructed from this ‘string’.

Examples

For extension dtypes with arguments the following may be an adequate implementation.

>>> import re
>>> @classmethod
... def construct_from_string(cls, string):
...     pattern = re.compile(r"^my_type\[(?P<arg_name>.+)\]$")
...     match = pattern.match(string)
...     if match:
...         return cls(**match.groupdict())
...     else:
...         raise TypeError(
...             f"Cannot construct a '{cls.__name__}' from '{string}'"
...         )
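
The fragment above can be exercised end to end by placing it in a class. The following is a minimal sketch, assuming a hypothetical MyType class with a single arg_name parameter. A real dtype would subclass pandas.api.extensions.ExtensionDtype, which is left out here to keep the sketch dependency-free.

```python
import re


class MyType:
    """Hypothetical parametrized type; stands in for an ExtensionDtype subclass."""

    # Matches strings such as "my_type[hourly]" and captures the argument.
    _pattern = re.compile(r"^my_type\[(?P<arg_name>.+)\]$")

    def __init__(self, arg_name):
        self.arg_name = arg_name

    @classmethod
    def construct_from_string(cls, string):
        match = cls._pattern.match(string)
        if match:
            # The named group becomes the keyword argument of the constructor.
            return cls(**match.groupdict())
        raise TypeError(f"Cannot construct a '{cls.__name__}' from '{string}'")


parsed = MyType.construct_from_string("my_type[hourly]")
print(parsed.arg_name)

# A string that does not follow the pattern is rejected with TypeError.
try:
    MyType.construct_from_string("other_type")
except TypeError as exc:
    print(exc)
```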
empty(shape)

Construct an ExtensionArray of this dtype with the given shape.

Analogous to numpy.empty.

Parameters

shape : int or tuple[int]

Returns

ExtensionArray

Parameters:

shape (Shape)

Return type:

ExtensionArray

index_class

The Index subclass to return from Index.__new__ when this dtype is encountered.

classmethod is_dtype(dtype)

Check if we match ‘dtype’.

Parameters

dtype : object

The object to check.

Returns

bool

Notes

The default implementation is True if

  1. cls.construct_from_string(dtype) is an instance of cls.

  2. dtype is an object and is an instance of cls

  3. dtype has a dtype attribute, and any of the above conditions is true for dtype.dtype.
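
The three rules above can be checked one at a time. This is a small sketch that uses pandas.CategoricalDtype as a stand-in, on the assumption that it relies on the default implementation. The BSON dtypes documented here are not exercised.

```python
import pandas as pd

dtype_cls = pd.CategoricalDtype

# Rule 1: a string that construct_from_string accepts.
by_string = dtype_cls.is_dtype("category")

# Rule 2: an instance of the dtype class itself.
by_instance = dtype_cls.is_dtype(pd.CategoricalDtype())

# Rule 3: an object whose .dtype attribute satisfies rule 1 or rule 2.
series = pd.Series(["a", "b"], dtype="category")
by_attribute = dtype_cls.is_dtype(series)

# A string naming a different type does not match.
mismatch = dtype_cls.is_dtype("int64")

print(by_string, by_instance, by_attribute, mismatch)
```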

Parameters:

dtype (object)

Return type:

bool

property kind: str

A character code (one of ‘biufcmMOSUV’), default ‘O’

This should match the NumPy dtype used when the array is converted to an ndarray, which is probably ‘O’ for object if the extension type cannot be represented as a built-in NumPy type.

See Also

numpy.dtype.kind

property name: str

A string identifying the data type.

Will be used for display in, e.g. Series.dtype

property names: list[str] | None

Ordered list of field names, or None if there are no fields.

This is for compatibility with NumPy arrays, and may be removed in the future.

property type: type_t[Any]

The scalar type for the array, e.g. int

It’s expected ExtensionArray[item] returns an instance of ExtensionDtype.type for scalar item, assuming that value is valid (not NA). NA values do not need to be instances of type.

class pymongoarrow.pandas_types.PandasBSONExtensionArray(values, dtype, copy=False)

The base class for Pandas BSON extension arrays.

argmax(skipna=True)

Return the index of maximum value.

In case of multiple occurrences of the maximum value, the index corresponding to the first occurrence is returned.

Parameters

skipna : bool, default True

Returns

int

See Also

ExtensionArray.argmin : Return the index of the minimum value.

Examples

>>> arr = pd.array([3, 1, 2, 5, 4])
>>> arr.argmax()
3
Parameters:

skipna (bool)

Return type:

int

argmin(skipna=True)

Return the index of minimum value.

In case of multiple occurrences of the minimum value, the index corresponding to the first occurrence is returned.

Parameters

skipna : bool, default True

Returns

int

See Also

ExtensionArray.argmax : Return the index of the maximum value.

Examples

>>> arr = pd.array([3, 1, 2, 5, 4])
>>> arr.argmin()
1
Parameters:

skipna (bool)

Return type:

int

argsort(*, ascending=True, kind='quicksort', na_position='last', **kwargs)

Return the indices that would sort this array.

Parameters

ascending : bool, default True

Whether the indices should result in an ascending or descending sort.

kind : {‘quicksort’, ‘mergesort’, ‘heapsort’, ‘stable’}, optional

Sorting algorithm.

na_position : {‘first’, ‘last’}, default ‘last’

If 'first', put NaN values at the beginning. If 'last', put NaN values at the end.

**kwargs

Passed through to numpy.argsort().

Returns

np.ndarray[np.intp]

Array of indices that sort self. If NaN values are contained, they are placed according to na_position (at the end by default).

See Also

numpy.argsort : Sorting implementation used internally.

Examples

>>> arr = pd.array([3, 1, 2, 5, 4])
>>> arr.argsort()
array([1, 2, 0, 4, 3])
Parameters:
  • ascending (bool)

  • kind (SortKind)

  • na_position (str)

Return type:

np.ndarray
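
The effect of na_position and ascending is easiest to see on an array that contains a missing value. The sketch below uses a plain pandas nullable integer array rather than a BSON extension array, on the assumption that the sorting behaviour is inherited unchanged.

```python
import pandas as pd

arr = pd.array([3, None, 1])

# Default: missing values sort to the end.
na_last = arr.argsort()

# na_position="first" moves the missing values to the front instead.
na_first = arr.argsort(na_position="first")

# ascending=False reverses the order of the valid values only.
descending = arr.argsort(ascending=False)

print(na_last, na_first, descending)
```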

astype(dtype, copy=True)

Cast to a NumPy array or ExtensionArray with ‘dtype’.

Parameters

dtype : str or dtype

Typecode or data-type to which the array is cast.

copy : bool, default True

Whether to copy the data, even if not necessary. If False, a copy is made only if the old dtype does not match the new dtype.

Returns

np.ndarray or pandas.api.extensions.ExtensionArray

An ExtensionArray if dtype is ExtensionDtype, otherwise a Numpy ndarray with dtype for its dtype.

Examples

>>> arr = pd.array([1, 2, 3])
>>> arr
<IntegerArray>
[1, 2, 3]
Length: 3, dtype: Int64

Casting to another ExtensionDtype returns an ExtensionArray:

>>> arr1 = arr.astype('Float64')
>>> arr1
<FloatingArray>
[1.0, 2.0, 3.0]
Length: 3, dtype: Float64
>>> arr1.dtype
Float64Dtype()

Otherwise, we will get a Numpy ndarray:

>>> arr2 = arr.astype('float64')
>>> arr2
array([1., 2., 3.])
>>> arr2.dtype
dtype('float64')
Parameters:
  • dtype (AstypeArg)

  • copy (bool)

Return type:

ArrayLike

copy()

Return a copy of the array.

Returns

ExtensionArray

Examples

>>> arr = pd.array([1, 2, 3])
>>> arr2 = arr.copy()
>>> arr[0] = 2
>>> arr2
<IntegerArray>
[1, 2, 3]
Length: 3, dtype: Int64
dropna()

Return ExtensionArray without NA values.

Returns

Examples

>>> pd.array([1, 2, np.nan]).dropna()
<IntegerArray>
[1, 2]
Length: 2, dtype: Int64
Return type:

Self

property dtype

An instance of ExtensionDtype.

Examples

>>> pd.array([1, 2, 3]).dtype
Int64Dtype()
duplicated(keep='first')

Return boolean ndarray denoting duplicate values.

Parameters

keep : {‘first’, ‘last’, False}, default ‘first’
  • first : Mark duplicates as True except for the first occurrence.

  • last : Mark duplicates as True except for the last occurrence.

  • False : Mark all duplicates as True.

Returns

ndarray[bool]

Examples

>>> pd.array([1, 1, 2, 3, 3], dtype="Int64").duplicated()
array([False,  True, False, False,  True])
Parameters:

keep (Literal['first', 'last', False])

Return type:

npt.NDArray[np.bool_]
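
The three keep modes differ only in which occurrence of a repeated value is left unmarked. The following pure-Python sketch illustrates the semantics described above. The function name duplicated_flags is hypothetical, and this is not how pandas implements the method.

```python
from collections import Counter


def duplicated_flags(values, keep="first"):
    """Return a list of bools marking duplicates, following the keep rules."""
    if keep not in ("first", "last", False):
        raise ValueError("keep must be 'first', 'last' or False")

    if keep is False:
        # Every value that occurs more than once is marked.
        counts = Counter(values)
        return [counts[v] > 1 for v in values]

    # Walk forwards for 'first', backwards for 'last'; a value is a
    # duplicate if it has already been seen in that direction.
    ordered = list(values) if keep == "first" else list(reversed(values))
    seen = set()
    flags = []
    for v in ordered:
        flags.append(v in seen)
        seen.add(v)
    return flags if keep == "first" else flags[::-1]


data = [1, 1, 2, 3, 3]
print(duplicated_flags(data, keep="first"))
print(duplicated_flags(data, keep="last"))
print(duplicated_flags(data, keep=False))
```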

equals(other)

Return whether another array is equivalent to this array.

Equivalent means that both arrays have the same shape and dtype, and all values compare equal. Missing values in the same location are considered equal (in contrast with normal equality).

Parameters

other : ExtensionArray

Array to compare to this Array.

Returns

boolean

Whether the arrays are equivalent.

Examples

>>> arr1 = pd.array([1, 2, np.nan])
>>> arr2 = pd.array([1, 2, np.nan])
>>> arr1.equals(arr2)
True
Parameters:

other (object)

Return type:

bool
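
The contrast with normal equality can be made concrete. This sketch uses pandas nullable integer arrays as a stand-in for a BSON extension array and compares the element-wise result with the result of equals().

```python
import pandas as pd

arr1 = pd.array([1, 2, None])
arr2 = pd.array([1, 2, None])

# Element-wise comparison propagates the missing value instead of
# reporting the two missing entries as equal.
elementwise = arr1 == arr2

# equals() treats missing values in the same position as equal.
same = arr1.equals(arr2)

# A different dtype makes the arrays non-equivalent even if the values match.
different_dtype = arr1.equals(arr1.astype("Float64"))

print(elementwise.isna().tolist(), same, different_dtype)
```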

factorize(use_na_sentinel=True)

Encode the extension array as an enumerated type.

Parameters

use_na_sentinel : bool, default True

If True, the sentinel -1 will be used for NaN values. If False, NaN values will be encoded as non-negative integers and will not drop the NaN from the uniques of the values.

Added in version 1.5.0.

Returns

codes : ndarray

An integer NumPy array that’s an indexer into the original ExtensionArray.

uniques : ExtensionArray

An ExtensionArray containing the unique values of self.

Note

uniques will not contain an entry for the NA value of the ExtensionArray if there are any missing values present in self.

See Also

factorize : Top-level factorize method that dispatches here.

Notes

pandas.factorize() offers a sort keyword as well.

Examples

>>> idx1 = pd.PeriodIndex(["2014-01", "2014-01", "2014-02", "2014-02",
...                       "2014-03", "2014-03"], freq="M")
>>> arr, idx = idx1.factorize()
>>> arr
array([0, 0, 1, 1, 2, 2])
>>> idx
PeriodIndex(['2014-01', '2014-02', '2014-03'], dtype='period[M]')
Parameters:

use_na_sentinel (bool)

Return type:

tuple[ndarray, ExtensionArray]
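
The role of use_na_sentinel only shows up when the array has missing values. The sketch below assumes pandas 1.5 or later, where the keyword is available, and uses a nullable integer array rather than a BSON extension array.

```python
import pandas as pd

arr = pd.array([1, None, 1, 2])

# Default: the missing value is encoded as -1 and is left out of uniques.
codes, uniques = arr.factorize()

# use_na_sentinel=False: the missing value gets a non-negative code of
# its own and is kept in uniques.
codes_with_na, uniques_with_na = arr.factorize(use_na_sentinel=False)

print(codes, uniques)
print(codes_with_na, uniques_with_na)
```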

fillna(value=None, method=None, limit=None, copy=True)

Fill NA/NaN values using the specified method.

Parameters

value : scalar, array-like

If a scalar value is passed it is used to fill all missing values. Alternatively, an array-like “value” can be given. It’s expected that the array-like have the same length as ‘self’.

method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None

Method to use for filling holes in reindexed Series:

  • pad / ffill: propagate last valid observation forward to next valid.

  • backfill / bfill: use NEXT valid observation to fill gap.

Deprecated since version 2.1.0.

limit : int, default None

If method is specified, this is the maximum number of consecutive NaN values to forward/backward fill. In other words, if there is a gap with more than this number of consecutive NaNs, it will only be partially filled. If method is not specified, this is the maximum number of entries along the entire axis where NaNs will be filled.

Deprecated since version 2.1.0.

copy : bool, default True

Whether to make a copy of the data before filling. If False, then the original should be modified and no new memory should be allocated. For ExtensionArray subclasses that cannot do this, it is at the author’s discretion whether to ignore “copy=False” or to raise. The base class implementation ignores the keyword in pad/backfill cases.

Returns

ExtensionArray

With NA/NaN filled.

Examples

>>> arr = pd.array([np.nan, np.nan, 2, 3, np.nan, np.nan])
>>> arr.fillna(0)
<IntegerArray>
[0, 0, 2, 3, 0, 0]
Length: 6, dtype: Int64
Parameters:
  • value (object | ArrayLike | None)

  • method (FillnaOptions | None)

  • limit (int | None)

  • copy (bool)

Return type:

Self

insert(loc, item)

Insert an item at the given position.

Parameters

loc : int

item : scalar-like

Returns

same type as self

Notes

This method should be both type and dtype-preserving. If the item cannot be held in an array of this type/dtype, either ValueError or TypeError should be raised.

The default implementation relies on _from_sequence to raise on invalid items.

Examples

>>> arr = pd.array([1, 2, 3])
>>> arr.insert(2, -1)
<IntegerArray>
[1, 2, -1, 3]
Length: 4, dtype: Int64
Parameters:

loc (int)

Return type:

Self

interpolate(*, method, axis, index, limit, limit_direction, limit_area, copy, **kwargs)

See DataFrame.interpolate.__doc__.

Examples

>>> arr = pd.arrays.NumpyExtensionArray(np.array([0, 1, np.nan, 3]))
>>> arr.interpolate(method="linear",
...                 limit=3,
...                 limit_direction="forward",
...                 index=pd.Index([1, 2, 3, 4]),
...                 fill_value=1,
...                 copy=False,
...                 axis=0,
...                 limit_area="inside"
...                 )
<NumpyExtensionArray>
[0.0, 1.0, 2.0, 3.0]
Length: 4, dtype: float64
Parameters:
  • method (InterpolateOptions)

  • axis (int)

  • index (Index)

  • copy (bool)

Return type:

Self

isin(values)

Pointwise comparison for set containment in the given values.

Roughly equivalent to np.array([x in values for x in self])

Parameters

values : np.ndarray or ExtensionArray

Returns

np.ndarray[bool]

Examples

>>> arr = pd.array([1, 2, 3])
>>> arr.isin([1])
<BooleanArray>
[True, False, False]
Length: 3, dtype: boolean
Parameters:

values (ArrayLike)

Return type:

npt.NDArray[np.bool_]

isna()

A 1-D array indicating if each value is missing.

Returns

numpy.ndarray or pandas.api.extensions.ExtensionArray

In most cases, this should return a NumPy ndarray. For exceptional cases like SparseArray, where returning an ndarray would be expensive, an ExtensionArray may be returned.

Notes

If returning an ExtensionArray, then

  • na_values._is_boolean should be True

  • na_values should implement ExtensionArray._reduce()

  • na_values.any and na_values.all should be implemented

Examples

>>> arr = pd.array([1, 2, np.nan, np.nan])
>>> arr.isna()
array([False, False,  True,  True])
map(mapper, na_action=None)

Map values using an input mapping or function.

Parameters

mapper : function, dict, or Series

Mapping correspondence.

na_action : {None, ‘ignore’}, default None

If ‘ignore’, propagate NA values, without passing them to the mapping correspondence. If ‘ignore’ is not supported, a NotImplementedError should be raised.

Returns

Union[ndarray, Index, ExtensionArray]

The output of the mapping function applied to the array. If the function returns a tuple with more than one element a MultiIndex will be returned.

nbytes()

The number of bytes needed to store this object in memory.

Examples

>>> pd.array([1, 2, 3]).nbytes
27
property ndim: int

Extension Arrays are only allowed to be 1-dimensional.

Examples

>>> arr = pd.array([1, 2, 3])
>>> arr.ndim
1
ravel(order='C')

Return a flattened view on this array.

Parameters

order : {None, ‘C’, ‘F’, ‘A’, ‘K’}, default ‘C’

Returns

ExtensionArray

Notes

  • Because ExtensionArrays are 1D-only, this is a no-op.

  • The “order” argument is ignored; it is accepted only for compatibility with NumPy.

Examples

>>> pd.array([1, 2, 3]).ravel()
<IntegerArray>
[1, 2, 3]
Length: 3, dtype: Int64
Parameters:

order (Literal['C', 'F', 'A', 'K'] | None)

Return type:

ExtensionArray

repeat(repeats, axis=None)

Repeat elements of an ExtensionArray.

Returns a new ExtensionArray where each element of the current ExtensionArray is repeated consecutively a given number of times.

Parameters

repeats : int or array of ints

The number of repetitions for each element. This should be a non-negative integer. Repeating 0 times will return an empty ExtensionArray.

axis : None

Must be None. Has no effect but is accepted for compatibility with numpy.

Returns

ExtensionArray

Newly created ExtensionArray with repeated elements.

See Also

Series.repeat : Equivalent function for Series.

Index.repeat : Equivalent function for Index.

numpy.repeat : Similar method for numpy.ndarray.

ExtensionArray.take : Take arbitrary positions.

Examples

>>> cat = pd.Categorical(['a', 'b', 'c'])
>>> cat
['a', 'b', 'c']
Categories (3, object): ['a', 'b', 'c']
>>> cat.repeat(2)
['a', 'a', 'b', 'b', 'c', 'c']
Categories (3, object): ['a', 'b', 'c']
>>> cat.repeat([1, 2, 3])
['a', 'b', 'b', 'c', 'c', 'c']
Categories (3, object): ['a', 'b', 'c']
Parameters:
  • repeats (int | Sequence[int])

  • axis (AxisInt | None)

Return type:

Self

searchsorted(value, side='left', sorter=None)

Find indices where elements should be inserted to maintain order.

Find the indices into a sorted array self (a) such that, if the corresponding elements in value were inserted before the indices, the order of self would be preserved.

Assuming that self is sorted:

side     returned index i satisfies
left     self[i-1] < value <= self[i]
right    self[i-1] <= value < self[i]

Parameters

value : array-like, list or scalar

Value(s) to insert into self.

side : {‘left’, ‘right’}, optional

If ‘left’, the index of the first suitable location found is given. If ‘right’, return the last such index. If there is no suitable index, return either 0 or N (where N is the length of self).

sorter : 1-D array-like, optional

Optional array of integer indices that sort array a into ascending order. They are typically the result of argsort.

Returns

array of ints or int

If value is array-like, array of insertion points. If value is scalar, a single integer.

See Also

numpy.searchsorted : Similar method from NumPy.

Examples

>>> arr = pd.array([1, 2, 3, 5])
>>> arr.searchsorted([4])
array([3])
Parameters:
  • value (NumpyValueArrayLike | ExtensionArray)

  • side (Literal['left', 'right'])

  • sorter (NumpySorter | None)

Return type:

npt.NDArray[np.intp] | np.intp
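
The difference between the two side options appears when the searched value is already present. This sketch uses a pandas nullable integer array as a stand-in and also shows sorter with an array that is not pre-sorted.

```python
import pandas as pd

arr = pd.array([1, 2, 2, 5])

# side="left": insertion point before the existing run of 2s.
left = arr.searchsorted(2, side="left")

# side="right": insertion point after the existing run of 2s.
right = arr.searchsorted(2, side="right")

# For an unsorted array, pass the indices that would sort it as sorter.
unsorted = pd.array([5, 1, 2, 2])
order = unsorted.argsort()
via_sorter = unsorted.searchsorted(2, sorter=order)

print(left, right, via_sorter)
```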

property shape: Shape

Return a tuple of the array dimensions.

Examples

>>> arr = pd.array([1, 2, 3])
>>> arr.shape
(3,)
shift(periods=1, fill_value=None)

Shift values by desired number.

Newly introduced missing values are filled with self.dtype.na_value.

Parameters

periods : int, default 1

The number of periods to shift. Negative values are allowed for shifting backwards.

fill_value : object, optional

The scalar value to use for newly introduced missing values. The default is self.dtype.na_value.

Returns

ExtensionArray

Shifted.

Notes

If self is empty or periods is 0, a copy of self is returned.

If periods > len(self), then an array of size len(self) is returned, with all values filled with self.dtype.na_value.

For 2-dimensional ExtensionArrays, we are always shifting along axis=0.

Examples

>>> arr = pd.array([1, 2, 3])
>>> arr.shift(2)
<IntegerArray>
[<NA>, <NA>, 1]
Length: 3, dtype: Int64
Parameters:
  • periods (int)

  • fill_value (object)

Return type:

ExtensionArray
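
The notes above cover negative periods, custom fill values, and shifts longer than the array. The following sketch tries each case on a pandas nullable integer array, on the assumption that a BSON extension array follows the same inherited behaviour.

```python
import pandas as pd

arr = pd.array([1, 2, 3])

# Negative periods shift backwards; the vacated slot becomes NA.
backward = arr.shift(-1)

# fill_value replaces the default NA for newly introduced positions.
filled = arr.shift(1, fill_value=0)

# Shifting by more than the length yields an all-NA array of the same size.
too_far = arr.shift(5)

print(backward, filled, too_far)
```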

property size: int

The number of elements in the array.

take(indexer, allow_fill=False, fill_value=None)

Take elements from an array.

Parameters

indices : sequence of int or one-dimensional np.ndarray of int

Indices to be taken.

allow_fill : bool, default False

How to handle negative values in indices.

  • False: negative values in indices indicate positional indices from the right (the default). This is similar to numpy.take().

  • True: negative values in indices indicate missing values. These values are set to fill_value. Any other negative values raise a ValueError.

fill_value : any, optional

Fill value to use for NA-indices when allow_fill is True. This may be None, in which case the default NA value for the type, self.dtype.na_value, is used.

For many ExtensionArrays, there will be two representations of fill_value: a user-facing “boxed” scalar, and a low-level physical NA value. fill_value should be the user-facing version, and the implementation should handle translating that to the physical version for processing the take if necessary.

Returns

ExtensionArray

Raises

IndexError

When the indices are out of bounds for the array.

ValueError

When indices contains negative values other than -1 and allow_fill is True.

See Also

numpy.take : Take elements from an array along an axis.

api.extensions.take : Take elements from an array.

Notes

ExtensionArray.take is called by Series.__getitem__, .loc, and .iloc when indices is a sequence of values. Additionally, it’s called by Series.reindex(), or any other method that causes realignment, with a fill_value.

Examples

Here’s an example implementation, which relies on casting the extension array to object dtype. This uses the helper method pandas.api.extensions.take().

def take(self, indices, allow_fill=False, fill_value=None):
    from pandas.api.extensions import take

    # If the ExtensionArray is backed by an ndarray, then
    # just pass that here instead of coercing to object.
    data = self.astype(object)

    if allow_fill and fill_value is None:
        fill_value = self.dtype.na_value

    # fill value should always be translated from the scalar
    # type for the array, to the physical storage type for
    # the data, before passing to take.

    result = take(data, indices, fill_value=fill_value,
                  allow_fill=allow_fill)
    return self._from_sequence(result, dtype=self.dtype)
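
The two meanings of a negative index are easiest to compare by calling the helper directly. This is a short sketch using pandas.api.extensions.take on a plain NumPy array. It illustrates the allow_fill rules described above and does not exercise a BSON extension array.

```python
import numpy as np
from pandas.api.extensions import take

data = np.array([10, 20, 30])

# allow_fill=False: -1 counts from the right, as in numpy.take.
positional = take(data, [0, 0, -1])

# allow_fill=True: -1 marks a missing value (NaN by default for ndarrays).
filled = take(data, [0, 0, -1], allow_fill=True)

# An explicit fill_value is used in place of the default NA value.
custom = take(data, [0, 0, -1], allow_fill=True, fill_value=-10)

# With allow_fill=True, a negative index other than -1 is rejected.
try:
    take(data, [0, -2], allow_fill=True)
except ValueError as exc:
    error = str(exc)

print(positional, filled, custom)
print(error)
```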
to_numpy(dtype=None, copy=False, na_value=<no_default>)

Convert to a NumPy ndarray.

This is similar to numpy.asarray(), but may provide additional control over how the conversion is done.

Parameters

dtype : str or numpy.dtype, optional

The dtype to pass to numpy.asarray().

copy : bool, default False

Whether to ensure that the returned value is not a view on another array. Note that copy=False does not ensure that to_numpy() is no-copy. Rather, copy=True ensures that a copy is made, even if not strictly necessary.

na_value : Any, optional

The value to use for missing values. The default value depends on dtype and the type of the array.

Returns

numpy.ndarray

Parameters:
  • dtype (npt.DTypeLike | None)

  • copy (bool)

  • na_value (object)

Return type:

np.ndarray
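
The dtype, na_value and copy parameters interact when the array holds missing values. The sketch below uses a pandas nullable integer array as a stand-in and passes dtype and na_value explicitly, because the defaults depend on the array type.

```python
import numpy as np
import pandas as pd

arr = pd.array([1, None, 3])

# Request a float result and say how the missing entry is represented.
as_float = arr.to_numpy(dtype="float64", na_value=np.nan)

# copy=True guarantees a fresh array, so writing to it leaves arr unchanged.
fresh = arr.to_numpy(dtype="float64", na_value=np.nan, copy=True)
fresh[0] = 99.0

print(as_float, arr[0])
```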

tolist()

Return a list of the values.

These are each a scalar type, which is a Python scalar (for str, int, float) or a pandas scalar (for Timestamp/Timedelta/Interval/Period)

Returns

list

Examples

>>> arr = pd.array([1, 2, 3])
>>> arr.tolist()
[1, 2, 3]
Return type:

list

transpose(*axes)

Return a transposed view on this array.

Because ExtensionArrays are always 1D, this is a no-op. It is included for compatibility with np.ndarray.

Returns

ExtensionArray

Examples

>>> pd.array([1, 2, 3]).transpose()
<IntegerArray>
[1, 2, 3]
Length: 3, dtype: Int64
Parameters:

axes (int)

Return type:

ExtensionArray

unique()

Compute the ExtensionArray of unique values.

Returns

pandas.api.extensions.ExtensionArray

Examples

>>> arr = pd.array([1, 2, 3, 1, 2, 3])
>>> arr.unique()
<IntegerArray>
[1, 2, 3]
Length: 3, dtype: Int64
Return type:

Self

view(dtype=None)

Return a view on the array.

Parameters

dtype : str, np.dtype, or ExtensionDtype, optional

Default None.

Returns

ExtensionArray or np.ndarray

A view on the ExtensionArray’s data.

Examples

This gives a view on the underlying data of an ExtensionArray and is not a copy. Modifications to either the view or the original ExtensionArray will be reflected in the underlying data:

>>> arr = pd.array([1, 2, 3])
>>> arr2 = arr.view()
>>> arr[0] = 2
>>> arr2
<IntegerArray>
[2, 2, 3]
Length: 3, dtype: Int64
Parameters:

dtype (Dtype | None)

Return type:

ArrayLike

class pymongoarrow.pandas_types.PandasBinary(subtype)

A pandas extension type for the BSON Binary data type.

classmethod construct_array_type()

Return the array type associated with this dtype.

Returns

type

Return type:

type[PandasBinaryArray]

classmethod construct_from_string(string)

Construct this type from a string.

This is useful mainly for data types that accept parameters. For example, a period dtype accepts a frequency parameter that can be set as period[h] (where h means hourly frequency).

By default, in the abstract class, only the name of the type is expected. Subclasses can override this method to accept parameters.

Parameters

string : str

The name of the type, for example category.

Returns

ExtensionDtype

Instance of the dtype.

Raises

TypeError

If a class cannot be constructed from this ‘string’.

Examples

For extension dtypes with arguments the following may be an adequate implementation.

>>> import re
>>> @classmethod
... def construct_from_string(cls, string):
...     pattern = re.compile(r"^my_type\[(?P<arg_name>.+)\]$")
...     match = pattern.match(string)
...     if match:
...         return cls(**match.groupdict())
...     else:
...         raise TypeError(
...             f"Cannot construct a '{cls.__name__}' from '{string}'"
...         )
empty(shape)

Construct an ExtensionArray of this dtype with the given shape.

Analogous to numpy.empty.

Parameters

shape : int or tuple[int]

Returns

ExtensionArray

Parameters:

shape (Shape)

Return type:

ExtensionArray

index_class

The Index subclass to return from Index.__new__ when this dtype is encountered.

classmethod is_dtype(dtype)

Check if we match ‘dtype’.

Parameters

dtype : object

The object to check.

Returns

bool

Notes

The default implementation is True if

  1. cls.construct_from_string(dtype) is an instance of cls.

  2. dtype is an object and is an instance of cls

  3. dtype has a dtype attribute, and any of the above conditions is true for dtype.dtype.

Parameters:

dtype (object)

Return type:

bool

property kind: str

A character code (one of ‘biufcmMOSUV’), default ‘O’

This should match the NumPy dtype used when the array is converted to an ndarray, which is probably ‘O’ for object if the extension type cannot be represented as a built-in NumPy type.

See Also

numpy.dtype.kind

property name: str

A string identifying the data type.

Will be used for display in, e.g. Series.dtype

property names: list[str] | None

Ordered list of field names, or None if there are no fields.

This is for compatibility with NumPy arrays, and may be removed in the future.

type

alias of Binary

class pymongoarrow.pandas_types.PandasBinaryArray(values, dtype, copy=False)

A pandas extension array for BSON Binary data.

argmax(skipna=True)

Return the index of maximum value.

In case of multiple occurrences of the maximum value, the index corresponding to the first occurrence is returned.

Parameters

skipna : bool, default True

Returns

int

See Also

ExtensionArray.argmin : Return the index of the minimum value.

Examples

>>> arr = pd.array([3, 1, 2, 5, 4])
>>> arr.argmax()
3
Parameters:

skipna (bool)

Return type:

int

argmin(skipna=True)

Return the index of minimum value.

In case of multiple occurrences of the minimum value, the index corresponding to the first occurrence is returned.

Parameters

skipna : bool, default True

Returns

int

See Also

ExtensionArray.argmax : Return the index of the maximum value.

Examples

>>> arr = pd.array([3, 1, 2, 5, 4])
>>> arr.argmin()
1
Parameters:

skipna (bool)

Return type:

int

argsort(*, ascending=True, kind='quicksort', na_position='last', **kwargs)

Return the indices that would sort this array.

Parameters

ascending : bool, default True

Whether the indices should result in an ascending or descending sort.

kind : {‘quicksort’, ‘mergesort’, ‘heapsort’, ‘stable’}, optional

Sorting algorithm.

na_position : {‘first’, ‘last’}, default ‘last’

If 'first', put NaN values at the beginning. If 'last', put NaN values at the end.

**kwargs

Passed through to numpy.argsort().

Returns

np.ndarray[np.intp]

Array of indices that sort self. If NaN values are contained, they are placed according to na_position (at the end by default).

See Also

numpy.argsort : Sorting implementation used internally.

Examples

>>> arr = pd.array([3, 1, 2, 5, 4])
>>> arr.argsort()
array([1, 2, 0, 4, 3])
Parameters:
  • ascending (bool)

  • kind (SortKind)

  • na_position (str)

Return type:

np.ndarray

astype(dtype, copy=True)

Cast to a NumPy array or ExtensionArray with ‘dtype’.

Parameters

dtype : str or dtype

Typecode or data-type to which the array is cast.

copy : bool, default True

Whether to copy the data, even if not necessary. If False, a copy is made only if the old dtype does not match the new dtype.

Returns

np.ndarray or pandas.api.extensions.ExtensionArray

An ExtensionArray if dtype is ExtensionDtype, otherwise a Numpy ndarray with dtype for its dtype.

Examples

>>> arr = pd.array([1, 2, 3])
>>> arr
<IntegerArray>
[1, 2, 3]
Length: 3, dtype: Int64

Casting to another ExtensionDtype returns an ExtensionArray:

>>> arr1 = arr.astype('Float64')
>>> arr1
<FloatingArray>
[1.0, 2.0, 3.0]
Length: 3, dtype: Float64
>>> arr1.dtype
Float64Dtype()

Otherwise, we will get a Numpy ndarray:

>>> arr2 = arr.astype('float64')
>>> arr2
array([1., 2., 3.])
>>> arr2.dtype
dtype('float64')
Parameters:
  • dtype (AstypeArg)

  • copy (bool)

Return type:

ArrayLike

copy()

Return a copy of the array.

Returns

ExtensionArray

Examples

>>> arr = pd.array([1, 2, 3])
>>> arr2 = arr.copy()
>>> arr[0] = 2
>>> arr2
<IntegerArray>
[1, 2, 3]
Length: 3, dtype: Int64
dropna()

Return ExtensionArray without NA values.

Returns

Examples

>>> pd.array([1, 2, np.nan]).dropna()
<IntegerArray>
[1, 2]
Length: 2, dtype: Int64
Return type:

Self

property dtype

An instance of ExtensionDtype.

Examples

>>> pd.array([1, 2, 3]).dtype
Int64Dtype()
duplicated(keep='first')

Return boolean ndarray denoting duplicate values.

Parameters

keep : {‘first’, ‘last’, False}, default ‘first’
  • first : Mark duplicates as True except for the first occurrence.

  • last : Mark duplicates as True except for the last occurrence.

  • False : Mark all duplicates as True.

Returns

ndarray[bool]

Examples

>>> pd.array([1, 1, 2, 3, 3], dtype="Int64").duplicated()
array([False,  True, False, False,  True])
Parameters:

keep (Literal['first', 'last', False])

Return type:

npt.NDArray[np.bool_]

equals(other)

Return whether another array is equivalent to this array.

Equivalent means that both arrays have the same shape and dtype, and all values compare equal. Missing values in the same location are considered equal (in contrast with normal equality).

Parameters

other : ExtensionArray

Array to compare to this Array.

Returns

boolean

Whether the arrays are equivalent.

Examples

>>> arr1 = pd.array([1, 2, np.nan])
>>> arr2 = pd.array([1, 2, np.nan])
>>> arr1.equals(arr2)
True
Parameters:

other (object)

Return type:

bool

factorize(use_na_sentinel=True)

Encode the extension array as an enumerated type.

Parameters

use_na_sentinel : bool, default True

If True, the sentinel -1 will be used for NaN values. If False, NaN values will be encoded as non-negative integers and will not drop the NaN from the uniques of the values.

Added in version 1.5.0.

Returns

codes : ndarray

An integer NumPy array that’s an indexer into the original ExtensionArray.

uniques : ExtensionArray

An ExtensionArray containing the unique values of self.

Note

uniques will not contain an entry for the NA value of the ExtensionArray if there are any missing values present in self.

See Also

factorize : Top-level factorize method that dispatches here.

Notes

pandas.factorize() offers a sort keyword as well.

Examples

>>> idx1 = pd.PeriodIndex(["2014-01", "2014-01", "2014-02", "2014-02",
...                       "2014-03", "2014-03"], freq="M")
>>> arr, idx = idx1.factorize()
>>> arr
array([0, 0, 1, 1, 2, 2])
>>> idx
PeriodIndex(['2014-01', '2014-02', '2014-03'], dtype='period[M]')
Parameters:

use_na_sentinel (bool)

Return type:

tuple[ndarray, ExtensionArray]

fillna(value=None, method=None, limit=None, copy=True)

Fill NA/NaN values using the specified method.

Parameters

value : scalar, array-like

If a scalar value is passed it is used to fill all missing values. Alternatively, an array-like “value” can be given. It’s expected that the array-like have the same length as ‘self’.

method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None

Method to use for filling holes in reindexed Series:

  • pad / ffill: propagate last valid observation forward to next valid.

  • backfill / bfill: use NEXT valid observation to fill gap.

Deprecated since version 2.1.0.

limit : int, default None

If method is specified, this is the maximum number of consecutive NaN values to forward/backward fill. In other words, if there is a gap with more than this number of consecutive NaNs, it will only be partially filled. If method is not specified, this is the maximum number of entries along the entire axis where NaNs will be filled.

Deprecated since version 2.1.0.

copy : bool, default True

Whether to make a copy of the data before filling. If False, then the original should be modified and no new memory should be allocated. For ExtensionArray subclasses that cannot do this, it is at the author’s discretion whether to ignore “copy=False” or to raise. The base class implementation ignores the keyword in pad/backfill cases.

Returns

ExtensionArray

With NA/NaN filled.

Examples

>>> arr = pd.array([np.nan, np.nan, 2, 3, np.nan, np.nan])
>>> arr.fillna(0)
<IntegerArray>
[0, 0, 2, 3, 0, 0]
Length: 6, dtype: Int64
Parameters:
  • value (object | ArrayLike | None)

  • method (FillnaOptions | None)

  • limit (int | None)

  • copy (bool)

Return type:

Self

insert(loc, item)

Insert an item at the given position.

Parameters

loc : int

item : scalar-like

Returns

same type as self

Notes

This method should be both type and dtype-preserving. If the item cannot be held in an array of this type/dtype, either ValueError or TypeError should be raised.

The default implementation relies on _from_sequence to raise on invalid items.

Examples

>>> arr = pd.array([1, 2, 3])
>>> arr.insert(2, -1)
<IntegerArray>
[1, 2, -1, 3]
Length: 4, dtype: Int64
Parameters:

loc (int)

Return type:

Self

interpolate(*, method, axis, index, limit, limit_direction, limit_area, copy, **kwargs)

See DataFrame.interpolate.__doc__.

Examples

>>> arr = pd.arrays.NumpyExtensionArray(np.array([0, 1, np.nan, 3]))
>>> arr.interpolate(method="linear",
...                 limit=3,
...                 limit_direction="forward",
...                 index=pd.Index([1, 2, 3, 4]),
...                 fill_value=1,
...                 copy=False,
...                 axis=0,
...                 limit_area="inside"
...                 )
<NumpyExtensionArray>
[0.0, 1.0, 2.0, 3.0]
Length: 4, dtype: float64
Parameters:
  • method (InterpolateOptions)

  • axis (int)

  • index (Index)

  • copy (bool)

Return type:

Self

isin(values)

Pointwise comparison for set containment in the given values.

Roughly equivalent to np.array([x in values for x in self])

Parameters

values : np.ndarray or ExtensionArray

Returns

np.ndarray[bool]

Examples

>>> arr = pd.array([1, 2, 3])
>>> arr.isin([1])
<BooleanArray>
[True, False, False]
Length: 3, dtype: boolean
Parameters:

values (ArrayLike)

Return type:

npt.NDArray[np.bool_]

isna()

A 1-D array indicating if each value is missing.

Returns

numpy.ndarray or pandas.api.extensions.ExtensionArray

In most cases, this should return a NumPy ndarray. For exceptional cases like SparseArray, where returning an ndarray would be expensive, an ExtensionArray may be returned.

Notes

If returning an ExtensionArray, then

  • na_values._is_boolean should be True

  • na_values should implement ExtensionArray._reduce()

  • na_values.any and na_values.all should be implemented

Examples

>>> arr = pd.array([1, 2, np.nan, np.nan])
>>> arr.isna()
array([False, False,  True,  True])
map(mapper, na_action=None)

Map values using an input mapping or function.

Parameters

mapper : function, dict, or Series

Mapping correspondence.

na_action : {None, ‘ignore’}, default None

If ‘ignore’, propagate NA values, without passing them to the mapping correspondence. If ‘ignore’ is not supported, a NotImplementedError should be raised.

Returns

Union[ndarray, Index, ExtensionArray]

The output of the mapping function applied to the array. If the function returns a tuple with more than one element a MultiIndex will be returned.
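
As a rough illustration of what na_action=‘ignore’ means for map, here is a plain-Python model. It is a sketch under stated assumptions, not the pandas code: map_values is a hypothetical helper, None stands in for NA, and keys missing from a dict mapper are assumed to map to None.

```python
def map_values(values, mapper, na_action=None):
    """Apply `mapper` (a callable or a dict) to each element of `values`."""
    # Assumption: keys missing from a dict mapper map to None (our NA stand-in).
    func = mapper.get if isinstance(mapper, dict) else mapper
    result = []
    for v in values:
        if v is None and na_action == "ignore":
            result.append(None)  # propagate NA without calling the mapper
        else:
            result.append(func(v))
    return result


print(map_values([1, None, 3], lambda x: x * 10, na_action="ignore"))
print(map_values([1, 2], {1: "a"}))
```

Without na_action="ignore", the lambda above would receive None and fail, which is why propagating NA values is often the safer choice for arithmetic mappers.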

nbytes()

The number of bytes needed to store this object in memory.

Examples

>>> pd.array([1, 2, 3]).nbytes
27
property ndim: int

Extension Arrays are only allowed to be 1-dimensional.

Examples

>>> arr = pd.array([1, 2, 3])
>>> arr.ndim
1
ravel(order='C')

Return a flattened view on this array.

Parameters

order : {None, ‘C’, ‘F’, ‘A’, ‘K’}, default ‘C’

Returns

ExtensionArray

Notes

  • Because ExtensionArrays are 1D-only, this is a no-op.

  • The “order” argument is ignored; it is accepted only for compatibility with NumPy.

Examples

>>> pd.array([1, 2, 3]).ravel()
<IntegerArray>
[1, 2, 3]
Length: 3, dtype: Int64
Parameters:

order (Literal['C', 'F', 'A', 'K'] | None)

Return type:

ExtensionArray

repeat(repeats, axis=None)

Repeat elements of an ExtensionArray.

Returns a new ExtensionArray where each element of the current ExtensionArray is repeated consecutively a given number of times.

Parameters

repeats : int or array of ints

The number of repetitions for each element. This should be a non-negative integer. Repeating 0 times will return an empty ExtensionArray.

axis : None

Must be None. Has no effect but is accepted for compatibility with numpy.

Returns

ExtensionArray

Newly created ExtensionArray with repeated elements.

See Also

Series.repeat : Equivalent function for Series.

Index.repeat : Equivalent function for Index.

numpy.repeat : Similar method for numpy.ndarray.

ExtensionArray.take : Take arbitrary positions.

Examples

>>> cat = pd.Categorical(['a', 'b', 'c'])
>>> cat
['a', 'b', 'c']
Categories (3, object): ['a', 'b', 'c']
>>> cat.repeat(2)
['a', 'a', 'b', 'b', 'c', 'c']
Categories (3, object): ['a', 'b', 'c']
>>> cat.repeat([1, 2, 3])
['a', 'b', 'b', 'c', 'c', 'c']
Categories (3, object): ['a', 'b', 'c']
Parameters:
  • repeats (int | Sequence[int])

  • axis (AxisInt | None)

Return type:

Self

searchsorted(value, side='left', sorter=None)

Find indices where elements should be inserted to maintain order.

Find the indices into a sorted array self (a) such that, if the corresponding elements in value were inserted before the indices, the order of self would be preserved.

Assuming that self is sorted:

side     returned index i satisfies
left     self[i-1] < value <= self[i]
right    self[i-1] <= value < self[i]

Parameters

value : array-like, list or scalar

Value(s) to insert into self.

side : {‘left’, ‘right’}, optional

If ‘left’, the index of the first suitable location found is given. If ‘right’, return the last such index. If there is no suitable index, return either 0 or N (where N is the length of self).

sorter : 1-D array-like, optional

Optional array of integer indices that sort array a into ascending order. They are typically the result of argsort.

Returns

array of ints or int

If value is array-like, array of insertion points. If value is scalar, a single integer.

See Also

numpy.searchsorted : Similar method from NumPy.

Examples

>>> arr = pd.array([1, 2, 3, 5])
>>> arr.searchsorted([4])
array([3])
Parameters:
  • value (NumpyValueArrayLike | ExtensionArray)

  • side (Literal['left', 'right'])

  • sorter (NumpySorter | None)

Return type:

npt.NDArray[np.intp] | np.intp
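
The left/right conditions documented for searchsorted follow the same convention as the standard library's bisect module, so a short stdlib example can make the two sides concrete. This is an analogy only; it does not show how pandas computes the result, and the list name data is illustrative.

```python
from bisect import bisect_left, bisect_right

data = [1, 2, 2, 3, 5]  # must already be sorted

left = bisect_left(data, 2)    # data[left-1] < 2 <= data[left]
right = bisect_right(data, 2)  # data[right-1] <= 2 < data[right]
print(left, right)

# A value absent from the data gets the same insertion point on both sides.
print(bisect_left(data, 4), bisect_right(data, 4))
```

The two sides differ only when the searched value is already present: left points at the first equal element, right points just past the last one.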

property shape: Shape

Return a tuple of the array dimensions.

Examples

>>> arr = pd.array([1, 2, 3])
>>> arr.shape
(3,)
shift(periods=1, fill_value=None)

Shift values by desired number.

Newly introduced missing values are filled with self.dtype.na_value.

Parameters

periods : int, default 1

The number of periods to shift. Negative values are allowed for shifting backwards.

fill_value : object, optional

The scalar value to use for newly introduced missing values. The default is self.dtype.na_value.

Returns

ExtensionArray

Shifted.

Notes

If self is empty or periods is 0, a copy of self is returned.

If periods > len(self), then an array of size len(self) is returned, with all values filled with self.dtype.na_value.

For 2-dimensional ExtensionArrays, we are always shifting along axis=0.

Examples

>>> arr = pd.array([1, 2, 3])
>>> arr.shift(2)
<IntegerArray>
[<NA>, <NA>, 1]
Length: 3, dtype: Int64
Parameters:
  • periods (int)

  • fill_value (object)

Return type:

ExtensionArray
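
The rules in the shift notes (zero or empty returns a copy, oversized shifts return only fill values, negative periods shift backwards) can be modelled in a few lines. This is a plain-Python sketch, not the pandas implementation; shift_values is a hypothetical helper and None plays the role of self.dtype.na_value.

```python
def shift_values(values, periods=1, fill_value=None):
    """Shift a list by `periods`, padding the vacated slots with `fill_value`."""
    n = len(values)
    if n == 0 or periods == 0:
        return list(values)            # a copy, as the notes describe
    k = min(abs(periods), n)           # never pad more slots than exist
    pad = [fill_value] * k
    if periods > 0:
        return pad + list(values[: n - k])   # shift forwards
    return list(values[k:]) + pad            # shift backwards


print(shift_values([1, 2, 3], 2))
print(shift_values([1, 2, 3], -1, fill_value=0))
print(shift_values([1, 2, 3], 5))
```

The result always has the same length as the input, whatever the value of periods.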

property size: int

The number of elements in the array.

take(indexer, allow_fill=False, fill_value=None)

Take elements from an array.

Parameters

indices : sequence of int or one-dimensional np.ndarray of int

Indices to be taken.

allow_fill : bool, default False

How to handle negative values in indices.

  • False: negative values in indices indicate positional indices from the right (the default). This is similar to numpy.take().

  • True: -1 in indices indicates a missing value, which is set to fill_value. Any other negative value raises a ValueError.

fill_value : any, optional

Fill value to use for NA-indices when allow_fill is True. This may be None, in which case the default NA value for the type, self.dtype.na_value, is used.

For many ExtensionArrays, there will be two representations of fill_value: a user-facing “boxed” scalar, and a low-level physical NA value. fill_value should be the user-facing version, and the implementation should handle translating that to the physical version for processing the take if necessary.

Returns

ExtensionArray

Raises

IndexError

When the indices are out of bounds for the array.

ValueError

When indices contains negative values other than -1 and allow_fill is True.

See Also

numpy.take : Take elements from an array along an axis.

api.extensions.take : Take elements from an array.

Notes

ExtensionArray.take is called by Series.__getitem__, .loc, and .iloc when indices is a sequence of values. Additionally, it’s called by Series.reindex(), or any other method that causes realignment, with a fill_value.

Examples

Here’s an example implementation, which relies on casting the extension array to object dtype. This uses the helper method pandas.api.extensions.take().

def take(self, indices, allow_fill=False, fill_value=None):
    from pandas.api.extensions import take

    # If the ExtensionArray is backed by an ndarray, then
    # just pass that here instead of coercing to object.
    data = self.astype(object)

    if allow_fill and fill_value is None:
        fill_value = self.dtype.na_value

    # fill value should always be translated from the scalar
    # type for the array, to the physical storage type for
    # the data, before passing to take.

    result = take(data, indices, fill_value=fill_value,
                  allow_fill=allow_fill)
    return self._from_sequence(result, dtype=self.dtype)
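
To make the allow_fill rules for take concrete, the following plain-Python model reproduces the documented index handling. It is a sketch, not the pandas or pymongoarrow code: take_values is a hypothetical helper and None stands in for the NA value.

```python
def take_values(values, indices, allow_fill=False, fill_value=None):
    """Model of the documented index semantics, using plain lists."""
    n = len(values)
    taken = []
    for i in indices:
        if allow_fill and i < 0:
            if i != -1:
                raise ValueError("only -1 may mark a missing value")
            taken.append(fill_value)   # -1 means "missing" in this mode
            continue
        if not -n <= i < n:
            raise IndexError(f"index {i} is out of bounds for size {n}")
        taken.append(values[i])        # negative i counts from the right
    return taken


print(take_values([10, 20, 30], [0, -1]))
print(take_values([10, 20, 30], [0, -1], allow_fill=True))
```

The same index list yields different results in the two modes: -1 selects the last element when allow_fill is False and produces a fill value when it is True.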
to_numpy(dtype=None, copy=False, na_value=<no_default>)

Convert to a NumPy ndarray.

This is similar to numpy.asarray(), but may provide additional control over how the conversion is done.

Parameters

dtype : str or numpy.dtype, optional

The dtype to pass to numpy.asarray().

copy : bool, default False

Whether to ensure that the returned value is not a view on another array. Note that copy=False does not ensure that to_numpy() is no-copy. Rather, copy=True ensures that a copy is made, even if not strictly necessary.

na_value : Any, optional

The value to use for missing values. The default value depends on dtype and the type of the array.

Returns

numpy.ndarray

Parameters:
  • dtype (npt.DTypeLike | None)

  • copy (bool)

  • na_value (object)

Return type:

np.ndarray

tolist()

Return a list of the values.

Each value is a scalar type: either a Python scalar (for str, int, float) or a pandas scalar (for Timestamp/Timedelta/Interval/Period).

Returns

list

Examples

>>> arr = pd.array([1, 2, 3])
>>> arr.tolist()
[1, 2, 3]
Return type:

list

transpose(*axes)

Return a transposed view on this array.

Because ExtensionArrays are always 1D, this is a no-op. It is included for compatibility with np.ndarray.

Returns

ExtensionArray

Examples

>>> pd.array([1, 2, 3]).transpose()
<IntegerArray>
[1, 2, 3]
Length: 3, dtype: Int64
Parameters:

axes (int)

Return type:

ExtensionArray

unique()

Compute the ExtensionArray of unique values.

Returns

pandas.api.extensions.ExtensionArray

Examples

>>> arr = pd.array([1, 2, 3, 1, 2, 3])
>>> arr.unique()
<IntegerArray>
[1, 2, 3]
Length: 3, dtype: Int64
Return type:

Self

view(dtype=None)

Return a view on the array.

Parameters

dtype : str, np.dtype, or ExtensionDtype, optional

Default None.

Returns

ExtensionArray or np.ndarray

A view on the ExtensionArray’s data.

Examples

This gives a view on the underlying data of an ExtensionArray and is not a copy. Modifications on either the view or the original ExtensionArray will be reflected in the underlying data:

>>> arr = pd.array([1, 2, 3])
>>> arr2 = arr.view()
>>> arr[0] = 2
>>> arr2
<IntegerArray>
[2, 2, 3]
Length: 3, dtype: Int64
Parameters:

dtype (Dtype | None)

Return type:

ArrayLike

class pymongoarrow.pandas_types.PandasCode

A pandas extension type for BSON Code data type.

classmethod construct_array_type()

Return the array type associated with this dtype.

Returns

type

Return type:

type[PandasCodeArray]

classmethod construct_from_string(string)

Construct this type from a string.

This is useful mainly for data types that accept parameters. For example, a period dtype accepts a frequency parameter that can be set as period[h] (where h means hourly frequency).

By default, in the abstract class, just the name of the type is expected. But subclasses can override this method to accept parameters.

Parameters

string : str

The name of the type, for example category.

Returns

ExtensionDtype

Instance of the dtype.

Raises

TypeError

If a class cannot be constructed from this ‘string’.

Examples

For extension dtypes with arguments the following may be an adequate implementation.

>>> import re
>>> @classmethod
... def construct_from_string(cls, string):
...     pattern = re.compile(r"^my_type\[(?P<arg_name>.+)\]$")
...     match = pattern.match(string)
...     if match:
...         return cls(**match.groupdict())
...     else:
...         raise TypeError(
...             f"Cannot construct a '{cls.__name__}' from '{string}'"
...         )
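
The construct_from_string fragment above is a bare method, so it cannot be run on its own. A self-contained variant might look like the following. MyType and its arg_name parameter are hypothetical names used only for illustration; they are not part of pandas or pymongoarrow, and the class deliberately does not subclass ExtensionDtype so that it runs with the standard library alone.

```python
import re


class MyType:
    """Hypothetical parametrized dtype; not part of pandas or pymongoarrow."""

    _pattern = re.compile(r"^my_type\[(?P<arg_name>.+)\]$")

    def __init__(self, arg_name):
        self.arg_name = arg_name

    @classmethod
    def construct_from_string(cls, string):
        if not isinstance(string, str):
            raise TypeError(f"Expected a string, got {type(string).__name__}")
        match = cls._pattern.match(string)
        if match is None:
            raise TypeError(f"Cannot construct a '{cls.__name__}' from '{string}'")
        # The named group becomes the keyword argument of the constructor.
        return cls(**match.groupdict())


dtype = MyType.construct_from_string("my_type[ms]")
print(dtype.arg_name)
```

Raising TypeError for strings that do not match lets callers try several dtypes in turn and move on when one of them rejects the string.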
empty(shape)

Construct an ExtensionArray of this dtype with the given shape.

Analogous to numpy.empty.

Parameters

shape : int or tuple[int]

Returns

ExtensionArray

Parameters:

shape (Shape)

Return type:

ExtensionArray

index_class

The Index subclass to return from Index.__new__ when this dtype is encountered.

classmethod is_dtype(dtype)

Check if we match ‘dtype’.

Parameters

dtype : object

The object to check.

Returns

bool

Notes

The default implementation is True if

  1. cls.construct_from_string(dtype) is an instance of cls.

  2. dtype is an object and is an instance of cls

  3. dtype has a dtype attribute, and any of the above conditions is true for dtype.dtype.

Parameters:

dtype (object)

Return type:

bool
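
The three conditions listed in the is_dtype notes can be sketched as follows. This is a simplified model, not the pandas implementation: ToyDtype is a hypothetical class, and the real method handles additional cases.

```python
from types import SimpleNamespace


class ToyDtype:
    """Hypothetical dtype used only to illustrate the three checks."""

    name = "toy"

    @classmethod
    def construct_from_string(cls, string):
        if string == cls.name:
            return cls()
        raise TypeError(f"Cannot construct a '{cls.__name__}' from '{string}'")

    @classmethod
    def is_dtype(cls, dtype):
        # Condition 3: unwrap objects that carry a .dtype attribute.
        dtype = getattr(dtype, "dtype", dtype)
        # Condition 2: an instance of the class itself.
        if isinstance(dtype, cls):
            return True
        # Condition 1: a string that construct_from_string accepts.
        if isinstance(dtype, str):
            try:
                return isinstance(cls.construct_from_string(dtype), cls)
            except TypeError:
                return False
        return False


holder = SimpleNamespace(dtype=ToyDtype())  # stands in for an array or Series
print(ToyDtype.is_dtype("toy"), ToyDtype.is_dtype(holder), ToyDtype.is_dtype("int64"))
```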

property kind: str

A character code (one of ‘biufcmMOSUV’), default ‘O’

This should match the NumPy dtype used when the array is converted to an ndarray, which is probably ‘O’ for object if the extension type cannot be represented as a built-in NumPy type.

See Also

numpy.dtype.kind

property name: str

A string identifying the data type.

Will be used for display in, e.g. Series.dtype

property names: list[str] | None

Ordered list of field names, or None if there are no fields.

This is for compatibility with NumPy arrays, and may be removed in the future.

type

alias of Code

class pymongoarrow.pandas_types.PandasCodeArray(values, dtype, copy=False)

A pandas extension type for BSON Code data arrays.

argmax(skipna=True)

Return the index of maximum value.

In case of multiple occurrences of the maximum value, the index corresponding to the first occurrence is returned.

Parameters

skipna : bool, default True

Returns

int

See Also

ExtensionArray.argmin : Return the index of the minimum value.

Examples

>>> arr = pd.array([3, 1, 2, 5, 4])
>>> arr.argmax()
3
Parameters:

skipna (bool)

Return type:

int

argmin(skipna=True)

Return the index of minimum value.

In case of multiple occurrences of the minimum value, the index corresponding to the first occurrence is returned.

Parameters

skipna : bool, default True

Returns

int

See Also

ExtensionArray.argmax : Return the index of the maximum value.

Examples

>>> arr = pd.array([3, 1, 2, 5, 4])
>>> arr.argmin()
1
Parameters:

skipna (bool)

Return type:

int

argsort(*, ascending=True, kind='quicksort', na_position='last', **kwargs)

Return the indices that would sort this array.

Parameters

ascending : bool, default True

Whether the indices should result in an ascending or descending sort.

kind : {‘quicksort’, ‘mergesort’, ‘heapsort’, ‘stable’}, optional

Sorting algorithm.

na_position : {‘first’, ‘last’}, default ‘last’

If 'first', put NaN values at the beginning. If 'last', put NaN values at the end.

**kwargs

Passed through to numpy.argsort().

Returns

np.ndarray[np.intp]

Array of indices that sort self. NaN values are placed according to na_position (at the end by default).

See Also

numpy.argsort : Sorting implementation used internally.

Examples

>>> arr = pd.array([3, 1, 2, 5, 4])
>>> arr.argsort()
array([1, 2, 0, 4, 3])
Parameters:
  • ascending (bool)

  • kind (SortKind)

  • na_position (str)

Return type:

np.ndarray
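
How ascending and na_position combine in argsort can be shown with a small plain-Python model. It is a sketch rather than the pandas implementation; argsort_values is a hypothetical helper and None stands in for NaN.

```python
def argsort_values(values, ascending=True, na_position="last"):
    """Return the positions that would sort `values`, handling None as missing."""
    valid = [i for i, v in enumerate(values) if v is not None]
    missing = [i for i, v in enumerate(values) if v is None]
    # Only the non-missing positions are ordered by value.
    valid.sort(key=lambda i: values[i], reverse=not ascending)
    return missing + valid if na_position == "first" else valid + missing


print(argsort_values([3, 1, 2, 5, 4]))
print(argsort_values([3, None, 1], na_position="first"))
print(argsort_values([3, None, 1], ascending=False))
```

In this model the missing positions are placed by na_position alone, so a descending sort still leaves them at the end unless na_position is ‘first’.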

astype(dtype, copy=True)

Cast to a NumPy array or ExtensionArray with ‘dtype’.

Parameters

dtype : str or dtype

Typecode or data-type to which the array is cast.

copy : bool, default True

Whether to copy the data, even if not necessary. If False, a copy is made only if the old dtype does not match the new dtype.

Returns

np.ndarray or pandas.api.extensions.ExtensionArray

An ExtensionArray if dtype is an ExtensionDtype, otherwise a NumPy ndarray whose dtype is dtype.

Examples

>>> arr = pd.array([1, 2, 3])
>>> arr
<IntegerArray>
[1, 2, 3]
Length: 3, dtype: Int64

Casting to another ExtensionDtype returns an ExtensionArray:

>>> arr1 = arr.astype('Float64')
>>> arr1
<FloatingArray>
[1.0, 2.0, 3.0]
Length: 3, dtype: Float64
>>> arr1.dtype
Float64Dtype()

Otherwise, we will get a NumPy ndarray:

>>> arr2 = arr.astype('float64')
>>> arr2
array([1., 2., 3.])
>>> arr2.dtype
dtype('float64')
Parameters:
  • dtype (AstypeArg)

  • copy (bool)

Return type:

ArrayLike

copy()

Return a copy of the array.

Returns

ExtensionArray

Examples

>>> arr = pd.array([1, 2, 3])
>>> arr2 = arr.copy()
>>> arr[0] = 2
>>> arr2
<IntegerArray>
[1, 2, 3]
Length: 3, dtype: Int64
dropna()

Return ExtensionArray without NA values.

Returns

ExtensionArray

Examples

>>> pd.array([1, 2, np.nan]).dropna()
<IntegerArray>
[1, 2]
Length: 2, dtype: Int64
Return type:

Self

property dtype

An instance of ExtensionDtype.

Examples

>>> pd.array([1, 2, 3]).dtype
Int64Dtype()
duplicated(keep='first')

Return boolean ndarray denoting duplicate values.

Parameters

keep : {‘first’, ‘last’, False}, default ‘first’
  • first : Mark duplicates as True except for the first occurrence.

  • last : Mark duplicates as True except for the last occurrence.

  • False : Mark all duplicates as True.

Returns

ndarray[bool]

Examples

>>> pd.array([1, 1, 2, 3, 3], dtype="Int64").duplicated()
array([False,  True, False, False,  True])
Parameters:

keep (Literal['first', 'last', False])

Return type:

npt.NDArray[np.bool_]
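
The three keep options of duplicated differ only in which occurrence is left unmarked. The plain-Python model below illustrates this. It is a sketch, not the pandas implementation: duplicated_flags is a hypothetical helper and the values are assumed to be hashable.

```python
from collections import Counter


def duplicated_flags(values, keep="first"):
    """Return a list of booleans marking duplicates according to `keep`."""
    if keep is False:
        counts = Counter(values)
        return [counts[v] > 1 for v in values]   # mark every repeated value
    ordered = values if keep == "first" else list(reversed(values))
    seen, flags = set(), []
    for v in ordered:
        flags.append(v in seen)                  # True once the value was seen
        seen.add(v)
    return flags if keep == "first" else flags[::-1]


data = [1, 1, 2, 3, 3]
print(duplicated_flags(data))
print(duplicated_flags(data, keep="last"))
print(duplicated_flags(data, keep=False))
```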

equals(other)

Return whether another array is equivalent to this array.

Equivalent means that both arrays have the same shape and dtype, and all values compare equal. Missing values in the same location are considered equal (in contrast with normal equality).

Parameters

other : ExtensionArray

Array to compare to this Array.

Returns

boolean

Whether the arrays are equivalent.

Examples

>>> arr1 = pd.array([1, 2, np.nan])
>>> arr2 = pd.array([1, 2, np.nan])
>>> arr1.equals(arr2)
True
Parameters:

other (object)

Return type:

bool
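
The remark that missing values in the same location compare equal in equals is the part that differs from ordinary equality. A plain-Python model may help here. It is a sketch only: arrays_equal and _is_na are hypothetical helpers, lists stand in for arrays, and the dtype comparison that the real method performs is omitted.

```python
import math


def _is_na(x):
    """Treat None and float NaN as missing."""
    return x is None or (isinstance(x, float) and math.isnan(x))


def arrays_equal(left, right):
    """Element-wise equality in which aligned missing values count as equal."""
    if len(left) != len(right):
        return False
    for a, b in zip(left, right):
        if _is_na(a) or _is_na(b):
            if not (_is_na(a) and _is_na(b)):
                return False   # missing on one side only
        elif a != b:
            return False
    return True


print(float("nan") == float("nan"))                            # normal equality
print(arrays_equal([1.0, float("nan")], [1.0, float("nan")]))  # NA-aware equality
```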

factorize(use_na_sentinel=True)

Encode the extension array as an enumerated type.

Parameters

use_na_sentinel : bool, default True

If True, the sentinel -1 will be used for NaN values. If False, NaN values will be encoded as non-negative integers and will not drop the NaN from the uniques of the values.

Added in version 1.5.0.

Returns

codes : ndarray

An integer NumPy array that’s an indexer into the original ExtensionArray.

uniques : ExtensionArray

An ExtensionArray containing the unique values of self.

Note

uniques will not contain an entry for the NA value of the ExtensionArray if there are any missing values present in self.

See Also

factorize : Top-level factorize method that dispatches here.

Notes

pandas.factorize() offers a sort keyword as well.

Examples

>>> idx1 = pd.PeriodIndex(["2014-01", "2014-01", "2014-02", "2014-02",
...                       "2014-03", "2014-03"], freq="M")
>>> arr, idx = idx1.factorize()
>>> arr
array([0, 0, 1, 1, 2, 2])
>>> idx
PeriodIndex(['2014-01', '2014-02', '2014-03'], dtype='period[M]')
Parameters:

use_na_sentinel (bool)

Return type:

tuple[ndarray, ExtensionArray]
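
The effect of use_na_sentinel on the codes and uniques returned by factorize can be illustrated with a plain-Python model. This is a sketch, not the pandas implementation: factorize_values is a hypothetical helper, None stands in for NA, and the values are assumed to be hashable.

```python
def factorize_values(values, use_na_sentinel=True):
    """Return (codes, uniques) such that uniques[code] recovers each value."""
    codes, uniques, positions = [], [], {}
    for v in values:
        if v is None and use_na_sentinel:
            codes.append(-1)               # sentinel: NA is kept out of uniques
            continue
        if v not in positions:
            positions[v] = len(uniques)    # first appearance defines the code
            uniques.append(v)
        codes.append(positions[v])
    return codes, uniques


print(factorize_values(["a", "a", "b", None]))
print(factorize_values(["a", "a", "b", None], use_na_sentinel=False))
```

With the sentinel enabled, uniques contains no entry for the missing value. With it disabled, the missing value receives an ordinary non-negative code.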

fillna(value=None, method=None, limit=None, copy=True)

Fill NA/NaN values using the specified method.

Parameters

value : scalar, array-like

If a scalar value is passed it is used to fill all missing values. Alternatively, an array-like “value” can be given. It’s expected that the array-like have the same length as ‘self’.

method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None

Method to use for filling holes in reindexed Series:

  • pad / ffill: propagate last valid observation forward to next valid.

  • backfill / bfill: use NEXT valid observation to fill gap.

Deprecated since version 2.1.0.

limit : int, default None

If method is specified, this is the maximum number of consecutive NaN values to forward/backward fill. In other words, if there is a gap with more than this number of consecutive NaNs, it will only be partially filled. If method is not specified, this is the maximum number of entries along the entire axis where NaNs will be filled.

Deprecated since version 2.1.0.

copy : bool, default True

Whether to make a copy of the data before filling. If False, then the original should be modified and no new memory should be allocated. For ExtensionArray subclasses that cannot do this, it is at the author’s discretion whether to ignore “copy=False” or to raise. The base class implementation ignores the keyword in pad/backfill cases.

Returns

ExtensionArray

With NA/NaN filled.

Examples

>>> arr = pd.array([np.nan, np.nan, 2, 3, np.nan, np.nan])
>>> arr.fillna(0)
<IntegerArray>
[0, 0, 2, 3, 0, 0]
Length: 6, dtype: Int64
Parameters:
  • value (object | ArrayLike | None)

  • method (FillnaOptions | None)

  • limit (int | None)

  • copy (bool)

Return type:

Self

insert(loc, item)

Insert an item at the given position.

Parameters

loc : int

item : scalar-like

Returns

same type as self

Notes

This method should be both type and dtype-preserving. If the item cannot be held in an array of this type/dtype, either ValueError or TypeError should be raised.

The default implementation relies on _from_sequence to raise on invalid items.

Examples

>>> arr = pd.array([1, 2, 3])
>>> arr.insert(2, -1)
<IntegerArray>
[1, 2, -1, 3]
Length: 4, dtype: Int64
Parameters:

loc (int)

Return type:

Self

interpolate(*, method, axis, index, limit, limit_direction, limit_area, copy, **kwargs)

See DataFrame.interpolate.__doc__.

Examples

>>> arr = pd.arrays.NumpyExtensionArray(np.array([0, 1, np.nan, 3]))
>>> arr.interpolate(method="linear",
...                 limit=3,
...                 limit_direction="forward",
...                 index=pd.Index([1, 2, 3, 4]),
...                 fill_value=1,
...                 copy=False,
...                 axis=0,
...                 limit_area="inside"
...                 )
<NumpyExtensionArray>
[0.0, 1.0, 2.0, 3.0]
Length: 4, dtype: float64
Parameters:
  • method (InterpolateOptions)

  • axis (int)

  • index (Index)

  • copy (bool)

Return type:

Self

isin(values)

Pointwise comparison for set containment in the given values.

Roughly equivalent to np.array([x in values for x in self])

Parameters

values : np.ndarray or ExtensionArray

Returns

np.ndarray[bool]

Examples

>>> arr = pd.array([1, 2, 3])
>>> arr.isin([1])
<BooleanArray>
[True, False, False]
Length: 3, dtype: boolean
Parameters:

values (ArrayLike)

Return type:

npt.NDArray[np.bool_]

isna()

A 1-D array indicating if each value is missing.

Returns

numpy.ndarray or pandas.api.extensions.ExtensionArray

In most cases, this should return a NumPy ndarray. For exceptional cases like SparseArray, where returning an ndarray would be expensive, an ExtensionArray may be returned.

Notes

If returning an ExtensionArray, then

  • na_values._is_boolean should be True

  • na_values should implement ExtensionArray._reduce()

  • na_values.any and na_values.all should be implemented

Examples

>>> arr = pd.array([1, 2, np.nan, np.nan])
>>> arr.isna()
array([False, False,  True,  True])
map(mapper, na_action=None)

Map values using an input mapping or function.

Parameters

mapper : function, dict, or Series

Mapping correspondence.

na_action : {None, ‘ignore’}, default None

If ‘ignore’, propagate NA values, without passing them to the mapping correspondence. If ‘ignore’ is not supported, a NotImplementedError should be raised.

Returns

Union[ndarray, Index, ExtensionArray]

The output of the mapping function applied to the array. If the function returns a tuple with more than one element a MultiIndex will be returned.

nbytes()

The number of bytes needed to store this object in memory.

Examples

>>> pd.array([1, 2, 3]).nbytes
27
property ndim: int

Extension Arrays are only allowed to be 1-dimensional.

Examples

>>> arr = pd.array([1, 2, 3])
>>> arr.ndim
1
ravel(order='C')

Return a flattened view on this array.

Parameters

order : {None, ‘C’, ‘F’, ‘A’, ‘K’}, default ‘C’

Returns

ExtensionArray

Notes

  • Because ExtensionArrays are 1D-only, this is a no-op.

  • The “order” argument is ignored; it is accepted only for compatibility with NumPy.

Examples

>>> pd.array([1, 2, 3]).ravel()
<IntegerArray>
[1, 2, 3]
Length: 3, dtype: Int64
Parameters:

order (Literal['C', 'F', 'A', 'K'] | None)

Return type:

ExtensionArray

repeat(repeats, axis=None)

Repeat elements of an ExtensionArray.

Returns a new ExtensionArray where each element of the current ExtensionArray is repeated consecutively a given number of times.

Parameters

repeats : int or array of ints

The number of repetitions for each element. This should be a non-negative integer. Repeating 0 times will return an empty ExtensionArray.

axis : None

Must be None. Has no effect but is accepted for compatibility with numpy.

Returns

ExtensionArray

Newly created ExtensionArray with repeated elements.

See Also

Series.repeat : Equivalent function for Series.

Index.repeat : Equivalent function for Index.

numpy.repeat : Similar method for numpy.ndarray.

ExtensionArray.take : Take arbitrary positions.

Examples

>>> cat = pd.Categorical(['a', 'b', 'c'])
>>> cat
['a', 'b', 'c']
Categories (3, object): ['a', 'b', 'c']
>>> cat.repeat(2)
['a', 'a', 'b', 'b', 'c', 'c']
Categories (3, object): ['a', 'b', 'c']
>>> cat.repeat([1, 2, 3])
['a', 'b', 'b', 'c', 'c', 'c']
Categories (3, object): ['a', 'b', 'c']
Parameters:
  • repeats (int | Sequence[int])

  • axis (AxisInt | None)

Return type:

Self
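
The two accepted forms of repeats (a single int, or one count per element) can be modelled in plain Python. This is a sketch, not the pandas implementation; repeat_values is a hypothetical helper operating on lists.

```python
def repeat_values(values, repeats):
    """Repeat each element consecutively; `repeats` is an int or per-element counts."""
    if isinstance(repeats, int):
        repeats = [repeats] * len(values)     # broadcast a single count
    if len(repeats) != len(values):
        raise ValueError("repeats must be an int or match the length of values")
    if any(r < 0 for r in repeats):
        raise ValueError("repeats must be non-negative")
    return [v for v, r in zip(values, repeats) for _ in range(r)]


print(repeat_values(["a", "b", "c"], 2))
print(repeat_values(["a", "b", "c"], [1, 2, 3]))
print(repeat_values(["a", "b", "c"], 0))
```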

searchsorted(value, side='left', sorter=None)

Find indices where elements should be inserted to maintain order.

Find the indices into a sorted array self (a) such that, if the corresponding elements in value were inserted before the indices, the order of self would be preserved.

Assuming that self is sorted:

side     returned index i satisfies
left     self[i-1] < value <= self[i]
right    self[i-1] <= value < self[i]

Parameters

value : array-like, list or scalar

Value(s) to insert into self.

side : {‘left’, ‘right’}, optional

If ‘left’, the index of the first suitable location found is given. If ‘right’, return the last such index. If there is no suitable index, return either 0 or N (where N is the length of self).

sorter : 1-D array-like, optional

Optional array of integer indices that sort array a into ascending order. They are typically the result of argsort.

Returns

array of ints or int

If value is array-like, array of insertion points. If value is scalar, a single integer.

See Also

numpy.searchsorted : Similar method from NumPy.

Examples

>>> arr = pd.array([1, 2, 3, 5])
>>> arr.searchsorted([4])
array([3])
Parameters:
  • value (NumpyValueArrayLike | ExtensionArray)

  • side (Literal['left', 'right'])

  • sorter (NumpySorter | None)

Return type:

npt.NDArray[np.intp] | np.intp

property shape: Shape

Return a tuple of the array dimensions.

Examples

>>> arr = pd.array([1, 2, 3])
>>> arr.shape
(3,)
shift(periods=1, fill_value=None)

Shift values by desired number.

Newly introduced missing values are filled with self.dtype.na_value.

Parameters

periods : int, default 1

The number of periods to shift. Negative values are allowed for shifting backwards.

fill_value : object, optional

The scalar value to use for newly introduced missing values. The default is self.dtype.na_value.

Returns

ExtensionArray

Shifted.

Notes

If self is empty or periods is 0, a copy of self is returned.

If periods > len(self), then an array of size len(self) is returned, with all values filled with self.dtype.na_value.

For 2-dimensional ExtensionArrays, we are always shifting along axis=0.

Examples

>>> arr = pd.array([1, 2, 3])
>>> arr.shift(2)
<IntegerArray>
[<NA>, <NA>, 1]
Length: 3, dtype: Int64
Parameters:
  • periods (int)

  • fill_value (object)

Return type:

ExtensionArray

property size: int

The number of elements in the array.

take(indexer, allow_fill=False, fill_value=None)

Take elements from an array.

Parameters

indices : sequence of int or one-dimensional np.ndarray of int

Indices to be taken.

allow_fill : bool, default False

How to handle negative values in indices.

  • False: negative values in indices indicate positional indices from the right (the default). This is similar to numpy.take().

  • True: -1 in indices indicates a missing value, which is set to fill_value. Any other negative value raises a ValueError.

fill_value : any, optional

Fill value to use for NA-indices when allow_fill is True. This may be None, in which case the default NA value for the type, self.dtype.na_value, is used.

For many ExtensionArrays, there will be two representations of fill_value: a user-facing “boxed” scalar, and a low-level physical NA value. fill_value should be the user-facing version, and the implementation should handle translating that to the physical version for processing the take if necessary.

Returns

ExtensionArray

Raises

IndexError

When the indices are out of bounds for the array.

ValueError

When indices contains negative values other than -1 and allow_fill is True.

See Also

numpy.take : Take elements from an array along an axis.

api.extensions.take : Take elements from an array.

Notes

ExtensionArray.take is called by Series.__getitem__, .loc, and .iloc when indices is a sequence of values. Additionally, it’s called by Series.reindex(), or any other method that causes realignment, with a fill_value.

Examples

Here’s an example implementation, which relies on casting the extension array to object dtype. This uses the helper method pandas.api.extensions.take().

def take(self, indices, allow_fill=False, fill_value=None):
    from pandas.api.extensions import take

    # If the ExtensionArray is backed by an ndarray, then
    # just pass that here instead of coercing to object.
    data = self.astype(object)

    if allow_fill and fill_value is None:
        fill_value = self.dtype.na_value

    # fill value should always be translated from the scalar
    # type for the array, to the physical storage type for
    # the data, before passing to take.

    result = take(data, indices, fill_value=fill_value,
                  allow_fill=allow_fill)
    return self._from_sequence(result, dtype=self.dtype)
to_numpy(dtype=None, copy=False, na_value=<no_default>)

Convert to a NumPy ndarray.

This is similar to numpy.asarray(), but may provide additional control over how the conversion is done.

Parameters

dtype : str or numpy.dtype, optional

The dtype to pass to numpy.asarray().

copy : bool, default False

Whether to ensure that the returned value is not a view on another array. Note that copy=False does not ensure that to_numpy() is no-copy. Rather, copy=True ensures that a copy is made, even if not strictly necessary.

na_value : Any, optional

The value to use for missing values. The default value depends on dtype and the type of the array.

Returns

numpy.ndarray

Parameters:
  • dtype (npt.DTypeLike | None)

  • copy (bool)

  • na_value (object)

Return type:

np.ndarray

tolist()

Return a list of the values.

Each value is a scalar type: either a Python scalar (for str, int, float) or a pandas scalar (for Timestamp/Timedelta/Interval/Period).

Returns

list

Examples

>>> arr = pd.array([1, 2, 3])
>>> arr.tolist()
[1, 2, 3]
Return type:

list

transpose(*axes)

Return a transposed view on this array.

Because ExtensionArrays are always 1D, this is a no-op. It is included for compatibility with np.ndarray.

Returns

ExtensionArray

Examples

>>> pd.array([1, 2, 3]).transpose()
<IntegerArray>
[1, 2, 3]
Length: 3, dtype: Int64
Parameters:

axes (int)

Return type:

ExtensionArray

unique()

Compute the ExtensionArray of unique values.

Returns

pandas.api.extensions.ExtensionArray

Examples

>>> arr = pd.array([1, 2, 3, 1, 2, 3])
>>> arr.unique()
<IntegerArray>
[1, 2, 3]
Length: 3, dtype: Int64
Return type:

Self

view(dtype=None)

Return a view on the array.

Parameters

dtype : str, np.dtype, or ExtensionDtype, optional

Default None.

Returns

ExtensionArray or np.ndarray

A view on the ExtensionArray’s data.

Examples

This gives a view on the underlying data of an ExtensionArray and is not a copy. Modifications on either the view or the original ExtensionArray will be reflected in the underlying data:

>>> arr = pd.array([1, 2, 3])
>>> arr2 = arr.view()
>>> arr[0] = 2
>>> arr2
<IntegerArray>
[2, 2, 3]
Length: 3, dtype: Int64
Parameters:

dtype (Dtype | None)

Return type:

ArrayLike

class pymongoarrow.pandas_types.PandasDecimal128

A pandas extension type for BSON Decimal128 data type.

classmethod construct_array_type()

Return the array type associated with this dtype.

Returns

type

Return type:

type[PandasDecimal128Array]

classmethod construct_from_string(string)

Construct this type from a string.

This is useful mainly for data types that accept parameters. For example, a period dtype accepts a frequency parameter that can be set as period[h] (where h means hourly frequency).

By default, in the abstract class, just the name of the type is expected. But subclasses can override this method to accept parameters.

Parameters

string : str

The name of the type, for example category.

Returns

ExtensionDtype

Instance of the dtype.

Raises

TypeError

If a class cannot be constructed from this ‘string’.

Examples

For extension dtypes with arguments the following may be an adequate implementation.

>>> import re
>>> @classmethod
... def construct_from_string(cls, string):
...     pattern = re.compile(r"^my_type\[(?P<arg_name>.+)\]$")
...     match = pattern.match(string)
...     if match:
...         return cls(**match.groupdict())
...     else:
...         raise TypeError(
...             f"Cannot construct a '{cls.__name__}' from '{string}'"
...         )
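
The fragment above can be placed in a complete dtype to see it work. The sketch below is an illustration only: UnitDtype, its unit parameter and the unit[...] string format are hypothetical names, not part of pymongoarrow or pandas.

```python
import re

from pandas.api.extensions import ExtensionDtype


class UnitDtype(ExtensionDtype):
    """Hypothetical dtype parameterized by a unit string, e.g. ``unit[kg]``."""

    type = float            # scalar type of the elements
    _metadata = ("unit",)   # attributes used for equality and hashing
    _pattern = re.compile(r"^unit\[(?P<unit>.+)\]$")

    def __init__(self, unit):
        self.unit = unit

    @property
    def name(self):
        return f"unit[{self.unit}]"

    @classmethod
    def construct_from_string(cls, string):
        if not isinstance(string, str):
            raise TypeError(f"Expected a string, got {type(string).__name__}")
        match = cls._pattern.match(string)
        if match:
            # Named groups in the pattern become constructor arguments.
            return cls(**match.groupdict())
        raise TypeError(f"Cannot construct a '{cls.__name__}' from '{string}'")


dtype = UnitDtype.construct_from_string("unit[kg]")

# A string without the bracketed parameter is rejected with TypeError.
message = None
try:
    UnitDtype.construct_from_string("unit")
except TypeError as exc:
    message = str(exc)
```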
empty(shape)

Construct an ExtensionArray of this dtype with the given shape.

Analogous to numpy.empty.

Parameters

shape : int or tuple[int]

Returns

ExtensionArray

Parameters:

shape (Shape)

Return type:

ExtensionArray

index_class

The Index subclass to return from Index.__new__ when this dtype is encountered.

classmethod is_dtype(dtype)

Check if we match ‘dtype’.

Parameters

dtype : object

The object to check.

Returns

bool

Notes

The default implementation returns True if any of the following holds:

  1. cls.construct_from_string(dtype) is an instance of cls.

  2. dtype is an object and is an instance of cls.

  3. dtype has a dtype attribute, and any of the above conditions is true for dtype.dtype.

Parameters:

dtype (object)

Return type:

bool
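
The three rules in the Notes can be checked directly. This sketch assumes pandas' built-in CategoricalDtype, which relies on the default implementation; the pymongoarrow dtypes should follow the same rules with their own type names.

```python
import pandas as pd

dtype_cls = pd.CategoricalDtype

# Rule 1: a string that construct_from_string accepts.
from_string = dtype_cls.is_dtype("category")

# Rule 2: an instance of the dtype class itself.
from_instance = dtype_cls.is_dtype(pd.CategoricalDtype())

# Rule 3: any object whose .dtype attribute satisfies rule 1 or 2.
series = pd.Series(["a", "b"], dtype="category")
from_series = dtype_cls.is_dtype(series)

# A string naming a different dtype does not match.
no_match = dtype_cls.is_dtype("int64")
```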

property kind: str

A character code (one of ‘biufcmMOSUV’), default ‘O’

This should match the NumPy dtype used when the array is converted to an ndarray, which is probably ‘O’ for object if the extension type cannot be represented as a built-in NumPy type.

See Also

numpy.dtype.kind

property name: str

A string identifying the data type.

Will be used for display in, e.g. Series.dtype

property names: list[str] | None

Ordered list of field names, or None if there are no fields.

This is for compatibility with NumPy arrays, and may be removed in the future.

type

alias of Decimal128

class pymongoarrow.pandas_types.PandasDecimal128Array(values, dtype, copy=False)

A pandas extension type for BSON Decimal128 data arrays.

argmax(skipna=True)

Return the index of maximum value.

In case of multiple occurrences of the maximum value, the index corresponding to the first occurrence is returned.

Parameters

skipna : bool, default True

Returns

int

See Also

ExtensionArray.argmin : Return the index of the minimum value.

Examples

>>> arr = pd.array([3, 1, 2, 5, 4])
>>> arr.argmax()
3
Parameters:

skipna (bool)

Return type:

int

argmin(skipna=True)

Return the index of minimum value.

In case of multiple occurrences of the minimum value, the index corresponding to the first occurrence is returned.

Parameters

skipna : bool, default True

Returns

int

See Also

ExtensionArray.argmax : Return the index of the maximum value.

Examples

>>> arr = pd.array([3, 1, 2, 5, 4])
>>> arr.argmin()
1
Parameters:

skipna (bool)

Return type:

int

argsort(*, ascending=True, kind='quicksort', na_position='last', **kwargs)

Return the indices that would sort this array.

Parameters

ascending : bool, default True

Whether the indices should result in an ascending or descending sort.

kind : {‘quicksort’, ‘mergesort’, ‘heapsort’, ‘stable’}, optional

Sorting algorithm.

na_position : {‘first’, ‘last’}, default ‘last’

If 'first', put NaN values at the beginning. If 'last', put NaN values at the end.

**kwargs

Passed through to numpy.argsort().

Returns

np.ndarray[np.intp]

Array of indices that sort self. If NaN values are contained, NaN values are placed at the end.

See Also

numpy.argsort : Sorting implementation used internally.

Examples

>>> arr = pd.array([3, 1, 2, 5, 4])
>>> arr.argsort()
array([1, 2, 0, 4, 3])
Parameters:
  • ascending (bool)

  • kind (SortKind)

  • na_position (str)

Return type:

np.ndarray
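
The effect of ascending and na_position is easiest to see on an array with a missing value. This is a sketch using a built-in pandas nullable integer array; it is assumed, not verified, that pymongoarrow arrays order their missing values the same way.

```python
import pandas as pd

arr = pd.array([3, 1, None, 2], dtype="Int64")

# Default: ascending order, missing values last.
ascending_idx = arr.argsort()

# Missing values first, remaining values ascending.
na_first_idx = arr.argsort(na_position="first")

# Descending order, missing values still last.
descending_idx = arr.argsort(ascending=False)

# The indices can be passed to take() to obtain the sorted array.
sorted_arr = arr.take(ascending_idx)
```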

astype(dtype, copy=True)

Cast to a NumPy array or ExtensionArray with ‘dtype’.

Parameters

dtype : str or dtype

Typecode or data-type to which the array is cast.

copy : bool, default True

Whether to copy the data, even if not necessary. If False, a copy is made only if the old dtype does not match the new dtype.

Returns

np.ndarray or pandas.api.extensions.ExtensionArray

An ExtensionArray if dtype is an ExtensionDtype, otherwise a NumPy ndarray with dtype for its dtype.

Examples

>>> arr = pd.array([1, 2, 3])
>>> arr
<IntegerArray>
[1, 2, 3]
Length: 3, dtype: Int64

Casting to another ExtensionDtype returns an ExtensionArray:

>>> arr1 = arr.astype('Float64')
>>> arr1
<FloatingArray>
[1.0, 2.0, 3.0]
Length: 3, dtype: Float64
>>> arr1.dtype
Float64Dtype()

Otherwise, a NumPy ndarray is returned:

>>> arr2 = arr.astype('float64')
>>> arr2
array([1., 2., 3.])
>>> arr2.dtype
dtype('float64')
Parameters:
  • dtype (AstypeArg)

  • copy (bool)

Return type:

ArrayLike

copy()

Return a copy of the array.

Returns

ExtensionArray

Examples

>>> arr = pd.array([1, 2, 3])
>>> arr2 = arr.copy()
>>> arr[0] = 2
>>> arr2
<IntegerArray>
[1, 2, 3]
Length: 3, dtype: Int64
dropna()

Return ExtensionArray without NA values.

Returns

Examples

>>> pd.array([1, 2, np.nan]).dropna()
<IntegerArray>
[1, 2]
Length: 2, dtype: Int64
Return type:

Self

property dtype

An instance of ExtensionDtype.

Examples

>>> pd.array([1, 2, 3]).dtype
Int64Dtype()
duplicated(keep='first')

Return boolean ndarray denoting duplicate values.

Parameters

keep : {‘first’, ‘last’, False}, default ‘first’
  • first : Mark duplicates as True except for the first occurrence.

  • last : Mark duplicates as True except for the last occurrence.

  • False : Mark all duplicates as True.

Returns

ndarray[bool]

Examples

>>> pd.array([1, 1, 2, 3, 3], dtype="Int64").duplicated()
array([False,  True, False, False,  True])
Parameters:

keep (Literal['first', 'last', False])

Return type:

npt.NDArray[np.bool_]

equals(other)

Return whether another array is equivalent to this array.

Equivalent means that both arrays have the same shape and dtype, and all values compare equal. Missing values in the same location are considered equal (in contrast with normal equality).

Parameters

other : ExtensionArray

Array to compare to this Array.

Returns

boolean

Whether the arrays are equivalent.

Examples

>>> arr1 = pd.array([1, 2, np.nan])
>>> arr2 = pd.array([1, 2, np.nan])
>>> arr1.equals(arr2)
True
Parameters:

other (object)

Return type:

bool
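
The difference between equals() and element-wise == shows up only when missing values are present. A short sketch, assuming built-in pandas nullable arrays:

```python
import pandas as pd

left = pd.array([1, 2, None], dtype="Int64")
right = pd.array([1, 2, None], dtype="Int64")

# equals() treats missing values in the same position as equal.
same = left.equals(right)

# Element-wise comparison propagates the missing value instead.
elementwise = left == right

# Arrays with the same values but a different dtype are not equivalent.
different_dtype = left.equals(pd.array([1.0, 2.0, None], dtype="Float64"))
```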

factorize(use_na_sentinel=True)

Encode the extension array as an enumerated type.

Parameters

use_na_sentinel : bool, default True

If True, the sentinel -1 will be used for NaN values. If False, NaN values will be encoded as non-negative integers and will not drop the NaN from the uniques of the values.

Added in version 1.5.0.

Returns

codes : ndarray

An integer NumPy array that’s an indexer into the original ExtensionArray.

uniques : ExtensionArray

An ExtensionArray containing the unique values of self.

Note

uniques will not contain an entry for the NA value of the ExtensionArray if there are any missing values present in self.

See Also

factorize : Top-level factorize method that dispatches here.

Notes

pandas.factorize() offers a sort keyword as well.

Examples

>>> idx1 = pd.PeriodIndex(["2014-01", "2014-01", "2014-02", "2014-02",
...                       "2014-03", "2014-03"], freq="M")
>>> arr, idx = idx1.factorize()
>>> arr
array([0, 0, 1, 1, 2, 2])
>>> idx
PeriodIndex(['2014-01', '2014-02', '2014-03'], dtype='period[M]')
Parameters:

use_na_sentinel (bool)

Return type:

tuple[ndarray, ExtensionArray]
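
A sketch of how codes and uniques relate, including the -1 sentinel for missing values. It assumes a built-in pandas nullable integer array and pandas 1.5 or later for use_na_sentinel.

```python
import pandas as pd

arr = pd.array([10, None, 20, 10], dtype="Int64")

# Default: missing values get the sentinel code -1 and are left out of uniques.
codes, uniques = arr.factorize()

# The codes index into uniques; with allow_fill=True, -1 maps back to NA.
rebuilt = uniques.take(codes, allow_fill=True)

# With use_na_sentinel=False the missing value gets its own non-negative code
# and appears in uniques.
codes_with_na, uniques_with_na = arr.factorize(use_na_sentinel=False)
```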

fillna(value=None, method=None, limit=None, copy=True)

Fill NA/NaN values using the specified method.

Parameters

value : scalar, array-like

If a scalar value is passed it is used to fill all missing values. Alternatively, an array-like “value” can be given. It’s expected that the array-like has the same length as ‘self’.

method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None

Method to use for filling holes in reindexed Series:

  • pad / ffill: propagate last valid observation forward to next valid.

  • backfill / bfill: use NEXT valid observation to fill gap.

Deprecated since version 2.1.0.

limit : int, default None

If method is specified, this is the maximum number of consecutive NaN values to forward/backward fill. In other words, if there is a gap with more than this number of consecutive NaNs, it will only be partially filled. If method is not specified, this is the maximum number of entries along the entire axis where NaNs will be filled.

Deprecated since version 2.1.0.

copy : bool, default True

Whether to make a copy of the data before filling. If False, then the original should be modified and no new memory should be allocated. For ExtensionArray subclasses that cannot do this, it is at the author’s discretion whether to ignore “copy=False” or to raise. The base class implementation ignores the keyword in pad/backfill cases.

Returns

ExtensionArray

With NA/NaN filled.

Examples

>>> arr = pd.array([np.nan, np.nan, 2, 3, np.nan, np.nan])
>>> arr.fillna(0)
<IntegerArray>
[0, 0, 2, 3, 0, 0]
Length: 6, dtype: Int64
Parameters:
  • value (object | ArrayLike | None)

  • method (FillnaOptions | None)

  • limit (int | None)

  • copy (bool)

Return type:

Self
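
The array-like form of value is not covered by the example above. In this sketch (built-in pandas nullable arrays assumed), each missing position is filled from the same position in the replacement array:

```python
import pandas as pd

arr = pd.array([1, None, 3, None], dtype="Int64")

# Scalar: every missing value is replaced by the same value.
filled_scalar = arr.fillna(0)

# Array-like of the same length: missing positions are filled positionally.
replacements = pd.array([10, 20, 30, 40], dtype="Int64")
filled_array = arr.fillna(replacements)
```

With the default copy=True, arr itself is left unchanged.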

insert(loc, item)

Insert an item at the given position.

Parameters

loc : int

item : scalar-like

Returns

same type as self

Notes

This method should be both type and dtype-preserving. If the item cannot be held in an array of this type/dtype, either ValueError or TypeError should be raised.

The default implementation relies on _from_sequence to raise on invalid items.

Examples

>>> arr = pd.array([1, 2, 3])
>>> arr.insert(2, -1)
<IntegerArray>
[1, 2, -1, 3]
Length: 4, dtype: Int64
Parameters:

loc (int)

Return type:

Self

interpolate(*, method, axis, index, limit, limit_direction, limit_area, copy, **kwargs)

See DataFrame.interpolate.__doc__.

Examples

>>> arr = pd.arrays.NumpyExtensionArray(np.array([0, 1, np.nan, 3]))
>>> arr.interpolate(method="linear",
...                 limit=3,
...                 limit_direction="forward",
...                 index=pd.Index([1, 2, 3, 4]),
...                 fill_value=1,
...                 copy=False,
...                 axis=0,
...                 limit_area="inside"
...                 )
<NumpyExtensionArray>
[0.0, 1.0, 2.0, 3.0]
Length: 4, dtype: float64
Parameters:
  • method (InterpolateOptions)

  • axis (int)

  • index (Index)

  • copy (bool)

Return type:

Self

isin(values)

Pointwise comparison for set containment in the given values.

Roughly equivalent to np.array([x in values for x in self])

Parameters

values : np.ndarray or ExtensionArray

Returns

np.ndarray[bool]

Examples

>>> arr = pd.array([1, 2, 3])
>>> arr.isin([1])
<BooleanArray>
[True, False, False]
Length: 3, dtype: boolean
Parameters:

values (ArrayLike)

Return type:

npt.NDArray[np.bool_]

isna()

A 1-D array indicating if each value is missing.

Returns

numpy.ndarray or pandas.api.extensions.ExtensionArray

In most cases, this should return a NumPy ndarray. For exceptional cases like SparseArray, where returning an ndarray would be expensive, an ExtensionArray may be returned.

Notes

If returning an ExtensionArray, then

  • na_values._is_boolean should be True

  • na_values should implement ExtensionArray._reduce()

  • na_values.any and na_values.all should be implemented

Examples

>>> arr = pd.array([1, 2, np.nan, np.nan])
>>> arr.isna()
array([False, False,  True,  True])
map(mapper, na_action=None)

Map values using an input mapping or function.

Parameters

mapper : function, dict, or Series

Mapping correspondence.

na_action : {None, ‘ignore’}, default None

If ‘ignore’, propagate NA values, without passing them to the mapping correspondence. If ‘ignore’ is not supported, a NotImplementedError should be raised.

Returns

Union[ndarray, Index, ExtensionArray]

The output of the mapping function applied to the array. If the function returns a tuple with more than one element a MultiIndex will be returned.

nbytes()

The number of bytes needed to store this object in memory.

Examples

>>> pd.array([1, 2, 3]).nbytes
27
property ndim: int

Extension Arrays are only allowed to be 1-dimensional.

Examples

>>> arr = pd.array([1, 2, 3])
>>> arr.ndim
1
ravel(order='C')

Return a flattened view on this array.

Parameters

order : {None, ‘C’, ‘F’, ‘A’, ‘K’}, default ‘C’

Returns

ExtensionArray

Notes

  • Because ExtensionArrays are 1D-only, this is a no-op.

  • The “order” argument is ignored; it is accepted for compatibility with NumPy.

Examples

>>> pd.array([1, 2, 3]).ravel()
<IntegerArray>
[1, 2, 3]
Length: 3, dtype: Int64
Parameters:

order (Literal['C', 'F', 'A', 'K'] | None)

Return type:

ExtensionArray

repeat(repeats, axis=None)

Repeat elements of an ExtensionArray.

Returns a new ExtensionArray where each element of the current ExtensionArray is repeated consecutively a given number of times.

Parameters

repeats : int or array of ints

The number of repetitions for each element. This should be a non-negative integer. Repeating 0 times will return an empty ExtensionArray.

axis : None

Must be None. Has no effect but is accepted for compatibility with numpy.

Returns

ExtensionArray

Newly created ExtensionArray with repeated elements.

See Also

Series.repeat : Equivalent function for Series.
Index.repeat : Equivalent function for Index.
numpy.repeat : Similar method for numpy.ndarray.
ExtensionArray.take : Take arbitrary positions.

Examples

>>> cat = pd.Categorical(['a', 'b', 'c'])
>>> cat
['a', 'b', 'c']
Categories (3, object): ['a', 'b', 'c']
>>> cat.repeat(2)
['a', 'a', 'b', 'b', 'c', 'c']
Categories (3, object): ['a', 'b', 'c']
>>> cat.repeat([1, 2, 3])
['a', 'b', 'b', 'c', 'c', 'c']
Categories (3, object): ['a', 'b', 'c']
Parameters:
  • repeats (int | Sequence[int])

  • axis (AxisInt | None)

Return type:

Self

searchsorted(value, side='left', sorter=None)

Find indices where elements should be inserted to maintain order.

Find the indices into a sorted array self (a) such that, if the corresponding elements in value were inserted before the indices, the order of self would be preserved.

Assuming that self is sorted:

side : returned index i satisfies

  • left : self[i-1] < value <= self[i]

  • right : self[i-1] <= value < self[i]

Parameters

value : array-like, list or scalar

Value(s) to insert into self.

side : {‘left’, ‘right’}, optional

If ‘left’, the index of the first suitable location found is given. If ‘right’, return the last such index. If there is no suitable index, return either 0 or N (where N is the length of self).

sorter : 1-D array-like, optional

Optional array of integer indices that sort array a into ascending order. They are typically the result of argsort.

Returns

array of ints or int

If value is array-like, array of insertion points. If value is scalar, a single integer.

See Also

numpy.searchsorted : Similar method from NumPy.

Examples

>>> arr = pd.array([1, 2, 3, 5])
>>> arr.searchsorted([4])
array([3])
Parameters:
  • value (NumpyValueArrayLike | ExtensionArray)

  • side (Literal['left', 'right'])

  • sorter (NumpySorter | None)

Return type:

npt.NDArray[np.intp] | np.intp
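
The left and right conditions differ only when value already occurs in the array. A sketch with a built-in pandas nullable integer array, also showing sorter for an unsorted array:

```python
import pandas as pd

arr = pd.array([1, 2, 2, 3], dtype="Int64")

# side="left": insert before the existing 2s.
left = arr.searchsorted(2, side="left")

# side="right": insert after the existing 2s.
right = arr.searchsorted(2, side="right")

# An array-like value returns one insertion point per element.
many = arr.searchsorted([0, 2, 4])

# For an unsorted array, pass the argsort result as sorter.
unsorted = pd.array([3, 1, 2], dtype="Int64")
sorter = unsorted.argsort()
position = unsorted.searchsorted(2, sorter=sorter)
```

Note that position refers to the sorted order of unsorted, not to its original layout.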

property shape: Shape

Return a tuple of the array dimensions.

Examples

>>> arr = pd.array([1, 2, 3])
>>> arr.shape
(3,)
shift(periods=1, fill_value=None)

Shift values by desired number.

Newly introduced missing values are filled with self.dtype.na_value.

Parameters

periods : int, default 1

The number of periods to shift. Negative values are allowed for shifting backwards.

fill_value : object, optional

The scalar value to use for newly introduced missing values. The default is self.dtype.na_value.

Returns

ExtensionArray

Shifted.

Notes

If self is empty or periods is 0, a copy of self is returned.

If periods > len(self), then an array of size len(self) is returned, with all values filled with self.dtype.na_value.

For 2-dimensional ExtensionArrays, we are always shifting along axis=0.

Examples

>>> arr = pd.array([1, 2, 3])
>>> arr.shift(2)
<IntegerArray>
[<NA>, <NA>, 1]
Length: 3, dtype: Int64
Parameters:
  • periods (int)

  • fill_value (object)

Return type:

ExtensionArray
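
The cases described in the Notes can be sketched as follows, assuming a built-in pandas nullable integer array whose na_value is pd.NA:

```python
import pandas as pd

arr = pd.array([1, 2, 3], dtype="Int64")

# Positive periods shift forward; the vacated slot becomes NA.
forward = arr.shift(1)

# Negative periods shift backward.
backward = arr.shift(-1)

# fill_value replaces the default NA for the vacated slots.
zero_filled = arr.shift(1, fill_value=0)

# Shifting by more than the length yields an all-NA array of the same length.
all_missing = arr.shift(5)
```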

property size: int

The number of elements in the array.

take(indexer, allow_fill=False, fill_value=None)

Take elements from an array.

Parameters

indices : sequence of int or one-dimensional np.ndarray of int

Indices to be taken.

allow_fill : bool, default False

How to handle negative values in indices.

  • False: negative values in indices indicate positional indices from the right (the default). This is similar to numpy.take().

  • True: -1 in indices indicates a missing value, which is set to fill_value. Any other negative value raises a ValueError.

fill_value : any, optional

Fill value to use for NA-indices when allow_fill is True. This may be None, in which case the default NA value for the type, self.dtype.na_value, is used.

For many ExtensionArrays, there will be two representations of fill_value: a user-facing “boxed” scalar, and a low-level physical NA value. fill_value should be the user-facing version, and the implementation should handle translating that to the physical version for processing the take if necessary.

Returns

ExtensionArray

Raises

IndexError

When the indices are out of bounds for the array.

ValueError

When indices contains negative values other than -1 and allow_fill is True.

See Also

numpy.take : Take elements from an array along an axis.
api.extensions.take : Take elements from an array.

Notes

ExtensionArray.take is called by Series.__getitem__, .loc, and .iloc when indices is a sequence of values. Additionally, it’s called by Series.reindex(), or any other method that causes realignment, with a fill_value.

Examples

Here’s an example implementation, which relies on casting the extension array to object dtype. This uses the helper method pandas.api.extensions.take().

def take(self, indices, allow_fill=False, fill_value=None):
    from pandas.api.extensions import take

    # If the ExtensionArray is backed by an ndarray, then
    # just pass that here instead of coercing to object.
    data = self.astype(object)

    if allow_fill and fill_value is None:
        fill_value = self.dtype.na_value

    # fill value should always be translated from the scalar
    # type for the array, to the physical storage type for
    # the data, before passing to take.

    result = take(data, indices, fill_value=fill_value,
                  allow_fill=allow_fill)
    return self._from_sequence(result, dtype=self.dtype)
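
From the caller's side, the two meanings of a negative index look like this. The sketch uses a built-in pandas nullable integer array rather than a pymongoarrow array and passes the indices positionally.

```python
import pandas as pd

arr = pd.array([10, 20, 30], dtype="Int64")

# allow_fill=False (default): -1 counts from the right, as in numpy.take.
from_right = arr.take([0, -1])

# allow_fill=True: -1 marks a missing value, filled with the dtype's NA.
with_na = arr.take([0, -1], allow_fill=True)

# A user-facing fill_value can replace the default NA.
with_zero = arr.take([0, -1], allow_fill=True, fill_value=0)

# Any other negative index is rejected when allow_fill=True.
error = None
try:
    arr.take([0, -2], allow_fill=True)
except ValueError as exc:
    error = exc
```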
to_numpy(dtype=None, copy=False, na_value=<no_default>)

Convert to a NumPy ndarray.

This is similar to numpy.asarray(), but may provide additional control over how the conversion is done.

Parameters

dtype : str or numpy.dtype, optional

The dtype to pass to numpy.asarray().

copy : bool, default False

Whether to ensure that the returned value is not a view on another array. Note that copy=False does not ensure that to_numpy() is no-copy. Rather, copy=True ensures that a copy is made, even if not strictly necessary.

na_value : Any, optional

The value to use for missing values. The default value depends on dtype and the type of the array.

Returns

numpy.ndarray

Parameters:
  • dtype (npt.DTypeLike | None)

  • copy (bool)

  • na_value (object)

Return type:

np.ndarray

tolist()

Return a list of the values.

Each value is a scalar: a Python scalar (for str, int, float) or a pandas scalar (for Timestamp/Timedelta/Interval/Period).

Returns

list

Examples

>>> arr = pd.array([1, 2, 3])
>>> arr.tolist()
[1, 2, 3]
Return type:

list

transpose(*axes)

Return a transposed view on this array.

Because ExtensionArrays are always 1D, this is a no-op. It is included for compatibility with np.ndarray.

Returns

ExtensionArray

Examples

>>> pd.array([1, 2, 3]).transpose()
<IntegerArray>
[1, 2, 3]
Length: 3, dtype: Int64
Parameters:

axes (int)

Return type:

ExtensionArray

unique()

Compute the ExtensionArray of unique values.

Returns

pandas.api.extensions.ExtensionArray

Examples

>>> arr = pd.array([1, 2, 3, 1, 2, 3])
>>> arr.unique()
<IntegerArray>
[1, 2, 3]
Length: 3, dtype: Int64
Return type:

Self

view(dtype=None)

Return a view on the array.

Parameters

dtype : str, np.dtype, or ExtensionDtype, optional

Default None.

Returns

ExtensionArray or np.ndarray

A view on the ExtensionArray’s data.

Examples

This gives a view on the underlying data of an ExtensionArray and is not a copy. Modifications on either the view or the original ExtensionArray will be reflected in the underlying data:

>>> arr = pd.array([1, 2, 3])
>>> arr2 = arr.view()
>>> arr[0] = 2
>>> arr2
<IntegerArray>
[2, 2, 3]
Length: 3, dtype: Int64
Parameters:

dtype (Dtype | None)

Return type:

ArrayLike

class pymongoarrow.pandas_types.PandasObjectId

A pandas extension type for BSON ObjectId data type.

classmethod construct_array_type()

Return the array type associated with this dtype.

Returns

type

Return type:

type[PandasObjectIdArray]

classmethod construct_from_string(string)

Construct this type from a string.

This is useful mainly for data types that accept parameters. For example, a period dtype accepts a frequency parameter that can be set as period[h] (where h means hourly frequency).

By default, in the abstract class, just the name of the type is expected. But subclasses can override this method to accept parameters.

Parameters

string : str

The name of the type, for example category.

Returns

ExtensionDtype

Instance of the dtype.

Raises

TypeError

If a class cannot be constructed from this ‘string’.

Examples

For extension dtypes with arguments the following may be an adequate implementation.

>>> import re
>>> @classmethod
... def construct_from_string(cls, string):
...     pattern = re.compile(r"^my_type\[(?P<arg_name>.+)\]$")
...     match = pattern.match(string)
...     if match:
...         return cls(**match.groupdict())
...     else:
...         raise TypeError(
...             f"Cannot construct a '{cls.__name__}' from '{string}'"
...         )
empty(shape)

Construct an ExtensionArray of this dtype with the given shape.

Analogous to numpy.empty.

Parameters

shape : int or tuple[int]

Returns

ExtensionArray

Parameters:

shape (Shape)

Return type:

ExtensionArray
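
A brief sketch of empty(), assuming pandas 1.4 or later and the built-in Int64Dtype. As with numpy.empty, only the length and dtype of the result should be relied on, not its contents.

```python
import pandas as pd

dtype = pd.Int64Dtype()

# shape may be given as an int ...
arr = dtype.empty(3)

# ... or as a one-element tuple.
arr_from_tuple = dtype.empty((3,))
```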

index_class

The Index subclass to return from Index.__new__ when this dtype is encountered.

classmethod is_dtype(dtype)

Check if we match ‘dtype’.

Parameters

dtype : object

The object to check.

Returns

bool

Notes

The default implementation returns True if any of the following holds:

  1. cls.construct_from_string(dtype) is an instance of cls.

  2. dtype is an object and is an instance of cls.

  3. dtype has a dtype attribute, and any of the above conditions is true for dtype.dtype.

Parameters:

dtype (object)

Return type:

bool

property kind: str

A character code (one of ‘biufcmMOSUV’), default ‘O’

This should match the NumPy dtype used when the array is converted to an ndarray, which is probably ‘O’ for object if the extension type cannot be represented as a built-in NumPy type.

See Also

numpy.dtype.kind

property name: str

A string identifying the data type.

Will be used for display in, e.g. Series.dtype

property names: list[str] | None

Ordered list of field names, or None if there are no fields.

This is for compatibility with NumPy arrays, and may be removed in the future.

type

alias of ObjectId

class pymongoarrow.pandas_types.PandasObjectIdArray(values, dtype, copy=False)

A pandas extension type for BSON ObjectId data arrays.

argmax(skipna=True)

Return the index of maximum value.

In case of multiple occurrences of the maximum value, the index corresponding to the first occurrence is returned.

Parameters

skipna : bool, default True

Returns

int

See Also

ExtensionArray.argmin : Return the index of the minimum value.

Examples

>>> arr = pd.array([3, 1, 2, 5, 4])
>>> arr.argmax()
3
Parameters:

skipna (bool)

Return type:

int

argmin(skipna=True)

Return the index of minimum value.

In case of multiple occurrences of the minimum value, the index corresponding to the first occurrence is returned.

Parameters

skipna : bool, default True

Returns

int

See Also

ExtensionArray.argmax : Return the index of the maximum value.

Examples

>>> arr = pd.array([3, 1, 2, 5, 4])
>>> arr.argmin()
1
Parameters:

skipna (bool)

Return type:

int

argsort(*, ascending=True, kind='quicksort', na_position='last', **kwargs)

Return the indices that would sort this array.

Parameters

ascending : bool, default True

Whether the indices should result in an ascending or descending sort.

kind : {‘quicksort’, ‘mergesort’, ‘heapsort’, ‘stable’}, optional

Sorting algorithm.

na_position : {‘first’, ‘last’}, default ‘last’

If 'first', put NaN values at the beginning. If 'last', put NaN values at the end.

**kwargs

Passed through to numpy.argsort().

Returns

np.ndarray[np.intp]

Array of indices that sort self. If NaN values are contained, NaN values are placed at the end.

See Also

numpy.argsort : Sorting implementation used internally.

Examples

>>> arr = pd.array([3, 1, 2, 5, 4])
>>> arr.argsort()
array([1, 2, 0, 4, 3])
Parameters:
  • ascending (bool)

  • kind (SortKind)

  • na_position (str)

Return type:

np.ndarray

astype(dtype, copy=True)

Cast to a NumPy array or ExtensionArray with ‘dtype’.

Parameters

dtype : str or dtype

Typecode or data-type to which the array is cast.

copy : bool, default True

Whether to copy the data, even if not necessary. If False, a copy is made only if the old dtype does not match the new dtype.

Returns

np.ndarray or pandas.api.extensions.ExtensionArray

An ExtensionArray if dtype is an ExtensionDtype, otherwise a NumPy ndarray with dtype for its dtype.

Examples

>>> arr = pd.array([1, 2, 3])
>>> arr
<IntegerArray>
[1, 2, 3]
Length: 3, dtype: Int64

Casting to another ExtensionDtype returns an ExtensionArray:

>>> arr1 = arr.astype('Float64')
>>> arr1
<FloatingArray>
[1.0, 2.0, 3.0]
Length: 3, dtype: Float64
>>> arr1.dtype
Float64Dtype()

Otherwise, a NumPy ndarray is returned:

>>> arr2 = arr.astype('float64')
>>> arr2
array([1., 2., 3.])
>>> arr2.dtype
dtype('float64')
Parameters:
  • dtype (AstypeArg)

  • copy (bool)

Return type:

ArrayLike

copy()

Return a copy of the array.

Returns

ExtensionArray

Examples

>>> arr = pd.array([1, 2, 3])
>>> arr2 = arr.copy()
>>> arr[0] = 2
>>> arr2
<IntegerArray>
[1, 2, 3]
Length: 3, dtype: Int64
dropna()

Return ExtensionArray without NA values.

Returns

Examples

>>> pd.array([1, 2, np.nan]).dropna()
<IntegerArray>
[1, 2]
Length: 2, dtype: Int64
Return type:

Self

property dtype

An instance of ExtensionDtype.

Examples

>>> pd.array([1, 2, 3]).dtype
Int64Dtype()
duplicated(keep='first')

Return boolean ndarray denoting duplicate values.

Parameters

keep : {‘first’, ‘last’, False}, default ‘first’
  • first : Mark duplicates as True except for the first occurrence.

  • last : Mark duplicates as True except for the last occurrence.

  • False : Mark all duplicates as True.

Returns

ndarray[bool]

Examples

>>> pd.array([1, 1, 2, 3, 3], dtype="Int64").duplicated()
array([False,  True, False, False,  True])
Parameters:

keep (Literal['first', 'last', False])

Return type:

npt.NDArray[np.bool_]

equals(other)

Return whether another array is equivalent to this array.

Equivalent means that both arrays have the same shape and dtype, and all values compare equal. Missing values in the same location are considered equal (in contrast with normal equality).

Parameters

other : ExtensionArray

Array to compare to this Array.

Returns

boolean

Whether the arrays are equivalent.

Examples

>>> arr1 = pd.array([1, 2, np.nan])
>>> arr2 = pd.array([1, 2, np.nan])
>>> arr1.equals(arr2)
True
Parameters:

other (object)

Return type:

bool

factorize(use_na_sentinel=True)

Encode the extension array as an enumerated type.

Parameters

use_na_sentinel : bool, default True

If True, the sentinel -1 will be used for NaN values. If False, NaN values will be encoded as non-negative integers and will not drop the NaN from the uniques of the values.

Added in version 1.5.0.

Returns

codes : ndarray

An integer NumPy array that’s an indexer into the original ExtensionArray.

uniques : ExtensionArray

An ExtensionArray containing the unique values of self.

Note

uniques will not contain an entry for the NA value of the ExtensionArray if there are any missing values present in self.

See Also

factorize : Top-level factorize method that dispatches here.

Notes

pandas.factorize() offers a sort keyword as well.

Examples

>>> idx1 = pd.PeriodIndex(["2014-01", "2014-01", "2014-02", "2014-02",
...                       "2014-03", "2014-03"], freq="M")
>>> arr, idx = idx1.factorize()
>>> arr
array([0, 0, 1, 1, 2, 2])
>>> idx
PeriodIndex(['2014-01', '2014-02', '2014-03'], dtype='period[M]')
Parameters:

use_na_sentinel (bool)

Return type:

tuple[ndarray, ExtensionArray]

fillna(value=None, method=None, limit=None, copy=True)

Fill NA/NaN values using the specified method.

Parameters

value : scalar, array-like

If a scalar value is passed it is used to fill all missing values. Alternatively, an array-like “value” can be given. It’s expected that the array-like has the same length as ‘self’.

method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None

Method to use for filling holes in reindexed Series:

  • pad / ffill: propagate last valid observation forward to next valid.

  • backfill / bfill: use NEXT valid observation to fill gap.

Deprecated since version 2.1.0.

limit : int, default None

If method is specified, this is the maximum number of consecutive NaN values to forward/backward fill. In other words, if there is a gap with more than this number of consecutive NaNs, it will only be partially filled. If method is not specified, this is the maximum number of entries along the entire axis where NaNs will be filled.

Deprecated since version 2.1.0.

copy : bool, default True

Whether to make a copy of the data before filling. If False, then the original should be modified and no new memory should be allocated. For ExtensionArray subclasses that cannot do this, it is at the author’s discretion whether to ignore “copy=False” or to raise. The base class implementation ignores the keyword in pad/backfill cases.

Returns

ExtensionArray

With NA/NaN filled.

Examples

>>> arr = pd.array([np.nan, np.nan, 2, 3, np.nan, np.nan])
>>> arr.fillna(0)
<IntegerArray>
[0, 0, 2, 3, 0, 0]
Length: 6, dtype: Int64
Parameters:
  • value (object | ArrayLike | None)

  • method (FillnaOptions | None)

  • limit (int | None)

  • copy (bool)

Return type:

Self

insert(loc, item)

Insert an item at the given position.

Parameters

loc : int

item : scalar-like

Returns

same type as self

Notes

This method should be both type and dtype-preserving. If the item cannot be held in an array of this type/dtype, either ValueError or TypeError should be raised.

The default implementation relies on _from_sequence to raise on invalid items.

Examples

>>> arr = pd.array([1, 2, 3])
>>> arr.insert(2, -1)
<IntegerArray>
[1, 2, -1, 3]
Length: 4, dtype: Int64
Parameters:

loc (int)

Return type:

Self
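
The type- and dtype-preserving behaviour described in the notes can be checked with a short sketch. It assumes pandas' built-in Int64 array; the exact exception class for an invalid item may differ between array types, so both documented possibilities are caught.

```python
import pandas as pd

arr = pd.array([1, 2, 3], dtype="Int64")

# A compatible item keeps both the array type and the dtype.
longer = arr.insert(1, 10)

# An item the dtype cannot hold is rejected rather than silently
# upcasting the array to a different dtype.
try:
    arr.insert(1, 1.5)
    rejected = False
except (TypeError, ValueError):
    rejected = True

print(longer.tolist(), longer.dtype, rejected)
```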

interpolate(*, method, axis, index, limit, limit_direction, limit_area, copy, **kwargs)

See DataFrame.interpolate.__doc__.

Examples

>>> arr = pd.arrays.NumpyExtensionArray(np.array([0, 1, np.nan, 3]))
>>> arr.interpolate(method="linear",
...                 limit=3,
...                 limit_direction="forward",
...                 index=pd.Index([1, 2, 3, 4]),
...                 fill_value=1,
...                 copy=False,
...                 axis=0,
...                 limit_area="inside"
...                 )
<NumpyExtensionArray>
[0.0, 1.0, 2.0, 3.0]
Length: 4, dtype: float64
Parameters:
  • method (InterpolateOptions)

  • axis (int)

  • index (Index)

  • copy (bool)

Return type:

Self

isin(values)

Pointwise comparison for set containment in the given values.

Roughly equivalent to np.array([x in values for x in self])

Parameters

values : np.ndarray or ExtensionArray

Returns

np.ndarray[bool]

Examples

>>> arr = pd.array([1, 2, 3])
>>> arr.isin([1])
<BooleanArray>
[True, False, False]
Length: 3, dtype: boolean
Parameters:

values (ArrayLike)

Return type:

npt.NDArray[np.bool_]
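
As a rough check of the "roughly equivalent" formulation above, the sketch below compares isin against the list comprehension. It assumes pandas' built-in Int64 array with no missing values; because some array types return a boolean ExtensionArray rather than an ndarray, the result is normalised with np.asarray first.

```python
import numpy as np
import pandas as pd

arr = pd.array([1, 2, 3, 2], dtype="Int64")
wanted = [2, 3]

# Element-wise membership test performed by the array itself.
mask = np.asarray(arr.isin(wanted), dtype=bool)

# The pure-Python formulation given in the description.
expected = np.array([x in wanted for x in arr])

print(mask.tolist())
```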

isna()

A 1-D array indicating if each value is missing.

Returns

numpy.ndarray or pandas.api.extensions.ExtensionArray

In most cases, this should return a NumPy ndarray. For exceptional cases like SparseArray, where returning an ndarray would be expensive, an ExtensionArray may be returned.

Notes

If returning an ExtensionArray, then

  • na_values._is_boolean should be True

  • na_values should implement ExtensionArray._reduce()

  • na_values.any and na_values.all should be implemented

Examples

>>> arr = pd.array([1, 2, np.nan, np.nan])
>>> arr.isna()
array([False, False,  True,  True])
map(mapper, na_action=None)

Map values using an input mapping or function.

Parameters

mapper : function, dict, or Series

Mapping correspondence.

na_action : {None, ‘ignore’}, default None

If ‘ignore’, propagate NA values, without passing them to the mapping correspondence. If ‘ignore’ is not supported, a NotImplementedError should be raised.

Returns

Union[ndarray, Index, ExtensionArray]

The output of the mapping function applied to the array. If the function returns a tuple with more than one element a MultiIndex will be returned.

nbytes()

The number of bytes needed to store this object in memory.

Examples

>>> pd.array([1, 2, 3]).nbytes
27
property ndim: int

Extension Arrays are only allowed to be 1-dimensional.

Examples

>>> arr = pd.array([1, 2, 3])
>>> arr.ndim
1
ravel(order='C')

Return a flattened view on this array.

Parameters

order : {None, ‘C’, ‘F’, ‘A’, ‘K’}, default ‘C’

Returns

ExtensionArray

Notes

  • Because ExtensionArrays are 1D-only, this is a no-op.

  • The “order” argument is ignored; it is accepted only for compatibility with NumPy.

Examples

>>> pd.array([1, 2, 3]).ravel()
<IntegerArray>
[1, 2, 3]
Length: 3, dtype: Int64
Parameters:

order (Literal['C', 'F', 'A', 'K'] | None)

Return type:

ExtensionArray

repeat(repeats, axis=None)

Repeat elements of an ExtensionArray.

Returns a new ExtensionArray where each element of the current ExtensionArray is repeated consecutively a given number of times.

Parameters

repeats : int or array of ints

The number of repetitions for each element. This should be a non-negative integer. Repeating 0 times will return an empty ExtensionArray.

axis : None

Must be None. Has no effect but is accepted for compatibility with numpy.

Returns

ExtensionArray

Newly created ExtensionArray with repeated elements.

See Also

Series.repeat : Equivalent function for Series.

Index.repeat : Equivalent function for Index.

numpy.repeat : Similar method for numpy.ndarray.

ExtensionArray.take : Take arbitrary positions.

Examples

>>> cat = pd.Categorical(['a', 'b', 'c'])
>>> cat
['a', 'b', 'c']
Categories (3, object): ['a', 'b', 'c']
>>> cat.repeat(2)
['a', 'a', 'b', 'b', 'c', 'c']
Categories (3, object): ['a', 'b', 'c']
>>> cat.repeat([1, 2, 3])
['a', 'b', 'b', 'c', 'c', 'c']
Categories (3, object): ['a', 'b', 'c']
Parameters:
  • repeats (int | Sequence[int])

  • axis (AxisInt | None)

Return type:

Self

searchsorted(value, side='left', sorter=None)

Find indices where elements should be inserted to maintain order.

Find the indices into a sorted array self (a) such that, if the corresponding elements in value were inserted before the indices, the order of self would be preserved.

Assuming that self is sorted:

side     returned index i satisfies

left     self[i-1] < value <= self[i]

right    self[i-1] <= value < self[i]

Parameters

value : array-like, list or scalar

Value(s) to insert into self.

side : {‘left’, ‘right’}, optional

If ‘left’, the index of the first suitable location found is given. If ‘right’, return the last such index. If there is no suitable index, return either 0 or N (where N is the length of self).

sorter : 1-D array-like, optional

Optional array of integer indices that sort array a into ascending order. They are typically the result of argsort.

Returns

array of ints or int

If value is array-like, array of insertion points. If value is scalar, a single integer.

See Also

numpy.searchsorted : Similar method from NumPy.

Examples

>>> arr = pd.array([1, 2, 3, 5])
>>> arr.searchsorted([4])
array([3])
Parameters:
  • value (NumpyValueArrayLike | ExtensionArray)

  • side (Literal['left', 'right'])

  • sorter (NumpySorter | None)

Return type:

npt.NDArray[np.intp] | np.intp
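
The effect of side and sorter can be sketched as follows, assuming pandas' built-in Int64 array without missing values. The variable names are illustrative only.

```python
import pandas as pd

arr = pd.array([1, 2, 2, 3], dtype="Int64")  # already sorted

# 'left' returns the first suitable position, 'right' the last.
left = int(arr.searchsorted(2, side="left"))
right = int(arr.searchsorted(2, side="right"))

# Array-like input returns one insertion point per value.
points = arr.searchsorted([0, 4]).tolist()

# For unsorted data, pass the argsort result as `sorter`.
unsorted = pd.array([3, 1, 2], dtype="Int64")
sorter = unsorted.argsort()
pos = int(unsorted.searchsorted(2, sorter=sorter))

print(left, right, points, pos)
```

Values smaller than every element map to 0 and values larger than every element map to N, the length of the array.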

property shape: Shape

Return a tuple of the array dimensions.

Examples

>>> arr = pd.array([1, 2, 3])
>>> arr.shape
(3,)
shift(periods=1, fill_value=None)

Shift values by desired number.

Newly introduced missing values are filled with self.dtype.na_value.

Parameters

periods : int, default 1

The number of periods to shift. Negative values are allowed for shifting backwards.

fill_value : object, optional

The scalar value to use for newly introduced missing values. The default is self.dtype.na_value.

Returns

ExtensionArray

Shifted.

Notes

If self is empty or periods is 0, a copy of self is returned.

If periods > len(self), then an array of size len(self) is returned, with all values filled with self.dtype.na_value.

For 2-dimensional ExtensionArrays, we are always shifting along axis=0.

Examples

>>> arr = pd.array([1, 2, 3])
>>> arr.shift(2)
<IntegerArray>
[<NA>, <NA>, 1]
Length: 3, dtype: Int64
Parameters:
  • periods (int)

  • fill_value (object)

Return type:

ExtensionArray
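
The cases listed in the notes (negative periods, a custom fill_value, periods larger than the array, and periods of 0) can be sketched as follows, assuming pandas' built-in Int64 array.

```python
import pandas as pd

arr = pd.array([1, 2, 3], dtype="Int64")

# Negative periods shift backwards; the vacated slot becomes NA.
backward = arr.shift(-1)

# A custom scalar replaces the default NA for new slots.
zero_filled = arr.shift(1, fill_value=0)

# periods > len(arr): same length, every value missing.
all_missing = arr.shift(5)

# periods == 0: a copy of the original array.
unchanged = arr.shift(0)

print(backward, zero_filled, all_missing, unchanged, sep="\n")
```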

property size: int

The number of elements in the array.

take(indexer, allow_fill=False, fill_value=None)

Take elements from an array.

Parameters

indices : sequence of int or one-dimensional np.ndarray of int

Indices to be taken.

allow_fill : bool, default False

How to handle negative values in indices.

  • False: negative values in indices indicate positional indices from the right (the default). This is similar to numpy.take().

  • True: negative values in indices indicate missing values. These values are set to fill_value. Any other negative values raise a ValueError.

fill_value : any, optional

Fill value to use for NA-indices when allow_fill is True. This may be None, in which case the default NA value for the type, self.dtype.na_value, is used.

For many ExtensionArrays, there will be two representations of fill_value: a user-facing “boxed” scalar, and a low-level physical NA value. fill_value should be the user-facing version, and the implementation should handle translating that to the physical version for processing the take if necessary.

Returns

ExtensionArray

Raises

IndexError

When the indices are out of bounds for the array.

ValueError

When indices contains negative values other than -1 and allow_fill is True.

See Also

numpy.take : Take elements from an array along an axis.

api.extensions.take : Take elements from an array.

Notes

ExtensionArray.take is called by Series.__getitem__, .loc, and .iloc when indices is a sequence of values. Additionally, it’s called by Series.reindex(), or any other method that causes realignment, with a fill_value.

Examples

Here’s an example implementation, which relies on casting the extension array to object dtype. This uses the helper method pandas.api.extensions.take().

def take(self, indices, allow_fill=False, fill_value=None):
    from pandas.api.extensions import take

    # If the ExtensionArray is backed by an ndarray, then
    # just pass that here instead of coercing to object.
    data = self.astype(object)

    if allow_fill and fill_value is None:
        fill_value = self.dtype.na_value

    # fill value should always be translated from the scalar
    # type for the array, to the physical storage type for
    # the data, before passing to take.

    result = take(data, indices, fill_value=fill_value,
                  allow_fill=allow_fill)
    return self._from_sequence(result, dtype=self.dtype)
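
From the caller's side, the two meanings of negative indices can be sketched as follows. This assumes pandas' built-in Int64 array and passes the indices positionally, since the parameter name differs between implementations.

```python
import pandas as pd

arr = pd.array([10, 20, 30], dtype="Int64")

# allow_fill=False: -1 counts from the right, as in numpy.take().
from_right = arr.take([0, -1])

# allow_fill=True: -1 marks a missing slot, filled with the NA value.
with_missing = arr.take([0, -1], allow_fill=True)

# allow_fill=True: any negative index other than -1 is an error.
try:
    arr.take([0, -2], allow_fill=True)
    raised = False
except ValueError:
    raised = True

print(from_right.tolist(), with_missing.isna().tolist(), raised)
```
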
to_numpy(dtype=None, copy=False, na_value=<no_default>)

Convert to a NumPy ndarray.

This is similar to numpy.asarray(), but may provide additional control over how the conversion is done.

Parameters

dtype : str or numpy.dtype, optional

The dtype to pass to numpy.asarray().

copy : bool, default False

Whether to ensure that the returned value is not a view on another array. Note that copy=False does not ensure that to_numpy() is no-copy. Rather, copy=True ensures that a copy is made, even if not strictly necessary.

na_value : Any, optional

The value to use for missing values. The default value depends on dtype and the type of the array.

Returns

numpy.ndarray

Parameters:
  • dtype (npt.DTypeLike | None)

  • copy (bool)

  • na_value (object)

Return type:

np.ndarray
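
A minimal sketch of the dtype, na_value, and copy parameters, assuming pandas' built-in Int64 array. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

arr = pd.array([1, None, 3], dtype="Int64")

# Request a float ndarray and choose how missing values are encoded.
as_float = arr.to_numpy(dtype="float64", na_value=np.nan)

# copy=True guarantees the result does not share memory with the
# source, so writing to it leaves the source array untouched.
dense = pd.array([1, 2, 3], dtype="Int64")
independent = dense.to_numpy(dtype="int64", copy=True)
independent[0] = 99

print(as_float, dense.tolist(), independent.tolist())
```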

tolist()

Return a list of the values.

Each value is a scalar: either a Python scalar (for str, int, float) or a pandas scalar (for Timestamp/Timedelta/Interval/Period).

Returns

list

Examples

>>> arr = pd.array([1, 2, 3])
>>> arr.tolist()
[1, 2, 3]
Return type:

list

transpose(*axes)

Return a transposed view on this array.

Because ExtensionArrays are always 1D, this is a no-op. It is included for compatibility with np.ndarray.

Returns

ExtensionArray

Examples

>>> pd.array([1, 2, 3]).transpose()
<IntegerArray>
[1, 2, 3]
Length: 3, dtype: Int64
Parameters:

axes (int)

Return type:

ExtensionArray

unique()

Compute the ExtensionArray of unique values.

Returns

pandas.api.extensions.ExtensionArray

Examples

>>> arr = pd.array([1, 2, 3, 1, 2, 3])
>>> arr.unique()
<IntegerArray>
[1, 2, 3]
Length: 3, dtype: Int64
Return type:

Self

view(dtype=None)

Return a view on the array.

Parameters

dtype : str, np.dtype, or ExtensionDtype, optional

Default None.

Returns

ExtensionArray or np.ndarray

A view on the ExtensionArray’s data.

Examples

This gives a view on the underlying data of an ExtensionArray and is not a copy. Modifications to either the view or the original ExtensionArray will be reflected in the underlying data:

>>> arr = pd.array([1, 2, 3])
>>> arr2 = arr.view()
>>> arr[0] = 2
>>> arr2
<IntegerArray>
[2, 2, 3]
Length: 3, dtype: Int64
Parameters:

dtype (Dtype | None)

Return type:

ArrayLike