Version: 1.7

apify-sdk-python

Async Resource Clients

run_func_at_interval_async

  • async run_func_at_interval_async(func, interval_secs): None
  • Parameters

    • func: Callable
    • interval_secs: float

    Returns None

Methods

__delitem__

  • __delitem__(key): None
  • Remove an item from the cache.


    Parameters

    • key: str

    Returns None

__get__

  • Call the getter with the right object.


    Parameters

    • obj: DualPropertyOwner | None

      The instance of class T on which the getter will be called

    • owner: type[DualPropertyOwner]

      The class object of class T on which the getter will be called, if obj is None

    Returns DualPropertyType

    The result of the getter.
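
The descriptor behaviour described above can be illustrated with a minimal sketch. It assumes the getter is simply called with the instance when one is available and with the class otherwise; the `Example` class and its `where` property are hypothetical and exist only for this illustration.

```python
from typing import Any, Callable, Generic, TypeVar

DualPropertyType = TypeVar('DualPropertyType')


class dualproperty(Generic[DualPropertyType]):
    """Sketch of a property that works on both a class and its instances."""

    def __init__(self, getter: Callable[..., DualPropertyType]) -> None:
        self.getter = getter

    def __get__(self, obj: Any, owner: type) -> DualPropertyType:
        # Instance access passes the instance; class access passes the class.
        return self.getter(obj if obj is not None else owner)


class Example:  # hypothetical owner class
    label = 'class-level'

    def __init__(self) -> None:
        self.label = 'instance-level'

    @dualproperty
    def where(self_or_cls) -> str:
        return self_or_cls.label


class_value = Example.where
instance_value = Example().where
```

Here `Example.where` reads the class attribute, while `Example().where` reads the attribute set on the instance.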

__getattr__

  • __getattr__(name): Any
  • Parameters

    • name: str

    Returns Any

__getitem__

  • __getitem__(key): T
  • Get an item from the cache. Move it to the end if present.


    Parameters

    • key: str

    Returns T

__init__

  • __init__(config): None
  • Create an instance of the EventManager.


    Parameters

    • config: Configuration

      The actor configuration to be used in this event manager.

    Returns None

__init__

  • __init__(getter): None
  • Initialize the dualproperty.


    Parameters

    • getter: Callable[..., DualPropertyType]

      The getter of the property. It should accept either an instance or a class as its first argument.

    Returns None

__init__

  • __init__(max_length): None
  • Create a LRUCache with a specific max_length.


    Parameters

    • max_length: int

    Returns None

__init__

  • __init__(*, local_data_directory, write_metadata, persist_storage): None
  • Initialize the MemoryStorageClient.


    Parameters

    • local_data_directory: str | None = None (optional, keyword-only)

      A local directory where all data will be persisted

    • write_metadata: bool | None = None (optional, keyword-only)

      Whether to persist metadata of the storages as well

    • persist_storage: bool | None = None (optional, keyword-only)

      Whether to persist the data to the local_data_directory or just keep them in memory

    Returns None

__init__

  • __init__(*, base_storage_directory, memory_storage_client): None
  • Initialize the DatasetCollectionClient with the passed arguments.


    Parameters

    • base_storage_directory: str (keyword-only)
    • memory_storage_client: MemoryStorageClient (keyword-only)

    Returns None

__init__

  • __init__(*, base_storage_directory, memory_storage_client, id, name): None
  • Initialize the DatasetClient.


    Parameters

    • base_storage_directory: str (keyword-only)
    • memory_storage_client: MemoryStorageClient (keyword-only)
    • id: str | None = None (optional, keyword-only)
    • name: str | None = None (optional, keyword-only)

    Returns None

__init__

  • __init__(*, base_storage_directory, memory_storage_client, id, name): None
  • Initialize the KeyValueStoreClient.


    Parameters

    • base_storage_directory: str (keyword-only)
    • memory_storage_client: MemoryStorageClient (keyword-only)
    • id: str | None = None (optional, keyword-only)
    • name: str | None = None (optional, keyword-only)

    Returns None

__init__

  • __init__(*, base_storage_directory, memory_storage_client, id, name): None
  • Initialize the RequestQueueClient.


    Parameters

    • base_storage_directory: str (keyword-only)
    • memory_storage_client: MemoryStorageClient (keyword-only)
    • id: str | None = None (optional, keyword-only)
    • name: str | None = None (optional, keyword-only)

    Returns None

__init__

  • __init__(*, base_storage_directory, memory_storage_client, id, name): None
  • Initialize the BaseResourceClient.


    Parameters

    • base_storage_directory: str (keyword-only)
    • memory_storage_client: MemoryStorageClient (keyword-only)
    • id: str | None = None (optional, keyword-only)
    • name: str | None = None (optional, keyword-only)

    Returns None

__init__

  • __init__(): None
  • Create a StorageClientManager instance.


    Returns None

__init__

  • __init__(id, name, client, config): None
  • Initialize the storage.

    Do not use this method directly, but use Actor.open_<STORAGE>() instead.


    Parameters

    • id: str

      The storage id

    • name: str | None

      The storage name

    • client: ApifyClientAsync | MemoryStorageClient

      The storage client

    • config: Configuration

      The configuration

    Returns None

__iter__

  • __iter__(): Iterator[str]
  • Iterate over the keys of the cache in order of insertion.


    Returns Iterator[str]

__len__

  • __len__(): int
  • Get the number of items in the cache.


    Returns int

__setitem__

  • __setitem__(key, value): None
  • Add an item to the cache. Remove least used item if max_length exceeded.


    Parameters

    • key: str
    • value: T

    Returns None
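
The cache methods documented on this page (__getitem__, __setitem__, __delitem__, __iter__, __len__) together describe a least-recently-used cache. The following is a minimal sketch of that behaviour built on `collections.OrderedDict`; the class name `LRUCacheSketch` is hypothetical and the code is not taken from the SDK.

```python
from collections import OrderedDict
from typing import Any, Iterator


class LRUCacheSketch:
    """Illustrative LRU cache with a fixed maximum length."""

    def __init__(self, max_length: int) -> None:
        self._cache: 'OrderedDict[str, Any]' = OrderedDict()
        self._max_length = max_length

    def __getitem__(self, key: str) -> Any:
        # A hit moves the key to the end, marking it as most recently used.
        # A missing key raises KeyError.
        self._cache.move_to_end(key)
        return self._cache[key]

    def __setitem__(self, key: str, value: Any) -> None:
        self._cache[key] = value
        # Evict the item at the front (least recently used) when over capacity.
        if len(self._cache) > self._max_length:
            self._cache.popitem(last=False)

    def __delitem__(self, key: str) -> None:
        del self._cache[key]

    def __iter__(self) -> Iterator[str]:
        return iter(self._cache)

    def __len__(self) -> int:
        return len(self._cache)


cache = LRUCacheSketch(max_length=2)
cache['a'] = 1
cache['b'] = 2
_ = cache['a']   # 'a' becomes the most recently used key
cache['c'] = 3   # 'b' is evicted
remaining_keys = list(cache)
```

Reading 'a' before inserting 'c' is what saves it from eviction, so the remaining keys are 'a' and 'c'.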

add_request

  • async add_request(request, *, forefront): dict
  • Add a request to the queue.


    Parameters

    • request: dict

      The request to add to the queue

    • forefront: bool | None = None (optional, keyword-only)

      Whether to add the request to the head or the end of the queue

    Returns dict

    dict: The added request.

budget_ow

  • budget_ow(value, predicate, value_name): None
  • Budget version of ow.


    Parameters

    • value: ((dict | str) | float) | bool
    • predicate: dict[str, tuple[type, bool]] | tuple[type, bool]
    • value_name: str | None = None (optional)

    Returns None

close

  • async close(event_listeners_timeout_secs): None
  • Close the event manager.

    This will stop listening for the platform events, and it will wait for all the event listeners to finish.


    Parameters

    • event_listeners_timeout_secs: float | None = None (optional)

      Optional timeout after which the pending event listeners are canceled.

    Returns None

compute_short_hash

  • compute_short_hash(data, *, length): str
  • Computes a hexadecimal SHA-256 hash of the provided data and returns a substring (prefix) of it.


    Parameters

    • data: bytes

      The binary data to be hashed.

    • length: int = 8 (optional, keyword-only)

      The length of the hash to be returned.

    Returns str

    A substring (prefix) of the hexadecimal hash of the data.
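
The description above maps directly onto the standard library. A minimal sketch, using the hypothetical name `compute_short_hash_sketch` to make clear it is not the SDK function itself:

```python
import hashlib


def compute_short_hash_sketch(data: bytes, *, length: int = 8) -> str:
    """Return the first `length` hex characters of the SHA-256 digest of `data`."""
    return hashlib.sha256(data).hexdigest()[:length]


short = compute_short_hash_sketch(b'hello world')
longer = compute_short_hash_sketch(b'hello world', length=16)
```

Because the result is a prefix of the same digest, a longer hash for the same input always starts with the shorter one.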

compute_unique_key

  • compute_unique_key(url, method, payload, *, keep_url_fragment, use_extended_unique_key): str
  • Computes a unique key for caching & deduplication of requests.

    This function computes a unique key by normalizing the provided URL and method. If 'use_extended_unique_key' is True and a payload is provided, the payload is hashed and included in the key. Otherwise, the unique key is just the normalized URL.


    Parameters

    • url: str

      The request URL.

    • method: str = 'GET' (optional)

      The HTTP method, defaults to 'GET'.

    • payload: bytes | None = None (optional)

      The request payload, defaults to None.

    • keep_url_fragment: bool = False (optional, keyword-only)

      A flag indicating whether to keep the URL fragment, defaults to False.

    • use_extended_unique_key: bool = False (optional, keyword-only)

      A flag indicating whether to include a hashed payload in the key, defaults to False.

    Returns str

    A string representing the unique key for the request.
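
The two modes described above can be sketched as follows. This is an illustration under stated assumptions, not the SDK's code: the URL normalization is a simplified stand-in, and the layout of the extended key (method, short payload hash, URL) is hypothetical, chosen only to show the payload hash becoming part of the key.

```python
import hashlib
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def compute_unique_key_sketch(
    url: str,
    method: str = 'GET',
    payload: Optional[bytes] = None,
    *,
    keep_url_fragment: bool = False,
    use_extended_unique_key: bool = False,
) -> str:
    # Simplified stand-in for URL normalization: trim, lower-case scheme and host.
    parts = urlsplit(url.strip())
    fragment = parts.fragment if keep_url_fragment else ''
    normalized_url = urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path, parts.query, fragment)
    )

    if use_extended_unique_key and payload is not None:
        payload_hash = hashlib.sha256(payload).hexdigest()[:8]
        # Hypothetical key layout; the real format may differ.
        return f'{method.strip().upper()}({payload_hash}):{normalized_url}'

    # Default mode: the key is just the normalized URL.
    return normalized_url


plain_key = compute_unique_key_sketch(' HTTP://Example.com/a#frag ')
extended_key = compute_unique_key_sketch(
    'http://example.com/a', 'post', b'{"q": 1}', use_extended_unique_key=True
)
```

With the extended mode off, two POST requests to the same URL with different payloads would share a key; with it on, the payload hash keeps them distinct.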

crypto_random_object_id

  • crypto_random_object_id(length): str
  • Python reimplementation of cryptoRandomObjectId from @apify/utilities.


    Parameters

    • length: int = 17 (optional)

    Returns str
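
A rough sketch of generating such an ID with a cryptographically secure random source. The alphabet (ASCII letters and digits) is an assumption, and `crypto_random_object_id_sketch` is a hypothetical name, not the SDK function.

```python
import secrets
import string


def crypto_random_object_id_sketch(length: int = 17) -> str:
    """Build a random alphanumeric ID using the `secrets` module."""
    alphabet = string.ascii_letters + string.digits  # assumed alphabet
    return ''.join(secrets.choice(alphabet) for _ in range(length))


object_id = crypto_random_object_id_sketch()
```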

dataset

  • dataset(dataset_id): DatasetClient
  • Retrieve the sub-client for manipulating a single dataset.


    Parameters

    • dataset_id: str

      ID of the dataset to be manipulated

    Returns DatasetClient

datasets

  • datasets(): DatasetCollectionClient
  • Retrieve the sub-client for manipulating datasets.


    Returns DatasetCollectionClient

decrypt_input_secrets

  • decrypt_input_secrets(private_key, input): Any
  • Decrypt input secrets.


    Parameters

    • private_key: rsa.RSAPrivateKey
    • input: Any

    Returns Any

delete

  • async delete(): None
  • Delete the dataset.


    Returns None

delete

  • async delete(): None
  • Delete the key-value store.


    Returns None

delete

  • async delete(): None
  • Delete the request queue.


    Returns None

delete_record

  • async delete_record(key): None
  • Delete the specified record from the key-value store.


    Parameters

    • key: str

      The key of the record to delete

    Returns None

delete_request

  • async delete_request(*, request_id, entity_directory): None
  • Parameters

    • request_id: str (keyword-only)
    • entity_directory: str (keyword-only)

    Returns None

delete_request

  • async delete_request(request_id): None
  • Delete a request from the queue.


    Parameters

    • request_id: str

      ID of the request to delete.

    Returns None

emit

  • emit(event_name, data): None
  • Emit an actor event manually.


    Parameters

    • event_name: ActorEventTypes

      The actor event which should be emitted.

    • data: Any

      The data that should be emitted with the event.

    Returns None

fetch_and_parse_env_var

  • fetch_and_parse_env_var(env_var, default): Any
  • Parameters

    • env_var: Any
    • default: Any = None (optional)

    Returns Any

force_remove

  • async force_remove(filename): None
  • JS-like rm(filename, { force: true }).


    Parameters

    • filename: str

    Returns None
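
"Force" removal in the JavaScript sense means that a missing file is not an error. A minimal sketch of that behaviour, assuming the blocking call is offloaded with `asyncio.to_thread` (the SDK may use a different async file API); `force_remove_sketch` is a hypothetical name.

```python
import asyncio
import contextlib
import os


async def force_remove_sketch(filename: str) -> None:
    """Remove a file, silently ignoring the case where it does not exist."""
    with contextlib.suppress(FileNotFoundError):
        # os.remove blocks, so run it in a worker thread.
        await asyncio.to_thread(os.remove, filename)
```

Calling it twice on the same path is safe: the second call finds nothing to delete and returns normally.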

force_rename

  • async force_rename(src_dir, dst_dir): None
  • Rename a directory. Checks for existence of source directory and removes destination directory if it exists.


    Parameters

    • src_dir: str
    • dst_dir: str

    Returns None
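
The two checks described above can be sketched like this. It is an illustration only: the helper name `force_rename_sketch` is hypothetical, and treating a missing source directory as a no-op is an assumption.

```python
import asyncio
import os
import shutil


async def force_rename_sketch(src_dir: str, dst_dir: str) -> None:
    """Rename src_dir to dst_dir, replacing dst_dir if it already exists."""

    def _rename() -> None:
        # Do nothing when the source directory is missing (assumed behaviour).
        if not os.path.exists(src_dir):
            return
        # Clear the destination so the rename cannot collide with it.
        if os.path.exists(dst_dir):
            shutil.rmtree(dst_dir, ignore_errors=True)
        os.rename(src_dir, dst_dir)

    # Filesystem calls block, so run them in a worker thread.
    await asyncio.to_thread(_rename)
```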

get

  • async get(): dict | None
  • Retrieve the dataset.


    Returns dict | None

    dict, optional: The retrieved dataset, or None, if it does not exist

get

  • async get(): dict | None
  • Retrieve the key-value store.


    Returns dict | None

    dict, optional: The retrieved key-value store, or None if it does not exist

get

  • async get(): dict | None
  • Retrieve the request queue.


    Returns dict | None

    dict, optional: The retrieved request queue, or None, if it does not exist

get

  • async get(): dict | None
  • Retrieve the storage.


    Returns dict | None

    dict, optional: The retrieved storage, or None, if it does not exist

get_basic_auth_header

  • get_basic_auth_header(username, password, auth_encoding): bytes
  • Generate a basic authentication header for the given username and password.


    Parameters

    • username: str
    • password: str
    • auth_encoding: str = 'latin-1' (optional)

    Returns bytes
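
HTTP Basic authentication encodes `username:password` in Base64 and prefixes it with `Basic `. A minimal sketch of producing such a header value as bytes; `get_basic_auth_header_sketch` is a hypothetical name and the exact output of the SDK function is assumed to follow the same scheme.

```python
import base64


def get_basic_auth_header_sketch(
    username: str, password: str, auth_encoding: str = 'latin-1'
) -> bytes:
    """Build the value of an HTTP Basic `Authorization` header."""
    credentials = f'{username}:{password}'.encode(auth_encoding)
    return b'Basic ' + base64.b64encode(credentials)


header = get_basic_auth_header_sketch('user', 'pass')
```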

get_cpu_usage_percent

  • get_cpu_usage_percent(): float
  • Returns float

get_items_as_bytes

  • async get_items_as_bytes(_args, _kwargs): bytes
  • Parameters

    • _args: Any
    • _kwargs: Any

    Returns bytes

get_memory_usage_bytes

  • get_memory_usage_bytes(): int
  • Returns int

get_or_create

  • async get_or_create(*, name, schema, _id): dict
  • Retrieve a named key-value store, or create a new one when it doesn't exist.


    Parameters

    • name: str | None = None (optional, keyword-only)

      The name of the key-value store to retrieve or create.

    • schema: dict | None = None (optional, keyword-only)

      The schema of the key-value store

    • _id: str | None = None (optional, keyword-only)

    Returns dict

    dict: The retrieved or newly-created key-value store.

get_or_create

  • async get_or_create(*, name, schema, _id): dict
  • Retrieve a named storage, or create a new one when it doesn't exist.


    Parameters

    • name: str | None = None (optional, keyword-only)

      The name of the storage to retrieve or create.

    • schema: dict | None = None (optional, keyword-only)

      The schema of the storage

    • _id: str | None = None (optional, keyword-only)

    Returns dict

    dict: The retrieved or newly-created storage.

get_or_create

  • async get_or_create(*, name, schema, _id): dict
  • Retrieve a named request queue, or create a new one when it doesn't exist.


    Parameters

    • name: str | None = None (optional, keyword-only)

      The name of the request queue to retrieve or create.

    • schema: dict | None = None (optional, keyword-only)

      The schema of the request queue

    • _id: str | None = None (optional, keyword-only)

    Returns dict

    dict: The retrieved or newly-created request queue.

get_or_create

  • async get_or_create(*, name, schema, _id): dict
  • Retrieve a named dataset, or create a new one when it doesn't exist.


    Parameters

    • name: str | None = None (optional, keyword-only)

      The name of the dataset to retrieve or create.

    • schema: dict | None = None (optional, keyword-only)

      The schema of the dataset

    • _id: str | None = None (optional, keyword-only)

    Returns dict

    dict: The retrieved or newly-created dataset.

get_record

  • async get_record(key): dict | None
  • Retrieve the given record from the key-value store.


    Parameters

    • key: str

      Key of the record to retrieve

    Returns dict | None

    dict, optional: The requested record, or None, if the record does not exist

get_record_as_bytes

  • async get_record_as_bytes(key): dict | None
  • Retrieve the given record from the key-value store, without parsing it.


    Parameters

    • key: str

      Key of the record to retrieve

    Returns dict | None

    dict, optional: The requested record, or None, if the record does not exist

get_request

  • async get_request(request_id): dict | None
  • Retrieve a request from the queue.


    Parameters

    • request_id: str

      ID of the request to retrieve

    Returns dict | None

    dict, optional: The retrieved request, or None, if it did not exist.

get_running_event_loop_id

  • get_running_event_loop_id(): int
  • Get the ID of the currently running event loop.

    It could be useful mainly for debugging purposes.


    Returns int

    The ID of the event loop.
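
One plausible way to obtain such an identifier is Python's built-in `id()` applied to the running loop. This is an assumption about the approach, shown under the hypothetical name `get_running_event_loop_id_sketch`.

```python
import asyncio


def get_running_event_loop_id_sketch() -> int:
    """Return an integer identifying the currently running event loop."""
    # Raises RuntimeError when called outside a running event loop.
    return id(asyncio.get_running_loop())


async def main() -> tuple:
    # Two calls inside the same loop yield the same ID.
    return get_running_event_loop_id_sketch(), get_running_event_loop_id_sketch()


first_id, second_id = asyncio.run(main())
```

Logging this value next to other debug output helps spot code that accidentally runs on a different event loop.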

get_storage_client

  • get_storage_client(force_cloud): ApifyClientAsync | MemoryStorageClient
  • Get the current storage client instance.


    Parameters

    • force_cloud: bool = False (optional)

    Returns ApifyClientAsync | MemoryStorageClient

    ApifyClientAsync or MemoryStorageClient: The current storage client instance.

get_system_info

  • get_system_info(): dict
  • Returns dict

guess_file_extension

  • guess_file_extension(content_type): str | None
  • Guess the file extension based on content type.


    Parameters

    • content_type: str

    Returns str | None

init

  • async init(): None
  • Initialize the event manager.

    When running this on the Apify Platform, this will start processing events sent by the platform to the events websocket and emitting them as events that can be listened to by the Actor.on() method.


    Returns None

is_running_in_ipython

  • is_running_in_ipython(): bool
  • Returns bool

is_url

  • is_url(url): bool
  • Check if the given string is a valid URL.


    Parameters

    • url: str

    Returns bool

items

  • items(): ItemsView[str, T]
  • Iterate over the pairs of (key, value) in the cache in order of insertion.


    Returns ItemsView[str, T]

iterate_items

  • async iterate_items(*, offset, limit, clean, desc, fields, omit, unwind, skip_empty, skip_hidden): AsyncIterator[dict]
  • Iterate over the items in the dataset.


    Parameters

    • offset: int = 0 (optional, keyword-only)

      Number of items that should be skipped at the start. The default value is 0.

    • limit: int | None = None (optional, keyword-only)

      Maximum number of items to return. By default there is no limit.

    • clean: bool | None = None (optional, keyword-only)

      If True, returns only non-empty items and skips hidden fields (i.e. fields starting with the # character). The clean parameter is just a shortcut for skip_hidden=True and skip_empty=True. Note that since some items might be skipped from the output, the result might contain fewer items than the limit value.

    • desc: bool | None = None (optional, keyword-only)

      By default, results are returned in the same order as they were stored. To reverse the order, set this parameter to True.

    • fields: list[str] | None = None (optional, keyword-only)

      A list of fields which should be picked from the items. Only these fields will remain in the resulting record objects. Note that the fields in the output items are sorted the same way as they are specified in the fields parameter. You can use this feature to effectively fix the output format.

    • omit: list[str] | None = None (optional, keyword-only)

      A list of fields which should be omitted from the items.

    • unwind: str | None = None (optional, keyword-only)

      Name of a field which should be unwound. If the field is an array, then every element of the array becomes a separate record that is merged with the parent object. If the unwound field is an object, then it is merged with the parent object. If the unwound field is missing, or its value is neither an array nor an object and therefore cannot be merged with a parent object, then the item is preserved as it is. Note that the unwound items ignore the desc parameter.

    • skip_empty: bool | None = None (optional, keyword-only)

      If True, then empty items are skipped from the output. Note that if used, the results might contain fewer items than the limit value.

    • skip_hidden: bool | None = None (optional, keyword-only)

      If True, then hidden fields are skipped from the output, i.e. fields starting with the # character.

    Returns AsyncIterator[dict]
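
How several of these options interact can be sketched on a plain list of dictionaries. The function below is hypothetical (`filter_items_sketch`), leaves out unwind, and assumes one particular order of operations (slice by offset and limit first, then filter), which is what makes a result shorter than limit possible.

```python
from typing import List, Optional


def filter_items_sketch(
    items: List[dict],
    *,
    offset: int = 0,
    limit: Optional[int] = None,
    clean: Optional[bool] = None,
    desc: Optional[bool] = None,
    fields: Optional[List[str]] = None,
    omit: Optional[List[str]] = None,
    skip_empty: Optional[bool] = None,
    skip_hidden: Optional[bool] = None,
) -> List[dict]:
    if clean:
        # clean is a shortcut for both flags.
        skip_hidden = skip_empty = True

    ordered = list(reversed(items)) if desc else list(items)
    end = None if limit is None else offset + limit

    result = []
    for item in ordered[offset:end]:
        if skip_hidden:
            item = {k: v for k, v in item.items() if not k.startswith('#')}
        if fields is not None:
            # Keys follow the order given in `fields`.
            item = {k: item[k] for k in fields if k in item}
        if omit:
            item = {k: v for k, v in item.items() if k not in omit}
        if skip_empty and not item:
            continue  # dropped after slicing, so the page may come up short
        result.append(item)
    return result


stored = [{'b': 2, 'a': 1, '#debug': True}, {}, {'a': 3, 'b': 4}]
cleaned = filter_items_sketch(stored, clean=True, limit=2)
picked = filter_items_sketch(stored, fields=['a', 'b'], limit=1)
```

In this example `cleaned` holds a single item even though limit is 2, because the empty second item is skipped after the page was cut.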

key_value_store

  • key_value_store(key_value_store_id): KeyValueStoreClient
  • Retrieve the sub-client for manipulating a single key-value store.


    Parameters

    • key_value_store_id: str

      ID of the key-value store to be manipulated

    Returns KeyValueStoreClient

key_value_stores

  • key_value_stores(): KeyValueStoreCollectionClient
  • Retrieve the sub-client for manipulating key-value stores.


    Returns KeyValueStoreCollectionClient

list

  • async list(): ListPage
  • List the available key-value stores.


    Returns ListPage

    ListPage: The list of available key-value stores matching the specified filters.

list

  • async list(): ListPage
  • List the available storages.


    Returns ListPage

    ListPage: The list of available storages matching the specified filters.

list

  • async list(): ListPage
  • List the available request queues.


    Returns ListPage

    ListPage: The list of available request queues matching the specified filters.

list

  • async list(): ListPage
  • List the available datasets.


    Returns ListPage

    ListPage: The list of available datasets matching the specified filters.

list_head

  • async list_head(*, limit): dict
  • Retrieve a given number of requests from the beginning of the queue.


    Parameters

    • limit: int | None = None (optional, keyword-only)

      How many requests to retrieve

    Returns dict

    dict: The desired number of requests from the beginning of the queue.

list_items

  • async list_items(*, offset, limit, clean, desc, fields, omit, unwind, skip_empty, skip_hidden, flatten, view): ListPage
  • List the items of the dataset.


    Parameters

    • offset: int | None = 0 (optional, keyword-only)

      Number of items that should be skipped at the start. The default value is 0.

    • limit: int | None = LIST_ITEMS_LIMIT (optional, keyword-only)

      Maximum number of items to return. By default there is no limit.

    • clean: bool | None = None (optional, keyword-only)

      If True, returns only non-empty items and skips hidden fields (i.e. fields starting with the # character). The clean parameter is just a shortcut for skip_hidden=True and skip_empty=True. Note that since some items might be skipped from the output, the result might contain fewer items than the limit value.

    • desc: bool | None = None (optional, keyword-only)

      By default, results are returned in the same order as they were stored. To reverse the order, set this parameter to True.

    • fields: list[str] | None = None (optional, keyword-only)

      A list of fields which should be picked from the items. Only these fields will remain in the resulting record objects. Note that the fields in the output items are sorted the same way as they are specified in the fields parameter. You can use this feature to effectively fix the output format.

    • omit: list[str] | None = None (optional, keyword-only)

      A list of fields which should be omitted from the items.

    • unwind: str | None = None (optional, keyword-only)

      Name of a field which should be unwound. If the field is an array, then every element of the array becomes a separate record that is merged with the parent object. If the unwound field is an object, then it is merged with the parent object. If the unwound field is missing, or its value is neither an array nor an object and therefore cannot be merged with a parent object, then the item is preserved as it is. Note that the unwound items ignore the desc parameter.

    • skip_empty: bool | None = None (optional, keyword-only)

      If True, then empty items are skipped from the output. Note that if used, the results might contain fewer items than the limit value.

    • skip_hidden: bool | None = None (optional, keyword-only)

      If True, then hidden fields are skipped from the output, i.e. fields starting with the # character.

    • flatten: list[str] | None = None (optional, keyword-only)

      A list of fields that should be flattened

    • view: str | None = None (optional, keyword-only)

      Name of the dataset view to be used

    Returns ListPage

    ListPage: A page of the list of dataset items according to the specified filters.

list_keys

  • async list_keys(*, limit, exclusive_start_key): dict
  • List the keys in the key-value store.


    Parameters

    • limit: int = DEFAULT_API_PARAM_LIMIT (optional, keyword-only)

      Number of keys to be returned. Maximum value is 1000.

    • exclusive_start_key: str | None = None (optional, keyword-only)

      All keys up to and including this one are skipped from the result.

    Returns dict

    dict: The list of keys in the key-value store matching the given arguments
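
Paging with limit and exclusive_start_key can be sketched on an in-memory list of keys. Everything here is an assumption made for illustration: the helper name `list_keys_sketch`, the alphabetical ordering of keys, and the field names of the returned dictionary.

```python
from bisect import bisect_right
from typing import Iterable, Optional


def list_keys_sketch(
    keys: Iterable[str],
    *,
    limit: int = 1000,
    exclusive_start_key: Optional[str] = None,
) -> dict:
    ordered = sorted(keys)  # assumed ordering
    # Skip every key up to and including exclusive_start_key.
    start = 0
    if exclusive_start_key is not None:
        start = bisect_right(ordered, exclusive_start_key)
    page = ordered[start:start + limit]
    is_truncated = start + limit < len(ordered)
    return {
        'items': page,
        'isTruncated': is_truncated,
        'nextExclusiveStartKey': page[-1] if is_truncated and page else None,
    }


all_keys = ['b', 'd', 'a', 'c']
first_page = list_keys_sketch(all_keys, limit=2)
second_page = list_keys_sketch(
    all_keys, limit=2, exclusive_start_key=first_page['nextExclusiveStartKey']
)
```

Feeding the last key of one page back in as exclusive_start_key yields the next page without overlap.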

load_private_key

  • load_private_key(private_key_file_base64, private_key_password): rsa.RSAPrivateKey
  • Parameters

    • private_key_file_base64: str
    • private_key_password: str

    Returns rsa.RSAPrivateKey

maybe_parse_body

  • maybe_parse_body(body, content_type): Any
  • Parameters

    • body: bytes
    • content_type: str

    Returns Any

maybe_parse_bool

  • maybe_parse_bool(val): bool
  • Parameters

    • val: str | None

    Returns bool

maybe_parse_datetime

  • maybe_parse_datetime(val): datetime | str
  • Parameters

    • val: str

    Returns datetime | str

maybe_parse_float

  • maybe_parse_float(val): float | None
  • Parameters

    • val: str

    Returns float | None

maybe_parse_int

  • maybe_parse_int(val): int | None
  • Parameters

    • val: str

    Returns int | None

normalize_url

  • normalize_url(url, *, keep_url_fragment): str
  • Normalizes a URL.

    This function cleans and standardizes a URL by removing leading and trailing whitespaces, converting the scheme and netloc to lower case, stripping unwanted tracking parameters (specifically those beginning with 'utm_'), sorting the remaining query parameters alphabetically, and optionally retaining the URL fragment. The goal is to ensure that URLs that are functionally identical but differ in trivial ways (such as parameter order or casing) are treated as the same.


    Parameters

    • url: str

      The URL to be normalized.

    • keep_url_fragment: bool = False (optional, keyword-only)

      Flag to determine whether the fragment part of the URL should be retained.

    Returns str

    A string containing the normalized URL.
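
The normalization steps listed above can be sketched with `urllib.parse`. This is an approximation under the hypothetical name `normalize_url_sketch`; the SDK function may handle edge cases (encoding, repeated parameters, trailing slashes) differently.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_url_sketch(url: str, *, keep_url_fragment: bool = False) -> str:
    # Remove leading and trailing whitespace before parsing.
    parts = urlsplit(url.strip())

    # Drop tracking parameters starting with 'utm_' and sort the rest.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.startswith('utm_')
    ]
    query = urlencode(sorted(params))

    fragment = parts.fragment if keep_url_fragment else ''
    # Lower-case the scheme and network location; keep the path as is.
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path, query, fragment)
    )


raw = '  HTTPS://Example.COM/path?b=2&utm_source=x&a=1#top '
normalized = normalize_url_sketch(raw)
with_fragment = normalize_url_sketch(raw, keep_url_fragment=True)
```

Two URLs that differ only in parameter order, host casing, or tracking parameters normalize to the same string.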

off

  • off(event_name, listener): None
  • Remove a listener, or all listeners, from an actor event.


    Parameters

    • event_name: ActorEventTypes

      The actor event for which to remove listeners.

    • listener: Callable | None = None (optional)

      The listener which is supposed to be removed. If not passed, all listeners of this event are removed.

    Returns None

on

  • on(event_name, listener): Callable
  • Add an event listener to the event manager.


    Parameters

    • event_name: ActorEventTypes

      The actor event to listen for.

    • listener: ListenerType

      The function which is to be called when the event is emitted (can be async). Must accept either zero or one arguments (the first argument will be the event data).

    Returns Callable
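
The listener contract above (sync or async, zero or one argument) implies that the caller has to inspect the listener before invoking it. A minimal sketch of such a dispatch step, using only the standard library; `call_listener_sketch` is a hypothetical helper, not part of the SDK.

```python
import asyncio
import inspect
from typing import Any, Callable


async def call_listener_sketch(listener: Callable, event_data: Any) -> Any:
    """Invoke a listener that takes zero or one argument and may be async."""
    # Pass the event data only if the listener declares a parameter for it.
    accepts_data = len(inspect.signature(listener).parameters) > 0
    result = listener(event_data) if accepts_data else listener()
    # Await the result when the listener is a coroutine function.
    if inspect.isawaitable(result):
        result = await result
    return result


seen = []


def on_event_without_data() -> None:
    seen.append('called')


async def on_event_with_data(data: dict) -> None:
    seen.append(data)


asyncio.run(call_listener_sketch(on_event_without_data, {'x': 1}))
asyncio.run(call_listener_sketch(on_event_with_data, {'x': 1}))
```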

open

  • async open(*, id, name, force_cloud, config): BaseStorage
  • Open a storage, or return a cached storage object if it was opened before.

    Opens a storage with the given ID or name. Returns the cached storage object if the storage was opened before.


    Parameters

    • id: str | None = None (optional, keyword-only)

      ID of the storage to be opened. If neither id nor name is provided, the method returns the default storage associated with the actor run. If the storage with the given ID does not exist, it raises an error.

    • name: str | None = None (optional, keyword-only)

      Name of the storage to be opened. If neither id nor name is provided, the method returns the default storage associated with the actor run. If the storage with the given name does not exist, it is created.

    • force_cloud: bool = False (optional, keyword-only)

      If set to True, it will open a storage on the Apify Platform even when running the actor locally. Defaults to False.

    • config: Configuration | None = None (optional, keyword-only)

      A Configuration instance, uses global configuration if omitted.

    Returns BaseStorage

    An instance of the storage.

open_queue_with_custom_client

  • Open a Request Queue with custom Apify Client.

    TODO: add support for custom client to Actor.open_request_queue(), so that we don't have to do this hacky workaround


    Returns RequestQueue

push_items

  • async push_items(items): None
  • Push items to the dataset.


    Parameters

    • items: JSONSerializable

      The items to push to the dataset. Either a stringified JSON, a dictionary, or a list of strings or dictionaries.

    Returns None

raise_on_duplicate_storage

  • raise_on_duplicate_storage(client_type, key_name, value): NoReturn
  • Parameters

    Returns NoReturn

raise_on_non_existing_storage

  • raise_on_non_existing_storage(client_type, id): NoReturn

request_queue

  • request_queue(request_queue_id, *, client_key): RequestQueueClient
  • Retrieve the sub-client for manipulating a single request queue.


    Parameters

    • request_queue_id: str

      ID of the request queue to be manipulated

    • client_key: str | None = None (optional, keyword-only)

      A unique identifier of the client accessing the request queue

    Returns RequestQueueClient

request_queues

  • request_queues(): RequestQueueCollectionClient
  • Retrieve the sub-client for manipulating request queues.


    Returns RequestQueueCollectionClient

set_cloud_client

  • set_cloud_client(client): None
  • Set the storage client.


    Parameters

    • client: ApifyClientAsync

      The instance of a storage client.

    Returns None

set_config

  • set_config(config): None
  • Set the config for the StorageClientManager.


    Parameters

    • config: Configuration

      The configuration this StorageClientManager should use.

    Returns None

set_record

  • async set_record(key, value, content_type): None
  • Set a value to the given record in the key-value store.


    Parameters

    • key: str

      The key of the record to save the value to

    • value: Any

      The value to save into the record

    • content_type: str | None = None (optional)

      The content type of the saved value

    Returns None

stream_items

  • async stream_items(_args, _kwargs): AsyncIterator
  • Parameters

    • _args: Any
    • _kwargs: Any

    Returns AsyncIterator

stream_record

  • async stream_record(_key): AsyncIterator[dict | None]
  • Parameters

    • _key: str

    Returns AsyncIterator[dict | None]

to_apify_request

  • to_apify_request(scrapy_request, spider): dict | None
  • Convert a Scrapy request to an Apify request.


    Parameters

    • scrapy_request: Request

      The Scrapy request to be converted.

    • spider: Spider

      The Scrapy spider that the request is associated with.

    Returns dict | None

    The converted Apify request if the conversion was successful, otherwise None.

to_scrapy_request

  • to_scrapy_request(apify_request, spider): Request
  • Convert an Apify request to a Scrapy request.


    Parameters

    • apify_request: dict

      The Apify request to be converted.

    • spider: Spider

      The Scrapy spider that the request is associated with.

    Returns Request

    The converted Scrapy request.

unique_key_to_request_id

  • unique_key_to_request_id(unique_key): str
  • Generate request ID based on unique key in a deterministic way.


    Parameters

    • unique_key: str

    Returns str
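
A deterministic ID can be derived by hashing the unique key and keeping a fixed-length alphanumeric prefix. The sketch below shows one such scheme (SHA-256, Base64, non-alphanumeric characters stripped). The scheme and the default length of 15 are assumptions, and `unique_key_to_request_id_sketch` is a hypothetical name.

```python
import base64
import hashlib
import re


def unique_key_to_request_id_sketch(unique_key: str, *, request_id_length: int = 15) -> str:
    """Derive a short, stable ID from a unique key."""
    digest = hashlib.sha256(unique_key.encode('utf-8')).digest()
    encoded = base64.b64encode(digest).decode('utf-8')
    # Keep only letters and digits so the ID is safe in file names and URLs.
    alphanumeric = re.sub(r'[+/=]', '', encoded)
    return alphanumeric[:request_id_length]


request_id = unique_key_to_request_id_sketch('https://example.com/path?a=1')
same_again = unique_key_to_request_id_sketch('https://example.com/path?a=1')
```

Because the ID depends only on the unique key, adding the same request twice maps to the same ID, which is what makes deduplication possible.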

update

  • async update(*, name): dict
  • Update the dataset with specified fields.


    Parameters

    • name: str | None = None (optional, keyword-only)

      The new name for the dataset

    Returns dict

    dict: The updated dataset

update

  • async update(*, name): dict
  • Update the key-value store with specified fields.


    Parameters

    • name: str | None = None (optional, keyword-only)

      The new name for the key-value store

    Returns dict

    dict: The updated key-value store

update

  • async update(*, name): dict
  • Update the request queue with specified fields.


    Parameters

    • name: str | None = None (optional, keyword-only)

      The new name for the request queue

    Returns dict

    dict: The updated request queue

update_metadata

  • async update_metadata(*, data, entity_directory, write_metadata): None
  • Parameters

    • data: dict (keyword-only)
    • entity_directory: str (keyword-only)
    • write_metadata: bool (keyword-only)

    Returns None

update_request

  • async update_request(request, *, forefront): dict
  • Update a request in the queue.


    Parameters

    • request: dict

      The updated request

    • forefront: bool | None = None (optional, keyword-only)

      Whether to put the updated request in the beginning or the end of the queue

    Returns dict

    dict: The updated request

update_request_queue_item

  • async update_request_queue_item(*, request_id, request, entity_directory, persist_storage): None
  • Parameters

    • request_id: str (keyword-only)
    • request: dict (keyword-only)
    • entity_directory: str (keyword-only)
    • persist_storage: bool (keyword-only)

    Returns None

values

  • values(): ValuesView[T]
  • Iterate over the values in the cache in order of insertion.


    Returns ValuesView[T]

wait_for_all_listeners_to_complete

  • async wait_for_all_listeners_to_complete(*, timeout_secs): None
  • Wait for all event listeners which are currently being executed to complete.


    Parameters

    • timeout_secs: float | None = None (optional, keyword-only)

      Timeout for the wait. If the event listeners don't finish before the timeout expires, they will be canceled.

    Returns None
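
The wait-then-cancel behaviour described above can be sketched with `asyncio.wait`. This is an illustration, not the SDK's implementation; `wait_for_listeners_sketch` and the idea of tracking listeners as a set of tasks are assumptions.

```python
import asyncio
from typing import Optional, Set


async def wait_for_listeners_sketch(
    tasks: Set[asyncio.Task], *, timeout_secs: Optional[float] = None
) -> None:
    """Wait for running listener tasks; cancel those still pending at the timeout."""
    if not tasks:
        return  # asyncio.wait rejects an empty collection
    _done, pending = await asyncio.wait(tasks, timeout=timeout_secs)
    for task in pending:
        task.cancel()
    # Give cancelled tasks a chance to finish unwinding.
    await asyncio.gather(*pending, return_exceptions=True)


async def demo() -> tuple:
    fast = asyncio.create_task(asyncio.sleep(0.01))
    slow = asyncio.create_task(asyncio.sleep(10))
    await wait_for_listeners_sketch({fast, slow}, timeout_secs=0.2)
    return fast.done() and not fast.cancelled(), slow.cancelled()


fast_finished, slow_cancelled = asyncio.run(demo())
```

With timeout_secs left as None, the call waits until every task has finished and cancels nothing.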

wrap_internal

  • wrap_internal(implementation, metadata_source): MetadataType

Properties

__version__

__version__: Undefined

API_PROCESSED_REQUESTS_DELAY_MILLIS

API_PROCESSED_REQUESTS_DELAY_MILLIS: Undefined

APIFY_PROXY_VALUE_REGEX

APIFY_PROXY_VALUE_REGEX: Undefined

BASE64_REGEXP

BASE64_REGEXP: Undefined

BaseResourceClientType

BaseResourceClientType: Undefined

BaseResourceCollectionClientType

BaseResourceCollectionClientType: Undefined

COUNTRY_CODE_REGEX

COUNTRY_CODE_REGEX: Undefined

DEFAULT_API_PARAM_LIMIT

DEFAULT_API_PARAM_LIMIT: Undefined

DEPRECATED_NAMES

DEPRECATED_NAMES: Undefined

DualPropertyOwner

DualPropertyOwner: Undefined

DualPropertyType

DualPropertyType: Undefined

EFFECTIVE_LIMIT_BYTES

EFFECTIVE_LIMIT_BYTES: Undefined

ENCRYPTED_INPUT_VALUE_PREFIX

ENCRYPTED_INPUT_VALUE_PREFIX: Undefined

ENCRYPTED_INPUT_VALUE_REGEXP

ENCRYPTED_INPUT_VALUE_REGEXP: Undefined

ENCRYPTION_AUTH_TAG_LENGTH

ENCRYPTION_AUTH_TAG_LENGTH: Undefined

ENCRYPTION_IV_LENGTH

ENCRYPTION_IV_LENGTH: Undefined

ENCRYPTION_KEY_LENGTH

ENCRYPTION_KEY_LENGTH: Undefined

EVENT_LISTENERS_TIMEOUT_SECS

EVENT_LISTENERS_TIMEOUT_SECS: Undefined

ImplementationType

ImplementationType: Undefined

LIST_ITEMS_LIMIT

LIST_ITEMS_LIMIT: Undefined

ListenerType

ListenerType: Undefined

ListOrDictOrAny

ListOrDictOrAny: Undefined

LOCAL_ENTRY_NAME_DIGITS

LOCAL_ENTRY_NAME_DIGITS: Undefined

logger

logger: Undefined

logger

logger: Undefined

logger_name

logger_name: Undefined

MainReturnType

MainReturnType: Undefined

MAX_CACHED_REQUESTS

MAX_CACHED_REQUESTS: Undefined

MAX_PAYLOAD_SIZE_BYTES

MAX_PAYLOAD_SIZE_BYTES: Undefined

MAX_QUERIES_FOR_CONSISTENCY

MAX_QUERIES_FOR_CONSISTENCY: Undefined

MetadataType

MetadataType: Undefined

nested_event_loop

nested_event_loop: asyncio.AbstractEventLoop

PARSE_DATE_FIELDS_KEY_SUFFIX

PARSE_DATE_FIELDS_KEY_SUFFIX: Undefined

PARSE_DATE_FIELDS_MAX_DEPTH

PARSE_DATE_FIELDS_MAX_DEPTH: Undefined

QUERY_HEAD_BUFFER

QUERY_HEAD_BUFFER: Undefined

QUERY_HEAD_MIN_LENGTH

QUERY_HEAD_MIN_LENGTH: Undefined

RECENTLY_HANDLED_CACHE_SIZE

RECENTLY_HANDLED_CACHE_SIZE: Undefined

REQUEST_ID_LENGTH

REQUEST_ID_LENGTH: Undefined

REQUEST_QUEUE_HEAD_MAX_LIMIT

REQUEST_QUEUE_HEAD_MAX_LIMIT: Undefined

ResourceClientType

ResourceClientType: Undefined

SAFETY_BUFFER_PERCENT

SAFETY_BUFFER_PERCENT: Undefined

SESSION_ID_MAX_LENGTH

SESSION_ID_MAX_LENGTH: Undefined

STORAGE_CONSISTENCY_DELAY_MILLIS

STORAGE_CONSISTENCY_DELAY_MILLIS: Undefined

T

T: Undefined

T

T: Undefined

T

T: Undefined

Scrapy integration

apply_apify_settings

  • apply_apify_settings(*, settings, proxy_config): Settings
  • Integrates Apify configuration into Scrapy project settings.

    Note: The function directly modifies the passed settings object and also returns it.


    Parameters

    • settings: Settings | None = None (optional, keyword-only)

      Scrapy project settings to be modified.

    • proxy_config: dict | None = None (optional, keyword-only)

      Proxy configuration to be stored in the settings.

    Returns Settings

    Scrapy project settings with custom configurations.
