Helpers Module

This module provides helper functions for the AdLibAPI object, covering tasks such as input validation and data processing.

NumberValidator Class

class AdDownloader.helpers.NumberValidator

static validate_number(answers, current)

Checks whether the input is a valid number.

Parameters:

answers (dict) – The answers given to earlier prompts.
current (str) – The user’s number input.

Returns:: True if current represents a valid number, False otherwise.
Return type:: bool
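The (answers, current) signature appears to follow the convention of prompt libraries such as python-inquirer, where answers holds the earlier answers and current is the text being validated. A minimal sketch of such a check, assuming a valid number means a non-empty string of ASCII digits (so negative and decimal values are rejected); validate_number_sketch is hypothetical, not the library's implementation:

```python
def validate_number_sketch(answers: dict, current: str) -> bool:
    """Hypothetical sketch: True for a non-empty string of ASCII digits."""
    # answers (earlier prompt answers) is accepted for signature compatibility only.
    text = current.strip()
    # isascii() rules out non-ASCII digits such as "²", which isdigit() would accept.
    return text.isascii() and text.isdigit()


print(validate_number_sketch({}, "100"))  # True
print(validate_number_sketch({}, "-5"))   # False
```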

DateValidator Class

class AdDownloader.helpers.DateValidator

A class representing a date validator.

static validate_date(answers, current)

Checks whether the input is a valid date in the format Y-m-d (e.g. “2023-12-31”).

Parameters:

answers (dict) – The answers given to earlier prompts.
current (str) – The user’s date input.

Returns:: True if current represents a valid date, False otherwise.
Return type:: bool
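A minimal sketch of a Y-m-d check with datetime.strptime, which rejects both malformed strings and impossible dates such as 2023-02-30 (note that strptime also accepts unpadded values such as 2023-1-5); validate_date_sketch is hypothetical:

```python
from datetime import datetime


def validate_date_sketch(answers: dict, current: str) -> bool:
    """Hypothetical sketch: True if current parses as a Y-m-d date."""
    try:
        # strptime raises ValueError for malformed strings and impossible dates.
        datetime.strptime(current.strip(), "%Y-%m-%d")
    except ValueError:
        return False
    return True


print(validate_date_sketch({}, "2023-12-31"))  # True
print(validate_date_sketch({}, "2023-02-30"))  # False
```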

CountryValidator Class

class AdDownloader.helpers.CountryValidator

A class representing a country code validator.

static validate_country(answers, current)

Checks whether the input is a valid country code.

Parameters:

answers (dict) – The answers given to earlier prompts.
current (str) – The user’s country code input.

Returns:: True if current represents a valid country code, False otherwise.
Return type:: bool

ExcelValidator Class

class AdDownloader.helpers.ExcelValidator

A class representing a validator for Excel/CSV files containing page IDs.

static validate_excel(answers, current)

Checks whether the input is a valid Excel or CSV file.

Parameters:

answers (dict) – The answers given to earlier prompts.
current (str) – The user’s Excel/CSV file name input.

Returns:: True if current names a valid Excel/CSV file containing a column page_id, False otherwise.
Return type:: bool
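A minimal sketch of the page_id check for the CSV case only, using the standard library csv module and reading just the header row; Excel files are not handled here (they would need an Excel reader such as pandas.read_excel). has_page_id_column is a hypothetical helper, and the column-matching rule (whitespace stripped, case-sensitive) is an assumption:

```python
import csv
from pathlib import Path


def has_page_id_column(path: str) -> bool:
    """Hypothetical sketch: True if path is a readable CSV whose header contains page_id."""
    file = Path(path)
    if file.suffix.lower() != ".csv" or not file.is_file():
        return False
    try:
        with file.open(newline="", encoding="utf-8") as handle:
            # Only the first row (the header) is read.
            header = next(csv.reader(handle), [])
    except (OSError, UnicodeDecodeError, csv.Error):
        return False
    # Strip whitespace so a header like " page_id " still matches.
    return "page_id" in (column.strip() for column in header)
```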

is_valid_excel_file Function

AdDownloader.helpers.is_valid_excel_file(file)

Checks whether the input file name is a valid Excel file.

Parameters:: file (str) – A path to an Excel file.
Returns:: True if the string represents a valid path to an Excel file, False otherwise.
Return type:: bool

Example:

>>> is_valid_excel_file("example.xlsx")
True
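A minimal sketch of a cheap pre-check, assuming that .xlsx and .xls are the accepted extensions and that checking the extension plus existence is enough; it does not open the workbook, so a corrupt file would still pass. is_valid_excel_file_sketch is hypothetical:

```python
from pathlib import Path

# Assumed set of accepted Excel extensions (hypothetical).
EXCEL_SUFFIXES = {".xlsx", ".xls"}


def is_valid_excel_file_sketch(file: str) -> bool:
    """Hypothetical sketch: True if file exists and has an Excel extension."""
    path = Path(file)
    # The extension check is case-insensitive; the content is not inspected.
    return path.suffix.lower() in EXCEL_SUFFIXES and path.is_file()
```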

is_valid_page_ids_file Function

AdDownloader.helpers.is_valid_page_ids_file(file)

Checks whether the input file name is a valid Excel or CSV file for page IDs.

Parameters:: file (str) – A path to an Excel or CSV file, relative to the data folder.
Returns:: True if the string represents a valid, readable Excel or CSV file, False otherwise.
Return type:: bool

Example:

>>> is_valid_page_ids_file("example.csv")
True

load_json_from_folder Function

AdDownloader.helpers.load_json_from_folder(folder_path)

Load all the JSON files from the specified folder and merge them into a dataframe.

Parameters:: folder_path (str) – A path to a folder containing JSON files with ad data.
Returns:: A dataframe containing information retrieved from all JSON files of the folder.
Return type:: pandas.DataFrame

Example:

>>> folder_path = 'path/to/json/folder'
>>> loaded_data = load_json_from_folder(folder_path)
>>> print(loaded_data.head())
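A minimal sketch of the merge, assuming each *.json file in the folder holds a JSON array of ad objects (the actual file layout written by AdDownloader may differ); load_json_sketch is hypothetical:

```python
import json
from pathlib import Path

import pandas as pd


def load_json_sketch(folder_path: str) -> pd.DataFrame:
    """Hypothetical sketch: merge all *.json files in a folder into one DataFrame.

    Assumes each file holds a JSON array with one object per ad.
    """
    records = []
    # sorted() makes the row order deterministic across platforms.
    for json_file in sorted(Path(folder_path).glob("*.json")):
        with json_file.open(encoding="utf-8") as handle:
            records.extend(json.load(handle))
    # Keys missing from some ads become NaN in the resulting columns.
    return pd.DataFrame(records)
```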

flatten_age_country_gender Function

AdDownloader.helpers.flatten_age_country_gender(row, target_country)

Flatten a row containing age_country_gender_reach_breakdown data into wide format for a given target country.

Parameters:

row (list) – A row in JSON format containing age_country_gender_reach_breakdown data.
target_country (str) – The target country for which the reach data will be processed.

Returns:

A list with the processed age_gender_reach data.

Return type:

list

Example:

>>> row_example = [{"country": "NL", "age_gender_breakdowns": [{"age_range": "18-24", "male": 100, "female": 50, "unknown": 10}, ...]}]
>>> target_country_example = "NL"
>>> flattened_data = flatten_age_country_gender(row_example, target_country_example)
>>> print(flattened_data)
[{'country': 'NL', 'age_range': '18-24', 'male': 100, 'female': 50, 'unknown': 10}, ...]
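A minimal sketch that reproduces the example above, assuming the input structure shown there (a list of per-country entries, each with an age_gender_breakdowns list); flatten_age_country_gender_sketch is hypothetical:

```python
def flatten_age_country_gender_sketch(row, target_country):
    """Hypothetical sketch: one flat dict per age range for target_country."""
    flattened = []
    for country_entry in row:
        # Skip breakdowns that belong to other countries.
        if country_entry.get("country") != target_country:
            continue
        for breakdown in country_entry.get("age_gender_breakdowns", []):
            # Prefix each age-range dict with its country code.
            flattened.append({"country": country_entry["country"], **breakdown})
    return flattened
```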

flatten_demographic_distribution Function

AdDownloader.helpers.flatten_demographic_distribution(row)

Flatten the demographic distribution data from a single row into a dictionary.

This function takes a single row of demographic distribution data, which is typically a list of dictionaries containing percentage, age, and gender information. It flattens this nested structure into a dictionary with keys formatted as “{gender}_{age}” and corresponding percentage values.

Parameters:: row (list) – A row of demographic distribution data, typically a list of dictionaries.
Returns:: A dictionary where keys are formatted as “{gender}_{age}” and values are the corresponding percentage values.
Return type:: dict

Example:

>>> row_example = [{'percentage': '0.113043', 'age': '45-54', 'gender': 'male'}, {'percentage': '0.008696', 'age': '25-34', 'gender': 'female'}, ...]
>>> flattened_data = flatten_demographic_distribution(row_example)
>>> print(flattened_data)
{'male_45-54': 0.113043, 'female_25-34': 0.008696, ...}
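A minimal sketch that reproduces the example above, assuming each entry has percentage, age, and gender keys and that percentages arrive as strings (as in the example input); flatten_demographic_distribution_sketch is hypothetical:

```python
def flatten_demographic_distribution_sketch(row):
    """Hypothetical sketch: map "{gender}_{age}" to the percentage as a float."""
    flattened = {}
    for entry in row:
        key = f"{entry['gender']}_{entry['age']}"
        # The example input stores percentages as strings such as '0.113043'.
        flattened[key] = float(entry["percentage"])
    return flattened
```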

transform_data Function

AdDownloader.helpers.transform_data(project_name, country, ad_type)

Transform all the data from a given project for a target country by flattening its age_country_gender_reach_breakdown column. This function requires a folder ‘output/{project_name}/json’ containing the raw downloaded data in JSON format. The transformed data is saved inside ‘output/{project_name}/ads_data’, where original_data.xlsx is the original downloaded data and processed_data.xlsx contains the flattened age_country_gender_reach_breakdown columns.

Parameters:

project_name (str) – The name of the current project.
country (str) – The target country for which the data will be transformed.
ad_type (str) – The type of the ads that were retrieved (can be “All” or “Political”). The processing applied depends on ad_type.

Returns:

If ad_type is “All”, a dataframe with the processed age_country_gender_reach_breakdown data; otherwise, a dataframe with the processed demographic_distribution data.

Return type:

pandas.DataFrame

Example:

>>> project_name_example = "example_project"
>>> country_example = "NL"
>>> transformed_data = transform_data(project_name_example, country_example, "ALL")
>>> print(transformed_data.head())
      id ad_delivery_start_time ad_delivery_stop_time  ... unknown_45-54 unknown_55-64 unknown_65+
0  11111             2023-12-21            2023-12-21  ...           0.0           0.0         0.0

[1 rows x 33 columns]
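Column names such as unknown_45-54 in the output above suggest that each gender and age range pair becomes its own {gender}_{age_range} column. A minimal sketch of that wide-format step, assuming input rows shaped like the output of flatten_age_country_gender; to_wide_sketch and the zero default for missing genders are assumptions, not the library's code:

```python
# Assumed gender categories, taken from the breakdown example above.
GENDERS = ("male", "female", "unknown")


def to_wide_sketch(flattened_rows):
    """Hypothetical sketch: turn per-age-range dicts into one wide dict of columns."""
    wide = {}
    for entry in flattened_rows:
        for gender in GENDERS:
            # Missing gender counts default to 0 (an assumption of this sketch).
            wide[f"{gender}_{entry['age_range']}"] = entry.get(gender, 0)
    return wide
```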

configure_logging Function

AdDownloader.helpers.configure_logging(project_name)

Configures and returns a logger with a file handler that writes to the specified project’s log file. This function creates a log file named ‘logs.log’ in a directory named after project_name under the ‘output’ directory. It checks whether the logger already has handlers, so that duplicate handlers are not added and each message is logged only once.

Parameters:: project_name (str) – The name of the project for which logging is being configured.
Returns:: A configured logger object that logs messages to ‘output/<project_name>/logs.log’.
Return type:: logging.Logger
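A minimal sketch of the behaviour described above using the standard logging module; the logger name, message format, INFO level, and the extra base_dir parameter (added so the sketch can write somewhere other than ‘output’) are all assumptions, and configure_logging_sketch is hypothetical:

```python
import logging
from pathlib import Path


def configure_logging_sketch(project_name: str, base_dir: str = "output") -> logging.Logger:
    """Hypothetical sketch: log to <base_dir>/<project_name>/logs.log exactly once."""
    log_dir = Path(base_dir) / project_name
    log_dir.mkdir(parents=True, exist_ok=True)
    # getLogger returns the same object for the same name on every call.
    logger = logging.getLogger(project_name)
    logger.setLevel(logging.INFO)
    # Attach a handler only the first time, so repeated calls do not duplicate lines.
    if not logger.handlers:
        handler = logging.FileHandler(log_dir / "logs.log", encoding="utf-8")
        handler.setFormatter(logging.Formatter("%(asctime)s - %(levelname)s - %(message)s"))
        logger.addHandler(handler)
    return logger
```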

close_logger Function

AdDownloader.helpers.close_logger(logger)

Closes all handlers of the specified logger to ensure proper release of file resources.

Parameters:: logger (logging.Logger) – The logger instance whose handlers are to be closed.
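A minimal sketch of the cleanup using the standard logging API; close_logger_sketch is hypothetical. Closing a FileHandler flushes and releases the log file, and removing it detaches the handler so later messages no longer go to that file:

```python
import logging


def close_logger_sketch(logger: logging.Logger) -> None:
    """Hypothetical sketch: close and detach every handler of logger."""
    # Iterate over a copy, because removeHandler mutates logger.handlers.
    for handler in list(logger.handlers):
        handler.close()
        logger.removeHandler(handler)
```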

hide_access_token Function

AdDownloader.helpers.hide_access_token(data)

Remove the access token from the ad_snapshot_url column. It can be re-added by calling update_access_token().

Parameters:: data (pandas.DataFrame) – A dataframe containing a column ad_snapshot_url.
Returns:: A dataframe with the access token removed from the ad_snapshot_url column.
Return type:: pandas.DataFrame

Example:

>>> import pandas as pd
>>> data = pd.read_excel('path/to/your/data.xlsx')
>>> data = hide_access_token(data)
>>> data.to_excel('path/to/your/data.xlsx', index=False)
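A minimal sketch of the per-URL operation, assuming the token is stored as an access_token query parameter of ad_snapshot_url; strip_access_token is hypothetical, and the real function may mask the value differently:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_access_token(url: str) -> str:
    """Hypothetical sketch: drop the access_token query parameter from one URL."""
    parts = urlsplit(url)
    # Keep every other query parameter, preserving their original order.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "access_token"]
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Applied to a whole column, this would be data['ad_snapshot_url'] = data['ad_snapshot_url'].map(strip_access_token).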

update_access_token Function

AdDownloader.helpers.update_access_token(data, new_access_token=None)

Update the access token in the ad_snapshot_url column of the given ad data.

Parameters:

data (pandas.DataFrame) – A dataframe containing a column ad_snapshot_url.
new_access_token (str) – The new access token, optional. If none is given, the user will be prompted to enter it.

Returns:

A dataframe with an updated access token in the ad_snapshot_url column.

Return type:

pandas.DataFrame

Example:

>>> import pandas as pd
>>> data = pd.read_excel('path/to/your/data.xlsx')
>>> new_access_token = input("Provide an updated access token: ")
>>> data = update_access_token(data, new_access_token)
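A minimal sketch of the per-URL update, assuming the token lives in an access_token query parameter of ad_snapshot_url; set_access_token is hypothetical. Any existing token is replaced, and a missing one is appended:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def set_access_token(url: str, new_access_token: str) -> str:
    """Hypothetical sketch: set (or add) the access_token query parameter of one URL."""
    parts = urlsplit(url)
    # Drop any previous token, keeping the other parameters in order.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "access_token"]
    # Append the new token as the last parameter.
    query.append(("access_token", new_access_token))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Applied to a whole column, this would be data['ad_snapshot_url'] = data['ad_snapshot_url'].map(lambda u: set_access_token(u, new_access_token)).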

get_long_lived_token Function

AdDownloader.helpers.get_long_lived_token(access_token=None, app_id=None, app_secret=None, version='v25.0')

Generate a Meta long-lived access token, which lasts around 60 days, from a valid short-lived access token. The long-lived access token and its expiration time will be saved in a meta_long_lived_token.txt file. The app_id and app_secret can be found inside your app at https://developers.facebook.com/apps/.

Parameters:

access_token (str) – A valid short-lived access token, optional. If none is given, the user will be prompted to enter it.
app_id (str) – The ID of your Meta app, optional. If none is given, the user will be prompted to enter it.
app_secret (str) – The secret of your Meta app, optional. If none is given, the user will be prompted to enter it.
version (str) – The Graph API version used for the request, optional. Defaults to ‘v25.0’.

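Meta's documented exchange is a GET request to the /oauth/access_token endpoint of the Graph API with grant_type=fb_exchange_token, client_id, client_secret, and fb_exchange_token; the JSON response carries the new access_token and an expires_in value in seconds. A minimal sketch that only builds that request URL (it sends nothing); build_token_exchange_url is hypothetical and not necessarily how AdDownloader constructs the request:

```python
from urllib.parse import urlencode


def build_token_exchange_url(access_token: str, app_id: str, app_secret: str,
                             version: str = "v25.0") -> str:
    """Hypothetical sketch: URL for exchanging a short-lived token for a long-lived one."""
    params = {
        "grant_type": "fb_exchange_token",
        "client_id": app_id,
        "client_secret": app_secret,
        "fb_exchange_token": access_token,
    }
    # The URL embeds the app secret, so it should never be logged or shared.
    return f"https://graph.facebook.com/{version}/oauth/access_token?{urlencode(params)}"
```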
calculate_image_hash Function

AdDownloader.helpers.calculate_image_hash(image_path)

Calculate the MD5 hash of an image file. The MD5 hash is a 32-character hexadecimal string that acts as a practically unique fingerprint of the image’s content, which is useful for verifying integrity and identifying duplicates.

Parameters:: image_path (str) – The path to the image file.
Returns:: The MD5 hash of the image.
Return type:: str

Example:

>>> image_path = 'path-to-your-image'
>>> calculate_image_hash(image_path)
'108f46130f45639cf388892306235fd5'
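A minimal sketch using hashlib, assuming the hash is computed over the file's raw bytes (so the same picture saved as PNG and as JPEG yields different hashes); calculate_image_hash_sketch is hypothetical:

```python
import hashlib


def calculate_image_hash_sketch(image_path: str, chunk_size: int = 65536) -> str:
    """Hypothetical sketch: MD5 hex digest of the file's raw bytes."""
    md5 = hashlib.md5()
    with open(image_path, "rb") as handle:
        # Read in chunks so large images are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()
```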

deduplicate_images Function

AdDownloader.helpers.deduplicate_images(image_folder, unique_img_folder)

Deduplicate images in a folder and save unique images to a specified folder.

This function scans a folder for PNG/JPG/JPEG images, calculates the MD5 hash of each image, identifies duplicates, and saves only the unique images to a separate folder.

Parameters:

image_folder (str) – The path to the folder containing the original images.
unique_img_folder (str) – The path to the folder where unique images will be saved.

Example:

>>> image_folder = 'output/<project_name>/ads_images'
>>> unique_img_folder = 'output/<project_name>/unique_images'
>>> deduplicate_images(image_folder, unique_img_folder)
Found 57 duplicates and saved 143 unique images inside output/<project_name>/unique_images.
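A minimal sketch of the same procedure, assuming the first file (in sorted name order) of each hash is the one kept and that kept files are copied with their original names; deduplicate_images_sketch is hypothetical and returns the counts instead of printing them:

```python
import hashlib
import shutil
from pathlib import Path
from typing import Tuple

# Image types scanned, as listed in the description above.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def deduplicate_images_sketch(image_folder: str, unique_img_folder: str) -> Tuple[int, int]:
    """Hypothetical sketch: copy one file per MD5 hash; return (duplicates, unique)."""
    target = Path(unique_img_folder)
    target.mkdir(parents=True, exist_ok=True)
    seen_hashes = set()
    duplicates = 0
    for image in sorted(Path(image_folder).iterdir()):
        if not image.is_file() or image.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip directories and non-image files
        digest = hashlib.md5(image.read_bytes()).hexdigest()
        if digest in seen_hashes:
            duplicates += 1
            continue
        seen_hashes.add(digest)
        # copy2 also preserves file metadata such as modification time.
        shutil.copy2(image, target / image.name)
    return duplicates, len(seen_hashes)
```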