Helpers Module
This module provides helper functions for the AdLibAPI object, covering validation and data processing tasks.
NumberValidator Class
DateValidator Class
- class AdDownloader.helpers.DateValidator[source]
A class representing a date validator.
- static validate_date(answers, current)[source]
Checks whether the input is a valid date in the format Y-m-d (e.g. “2023-12-31”).
- Parameters:
document (document) – A document representing user’s date input.
- Returns:
True if the text of the document represents a valid date, False otherwise.
- Return type:
bool
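The Y-m-d check described above can be sketched as follows. This is a minimal illustration, assuming the validation amounts to parsing the text with datetime.strptime; the function name is_valid_date is hypothetical and not part of AdDownloader.

```python
from datetime import datetime


def is_valid_date(text):
    """Return True if text parses as a Y-m-d date, False otherwise.

    Hypothetical helper for illustration; not the AdDownloader implementation.
    """
    try:
        datetime.strptime(text, "%Y-%m-%d")
        return True
    except (ValueError, TypeError):
        # ValueError: wrong format or impossible date; TypeError: non-string input.
        return False


print(is_valid_date("2023-12-31"))  # True
print(is_valid_date("2023-02-30"))  # False, February has no 30th day
```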
CountryValidator Class
ExcelValidator Class
- class AdDownloader.helpers.ExcelValidator[source]
A class representing an Excel file validator.
is_valid_excel_file Function
- AdDownloader.helpers.is_valid_excel_file(file)[source]
Checks whether the input file name is a valid Excel file.
- Parameters:
file (str) – A path to an Excel file.
- Returns:
True if the string represents a valid path to an Excel file, False otherwise.
- Return type:
bool
Example:
>>> is_valid_excel_file("example.xlsx")
True
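One way such a check might look is an extension test on the file name. This is only a sketch: whether the real function also verifies that the file exists or can be opened is not stated here, and both has_excel_extension and the set of accepted suffixes are assumptions made for illustration.

```python
from pathlib import Path

# Assumed set of accepted extensions; the library may accept a different set.
EXCEL_SUFFIXES = {".xlsx", ".xls", ".xlsm"}


def has_excel_extension(file):
    """Return True if the file name ends with a known Excel extension."""
    return Path(file).suffix.lower() in EXCEL_SUFFIXES


print(has_excel_extension("example.xlsx"))  # True
print(has_excel_extension("notes.txt"))     # False
```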
load_json_from_folder Function
- AdDownloader.helpers.load_json_from_folder(folder_path)[source]
Load all the JSON files from the specified folder and merge them into a dataframe.
- Parameters:
folder_path (str) – A path to a folder containing JSON files with ad data.
- Returns:
A dataframe containing information retrieved from all JSON files of the folder.
- Return type:
pandas.DataFrame
Example:
>>> folder_path = 'path/to/json/folder'
>>> loaded_data = load_json_from_folder(folder_path)
>>> print(loaded_data.head())
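The merging step can be sketched with the standard library alone. This illustration collects the records into a plain list rather than a dataframe, and assumes each JSON file holds either a list of ad records or a single record; load_json_records is a hypothetical name, not the library function.

```python
import json
import tempfile
from pathlib import Path


def load_json_records(folder_path):
    """Collect records from every *.json file in folder_path into one list."""
    records = []
    for json_file in sorted(Path(folder_path).glob("*.json")):
        with open(json_file, encoding="utf-8") as fh:
            content = json.load(fh)
        # Assumption: a file holds either a list of records or a single record.
        records.extend(content if isinstance(content, list) else [content])
    return records


# Demo with two throwaway files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "1.json").write_text(json.dumps([{"id": "1"}, {"id": "2"}]), encoding="utf-8")
    Path(tmp, "2.json").write_text(json.dumps({"id": "3"}), encoding="utf-8")
    demo_records = load_json_records(tmp)

print(len(demo_records))  # 3
```

Passing the resulting list to pandas.DataFrame would give one row per record, which is the shape the documented return value describes.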
flatten_age_country_gender Function
- AdDownloader.helpers.flatten_age_country_gender(row, target_country)[source]
Flatten an entry row containing the age_country_gender_reach_breakdown by putting it into wide format for a given target country.
- Parameters:
- Returns:
A list with the processed age_gender_reach data.
- Return type:
list
Example:
>>> row_example = [{"country": "NL", "age_gender_breakdowns": [{"age_range": "18-24", "male": 100, "female": 50, "unknown": 10}, ...]}]
>>> target_country_example = "NL"
>>> flattened_data = flatten_age_country_gender(row_example, target_country_example)
>>> print(flattened_data)
[{'country': 'NL', 'age_range': '18-24', 'male': 100, 'female': 50, 'unknown': 10}, ...]
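The input and output shapes in the example suggest a filter-and-merge step, sketched below. This is an illustration inferred from the example data only; flatten_for_country is a hypothetical name and the library's actual logic may differ.

```python
def flatten_for_country(row, target_country):
    """Keep the breakdowns of target_country and tag each one with its country."""
    flattened = []
    for entry in row:
        if entry.get("country") != target_country:
            continue  # skip breakdowns that belong to other countries
        for breakdown in entry.get("age_gender_breakdowns", []):
            flattened.append({"country": entry["country"], **breakdown})
    return flattened


row_example = [
    {"country": "NL", "age_gender_breakdowns": [
        {"age_range": "18-24", "male": 100, "female": 50, "unknown": 10}]},
    {"country": "BE", "age_gender_breakdowns": [
        {"age_range": "18-24", "male": 7, "female": 3, "unknown": 0}]},
]
print(flatten_for_country(row_example, "NL"))
# [{'country': 'NL', 'age_range': '18-24', 'male': 100, 'female': 50, 'unknown': 10}]
```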
flatten_demographic_distribution Function
- AdDownloader.helpers.flatten_demographic_distribution(row)[source]
Flatten the demographic distribution data from a single row into a dictionary.
This function takes a single row of demographic distribution data, which is typically a list of dictionaries containing percentage, age, and gender information. It flattens this nested structure into a dictionary with keys formatted as “{gender}_{age}” and corresponding percentage values.
- Parameters:
row (list) – A row of demographic distribution data, typically a list of dictionaries.
- Returns:
A dictionary where keys are formatted as “{gender}_{age}” and values are the corresponding percentage values.
- Return type:
dict
Example:
>>> row_example = [{'percentage': '0.113043', 'age': '45-54', 'gender': 'male'}, {'percentage': '0.008696', 'age': '25-34', 'gender': 'female'}, ...]
>>> flattened_data = flatten_demographic_distribution(row_example)
>>> print(flattened_data)
{'male_45-54': 0.113043, 'female_25-34': 0.008696, ...}
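The key format described above can be reproduced with a dictionary comprehension. This is a minimal sketch based on the documented input and output, assuming percentages arrive as strings and are converted to floats; flatten_distribution is a hypothetical name.

```python
def flatten_distribution(row):
    """Map each entry to a '{gender}_{age}' key with its percentage as a float."""
    return {
        f"{item['gender']}_{item['age']}": float(item["percentage"])
        for item in row
    }


row_example = [
    {"percentage": "0.113043", "age": "45-54", "gender": "male"},
    {"percentage": "0.008696", "age": "25-34", "gender": "female"},
]
print(flatten_distribution(row_example))
# {'male_45-54': 0.113043, 'female_25-34': 0.008696}
```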
transform_data Function
- AdDownloader.helpers.transform_data(project_name, country, ad_type)[source]
Transform all the data from a given project with a target country by flattening its age_country_gender_reach_breakdown column. This function requires a folder ‘output/{project_name}/json’ containing the raw downloaded data in JSON format. The transformed data is saved inside ‘output/{project_name}/ads_data’, where original_data.xlsx is the original downloaded data and processed_data.xlsx contains the flattened age_country_gender_reach_breakdown columns.
- Parameters:
- Returns:
If ad_type is “All”, a dataframe with the processed age_country_gender_reach_breakdown data; otherwise, a dataframe with the processed demographic_distribution data.
- Return type:
pandas.DataFrame
Example:
>>> project_name_example = "example_project"
>>> country_example = "NL"
>>> transformed_data = transform_data(project_name_example, country_example, "ALL")
>>> print(transformed_data.head())
      id ad_delivery_start_time ad_delivery_stop_time  ...  unknown_45-54  unknown_55-64  unknown_65+
0  11111             2023-12-21            2023-12-21  ...            0.0            0.0          0.0
[1 rows x 33 columns]
configure_logging Function
- AdDownloader.helpers.configure_logging(project_name)[source]
Configures and returns a logger with a file handler set to write logs to a specified project’s log file. This function creates a log file named ‘logs.log’ within a directory named after the project_name under the ‘output’ directory. It checks if the logger already has handlers to prevent adding multiple handlers that do the same thing, ensuring that each message is logged only once.
- Parameters:
project_name (str) – The name of the project for which logging is being configured.
- Returns:
A configured logger object that logs messages to ‘output/<project_name>/logs.log’.
- Return type:
logging.Logger
close_logger Function
- AdDownloader.helpers.close_logger(logger)[source]
Closes all handlers of the specified logger to ensure proper release of file resources.
- Parameters:
logger (logging.Logger) – The logger instance whose handlers are to be closed.
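The handler check described for configure_logging and the cleanup performed by close_logger can be sketched together with the standard logging module. The names make_project_logger and close_handlers, the base_dir parameter, the log level, and the log format are assumptions for illustration, not the library's code.

```python
import logging
import os
import tempfile


def make_project_logger(project_name, base_dir="output"):
    """Return a logger writing to <base_dir>/<project_name>/logs.log."""
    log_dir = os.path.join(base_dir, project_name)
    os.makedirs(log_dir, exist_ok=True)
    logger = logging.getLogger(project_name)
    logger.setLevel(logging.INFO)
    # Attach a file handler only once, so repeated calls do not duplicate messages.
    if not logger.handlers:
        handler = logging.FileHandler(os.path.join(log_dir, "logs.log"), encoding="utf-8")
        handler.setFormatter(logging.Formatter("%(asctime)s - %(levelname)s - %(message)s"))
        logger.addHandler(handler)
    return logger


def close_handlers(logger):
    """Close and detach every handler so the log file is released."""
    for handler in list(logger.handlers):
        handler.close()
        logger.removeHandler(handler)


# Demo in a temporary folder: two calls still leave a single handler.
with tempfile.TemporaryDirectory() as tmp:
    first = make_project_logger("sketch_project", base_dir=tmp)
    second = make_project_logger("sketch_project", base_dir=tmp)
    handlers_after_two_calls = len(second.handlers)
    first.info("logging configured")
    close_handlers(first)
    handlers_after_close = len(first.handlers)

print(handlers_after_two_calls, handlers_after_close)  # 1 0
```

Closing the handlers before the folder is removed matters on platforms that lock open files.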
hide_access_token Function
- AdDownloader.helpers.hide_access_token(data)[source]
Remove the access token from the ad_snapshot_url column. It can be re-added by calling update_access_token().
- Parameters:
data (pandas.DataFrame) – A dataframe containing a column ad_snapshot_url.
- Returns:
A dataframe with the access token removed from the ad_snapshot_url column.
- Return type:
pandas.DataFrame
Example:
>>> data = pd.read_excel('path/to/your/data.xlsx')
>>> data = hide_access_token(data)
>>> data.to_excel('path/to/your/data.xlsx', index=False)
update_access_token Function
- AdDownloader.helpers.update_access_token(data, new_access_token=None)[source]
Update the ad_snapshot_url with a new access token given ad data.
- Parameters:
data (pandas.DataFrame) – A dataframe containing a column ad_snapshot_url.
new_access_token (str) – The new access token, optional. If none is given, the user is prompted to enter it.
- Returns:
A dataframe with an updated access token in the ad_snapshot_url column.
- Return type:
pandas.DataFrame
Example:
>>> data = pd.read_excel('path/to/your/data.xlsx')
>>> new_access_token = input("Provide an updated access token: ")
>>> data = update_access_token(data, new_access_token)
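At the level of a single URL, hiding and updating the token both come down to replacing the value of the access_token query parameter. The sketch below assumes the token appears as access_token=... in the URL; hide_token, set_token, the placeholder text, and the sample URL are hypothetical and chosen only for illustration.

```python
import re

# Matches the access_token parameter and its value up to the next '&'.
TOKEN_PATTERN = re.compile(r"(access_token=)[^&]*")


def hide_token(url, placeholder="{access_token}"):
    """Replace the token value with a placeholder (placeholder text is assumed)."""
    return TOKEN_PATTERN.sub(lambda m: m.group(1) + placeholder, url)


def set_token(url, new_token):
    """Replace the token value, or a placeholder left by hide_token, with new_token."""
    return TOKEN_PATTERN.sub(lambda m: m.group(1) + new_token, url)


url = "https://www.facebook.com/ads/archive/render_ad/?id=123&access_token=SECRET"
hidden = hide_token(url)
restored = set_token(hidden, "NEW_TOKEN")
print(hidden)    # ...render_ad/?id=123&access_token={access_token}
print(restored)  # ...render_ad/?id=123&access_token=NEW_TOKEN
```

On a dataframe, the same helpers could be applied column-wise, for example data["ad_snapshot_url"].apply(hide_token).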
get_long_lived_token Function
- AdDownloader.helpers.get_long_lived_token(access_token=None, app_id=None, app_secret=None, version='v20.0')[source]
Generate a Meta long-lived access token, which lasts around 60 days, from a valid short-lived access token. The long-lived access token and its expiration time are saved in a meta_long_lived_token.txt file. The app_id and app_secret can be found inside your app at https://developers.facebook.com/apps/.
- Parameters:
access_token (str) – A valid short-lived access token, optional. If none is given, the user is prompted to enter it.
app_id (str) – The ID of your Meta app, optional. If none is given, the user is prompted to enter it.
app_secret (str) – The secret of your Meta app, optional. If none is given, the user is prompted to enter it.
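Meta's Graph API documents a token exchange request using the fb_exchange_token grant type. The sketch below only builds the request URL and sends nothing; it assumes that version is the Graph API version, and build_exchange_url is a hypothetical helper rather than the library's implementation.

```python
from urllib.parse import urlencode


def build_exchange_url(access_token, app_id, app_secret, version="v20.0"):
    """Build the Graph API URL that exchanges a short-lived token for a long-lived one."""
    params = {
        "grant_type": "fb_exchange_token",
        "client_id": app_id,
        "client_secret": app_secret,
        "fb_exchange_token": access_token,
    }
    return f"https://graph.facebook.com/{version}/oauth/access_token?{urlencode(params)}"


# Placeholder credentials; real values come from your app's dashboard.
print(build_exchange_url("SHORT_LIVED_TOKEN", "APP_ID", "APP_SECRET"))
```

A GET request to this URL is expected to return JSON that includes the new access_token and an expires_in value in seconds.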
calculate_image_hash Function
- AdDownloader.helpers.calculate_image_hash(image_path)[source]
Calculate the MD5 hash of an image file. The MD5 hash is a 32-character hexadecimal string that acts as a fingerprint of the image’s pixel data, useful for verifying integrity and identifying duplicates.
- Parameters:
image_path (str) – The path to the image file.
- Returns:
The MD5 hash of the image.
- Return type:
str
Example:
>>> image_path = 'path-to-your-image'
>>> calculate_image_hash(image_path)
'108f46130f45639cf388892306235fd5'
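An MD5 digest can be computed with the standard hashlib module. Note the assumption in this sketch: it hashes the raw bytes of the file, whereas the description above refers to pixel data, so the library may decode the image first. The name md5_of_file is hypothetical.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=8192):
    """Return the MD5 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demo on a throwaway file; the bytes are not a real image.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.png")
    with open(sample, "wb") as fh:
        fh.write(b"not a real image, just demo bytes")
    demo_hash = md5_of_file(sample)

print(len(demo_hash))  # 32
```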
deduplicate_images Function
- AdDownloader.helpers.deduplicate_images(image_folder, unique_img_folder)[source]
Deduplicate images in a folder and save unique images to a specified folder.
This function scans a folder for PNG images, calculates the MD5 hash of each image, identifies duplicates, and saves only the unique images to a separate folder.
- Parameters:
Example:
>>> image_folder = 'output/<project_name>/ads_images'
>>> unique_img_folder = 'output/<project_name>/unique_images'
>>> deduplicate_images(image_folder, unique_img_folder)
Found 57 duplicates and saved 143 unique images inside output/<project_name>/unique_images.
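The scan, hash, and copy steps described above can be sketched as follows. This illustration hashes raw file bytes and keeps the first file seen for each hash; copy_unique_pngs is a hypothetical name and the library's own implementation may differ.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def copy_unique_pngs(image_folder, unique_img_folder):
    """Copy one PNG per distinct MD5 hash; return (duplicates, unique) counts."""
    unique_dir = Path(unique_img_folder)
    unique_dir.mkdir(parents=True, exist_ok=True)
    seen = set()
    duplicates = 0
    for image in sorted(Path(image_folder).glob("*.png")):
        digest = hashlib.md5(image.read_bytes()).hexdigest()
        if digest in seen:
            duplicates += 1  # same content already copied
            continue
        seen.add(digest)
        shutil.copy2(image, unique_dir / image.name)
    return duplicates, len(seen)


# Demo: two files share the same bytes, so one of them is skipped.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "ads_images")
    src.mkdir()
    Path(src, "a.png").write_bytes(b"same")
    Path(src, "b.png").write_bytes(b"same")
    Path(src, "c.png").write_bytes(b"different")
    demo_counts = copy_unique_pngs(src, Path(tmp, "unique_images"))
    saved = sorted(p.name for p in Path(tmp, "unique_images").iterdir())

print(demo_counts, saved)  # (1, 2) ['a.png', 'c.png']
```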