Analysis Module
===============

.. module:: AdDownloader.analysis
   :synopsis: Provides different analysis functions for the AdDownloader.

This module provides analysis functions for the AdLibAPI object: text analysis, image analysis, and visualizations.


load_data Function
------------------

.. autofunction:: load_data

Example::

    >>> from AdDownloader.analysis import *
    >>> data_path = "output//ads_data/_processed_data.xlsx"
    >>> data = load_data(data_path)


preprocess Function
-------------------

.. autofunction:: preprocess

Example::

    >>> tokens = data["ad_creative_bodies"].apply(preprocess)
    >>> tokens.head(3)
    0    person earli vote open soon georgia wait take ...
    1    2020 help turn year around find vote earli person
    2    person earli vote open soon georgia wait take ...


get_word_freq Function
----------------------

.. autofunction:: get_word_freq

Example::

    >>> freq_dist = get_word_freq(tokens)
    >>> print(f"Most common 3 keywords: {freq_dist[0:3]}")
    Most common 3 keywords: [('vote', 3273), ('elect', 1155), ('earli', 1125)]


get_sentiment Function
----------------------

.. autofunction:: get_sentiment

Example::

    >>> textblb_sent, nltk_sent = get_sentiment(data["ad_creative_bodies"])
    >>> nltk_sent.head(3)
    0    {'neg': 0.0, 'neu': 0.859, 'pos': 0.141, 'comp...
    1    {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound...
    2    {'neg': 0.098, 'neu': 0.644, 'pos': 0.258, 'co...
    >>> textblb_sent.head(3)
    0    0.125000
    1    0.112500
    2    0.142857


get_topics Function
-------------------

.. autofunction:: get_topics

Example::

    >>> lda_model, topics, coherence_lda, perplexity, log_likelihood, avg_similarity, topics_df = get_topics(tokens, nr_topics=5)
    Number of unique tokens: 435
    Number of documents: 2000
    Finished topic modeling for 5 topics.
    Coherence: 0.71; Perplexity: 51.78; Log-Likelihood: -104762.56; Similarity: 0.07
    Topic 0: ['vote', 'elect', 'paramount', 'give', 'need', 'win', 'novemb', '3rd']
    Topic 1: ['vote', 'earli', 'find', 'year', 'person', 'click', 'easi', 'wait']
    ...
    >>> topics_df.head(3)
       dom_topic  perc_contr                                     topic_keywords
    0          1      0.6444  vote, earli, find, year, person, click, easi, ...
    1          1      0.6567  vote, earli, find, year, person, click, easi, ...
    2          4      0.9138  ballot, return, vote, today, click, home, demo...


get_topic_per_caption Function
------------------------------

.. autofunction:: get_topic_per_caption

Example::

    >>> from sklearn.feature_extraction.text import CountVectorizer
    >>> from sklearn.decomposition import LatentDirichletAllocation
    >>> # `stop_words` is assumed to be defined already; `tokens` are the preprocessed captions
    >>> vectorizer = CountVectorizer(stop_words=stop_words, max_features=1000, min_df=5, max_df=0.95)
    >>> vect_text = vectorizer.fit_transform(tokens)
    >>> tf_feature_names = vectorizer.get_feature_names_out()
    >>> lda_model = LatentDirichletAllocation(n_components=5, learning_method='online', random_state=0, max_iter=10, learning_decay=0.7, learning_offset=10).fit(vect_text)
    >>> topics_df = get_topic_per_caption(lda_model, vect_text, tf_feature_names)
    >>> topics_df.head(3)
       dom_topic  perc_contr                                     topic_keywords
    0          1      0.6444  vote, earli, find, year, person, click, easi, ...
    1          1      0.6567  vote, earli, find, year, person, click, easi, ...
    2          4      0.9138  ballot, return, vote, today, click, home, demo...


start_text_analysis Function
----------------------------

.. autofunction:: start_text_analysis

Example::

    >>> # without topic modeling
    >>> tokens, freq_dist, textblb_sent, nltk_sent = start_text_analysis(data)
    >>> # with topic modeling (enabled through the function's arguments; see the signature above)
    >>> tokens, freq_dist, textblb_sent, nltk_sent, lda_model, topics, coherence_lda, perplexity, log_likelihood, avg_similarity, topics_df = start_text_analysis(data)
    >>> # for the output, see the examples above


transform_data_by_age Function
------------------------------

.. autofunction:: transform_data_by_age

Example::

    >>> import pandas as pd
    >>> data_path = "output//ads_data/_processed_data.xlsx"
    >>> data = pd.read_excel(data_path)
    >>> data_by_age = transform_data_by_age(data)
    >>> data_by_age.head(3)
       Reach Age Range
    0    7.0     18-24
    1    0.0     18-24
    3   23.0       65+


transform_data_by_gender Function
---------------------------------

.. autofunction:: transform_data_by_gender

Example::

    >>> # assuming data was already loaded
    >>> data_by_gender = transform_data_by_gender(data)
    >>> data_by_gender.head(3)
       Reach  Gender
    0    NaN  female
    1   68.0  female
    2  243.0    male


get_graphs Function
-------------------

.. autofunction:: get_graphs

Example::

    >>> fig1, fig2, fig3, fig4, fig5, fig6, fig7, fig8, fig9, fig10 = get_graphs(data)
    >>> fig1.show()  # opens the graph in a web page, from which it can also be saved locally


show_topics_top_pages Function
------------------------------

.. autofunction:: show_topics_top_pages

Example::

    >>> # using the output from `get_topics(tokens)`
    >>> fig = show_topics_top_pages(topics_df, data)
    >>> fig.show()


blip_call Function
------------------

.. autofunction:: blip_call

Example::

    >>> images_path = "output//ads_images"
    >>> # image captioning
    >>> img_caption = blip_call(images_path, nr_images=20)
    >>> img_caption.head(3)
                 ad_id                                        img_caption
    0  689539479274809            a group of people eating pizza together
    1  352527490742823  a couple of people sitting at a table eating p...
    2  891711935895560              a man and woman eating pizza together
    >>> # visual question answering
    >>> img_content = blip_call(images_path, task="visual_question_answering", nr_images=20, questions="Are there people in this ad?")
    >>> img_content.head(5)
                  ad_id Are there people in this ad?
    0   723805182773873                          yes
    1   871823271403675                           no
    2  6398713840181656                          yes


extract_dominant_colors Function
--------------------------------

.. autofunction:: extract_dominant_colors

Example::

    >>> import os
    >>> # `images_path` as defined in the blip_call example
    >>> image_files = [f for f in os.listdir(images_path) if f.endswith(('jpg', 'png', 'jpeg'))]
    >>> dominant_colors, percentages = extract_dominant_colors(os.path.join(images_path, image_files[2]))
    >>> for col, percentage in zip(dominant_colors, percentages):
    ...     print(f"Color: {col}, Percentage: {percentage:.2f}%")
    ...
    Color: #3a2f28, Percentage: 41.99%
    Color: #dfcbac, Percentage: 32.76%
    Color: #817875, Percentage: 25.24%


assess_image_quality Function
-----------------------------

.. autofunction:: assess_image_quality

Example::

    >>> resolution, brightness, contrast, sharpness = assess_image_quality(os.path.join(images_path, image_files[2]))
    >>> print(f"Resolution: {resolution} pixels, Brightness: {brightness}, Contrast: {contrast}, Sharpness: {sharpness}")
    Resolution: 188400 pixels, Brightness: 142.5308, Contrast: 71.3726, Sharpness: 3691.4007


analyse_image Function
----------------------

.. autofunction:: analyse_image

Example::

    >>> analysis_result = analyse_image(os.path.join(images_path, image_files[2]))
    >>> print(analysis_result)
    {'ad_id': '1043287820216670', 'resolution': 188400, 'brightness': 142.53080148619958, 'contrast': 71.3726801705792, 'sharpness': 3691.40007606529, 'ncorners': 17, 'dom_color_1': '#817875', 'dom_color_1_prop': 41.943359375, 'dom_color_2': '#dfcbab', 'dom_color_2_prop': 32.8369140625, 'dom_color_3': '#3a2f28', 'dom_color_3_prop': 25.2197265625}


analyse_image_folder Function
-----------------------------

.. autofunction:: analyse_image_folder

Example::

    >>> df = analyse_image_folder(images_path, nr_images=20)
    >>> df.head(3)
                  ad_id  resolution  brightness   contrast    sharpness  ncorners dom_color_1  dom_color_1_prop dom_color_2  dom_color_2_prop dom_color_3  dom_color_3_prop
    0  1039719343827470      187800  172.399936  60.601719  1585.668739        21     #ced2ce         55.395508     #a48b7d         28.369141     #464347         16.235352
    1  1043131113478341      187800  108.217066  73.420019   903.498253        18     #1b1c17         45.996094     #96603d         33.593750     #dcbea0         20.410156
    2  1043287820216670      188400  142.530801  71.372680  3691.400076        17     #3a2f28         41.992188     #817875         32.763672     #dfcbac         25.244141
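
Each row returned by ``analyse_image_folder``, like the dictionary returned by ``analyse_image``, stores the dominant colors as hex strings in the ``dom_color_N`` and ``dom_color_N_prop`` fields. The sketch below shows one possible way to turn such a record into RGB tuples ordered by proportion, for example before plotting a palette. The helpers ``hex_to_rgb`` and ``dominant_colors_from_result`` are hypothetical and not part of AdDownloader; the input values are copied from the ``analyse_image`` example output, and the sketch assumes three dominant colors per record.

```python
def hex_to_rgb(hex_color):
    """Convert a '#rrggbb' string to an (r, g, b) tuple of ints."""
    value = hex_color.lstrip("#")
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))


def dominant_colors_from_result(result, n_colors=3):
    """Collect (hex, rgb, proportion) triples, largest proportion first.

    Hypothetical helper: assumes the record has dom_color_1..n_colors and
    matching dom_color_N_prop keys, as in the example output.
    """
    colors = []
    for i in range(1, n_colors + 1):
        hex_color = result[f"dom_color_{i}"]
        proportion = result[f"dom_color_{i}_prop"]
        colors.append((hex_color, hex_to_rgb(hex_color), proportion))
    return sorted(colors, key=lambda c: c[2], reverse=True)


# Values copied from the analyse_image example output (other keys omitted).
analysis_result = {
    "ad_id": "1043287820216670",
    "dom_color_1": "#817875", "dom_color_1_prop": 41.943359375,
    "dom_color_2": "#dfcbab", "dom_color_2_prop": 32.8369140625,
    "dom_color_3": "#3a2f28", "dom_color_3_prop": 25.2197265625,
}

palette = dominant_colors_from_result(analysis_result)
for hex_color, rgb, proportion in palette:
    print(f"{hex_color} -> RGB {rgb}: {proportion:.2f}%")
```

The same helper can be applied to a single row of the ``analyse_image_folder`` output after converting that row to a dictionary, since both share the same field names.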