@sourcery-ai sourcery-ai bot commented Sep 16, 2022

Branch main refactored by Sourcery.

If you're happy with these changes, merge this Pull Request using the Squash and merge strategy.

See our documentation here.

Run Sourcery locally

Reduce the feedback loop during development by using the Sourcery editor plugin:

Review changes via command line

To manually merge these changes, make sure you're on the main branch, then run:

git fetch origin sourcery/main
git merge --ff-only FETCH_HEAD
git reset HEAD^

Help us improve this pull request!

@sourcery-ai sourcery-ai bot requested a review from MathMachado September 16, 2022 22:00
from pyspark.sql import functions as F
from faker import Faker
from collections import OrderedDict
from collections import OrderedDict

Lines 22-65 refactored with the following changes:

except:
    print('File already exists')

Found the following improvement in Lines 47-59:

except:
    print('File already exists')

Found the following improvement in Lines 50-62:

Comment on lines -67 to +73

print('---------------------------------------------------')
print('Processing Record Number: ', rec_cnt)

# Define the full API call for current record in the DataFrame
full_url = url_part1 + str(row['lat']) + "&lon=" + str(row['lon']) + url_part2 + api_key

Lines 67-89 refactored with the following changes:

Comment on lines +13 to +17
+    elif profile == 'default':
+        config = ProfileConfigProvider().get_config()
     else:
-        if profile == 'default':
-            config = ProfileConfigProvider().get_config()
-        else:
-            config = ProfileConfigProvider(profile).get_config()
-
+        config = ProfileConfigProvider(profile).get_config()

Function create_api_client refactored with the following changes:
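
The diff above appears to merge a nested `else: if` into a single `elif`. A minimal sketch of why the two shapes behave the same, using hypothetical stand-ins: the function names, the `profile is None` first branch, and the string return values are all assumptions, since the first branch of `create_api_client` is not shown and the real code calls `ProfileConfigProvider`.

```python
# Hypothetical stand-ins for illustration only; the real function calls
# ProfileConfigProvider and its first branch is not visible in the diff.
def pick_config_nested(profile):
    if profile is None:  # assumed first branch
        config = "first-branch"
    else:
        if profile == 'default':
            config = "default-config"
        else:
            config = f"config-for-{profile}"
    return config


def pick_config_flat(profile):
    if profile is None:  # assumed first branch
        config = "first-branch"
    elif profile == 'default':
        config = "default-config"
    else:
        config = f"config-for-{profile}"
    return config


# Both shapes pick the same branch for every input tried here.
for p in (None, 'default', 'staging'):
    assert pick_config_nested(p) == pick_config_flat(p)
```

The flat version removes one level of indentation without changing which branch runs.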

-  return (
-    train_data
-  )
+  return features.filter(features.issue_year <= 2015)

Function train_data refactored with the following changes:
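
The change shown above looks like an "inline a variable that is immediately returned" refactoring. A rough sketch of the before and after shapes follows. It uses a plain list of dicts in place of the Spark DataFrame so that it runs without a cluster; that substitution and the `_before`/`_after` function names are assumptions for illustration.

```python
# Plain-Python stand-in: rows are dicts with an 'issue_year' key
# (assumption; the real code filters a Spark DataFrame).
def train_data_before(features):
    train_data = [row for row in features if row['issue_year'] <= 2015]
    return (
        train_data
    )


def train_data_after(features):
    # Same expression, returned directly without the temporary name.
    return [row for row in features if row['issue_year'] <= 2015]


rows = [{'issue_year': 2014}, {'issue_year': 2015}, {'issue_year': 2016}]
assert train_data_before(rows) == train_data_after(rows)
```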

Comment on lines -136 to +132
-  valid_data = features.filter(features.issue_year > 2015)
-
-  return (
-    valid_data
-  )
+  return features.filter(features.issue_year > 2015)

Function valid_data refactored with the following changes:

Comment on lines -69 to +81
-  df = dlt.read_stream("sales_orders_cleaned").where("city == 'Los Angeles'")
+  df = dlt.read_stream("sales_orders_cleaned").where("city == 'Los Angeles'")
   df = df.select(df.city, df.order_date, df.customer_id, df.customer_name, explode(df.ordered_products).alias("ordered_products_explode"))

-  dfAgg = df.groupBy(df.order_date, df.city, df.customer_id, df.customer_name, df.ordered_products_explode.curr.alias("currency"))\
-    .agg(sum(df.ordered_products_explode.price).alias("sales"), sum(df.ordered_products_explode.qty).alias("qantity"))
-
-  return dfAgg
+  return df.groupBy(
+      df.order_date,
+      df.city,
+      df.customer_id,
+      df.customer_name,
+      df.ordered_products_explode.curr.alias("currency"),
+  ).agg(
+      sum(df.ordered_products_explode.price).alias("sales"),
+      sum(df.ordered_products_explode.qty).alias("qantity"),
+  )

Function sales_order_in_la refactored with the following changes:

Comment on lines -88 to +106
-  df = dlt.read_stream("sales_orders_cleaned").where("city == 'Chicago'")
+  df = dlt.read_stream("sales_orders_cleaned").where("city == 'Chicago'")
   df = df.select(df.city, df.order_date, df.customer_id, df.customer_name, explode(df.ordered_products).alias("ordered_products_explode"))

-  dfAgg = df.groupBy(df.order_date, df.city, df.customer_id, df.customer_name, df.ordered_products_explode.curr.alias("currency"))\
-    .agg(sum(df.ordered_products_explode.price).alias("sales"), sum(df.ordered_products_explode.qty).alias("qantity"))
-
-  return dfAgg
+  return df.groupBy(
+      df.order_date,
+      df.city,
+      df.customer_id,
+      df.customer_name,
+      df.ordered_products_explode.curr.alias("currency"),
+  ).agg(
+      sum(df.ordered_products_explode.price).alias("sales"),
+      sum(df.ordered_products_explode.qty).alias("qantity"),
+  )

Function sales_order_in_chicago refactored with the following changes:

Comment on lines -80 to +85
-    fname = self.filename + '/tweets_' + str(file_timestamp) + '.json'
+    fname = f'{self.filename}/tweets_{str(file_timestamp)}.json'


-    f = open(fname, 'w')
-    for tweet in self.tweet_stack:
-        f.write(jsonpickle.encode(tweet._json, unpicklable=False) + '\n')
-    f.close()
+    with open(fname, 'w') as f:
+        for tweet in self.tweet_stack:
+            f.write(jsonpickle.encode(tweet._json, unpicklable=False) + '\n')

Function TweetStream.write_file refactored with the following changes:
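
The diff above seems to combine two changes: building the file name with an f-string, and opening the file in a `with` block so it is closed even if a write raises. A self-contained sketch of that pattern follows. It uses the standard `json` module instead of `jsonpickle`, and the `write_lines` helper, its parameters, and the sample records are hypothetical.

```python
import json
import tempfile


def write_lines(directory, file_timestamp, records):
    # f-string in place of '+' concatenation; str() is not needed inside
    # the braces because f-strings format the value themselves.
    fname = f'{directory}/tweets_{file_timestamp}.json'
    # The context manager closes the file on normal exit and on exceptions.
    with open(fname, 'w') as f:
        for record in records:
            f.write(json.dumps(record) + '\n')
    return fname


with tempfile.TemporaryDirectory() as tmp:
    path = write_lines(tmp, 1663365600, [{'id': 1}, {'id': 2}])
    with open(path) as f:
        lines = f.read().splitlines()
```

With the explicit `open`/`close` pair, an exception inside the loop would skip `f.close()`; the `with` form avoids that without needing a `try`/`finally`.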

sourcery-ai bot commented Sep 16, 2022

Sourcery Code Quality Report

❌  Merging this PR will decrease code quality in the affected files by 0.01%.

| Quality metrics | Before | After | Change |
| --- | --- | --- | --- |
| Complexity | 2.92 ⭐ | 2.78 ⭐ | -0.14 👍 |
| Method Length | 141.92 😞 | 141.00 😞 | -0.92 👍 |
| Working memory | 10.36 😞 | 10.37 😞 | 0.01 👎 |
| Quality | 61.07% 🙂 | 61.06% 🙂 | -0.01% 👎 |

| Other metrics | Before | After | Change |
| --- | --- | --- | --- |
| Lines | 1024 | 1038 | 14 |

| Changed files | Quality Before | Quality After | Quality Change |
| --- | --- | --- | --- |
| change-data-capture-example/notebooks/1-CDC_DataGenerator.py | 44.89% 😞 | 44.84% 😞 | -0.05% 👎 |
| divvy-bike-demo/python-divvybike-api-ingest-stationinformation.py | 65.74% 🙂 | 65.74% 🙂 | 0.00% |
| divvy-bike-demo/python-divvybike-api-ingest-stationstatus.py | 66.61% 🙂 | 66.61% 🙂 | 0.00% |
| divvy-bike-demo/python-weatherinfo-api-ingest.py | 50.92% 🙂 | 50.92% 🙂 | 0.00% |
| dms-dlt-cdc-demo/resources/utils/dlt_runner.py | 74.18% 🙂 | 74.67% 🙂 | 0.49% 👍 |
| financial-services-examples/Personalization/00 - Customer Transaction & Behavioral Data Producer.py | 27.16% 😞 | 27.16% 😞 | 0.00% |
| ml models/loan risk ml model.py | 57.79% 🙂 | 57.62% 🙂 | -0.17% 👎 |
| python/DLT Event Log Queries.py | 64.64% 🙂 | 66.11% 🙂 | 1.47% 👍 |
| python/Loan Risk.py | 81.70% ⭐ | 81.93% ⭐ | 0.23% 👍 |
| python/Retail Sales.py | 84.15% ⭐ | 84.60% ⭐ | 0.45% 👍 |
| twitter-dlt-huggingface-demo/Twitter-Stream-S3.py | 81.75% ⭐ | 82.03% ⭐ | 0.28% 👍 |

Here are some functions in these files that still need a tune-up:

| File | Function | Complexity | Length | Working Memory | Quality | Recommendation |
| --- | --- | --- | --- | --- | --- | --- |
| dms-dlt-cdc-demo/resources/utils/dlt_runner.py | update_and_monitor | 5 ⭐ | 122 😞 | 8 🙂 | 65.44% 🙂 | Try splitting into smaller methods |
| python/Loan Risk.py | lendingclub_clean | 0 ⭐ | 219 ⛔ | 5 ⭐ | 68.72% 🙂 | Try splitting into smaller methods |

Legend and Explanation

The emojis denote the absolute quality of the code:

  • ⭐ excellent
  • 🙂 good
  • 😞 poor
  • ⛔ very poor

The 👍 and 👎 indicate whether the quality has improved or gotten worse with this pull request.


Please see our documentation here for details on how these metrics are calculated.

We are actively working on this report - lots more documentation and extra metrics to come!

Help us improve this quality report!
