Posts

Showing posts from January 2024

Navigating the ETL Process: A Case Study with Movie Metadata

In today's entry, I walk through the ETL process (Extract, Transform, Load): extracting data from various sources, transforming it (cleaning, aggregating, joining, and so on), and finally loading it into a target system. Let's begin with data extraction. For this example, I used a dataset from https://www.kaggle.com/datasets/tmdb/tmdb-movie-metadata. Both files in the dataset have an 'id' column, and when I checked, the row counts matched exactly, which gives a consistent base for merging. Next, in the transformation phase, I combined the two files, keeping the more extensive dataset as the base. I then removed the redundant 'movie_id' and 'title' columns and renamed 'title_x' to 'title' for clarity. Longer rows are not displayed in full, which makes analysis a little difficult, so to see the whole row we will adjust the display. OK, so let's check what the genres column looks like. Well, it is not very readab...
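The merge-and-clean step described in this excerpt can be sketched roughly as follows. This is only an illustration under stated assumptions: the two small DataFrames are made-up stand-ins for the Kaggle files, the second file's key is assumed to be called `movie_id`, the duplicate title is assumed to surface as `title_y` after the merge, and `display.max_columns` is just one common way to show wide rows in full, not necessarily the setting used in the post.

```python
import pandas as pd

# Hypothetical stand-ins for the two Kaggle CSV files. The real files would be
# loaded with pd.read_csv; the column names here are assumptions.
movies = pd.DataFrame({
    "id": [1, 2, 3],
    "title": ["Alpha", "Beta", "Gamma"],
    "budget": [100, 200, 300],
})
credits = pd.DataFrame({
    "movie_id": [1, 2, 3],
    "title": ["Alpha", "Beta", "Gamma"],
    "cast": ["[]", "[]", "[]"],
})

# Extract check: both files should describe the same number of movies.
assert len(movies) == len(credits)

# Transform: merge on the movie id, keeping the more extensive table as the base.
merged = movies.merge(credits, left_on="id", right_on="movie_id", how="left")

# Drop the redundant key and the duplicate title, then restore the plain name.
merged = merged.drop(columns=["movie_id", "title_y"])
merged = merged.rename(columns={"title_x": "title"})

# Show every column instead of a shortened view when printing wide rows.
pd.set_option("display.max_columns", None)
print(merged.head())
```

Because both inputs carry a `title` column, pandas adds the `_x` and `_y` suffixes automatically, which is why a rename back to `title` is needed after the merge.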

Utilizing Linear Regression in Python for Effective Gold Trading Strategy

Today, I'd like to share a simple trading idea that uses Python and linear regression. Let's start by importing the necessary libraries: yfinance (yf), used for fetching financial data, in this case gold prices; pandas (pd), used for data manipulation and storing data in DataFrames; numpy (np), used for numerical calculations; LinearRegression from sklearn, used to create a linear regression model; and mplfinance (mpf), used for creating financial charts, including candlestick charts. I define the symbol for gold as 'GC=F'. The data for gold is fetched from January 1, 2022, to January 11, 2024, at a daily interval. The calculate_new_regression function takes a DataFrame and a starting index. It creates a new linear regression model using data from the specified index to the end of the DataFrame and returns the predicted trend values, which are used as a new trend line. I call the calculate_new_regression function for the entire dataset, st...
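A function with the behaviour described for calculate_new_regression could look something like the sketch below. It is not the post's actual code: it swaps in `numpy.polyfit` with degree 1 (an ordinary least-squares line, which is what a single-feature linear regression fits) in place of sklearn's LinearRegression, the `column` parameter is a hypothetical addition, and the prices are synthetic rather than downloaded with yfinance.

```python
import numpy as np
import pandas as pd


def calculate_new_regression(df, start_index, column="Close"):
    """Fit a straight line to df[column] from start_index to the end.

    Returns the fitted (trend) values as a Series aligned with that slice.
    The `column` argument is an assumption made for this sketch.
    """
    segment = df[column].iloc[start_index:]
    x = np.arange(len(segment))  # 0, 1, 2, ... as the time axis
    slope, intercept = np.polyfit(x, segment.to_numpy(dtype=float), deg=1)
    return pd.Series(slope * x + intercept, index=segment.index, name="trend")


# Synthetic, made-up daily prices standing in for the downloaded gold data.
dates = pd.date_range("2022-01-03", periods=10, freq="B")
prices = pd.DataFrame({"Close": 1800.0 + 2.5 * np.arange(10)}, index=dates)

# Trend line over the entire dataset, starting from the first row.
trend = calculate_new_regression(prices, start_index=0)

# Trend line recomputed from a later starting index.
later_trend = calculate_new_regression(prices, start_index=4)
print(trend.head())
```

Passing a later `start_index` refits the line on the remaining rows only, which is how a fresh trend line can be drawn from any chosen point onward.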

Exploring Pearson Correlation in the World of Finance: A Deeper Understanding

Hello there! As I embark on this blogging journey, I've decided to go with the flow and see where it takes me. Today, it struck me that a great idea for my first post could be correlation: it's relatively simple, frequently used, and, I believe, often overinterpreted. Let's start from scratch, without getting bogged down in details or formulas. We'll begin with some code. Let's examine the correlations of a few popular financial instruments. I selected several instruments and downloaded their data from the past three years using yfinance. After transforming this data into a DataFrame (DF), everything seems set for correlation analysis. But have we chosen the right data to start these calculations? At first, it seems obvious: we're looking at closing prices. Let's generate some results. Code in which we take the data, create the DataFrame, and present the correlation matrix. Results from the presentation of the first five rows of our DataFrame and correl...
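As a rough illustration of the step described here (build a DataFrame of closing prices, look at the first five rows, then compute the correlation matrix), the sketch below uses a few made-up price series. The instrument names "A", "B" and "C" and all values are hypothetical; the real post downloads its data with yfinance, which is not called here.

```python
import pandas as pd

# Made-up closing prices for three hypothetical instruments.
# "B" moves exactly in step with "A"; "C" moves exactly against it.
closes = pd.DataFrame({
    "A": [100.0, 101.0, 103.0, 102.0, 105.0],
    "B": [50.0, 50.5, 51.5, 51.0, 52.5],
    "C": [30.0, 29.0, 27.0, 28.0, 25.0],
})

# First five rows of the DataFrame.
print(closes.head())

# Pairwise Pearson correlation between the columns.
corr_matrix = closes.corr(method="pearson")
print(corr_matrix.round(2))
```

`DataFrame.corr` returns a symmetric matrix with ones on the diagonal, so only the off-diagonal entries carry information about how two instruments move together.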

Join Me on a Journey to Master Python and Machine Learning for Data Analysis and Trading

Welcome to my blog! They say the best way to learn is to teach, and that's exactly the philosophy I'm embracing as I embark on an exciting journey to master Python and machine learning for data analysis and trading. And guess what? You're invited to join me! Why Python and machine learning, you ask? Well, in today's data-driven world, these skills are not just valuable; they're essential. Python's simplicity and versatility, combined with the powerful insights that machine learning can provide, make for an unbeatable combination in the realms of data analysis and financial trading. But this isn't just about me. It's about us, learning and growing together. Whether you're a beginner curious about Python and machine learning, or you're further along in your journey, there's something here for everyone. I'll be diving into topics ranging from basic Python programming to advanced machine learning algorithms, all with a foc...