<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>csv Archives - Creatronix</title>
	<atom:link href="https://creatronix.de/tag/csv/feed/" rel="self" type="application/rss+xml" />
	<link>https://creatronix.de/tag/csv/</link>
	<description>My adventures in code &#38; business</description>
	<lastBuildDate>Mon, 06 Jan 2025 09:38:38 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>
	<item>
		<title>Pandas Cheat Sheet</title>
		<link>https://creatronix.de/pandas-cheat-sheet/</link>
		
		<dc:creator><![CDATA[Jörn]]></dc:creator>
		<pubDate>Fri, 05 Mar 2021 13:17:19 +0000</pubDate>
				<category><![CDATA[Data Science & SQL]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[cheat sheet]]></category>
		<category><![CDATA[csv]]></category>
		<category><![CDATA[cumsum]]></category>
		<category><![CDATA[cumulative sum]]></category>
		<category><![CDATA[datetime]]></category>
		<category><![CDATA[delimiter]]></category>
		<category><![CDATA[dropping columns]]></category>
		<category><![CDATA[encoding]]></category>
		<category><![CDATA[excel]]></category>
		<category><![CDATA[head tail]]></category>
		<category><![CDATA[header]]></category>
		<category><![CDATA[iloc]]></category>
		<category><![CDATA[indexing]]></category>
		<category><![CDATA[loc]]></category>
		<category><![CDATA[pandas]]></category>
		<category><![CDATA[renaming columns]]></category>
		<category><![CDATA[rolling sum]]></category>
		<category><![CDATA[shifting]]></category>
		<category><![CDATA[snippets]]></category>
		<category><![CDATA[splitting strings]]></category>
		<category><![CDATA[value_counts]]></category>
		<guid isPermaLink="false">http://creatronix.de/?p=3112</guid>

					<description><![CDATA[<p>If you are new to pandas, feel free to read Introduction to Pandas first. I&#8217;ve assembled some pandas code snippets. Reading data, reading CSV: import pandas as pd; df = pd.read_csv("path_to_file"). The input can also be a text file, since the file suffix is ignored. The default delimiter for comma-separated value files is the comma. If you&#8230;</p>
<p>The post <a href="https://creatronix.de/pandas-cheat-sheet/">Pandas Cheat Sheet</a> appeared first on <a href="https://creatronix.de">Creatronix</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>If you are new to pandas, feel free to read <a href="https://creatronix.de/introduction-to-pandas/">Introduction to Pandas</a> first.</p>
<p>I&#8217;ve assembled some pandas code snippets.</p>
<h2>Reading Data</h2>
<h3>Reading CSV</h3>
<pre>import pandas as pd

# read from csv
df = pd.read_csv("path_to_file")</pre>
<p>The input can also be a plain text file; the file suffix is ignored.</p>
<p>The default delimiter for comma-separated value files is the comma. If your data uses another delimiter, you can specify it via:</p>
<pre>df = pd.read_csv("path_to_file", delimiter=";")</pre>
<p>If your data has no header, you can pass header=None into the function:</p>
<pre>df = pd.read_csv("./aoc_day_01_data.txt", header=None)</pre>
<p>With skiprows you can skip a number of lines at the start of the file:</p>
<pre>df = pd.read_csv("path_to_file", skiprows=8)</pre>
<p>Sometimes you need to set the encoding as well:</p>
<pre>df = pd.read_csv("path_to_file", encoding="cp1252")</pre>
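<p>The options above can be combined in one call. The following is a minimal sketch, not taken from a real data set: the file content, the two junk lines and the column values are made up, and an in-memory buffer stands in for a file on disk.</p>

```python
import io

import pandas as pd

# Hypothetical export: two junk lines, then semicolon-separated rows
# without a header, encoded as cp1252.
raw = (
    "exported by some tool\n"
    "created on 2021-03-05\n"
    "1;2.5;Käse\n"
    "3;4.5;Brot\n"
).encode("cp1252")

df = pd.read_csv(
    io.BytesIO(raw),    # stands in for "path_to_file"
    delimiter=";",      # values are separated by semicolons
    header=None,        # first data row is not a header
    skiprows=2,         # skip the two junk lines
    encoding="cp1252",  # needed to decode the umlaut correctly
)
print(df)
```

<p>Without a header, pandas labels the columns 0, 1, 2.</p>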
<h3>Reading Excel</h3>
<p>You can read Excel files as well, but you need to install openpyxl first:</p>
<pre>pip install openpyxl</pre>
<pre>df = pd.read_excel("./my_excel_sheet.xlsx")</pre>
<p>With sheet_name you can select an individual sheet:</p>
<pre>df = pd.read_excel("./my_excel_sheet.xlsx", sheet_name="my_sheet_1")</pre>
<h2>Inspecting data</h2>
<h3>Basic information</h3>
<pre>df.describe()</pre>
<h3>Length</h3>
<pre>len(df)</pre>
<h3>Showing entries</h3>
<pre>df.head()</pre>
<pre>df.tail(10)</pre>
<h3>Indexing</h3>
<pre>df['A']</pre>
<p>This gives you column A.</p>
<p>iloc gives you entries based on their integer position:</p>
<pre>#      [row, column]
df.iloc[0,   0]</pre>
<p>loc gives you entries based on their labels:</p>
<pre>#     [row, column]
df.loc[:, :]</pre>
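<p>To make the difference between position-based and label-based indexing concrete, here is a small sketch. The frame, its index labels x, y, z and the variable names are made up for illustration.</p>

```python
import pandas as pd

# Hypothetical frame with string labels as the row index.
df = pd.DataFrame(
    {"A": [10, 20, 30], "B": [1.5, 2.5, 3.5]},
    index=["x", "y", "z"],
)

first_cell = df.iloc[0, 0]        # by position: first row, first column
same_cell = df.loc["x", "A"]      # by label: row "x", column "A"
subset = df.loc["x":"y", ["B"]]   # label slices include the end label
print(first_cell, same_cell)
print(subset)
```

<p>Note that a label slice in loc includes its end label, while a position slice in iloc excludes its end position.</p>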
<h2>Data Cleaning</h2>
<h3>Dropping columns</h3>
<pre>del df["column_name"]</pre>
<h3>Renaming columns</h3>
<pre>df.columns = ["new_column_name", ...]</pre>
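<p>Assigning to df.columns replaces all names at once. If only some columns should change, rename with a mapping is an alternative. This is a sketch with made-up column names a, b, c.</p>

```python
import pandas as pd

# Hypothetical frame with three columns.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

del df["c"]                             # removes the column in place
df = df.rename(columns={"a": "first"})  # renames only the listed column
print(df.columns.tolist())
```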
<h3>Shifting columns</h3>
<pre>df['shifted'] = df['original'].shift(-1)</pre>
<h3>Comparing columns</h3>
<pre>df['increased'] = (df['shifted'] &gt; df['original'])</pre>
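<p>Shifting and comparing work well together, for example to count how often a value is larger than its predecessor. A minimal sketch with made-up numbers; the name n_increases is only for illustration.</p>

```python
import pandas as pd

# Hypothetical measurements.
df = pd.DataFrame({"original": [199, 200, 208, 210, 200]})

# shift(-1) moves every value one row up, so each row sees its successor.
# The last row has no successor and becomes NaN.
df["shifted"] = df["original"].shift(-1)

# A comparison against NaN evaluates to False.
df["increased"] = df["shifted"] > df["original"]

n_increases = int(df["increased"].sum())
print(n_increases)
```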
<h2>Splitting</h2>
<h3>Splitting strings into individual columns</h3>
<pre>df = pd.DataFrame(df["original"].str.split('').tolist())</pre>
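<p>Assuming the goal is one column per character, converting each string to a list is an alternative that does not depend on how an empty separator is interpreted. The sample strings and the name chars are made up.</p>

```python
import pandas as pd

# Hypothetical column of fixed-width strings.
df = pd.DataFrame({"original": ["00100", "11110", "10110"]})

# list("00100") yields ["0", "0", "1", "0", "0"], one entry per character.
chars = pd.DataFrame(df["original"].apply(list).tolist())
print(chars)
```

<p>The resulting columns hold strings; use astype(int) if you need numbers.</p>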
<h2>Counting and Calculating</h2>
<h3>Summing columns</h3>
<pre>df["value"].sum()</pre>
<h3>Cumulative sum</h3>
<pre>df["aim"].cumsum()</pre>
<h3>Rolling sum</h3>
<pre>indexer = pd.api.indexers.FixedForwardWindowIndexer(window_size=3)
df["rolling_sum"] = df.original.rolling(window=indexer).sum()</pre>
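<p>Here is how the cumulative sum and the forward-looking rolling sum behave on a tiny made-up series. Passing min_periods=1 is an addition in this sketch; it makes the shorter windows at the end return a partial sum.</p>

```python
import pandas as pd

# Hypothetical values.
df = pd.DataFrame({"original": [1, 2, 3, 4, 5]})

# Running total up to and including each row.
df["cumulative"] = df["original"].cumsum()

# Each window covers the current row and the next two rows.
indexer = pd.api.indexers.FixedForwardWindowIndexer(window_size=3)
df["rolling_sum"] = (
    df["original"].rolling(window=indexer, min_periods=1).sum()
)
print(df)
```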
<h3>Counting value occurrences</h3>
<pre>df['increased'].value_counts(dropna=False)</pre>
<h3>Counting occurrences for all columns</h3>
<pre>df = pd.concat([df[column].value_counts() for column in df], axis=1, keys=df.columns)</pre>
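<p>A short sketch of both counting variants on made-up data. The keys argument is an addition here: it labels each result column with the original column name regardless of how value_counts names its result.</p>

```python
import pandas as pd

# Hypothetical frame with a missing value in column "b".
df = pd.DataFrame({"a": ["x", "y", "x"], "b": ["y", "y", None]})

# dropna=False also counts the missing entry.
counts_b = df["b"].value_counts(dropna=False)

# One value_counts result per column, aligned on the values.
per_column = pd.concat(
    [df[column].value_counts() for column in df],
    axis=1,
    keys=df.columns,
)
print(counts_b)
print(per_column)
```

<p>Values that do not occur in a column show up as NaN in that column.</p>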
<h3>Convert column to datetime</h3>
<pre>df['date'] = pd.to_datetime(df['date'])</pre>
<h3>Convert datetime to minutes since midnight</h3>
<pre>df['msm'] = df['date'].dt.hour * 60 + df['date'].dt.minute</pre>
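<p>Both datetime steps together, as a minimal sketch with made-up timestamps. Assigning the whole column replaces it, so the column gets a datetime dtype and the dt accessor becomes available.</p>

```python
import pandas as pd

# Hypothetical timestamps stored as strings.
df = pd.DataFrame({"date": ["2021-03-05 00:00:00", "2021-03-05 13:17:19"]})

# Replace the string column with a proper datetime column.
df["date"] = pd.to_datetime(df["date"])

# Minutes since midnight: full hours in minutes plus the minute part.
df["msm"] = df["date"].dt.hour * 60 + df["date"].dt.minute
print(df)
```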
]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
