Code Script 🚀

How can I pivot a dataframe

February 15, 2025


Data manipulation is a cornerstone of data analysis, and pivoting is a powerful technique that can reshape your data for deeper insights. Pivoting a dataframe, whether in Python with Pandas, R, or other tools, transforms data from a "long" format to a "wide" format, making it easier to analyze and visualize relationships between variables. This transformation is essential for tasks like creating summary reports, comparing metrics across different categories, and preparing data for machine learning models. This guide will delve into the intricacies of pivoting dataframes, exploring various methods, practical examples, and best practices for mastering this crucial data manipulation skill.

Understanding the Pivot Operation

Imagine having sales data organized by date, product, and sales amount. In this "long" format, each row represents a single sale. Pivoting allows you to restructure this data so that, for example, products become columns, dates become rows, and the values are the sales amounts. This "wide" format makes it significantly easier to compare sales performance across different products over time. The essence of pivoting lies in selecting specific columns to become new index labels (rows), new column labels, and the values that populate the resulting table.

Choosing the right pivot function depends on your specific needs and the complexity of your data. Simple pivots might involve the basic pivot method, while more intricate transformations might require advanced techniques like pivot_table for handling aggregations and duplicate values. Understanding the nuances of each method is critical for efficient data manipulation.

Pivoting with Pandas in Python

Python's Pandas library provides robust tools for pivoting dataframes. The pivot method is the foundational function, requiring you to specify the 'index' (rows), 'columns', and 'values' from your original dataframe. Let's illustrate with an example:

import pandas as pd

data = {'Date': ['2024-01-01', '2024-01-01', '2024-01-02', '2024-01-02'],
        'Product': ['A', 'B', 'A', 'B'],
        'Sales': [100, 150, 120, 180]}
df = pd.DataFrame(data)
pivot_df = df.pivot(index='Date', columns='Product', values='Sales')
print(pivot_df)

This code snippet demonstrates a basic pivot operation. The resulting pivot_df will have dates as rows, products as columns, and sales figures as values. However, the pivot method is limited in handling scenarios with duplicate entries. For such cases, pivot_table comes to the rescue, offering functionality like aggregation using functions such as sum, mean, or count.

Dealing with Duplicate Values with pivot_table

When dealing with datasets containing duplicate entries for a given index/column combination, pivot_table proves invaluable. By specifying an aggregation function, you can consolidate multiple values into a single representative value.

pivot_table_df = df.pivot_table(index='Date', columns='Product',
                                values='Sales', aggfunc='sum')

Pivoting in Other Data Analysis Tools

While Python's Pandas is widely used, other tools like R and SQL also offer pivoting capabilities. R's reshape2 package provides the dcast function for similar transformations. In SQL, the PIVOT operator allows for dynamic column creation based on row values. Understanding these cross-platform capabilities expands your data manipulation toolkit.

Regardless of the tool you choose, the underlying principles remain consistent. Identifying the components for rows, columns, and values is key to successful pivoting. The specific syntax and available functionality may differ, but the core concept stays the same.

Best Practices and Considerations

For optimal pivoting, consider these factors: data cleanliness, appropriate aggregation functions, and handling of missing values. Clean data ensures accurate results, while appropriate aggregation prevents information loss. Addressing missing values through imputation or removal is crucial for reliable analysis. Following these practices ensures the integrity and effectiveness of your pivoted dataframes.

  • Clean your data before pivoting.
  • Choose the appropriate aggregation function for pivot_table.
  1. Identify the columns for rows (index).
  2. Select the columns for columns.
  3. Determine the values to populate the table.
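These points can be sketched with pivot_table, reusing the Date/Product/Sales layout from the example above; the duplicated row and the missing cell below are invented for illustration:

```python
import pandas as pd

# Hypothetical sales data: ('2024-01-01', 'A') occurs twice,
# and there is no row at all for ('2024-01-02', 'B').
sales = pd.DataFrame({
    'Date': ['2024-01-01', '2024-01-01', '2024-01-01', '2024-01-02'],
    'Product': ['A', 'A', 'B', 'A'],
    'Sales': [100, 120, 150, 130],
})

# aggfunc consolidates the duplicate pair; fill_value handles the gap.
wide = sales.pivot_table(index='Date', columns='Product',
                         values='Sales', aggfunc='sum', fill_value=0)
print(wide)
```

Here the duplicated (Date, Product) pair is summed to 220 and the missing cell becomes 0 instead of NaN.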

Effective data visualization frequently requires reshaping your data. Pivoting helps present complex information clearly and concisely, facilitating a deeper understanding of underlying trends and patterns. Discover more about advanced data manipulation techniques in our data analysis guide.

Frequently Asked Questions

Q: What's the difference between pivot and pivot_table?

A: pivot is used for simple pivoting without aggregation. pivot_table handles aggregation when there are duplicate entries for a given index/column combination.
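A minimal sketch of that difference, using a made-up frame with a duplicated row/column pair:

```python
import pandas as pd

df_dup = pd.DataFrame({
    'row': ['r0', 'r0', 'r1'],
    'col': ['c0', 'c0', 'c1'],   # ('r0', 'c0') appears twice
    'val': [1.0, 3.0, 5.0],
})

# pivot cannot reshape when an index/column pair is duplicated.
try:
    df_dup.pivot(index='row', columns='col', values='val')
except ValueError as err:
    print(err)  # Index contains duplicate entries, cannot reshape

# pivot_table aggregates the duplicates instead (mean by default),
# so ('r0', 'c0') becomes (1.0 + 3.0) / 2 = 2.0.
result = df_dup.pivot_table(index='row', columns='col', values='val')
print(result)
```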


Mastering the art of pivoting dataframes empowers you to extract meaningful insights from your data. Whether you're analyzing sales trends, user behavior, or scientific datasets, this powerful technique is an indispensable tool in your data analysis arsenal. Explore the resources mentioned throughout this guide to further refine your skills and unlock the full potential of your data. Start pivoting your data today and uncover the hidden stories within your datasets! Check out these valuable resources for more in-depth information: Pandas Documentation on Pivot, Pandas Documentation on Pivot Table, and Reshaping Data in R.

  • Practice pivoting with different datasets to solidify your understanding.
  • Experiment with various aggregation functions in pivot_table to see their effects.

Question & Answer:

  • What is pivot?
  • How do I pivot?
  • Long format to wide format?

I've seen a lot of questions that ask about pivot tables, even if they don't know it. It is virtually impossible to write a canonical question and answer that encompasses all aspects of pivoting... But I'm going to give it a go.


The problem with existing questions and answers is that often the question is focused on a nuance that the OP has trouble generalizing in order to use a number of the existing good answers. However, none of the answers attempt to give a comprehensive explanation (because it's a daunting task). Look at a few examples from my Google search:

  1. How to pivot a dataframe in Pandas? - Good question and answer. But the answer only answers the specific question with little explanation.
  2. pandas pivot table to data frame - OP is concerned with the output of the pivot, specifically how the columns look. OP wanted it to look like R. This isn't very helpful for pandas users.
  3. pandas pivoting a dataframe, duplicate rows - Another decent question but the answer focuses on one method, namely pd.DataFrame.pivot

Setup

I conspicuously named my columns and relevant column values to correspond with how I'm going to pivot in the answers below.

import numpy as np
import pandas as pd
from numpy.core.defchararray import add

np.random.seed([3, 1415])
n = 20

cols = np.array(['key', 'row', 'item', 'col'])
arr1 = (np.random.randint(5, size=(n, 4)) // [2, 1, 2, 1]).astype(str)

df = pd.DataFrame(
    add(cols, arr1), columns=cols
).join(
    pd.DataFrame(np.random.rand(n, 2).round(2)).add_prefix('val')
)
print(df)

     key   row   item   col  val0  val1
0   key0  row3  item1  col3  0.81  0.04
1   key1  row2  item1  col2  0.44  0.07
2   key1  row0  item1  col0  0.77  0.01
3   key0  row4  item0  col2  0.15  0.59
4   key1  row0  item2  col1  0.81  0.64
5   key1  row2  item2  col4  0.13  0.88
6   key2  row4  item1  col3  0.88  0.39
7   key1  row4  item1  col1  0.10  0.07
8   key1  row0  item2  col4  0.65  0.02
9   key1  row2  item0  col2  0.35  0.61
10  key2  row0  item2  col1  0.40  0.85
11  key2  row4  item1  col2  0.64  0.25
12  key0  row2  item2  col3  0.50  0.44
13  key0  row4  item1  col4  0.24  0.46
14  key1  row3  item2  col3  0.28  0.11
15  key0  row3  item1  col1  0.31  0.23
16  key0  row0  item2  col3  0.86  0.01
17  key0  row4  item0  col3  0.64  0.21
18  key2  row2  item2  col0  0.13  0.45
19  key0  row2  item0  col4  0.37  0.70

Questions

  1. Why do I get ValueError: Index contains duplicate entries, cannot reshape?

  2. How do I pivot df such that the col values are columns, row values are the index, and mean of val0 are the values?

    col   col0   col1   col2   col3  col4
    row
    row0  0.77  0.605    NaN  0.860  0.65
    row2  0.13    NaN  0.395  0.500  0.25
    row3   NaN  0.310    NaN  0.545   NaN
    row4   NaN  0.100  0.395  0.760  0.24
  3. How do I make it so that missing values are 0?

    col   col0   col1   col2   col3  col4
    row
    row0  0.77  0.605  0.000  0.860  0.65
    row2  0.13  0.000  0.395  0.500  0.25
    row3  0.00  0.310  0.000  0.545  0.00
    row4  0.00  0.100  0.395  0.760  0.24
  4. Can I get something other than mean, like maybe sum?

    col   col0  col1  col2  col3  col4
    row
    row0  0.77  1.21  0.00  0.86  0.65
    row2  0.13  0.00  0.79  0.50  0.50
    row3  0.00  0.31  0.00  1.09  0.00
    row4  0.00  0.10  0.79  1.52  0.24
  5. Can I do more than one aggregation at a time?

           sum                          mean
    col   col0  col1  col2  col3  col4  col0   col1   col2   col3  col4
    row
    row0  0.77  1.21  0.00  0.86  0.65  0.77  0.605  0.000  0.860  0.65
    row2  0.13  0.00  0.79  0.50  0.50  0.13  0.000  0.395  0.500  0.25
    row3  0.00  0.31  0.00  1.09  0.00  0.00  0.310  0.000  0.545  0.00
    row4  0.00  0.10  0.79  1.52  0.24  0.00  0.100  0.395  0.760  0.24
  6. Can I aggregate over multiple value columns?

          val0                             val1
    col   col0   col1   col2   col3  col4  col0   col1  col2   col3  col4
    row
    row0  0.77  0.605  0.000  0.860  0.65  0.01  0.745  0.00  0.010  0.02
    row2  0.13  0.000  0.395  0.500  0.25  0.45  0.000  0.34  0.440  0.79
    row3  0.00  0.310  0.000  0.545  0.00  0.00  0.230  0.00  0.075  0.00
    row4  0.00  0.100  0.395  0.760  0.24  0.00  0.070  0.42  0.300  0.46
  7. Can I subdivide by multiple columns?

    item item0             item1                         item2
    col   col2  col3  col4  col0  col1  col2  col3  col4  col0   col1  col3  col4
    row
    row0  0.00  0.00  0.00  0.77  0.00  0.00  0.00  0.00  0.00  0.605  0.86  0.65
    row2  0.35  0.00  0.37  0.00  0.00  0.44  0.00  0.00  0.13  0.000  0.50  0.13
    row3  0.00  0.00  0.00  0.00  0.31  0.00  0.81  0.00  0.00  0.000  0.28  0.00
    row4  0.15  0.64  0.00  0.00  0.10  0.64  0.88  0.24  0.00  0.000  0.00  0.00
  8. Or

    item      item0             item1                         item2
    col        col2  col3  col4  col0  col1  col2  col3  col4  col0  col1  col3  col4
    key  row
    key0 row0  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.86  0.00
         row2  0.00  0.00  0.37  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.50  0.00
         row3  0.00  0.00  0.00  0.00  0.31  0.00  0.81  0.00  0.00  0.00  0.00  0.00
         row4  0.15  0.64  0.00  0.00  0.00  0.00  0.00  0.24  0.00  0.00  0.00  0.00
    key1 row0  0.00  0.00  0.00  0.77  0.00  0.00  0.00  0.00  0.00  0.81  0.00  0.65
         row2  0.35  0.00  0.00  0.00  0.00  0.44  0.00  0.00  0.00  0.00  0.00  0.13
         row3  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.28  0.00
         row4  0.00  0.00  0.00  0.00  0.10  0.00  0.00  0.00  0.00  0.00  0.00  0.00
    key2 row0  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.40  0.00  0.00
         row2  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.13  0.00  0.00  0.00
         row4  0.00  0.00  0.00  0.00  0.00  0.64  0.88  0.00  0.00  0.00  0.00  0.00
  9. Can I aggregate the frequency in which the columns and rows occur together, aka "cross tabulation"?

    col   col0  col1  col2  col3  col4
    row
    row0     1     2     0     1     1
    row2     1     0     2     1     2
    row3     0     1     0     2     0
    row4     0     1     2     2     1
  10. How do I convert a DataFrame from long to wide by pivoting on ONLY two columns? Given,

    np.random.seed([3, 1415])
    df2 = pd.DataFrame({'A': list('aaaabbbc'), 'B': np.random.choice(15, 8)})
    df2

       A   B
    0  a   0
    1  a  11
    2  a   2
    3  a  11
    4  b  10
    5  b  10
    6  b  14
    7  c   7

    The expected output should look something like

          a     b    c
    0   0.0  10.0  7.0
    1  11.0  10.0  NaN
    2   2.0  14.0  NaN
    3  11.0   NaN  NaN
  11. How do I flatten the multiple index to single index after pivot?

    From

       1  2
       1  1  2
    a  2  1  1
    b  2  1  0
    c  1  0  0

    To

       1|1  2|1  2|2
    a    2    1    1
    b    2    1    0
    c    1    0    0

Here is a list of idioms we can use to pivot

  1. pd.DataFrame.pivot_table

    • A glorified version of groupby with a more intuitive API. For many people, this is the preferred approach. And it is the intended approach by the developers.
    • Specify the row level, column levels, values to be aggregated, and function(s) to perform the aggregations.
  2. pd.DataFrame.groupby + pd.DataFrame.unstack

    • Good general approach for doing just about any type of pivot
    • You specify all columns that will constitute the pivoted row levels and column levels in one group by. You follow that by selecting the remaining columns you want to aggregate and the function(s) you want to perform the aggregation. Finally, you unstack the levels that you want to be in the column index.
  3. pd.DataFrame.set_index + pd.DataFrame.unstack

    • Convenient and intuitive for some (myself included). Cannot handle duplicate grouped keys.
    • Similar to the groupby paradigm, we specify all columns that will eventually be either row or column levels and set those to be the index. We then unstack the levels we want in the columns. If either the remaining index levels or the column levels are not unique, this method will fail.
  4. pd.DataFrame.pivot

    • Very similar to set_index in that it shares the duplicate key limitation. The API is very limited as well. It only takes scalar values for index, columns, values.
    • Similar to the pivot_table method in that we select rows, columns, and values on which to pivot. However, we cannot aggregate, and if either rows or columns are not unique, this method will fail.
  5. pd.crosstab

    • This is a specialized version of pivot_table and in its purest form is the most intuitive way to perform several tasks.
  6. pd.factorize + np.bincount

    • This is a highly advanced technique that is very obscure but is very fast. It cannot be used in all circumstances, but when it can be used and you are comfortable using it, you will reap the performance rewards.
  7. pd.get_dummies + pd.DataFrame.dot

    • I use this for cleverly performing cross tabulation.
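As a quick sanity check that the first two idioms are interchangeable, here is a small made-up frame (the names are illustrative only, not the df defined in the Setup section):

```python
import pandas as pd

demo = pd.DataFrame({
    'row': ['r0', 'r0', 'r1', 'r1', 'r1'],
    'col': ['c0', 'c1', 'c0', 'c0', 'c1'],
    'val': [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Idiom 1: pivot_table with an explicit aggregation function.
via_pivot_table = demo.pivot_table(values='val', index='row',
                                   columns='col', aggfunc='mean')

# Idiom 2: groupby on both levels, then unstack the column level.
via_groupby = demo.groupby(['row', 'col'])['val'].mean().unstack()

print(via_pivot_table.equals(via_groupby))  # True
```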



Question 1

Why do I get ValueError: Index contains duplicate entries, cannot reshape

This occurs because pandas is attempting to reindex either a columns or index object with duplicate entries. There are varying methods that can perform a pivot. Some of them are not well suited to situations in which there are duplicates of the keys on which it is being asked to pivot. For example: Consider pd.DataFrame.pivot. I know there are duplicate entries that share the row and col values:

df.duplicated(['row', 'col']).any()

True

So when I pivot using

df.pivot(index='row', columns='col', values='val0')

I get the error mentioned above. In fact, I get the same error when I try to perform the same task with:

df.set_index(['row', 'col'])['val0'].unstack()

Examples

What I'm going to do for each subsequent question is to answer it using pd.DataFrame.pivot_table. Then I'll provide alternatives to perform the same task.

Questions 2 and 3

How do I pivot df such that the col values are columns, row values are the index, and mean of val0 are the values?

  • pd.DataFrame.pivot_table

    df.pivot_table(
        values='val0', index='row', columns='col',
        aggfunc='mean')

    col   col0   col1   col2   col3  col4
    row
    row0  0.77  0.605    NaN  0.860  0.65
    row2  0.13    NaN  0.395  0.500  0.25
    row3   NaN  0.310    NaN  0.545   NaN
    row4   NaN  0.100  0.395  0.760  0.24

    • aggfunc='mean' is the default and I didn't have to set it. I included it to be explicit.

How do I make it so that missing values are 0?

  • pd.DataFrame.pivot_table

    • fill_value is not set by default. I tend to set it appropriately. In this case I set it to 0.

    df.pivot_table(
        values='val0', index='row', columns='col',
        fill_value=0, aggfunc='mean')

    col   col0   col1   col2   col3  col4
    row
    row0  0.77  0.605  0.000  0.860  0.65
    row2  0.13  0.000  0.395  0.500  0.25
    row3  0.00  0.310  0.000  0.545  0.00
    row4  0.00  0.100  0.395  0.760  0.24

  • pd.DataFrame.groupby

    df.groupby(['row', 'col'])['val0'].mean().unstack(fill_value=0)

  • pd.crosstab

    pd.crosstab(
        index=df['row'], columns=df['col'],
        values=df['val0'], aggfunc='mean').fillna(0)


Question 4

Can I get something other than mean, like maybe sum?

  • pd.DataFrame.pivot_table

    df.pivot_table(
        values='val0', index='row', columns='col',
        fill_value=0, aggfunc='sum')

    col   col0  col1  col2  col3  col4
    row
    row0  0.77  1.21  0.00  0.86  0.65
    row2  0.13  0.00  0.79  0.50  0.50
    row3  0.00  0.31  0.00  1.09  0.00
    row4  0.00  0.10  0.79  1.52  0.24

  • pd.DataFrame.groupby

    df.groupby(['row', 'col'])['val0'].sum().unstack(fill_value=0)

  • pd.crosstab

    pd.crosstab(
        index=df['row'], columns=df['col'],
        values=df['val0'], aggfunc='sum').fillna(0)


Question 5

Can I do more than one aggregation at a time?

Notice that for pivot_table and crosstab I needed to pass a list of callables. On the other hand, groupby.agg is able to take strings for a limited number of special functions. groupby.agg would also have taken the same callables we passed to the others, but it is often more efficient to leverage the string function names as there are efficiencies to be gained.

  • pd.DataFrame.pivot_table

    df.pivot_table(
        values='val0', index='row', columns='col',
        fill_value=0, aggfunc=[np.size, np.mean])

          size                          mean
    col   col0  col1  col2  col3  col4  col0   col1   col2   col3  col4
    row
    row0     1     2     0     1     1  0.77  0.605  0.000  0.860  0.65
    row2     1     0     2     1     2  0.13  0.000  0.395  0.500  0.25
    row3     0     1     0     2     0  0.00  0.310  0.000  0.545  0.00
    row4     0     1     2     2     1  0.00  0.100  0.395  0.760  0.24

  • pd.DataFrame.groupby

    df.groupby(['row', 'col'])['val0'].agg(['size', 'mean']).unstack(fill_value=0)

  • pd.crosstab

    pd.crosstab(
        index=df['row'], columns=df['col'],
        values=df['val0'], aggfunc=[np.size, np.mean]).fillna(0, downcast='infer')


Question 6

Can I aggregate over multiple value columns?

  • pd.DataFrame.pivot_table: we pass values=['val0', 'val1'], but we could've left that off completely

    df.pivot_table(
        values=['val0', 'val1'], index='row', columns='col',
        fill_value=0, aggfunc='mean')

          val0                             val1
    col   col0   col1   col2   col3  col4  col0   col1  col2   col3  col4
    row
    row0  0.77  0.605  0.000  0.860  0.65  0.01  0.745  0.00  0.010  0.02
    row2  0.13  0.000  0.395  0.500  0.25  0.45  0.000  0.34  0.440  0.79
    row3  0.00  0.310  0.000  0.545  0.00  0.00  0.230  0.00  0.075  0.00
    row4  0.00  0.100  0.395  0.760  0.24  0.00  0.070  0.42  0.300  0.46

  • pd.DataFrame.groupby

    df.groupby(['row', 'col'])[['val0', 'val1']].mean().unstack(fill_value=0)


Question 7

Can I subdivide by multiple columns?

  • pd.DataFrame.pivot_table

    df.pivot_table(
        values='val0', index='row', columns=['item', 'col'],
        fill_value=0, aggfunc='mean')

    item item0             item1                         item2
    col   col2  col3  col4  col0  col1  col2  col3  col4  col0   col1  col3  col4
    row
    row0  0.00  0.00  0.00  0.77  0.00  0.00  0.00  0.00  0.00  0.605  0.86  0.65
    row2  0.35  0.00  0.37  0.00  0.00  0.44  0.00  0.00  0.13  0.000  0.50  0.13
    row3  0.00  0.00  0.00  0.00  0.31  0.00  0.81  0.00  0.00  0.000  0.28  0.00
    row4  0.15  0.64  0.00  0.00  0.10  0.64  0.88  0.24  0.00  0.000  0.00  0.00

  • pd.DataFrame.groupby

    df.groupby(
        ['row', 'item', 'col']
    )['val0'].mean().unstack(['item', 'col']).fillna(0).sort_index(axis=1)


Question 8

Can I subdivide by multiple columns?

  • pd.DataFrame.pivot_table

    df.pivot_table(
        values='val0', index=['key', 'row'], columns=['item', 'col'],
        fill_value=0, aggfunc='mean')

    item      item0             item1                         item2
    col        col2  col3  col4  col0  col1  col2  col3  col4  col0  col1  col3  col4
    key  row
    key0 row0  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.86  0.00
         row2  0.00  0.00  0.37  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.50  0.00
         row3  0.00  0.00  0.00  0.00  0.31  0.00  0.81  0.00  0.00  0.00  0.00  0.00
         row4  0.15  0.64  0.00  0.00  0.00  0.00  0.00  0.24  0.00  0.00  0.00  0.00
    key1 row0  0.00  0.00  0.00  0.77  0.00  0.00  0.00  0.00  0.00  0.81  0.00  0.65
         row2  0.35  0.00  0.00  0.00  0.00  0.44  0.00  0.00  0.00  0.00  0.00  0.13
         row3  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.28  0.00
         row4  0.00  0.00  0.00  0.00  0.10  0.00  0.00  0.00  0.00  0.00  0.00  0.00
    key2 row0  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.40  0.00  0.00
         row2  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.13  0.00  0.00  0.00
         row4  0.00  0.00  0.00  0.00  0.00  0.64  0.88  0.00  0.00  0.00  0.00  0.00

  • pd.DataFrame.groupby

    df.groupby(
        ['key', 'row', 'item', 'col']
    )['val0'].mean().unstack(['item', 'col']).fillna(0).sort_index(axis=1)

  • pd.DataFrame.set_index, because the set of keys is unique for both rows and columns

    df.set_index(
        ['key', 'row', 'item', 'col']
    )['val0'].unstack(['item', 'col']).fillna(0).sort_index(axis=1)


Question 9

Can I aggregate the frequency in which the columns and rows occur together, aka "cross tabulation"?

  • pd.DataFrame.pivot_table

    df.pivot_table(index='row', columns='col', fill_value=0, aggfunc='size')

    col   col0  col1  col2  col3  col4
    row
    row0     1     2     0     1     1
    row2     1     0     2     1     2
    row3     0     1     0     2     0
    row4     0     1     2     2     1

  • pd.DataFrame.groupby

    df.groupby(['row', 'col'])['val0'].size().unstack(fill_value=0)

  • pd.crosstab

    pd.crosstab(df['row'], df['col'])

  • pd.factorize + np.bincount

    # get integer factorization `i` and unique values `r`
    # for column `'row'`
    i, r = pd.factorize(df['row'].values)
    # get integer factorization `j` and unique values `c`
    # for column `'col'`
    j, c = pd.factorize(df['col'].values)
    # `n` will be the number of rows
    # `m` will be the number of columns
    n, m = r.size, c.size
    # `i * m + j` is a clever way of counting the
    # factorization bins assuming a flat array of size
    # `n * m`.  Which is why we subsequently reshape as `(n, m)`
    b = np.bincount(i * m + j, minlength=n * m).reshape(n, m)
    # BTW, whenever I read this, I think 'Bean, Rice, and Cheese'
    pd.DataFrame(b, r, c)

          col3  col2  col0  col1  col4
    row3     2     0     0     1     0
    row2     1     2     1     0     2
    row0     1     0     1     2     1
    row4     2     2     0     1     1

  • pd.get_dummies

    pd.get_dummies(df['row']).T.dot(pd.get_dummies(df['col']))

          col0  col1  col2  col3  col4
    row0     1     2     0     1     1
    row2     1     0     2     1     2
    row3     0     1     0     2     0
    row4     0     1     2     2     1


Question 10

How do I convert a DataFrame from long to wide by pivoting on ONLY two columns?

  • DataFrame.pivot

    The first step is to assign a number to each row - this number will be the row index of that value in the pivoted result. This is done using GroupBy.cumcount:

    df2.insert(0, 'count', df2.groupby('A').cumcount())
    df2

       count  A   B
    0      0  a   0
    1      1  a  11
    2      2  a   2
    3      3  a  11
    4      0  b  10
    5      1  b  10
    6      2  b  14
    7      0  c   7

    The second step is to use the newly created column as the index to call DataFrame.pivot.

    df2.pivot(index='count', columns='A', values='B')

    A        a     b    c
    count
    0      0.0  10.0  7.0
    1     11.0  10.0  NaN
    2      2.0  14.0  NaN
    3     11.0   NaN  NaN

  • DataFrame.pivot_table

    Whereas DataFrame.pivot only accepts columns, DataFrame.pivot_table also accepts arrays, so the GroupBy.cumcount can be passed directly as the index without creating an explicit column.

    df2.pivot_table(index=df2.groupby('A').cumcount(), columns='A', values='B')

    A      a     b    c
    0    0.0  10.0  7.0
    1   11.0  10.0  NaN
    2    2.0  14.0  NaN
    3   11.0   NaN  NaN


Question 11

How do I flatten the multiple index to single index after pivot?

If the columns are type object, use string join

df.columns = df.columns.map('|'.join)

else use format

df.columns = df.columns.map('{0[0]}|{0[1]}'.format)
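A self-contained sketch of the string-join version, with a hand-built frame standing in for a pivot result:

```python
import pandas as pd

# A frame with the two-level columns from the Question 11 example.
flat = pd.DataFrame(
    [[2, 1, 1], [2, 1, 0], [1, 0, 0]],
    index=list('abc'),
    columns=pd.MultiIndex.from_tuples([('1', '1'), ('2', '1'), ('2', '2')]),
)

# Each column label is a tuple of strings, so '|'.join flattens it.
flat.columns = flat.columns.map('|'.join)
print(flat.columns.tolist())  # ['1|1', '2|1', '2|2']
```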