
How do I write a vectorized DataFrame to CSV with one row per word?

  •  1
  • Ujjawal Pandey  · Tech Community  · 2 years ago

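    What I am trying to do is extract keywords from a processed log file and build a vectorized DataFrame of those keywords. But when I write the DataFrame to CSV, the words end up as columns, with their respective values in the second row. Instead, I want the words to be in rows and their values in the second column.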

    test.py:

    import re
    import pandas as pd
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS
    
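    def removeNumbers(list):
       # doing something (body omitted in the original post)
       ...
    
    def processFiles(filename):
       # doing something (body omitted in the original post)
       ...
    
    def readFile(fileName):
       # doing something (body omitted in the original post)
       ...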
    
    # Build our text
    processFiles("log.txt")
    text = readFile("processedFile.txt")
    
    
    vectorizer = CountVectorizer()
    
    matrix = vectorizer.fit_transform([text])
    
    counts = pd.DataFrame(matrix.toarray(),
                          columns=vectorizer.get_feature_names_out())
    
    
    
    counts.to_csv("keywords_count.csv")
    

    keywords_count.csv looks like this:

    ,accept,accepted,action,add,address,agent,allocated,api,api_action_sender,api_reader,apihandle,apiinitialize,apiterminate,appl,associate,attempt,available,bd,bdfb,broken,ceased,check_signals,chose,cksm,cl,clcat,client,close,code,complete,conf,configuration,connection,connfd,constructing,control,creating,ctcd,delresp,dereg,deregistering,does,dreg_process,dst,dump,edci,engine,entering,entity,entity_initialize,entries,entry,event,event_establishsessionsend,event_timert_expire,exist,exists,exit,exiting,expect,expired,failed,fc,file,filter,flg,flow,flow_timer_start,flow_timer_stop,forward,gateway,handle,home,hop,if,ifaeddrg_byaddr,ifidx,image,images,index,inf,info,informational,init_policyapi,initialization,initialized,install,interface,ioctl,ip,len,level,lih,link,list,local,locate_configfile,log,loopback,mailbox,mailbox_register,mailslot,mailslot_create,mailslot_send,mailslot_sitter,main,mcast_add,module,msg,necessary,new,node,obj,old,open_socket,operation,os,outgoing,papi_debug,papilogfunc,papiuservalue,path,pathdelta,pathed,pathtear,pipe,policy,process,proterr,proto,qoshandle,qoshd,qosmgr,qosmgr_request,qosmgr_response,query,querying,rapi,raw,rc,read_physical_netif,readbuffer,ready,reason,received,reentering,reg_process,registered,registering,registerwithpolicyapi,registration,remove,req,request,reservation,response,result,resv,resvdelta,resved,resvresp,return,returned,route,router_forward_getoi,rpapi_getpolicydata,rpapi_getspecdata,rpapi_reg_unregflow,rsv,rsvp,rsvp_action_nhop,rsvp_api_open,rsvp_event,rsvp_event_establishsession,rsvp_event_mapsession,rsvp_event_propagate,rsvp_explode_packet,rsvp_flow_statemachine,rsvp_hop,rsvp_parse_objects,rsvpd,rsvpfindactionname,rsvpfindservicedetailsonactname,rsvpgettspec,rsvpputactionname,rsvpremactionname,rthdl,send,sender,sender_withdraw,sending,service,sess,session,sessioned,setsockopt,settcpimage,sigalrm,signal,sigterm,socket,source,specified,src,start,started,state,status,stop,stopped,style,successful,supported,tc,tcp,tcpcs,term,term_policyapi,terminate,terminated,terminator,timer,tout,tr,trace,traffic,traffic_action_oif,traffic_reader,ttl,type,udp,unregistered,unregisterfrompolicyapi,user,using,vlink,warning,wf,writing
    0,1,1,1,1,18,1,28,8,1,6,1,3,2,1,1,2,4,2,1,1,1,1,1,4,1,3,1,1,1,1,1,1,2,1,9,2,22,2,1,1,1,2,3,3,2,5,2,20,7,7,1,7,31,1,6,1,6,1,17,1,6,4,8,1,2,4,4,12,7,2,7,7,1,4,1,2,7,1,1,7,7,147,2,14,1,8,1,18,9,5,4,1,4,2,1,1,1,1,1,24,23,20,27,9,7,3,4,1,2,2,2,1,4,1,2,1,1,1,3,1,1,7,1,2,4,2,2,10,1,3,2,1,2,4,4,6,1,1,4,4,8,12,1,2,12,9,3,1,1,3,2,2,1,4,3,2,6,4,1,20,1,1,1,17,35,11,3,12,4,38,8,1,4,1,7,1,4,26,4,8,2,3,3,3,3,3,1,1,1,1,9,3,3,10,4,4,2,6,8,1,6,12,1,3,4,9,26,2,5,2,4,10,1,2,2,1,1,8,2,2,1,2,6,1,119,2,2,3,4,5,14,1,3,1,1,1,4,4,1
    
    1 answer  |  as of 2 years ago
  •  1
  •   Corralien    2 years ago

    Transpose the DataFrame:

    counts.T.to_csv("keywords_count.csv")
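
    As a rough illustration of what the transpose does, the sketch below builds a tiny stand-in for `counts` (three words taken from the question's output, not the real data) and writes it to an in-memory buffer. The `word` and `count` labels and the `by_word` name are assumptions added here for readability; they are not part of the original answer.

```python
import io

import pandas as pd

# Hypothetical stand-in for the one-row `counts` DataFrame from the question:
# one column per word, a single row (index 0) holding the counts.
counts = pd.DataFrame([[1, 18, 28]], columns=["accept", "address", "allocated"])

# Transpose so each word becomes a row. The single column is labelled 0
# (the old row index), so rename it and give the index a name as well.
by_word = counts.T.rename(columns={0: "count"}).rename_axis("word")

# Write to an in-memory buffer; passing a file path works the same way.
buffer = io.StringIO()
by_word.to_csv(buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

    Without the renaming step, `counts.T.to_csv(...)` still produces one word per row; the header line is then simply `,0`.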